Large models like BERT, GPT-3, and T5 boast billions of parameters and extensive training data, enabling them to discern intricate patterns and achieve high accuracy. However, their widespread use raises privacy concerns about the unauthorized exposure of sensitive user information. Machine unlearning has emerged as a solution, allowing specific data to be removed from trained models without complete retraining. Yet existing unlearning methods, designed for smaller models, struggle with the complexities of larger ones: pinpointing the influence of individual data points, coping with computational demands, and maintaining overall performance while data are removed.
IEEE researchers have developed LMEraser, an efficient unlearning method for large models that addresses these privacy concerns. LMEraser takes a divide-and-conquer approach, partitioning the training dataset into public and private segments. It uses adaptive prompt tuning to isolate the influence of private data, reducing computational costs while maintaining model performance. By freezing the backbone parameters after pre-training and tuning prompts adaptively, LMEraser achieves precise unlearning with minimal impact on accuracy. Experimental results show a significant reduction in unlearning costs, making LMEraser a pioneering solution for privacy protection in large models.
Prompt tuning adapts pre-trained models to new tasks by adding small learnable vectors, or “prompts,” to the input data, avoiding full model retraining. It is computationally efficient and lets a single frozen model serve multiple tasks. Vision Transformers (ViTs) are the common backbone for visual prompt tuning, with methods like VPT and VP integrating prompts into image embeddings. Machine unlearning, in turn, removes specific data from trained models without complete retraining, which is crucial for privacy. Exact methods remove a sample’s influence entirely but are resource-intensive; approximate methods aim to reduce that influence efficiently, using techniques like influence functions, though they face scalability challenges.
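To make the idea concrete, here is a minimal sketch of visual prompt tuning in PyTorch: a frozen transformer encoder stands in for a pre-trained ViT backbone, a small set of learnable prompt tokens is prepended to the patch embeddings, and only those prompts plus a classifier head are trained. The dimensions, the stand-in backbone, and all names are illustrative assumptions, not code from the LMEraser project.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; real visual prompt tuning (e.g., VPT) operates on
# ViT patch embeddings in the same spirit.
embed_dim, num_prompts, num_classes = 768, 10, 10

# Stand-in for a frozen, pre-trained transformer encoder over patch embeddings.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=embed_dim, nhead=12, batch_first=True),
    num_layers=2,
)
for p in backbone.parameters():
    p.requires_grad = False  # the backbone is never updated

# Only the prompt tokens and the classifier head are trainable.
prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
head = nn.Linear(embed_dim, num_classes)

def forward(patch_embeddings):
    """patch_embeddings: (batch, num_patches, embed_dim) from a frozen embedder."""
    b = patch_embeddings.size(0)
    tokens = torch.cat([prompts.expand(b, -1, -1), patch_embeddings], dim=1)
    features = backbone(tokens)            # frozen forward pass
    return head(features.mean(dim=1))      # pool prompt + patch tokens, classify

# Tuning updates only `prompts` and `head`, never the backbone.
optimizer = torch.optim.AdamW([prompts, *head.parameters()], lr=1e-3)
logits = forward(torch.randn(4, 196, embed_dim))
loss = nn.functional.cross_entropy(logits, torch.randint(0, num_classes, (4,)))
loss.backward()
optimizer.step()
```

Because the backbone stays frozen, the per-task state is just the prompt tokens and the head, which is what makes this style of tuning cheap enough to repeat many times.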
LMEraser handles data in large models through a multi-step pipeline that addresses both privacy and unlearning challenges. First, it partitions the dataset into public and private segments so that sensitive data stay isolated. The backbone is pre-trained solely on public data, which avoids privacy risks and keeps its parameters stable. Private data are then adaptively clustered based on their diversity, and each cluster receives its own tailored prompt. When a removal request arrives, only the prompts and classifier heads of the affected clusters are re-optimized, as sketched below. LMEraser thus achieves precise unlearning without full model retraining, preserving both performance and privacy.
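The sketch below illustrates that divide-and-conquer structure: private data are clustered, one prompt-and-head pair is tuned per cluster on top of the frozen public-data backbone, and an unlearning request re-tunes only the clusters that contained the removed samples. The dummy features, the plain k-means clustering, and the tune_cluster placeholder are hypothetical stand-ins, not the authors' adaptive clustering implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
private_features = rng.normal(size=(1000, 64))   # stand-in private-data features
n_clusters = 8

# 1) Cluster the private data; each cluster gets its own prompt + head.
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(private_features)

def tune_cluster(sample_ids):
    """Placeholder for optimizing one cluster's prompt and classifier head
    on top of the frozen, public-data-only backbone."""
    return {"trained_on": frozenset(sample_ids)}  # stands in for (prompt, head)

# 2) Train one (prompt, head) pair per cluster.
cluster_models = {c: tune_cluster(np.flatnonzero(labels == c)) for c in range(n_clusters)}

# 3) Unlearning: drop the requested samples and re-tune only the affected clusters.
def unlearn(ids_to_remove):
    removed = set(ids_to_remove)
    for c in {labels[i] for i in ids_to_remove}:
        keep = [i for i in np.flatnonzero(labels == c) if i not in removed]
        cluster_models[c] = tune_cluster(keep)    # all other clusters stay untouched

unlearn([3, 42, 512])  # only the clusters containing these samples are re-tuned
```

Since the backbone depends only on public data, removing private samples never requires touching its parameters; the cost of each unlearning request scales with the size of the affected clusters rather than with the full model.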
The evaluation of LMEraser focuses on model utility and unlearning efficiency. Model utility is assessed by image classification accuracy, verifying that unlearning does not degrade it; unlearning efficiency is measured by the time and computational cost of handling removal requests. Using ImageNet-22K as the public dataset and CIFAR-10, CIFAR-100, GTSRB, and SVHN as private datasets, LMEraser is compared against baselines such as retraining from scratch and SISA. Tests are conducted on Nvidia Tesla V100-FHHL GPUs using PyTorch v2.1.2 and CUDA 12.1. The results show that LMEraser handles unlearning requests more efficiently than these baselines while maintaining accuracy.
In conclusion, LMEraser represents a breakthrough in exact unlearning for large-scale models. By leveraging prompt tuning, it isolates the influence of private data and provides robust privacy protection. Its adaptive prompt tuning balances efficient unlearning against model performance. Extensive experiments confirm that LMEraser achieves precise unlearning while maintaining accuracy across diverse datasets and large model architectures.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.