Fine-tuning pre-trained models has become the standard route to state-of-the-art results across many machine learning tasks. The practice involves adapting a model, initially trained on a large dataset, to perform well on a narrower, task-specific one. A persistent challenge is the inefficiency of needing numerous fine-tuned models to reach optimal performance: the go-to approach has been to average the weights of many fine-tuned models to improve accuracy, a computationally expensive and time-consuming process.
Among current strategies, WiSE-FT and Model Soup merge the weights of fine-tuned models to improve performance. They reduce variance through weight interpolation and benefit from keeping the merged weights close to the center of the weight distribution. These approaches outperform other fine-tuning techniques such as BitFit and LP-FT, but they typically require many fine-tuned models, raising questions about efficiency and practicality in scenarios where models must be trained from scratch.
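For intuition, here is a minimal PyTorch-style sketch of the two merging ingredients these methods rely on: uniform averaging of several fine-tuned checkpoints (the Model Soup idea) and linear interpolation between a pre-trained and a fine-tuned checkpoint (the WiSE-FT idea). The function names and the state-dict interface are illustrative assumptions, not code from either paper.

```python
import torch
from typing import Dict, List

StateDict = Dict[str, torch.Tensor]

def soup_average(checkpoints: List[StateDict]) -> StateDict:
    """Model-Soup-style merge: uniform average of N fine-tuned checkpoints."""
    merged = {}
    for key in checkpoints[0]:
        merged[key] = torch.stack(
            [sd[key].float() for sd in checkpoints], dim=0
        ).mean(dim=0)
    return merged

def wise_ft_interpolate(pretrained: StateDict,
                        finetuned: StateDict,
                        alpha: float = 0.5) -> StateDict:
    """WiSE-FT-style merge: interpolate pre-trained and fine-tuned weights."""
    return {
        k: (1.0 - alpha) * pretrained[k].float() + alpha * finetuned[k].float()
        for k in pretrained
    }
```

Both routines operate purely on checkpoint weights, which is why soup-style methods scale in cost with the number of fine-tuned models they must first produce.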
Researchers at the NAVER AI Lab have introduced Model Stock, a fine-tuning methodology that diverges from conventional practice by requiring significantly fewer models to optimize the final weights. What sets Model Stock apart is its use of geometric properties of the weight space, which lets it approximate a center-close weight from only two fine-tuned models. This approach simplifies the optimization process while maintaining or enhancing model accuracy and efficiency.
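The article does not spell out the merging rule, but the geometric idea can be sketched as follows: each layer's two fine-tuned weights are averaged and then pulled toward the pre-trained anchor, with a per-layer ratio derived from the angle between the two fine-tuned updates. The specific ratio below is our own illustrative assumption, not a rule stated in the article, and the code should be read as a sketch rather than the official implementation.

```python
import torch
from typing import Dict

StateDict = Dict[str, torch.Tensor]

def two_model_geometric_merge(pretrained: StateDict,
                              finetuned_a: StateDict,
                              finetuned_b: StateDict) -> StateDict:
    """Illustrative two-model merge guided by weight-space geometry.

    Per layer: average the two fine-tuned weights, then interpolate toward
    the pre-trained anchor using a ratio based on the angle between the two
    fine-tuned updates (an assumption, not the article's stated rule).
    """
    merged = {}
    for key in pretrained:
        w0 = pretrained[key].float()
        d1 = finetuned_a[key].float() - w0   # update from fine-tuned model A
        d2 = finetuned_b[key].float() - w0   # update from fine-tuned model B
        cos = torch.nn.functional.cosine_similarity(
            d1.flatten(), d2.flatten(), dim=0
        )
        # Closer updates (cos -> 1) keep more of the fine-tuned average;
        # near-orthogonal updates (cos -> 0) fall back toward w0.
        t = (2.0 * cos / (1.0 + cos + 1e-8)).clamp(0.0, 1.0)
        avg_ft = 0.5 * (finetuned_a[key].float() + finetuned_b[key].float())
        merged[key] = t * avg_ft + (1.0 - t) * w0
    return merged
```

Because only two fine-tuned checkpoints and the pre-trained weights are needed, the merge itself is cheap; the savings come from not having to train a large pool of models.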
In implementing Model Stock, the team ran experiments with CLIP architectures, using ImageNet-1K as the primary in-distribution benchmark. They extended the evaluation to out-of-distribution benchmarks to assess the method's robustness, specifically ImageNet-V2, ImageNet-R, ImageNet-Sketch, ImageNet-A, and ObjectNet. The choice of datasets and the minimalistic approach to model selection underscore the method's practicality in optimizing pre-trained models for task-specific performance.
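As a rough picture of how such an evaluation is typically run, the snippet below computes top-1 accuracy for a merged model over a set of ImageNet-style evaluation loaders. The loader construction for ImageNet-V2, ImageNet-R, and the other benchmarks is assumed to be handled elsewhere; the article does not describe the evaluation code, so this is a generic sketch.

```python
import torch
from torch.utils.data import DataLoader
from typing import Dict

@torch.no_grad()
def top1_accuracy(model: torch.nn.Module,
                  loader: DataLoader,
                  device: str = "cuda") -> float:
    """Standard top-1 accuracy over a classification DataLoader."""
    model.eval().to(device)
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=-1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

def evaluate_benchmarks(model: torch.nn.Module,
                        loaders: Dict[str, DataLoader]) -> None:
    """Report accuracy on an ID set plus a suite of OOD benchmarks."""
    for name, loader in loaders.items():  # e.g. "ImageNet-1K", "ImageNet-V2", ...
        print(f"{name}: top-1 = {100 * top1_accuracy(model, loader):.1f}%")
```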
Model Stock reached a remarkable top-1 accuracy of 87.8% on ImageNet-1K, indicating its effectiveness. On out-of-distribution benchmarks, it achieved an average accuracy of 74.9% across ImageNet-V2, ImageNet-R, ImageNet-Sketch, ImageNet-A, and ObjectNet. These results demonstrate both its adaptability to varied data distributions and its ability to maintain high accuracy with minimal computational resources. Its efficiency is further highlighted by the reduced computational cost: only two fine-tuned models are needed, compared to the large ensembles traditionally averaged.
In conclusion, the Model Stock technique introduced by the NAVER AI Lab significantly refines the fine-tuning of pre-trained models, achieving notable accuracies on both ID and OOD benchmarks with just two models. The method reduces computational demands while maintaining performance, a practical advance for machine learning workflows. Its success across diverse datasets points to broader applicability and a step forward in addressing the computational and environmental costs of current machine learning practice.