In natural language processing, attention is shifting toward the untapped potential of small language models (SLMs). Larger models have dominated the landscape, but a question lingers: how critical is model size for effective problem-solving? A recent study takes up this question, examining the advantages of SLMs and introducing a dataset called TinyGSM.
Researchers from Carnegie Mellon University and Microsoft Research introduce TinyGSM, a synthetic dataset of 12.3 million grade school math problems paired with Python solutions, all generated by GPT-3.5. The dataset is intended as a tool for studying how well small language models (SLMs) can handle mathematical reasoning. The approach combines this high-quality dataset with a verifier model, and the resulting small-model system surpasses much larger models in accuracy.
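To make the "problem plus Python solution" format concrete, here is a minimal sketch of what one such pair could look like and how its answer can be obtained by running the code. The problem text, the function name `solution`, and the helper `run_solution` are all made up for illustration; they are not taken from TinyGSM itself.

```python
# Illustrative only: this problem and solution are invented for the example,
# not copied from the TinyGSM dataset.
problem = (
    "A baker makes 12 muffins per tray and bakes 5 trays. "
    "How many muffins does she bake in total?"
)

# The solution is stored as Python source text, so the final answer comes
# from executing the code rather than from parsing free-form prose.
solution_code = '''
def solution():
    muffins_per_tray = 12
    trays = 5
    result = muffins_per_tray * trays
    return result
'''


def run_solution(code: str, entry_point: str = "solution"):
    """Execute solution source and return what its entry-point function returns.

    Hypothetical helper. exec() runs arbitrary code, so model-generated
    solutions should only be executed inside a sandbox.
    """
    namespace = {}
    exec(code, namespace)
    return namespace[entry_point]()


answer = run_solution(solution_code)
print(problem)
print("Answer:", answer)
```

Expressing the solution as code hands the arithmetic to the Python interpreter, so the model only has to get the reasoning steps right.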
The study weighs better use of data against conventional scaling laws as a route to model improvement, and it emphasizes the value of synthetic data generation when real data is scarce. It notes that a larger dataset can compensate for a smaller model. It also points to prior work in which verifiers were used successfully to select the best response from multiple candidates.
The work targets the under-explored potential of SLMs in mathematical reasoning, aiming to break the 80% accuracy barrier on GSM8K, a challenging benchmark of grade school math problems. To get there, the researchers propose combining a high-quality dataset, TinyGSM, with a verifier model that selects the best output from multiple candidate generations. The study draws on synthetic data generation, prompt-engineered data, and a teacher-student setup to improve small-model performance.
TinyGSM, a synthetic dataset of grade school math problems with Python solutions, is generated entirely by GPT-3.5. The researchers fine-tune a 1.3B generation model and a 1.3B verifier model on TinyGSM; at inference time, the generator produces multiple candidate solutions and the verifier selects the best one, which improves accuracy. Filtering protects data quality by excluding problems that are too short or that contain no numbers. The authors also explore different solution formats, and their experiments suggest that scaling the verifier is a more efficient use of model parameters than scaling the generator, an observation they connect to insights from GAN training. Overall, the study shows that a high-quality dataset combined with a verifier lets small language models reach high accuracy.
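The filtering and verifier-selection steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the five-word threshold, and the toy scores are all hypothetical, and a lookup table stands in for the trained verifier model.

```python
import re
from typing import Callable, List


def keep_problem(text: str, min_words: int = 5) -> bool:
    """Hypothetical quality filter: keep a problem only if it is long enough
    and contains at least one digit. The threshold is an assumption."""
    has_number = re.search(r"\d", text) is not None
    return len(text.split()) >= min_words and has_number


def select_best(candidates: List[str], score: Callable[[str], float]) -> str:
    """Best-of-n selection: return the candidate the scorer rates highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


# Filtering demo on made-up problems.
problems = [
    "Tom has 3 apples and buys 4 more. How many apples does he have?",
    "What is 2+2?",                                   # too short
    "How many apples does Tom have in total now?",    # no numbers
]
kept = [p for p in problems if keep_problem(p)]

# Selection demo: a lookup table stands in for the verifier's scores.
toy_scores = {
    "result = 3 + 4": 0.9,
    "result = 3 * 4": 0.2,
    "result = 4 - 3": 0.1,
}
best = select_best(list(toy_scores), toy_scores.get)

print(kept)
print(best)
```

In a real pipeline the `score` callable would wrap the verifier model, and the candidates would be several solutions sampled from the generation model for the same problem.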
In terms of results, fine-tuning a 1.3B generation model and a 1.3B verifier on TinyGSM achieves 81.5% accuracy on the GSM8K benchmark, surpassing much larger models. This performance rivals that of the GPT-3.5 teacher model that generated the training data, and the system proves robust, reaching 75.6% accuracy on SVAMP without further fine-tuning. The study emphasizes the verifier's effectiveness at selecting the best response and suggests that scaling it is a more efficient use of model parameters. Both the quality of the dataset and the inclusion of problems with irrelevant context contribute to improved small language model performance.
In conclusion, the study highlights the potential of SLMs for grade school mathematical reasoning. With a high-quality dataset like TinyGSM and a verifier model, SLMs can surpass larger models in accuracy on the GSM8K benchmark. The study also shows that quality datasets and verifiers can help bridge the performance gap between student and teacher models. The results suggest that SLMs are a promising route to efficient and effective mathematical reasoning.
Check out the Paper. All credit for this research goes to the researchers of this project.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.