In today’s tech-savvy world, we’re surrounded by mind-blowing AI-powered wonders: voice assistants answering our questions, smart cameras identifying faces, and self-driving cars navigating roads. They’re like the superheroes of our digital age! However, making these technological wonders work smoothly on our everyday devices is tougher than it seems. These AI superheroes have a special need: significant computing power and memory resources. It’s like trying to fit an entire library into a tiny backpack. And guess what? Most of our regular devices, such as phones and smartwatches, don’t have enough ‘brainpower’ to handle these AI superheroes. This is a major obstacle to the widespread deployment of AI technology.
Hence, it is crucial to improve the efficiency of these large AI models to make them accessible. The course “TinyML and Efficient Deep Learning Computing” by the MIT HAN Lab tackles this core obstacle. It introduces methods to optimize AI models, ensuring their viability in real-world scenarios. Let’s take a detailed look at what it offers:
Course Structure:
Duration: Fall 2023
Timing: Tuesday/Thursday 3:35-5:00 pm Eastern Time
Instructor: Professor Song Han
Teaching Assistants: Han Cai and Ji Lin
As this is an ongoing course, you can watch the live stream at this link.
Course Approach:
Theoretical Foundation: Starts with foundational concepts of Deep Learning, then advances into sophisticated methods for efficient AI computing.
Hands-on Experience: Provides practical experience by enabling students to deploy and work with large language models like LLaMA 2 on their laptops.
1. Efficient Inference
This module primarily focuses on enhancing the efficiency of AI inference processes. It delves into techniques such as pruning, sparsity, and quantization aimed at making inference operations faster and more resource-efficient. Key topics covered include:
- Pruning and Sparsity (Part I & II): Exploring methods to reduce the size of models by removing unnecessary parts without compromising performance.
- Quantization (Part I & II): Techniques to represent data and models using fewer bits, saving memory and computational resources.
- Neural Architecture Search (Part I & II): These lectures explore automated techniques for discovering the best neural network architectures for specific tasks. They demonstrate practical uses across various areas such as NLP, GAN, point cloud analysis, and pose estimation.
- Knowledge Distillation: This session covers knowledge distillation, a process in which a compact model is trained to mimic the behavior of a larger, more complex model, so that knowledge is transferred from the large model to the small one.
- MCUNet: TinyML on Microcontrollers: This lecture introduces MCUNet, which focuses on deploying TinyML models on microcontrollers, allowing AI to run efficiently on low-power devices. It covers the essence of TinyML, its challenges, creating compact neural networks, and its diverse applications.
- TinyEngine and Parallel Processing: This part discusses TinyEngine, exploring methods for efficient deployment and parallel processing strategies like loop optimization, multithreading, and memory layout for AI models on constrained devices.
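To make two of the ideas in this module more concrete, here is a minimal sketch of magnitude-based pruning followed by symmetric 8-bit linear quantization, applied to a plain Python list of weights. It is not taken from the course materials: the helper names (`magnitude_prune`, `quantize_int8`, `dequantize`) and the sample weights are made up for illustration, and real frameworks typically apply these steps per layer or per channel and fine-tune afterwards.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (hypothetical helper)."""
    num_to_prune = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127] (hypothetical helper)."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale maps the largest magnitude to 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the integers back to approximate floating-point weights."""
    return [v * scale for v in q]


# Illustrative weights, not from any real model.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)

print(pruned)
print(q)
print(restored)
```

Half of the weights become exact zeros, and the rest are stored as small integers plus a single floating-point scale. The restored values differ from the pruned ones by at most half a quantization step, which is the kind of accuracy-versus-size trade-off these lectures examine.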
2. Domain-Specific Optimization
In the Domain-Specific Optimization segment, the course covers various advanced topics aimed at optimizing AI models for specific domains:
- Transformer and LLM (Part I & II): These lectures dive into Transformer basics and design variants, then cover advanced topics related to efficient inference algorithms for LLMs. They also explore efficient inference systems and fine-tuning methods for LLMs.
- Vision Transformer: This section introduces Vision Transformer basics, efficient ViT strategies, and diverse acceleration techniques. It also explores self-supervised learning methods and multi-modal Large Language Models (LLMs) to enhance AI capabilities in vision-related tasks.
- GAN, Video, and Point Cloud: This lecture focuses on enhancing Generative Adversarial Networks (GANs) by exploring efficient GAN compression techniques (using NAS+distillation), AnyCost GAN for dynamic cost, and Differentiable Augmentation for data-efficient GAN training. These approaches aim to optimize models for GANs, video recognition, and point cloud analysis.
- Diffusion Model: This lecture offers insights into the structure, training, domain-specific optimization, and fast-sampling strategies of Diffusion Models.
3. Efficient Training
Efficient training refers to the application of methodologies to optimize the training process of machine learning models. This chapter covers the following key areas:
- Distributed Training (Part I & II): These lectures explore strategies for distributing training across multiple devices or systems. They cover overcoming bandwidth and latency bottlenecks, optimizing memory consumption, and implementing efficient parallelization methods for training large-scale machine learning models in distributed computing environments.
- On-Device Training and Transfer Learning: This session primarily focuses on training models directly on edge devices, handling memory constraints, and employing transfer learning methods for efficient adaptation to new domains.
- Efficient Fine-tuning and Prompt Engineering: This section focuses on refining Large Language Models (LLMs) through efficient fine-tuning techniques like BitFit, Adapter, and Prompt-Tuning. Additionally, it highlights the concept of Prompt Engineering and illustrates how it can enhance model performance and adaptability.
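As a rough illustration of the idea behind BitFit (fine-tuning only the bias terms while the other parameters stay frozen), here is a minimal sketch on a toy linear model. It is an assumption-laden example rather than the course's implementation: the model, the data, and the helper names (`predict`, `mean_squared_error`, `finetune_bias_only`) are all hypothetical, and in a real LLM the same principle is applied to the bias vectors inside each layer.

```python
def predict(weights, bias, x):
    """Toy linear model: dot(weights, x) + bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def mean_squared_error(weights, bias, data):
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)


def finetune_bias_only(weights, bias, data, lr=0.1, steps=100):
    """Gradient descent that updates the bias and leaves the weights frozen."""
    for _ in range(steps):
        # Derivative of the mean squared error with respect to the bias.
        grad_bias = sum(2 * (predict(weights, bias, x) - y) for x, y in data) / len(data)
        bias -= lr * grad_bias
    return weights, bias


# Made-up "pre-trained" parameters and a new task whose targets are shifted by +1.
pretrained_weights = [2.0, -1.0]
pretrained_bias = 0.0
new_task = [([1.0, 0.0], 3.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 2.0)]

loss_before = mean_squared_error(pretrained_weights, pretrained_bias, new_task)
tuned_weights, tuned_bias = finetune_bias_only(pretrained_weights, pretrained_bias, new_task)
loss_after = mean_squared_error(tuned_weights, tuned_bias, new_task)

print(loss_before, loss_after, tuned_bias)
```

Only one of the three parameters is trained here, yet the loss on the new task drops sharply because the task differs from the original one by a constant offset. Real tasks are rarely that convenient, which is why the course also covers other efficient fine-tuning methods such as Adapter and Prompt-Tuning.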
4. Advanced Topics
This module covers the emerging field of Quantum Machine Learning. While the detailed lectures for this segment are not available yet, the planned topics include:
- Basics of Quantum Computing
- Quantum Machine Learning
- Noise Robust Quantum ML
These topics will provide a foundational understanding of quantum principles in computing and explore how these principles are applied to enhance machine learning methods while addressing the challenges posed by noise in quantum systems.
If you are interested in digging deeper into this course, check out the playlist below:
https://www.youtube.com/watch?v=videoseries
This course has received fantastic feedback, especially from AI enthusiasts and professionals. The course is ongoing and scheduled to conclude by December 2023, and I highly recommend joining! If you’re taking this course or intend to, share your experiences. Let’s chat and learn together about TinyML and how to make AI smarter on small devices. Your input and insights would be valuable!
Kanwal Mehreen is an aspiring software developer with a keen interest in data science and applications of AI in medicine. Kanwal was selected as the Google Generation Scholar 2022 for the APAC region. Kanwal loves to share technical knowledge by writing articles on trending topics, and is passionate about improving the representation of women in the tech industry.