Nowadays, nobody is surprised by a deep learning model running in the cloud. But in the world of edge and consumer devices, the situation can be much more complicated, for several reasons. First, using cloud APIs requires devices to always be online. This is not a problem for a web service, but it can be a dealbreaker for a device that needs to be functional without Internet access. Second, cloud APIs cost money, and customers are unlikely to be happy about paying yet another subscription fee. Last but not least, after several years, the project may be finished, the API endpoints will be shut down, and the expensive hardware will turn into a brick, which is hardly friendly to customers, the ecosystem, or the environment. That’s why I am convinced that end-user hardware should be fully functional offline, without extra costs or mandatory use of online APIs (optional cloud features are fine, but they should not be required).
In this article, I will show how to run a LLaMA GPT model and automatic speech recognition (ASR) on a Raspberry Pi. That will allow us to ask the Raspberry Pi questions and get answers. And, as promised, all of this will work fully offline.
Let’s get into it!
The code presented in this article is intended to run on a Raspberry Pi. But most of the methods (except the “display” part) will also work on a Windows, macOS, or Linux laptop, so readers who don’t have a Raspberry Pi can easily test the code as well.
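As a quick preview of what “fully offline” means in practice, here is a minimal sketch of local text generation with the llama-cpp-python library and a quantized GGUF model file. Both the library choice and the model file name are illustrative assumptions, not necessarily the exact setup used later in the article:

```python
# Minimal sketch: fully offline text generation with llama-cpp-python.
# Assumptions: `pip install llama-cpp-python` was done beforehand, and a
# quantized GGUF model file was downloaded in advance (the path below is
# a placeholder).
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf", n_ctx=2048)

# No network access is needed at this point; inference runs on the CPU.
output = llm("Q: What is a Raspberry Pi? A:", max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])
```

Once the model file is on disk, this runs with the network cable unplugged, which is exactly the property we want from the final device.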
Hardware
For this project, I will be using a Raspberry Pi 4. It is a small single-board computer running Linux; it needs only a 5V DC power supply and works without fans or active cooling:
A newer 2023 model, the Raspberry Pi 5, should be even better; according to benchmarks, it is almost 2x faster. But it is also almost 50% more expensive, and for our test, the Raspberry Pi 4 is good enough.