There is also a chat version. Both models are available on the Hugging Face Hub.
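As a quick reference, here is a minimal sketch of how these checkpoints are typically loaded with Hugging Face Transformers. The Hub IDs tiiuae/falcon-180B and tiiuae/falcon-180B-chat are the official repositories at the time of writing; note that the model is gated, so you may first need to accept the license and authenticate with your Hugging Face account, and that a naive load like this requires hundreds of GB of memory, as discussed below:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "tiiuae/falcon-180B" is the base model;
# use "tiiuae/falcon-180B-chat" for the chat version
model_id = "tiiuae/falcon-180B"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Warning: loading the full bfloat16 checkpoint this way
# needs roughly 360 GB of memory
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's bfloat16 dtype
    device_map="auto",    # let Accelerate spread layers across available devices
)
```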
Falcon 180B is completely free and state-of-the-art. But it’s also a huge model.
Can it run on your computer?
Unless your computer is equipped for very intensive computing, it can't run Falcon 180B out of the box. You will need to upgrade your computer and use a quantized version of the model.
In this article, I explain how you can run Falcon 180B on consumer hardware. We will see that it can be reasonably affordable to run a 180 billion parameter model on a modern computer. I also discuss several techniques that help reduce the hardware requirements.
The first thing you need to know is that Falcon 180B has 180 billion parameters stored as bfloat16. A (b)float16 parameter takes 2 bytes of memory.
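To make the arithmetic concrete, here is a back-of-the-envelope calculation (a sketch; the parameter count is rounded to exactly 180 billion):

```python
# Back-of-the-envelope memory footprint of Falcon 180B in bfloat16
n_params = 180e9          # 180 billion parameters (rounded)
bytes_per_param = 2       # bfloat16/float16 = 16 bits = 2 bytes

total_bytes = n_params * bytes_per_param
print(f"{total_bytes / 1e9:.0f} GB")  # -> 360 GB
```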
When you load a model, the standard PyTorch pipeline works like this:
- An empty model is created: 180B parameters * 2 bytes = 360 GB