It has been barely a year since GPT stardust ✨ settled over almost every sector worldwide. Experts from every field are eager to utilise Large Language Models (LLMs) to optimise their workflows, and evidently, the corporate world could not be absent from this new trend’s safari. The future promises unprecedented possibilities, yet wrapped in a suitably hefty… cost.
The scope of this project is to demonstrate an end-to-end solution for leveraging LLMs in a way that mitigates both privacy and cost concerns. We will utilise LLMWare, an open-source framework for developing industrial-grade enterprise LLM apps; the Retrieval Augmented Generation (RAG) method [1]; and BLING, a newly introduced collection of open-source small models that run solely on CPU.
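To make that stack concrete, here is a minimal sketch of the core idea, assuming `llmware` is installed (`pip install llmware`) and its `Prompt` API behaves as in the project’s published examples; the BLING model name and the sample context are illustrative:

```python
from llmware.prompts import Prompt

# Load a small BLING model locally -- it runs entirely on CPU,
# so no proprietary data leaves the machine and no per-query API fees accrue.
prompter = Prompt().load_model("llmware/bling-1b-0.1")

# RAG in miniature: pass a retrieved passage as context and ask the model
# a question grounded strictly in that passage.
context = (
    "Per the Q3 2023 supplier agreement, the total invoice amount "
    "is $48,250, payable within 30 days of receipt."
)
response = prompter.prompt_main(
    "What is the total invoice amount?", context=context
)

print(response["llm_response"])  # expected to be something like: $48,250
```

In a full pipeline the `context` string would come from a retrieval step over the firm’s own document store rather than being hard-coded; we will build that part later in the article.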
Concept
After successfully predicting Jrue Holiday’s 🏀 transfer to the Milwaukee Bucks, Data Corp took on a new project: assisting a FinTech SME in optimising its decision-making with AI. That is, building a tool that will digest the millions(!) of proprietary docs, query state-of-the-art GPT-like models, and provide Managers with concise, actionable information. That’s all very well, but it comes with two major pitfalls:
- Security: Querying a commercial LLM (e.g. GPT-4) essentially means sharing proprietary information over the internet (what about all those millions of docs?). A data breach would surely compromise the firm’s integrity.
- Cost: An automated tool like the above will boost the Managers’ productivity, but there is no free lunch. The anticipated queries might number in the hundreds per day, and with ‘GPU-thirsty’ LLMs, the aggregate cost could easily spiral out of control.
The above limitations led me to a tricky alternative:
How about developing a custom tool that will consume proprietary knowledge and…