The idea is that we split the workflow into two streams to optimize costs and stability, as proposed with the LATM architecture, with some additional enhancements for managing data and memories specific to Data Recipes …
Stream 1: Recipes Assistant
This stream uses LLM agents and more powerful models to generate code snippets (recipes) via a conversational interface. The LLM is instructed with information about data sources — API specifications and Database Schema — so that the person creating recipes can more easily conversationally program new skills. Importantly, the process implements a review stage where generated code and results can be verified and modified by a human before being committed to memory. For best code generation, this stream uses more powerful models and autonomous agents, incurring higher costs per request. However, there is less traffic so costs are controlled.
Stream 2: Data Analysis Assistant
This stream is used by the wider group of end-users who are asking questions about data. The system checks memory to see if their request exists as a fact, e.g. “What’s the population of Mali?”. If not, it checks recipes to see if it has a skill to get the answer, eg ‘How to get the population of any country’. If no memory or skill exists, a request is sent to the recipes assistant queue for the recipe to be added. Ideally, the system can be pre-populated with recipes before launch, but the recipes library can actively grow over time based on user telemetry. Note that the end user stream does not generate code or queries on the fly and therefore can use less powerful LLMs, is more stable and secure, and incurs lower costs.
Asynchronous Data Refresh
To improve response times for end-users, recipes are refreshed asynchronously where feasible. The recipe memory contains code that can be run on a set schedule. Recipes can be preemptively executed to prepopulate the system, for example, retrieving the total population of all countries before end-users have requested them. Also, cases that require aggregation across large volumes of data extracted from APIs can be run out-of-hours, mitigating —albeit in part— the limitation of aggregate queries using API data.
Memory Hierarchy — remembering skills as well as facts
The above implements a hierarchy of memory to save ‘facts’ which can be promoted to more general ‘skills’. Memory retrieval promotion to recipes are achieved through a combination of semantic search and LLM reranking and transformation, for example prompting an LLM to generate a general intent and code, eg ‘Get total population for any country’ from a specific intent and code, eg ‘What’s the total population of Mali?’.
Additionally, by automatically including recipes as available functions to the code generation LLM, its reusable toolkit grows such that new recipes are efficient and call prior recipes rather than generating all code from scratch.
By capturing data analysis requests from users and making these highly visible in the system, transparency is increased. LLM-generated code can be closely scrutinized, optimized, and adjusted, and answers produced by such code are well-understood and reproducible. This acts to reduce the uncertainty many LLM applications face around factual grounding and hallucination.
Another interesting aspect of this architecture is that it captures specific data analysis requirements and the frequency these are requested by users. This can be used to invest in more heavily utilized recipes bringing benefits to end users. For example, if a recipe for generating a humanitarian response situation report is accessed frequently, the recipe code for that report can improved proactively.
This approach opens up the possibility of a community-maintained library of data recipes spanning multiple domains — a Data Recipes Hub. Similar to code snippet websites that already exist, it would add the dimension of data as well as help users in creation by providing LLM-assisted conversational programming. Recipes could receive reputation points and other such social platform feedback.
As with any architecture, it may not work well in all situations. A big part of data recipes is geared towards reducing costs and risks associated with creating code on the fly and instead building a reusable library with more transparency and human-in-the-loop intervention. It will of course be the case that a user can request something new not already supported in the recipe library. We can build a queue for these requests to be processed, and by providing LLM-assisted programming expect development times to be reduced, but there will be a delay to the end-user. However, this is an acceptable trade-off in many situations where it is undesirable to let loose LLM-generated, unmoderated code.
Another thing to consider is the asynchronous refresh of recipes. Depending on the amount of data required, this may become costly. Also, this refresh might not work well in cases where the source data changes rapidly and users require this information very quickly. In such cases, the recipe would be run every time rather than the result retrieved from memory.
The refresh mechanism should help with data aggregation tasks where data is sourced from APIs, but there still looms the fact that the underlying raw data will be ingested as part of the recipe. This of course will not work well for massive data volumes, but it’s at least limiting ingestion based on user demand rather than trying to ingest an entire remote dataset.
Finally, as with all ‘Chat with Data’ applications, they are only ever going to be as good as the data they have access to. If the desired data doesn’t exist or is of low quality, then perceived performance will be poor. Additionally, common inequity and bias exist in datasets so it’s important a data audit is carried out before presenting insights to the user. This isn’t specific to Data Recipes of course, but one of the biggest challenges posed in operationalizing such techniques. Garbage in, garbage out!
The proposed architecture aims to address some of the challenges faced with LLM “Chat With Data”, by being …
- Transparent — Recipes are highly visible and reviewed by a human before being promoted, mitigating issues around LLM hallucination and summarization
- Deterministic — Being code, they will produce the same results each time, unlike LLM summarization of data
- Performant — Implementing a memory that captures not only facts but skills, which can be refreshed asynchronously, improves response times
- Inexpensive— By structuring the workflow into two streams, the high-volume end-user stream can use lower-cost LLMs
- Secure — The main group of end-users do not trigger the generation and execution of code or queries on the fly, and any code undergoes human assessment for safety and accuracy
I will be posting a set of follow-up blog posts detailing the technical implementation of Data Recipes as we work through user testing at DataKind.
Large Language Models as Tool Makers, Cai et al, 2023.
Unless otherwise noted, all images are by the author.
Please like this article if inclined and I’d be delighted if you followed me! You can find more articles here.