TLDR
In this article we explore OpenAI’s brand new GPT-4-Turbo model, using its increased 128k token context window to pass in a full document corpus for information retrieval. This crude brute force method — only possible with larger context windows — is simple to implement and doesn’t require document embeddings and semantic search as used in Retrieval Augmented Generation (RAG). Applied to humanitarian disaster situation reports published on the amazing ReliefWeb platform — compressed using Sparse Priming Representations (SPR) — we show that GPT-4-Turbo is able to answer basic questions about recent disasters. However, even with the recent decrease in OpenAI’s token costs, this approach is prohibitively expensive, and prompting the preview GPT-4-Turbo model is very slow, sometimes taking up to a minute to respond. As with all LLM information retrieval patterns, it is of course crucial to implement a validation framework to ensure hallucination and information omission are controlled. That said, GPT-4-Turbo offers a great step forward in capabilities, especially as performance improves and costs come down, adding to the rapidly expanding LLM toolkit.
Given the frantic pace of developments in the last couple of months with the release of AutoGen, MemGPT, Semantic Kernel, and OpenAI’s GPTs and GPT-4-Turbo, I thought I would do a series of articles comparing some of the techniques these new tools offer for conversational information retrieval. Each has its own advantages and disadvantages, and some present a potential paradigm shift in how we use Large Language Models (LLMs). It’s a pretty amazing time, but using these new techniques in real-world applications isn’t always as easy as initial exploration sometimes suggests.
OpenAI’s recent announcement at DevDay included a few new features that could potentially change the Generative AI landscape. One of these was the release (in preview) of GPT-4-Turbo with an increased context window (prompt) limit of 128k tokens, up from 32k. Previously, Claude AI offered the largest commercial context limit at 100k tokens, so GPT-4-Turbo is a step beyond this. Additionally, OpenAI maintains that the new model is more proficient at instruction following and will be 3 times cheaper than GPT-4. As the leader on many LLM benchmarks, any advance of GPT-4 is important.
So why are increased token limits a big deal? When prompting an LLM you can include past conversation, so one immediate benefit is that the model remembers what you were talking about earlier, as long as you provide that conversation history in every prompt. This is useful for referencing facts from earlier in a conversation that might be important right now. A larger context window also means you can ‘preload’ the chat with supporting information, such as document content and data.
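Concretely, ‘preloading’ just means packing that material into the messages sent with every request. Here is a minimal sketch of what such a payload might look like (the document text and earlier conversation turns are purely hypothetical) …
reference_content = "... compressed document corpus to 'preload' ..."  # hypothetical

messages = [
    # Supporting information rides along in the system prompt
    {"role": "system", "content": f"Answer using this reference material:\n{reference_content}"},
    # Replaying earlier turns gives the model its 'memory' of the conversation
    {"role": "user", "content": "Which disasters happened this month?"},
    {"role": "assistant", "content": "Here is the list of ongoing disasters ..."},
    # The current question
    {"role": "user", "content": "How many people were affected by the first one?"},
]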
But there is a downside.
More tokens mean higher cost and slower performance, because with the transformer architecture memory and computational requirements increase quadratically (much faster than a simple straight line). Also, some research suggests that longer context windows can degrade LLM accuracy (Liu et al., 2023).
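To get a feel for that quadratic growth, attention compares every token with every other token, so a quick back-of-the-envelope calculation (ignoring implementation optimizations) looks like this …
for n in [4_000, 32_000, 128_000]:
    pairs = n * n  # attention considers every pair of tokens
    print(f"{n:>7} tokens -> {pairs / (4_000 ** 2):,.0f}x the attention work of a 4k prompt")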
At the time of writing, GPT-4-Turbo is in preview mode only, available as model 'gpt-4-1106-preview'. To call it we will use the openai Python package like this …
import openai

openai.api_key = '<YOUR KEY GOES HERE>'
model = 'gpt-4-1106-preview'
def run_llm(query, system_prompt):
llm_query = {
"temperature": 1.0,
"max_tokens": 2000,
"top_p": 0.95,
"frequency_penalty": 0,
"presence_penalty": 0,
}
response = openai.ChatCompletion.create(
model=model,
messages=[ {
"role":"system",
"content": system_prompt
},
{
"role":"user",
"content": query
}
],
temperature=llm_query['temperature'],
max_tokens=llm_query['max_tokens'],
top_p=llm_query['top_p'],
frequency_penalty=llm_query['frequency_penalty'],
presence_penalty=llm_query['presence_penalty'],
stop=None
)
answer = response['choices'][0]['message']['content']
return answer
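A quick smoke test of this helper might look like the following (the prompts are just placeholders) …
answer = run_llm(
    query="In one sentence, what is a humanitarian situation report?",
    system_prompt="You are a concise, factual assistant."
)
print(answer)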
We will explore GPT-4-Turbo’s increased token limit by using it to analyze Humanitarian Disaster Situation Reports on the amazing ReliefWeb platform. These reports (known as ‘Sitreps’) are vital for monitoring and reacting to humanitarian disasters around the world. They also provide a text corpus of data which can be compressed (summarized) to fit into GPT-4-Turbo’s context window for our analysis.
ReliefWeb has a really great API for accessing content, so we will use this to extract a list of disasters and situation reports …
import requests
import os
from bs4 import BeautifulSoup
import re
import pandas as pd
import PyPDF2
import traceback
import json
import ast
from langchain.text_splitter import RecursiveCharacterTextSplitter
import tiktoken
from googletrans import Translator  # client for the 'Google Translate API' mentioned below (assumed package)

translator = Translator()

# Configuration used throughout (values assumed for this walkthrough)
reliefweb_api_url = "https://api.reliefweb.int/v1"
reliefweb_pdf_url = "https://reliefweb.int"
docs_folder = "./docs"               # raw situation report text files
docs_folder2 = "./docs-summarized"   # SPR-compressed report files

def auto_translate(text):
"""
This function automatically detects language and translates to english
Parameters:
text(str): The text to be translated
Returns:
text (str): Translated text if in another language, otherwise
input text
"""
try:
lang = translator.detect(text)
lang = lang.lang
print(f"Language detected: {lang}")
q = translator.translate(text, dest='en')
text = q.text
except Exception as e:
print("An exception occurred trying to translate")
return text
def get_safe_name(name):
"""
This function takes a string and returns a version of it that is
safe to use as a filename.
Parameters:
name (str): The string to be converted to a safe filename.
Returns:
name (str): The safe filename.
"""
name = str(name)
name = re.sub("[^0-9a-zA-Z]+", "_", name)
name = re.sub(r"_$","", name)
if len(name) == 0:
name = 'Unknown'
return name
def download_pdf(url, download_path):
"""
Function to download a PDF from a URL and save locally
Parameters:
url (str): Location of online PDF file
download_path (str): Folder where to save PDF
"""
response = requests.get(url)
with open(download_path, 'wb') as f:
f.write(response.content)
def save_text(content, file_path):
"""
Function to save text to local file
Parameters:
content (str): Text to save
file_path (str): Folder where to save
"""
with open(file_path, 'w') as file:
print(f'Saving {file_path}')
file.write(content)
def extract_text_from_pdf(pdf_path):
"""
Function to extract text from PDF file
Parameters:
pdf_path (str): Path to PDF file
Returns:
text (str): Text extracted from PDF file
"""
print(pdf_path)
pdf_reader = PyPDF2.PdfReader(pdf_path)
text = ''
for page_num in range(len(pdf_reader.pages)):
page_obj = pdf_reader.pages[page_num]
text += page_obj.extract_text()
return text
def get_rw_data(keyword, filter, sort, fields, endpoint, limit=10, \
save_body_to_text=False):
"""
Function to extract data from ReliefWeb API. For API details see:
https://apidoc.rwlabs.org/?utm_medium=blog&utm_source=reliefweb+website&utm_campaign=api+doc+launching+2016_06
Parameters:
keyword (str): Search string
filter (dict): ReliefWeb filter json
sort (dict): ReliefWeb sort json
fields (list): List of fields to return
endpoint (str): API Endpoint, eg reports, disasters
limit (int): Maximum records to return
save_body_to_text (bool) : Flag to save body to text file, including any PDFs on page
Returns:
all_data (pandas dataframe): Dataframe of data from API
"""
query = {
"appname": "myapp",
"query": {
"value": keyword
},
"filter":filter,
"sort": sort,
"limit": limit,
"fields": fields
}
endpoint = f"{reliefweb_api_url}/{endpoint}?appname=apidoc&query[value]="
print(f"Getting {endpoint} ...")
all_data =[]
response = requests.post(endpoint, json=query)
if response.status_code == 200:
data = response.json()
for article in data["data"]:
article_url = article['fields']['url']
try:
r = article['fields']
print(article_url)
article_response = requests.get(article_url)
if save_body_to_text:
soup = BeautifulSoup(article_response.text, 'html.parser')
main_content = [p.text for p in soup.find_all('p')]
article_text = ' '.join(main_content)
save_text(article_text, docs_folder + '/{}.txt'.format(get_safe_name(article['fields']['title'])))
for link in soup.find_all('a'):
href = link.get('href')
if href and href.endswith('.pdf'):
download_path = os.path.join(docs_folder, href.split('/')[-1])
if href.startswith('/attachments'):
pdf_url = f'{reliefweb_pdf_url}{href}'
else:
pdf_url = href
download_pdf(pdf_url, download_path)
print(f". Downloaded PDF {download_path} from {pdf_url}")
article_text = extract_text_from_pdf(download_path)
r['article_text'] = article_text
r['reliefweb_query'] = keyword
all_data.append(r)
except Exception as e:
print(f"An exception occurred trying to extract {article_url}")
tb_str = ''.join(traceback.format_exception(None, e, e.__traceback__))
print(tb_str)
all_data = pd.DataFrame(all_data)
for f in ['disaster','theme']:
if f in list(all_data.columns):
all_data[f] = all_data[f].astype(str)
return all_data
else:
print(f"Request failed with status {response.status_code} {response.text}")
return None
In the above functions, some points of interest …
- If the ReliefWeb content refers to a PDF, we extract the text from that
- Any text is auto-translated to English using Google Translate API
- We always capture sources for attribution
- We haven’t bothered with API response pagination for this quick analysis
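If we did need the full result set, the ReliefWeb API accepts offset and limit parameters, so a paginated version would look roughly like this sketch (not used in this analysis) …
def get_all_rw_records(endpoint_url, query, limit=100):
    """Page through a ReliefWeb endpoint using offset/limit until no more records return."""
    all_records = []
    offset = 0
    while True:
        paged_query = dict(query, limit=limit, offset=offset)
        response = requests.post(endpoint_url, json=paged_query)
        response.raise_for_status()
        batch = response.json().get("data", [])
        all_records.extend(batch)
        if len(batch) < limit:
            break  # last page reached
        offset += limit
    return all_records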
Here is how we call the function to get disaster situation reports since the 1st of November 2023 …
filter = {
"operator": "AND",
"conditions": [
{
"field": "disaster.status",
"value": "ongoing"
},
{
"field": "format.name",
"value": "Situation Report"
},
{
"field": "date.created",
"value": {
"from": "2023-11-01T00:00:00+00:00",
"to": "2023-11-30T23:59:59+00:00"
}
}
]
}
sort = ["date.created:desc"]
endpoint = "reports"
fields = {
"include": ["title", "body", "url", "source", "date", "format", "theme",
"country", "status", "primary_country", "disaster", "language", "id"]
}
reliefweb_query = ""
articles = get_rw_data(reliefweb_query, filter, sort, fields, endpoint, 1000, True)
The situation reports are saved to the file system as text files …
for index, row in articles.iterrows():
date = row['date']['created']
source = row['source'][0]['name']
title = row['title']
id = row['id']
filename = f'{get_safe_name(title)}__{id}_{get_safe_name(date)}.txt'
text = f'''
title: {title}
source: {source}
date: {date}
id: {id}
{row['article_text']}
'''
text = text.encode('utf-8','ignore').decode('utf-8','ignore')
print(text)
save_text(text, docs_folder + '/' + filename)
We could just prompt GPT-4-Turbo with raw text from the situation reports. However, much of that text is irrelevant — such as stop words, headers, and footers — so including all of it would very quickly exceed even the increased token limits of GPT-4-Turbo. We will instead use a technique called Sparse Priming Representations (SPR) to compress documents to their key facts, modified a little to try and preserve quantitative information.
Here is the system prompt we’ll provide to GPT-4-Turbo to compress our documents …
# MISSION
You are a Sparse Priming Representation (SPR) writer. An SPR is a
particular kind of use of language for advanced NLP, NLU, and NLG
tasks, particularly useful for the latest generation of Large Language
Models (LLMs). You will be given information by the USER which you
are to render as an SPR.
# THEORY
LLMs are a kind of deep neural network. They have been demonstrated
to embed knowledge, abilities, and concepts, ranging from reasoning
to planning, and even to theory of mind. These are called latent abilities and latent content, collectively referred to as latent space. The latent space of an LLM can be activated with the correct series of words as inputs, which will create a useful internal state of the neural network. This is not unlike how the right shorthand cues can prime a human mind to think in a certain way. Like human minds, LLMs are associative, meaning you only need to use the correct associations to "prime" another model to think in the same way. The exception are quantitative metrics, which you MUST preserve.
# METHODOLOGY
Render the input as a distilled list of succinct statements,
assertions, associations, concepts, analogies, and metaphors.
The idea is to capture as much, conceptually, as possible but with
as few words as possible while preserving all quantitative details.
Write it in a way that makes sense to you, as the future audience
will be another language model, not a human.
This will summarize a raw text status report into something like this for a situation report titled “African Polio Laboratory Network Bulletin (week 1–42, 2023)” …
- Information comes from World Health Organization.
- Weekly dengue cases decreasing, 12,634 reported this week.
- 78 dengue-related deaths this week.
- Cumulative cases since January 2023: 267,680, including 1,333 deaths.
- Cases decreasing in all divisions except Sylhet.
- Increase in dengue cases in Faridpur, Khulna, Rajshahi, Rajbari districts.
- WHO orientation session for entomologists in Dhaka occurred on 26 October.
- Overall Case Fatality Rate (CFR) is 0.5%, but 0.62% for this week.
- Dhaka division reports highest number of cases and deaths.
- In this week, 13,430 people released from hospital after treatment.
- Male-to-female death ratio: 43% vs 57%.
- Most deaths among 16-55 years age group.
- Four dengue virus types circulating: DENV-1 (2.2%), DENV-2 (68.1%), DENV-3 (25.4%), DENV-4 (0.2%).
- Dengue prevention & clean-up week launched nationwide from 29 October to 4 November.
- WHO expert entomologist deployed to Bangladesh for technical support in outbreak response.
- In the Rohingya camps, weekly dengue cases continue to decline. Total cumulative cases are 12,969 including 17 deaths (CFR: 0.1%).
Which is of course much less text than the original document.
I wouldn’t advise using this compression without significant analysis and checks to control for information omission, but for our tests, it will suffice.
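One simple check along those lines is to confirm that the quantitative values in an original report survive compression. A minimal sketch, assuming we have the raw and SPR versions of a report on disk, is …
import re

def missing_numbers(original_text, compressed_text):
    """Return numeric values present in the original report but absent from its SPR summary."""
    number_pattern = r"\d[\d,\.]*"  # crude match for integers, decimals and comma-grouped numbers
    original_numbers = set(re.findall(number_pattern, original_text))
    compressed_numbers = set(re.findall(number_pattern, compressed_text))
    return original_numbers - compressed_numbers

# Any values returned deserve a manual look before trusting the summary (paths are hypothetical)
# raw = open('./docs/report.txt').read()
# spr = open('./docs-summarized/report.txt').read()
# print(missing_numbers(raw, spr))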
Here is the code for compressing reports …
# gpt4 turbo is 128k
chunk_size = 100000
llm_query = {
"prompt": "",
"temperature": 1.0,
"max_tokens": 2000,
"top_p": 0.95,
"frequency_penalty": 0,
"presence_penalty": 0,
"system_prompt":"""
# MISSION
You are a Sparse Priming Representation (SPR) writer. An SPR is a
particular kind of use of language for advanced NLP, NLU, and NLG
tasks, particularly useful for the latest generation of Large Language
Models (LLMs). You will be given information by the USER which you
are to render as an SPR.
# THEORY
LLMs are a kind of deep neural network. They have been demonstrated
to embed knowledge, abilities, and concepts, ranging from reasoning
to planning, and even to theory of mind. These are called latent abilities and latent content, collectively referred to as latent space. The latent space of an LLM can be activated with the correct series of words as inputs, which will create a useful internal state of the neural network. This is not unlike how the right shorthand cues can prime a human mind to think in a certain way. Like human minds, LLMs are associative, meaning you only need to use the correct associations to "prime" another model to think in the same way. The exception are quantitative metrics, which you MUST preserve.
# METHODOLOGY
Render the input as a distilled list of succinct statements,
assertions, associations, concepts, analogies, and metaphors.
The idea is to capture as much, conceptually, as possible but with
as few words as possible while preserving all quantitative details.
Write it in a way that makes sense to you, as the future audience
will be another language model, not a human.
"""
}
# Save texts
for index, row in articles.iterrows():
date = row['date']['created']
source = row['source'][0]['name']
report = row['title']
id = row['id']
text = row['article_text']
primary_country = row['primary_country']['name']
disaster = ''
disaster_types = ''
for d in ast.literal_eval(row['disaster']):
disaster += f"{d['name']}; "
for t in d['type']:
if 'primary' in t and t['primary'] == True:
disaster_types += f"{t['name']}; "
d = {
"disaster": disaster,
"date": date,
"disaster_types": disaster_types
}
prefix = ""
filename = f'{get_safe_name(report)}__{id}_{get_safe_name(date)}.txt'
header = f'- report: "{report}"\n- disaster: "{disaster}"\n' + \
f'- disaster_types: "{disaster_types}"\n' + \
f'- primary_country: "{primary_country}"\n- source: "{source}"\n' + \
f'- date: "{date}"\n- id: "{id}"\n'
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
chunk_size=chunk_size, chunk_overlap=100
)
texts = text_splitter.split_text(text)
print(f"\n\n================ {report} =================\n")
print(primary_country)
print(disaster)
print(len(texts))
summarized_text = ''
for i, t in enumerate(texts):
response = openai.ChatCompletion.create(
model=model,
messages=[ {
"role":"system",
"content":llm_query['system_prompt']
},
{
"role":"user",
"content":t
}
],
temperature=llm_query['temperature'],
max_tokens=llm_query['max_tokens'],
top_p=llm_query['top_p'],
frequency_penalty=llm_query['frequency_penalty'],
presence_penalty=llm_query['presence_penalty'],
stop=None)
summary = response['choices'][0]['message']['content']
summarized_text += "\n" + summary
summarized_text = auto_translate(summarized_text)
summarized_text = header + summarized_text
summarized_text = summarized_text.split("\n")
summarized_text_prefixed = ''
for s in summarized_text:
summarized_text_prefixed += f"{prefix}{s}\n"
print(summarized_text_prefixed)
save_text(summarized_text_prefixed, docs_folder2 + '/' + filename)
You’ll notice in the above we added some metadata about the report plus the SPR summary returned by GPT-4-Turbo. The compressed reports are then saved as text files.
We will also extract a high-level list of disasters from ReliefWeb to use in our system prompt, as an aid to information requests …
filter = {
"operator": "AND",
"conditions": [
{
"field": "status",
"value": "ongoing"
},
{
"field": "date.event",
"value": {
"from": "2020-01-01T00:00:00+00:00",
"to": "2023-11-30T23:59:59+00:00"
}
}
]
}
sort = ["date.event:desc"]
endpoint = "disasters"
fields = {
"include": ["name", "description", "date", "url", "id","status","glide"]
}
reliefweb_query = ""
disasters = get_rw_data(reliefweb_query, filter, sort, fields, endpoint, 1000, False)
display(disasters)
disasters.to_csv('disasters.csv')
This gives us a concise list …
We now have a list of disasters and compressed situation reports — from Nov 1st to Nov 10th — listing key facts from those disasters.
Let’s combine them into one text file for use as part of the system prompt for GPT-4-Turbo …
disasters = pd.read_csv('disasters.csv')
concatenated_content = "=========== this section gives a list of DISASTERS =========== \n\n "+ disasters.to_csv()
concatenated_content += "\n\n=========== this section provides disaster REPORTS for each disaster =========== "
for f in os.listdir(docs_folder2):
with open(f"{docs_folder2}/{f}", "r") as file:
file_content = file.read()
concatenated_content += f"\n\n----- report: {f} ----- \n\n"
concatenated_content += file_content + "\n\n"
How many tokens and what’s the cost?
def num_tokens_from_string(string: str, encoding_name: str) -> tuple:
encoding = tiktoken.get_encoding(encoding_name)
num_tokens = len(encoding.encode(string))
gpt4_token_cost_per_1000 = 0.01
cost = (num_tokens/1000.0)*gpt4_token_cost_per_1000
return num_tokens, cost

tokens, cost = num_tokens_from_string(concatenated_content, "cl100k_base")
print(f"OpenAI Tokens: {tokens} (${cost})")
OpenAI Tokens: 82001 ($0.82001)
So given the cost of $0.01 per 1,000 tokens input, the text created above comes out at $0.82 a prompt. There is also some completion token cost, $0.03 per 1000 tokens, but this should be much less than input cost as there are far fewer tokens.
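Putting the two prices together gives a rough per-prompt estimate; the completion length below is an assumption for illustration …
input_tokens = 82001        # corpus tokens calculated above
completion_tokens = 500     # assumed typical answer length
cost = (input_tokens / 1000) * 0.01 + (completion_tokens / 1000) * 0.03
print(f"Estimated cost per prompt: ${cost:.2f}")  # roughly $0.84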
Ouch!
We already knew this brute force technique isn’t the best way to tackle our particular task, and the high cost is yet another reason to look at alternatives.
Now we have our text, we can build a system prompt …
def run_llm(query, reference_content):
llm_query = {
"temperature": 1.0,
"max_tokens": 2000,
"top_p": 0.95,
"frequency_penalty": 0,
"presence_penalty": 0,
}
response = openai.ChatCompletion.create(
model=model,
messages=[ {
"role":"system",
"content": f"""You are a ReliefWeb disasters bot. You
provide information on disasters.
If asked for a list of disasters, just provide the list
ONLY using the 'Disasters' section below.
If asked for details of a disaster, extract that information
from 'Situation Reports' section below.
{reference_content}"""
},
{
"role":"user",
"content": query
}
],
temperature=llm_query['temperature'],
max_tokens=llm_query['max_tokens'],
top_p=llm_query['top_p'],
frequency_penalty=llm_query['frequency_penalty'],
presence_penalty=llm_query['presence_penalty'],
stop=None
)
answer = response['choices'][0]['message']['content']
return answer
You’ll notice that the system prompt includes instructions about which sections in the concatenated text to use for different types of information. Through some quick prompt experimentation — I wouldn’t really call it ‘engineering’ — this produced better results.
OK, here goes …
answer = run_llm("What's the latest status regarding the Nepalese earthquake
this year?", concatenated_content)
print(answer)
The latest status regarding the Nepal earthquake as of November 2023 is as
follows:
- Earthquake: Western Nepal, 6.4 magnitude, occurred on 3 November 2023,
with an epicenter in Ramidanda, Jajarkot.
- Casualties: 154 deaths, 364+ injuries.
- Displacement: Approximately 250,000 people affected, with significant
numbers of homes damaged or destroyed.
- Infrastructure: Over 62,000 houses across 13 districts sustained
damage (26,557 fully, 35,455 partially). Several government buildings were also damaged.
- Humanitarian Response: The United Nations and other humanitarian
organizations have been active in relief efforts, providing medical tents, child-friendly spaces, WASH supplies, and psychosocial support, among other assistance.
- Recovery and Rehabilitation: The recovery and rehabilitation phase is
in planning, with international reinforcements arriving to aid in coordination and strategy development.
- Education and Health Services: Education has been disrupted for
about 80,000 children due to damaged schools, and health services
have been impacted, increasing the risk of malnutrition and complicating maternal and newborn care.
Please note that this is a condensed summary of the available data up until
10 November 2023, and further updates may follow as the situation evolves
and more assessments are made.
On the whole, this is a pretty good summary of the disaster. The facts agree well with the ReliefWeb disaster page (note that this page may have changed since this article was written, as the disaster evolves) and the latest situation report used for this study.
One fact, though, is slightly misaligned: the data says that, tragically, 153 people died, whereas the model returned 154.
answer = run_llm("How many people died in the Nepal earthquake in 2023?",
concatenated_content)
print(answer)
In the Nepal Earthquake of November 2023, there were 153 fatalities.
So the correct figure is ‘known’, but was hallucinated in the first prompt. As with all applications of LLMs, validation and checks are key to catching issues like this.
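One lightweight automatic check, sketched below, is to flag any numbers in an answer that never appear in the reference corpus. It is crude (substring matching will miss rephrased figures and pass coincidental matches), but it can surface suspects like the 154 above for human review …
import re

def unsupported_numbers(answer, reference_content):
    """Flag numeric values in an LLM answer that never appear in the reference corpus."""
    numbers_in_answer = set(re.findall(r"\d[\d,\.]*", answer))
    return {n for n in numbers_in_answer if n not in reference_content}

issues = unsupported_numbers(answer, concatenated_content)
if issues:
    print(f"Values not found in the reference corpus: {issues}")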
Let’s get some information about the organizations creating situation reports (for the period November 1st-10th) …
answer = run_llm("List the organizations providing sitreps for the Nepal
Earthquake", concatenated_content)
print(answer)
The organizations providing situation reports (sitreps) for the Nepal Earthquake include:
- UN Office for the Coordination of Humanitarian Affairs (OCHA)
- Nepal Red Cross Society
- International Federation of Red Cross And Red Crescent Societies (IFRC)
- UN Resident Coordinator in Nepal
- World Health Organization (WHO)
- UN Children's Fund (UNICEF)
- UN Country Team in Nepal
Compared to the same query in ReliefWeb this seems correct.
Let’s ask some more nuanced questions …
answer = run_llm("How many displaced people are children?",
concatenated_content)
print(answer)
In the provided data, there are references to children affected by disasters
in different reports, but the specific number of displaced children is not
directly mentioned in any of the excerpts. It would be necessary to review
more detailed reports or data sources that focus specifically on displacement
figures for children to determine the exact number affected by displacement.
Looking at the situation reports, I was unable to find this information either. A more detailed back-check would be required to confirm it is really absent, though.
Let’s ask about potential disease and health impacts …
answer = run_llm("What disease risks are there for the Nepal 2023 earthquake?",
concatenated_content)
print(answer)
The disease risks for the Nepal 2023 earthquake, as reported by UNICEF in
their Humanitarian Situation Report No. 2, are primarily focused on
malnutrition, particularly among children under five years old, and
psychosocial stress affecting the affected population. The disruption of
healthcare and WASH (Water, Sanitation, and Hygiene) services due to
the earthquake has further exacerbated the health risks.
The key disease risks and health concerns include:
1. Malnutrition: The earthquake and its aftermath have disrupted the food
supply chain, which can lead to an increased risk of malnutrition among
vulnerable populations, especially children.
2. Maternal and Newborn Health: Pregnant women and newborns might face
increased health risks due to the disruption of health care services and
a lack of access to essential medical care and facilities.
3. Psychosocial Stress: The trauma caused by the earthquake, the loss of
loved ones, and the displacement can lead to significant psychosocial
stress and mental health issues.
4. Injuries and Physical Health: With many injured in the aftermath,
there is an increased need for medical treatment for physical injuries
and trauma care.
5. Communicable Diseases: Overcrowding in temporary shelters can lead
to the spread of communicable diseases, particularly in the absence of
proper sanitation and hygiene facilities.
6. Waterborne and Vector-borne Diseases: Lack of clean water and
sanitation increases the risk of waterborne diseases like diarrhea,
cholera, and dysentery. There is also a concern for vector-borne
diseases such as malaria and dengue fever.
7. Non-communicable Diseases: Disruption of routine health services
can affect the management and treatment of chronic non-communicable diseases.
8. Disruption of Routine Immunization: The potential disruption of
routine immunization services can increase the risk of vaccine-preventable
diseases.
In response to these health risks, UNICEF and other health sector partners
have been providing health services, medical supplies, and psychosocial
support, as well as ensuring continued access to nutrition for children
and mothers.
Looking at the UNICEF report in the data, even with our SPR summarization the above seems to capture the main points. However, we only have report number 1 in the data used here, yet the answer cites report number 2. Another minor hallucination, again illustrating that any LLM response needs automatic back-checking.
This is a very quick analysis to get an idea of how well GPT-4-Turbo — with its increased 128k context window and improved instruction following — performs by including all the text data needed for information retrieval as a system prompt, without any other processing. Is this brute force approach the best technique for our task?
Probably not, at least not yet.
Performance is a serious consideration. The prompts were taking 30 seconds or longer, not great for a delightful user experience.
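A crude way to measure this for yourself is simply to time the call …
import time

start = time.time()
answer = run_llm("What's the latest status regarding the Nepalese earthquake this year?",
                 concatenated_content)
print(f"Response received in {time.time() - start:.1f} seconds")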
Cost is also prohibitive. With more tokens needed to provide a full corpus with each prompt, there is increased cost — in this analysis $0.82 for every prompt! — so other techniques will undoubtedly be more attractive to many organizations, at least while LLM costs are at a premium. That said, I am reminded of the cost of storage over the years, and maybe we will see the same decrease for LLMs over time.
Alternatives such as generating code to query data, using functions registered with the LLM, and multiple agents for validating results may offer cheaper and more accurate options. They would also eliminate the need to compress documents in order to fit the corpus in the context window, thus avoiding information loss.
That said, we were able to show that prompting GPT-4-Turbo can support basic information retrieval on a corpus of compressed documents provided in the system prompt. This has some benefits in being very straightforward to implement – you just give the LLM all your stuff and ask questions. As AI advances and costs decrease this may become a very common technique in the future.