Evaluating Long Context Large Language Models | by Yennie Jun

There is a race towards language models with longer context windows. But how good are they, and how can we know?

The context window of language models has been growing at an exponential rate over the last few years. Figure created by the author.

This article was originally published on Art Fish Intelligence.

The context window of large language models — the amount of text they can process at once — has been increasing at an exponential rate.

Around 2018, language models like BERT, GPT-1, and T5 could take up to 512 tokens as input. Now, in the summer of 2024, this number has jumped to 2 million tokens (in publicly available LLMs). But what does this mean for us, and how do we evaluate these increasingly capable models?
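To put the jump from 512 to 2 million tokens in perspective, a quick back-of-the-envelope calculation helps. The sketch below assumes a span of about six years (2018 to mid-2024) and treats the growth as a smooth exponential. That is a simplification, since real context lengths grew in uneven steps as new models were released.

```python
import math

# Context window sizes quoted in the text
start_tokens = 512          # BERT / GPT-1 era (around 2018)
end_tokens = 2_000_000      # Gemini 1.5 Pro (summer 2024)
years = 6.0                 # assumption: roughly 2018 -> mid-2024

# How many times larger the window became, and how many doublings that is
growth_factor = end_tokens / start_tokens
doublings = math.log2(growth_factor)

# Average time per doubling if growth were perfectly exponential
months_per_doubling = years * 12 / doublings

print(f"growth factor:   {growth_factor:,.0f}x")
print(f"doublings:       {doublings:.1f}")
print(f"months/doubling: {months_per_doubling:.1f}")
```

Under these assumptions, the window grew by a factor of about 3,900, which works out to roughly 12 doublings, or one doubling about every six months.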

The recently released Gemini 1.5 Pro model can take in up to 2 million tokens. But what does 2 million tokens even mean?

If we estimate that 3 words roughly equal 4 tokens (about 0.75 words per token), then 2 million tokens can (almost) fit the entire Harry Potter and Lord of the Rings series.

(The total word count of all seven books in the Harry Potter series is 1,084,625. The total word count of The Lord of the Rings is 481,103. (1,084,625 +…
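The "almost fits" claim can be checked with a few lines of arithmetic. This sketch assumes the common rule of thumb of about 0.75 words per token (roughly 3 words for every 4 tokens); the real ratio depends on the tokenizer and the text, so treat the result as a rough estimate rather than an exact count.

```python
# Word counts quoted in the text
harry_potter_words = 1_084_625      # all seven Harry Potter books
lord_of_the_rings_words = 481_103   # The Lord of the Rings

context_tokens = 2_000_000          # Gemini 1.5 Pro context window
words_per_token = 0.75              # assumption: ~3 words per 4 tokens (tokenizer-dependent)

# Approximate number of words that fit in the context window
capacity_words = context_tokens * words_per_token
total_words = harry_potter_words + lord_of_the_rings_words

print(f"capacity:           ~{capacity_words:,.0f} words")
print(f"both series:         {total_words:,} words")
print(f"fraction that fits:  {capacity_words / total_words:.0%}")
```

With this ratio, 2 million tokens cover about 1.5 million words, slightly short of the roughly 1.57 million words in both series combined, which is why they only "almost" fit.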
