Large language models (LLMs) struggle with long contexts because of their limited context window. Although the window can be extended through fine-tuning, doing so incurs significant training and inference costs and can adversely affect the LLM's core capabilities.
Current LLMs such as Llama-1 and Llama-2 ship with fixed context lengths, which hinders many real-world applications. Fine-tuning can extend the context length, but the quadratic computational complexity of self-attention makes this expensive for both training and inference, and continual training on long sequences can degrade the LLM's general capabilities on shorter contexts. What is needed is a cost-effective mechanism that extends the context of a pre-trained LLM without compromising its existing capabilities.
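To make the quadratic cost concrete, here is a back-of-the-envelope sketch. It assumes a plain causal self-attention layer and a hypothetical hidden size; the numbers are illustrative, not figures from the paper.

```python
# Illustrative only: the attention score matrix grows quadratically with sequence length.
def attention_flops(seq_len: int, hidden_dim: int = 4096) -> int:
    """Approximate FLOPs for the Q*K^T and attention*V products of one layer."""
    return 2 * 2 * seq_len * seq_len * hidden_dim  # two L x L x d matmuls, multiply-add counted as 2

for L in (4_096, 32_768, 131_072):
    print(f"{L:>7} tokens -> {attention_flops(L):.2e} FLOPs per layer")
# Going from 4K to 32K multiplies this term by 64; from 4K to 128K by 1,024.
```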
Researchers from the Beijing Academy of Artificial Intelligence and the Gaoling School of Artificial Intelligence at Renmin University of China have proposed Activation Beacon. It builds on the observation that an LLM's raw activations contain redundant information and can therefore be condensed with minimal loss, allowing the model to perceive a long context through a short window. Positioned alongside techniques such as sparse attention and context compression, Activation Beacon extends the context effectively, supports diverse context lengths, and remains compatible with existing LLMs, while its technical design keeps both training and inference efficient, making it a promising solution.
Activation Beacon uses special tokens called beacons to condense the raw activations of a context of length L into k beacon activations (k ≪ L), giving a condensing ratio α = L/k. The beacons can attend to the context under three alternative attention schemes, of which stepwise expansion proves the most effective. Beaconed auto-regression then combines the condensed activations of past windows with the raw activations of the current sliding window to predict the next token efficiently, as sketched below. The beacon is a plug-and-play LLM module trained by auto-regression, introducing long contextual information while having minimal impact on short-context processing. Sampling diverse condensing ratios stepwise during training improves training efficiency and generalizes the beacons to a wide range of context lengths.
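A minimal sketch of the sliding-window idea follows. Mean pooling stands in for the learned beacon attention, the function names, window size, and the "frozen LLM" placeholder are illustrative assumptions, and no real model is run; this is not the paper's implementation.

```python
import numpy as np

def condense_window(activations: np.ndarray, alpha: int) -> np.ndarray:
    """Stand-in for beacon condensing: collapse every `alpha` raw activations
    into one condensed activation. The real method uses learned beacon tokens
    that attend to the window; mean pooling is only an illustrative proxy."""
    L, d = activations.shape
    k = L // alpha                      # number of beacons, k << L
    return activations[: k * alpha].reshape(k, alpha, d).mean(axis=1)

def beaconed_autoregression(token_acts: np.ndarray, window: int = 1024, alpha: int = 8):
    """Process a long sequence window by window, carrying forward only the
    condensed activations of past windows instead of the full raw cache."""
    condensed_past = []                 # compact memory of earlier windows
    for start in range(0, len(token_acts), window):
        raw = token_acts[start:start + window]
        # The current window attends to [condensed past] + [raw current window].
        context = np.concatenate(condensed_past + [raw]) if condensed_past else raw
        # ... here a (frozen) LLM would run over `context` to predict next tokens ...
        condensed_past.append(condense_window(raw, alpha))
    return condensed_past

# Toy usage: 8K "activations" condensed 8x, so the carried-over context stays short.
acts = np.random.randn(8192, 64).astype(np.float32)
memory = beaconed_autoregression(acts, window=1024, alpha=8)
print(sum(len(m) for m in memory), "condensed activations retained")  # 1024 vs. 8192 raw
```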
Activation Beacon excels at long-context language modeling, surpassing Llama-2-7B and outperforming fine-tuning-free methods. Its language modeling quality keeps improving as the context grows from 4K to 32K, showing that it genuinely exploits the expanded information. Compared with fine-tuned full-attention methods, it achieves comparable or superior performance at significantly higher efficiency. It maintains high-quality generation at 100K tokens and extends to 400K, a 100x increase over Llama-2-7B's 4K window. On LongBench tasks, Activation Beacon matches or surpasses fine-tuned baselines, showcasing its effectiveness in diverse real-world applications without compromising the LLM's original capabilities.
As a plug-and-play module, Activation Beacon introduces long contextual information while preserving the LLM's short-context capabilities. Its sliding-window, streaming style of processing makes both inference and training efficient, and the diverse condensing ratios sampled during training let it support a broad range of context lengths. The experimental results confirm that Activation Beacon is an effective, efficient, and low-cost method for extending an LLM's context length.
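One way the ratio sampling could look in a training loop is sketched below; the candidate ratios and window size are assumptions for illustration, not the paper's actual schedule.

```python
import random

# Illustrative: draw a different condensing ratio each training step so the
# beacon module learns to generalize across many effective context lengths.
CANDIDATE_RATIOS = [2, 4, 8, 16, 32]   # assumed values, not the paper's exact set

def sample_training_step(window: int = 1024) -> tuple[int, int]:
    alpha = random.choice(CANDIDATE_RATIOS)
    effective_context = window * alpha  # roughly how much raw context one window "covers"
    return alpha, effective_context

for _ in range(3):
    alpha, ctx = sample_training_step()
    print(f"condensing ratio {alpha:>2}x -> effective context ~{ctx} tokens")
```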
Check out the Paper. All credit for this research goes to the researchers of this project.