Introducing our latest work, OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs 🚀. OneGen enables LLMs to perform retrieval during generation, utilizing less training data while achieving impressive performance and efficiency.
#NLP #LLMs #RAG #Generation #Retrieval #Efficient #EntityLinking
📖 Paper:
huggingface.co/papers/2409.0…
⌨️ Code:
github.com/zjunlp/OneGen
🧩 Why OneGen: Many current tasks rely on both the retrieval and generative capabilities of models.
Traditional approaches typically employ a separate retrieval model and a generation model to accomplish such composite tasks. Since the representation spaces of generative and retrieval models do not overlap, their mode of interaction is through text.
In RAG tasks, a query undergoes a forward pass in the generative model and another in the retrieval model, necessitating two forward computations in total.
Furthermore, this pipeline approach is prone to error accumulation, and in multi-turn dialogues the query often has to be reformulated before retrieval.
Our work, however, circumvents the need for the query to undergo two forward computations and also eliminates the requirement for query reformulation, while adopting an end-to-end training methodology.
🔥 Contribution:
1️⃣ We propose OneGen, a training-efficient, inference-efficient, and pluggable framework that is particularly suitable for tasks that interleave generation and retrieval.
2️⃣ Our model, fine-tuned on less training data, achieves better average performance across six RAG datasets and six entity linking datasets.
3️⃣ We demonstrate the efficiency of OneGen at inference, highlighting a speed improvement over other LLM alternatives as query length or retrieval frequency increases.
💡Solution Overview: Our core idea is to integrate generation and retrieval into the same context by allocating the retrieval task to retrieval tokens generated in an autoregressive manner, thus enabling the LLM to perform both tasks in a single forward pass.
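To make the idea concrete, here is a minimal toy sketch of retrieval triggered by a generated token. It is not the OneGen implementation: the token name `<RET>`, the helper functions, the hand-written decoding trace, and the 3-dimensional vectors are all assumptions made for illustration. It only shows how the hidden state at a retrieval token could double as the query embedding, so no second encoder pass is needed.

```python
import math

# Hypothetical special-token name, chosen for this sketch only.
RETRIEVAL_TOKEN = "<RET>"


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, doc_index):
    """Return the id of the document embedding closest to query_vec."""
    return max(doc_index, key=lambda doc_id: cosine(query_vec, doc_index[doc_id]))


def one_pass_generate(trace, doc_index):
    """Walk a decoding trace of (token, hidden_state) pairs.

    Ordinary tokens are emitted as text. When the retrieval token appears,
    its hidden state is reused directly as the query vector, instead of
    sending the query text through a separate retrieval model.
    """
    output, retrieved = [], []
    for token, hidden in trace:
        if token == RETRIEVAL_TOKEN:
            doc_id = retrieve(hidden, doc_index)
            retrieved.append(doc_id)
            output.append(f"[{doc_id}]")
        else:
            output.append(token)
    return " ".join(output), retrieved


# Toy document embeddings (assumed to live in the same space as hidden states).
doc_index = {
    "doc_paris": [0.9, 0.1, 0.0],
    "doc_tokyo": [0.0, 0.2, 0.9],
}

# Hand-written stand-in for what an LLM would produce step by step.
trace = [
    ("The", [0.1, 0.1, 0.1]),
    ("capital", [0.2, 0.1, 0.1]),
    (RETRIEVAL_TOKEN, [0.8, 0.2, 0.1]),
    ("is", [0.1, 0.3, 0.1]),
    ("Paris", [0.7, 0.1, 0.1]),
]

text, docs = one_pass_generate(trace, doc_index)
print(text)
print(docs)
```

In a real system the trace would come from the model's decoding loop and the document embeddings from a precomputed index; the sketch only fixes the control flow.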
🔬Results: We evaluate the effectiveness of our method on two main tasks that require both generation and retrieval: RAG (including single-hop QA which needs single-retrieval and multi-hop QA which needs multi-retrieval) and Entity Linking (EL).
Empirical results show OneGen outperforms the previous pipeline solutions. Moreover, further analysis demonstrates that joint training can enhance OneGen's retrieval capability with no sacrifice in generation capability. In addition, OneGen shows faster inference and lower memory consumption than other LLM alternatives, particularly as retrieval frequency increases.
We're excited to hear your thoughts and feedback! 🎙️