Top LLM papers of the week
Papers link -
linkedin.com/pulse/top-llm-p…
[1] RAG Evaluation Dataset Generation Framework
Current RAG benchmarks don't effectively evaluate RAG systems' performance across various vertical domains. This paper introduces RAGEval, a framework for automatically generating RAG evaluation datasets. RAGEval summarizes a schema from seed documents, applies the configurations to generate diverse documents, and constructs question-answering pairs according to both articles and configurations. RAGEval also introduces three new metrics: Completeness, Hallucination, and Irrelevance.
[2] Generate High-Quality Instruction Data with Open-Sourced LLMs
Creating instruction datasets is costly and time-consuming, requiring manual annotation or expensive API calls to proprietary LLMs. This paper introduces FANNO, a novel framework to generate diverse and high-quality instruction datasets comparable to human-annotated ones.
[3] 500x Prompt Compressor
Prompt compression is essential for improving inference speed, reducing costs, and enhancing user experience. This paper introduces 500xCompressor, a method that compresses extensive natural language contexts into as little as one special token. It adds only about 0.3% additional parameters and achieves compression ratios between 6x and 480x. The results demonstrate that the LLM retained 62.26-72.89% of its capabilities compared to using non-compressed prompts.
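For readers who want to sanity-check figures like these, a compression ratio is just original prompt tokens divided by the tokens left after compression. The snippet below is a minimal sketch of that arithmetic; the function name and the token counts are made up for illustration and are not taken from the paper.

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Return how many times shorter the compressed prompt is."""
    if original_tokens <= 0 or compressed_tokens <= 0:
        raise ValueError("token counts must be positive")
    return original_tokens / compressed_tokens


# Hypothetical token counts, chosen only to land on the 6x and 480x ends of the range.
high = compression_ratio(480, 1)   # a 480-token context squeezed into 1 special token
low = compression_ratio(96, 16)    # a 96-token context squeezed into 16 tokens
print(f"{high:.0f}x, {low:.0f}x")
```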
[4] Fine-Grained Machine-Generated Text Detection Tool
This paper introduces LLM-DetectAIve, a system for fine-grained detection of machine-generated text. It classifies text into four categories: human-written, machine-generated, machine-written machine-humanized, and human-written machine-polished. Unlike previous detectors that perform binary classification, LLM-DetectAIve offers insights into varying degrees of LLM intervention during text creation. This is highly useful in areas like education, where any use of LLMs may be prohibited.
[5] EfficientRAG
Retrieval-augmented generation (RAG) methods face challenges with complex questions, particularly multi-hop queries. The EfficientRAG approach iteratively generates new queries without calling an LLM at each iteration, and it outperforms existing RAG methods on three open-domain multi-hop question-answering datasets.
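To make the multi-hop idea concrete, here is a toy sketch of an iterative retrieval loop in which the follow-up query comes from a cheap rule instead of an LLM call. This is not the EfficientRAG implementation: the corpus, the `retrieve` lookup, and the `next_query` rule are all hypothetical stand-ins for whatever retriever and lightweight query generator a real system would use.

```python
from typing import Callable, List, Optional

# Hypothetical two-document corpus keyed by query string (illustration only).
TOY_CORPUS = {
    "author of Dune": "Dune was written by Frank Herbert.",
    "birthplace of Frank Herbert": "Frank Herbert was born in Tacoma.",
}


def retrieve(query: str) -> Optional[str]:
    """Stand-in retriever: exact-match lookup in the toy corpus."""
    return TOY_CORPUS.get(query)


def next_query(chunk: str) -> Optional[str]:
    """Stand-in query generator: a hand-written rule, no LLM call."""
    marker = "written by "
    if marker in chunk:
        person = chunk.split(marker, 1)[1].rstrip(".")
        return f"birthplace of {person}"
    return None  # nothing left to look up


def multi_hop_retrieve(
    question: str,
    retrieve_fn: Callable[[str], Optional[str]],
    next_query_fn: Callable[[str], Optional[str]],
    max_hops: int = 3,
) -> List[str]:
    """Collect evidence hop by hop until no new query or chunk is produced."""
    evidence: List[str] = []
    query: Optional[str] = question
    for _ in range(max_hops):
        if query is None:
            break
        chunk = retrieve_fn(query)
        if chunk is None or chunk in evidence:
            break
        evidence.append(chunk)
        query = next_query_fn(chunk)
    return evidence


evidence = multi_hop_retrieve("author of Dune", retrieve, next_query)
for hop, chunk in enumerate(evidence, start=1):
    print(hop, chunk)
```

The collected evidence would then be handed to the LLM once, for the final answer, which is where the savings over per-hop LLM calls would come from.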
[6] Synthetic Medical Text Generation Framework
This paper introduces MedSyn, a new framework for generating medical text that combines LLMs with a Medical Knowledge Graph (MKG). The MKG is used to sample prior medical information for prompts, which are then used to generate synthetic clinical notes with GPT-4 and fine-tuned LLaMA models. Synthetic data generated by MedSyn increases accuracy by up to 17.8% on the ICD code prediction task.
[7] RAG Foundry
This paper introduces RAG Foundry, an open-source framework for enhancing LLMs for RAG use cases. RAG Foundry integrates data creation, training, inference, and evaluation into a single workflow, facilitating the creation of data-augmented datasets for training and evaluating large language models in RAG settings. The authors demonstrate the framework's effectiveness by augmenting and fine-tuning Llama-3 and Phi-3 models with diverse RAG configurations, showing consistent improvements across three knowledge-intensive datasets.
[8] BioRAG
BioRAG is a novel LLM-based Retrieval-Augmented Generation (RAG) framework specifically designed to overcome the challenges inherent in life science question-answering systems. Rigorous experiments have demonstrated BioRAG's superior performance compared to fine-tuned LLMs, LLMs with search engines, and other scientific RAG frameworks.
[9] BioMamba
BioMamba is an LLM based on the Mamba architecture, specifically designed for biomedical text mining. It has been pre-trained on an extensive corpus of biomedical literature, making it well suited for tasks in this domain. BioMamba achieves significant performance improvements over existing models like BioBERT and general-domain Mamba. Notably, it achieves a 100× reduction in perplexity and a 4× reduction in cross-entropy loss on the BioASQ test set.
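A quick note on why the perplexity and cross-entropy figures differ so much: perplexity is the exponential of the mean per-token cross-entropy, so a multiplicative drop in loss turns into a much larger drop in perplexity. The sketch below shows the relationship, assuming the loss is measured in nats; the loss values are hypothetical and are not BioMamba's reported numbers.

```python
import math


def perplexity(cross_entropy_nats: float) -> float:
    """Perplexity is exp(mean per-token cross-entropy), with the loss in nats."""
    return math.exp(cross_entropy_nats)


# Hypothetical losses for illustration only (not taken from the paper).
baseline_loss = 4.0
improved_loss = 1.0  # a 4x lower cross-entropy

ppl_ratio = perplexity(baseline_loss) / perplexity(improved_loss)
print(f"loss ratio: {baseline_loss / improved_loss:.0f}x, perplexity ratio: {ppl_ratio:.1f}x")
```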
[10] LLM Safety Evaluation Toolkit
WalledEval is a comprehensive AI safety testing toolkit designed to evaluate both open-source and closed-source large language models (LLMs). WalledEval features over 35 safety benchmarks covering crucial areas such as multilingual safety, exaggerated safety, and prompt injection.
#llms #toppapers #llmpapers #generativeai #researchpapers #research