🥳 #StructFlowBench is a structurally annotated multi-turn benchmark that uses a structure-driven generation paradigm to better simulate complex dialogue scenarios. 🥳 StructFlowBench is now part of #CompassHub! 😉 Feel free to download and explore it; it is available for public use. 🤗 hub.opencompass.org.cn/datas…
25 Feb 2025
🏷️:StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following 🔗:arxiv.org/abs/2502.14494.pdf
10. StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following
🔑 Keywords: Multi-turn instruction following, Structural flow modeling, LLM evaluation, Dialogue turns, Custom dialogue flows
💡 Category: Natural Language Processing
🌟 Research Objective: To introduce StructFlowBench, a benchmark addressing the structural dependency between dialogue turns in multi-turn instruction following for large language models (LLMs).
🛠️ Research Methods: Development of a structural flow framework with six inter-turn relationships, followed by systematic evaluations of 13 leading LLMs using established automatic evaluation methodologies.
💬 Research Conclusions: The evaluations revealed significant deficiencies in current models' comprehension of multi-turn dialogue structures, indicating a gap in existing benchmarks that focus mainly on fine-grained constraint satisfaction without addressing structural dependencies.
👉 Paper link: huggingface.co/papers/2502.1…
📚 AI Native Daily Paper Digest - 20250224
🌟 Follow @AINativeF for the latest insights on AI Native. Covering AI research papers from Hugging Face, featured in the image.
💡 Stay updated with the latest research trends and dive deep into the future of AI! 🚀
#AI #HuggingFace #AIPaper #AINative #AINF
Appendix: Today's AI research papers
1. LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
2. SurveyX: Academic Survey Automation via Large Language Models
3. MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
4. Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model
5. PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data
6. SIFT: Grounding LLM Reasoning in Contexts via Stickers
7. VLM$^2$-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues
8. LightThinker: Thinking Step-by-Step Compression
9. Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models
10. StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following
11. MoBA: Mixture of Block Attention for Long-Context LLMs
12. KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
13. ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation