Research group working in the fields of Information Retrieval, Natural Language Processing, Data Mining, Machine Learning, and Artificial Intelligence.

Joined September 2019
27 Oct 2025
We just released "German Commons", the largest openly-licensed German text dataset for LLM training: 154B tokens with clear usage rights for research and commercial use. huggingface.co/datasets/cora…
3
18
88
19,331
27 Oct 2025
The data spans 7 text domains:
🌐 Web: Wikis, GitHub, social media
💬 Political: Parliamentary proc., speeches
⚖️ Legal: Court decisions, law
📰 News: Newspaper archives
🏦 Economics: Public tenders
📚 Cultural: Heritage collections
🔬 Scientific: Papers, books, journals
1
6
286
27 Oct 2025
For full technical details and the compliance datasheet, see our preprint @ arxiv.org/abs/2510.13996
As for German-specific models trained on this data... stay tuned 👀

6
226
18 Jul 2025
Come join us at the poster session at ICTIR 2025 to discuss:
- Axioms for Retrieval-Augmented Generation webis.de/publications.html#m…
- Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins webis.de/publications.html#g…
1
1
7
472
18 Jul 2025
Honored to win the ICTIR Best Paper Honorable Mention Award for "Axioms for Retrieval-Augmented Generation"! Our new axioms are integrated with ir_axioms: github.com/webis-de/ir_axiom… Nice to see axiomatic IR gaining momentum.
1
5
15
608
18 Jul 2025
Thrilled to announce that @MattiWiegmann has successfully defended his PhD! 🎉🧑‍🎓 Huge congratulations on this incredible achievement! #PhDDefense #AcademicMilestone
6
175
16 Jul 2025
Happy to share that our paper "The Viability of Crowdsourcing for RAG Evaluation" received the Best Paper Honourable Mention at #SIGIR2025! Very grateful to the community for recognizing our work on improving RAG evaluation. 📄 webis.de/publications.html#g…
1
6
20
573
Webis Group retweeted
Do not forget to participate in the #TREC2025 Tip-of-the-Tongue (ToT) Track :) The corpus and baselines (with run files) are now available and easily accessible via the ir_datasets API and the HuggingFace Datasets API. More details are available at: trec-tot.github.io/guideline…
7
14
599
22 Jun 2025
Our paper on self-distillation for training bi-encoders got accepted at #ICTIR2025! By exploiting pretrained encoder capabilities, our approach eliminates expensive teacher models and batch sampling while maintaining the same effectiveness.
1
2
6
283
22 Jun 2025
Results on BEIR demonstrate that our method matches teacher distillation effectiveness, while using only 13.5% of the data and achieving 3-15x training speedup. This makes effective bi-encoder training more accessible, especially for low-resource settings.
1
92
Webis Group retweeted
Short: Rank-DistiLLM: Closing the Effectiveness Gap Between Cross-Encoders and LLMs for Passage Re-ranking webis.de/publications.html#s…
Full: Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders webis.de/publications.html#s…
7
29
976
Webis Group retweeted
What an honor to receive both the best short paper award and the best paper honourable mention award at #ECIR2025. Thank you to all the co-authors @maik_froebe @hscells @ShengyaoZhuang @bevan_koopman @guidozuc @bennostein @martinpotthast @matthias_hagen 🥳
4
6
43
1,284
Webis Group retweeted
1
6
404
📢 Our paper "The Viability of Crowdsourcing for RAG Evaluation" has been accepted to #SIGIR2025! We compared how well humans and LLMs write and judge RAG responses, assembling 1800 responses across 3 styles and 47K pairwise judgments in 7 quality dimensions. 🧵➡️
1
5
15
534
🧵 3/4 This fundamentally challenges previous assumptions about RAG evaluation and system design. But we also show how crowdsourcing offers a viable and scalable alternative! Check out the paper for more.
📝 Preprint @ downloads.webis.de/publicati…
⚙️ Code/Data is openly available.

1
1
4
183
🧵 4/4 Credit and thanks to the author team @LukasGienapp, Tim Hagen, @maik_froebe, @matthias_hagen, @bennostein, @martinpotthast, and @hscells. You can currently catch some of them at #ECIR2025 if you want to chat about RAG!
1
7
192