Joined June 2021
Photos and videos
Stefan Bejgu retweeted
27 Jan 2025
Four of our industrial #PhD students, @SBejgu, @PereLluisHC, @alescire94 and @SimoneTedeschi_, were awarded their #PhD in #AI last Friday with the best grades (and two cum laude)! Congrats all! 👏 🎉 With @RNavigli, their advisor and Babelscape's scientific director, in the photo
5
12
536
Stefan Bejgu retweeted
Want to know if an AI is lying? LLM-OASIS helps detect factual accuracy in AI outputs with 81k training examples. LLM-OASIS introduces the largest dataset for training factuality evaluators, created by extracting and falsifying information from Wikipedia articles. This enables end-to-end verification of AI-generated text accuracy. ----- 🤔 Original Problem: LLMs still produce hallucinations in their outputs. Existing factuality evaluation resources are limited by being task-specific, small in size, or focused only on simple claim verification. ----- 🔧 Solution in this Paper: → LLM-OASIS extracts claims from Wikipedia passages using an LLM-based pipeline. → The system falsifies selected claims by introducing subtle but critical factual errors. → It generates pairs of factual and unfactual texts based on the original and modified claims. → The dataset covers 81k Wikipedia pages with 681k claims for training factuality evaluators. ----- 💡 Key Insights: → Task-agnostic factuality evaluation is possible with a large-scale synthetic dataset → Wikipedia provides reliable source material for generating factual/unfactual pairs → Human validation confirms high quality of automated data generation (90% accuracy) ----- 📊 Results: → GPT-4 achieves 60% accuracy on end-to-end factuality evaluation → 68% accuracy with Retrieval Augmented Generation → Human validation shows 96.78% accuracy for claim extraction → Dataset creation pipeline maintains 89-98% accuracy across all steps
3
4
15
1,919
Stefan Bejgu retweeted
✨ Meet #ResiDual, a novel perspective on the alignment of multimodal latent spaces! Think of it as a spectral "panning for gold" along the residual stream. It improves text-image alignment by simply amplifying task-related directions! 🌌🔍 arxiv.org/abs/2411.00246 [1/6]
2
11
30
3,048
Stefan Bejgu retweeted
25 Oct 2024
🚀 Today marks the start of the @MakerFaireRome 🎉 We’re super excited to be part of it and introduce #Vera, our new LLM-powered fact-checking tool! 🤖🧠 Here’s a sneak peek of what you can expect at our booth! 👀✨ #MakerFaireRome #FactChecking #LLM #ArtificialIntelligence
1
4
7
617
Stefan Bejgu retweeted
16 Oct 2024
✨Tired of verifying #AI-generated info?😵 🔎Meet Vera, our #LLM-based fact-checker using trusted sources from the Web or your knowledge base. 💥Check out the live demo at Rome #MakerFaire2024 (Oct 25-27)! More info 👉: babelscape.com/article/babel… #FactChecking #Misinformation
2
8
457
Stefan Bejgu retweeted
8 Sep 2024
🔵🔴When do distinct learning processes learn similar representations? Detecting patterns and conditions for this to happen is an open direction: a thread🧵 Working on this topic? Submit at: openreview.net/group?id=Neur… DEADLINE: 20 Sept See you at @NeurIPSConf! 🔵🔴 [1/N]
1
14
49
5,358
Stefan Bejgu retweeted
Post #ACL2024nlp dinner in #Bangkok with most of the presenting/attending band from our group @Babelscape. Left to right: @SBejgu @LorenzoProiet13 @giumartinelli_ @19Stefano97 @RiccardoRicOrl @FMTucci @RNavigli @KarimAsh14 & Celebrating our outstanding paper award! #NLProc
3
12
835
Stefan Bejgu retweeted
Come to chat with us at the poster session C in #EACL2024, starting now in room Radisson! #NLProc
2
21
811
Stefan Bejgu retweeted
Exciting strides in text summarization with LLMs 🚀but verifying their factual accuracy is still an open challenge 🤔 We introduce FENICE, a factuality-oriented metric for summarization with a strong focus on interpretability🔍arxiv.org/abs/2403.02270 #NLProc #LLMs #Factuality

2
10
20
1,494
Stefan Bejgu retweeted
📢Happy to share that "Neuralign: A Context-Aware, Cross-Lingual and Fully-Neural Sentence Alignment System for Long Texts" has been accepted to #EACL2024 (main) 🫂Huge thanks to my co-authors @SBejgu @SimoneTedeschi_ @ConiaSimone @RNavigli 📃More details coming soon! #NLProc
6
14
819
Stefan Bejgu retweeted
How to Mitigate Hallucinations in Large Language Models (#LLMs)?🤔 In this new @Medium article, I review the most recent research on mitigating hallucinations, and explain the main methods that are used to address this issue. 📑 generativeai.pub/how-to-miti… #AI #NLP #GPT4 #LLM
1
2
15
649
Stefan Bejgu retweeted
30 Nov 2023
Tomorow at 5pm @SBejgu will present our research work on word alignment in 14 language pairs! @CLiC_it_conf #CliCit2023, joint with @SapienzaNLP and many other partners! #NLProc #LLMs
6
10
744
Stefan Bejgu retweeted
27 Feb 2023
Excited about #ChatGPT for your business? Check out #Emotionary! The revolutionary #multilingual AI system that understands #emotions: #analyze customer reviews, #track feelings in #news, #socialmedia & #chatbot conversations! babelscape.com/emotionary
12
26
1,523
Stefan Bejgu retweeted
📢 It looks like relative representations are here to stay! I'm beyond thrilled to announce that our work has been selected as one of the notable top 5% (oral) papers at #iclr23 ! 🥳 x.com/moschella_luca/status/… [1/5]
Welcome Relative Representations, enabling zero-shot communication between latent spaces without any training! arxiv.org/abs/2209.15430 It turns out that distinct neural networks learn intrinsically equivalent latent spaces [1/6]
3
37
266
54,628
Stefan Bejgu retweeted
The Rome Workshop on 10 Years of #BabelNet & Multilingual Neurosymbolic Natural Language Understanding was a great success, with productive in-person discussions, amazing talks & >100 online participants! Thanks! @ERC_Research @Babelscape @SapienzaNLP @SapienzaRoma @WikiResearch
15
37
Stefan Bejgu retweeted
Open & commercial Neural Machine Translation models heavily suffer from disambiguation biases! We present DiBiMT, our novel benchmark for lexical-semantic bias in MT at #ACL2022! By @Valahaar @FedeMartelli25 @FrancescoSaina @RNavigli @ELEXIS_EU #NLProc 📝:researchgate.net/publication…
12
21
Stefan Bejgu retweeted
31 Mar 2022
Empower your natural language applications with WordAtlas! #WordAtlas is the next-generation multilingual knowledge graph. What makes it special is its linkage between words and concepts in hundreds of languages. babelscape.com/wordatlas
10
15
Stefan Bejgu retweeted
Classy is a @PyTorch-based library for the fast prototyping and sharing of deep neural network models. It wraps the best libraries like PyTorch Lightning, Transformers, @streamlit and offers them to users with a simple CLI interface. Try it here: github.com/sunglasses-ai/cla…
17
27
Stefan Bejgu retweeted
23 Mar 2022
@babelscape is a deep tech company founded in 2016 focused on multilingual Natural Language Processing with the main goal of enabling semantic text understanding and scaling enterprise applications to many languages without effort. @RNavigli @DIAGSapienza @SapienzaNLP
14
20