I'm a research scientist in NLP, currently working on OpenGPT-X, EuroLingua-GPT, and TrustLLM to build open-source multilingual LLMs for Europe.

Joined February 2017
Pinned Tweet
26 Nov 2024
🌟 Teuken-7B-instruct is here! The first LLM from OpenGPT-X is now available free of charge on Hugging Face. For me, OpenGPT-X represents a significant milestone in Germany’s NLP research landscape, demonstrating how pragmatism and scientific rigor can come together to create impactful results.

🚀 Why this excites me:
- International Benchmark: OpenGPT-X shows that Germany can deliver projects of international caliber. This is crucial for retaining the highly skilled professionals trained here.
- Beacon for Innovation: Projects like this inspire and highlight what’s possible. They act as magnets for talent in computer science.

👩‍💻 My Role in the Project (and Life!):
For about a year, we’ve been closely monitoring the training of Teuken-7B, overseeing progress daily, and adjusting processes based on new insights. This intensive but rewarding work has laid the foundation for future LLMs in EuroLingua. At the same time, I’ve been raising two little humans at home: monitoring their progress, navigating surprises, and making adjustments as needed! Let’s just say, whether it’s AI models or toddlers, both require patience, consistency, and a good sense of humor. 😊

🏗️ Investing in Europe’s AI Future:
Over time, we’ve built a robust, future-ready framework for Europe:
• Multilingual Evaluation: We created benchmarks and a leaderboard that covers 21 European languages to systematically assess AI models.
• Custom Training Framework: Starting from scratch, we developed “Modalities,” an open-source training framework that will power upcoming models like EuroLingua.
• Data Pipeline: We are building a European data pipeline capable of processing multiple petabytes of data following the latest insights in research, ensuring scalability for future demands.

💡 Why Open Source Matters for AI:
Open Source removes barriers to learning, sharing, and improving systems. It provides the essential freedoms to:
1. Use the system for any purpose.
2. Study how it works.
3. Modify it as needed.
4. Share it freely.
By opening up Teuken-7B, we’re fostering collaboration, transparency, and innovation to ensure Europe’s digital sovereignty.

📣 Teuken-7B-instruct is just the beginning! OpenGPT-X and this model represent the foundation for even more groundbreaking work.

👉 Explore and get involved:
- Model Card and Technical Information: huggingface.co/openGPT-X
- Leaderboards: huggingface.co/spaces/openGP…
- OpenGPT-X Discord: discord.com/invite/RvdHpGMvB…
- Modalities: github.com/Modalities/modali…

A big thank you to the entire team, our partners, and the BMWK for supporting this project!

#OpenSource #AI #DigitalSovereignty #Teuken7B #EuroLingua #OpenGPTX
2
5
10
614
Michael Fromm retweeted
How do we make LLMs more factually reliable? Join our TrustLLM webinar on 14 April, 10–11 CET 👉 Register here: events.teams.microsoft.com/e… 📌 Please note that the webinar will be recorded.
1
1
56
Michael Fromm retweeted
10 Jul 2025
Introducing Grok 4, the world's most powerful AI model. Watch the livestream now: x.com/i/broadcasts/1lDGLzplW…
5,198
7,237
29,057
28,269,344
Michael Fromm retweeted
🚀 Shaping the Future of Multilingual AI with Teuken-7B
Join us for a talk with Dr. Michael Fromm (@fraunhofer.bsky.social) on June 21st (3pm CEST) as he shares insights into the Teuken-7B project. #AI #NLP #Teuken7B #Teuken #OpenGPTX
1
2
4
141
Michael Fromm retweeted
🚀 New Preprint
We introduce JQL: a highly efficient, modular pipeline for multilingual pre-training data curation.
📄 arXiv: arxiv.org/abs/2505.22232
🤗 Hugging Face: huggingface.co/spaces/JQL-AI…
🔧 GitHub: github.com/JQL-AI/JQL-Annota…
2
6
20
2,639
Michael Fromm retweeted
4 Apr 2025
TrustLLM secures 500K node hours on EuroHPC Leonardo BOOSTER for AI Act-compliant LLM training! 👉 Read more: trustllm.eu/eurohpc-awards-5… #TrustLLM #EuroHPC #trustworthyAI
1
2
3
117
Michael Fromm retweeted
🤩 The OpenEuroLLM project, led by Charles University, was launched today at the Carolinum, bringing together 20 of Europe's top institutions, companies and computing centres to create powerful, open and multilingual large language models (LLMs) for European languages.
🌍 "The OpenEuroLLM project and the use of open language models will help companies to increase their global competitiveness while contributing to Europe's digital sovereignty," underlined Professor Jan Hajič, the project's lead coordinator from the Faculty of Mathematics and Physics at Charles University.
🤝 The OpenEuroLLM project is funded by the European Commission under the Digital Europe programme and co-financed by industry and providers in individual countries, including the Ministry of Education of the Czech Republic.
1
2
3
217
11 Mar 2025
Happy to be a (small) part of it!
Kick-off successfully completed. Go OpenEuroLLM team! openeurollm.eu/
1
59
Michael Fromm retweeted
The Moore's Law Update

NOTE: this is a semi-log graph, so a straight line is an exponential; each y-axis tick is 100x. This graph covers a 1,000,000,000,000,000,000,000x improvement in computation/$. Pause to let that sink in.

Humanity’s capacity to compute has compounded for as long as we can measure it, exogenous to the economy, and starting long before Intel co-founder Gordon Moore noticed a refraction of the longer-term trend in the belly of the fledgling semiconductor industry in 1965.

I have color coded it to show the transition among the integrated circuit architectures. You can see how the mantle of Moore's Law has transitioned most recently from the GPU (green dots) to the ASIC (yellow and orange dots), and the NVIDIA Hopper architecture itself is a transitionary species, from GPU to ASIC, with 8-bit performance optimized for AI models, the majority of new compute cycles. There are thousands of invisible dots below the line, the frontier of humanity's capacity to compute (e.g., everything from Intel in the past 15 years).

The computational frontier has shifted across many technology substrates over the past 128 years. Intel ceded leadership to NVIDIA 15 years ago, and further handoffs are inevitable.

Why the transition within the integrated circuit era? Intel lost to NVIDIA for neural networks because the fine-grained parallel compute architecture of a GPU maps better to the needs of deep learning. There is a poetic beauty to the computational similarity of a processor optimized for graphics processing and the computational needs of a sensory cortex, as commonly seen in the neural networks of 2014. A custom ASIC chip optimized for neural networks extends that trend to its inevitable future in the digital domain. Further advances are possible with analog in-memory compute, an even closer biomimicry of the human cortex.

The best business planning assumption is that Moore’s Law, as depicted here, will continue for the next 20 years as it has for the past 128. (Note: the top right dot for Mythic is a prediction for 2026 showing the effect of a simple process shrink from an ancient 40nm process node.)

----

For those unfamiliar with this chart, here is a more detailed description:

Moore's Law is both a prediction and an abstraction. It is commonly reported as a doubling of transistor density every 18 months. But this is not something the co-founder of Intel, Gordon Moore, has ever said. It is a nice blending of his two predictions; in 1965, he predicted an annual doubling of transistor counts in the most cost effective chip and revised it in 1975 to every 24 months. With a little hand waving, most reports attribute 18 months to Moore’s Law, but there is quite a bit of variability.

The popular perception of Moore’s Law is that computer chips are compounding in their complexity at near constant per unit cost. This is one of the many abstractions of Moore’s Law, and it relates to the compounding of transistor density in two dimensions. Others relate to speed (the signals have less distance to travel) and computational power (speed x density).

Unless you work for a chip company and focus on fab-yield optimization, you do not care about transistor counts. Integrated circuit customers do not buy transistors. Consumers of technology purchase computational speed and data storage density. When recast in these terms, Moore’s Law is no longer a transistor-centric metric, and this abstraction allows for longer-term analysis. What Moore observed in the belly of the early IC industry was a derivative metric, a refracted signal, from a longer-term trend, a trend that begs various philosophical questions and predicts mind-bending AI futures.

In the modern era of accelerating change in the tech industry, it is hard to find even five-year trends with any predictive value, let alone trends that span the centuries. I would go further and assert that this is the most important graph ever conceived.

A large and growing set of industries depends on continued exponential cost declines in computational power and storage density. Moore’s Law drives electronics, communications and computers and has become a primary driver in drug discovery, biotech and bioinformatics, medical imaging and diagnostics. As Moore’s Law crosses critical thresholds, a formerly lab science of trial and error experimentation becomes a simulation science, and the pace of progress accelerates dramatically, creating opportunities for new entrants in new industries. Consider the autonomous software stack for Tesla and SpaceX and the impact that is having on the automotive and aerospace sectors.

Every industry on our planet is going to become an information business. Consider agriculture. If you ask a farmer in 20 years’ time about how they compete, it will depend on how they use information, from satellite imagery driving robotic field optimization to the code in their seeds. It will have nothing to do with workmanship or labor. That will eventually percolate through every industry as IT innervates the economy.

Non-linear shifts in the marketplace are also essential for entrepreneurship and meaningful change. Technology’s exponential pace of progress has been the primary juggernaut of perpetual market disruption, spawning wave after wave of opportunities for new companies. Without disruption, entrepreneurs would not exist. Moore’s Law is not just exogenous to the economy; it is why we have economic growth and an accelerating pace of progress. At Future Ventures, we see that in the growing diversity and global impact of the entrepreneurial ideas that we see each year, from automobiles and aerospace to energy and chemicals.

We live in interesting times, at the cusp of the frontiers of the unknown and breathtaking advances. But, it should always feel that way, engendering a perpetual sense of future shock.
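
As a rough check on the figures quoted in the post above, the stated overall gain and time span imply an average doubling time that can be computed directly. The sketch below is an editorial illustration, not part of the original post: it assumes the 10^21 improvement and the 128-year span are exact and that growth was a smooth exponential, and the function name is hypothetical.

```python
import math

# Figures quoted in the post; treated as exact here purely for illustration.
TOTAL_IMPROVEMENT = 1e21  # claimed gain in computation per dollar
SPAN_YEARS = 128          # period the post says the chart covers


def implied_doubling_time(total_improvement: float, span_years: float) -> float:
    """Average years per doubling, assuming smooth exponential growth."""
    doublings = math.log2(total_improvement)
    return span_years / doublings


doubling_years = implied_doubling_time(TOTAL_IMPROVEMENT, SPAN_YEARS)
doubling_months = doubling_years * 12

# Gain over a further 20 years if the same average rate were to continue.
projected_gain_20y = 2 ** (20 / doubling_years)

print(f"Implied doubling time: {doubling_years:.2f} years ({doubling_months:.0f} months)")
print(f"Projected gain over the next 20 years: about {projected_gain_20y:,.0f}x")
```

Under these assumptions the implied doubling time comes out at roughly 22 months, which sits between the 18-month and 24-month figures the post mentions.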
545
2,008
8,335
12,929,894
Michael Fromm retweeted
The reality of the Turing test
268
1,207
15,575
853,457
Michael Fromm retweeted
30 Nov 2024
The European research project OpenGPT-X has released the language model “Teuken-7B”, specifically designed to align with European values, data protection standards, and linguistic diversity. It was trained with the 24 official languages of the EU and consists of 7 billion parameters. The model is freely available on the Hugging Face platform and can also be used for commercial projects. The project began in 2022 to create an alternative to the dominant AI models from the US (such as GPT-4, Llama, or Gemini). Its goal is to promote European independence in AI technology and support scientific as well as commercial applications. OpenGPT-X is led by the Fraunhofer Institutes IAIS and IIS, with contributions from other research institutions and companies. The model aims to drive the development of transparent and adaptable AI solutions for science and industry.
6
2
11
715
Michael Fromm retweeted
We have a new model: Teuken-7B-instruct. Multilingual, open source, made in Europe 🇪🇺
🇪🇺🗝️ European. Multilingual. Open source. This is "Teuken-7B-instruct". Release day! The LLM Teuken-7B-instruct, developed in the @OpenGPTX project, is now available to download free of charge on Hugging Face. #LLMs ℹ️opengpt-x.de/en/models/teuke…
3
3
25
3,555
Michael Fromm retweeted
Teuken 7B Instruct: a European model released
Finally some good news from Europe. The Fraunhofer Institute has trained its own 7B model, and it can keep up with “big players” such as Llama 3.1 8B. This is so important for Europe's survival in the AI era. In this respect, I expressly welcome the fact that with Teuken 7B Instruct, a European model is finally being released that can at least keep up in the SLM league.
🇪🇺🗝️ European. Multilingual. Open source. This is "Teuken-7B-instruct". Release day! The LLM Teuken-7B-instruct, developed in the @OpenGPTX project, is now available to download free of charge on Hugging Face. #LLMs ℹ️opengpt-x.de/en/models/teuke…
7
9
81
12,597
Michael Fromm retweeted
The OpenGPT-X project has launched "Teuken-7B," a 7 billion parameter language model trained in all 24 EU languages, available for download on Hugging Face. Developed by the Fraunhofer Institutes, this open-source model provides a commercially usable tool reflecting a European perspective. #AI iais.fraunhofer.de/en/press/… @CurieuxExplorer, @PawlowskiMario, @mvollmer1, @gvalan, @ipfconline1, @LaurentAlaus, @Shi4Tech, @kalydeoo, @Ym78200, @Nicochan33, @Fabriziobustama, @3itcom, @chidambara09, @Analytics_699, @tewoz, @ahier, @EvanKirstel, @sallyeaves, @FrRonconi, @DigitalColmer, @HaroldSinnott, @fogle_shane, @rshevlin, @jeffkagan, @jeancayeux, @RLDI_Lamy, @pierrepinna, @dinisguarda, @thomas_dettling, @sarbjeetjohal, @SpirosMargaris, @jblefevre60, @BetaMoroney, @puneetsinghal22, @enilev, @DimitriHommel, @CRudinschi, @StefanoDeCupis, @HeinzVHoenen, @aure79lien, @fogoros, @Kevin_ODonovan, @jorgecunha, @GlenGilmore
2
28
39
1,854
Michael Fromm retweeted
Multilingual & #OpenSource: As of today, the large AI language model @OpenGPTX is available for download. Developed with the participation of #TUDresden, the #LLM was trained on the 24 official languages of the EU, is free of charge & can be adapted for #KI (AI) applications. ℹ️ tu-dresden.de/tu-dresden/new…
4
12
711
Michael Fromm retweeted
26 Nov 2024
The European AI "Teuken 7B" is launching and is also set to be used at ARD. With the participation of WDR, the AI was trained on German-language data. When it is used in European clouds, high data protection standards can be met. presse.wdr.de/plounge/wdr/un…
6
5
14
1,662
Michael Fromm retweeted
Replying to @OpenGPTX
@OpenGPTX just released the large language model #LLM Teuken-7B! It was trained with the 24 official languages of the #EU & has seven billion parameters. As a technological foundation, the free model can be adapted, supplemented & specialized for applications of Generative #AI.
2
1
4
230
Michael Fromm retweeted
🇪🇺🗝️ European. Multilingual. Open source. This is "Teuken-7B-instruct". Release day! The LLM Teuken-7B-instruct, developed in the @OpenGPTX project, is now available to download free of charge on Hugging Face. #LLMs ℹ️opengpt-x.de/en/models/teuke…
6
13
34
21,391
Michael Fromm retweeted
18 Nov 2024
Falcon 9 lifts off from pad 39A in Florida
550
1,605
15,810
2,150,374