Ai2

Tweets

Pinned Tweet

Ai2

@allen_ai

May 7

Today we’re bringing new NSF OMAI compute online with NVIDIA Blackwell Ultra-powered systems, turning a $152M national investment from @NSF & @NVIDIA into a foundation for truly open AI research. 🧵

Ai2

@allen_ai

Jun 12

Building an LLM means evaluating it over & over as it changes. Tweak a hyperparameter or scale the model up, & every new checkpoint sends you back through the same benchmarking loop. We're releasing olmo-eval, a workbench built for this kind of iterative model development. 🧵

Ai2

@allen_ai

Jun 12

After training a model with a new intervention, olmo-eval lets you line two model checkpoints up question by question—holding everything else fixed. The comparison view makes it easier to see real gains & regressions.
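A question-by-question comparison of this kind can be sketched roughly as follows. This is a minimal illustration and not olmo-eval's actual API: the function name `compare_checkpoints` and the per-question correctness dictionaries are assumptions made for the example.

```python
from typing import Dict, List


def compare_checkpoints(
    before: Dict[str, bool], after: Dict[str, bool]
) -> Dict[str, List[str]]:
    """Bucket shared question IDs by how correctness changed.

    `before` and `after` map a question ID to whether that checkpoint
    answered it correctly (hypothetical input format). Only questions
    present in both runs are compared, so everything else is held fixed.
    """
    buckets: Dict[str, List[str]] = {"gained": [], "regressed": [], "unchanged": []}
    for qid in sorted(before.keys() & after.keys()):
        if after[qid] and not before[qid]:
            buckets["gained"].append(qid)      # wrong before, right now
        elif before[qid] and not after[qid]:
            buckets["regressed"].append(qid)   # right before, wrong now
        else:
            buckets["unchanged"].append(qid)
    return buckets


# Illustrative data only; "q4" is skipped because it is missing from `old`.
old = {"q1": True, "q2": False, "q3": True}
new = {"q1": True, "q2": True, "q3": False, "q4": True}
print(compare_checkpoints(old, new))
```

Aggregate accuracy is identical for both checkpoints in this toy data (two of three shared questions correct), yet the per-question view shows one gain and one regression.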

Ai2

@allen_ai

Jun 12

If you find yourself asking "how does this model checkpoint differ from the last, and where did it improve/regress?", that's what olmo-eval is for. We're releasing it openly so the community can build on it.
💻 Code: github.com/allenai/olmo-eval
📝 Blog: allenai.org/blog/olmo-eval

Ai2 retweeted

Bodhisattwa Majumder

@mbodhisattwa

Jun 11

So impactful! Excellent work from @sewon__min et al! Alternatively, it points to so much about the “novelty” of a generation and how to trace its history: clearly critical for scientific discovery with these models!

Ai2

@allen_ai

Jun 11

LLMs are no longer created w/ human data alone. They rely on other models to generate & filter data, evaluate outputs, & guide dev work. So what is a modern LLM built on? Olmo 3 → 89 model & 183 dataset dependencies; Nemotron 3 → 273 model & 560 dataset dependencies. We made ModSleuth to trace this. 🧵

Ai2 retweeted

Sewon Min

@sewon__min

Jun 11

One day I tried tracing all of Olmo's dependencies manually. A few hours later, I realized I can't do it and gave up. Then @sadhikesaven and @CoderBak built ModSleuth 🔥 Turns out Olmo and Nemotron have hundreds of dependencies that are super deep, recursive, and not easily visible. I'm glad I gave up early 😅 Spoiler: I thought this would be a one-week Claude Code project. It was not. The hard part wasn't information extraction (which Claude Code is good at). The hard part was something much trickier. Check out the paper to learn more! (And yes, if a model release says it used Claude Code, ModSleuth will trace that too... which means the model depends on Claude Code, which has its own dependencies, and ModSleuth itself depends on Claude Code 🤯)

Ai2

@allen_ai

Jun 11

ModSleuth generates a graph that surfaces what's nearly impossible to find manually, including:
📜 Hidden license inheritance
🔗 Train/eval coupling
📝 Documentation inconsistencies
🤖 Models used as judges, filters, OCR systems, & data generators
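To illustrate why license inheritance hides in deep, recursive dependencies, here is a minimal sketch of walking such a graph. It is not ModSleuth's implementation: the graph format, the artifact names, and the license labels are all hypothetical, chosen only for the example.

```python
from typing import Dict, List, Set


def transitive_dependencies(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """Return every artifact reachable from `root` via dependency edges.

    `graph` maps an artifact to its direct dependencies. A visited set
    guards against cycles, since dependencies can be recursive.
    """
    seen: Set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen


def inherited_licenses(
    graph: Dict[str, List[str]], licenses: Dict[str, str], root: str
) -> Set[str]:
    """Collect the licenses of all direct and indirect dependencies."""
    deps = transitive_dependencies(graph, root)
    return {licenses[d] for d in deps if d in licenses}


# Hypothetical artifacts; note the cycle between dataset-y and generator-model.
graph = {
    "model-a": ["dataset-x", "judge-model"],
    "dataset-x": ["generator-model"],
    "generator-model": ["dataset-y"],
    "dataset-y": ["generator-model"],
}
licenses = {"dataset-x": "permissive", "dataset-y": "non-commercial"}
print(inherited_licenses(graph, licenses, "model-a"))
```

Here `model-a` never lists `dataset-y` directly, yet its "non-commercial" label still surfaces through two levels of indirection.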

Ai2

@allen_ai

Jun 11

As LLM pipelines become more complex, we need tools like ModSleuth to identify what artifacts models are built on.
▶️ Demo: modsleuth.cal-data-audit.org
📄 Paper: arxiv.org/abs/2606.12385

Ai2 retweeted

Kyle Wiggers

@Kyle_L_Wiggers

Jun 10

ACE2S-SHiELD, our new climate emulator that learns to separate the effects of sea surface temperature & CO2, is now on @huggingface. Check it out → huggingface.co/allenai/ACE2S…

allenai/ACE2S-SHiELD-plus · Hugging Face

Ai2

@allen_ai

Jun 9

Today we're introducing ACE2S-SHiELD, a climate emulator that learns to separate the effects of sea surface temperature & CO2. It accurately handles scenarios where previous versions of our ACE family of climate emulators produced inaccurate results. 🧵

Ai2 retweeted

Iz Beltagy

@i_beltagy

Jun 10

I'm hiring Senior Research Engineers. Come build open-source vision-language models from zero to hero: pretraining, mid-training, post-training, RL, the whole pipeline. job-boards.greenhouse.io/the…

Senior Research Engineer, Olmo Molmo

Seattle, WA

job-boards.greenhouse.io

Ai2

@allen_ai

Jun 9

Trained on the new & existing data, ACE2S-SHiELD accurately handles the scenarios earlier ACE models were good at as well as the ones they struggled with. It's more flexible than ACE2-SHiELD & ACE2-SOM combined, using ~25% fewer training samples than either alone.

Ai2

@allen_ai

Jun 9

This work was done in collaboration with @NOAA's Geophysical Fluid Dynamics Laboratory. → Read more about ACE2S-SHiELD in our preprint: arxiv.org/abs/2606.07928

Disentangling the effects of sea surface temperature and CO$_2$ in...

While previous versions of the Ai2 Climate Emulator (ACE) have been trained with CO$_2$ as a forcing, they are only accurate within a narrow range of scenarios, for example climate over the last...

arxiv.org
