Arnim Bleier retweeted

dax

@thdxr

Apr 28

this is average spend per session. with each new model release people are spending more and more

467

65,768

Arnim Bleier retweeted

Gergely Orosz

@GergelyOrosz

Apr 29

Tomorrow - The Pragmatic Engineer podcast episode coming with @badlogicgames (creator of Pi) and @mitsuhiko (creator of Flask, early Sentry, founder at Earendil). 2/3 of the Austrian AI mafia!

802

45,403

Arnim Bleier retweeted

Armin Ronacher ⇌

@mitsuhiko

Apr 13

Here is a little experiment: an interactive pi tutorial. Make an empty folder, then run this: pi -e git:github.com/earendil-works/pi… And give feedback! Reason: pi works best if you have an onboarding buddy. But if you don't have one, maybe pi can be one for you?

619

55,381

Arnim Bleier retweeted

Philipp Singer @ph_singer

Apr 8

Replying to @badlogicgames

> Be with our kid, keep our lifestyle, never have our boy cry again because of "work"

Prioritize this above everything else! Great move and wish you guys all the best!

518

Arnim Bleier retweeted

opentraces @opentraces

Apr 5

Unlock trapped coding-agent traces for safe sharing, analysis, and model training via @huggingface 🤗 hub 👇 opentraces.ai/

open traces

Your agent traces are training data. Open protocol for crowdsourcing AI agent session traces.

opentraces.ai

3,887

Arnim Bleier retweeted

Mario Zechner

@badlogicgames

Apr 6

People who like sharing agent traces. I've just published all my pi-mono coding agent sessions on @huggingface so you get to laugh at or pwn me! huggingface.co/datasets/badl… I suggest you do the same, see thread below. Let's make this a community effort. Here's pi-share-hf: github.com/badlogic/pi-share… If you are working on tools that help identify PII/sensitive data, get in touch. The better the classification is, the more willing people will be to share their traces.

351

46,059

Arnim Bleier retweeted

Mario Zechner

@badlogicgames

Mar 28

we as software engineers are becoming beholden to a handful of well funded corporations. while they are our "friends" now, that may change due to incentives. i'm very uncomfortable with that. i believe we need to band together as a community and create a public, free to use repository of real-world (coding) agent sessions/traces. I want small labs, startups, and tinkerers to have access to the same data the big folks currently gobble up from all of us. So we, as a community, can do what e.g. Cursor does below, and take back a little bit of control again. Who's with me? cursor.com/blog/real-time-rl…

Improving Composer through real-time RL · Cursor

We apply online reinforcement learning to Composer, serving model checkpoints to production and using real user interactions as reward signals to ship an improved checkpoint multiple times a day.

cursor.com

183

347

2,821

279,941

Arnim Bleier retweeted

Iván Arcuschin @IvanArcus

Feb 11

You change one word on a loan application: the religion. The LLM rejects it. Change it back? Approved. The model never mentions religion. It just frames the same debt ratio differently to justify opposite decisions. We built a pipeline to find these hidden biases 🧵1/13

236

1,804

12,440

874,685

Arnim Bleier retweeted

Alex Cui

@alexcdot

Jan 21

Okay so, we just found that over 50 papers published at @Neurips 2025 have AI hallucinations. I don't think people realize how bad the slop is right now. It's not just that researchers from @GoogleDeepMind, @Meta, @MIT, @Cambridge_Uni are using AI - they allowed LLMs to generate hallucinations in their papers and didn't notice at all. It's insane that these made it through peer review👇

280

1,396

6,296

1,002,252

Arnim Bleier retweeted

John Horton

@johnjhorton

2 Sep 2025

Normally, it's: 1) write a paper & submit 3) get reviews (~3 months) 4) revise paper & resubmit 5) wait for response (~3 months) ...what if we could simulate this process in minutes? Could we fix issues? Anticipate misconceptions? Get ideas for new analyses/experiments? 1/

192

43,138

Arnim Bleier retweeted

Kevin Weil 🇺🇸

@kevinweil

2 Sep 2025

💥 I’m starting something new inside OpenAI! It’s called OpenAI for Science, and the goal is to build the next great scientific instrument: an AI-powered platform that accelerates scientific discovery.

195

258

3,841

702,385

Arnim Bleier @arnimb

15 Mar 2025

Scientific work shouldn’t come at the cost of stressful work environments. @DeutscheWelle & @derspiegel investigate abuse at Germany’s #MPG. Just an isolated case?🤔 #OpenScience #ScienceCulture #Abuse youtube.com/watch?v=n5nEd600…

Arnim Bleier retweeted

Andrej Karpathy

@karpathy

12 Mar 2025

It's 2025 and most content is still written for humans instead of LLMs. 99.9% of attention is about to be LLM attention, not human attention. E.g. 99% of libraries still have docs that basically render to some pretty .html static pages assuming a human will click through them. In 2025 the docs should be a single your_project.md text file that is intended to go into the context window of an LLM. Repeat for everything.

637

1,322

12,656

1,772,941

Arnim Bleier retweeted

Kobi Hackenburg

@KobiHackenburg

7 Mar 2025

📈Out today in @PNASNews!📈 In a large pre-registered experiment (n=25,982), we find evidence that scaling the size of LLMs yields sharply diminishing persuasive returns for static political messages. 🧵:

128

35,154

Arnim Bleier retweeted

Paul Röttger @paul_rottger

13 Feb 2025

Are LLMs biased when they write about political issues? We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before. Long 🧵with spicy results 👇

203

29,425

Arnim Bleier retweeted

Niklas Muennighoff @Muennighoff

11 Feb 2025

Last week we released s1 - our simple recipe for sample-efficient reasoning & test-time scaling. We’re releasing 𝐬𝟏.𝟏 trained on the 𝐬𝐚𝐦𝐞 𝟏𝐊 𝐪𝐮𝐞𝐬𝐭𝐢𝐨𝐧𝐬 but performing much better by using r1 instead of Gemini traces. 60% on AIME25 I. Details in 🧵1/9

Niklas Muennighoff @Muennighoff

3 Feb 2025

DeepSeek r1 is exciting but misses OpenAI’s test-time scaling plot and needs lots of data. We introduce s1 reproducing o1-preview scaling & performance with just 1K samples & a simple test-time intervention. 📜arxiv.org/abs/2501.19393

114

761

158,134

Arnim Bleier retweeted

David Duvenaud

@DavidDuvenaud

30 Jan 2025

New paper: What happens once AIs make humans obsolete? Even without AIs seeking power, we argue that competitive pressures will fully erode human influence and values. gradual-disempowerment.ai/ with @jankulveit @raymondadouglas @AmmannNora @degerturann @DavidSKrueger 🧵

250

1,318

400,268

Arnim Bleier retweeted

Chris Holdgraf @choldgraf

5 Feb 2025

Big news! We figured out a way to run mybinder.org instances about 5x cheaper, and in a much simpler way. As of today 2i2c.mybinder.org serves about 70% of Binder's sessions, running on a single VM on Hetzner! 2i2c.org/blog/2025/binder-si…

549

Arnim Bleier retweeted

Steve Newman

@snewmanpv

16 Dec 2024

Clearly someone needs to try this at scale – pick 1000 published scientific papers at random, ask o1 or o1-pro to look for errors, and see what turns up. I'm going to give it a shot. Anyone interested in helping out? (Incidentally, h/t @gibbnicholas for also noticing that o1-pro can spot the math error in the black plastics paper: x.com/gibbnicholas/status/18…)

Ethan Mollick

@emollick

15 Dec 2024

👀 A 10 page paper caused a panic because of a math error. I was curious if AI would spot the error by just prompting: “carefully check the math in this paper” especially as the info is not in training data. o1 gets it in a single shot. Should AI checks be standard in science?

1,003

641,273

Arnim Bleier retweeted

forschungsdaten.info @ForschDatenInfo

13 Nov 2024

📢 #LoveData25 is just around the corner! Again this year we are offering an overview page that compactly gathers events on #Forschungsdaten (research data) and #Forschungsdatenmanagement (research data management). forschungsdaten.info/fdm-im-… #OpenScience #FDM #RDM

1,092