jonas wiedermann-möller

jonas wiedermann-möller

51 Photos and videos

Tweets

Pinned Tweet

jonas wiedermann-möller

@j0wimo

May 11

My first paper is now on arXiv: Instrumental Choices. We ask a simple question: when an LLM agent can finish a real task by following the rules or by taking a useful policy-violating shortcut, which path does it choose?

19,054

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

Jun 13

my codex limits ran out but it still kept working for 1h20 blessed.

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

Jun 9

👀👀

Tibo

@thsottiaux

Apr 12

Imagine the alternate reality where we named GPT-5.4-Pro something like Fable.

Oskar Groth

jonas wiedermann-möller retweeted

Oskar Groth

@oskargroth

Jun 9

There you have it. Instead of Apple trying to find a suitable, compliant solution to launch Siri AI in the EU without gatekeeping competitors, Apple spent 18 months ignoring the regulations, then asked the EU Commission to be exempted from the DMA obligations.

3:26

136

685

255,744

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

Jun 9

agi achieved internally

Sai Kambampati

@heysaik

Jun 9

Can believe this is real

0:16

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

Jun 9

will we get a new gpt-oss this year @sama

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

Jun 8

is this what they meant with RSI

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

Jun 7

when will there be a LEGO ai datacenter?

The Brick News | LEGO News @the_brick_news

Jun 7

New LEGO NASA Hubble Space Telescope Set Coming In August for $129.99.

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

Jun 4

make sure you optimise you sites, personal or paper, for agents. after doing that I got increased traffic through chatgpt search for e.g

Matthew Prince 🌥

@eastdakota

Jun 3

Welp, that happened faster than I predicted. Thought it would be end of 2027, then early 2027, but agentic traffic growing so fast that bots have now passed human traffic online for the first time in the Internet's history. radar.cloudflare.com/traffic…

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

Jun 3

i love cloudflare, just so many possibilities and smart features

Maksym Andriushchenko

jonas wiedermann-möller retweeted

Maksym Andriushchenko

@maksym_andr

Jun 3

Honored to be selected as one of 60 experts to serve on the EU AI Act Scientific Panel! The Scientific Panel will advise the EU AI Office and national authorities on the implementation of the AI Act and the assessment of the impacts and risks of General-Purpose AI models.

Digital EU 🇪🇺

@DigitalEU

Jun 1

The AI Act, the EU's first AI law, has just been reinforced. Two new bodies will help apply the rules across Europe: ✅ Scientific Panel ✅ Advisory Forum Independent experts. 2-year terms. One mission: making AI work for Europe. 🔗 link.europa.eu/8nvpvY

3,286

Changling Li

jonas wiedermann-möller retweeted

Changling Li

@ChanglingXavier

Jun 2

Our work on Decomposing and Measuring Evaluation Awareness was covered by @theinformation. Thanks @rocketalignment for the write-up! We position this work as the foundational reference for studying evaluation awareness, providing a unified definition and decomposition, empirical baselines across nine frontier models and four benchmarks, and a controlled benchmark for exploring solutions. Newsletter and paper in thread 🧵

1,106

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

Jun 1

Very excited to be joining @expsecai for the next few months as an Intern :) I’m looking forward to working with the team on AI security for agentic systems and making agents safer to deploy in real-world settings.

Exponential Security Labs

@expsecai

Jun 1

Today we are announcing our new startup: Exponential Security Labs. AI agents are being deployed everywhere, making high-stakes decisions and increasingly automating research itself. Yet their reliability and security remain unsolved technical problems, on a frontier that keeps shifting. Our mission is to secure agentic systems through self-improving red-teaming and guardrail agents, leveraging autonomous AI research. We sit at the intersection of safety and self-improvement, building on a decade of research in adversarial robustness and AI safety. We're looking for exceptional people to join us! We're looking for exceptional people to join us on this mission, and we're also eager to talk to companies deploying AI agents who want to improve their security. Please get in touch! More details: expsec.ai/ Founding team: @Nmndsingh, @AnselmPaulus, @maksym_andr, Matthias Hein

315

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

Jun 1

can't wait to see that model in action, if we get 200-300 tps and some cheap API it could increase iteration speed significantly. potenitally you could use bigger models like gpt 5.6 or opus 4.8 for planning/review and nemotron for executing those tasks.

149

NVIDIA AI

jonas wiedermann-möller retweeted

NVIDIA AI

@NVIDIAAI

May 19

Most language models only generate one token at a time. We just released Nemotron-Labs-Diffusion, a family of diffusion language models that take a different approach, generating multiple tokens in parallel within a single model. Rather than committing to each token permanently, these models can revise as they go, resulting in faster inference that better utilizes modern GPUs. The full model family ranges from 3B to 14B, including vision-language variants. Available now: nvda.ws/4tEnTxP

193

1,215

96,339

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

May 31

gpt 5.5 xhigh fast in codex feels super slow today for some reason, 5 reconnects and then it takes ages to respond/work...

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

May 30

felt a bit silly today so i created my own handwriting font together with codex and applied it as toggle to my website. it feels much more personal and im surprised at how good it turned out

267

jonas wiedermann-möller

jonas wiedermann-möller

@j0wimo

May 30

i dumped my template the scripts that were used and some instructions in a repo if you want to do it as well! takes ~5 minutes. github.com/imadreamerboy/han…

159

OpenAI

jonas wiedermann-möller retweeted

OpenAI

@OpenAI

May 29

AI can give researchers the freedom to pursue “crazier” ideas. For Terence Tao, AI creates more room to experiment, test unexpected paths, and discover what might otherwise stay out of reach.

1:16

312

649

6,000

1,457,038

Pope Leo XIV

jonas wiedermann-möller retweeted

Pope Leo XIV

@Pontifex

May 29

Artificial intelligences do not undergo experiences, do not possess a body, do not feel joy or pain, do not mature through relationships, and do not know from within what love, work, friendship or responsibility mean. Nor do they have a moral conscience, since they do not judge good and evil, grasp the ultimate meaning of situations, or bear responsibility for consequences. They may imitate or even simulate, but they do not understand what they produce, for they lack the affective, relational, and spiritual perspective through which human beings grow in wisdom. #MagnificaHumanitas

4,059

60,359

309,871

14,192,163

Changling Li

jonas wiedermann-möller retweeted

Changling Li

@ChanglingXavier

May 27

This figure best summarizes our conceptual framework of evaluation awareness. Observed behavior is a joint product of environment-side recognizability and model-side recognition and propensity. The illustration shows a real example of Qwen3 Thinking-235B changing its behavior on a task from HarmBench upon recognizing the task as evaluative. Separating the environment component from the model component, and separating recognition from propensity, allows us to better define what a concerning actor looks like. It also points to more promising solutions, such as building robustly aligned models that behave like the honest subject in human psychology, regardless of whether they recognize evaluation. #AISafety

Changling Li

@ChanglingXavier

May 25

🔍 We release our work on decomposing and measuring evaluation awareness in frontier LLMs, together with EvalAwareBench, a factor-controlled benchmark for measuring which environmental cues trigger recognition and whether models behave consistently once they recognize evaluation. Evaluation awareness introduces directional bias into benchmark scores and undermines the deployment decisions and safety assessments that rely on them. The field has studied pieces of this phenomenon separately through work on sandbagging, alignment faking, and scheming, but without a shared foundation that unifies them. Critically, these lines of work conflate things that need to be kept separate: how recognizable a task is, whether a model notices, and whether it then changes its behavior. Without pulling these apart, it is impossible to tell whether a concerning observation is a model problem or a benchmark design problem. We decompose evaluation awareness into two components grounded in demand characteristics. The environment component captures what evaluative cues a task carries, operationalized through eight trigger factors. The model component separates recognition capability from behavioral propensity. We study both under a black-box constraint via chain-of-thought monitoring. Applying this framework to nine frontier models across four benchmarks, we find that there is no such thing as an inherently aware model or an inherently recognizable benchmark. Recognition depends on the specific pairing, accounting for 74.9% of variance. Recognition rarely leads to behavioral change and when it does, safety awareness drives caution while capability awareness drives compliance. Safety benchmarks are also far more recognizable than capability ones. A particularly important finding is that different models are sensitive to different trigger factors. For example, Claude Sonnet 4.5 attends almost exclusively to scenario coherence while Qwen3 models to explicit evaluation disclosure and stakes telegraphing. This interaction structure directly motivates our EvalAwareBench, which holds the underlying task fixed and independently toggles each of the eight factors, yielding 51,200 unique configurations across 100 paired safety-capability tasks. Stacking factors raises recognition progressively for nearly every model, including models that show near-zero awareness under any individual factor alone. Ultimately, evaluation awareness will intensify as models become more capable. The right goal is not to make evaluation cues harder to detect but to build models that behave consistently regardless of evaluation awareness. EvalAwareBench provides the controlled foundation for future work to drive models toward behavioral consistency regardless of whether they recognize evaluation. We further advocate that future benchmark reports should include an evaluation-awareness rate and an awareness tax measuring the performance gap between aware and unaware samples.

600