AlphaSignal AI

Tweets

AlphaSignal AI

@AlphaSignalAI

56m

SIA is interesting because it treats agents as editable systems. Not just prompts. Tools, parsers, verifiers, harness code, and weights all become update targets.
The paper reports:
> 70.1% on LawBench
> 1,017 µs CUDA kernel
> 0.289 mse_norm on denoising
Public repo makes you pick "--focus harness" or "--focus weights". The paper’s automatic switch is the missing piece.
Real lesson: a self-improving agent is only as good as its verifier.
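To make the "only as good as its verifier" point concrete, here is a minimal sketch of a verifier-gated update loop. It is not SIA's code: the `improve`, `propose`, `good_verifier`, and `broken_verifier` names are hypothetical, and the "agent" is reduced to a single integer for illustration.

```python
def improve(candidate_fn, verifier, current, rounds):
    """Keep a candidate only when the verifier scores it strictly higher."""
    best_score = verifier(current)
    for i in range(rounds):
        cand = candidate_fn(current, i)
        score = verifier(cand)
        if score > best_score:
            current, best_score = cand, score
    return current, best_score


def propose(agent, i):
    # Hypothetical update proposal: alternate a big step up and a small step down.
    return agent + (3 if i % 2 == 0 else -1)


def good_verifier(agent):
    # Scores closeness to the true target (10).
    return -abs(agent - 10)


def broken_verifier(agent):
    # Rewards larger numbers regardless of the target.
    return agent


print(improve(propose, good_verifier, 0, 10))    # (9, -1)
print(improve(propose, broken_verifier, 0, 10))  # (15, 15)
```

With the same proposal rule, the loop settles near the target under the sound verifier and drifts past it under the broken one. The update targets are identical in both runs; only the acceptance test differs.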

AlphaSignal AI retweeted

Arsh Shah Dilbagi

@arshdilbagi

16h

Introducing Adaline 2.0 - The Agent Self-Improvement Layer
Adaline turns Traces into Behaviors. Behaviors surface Issues. Issues become auto-generated Evals Data. Adaline then generates new agent candidates and tests them. You review the winners and ship!

AlphaSignal AI

@AlphaSignalAI

17h

The biggest bottleneck for computer-use agents just got automated away.
Reinforcement learning broke open math and coding. But for agents clicking around real software, progress stalled. The bottleneck was generating training data at scale.
CUA-Gym is a pipeline that solves this. It synthesizes verifiable tasks for computer-use agents end to end.
The setup uses three coordinated coding agents:
> Generator writes environment setup scripts
> Discriminator drafts the reward function blind
> Orchestrator iterates until both align
The team also built mock versions of 94 popular apps. These include Slack, Notion, Salesforce, and Gmail clones. Rewards read state directly, skipping flaky screenshot judges.
The resulting dataset holds 32,112 verified tuples across 110 environments. A trained model hits 72.6% on OSWorld-Verified, matching Claude Sonnet 4.6. A smaller 3B version matches its 17B base with 10x fewer parameters.
The full system, dataset, and models are open source.
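A rough sketch of what "rewards read state directly" can look like, as opposed to judging a screenshot. Everything here is an assumption for illustration: `MockChatApp` and `reward` are hypothetical names, not part of the CUA-Gym repo, and the agent's UI actions are reduced to a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class MockChatApp:
    """Hypothetical stand-in for a mock app; real environments will differ."""
    messages: dict = field(default_factory=dict)  # channel -> list of message strings

    def send(self, channel, text):
        self.messages.setdefault(channel, []).append(text)


def reward(app, channel, expected_text):
    """Return 1.0 if the expected message exists in the app state, else 0.0."""
    return 1.0 if expected_text in app.messages.get(channel, []) else 0.0


app = MockChatApp()
before = reward(app, "general", "standup at 10")
app.send("general", "standup at 10")  # the agent's action, reduced to a direct call
after = reward(app, "general", "standup at 10")
print(before, after)  # 0.0 1.0
```

Because the check inspects the environment's data rather than pixels, the same final state always yields the same reward, which is the property a verifiable RL task needs.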

AlphaSignal AI

@AlphaSignalAI

17h

Repo: github.com/xlang-ai/CUA-Gym Check out alphasignal.ai/newsletter to get a daily summary of the latest breakthrough news, models, papers and repos. Read by 300,000 devs.

Scalable pipeline for synthesizing verifiable RLVR training data for computer-use agents - xlang-ai/CUA-Gym

AlphaSignal AI

@AlphaSignalAI

20h

Was the “Fable 5 prompt leak / jailbreak” so serious that it got banned? Or was the model so powerful for a prompt leak? Well, they said “it’s a non-universal jailbreak...”

AlphaSignal AI

@AlphaSignalAI

Jun 12

x.com/i/article/206545901930…

AlphaSignal AI

@AlphaSignalAI

20h

Researchers just made frozen AI models smarter without retraining them.
Large language models run each input through their layers exactly once. Researchers asked a simple question. Can you squeeze more reasoning out of a finished model without retraining it?
A new paper called "Training-Free Looped Transformers via Numerical ODE Integration" says yes.
The trick is treating each layer as one step in solving a math equation. Looping a layer naively breaks things, since later layers expect a specific input. So instead they replace one big step with several smaller damped ones. This gives frozen models extra thinking time at inference.
What makes this practical:
> No fine-tuning required
> Works on existing checkpoints
> Strongest on hard knowledge tests
Gains reached 2.64 points on MMLU-Pro and 2.01 on GPQA. It held positive across 87% of tested combinations.
If old weights hide this much capacity, what else are we leaving unused?
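One way to read "several smaller damped steps": a residual layer x + f(x) is a single Euler step of size 1, and it can be swapped for k Euler steps of size 1/k that reuse the same frozen weights. The sketch below assumes plain Euler sub-stepping on a toy two-dimensional layer; the paper's actual integrator and damping scheme may differ, and `residual_branch`, `single_step`, and `damped_substeps` are hypothetical names.

```python
import math


def residual_branch(x, w):
    """Hypothetical frozen layer f(x): a tiny tanh map standing in for a transformer block."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def single_step(x, w):
    # Standard forward pass: one Euler step of size 1, x + f(x).
    fx = residual_branch(x, w)
    return [xi + fi for xi, fi in zip(x, fx)]


def damped_substeps(x, w, k):
    # Reuse the same frozen weights k times, each scaled by h = 1/k,
    # so the total step length still adds up to one layer.
    h = 1.0 / k
    for _ in range(k):
        fx = residual_branch(x, w)
        x = [xi + h * fi for xi, fi in zip(x, fx)]
    return x


W = [[0.5, -0.3], [0.2, 0.4]]
x0 = [1.0, -1.0]
print(single_step(x0, W))
print(damped_substeps(x0, W, 4))
```

With k = 1 the sub-stepped version reduces to the ordinary forward pass. Because each substep is scaled by 1/k, the total update stays comparable in size to a single layer's, which fits the tweet's point that later layers expect a specific input.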

AlphaSignal AI

@AlphaSignalAI

20h

Paper: alphaxiv.org/abs/2605.23872

AlphaSignal AI

@AlphaSignalAI

Jun 13

You installed that AI skill without scanning it. So did 99% of developers. 26.1% of published skills contain vulnerabilities. 36% contain prompt injection vectors. A skill can be dangerous without a single line of malicious code. NVIDIA open-sourced a scanner built specifically for this. It's called SkillSpector.

AlphaSignal AI

@AlphaSignalAI

Jun 12

x.com/i/article/206538315031…

AlphaSignal AI

@AlphaSignalAI

Jun 13

x.com/i/article/206561561381…

AlphaSignal AI

@AlphaSignalAI

Jun 13

Paper: arxiv.org/abs/2605.27276 Repo: github.com/hexo-ai/sia

SIA: Self Improving AI with Harness & Weight Updates

Humans are the bottleneck in building and improving AI. Both the models and the agents that wrap them are written, tuned, and corrected by people. The long-horizon goal of an AI that can figure...

AlphaSignal AI

@AlphaSignalAI

Jun 12

Reminder before 22 Jun: Learn how to use Claude Fable 5 properly, save tokens, maximize efficiency, and build before it goes to API only.

AlphaSignal AI

@AlphaSignalAI

Jun 12

x.com/AlphaSignalAI/status/2…

AlphaSignal AI

@AlphaSignalAI

Jun 11

x.com/i/article/206507779938…

AlphaSignal AI

@AlphaSignalAI

Jun 12

x.com/AlphaSignalAI/status/2…

AlphaSignal AI

@AlphaSignalAI

Jun 12

x.com/i/article/206545901930…

AlphaSignal AI

@AlphaSignalAI

Jun 12

x.com/i/article/206545901930…

AlphaSignal AI

@AlphaSignalAI

Jun 12

Leaked System-Prompt: github.com/elder-plinius/CL4…

CL4R1T4S/ANTHROPIC/CLAUDE-FABLE-5.md at main · elder-plinius/CL4R1T4S

LEAKED SYSTEM PROMPTS FOR CHATGPT, CLAUDE, GEMINI, GROK, PERPLEXITY, CURSOR, LOVABLE, REPLIT, AND MORE! - AI SYSTEMS TRANSPARENCY FOR ALL! 👐 - elder-plinius/CL4R1T4S

AlphaSignal AI

@AlphaSignalAI

Jun 12

We finally know why bigger models are smarter. It's not the data.
More training data was supposed to fix small models. A new paper shows why it cannot. Researchers proved some tasks need model scaling, not data scaling. A small model fails them even with infinite data.
The cause is competition over neurons. Frequent tasks grab capacity first and keep it. Rare task updates get overwritten before the next example arrives. The model learns, then forgets, in an endless loop.
Scaling breaks the loop in three steps:
1. Common tasks get fully learned
2. Their gradients fade to nothing
3. Rare features accumulate safely
The team pretrained OLMo models from 4M to 4B parameters. They injected novel tasks at controlled frequencies during training. Only the largest models learned the rare ones. Interference between their gradients nearly disappeared.
How many tasks is your model silently skipping?

AlphaSignal AI

@AlphaSignalAI

Jun 12

Paper: arxiv.org/abs/2605.29548

Why Larger Models Learn More: Effects of Capacity, Interference,...

Larger models learn tasks smaller models do not. What drives this phenomenon? We develop a simple phenomenological argument that power-law scaling already suggests that a larger model will be able...

AlphaSignal AI

@AlphaSignalAI

Jun 12

x.com/i/article/206538315031…

AlphaSignal AI

@AlphaSignalAI

Jun 12

Repo: github.com/NVIDIA/SkillSpect…

Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks. - NVIDIA/SkillSpector
