Researching and building general intelligence

Joined January 2015
Evolving skills to hillclimb against benchmarks is a key module for self-evolving agents. Very excited for this new open repo from Sentient.
Himanshu Tyagi retweeted
This is precisely why I'm excited about sentient.xyz/arena. The goal is to crowdsource as many different solutions as possible for the hardest AI reasoning challenges. The solution space is so vast nowadays that we have to pursue large volume and evolutionary algorithms to help us explore in parallel.
The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them. Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms. Git(Hub) is *almost* but not really suited for this. It has a softly built in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later. I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run: github.com/karpathy/autorese… Alternatively, a PR has the benefit of exact commits: github.com/karpathy/autorese… but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits. But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back. I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.
Himanshu Tyagi retweeted
Applications are now live! Cohort 0 starts March 13th in Presidio with OpenHands, OpenRouter, alphaXiv, Fireworks, Dedalus Labs, Franklin Templeton, Founders Fund and Pantera.
→ $25K in prizes
→ 3 weeks building state-of-the-art AI agents
→ Many more surprises
Apply below 👇
Himanshu Tyagi retweeted
Today we are launching the next phase of AI reasoning development with Founders Fund, Franklin Templeton, Pantera Capital, Fireworks AI, OpenRouter, OpenHands, Dedalus Labs, alphaXiv, and more. AI is advancing at a relentless pace, but there are many reasoning capabilities we have yet to discover. Announcing Arena—an evaluation-driven platform for ideation, prototyping, and high-quality data generation—with top AI developers advancing SOTA performance on real-world enterprise reasoning tasks.
Himanshu Tyagi retweeted
31 Dec 2025
A quick and nostalgic look at our work in 2025. See you all in 2026: the year of open-source reasoning.
12 Dec 2025
There is more where this came from @iiscbangalore @artparkindia
The first-ever deeptech demo night at SPC Bangalore was stacked with some seriously cool builds! Here's a glimpse of how people are solving hard problems in hard-tech, from India. 🧵
Himanshu Tyagi retweeted
11 Dec 2025
Building a general-purpose AI agent with only open-source models is hard. Making it consistent, reliable, and fast enough for production usage is even harder. We at @SentientAGI have been optimizing both 👇
Today we’re revealing SERA (Semantic Embeddings & Reasoning Agent): the AI architecture behind SERA-Crypto, our state-of-the-art agent for token research, DeFi analysis, and on-chain reasoning, combining 50 APIs into market insights.
👉 #1 open-source agent on DMind, ahead of Perplexity Finance & Gemini, within ~2% of GPT-5 Medium on Web3 reasoning
👉 #1 on our live crypto benchmark (198 real user queries across 11 categories), beating GPT-5, Grok 4, Gemini 2.5 Pro, and Perplexity Finance
More in 🧵
11 Dec 2025
Announcing SERA-Crypto (Semantic Embedding & Reasoning Agent): our new reasoning architecture built for SOTA crypto research. #1 open-source agent on DMind #1 on our live crypto benchmark Outperforms GPT-5, Grok 4, Gemini 2.5 Pro, and Perplexity Finance…all under 45 seconds.
11 Dec 2025
When you want fast reasoning, good old semantic similarity is not bad. Use it to set up your prompts dynamically, all the way to the right tool call. This is what we use for our live crypto knowledge agent, which integrates search and about 10 different structured data APIs.
11 Dec 2025
Announcing SERA-Crypto (Semantic Embedding & Reasoning Agent): our new reasoning architecture built for SOTA crypto research. #1 open-source agent on DMind #1 on our live crypto benchmark Outperforms GPT-5, Grok 4, Gemini 2.5 Pro, and Perplexity Finance…all under 45 seconds.
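The similarity-based routing described in the post above can be illustrated with a minimal sketch. This is not SERA's implementation: the tool names, their descriptions, and the bag-of-words stand-in for an embedding model are all assumptions made for illustration. The idea shown is only that a query is compared against short tool descriptions and the closest match decides which tool the prompt points to.

```python
import math
import re
from collections import Counter

# Hypothetical tool catalog: names and descriptions are illustrative only.
TOOLS = {
    "token_price": "current price market cap volume of a token or coin",
    "defi_protocol": "defi protocol tvl yield liquidity pool lending",
    "web_search": "latest news announcements general web search",
}


def embed(text):
    """Toy bag-of-words vector; a real system would use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query):
    """Return the name of the tool whose description is most similar to the query."""
    q = embed(query)
    return max(TOOLS, key=lambda name: cosine(q, embed(TOOLS[name])))


def build_prompt(query):
    """Assemble a prompt that points the model at the selected tool."""
    tool = route(query)
    return f"Use the `{tool}` tool ({TOOLS[tool]}) to answer: {query}"


print(build_prompt("What is the price of this token?"))
```

Because routing here is a few vector comparisons rather than a model call, it adds very little latency, which is presumably the appeal when response time matters.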
10 Nov 2025
If diffusion models drive all creative arts, we will learn that humans are not more creative than a kettle dissipating heat to boil water. A bit sad...
22 Oct 2025
ROMA is a very simple and versatile architecture that recursively breaks complex queries into simpler ones. This method of coordinating multiple agents/tools/models is apt for deep research, long horizon tasks and boosting the power of models. This is emerging as an important primitive for multiagent reasoning systems across industries. This new version of the repo is more builder friendly and comes with prompt optimizer capabilities of DSPy. You can build a lot of stuff on it!
[1/8] 🧵 🚀 ROMA (Recursive Open Meta Agents) v0.2.0 is here! Many exciting features have been added to streamline research/production threads: for better reliability and a builder-friendly ecosystem for high-performance recursive multi-agent systems. Stay tuned for the upcoming paper with some exciting results! We've completely rebuilt our framework using @DSPyOSS. In this thread: the motivation and technical details behind ROMA, exciting research directions we're exploring, and our vision for recursive agents going forward github.com/sentient-agi/ROMA
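The recursive decomposition idea in the post above can be sketched in a few lines. This is a generic illustration, not ROMA's API: the function names `solve`, `plan`, `execute`, and `aggregate`, the depth limit, and the toy string-splitting helpers are all assumptions. In a real system each helper would call a model or a tool.

```python
from typing import Callable, List


def solve(task: str,
          is_atomic: Callable[[str], bool],
          plan: Callable[[str], List[str]],
          execute: Callable[[str], str],
          aggregate: Callable[[str, List[str]], str],
          max_depth: int = 3) -> str:
    """Recursively break a task into subtasks, solve each, and merge the results."""
    # Stop recursing when the task is simple enough or the depth budget is spent.
    if max_depth == 0 or is_atomic(task):
        return execute(task)
    subtasks = plan(task)
    results = [
        solve(sub, is_atomic, plan, execute, aggregate, max_depth - 1)
        for sub in subtasks
    ]
    return aggregate(task, results)


# Toy stand-ins (hypothetical): a task is "complex" if it contains " and ".
def toy_is_atomic(task: str) -> bool:
    return " and " not in task


def toy_plan(task: str) -> List[str]:
    return [part.strip() for part in task.split(" and ")]


def toy_execute(task: str) -> str:
    return f"answer({task})"


def toy_aggregate(task: str, results: List[str]) -> str:
    return "; ".join(results)


print(solve("find revenue and find headcount",
            toy_is_atomic, toy_plan, toy_execute, toy_aggregate))
```

The depth limit is one simple way to keep the recursion bounded; other stopping rules are possible.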
Himanshu Tyagi retweeted
14 Oct 2025
We’re excited to announce that @NeurIPSConf, the biggest AI conference in the world, has accepted 4 of our papers across various categories. Some might even call it “full-stack excellence” 😁
Here’s a sneak peek at our work that’s been recognized for its breakthroughs:
➡️ OML 1.0 (Main Track): scalable LLM fingerprinting: a hundredfold improvement on legacy fingerprinting attempts for open models, injecting 24,576 persistent prints while the previous max was ~100 fingerprints…without any drop in model performance.
➡️ LiveCodeBenchPro (Data & Benchmark Track): our customized benchmark focusing on programming ability, illustrating the true capabilities of models’ coding performance. On this benchmark, we were able to create models 10x smaller, using 20% of the data, to achieve comparable results to competing models.
➡️ MindGames Arena (Competition Track): selected by NeurIPS to run an AI competition for agents to improve themselves through social games. The next paradigm of AI improvement comes through self-optimization, and we’re extremely excited to be hosting this first-of-its-kind competition to create self-improving AI.
➡️ OML (Workshops & Tutorials: Lock-LLMs): our work established the challenge and solution around model security: a primitive that lets builders develop open models with verifiable, cryptographically enforced control under white-box access.
Stay tuned for deep-dive threads throughout the week!
10 Oct 2025
This is not what I meant by dog fooding
9 Oct 2025
Replying to @SentientAGI
@SentientAGI needs to have a hot dog eating competition, 5 hot dogs is weak, I'll outeat @sandeepnailwal with 20 😤
I have been dabbling with using AI for maths too. There is this COLT 2020 paper where we tried to find the best way to quantize Gaussian observations for mean estimation. We couldn't establish exact optimality because we had some extra dependencies coming in our bounds. I had (wishfully) conjectured a neat bound for Gaussian moments, but unfortunately could never show it. We had to make do with an uglier bound: see Lemma 26 in the appendix proceedings.mlr.press/v125/a… I have been playing around with my AI Lean setup to improve such bounds from my past research. ChatGPT tried to improve this bound and, surprisingly, claimed that the same optimistic bound I was hoping for was true. After I pushed back, it found a mistake in its analysis. Finally, it did provide a cleaner proof than what we had. Moreover, it provided examples to show that this bound was, in general, tight! See here (: chatgpt.com/share/68df6473-b… There is a whole new learning curve of using AI for maths. Models are very assertive and confident in their wrong claims, but by indicating mistakes and contradictions, you can get them to fix themselves. Crazy! I have been building a stack for myself for this using @SentientAGI ROMA too. @SebastienBubeck Any new tools coming for maths from OpenAI?

Well, this time it's by Terence Tao himself: mathstodon.xyz/@tao/11530642…
27 Sep 2025
Shared economy of MCP servers
26 Sep 2025
The biggest application of MCP servers: building them
16 Sep 2025
Nobody in this company knocks on the door
16 Sep 2025
Our newest piece of the GRID just got an upgrade 😁 GRID is the world’s largest network of intelligence, containing agents, models, data sources, frameworks, and Sentient Chat—the infrastructure that stitches it all together.
Himanshu Tyagi retweeted
15 Sep 2025
T-24 hours Lock in for tomorrow
15 Sep 2025
I got in trouble, but I think it was worth it.
Himanshu Tyagi retweeted
🚨 Sentient AI’s ROMA is taking over GitHub: it was trending as the #1 Global Repository on GitHub over the last 2-3 days! This happens very rarely for a crypto x AI project, if at all. ROMA helps developers and enterprises get better inferences, especially on long-horizon tasks. @SentientAGI is already seeing some strong enterprise demand for it in such early days. Read 🧵 👇
Himanshu Tyagi retweeted
12 Sep 2025
#2 trending GitHub repo, ROMA is cooking. In one month, I think we're going to get some insane capabilities. Start building today and show me what you've made with it, gonna make a website highlighting the best use-cases we've seen.
4 Sep 2025
Announcing ROMA (Recursive Open Meta Agent): our new multi-agent framework that sets SOTA in reasoning search.
Seal-0: 45.6%
FRAMES: 81.7%
SimpleQA: 93.9%
🧵 Read more about how recursive coordination lets agents tackle complex queries.