exploring AI & Robotics in San Francisco

Joined May 2023
- retweeted
Introducing Mave. Redefining mental health for today’s Homo sapiens. Order now at mavehealth.com
- retweeted
"One of the very confusing things about the models right now: how to reconcile the fact that they are doing so well on evals. And you look at the evals and you go, 'Those are pretty hard evals.' But the economic impact seems to be dramatically behind. There is [a possible] explanation. Back when people were doing pre-training, the question of what data to train on was answered, because that answer was everything. So you don't have to think if it's going to be this data or that data. When people do RL training, they say, 'Okay, we want to have this kind of RL training for this thing and that kind of RL training for that thing.' You say, 'Hey, I would love our model to do really well when we release it. I want the evals to look great. What would be RL training that could help on this task?' If you combine this with generalization of the models actually being inadequate, that has the potential to explain a lot of what we are seeing, this disconnect between eval performance and actual real-world performance"
The @ilyasut episode
0:00:00 – Explaining model jaggedness
0:09:39 – Emotions and value functions
0:18:49 – What are we scaling?
0:25:13 – Why humans generalize better than models
0:35:45 – Straight-shotting superintelligence
0:46:47 – SSI’s model will learn from deployment
0:55:07 – Alignment
1:18:13 – “We are squarely an age of research company”
1:29:23 – Self-play and multi-agent
1:32:42 – Research taste
Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify. Enjoy!
21 Nov 2025
#ChatGPT vs #Claude vs #Perplexity battle: can AI build complex financial models? I love testing all 3 tools on complex real tasks. just heard @OpenAI hired bankers to teach #ChatGPT fin modeling. I had a task that would normally take me 6 hrs, so I gave it to AI. Here's how it went:
21 Nov 2025
Happy to share the prompt - ping me if you want to try it yourself
21 Nov 2025
Replying to @OpenAI
Here is the Calculator Claude built docs.google.com/spreadsheets…
21 Nov 2025
Replying to @OpenAI
#result: 6 hours manual → 1 hour with Claude #takeaway: this isn't the first time I've seen Claude dominate on complex multi-step tasks. for financial/ops modeling specifically, it's not even close. ChatGPT couldn't generate working formulas. Perplexity won't let you export
21 Nov 2025
Replying to @OpenAI
#Claude: Claude amazes me in tasks like this one - it delivered a fully functional XLS file with 110 interconnected formulas following fin modeling best practices. proper SUMIF matching, conditional brand logic, correct data flow from raw inputs to client invoice. 90% prod ready
21 Nov 2025
Replying to @OpenAI
Perplexity: generated basic structure but only viewable in-app. I asked multiple times how to download it. response: "You cannot download the calculator... the download button is not available in the chat interface." cool, so I can see it but can't use it
21 Nov 2025
@perplexity_ai maybe I am missing something - can you pls guide me on how to download the Excel file the app generated? @AravSrinivas I know you are constantly looking at user feedback, so flagging this as it might be a bug - happy to share the prompt
21 Nov 2025
Replying to @OpenAI
#ChatGPT: produced an Excel file with hardcoded numbers. no formulas. just static data. completely unusable. would have to rebuild from scratch
21 Nov 2025
Replying to @OpenAI
I designed the prompt with explicit architecture, prepared raw data files (PowerBI exports, storage data, MSA pricing), and fed the EXACT same input to all 3 systems
21 Nov 2025
Replying to @OpenAI
backstory: in my undergrad I interned at Goldman and spent 18 hours daily building models. so when I needed to build a 3PL billing calculator, I knew exactly what it needed: 8 interconnected sheets with 110 SUMIF formulas, data aggregation, conditional logic
30 Oct 2025
9pm Wednesday at the @PipelexAI evening hackathon. I was building an invoice OCR tool. Among the demos: a Zillow listing scam detector, a calorie counter, and a free food events finder in #SanFrancisco
23 Oct 2025
#vibecoding with this new app called @TrySolidCom, which is similar to @lovable. For deployment I tried @zeaburapp - lucky me, the founders of Zeabur happen to sit right in front of me 😎 @bchen0809
15 Oct 2025
#n8n discount guys -30% in the middle of the video after my speech about not eating carbs 🤣
9 Oct 2025
Hackathon this past weekend @BurningHeroesFA
6 Sep 2025
#vibecoding on Saturday morning
26 Jun 2025
#meta #ai conference
25 Jun 2025
What is #Apache Iceberg? @Meta Scale conference
21 Oct 2024
“Men turned their thinking over to machines in the hope that this would set them free.” This book is more relevant now than ever #dune #ai #agi