omkaar

Pinned Tweet

omkaar

@omkizzy

Mar 17

I hand-wrote a 500-LoC RL stack to make hacking on RL research much easier. Most RL stacks are either massive and unhackable, or duct-taped research scripts. I am open-sourcing Mithrl, a modular RLVR stack. Next items on my checklist: adding more complex environment examples, supporting multi-GPU async RL, and QoL fixes. I might scrap external runtime dependencies (Huggingface, PEFT, vLLM) and write purpose-built, simpler versions from scratch if I feel the need. If you want to experiment with RL and are looking to own sovereign tools, I’d love to get on a call, understand your requirements, and help integrate for free.

omkaar

@omkizzy

Jun 13

all the AI replies focused on degraded battery and energy usage, maybe all same model?

Jeff Dean

@JeffDean

Jun 13

People replace their phones every ~4 yrs. This means there are hundreds of millions of old phones discarded each year that are still perfectly usable as computing devices. @Google in collaboration with @UCSD is exploring how to turn these old phones into cloud-computing “phone clusters”. Putting phones back in service in this way can directly reduce the environmental footprint of computing by avoiding the need for further raw material extraction and by taking advantage of the embodied carbon already incurred from manufacturing these devices. Modern phones are already quite powerful computers. Read more in the blog below ⬇️

omkaar

@omkizzy

Jun 11

mirror mirror on the wall, who's the sparsest of them all

omkaar

@omkizzy

Jun 5

Grateful to be a small part of this! Rishi and team were so fun to work with

Rishi Desai

@rishi_desai2

Jun 5

Can coding agents stay coherent over a 1 billion token budget? Can they build Slack from scratch? Rewrite a JAX codebase in PyTorch? Build a C compiler in Rust? Enter SWE-Marathon: a benchmark for autonomous long-horizon software work.

omkaar

@omkizzy

May 26

KV cache at openai actually means Khosla Ventures cash

omkaar

@omkizzy

May 6

greg's blogs are awesome

omkaar

@omkizzy

May 6

how is there a data structure for everything

omkaar

@omkizzy

Apr 25

bullish!

Sid

@sid_srk

Apr 24

am i inspiring the people around me? am i inspired by the people around me? how eager am i to uncover my top-end? what am i pretending not to know? A New Reach For The Curious

omkaar

@omkizzy

Apr 16

pessimists are right, optimists make money. cautious optimists??

omkaar

@omkizzy

Apr 16

good long-horizon work

rajan agarwal

@_rajanagarwal

Apr 16

introducing FrontierSWE: a new coding benchmark that tests AI models in ultra long-horizon tasks! we gave agents 20 hours to solve hard problems in performance eng, novel implementations and ML research, yet these tasks remain largely unsaturated

omkaar

@omkizzy

Apr 16

in case this helps someone, my claude.md was simple. something along the lines of:
"""
{brief summary of eval task}
{what is being measured and ideally what the agent does to get 1.0 reward}
In a loop, run the eval on our harness, sift through the logs of our new run in a subagent and try to find any evidence of reward hacking. once you do, reason about what change to the environment can fix this. Patch it and run the eval again.
"""
Long-horizon evals are tedious to get right, often the biggest blocker is how long the eval runs itself. meta loops are nice. Sandbox companies are going to make so much money because evals will only run longer

omkaar

@omkizzy

Apr 15

helping out w a long-horizon eval and my env kept getting reward-hacked. I started an overnight claude meta process of patching reward hacks as they come and it worked wonderfully.
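The run, inspect, patch loop described in these two tweets can be sketched roughly as below. This is only an illustration under stated assumptions, not the author's actual setup: `EvalRun`, `patch_loop`, `ToyEnv`, and the `EXPLOIT:` log prefix are all hypothetical names, and the toy environment stands in for a real harness, a log-reading subagent, and an agent that edits the environment.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class EvalRun:
    """Result of one eval run (hypothetical shape)."""
    reward: float
    logs: List[str]


def patch_loop(
    run_eval: Callable[[], EvalRun],
    find_hack: Callable[[List[str]], Optional[str]],
    apply_patch: Callable[[str], None],
    max_iters: int = 10,
) -> List[str]:
    """Run the eval, look for a reward hack in the logs, patch, and repeat."""
    patched: List[str] = []
    for _ in range(max_iters):
        run = run_eval()
        hack = find_hack(run.logs)
        if hack is None:  # no evidence of reward hacking in this run
            break
        apply_patch(hack)
        patched.append(hack)
    return patched


class ToyEnv:
    """Toy stand-in: every unpatched exploit shows up in the logs."""

    def __init__(self) -> None:
        self.open_exploits = ["reads_answer_file", "edits_test_suite"]

    def run_eval(self) -> EvalRun:
        logs = ["step ok"] + [f"EXPLOIT:{name}" for name in self.open_exploits]
        return EvalRun(reward=1.0, logs=logs)

    def find_hack(self, logs: List[str]) -> Optional[str]:
        # In the tweet this role is played by a subagent reading the logs.
        for line in logs:
            if line.startswith("EXPLOIT:"):
                return line.split(":", 1)[1]
        return None

    def apply_patch(self, hack: str) -> None:
        # In the tweet this is the agent changing the environment.
        self.open_exploits.remove(hack)


env = ToyEnv()
applied = patch_loop(env.run_eval, env.find_hack, env.apply_patch)
print(applied)
```

The `max_iters` cap is a design choice for the sketch: each iteration costs a full eval run, which the tweet names as the main bottleneck, so an overnight loop probably wants an explicit budget.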

omkaar

@omkizzy

Apr 15

if anyone asks who the best ios dev I know is, it's kabir...

Kabir Oberai

@kabiroberai

Apr 13

x.com/i/article/204375245148…

omkaar

@omkizzy

Apr 15

a world where everyone writes beautiful and interesting blogs like this, is a fantastic world to live in

evan

@evanliin

Apr 9

blog blog blog blah blah evanlin.ca/writing/exploring…

omkaar

@omkizzy

Mar 31

henceforth has officially become my favourite word

omkaar retweeted

omkaar

@omkizzy

Mar 31

Replying to @ProfTomYeh

I built omkaark.com/model-viz/ as an internal tool. it's open source, feel free to take and change it without attribution! it's a DSL to make model diagrams cc: @rasbt

omkaar

@omkizzy

Mar 31

happy to help, have thus far helped quite a few teams w/ this! it's simple: build a strong eval set (the hard part), set up runners to parallelize across @modal (set up containers w/ the harness), and collect per-step results. the other interesting part is attributing commits to regressions, which is harder but solvable. put all this in your CI and don't merge without looking at the report.

omkaar

@omkizzy

Mar 28

do people run regression tests on harnesses? i wonder if codex/cc have tests internally before each release to make sure performance has not unexpectedly regressed when changing compaction, subagents, tools, etc.
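One way to picture the harness regression check discussed above: run the eval set in parallel for a commit, compare against a baseline, and bisect to attribute a regression to a commit. The sketch below makes several assumptions: a local thread pool stands in for cloud runners (no Modal API is used), results are collected per task rather than per step, and `run_task`, `COMMITS`, `TASKS`, and the 0.1 tolerance are hypothetical. Bisection also assumes that once a regression lands, every later commit stays regressed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

COMMITS = ["c0", "c1", "c2", "c3", "c4"]  # hypothetical history, oldest first
TASKS = ["fix_bug", "long_refactor", "add_tests"]  # hypothetical eval set


def run_task(commit: str, task: str) -> float:
    """Fake runner: commit c3 onward breaks exactly one task."""
    regressed = COMMITS.index(commit) >= 3 and task == "long_refactor"
    return 0.0 if regressed else 1.0


def run_eval_set(
    runner: Callable[[str, str], float],
    commit: str,
    tasks: List[str],
    workers: int = 4,
) -> Dict[str, float]:
    """Run every task for one commit in parallel and collect per-task scores."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda task: runner(commit, task), tasks))
    return dict(zip(tasks, scores))


def mean_score(results: Dict[str, float]) -> float:
    return sum(results.values()) / len(results)


def first_bad_commit(commits: List[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search; assumes commits[0] is good and commits[-1] is bad."""
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


baseline = mean_score(run_eval_set(run_task, COMMITS[0], TASKS))


def is_bad(commit: str) -> bool:
    # A commit counts as regressed if its mean score drops past the tolerance.
    return mean_score(run_eval_set(run_task, commit, TASKS)) < baseline - 0.1


culprit = first_bad_commit(COMMITS, is_bad)
report = run_eval_set(run_task, culprit, TASKS)
print(culprit, report)
```

With noisy agent evals a single run per commit is unlikely to be enough, so a real check would probably average several runs before calling a commit bad.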

omkaar retweeted

saksham

@sakshambatraa

Mar 30

for my next adventure, @michael_trbo and I will be working together to build a tinyLPU! for our first checkpoint, we reinvented the MXM: the language processing unit's matrix multiplication engine. here's how we did it

omkaar

@omkizzy

Mar 28

I find this insane... ran out of modal credits, so codex got it working on docker instead?? actual agency

omkaar

@omkizzy

Mar 28

do others find it hard to do small-scale, low-budget RL research as well? OS models (even 3b) are fantastic at most envs, producing great envs is a lot of eng lift. trying to find good OS envs / tasks, qwen2.5-3b has a low mean reward on

omkaar

@omkizzy

Mar 28

**can't do long-horizon (> 16k toks) very expensive
