GPU dev tools at Apple. Ex-Google (gRPC, Node.js, Chromium), ex-NVIDIA (CUDA Toolkit).

Joined July 2008
731 Photos and videos
Pinned Tweet
I am building a machine learning framework in C++. So far I have put almost 2 years of my leisure time into it and hope to be able to show it off around the end of this year. Uchen ([oochen]) is Ukrainian for "student". My goal is to help people build smarter software and have some fun.
5
2
66
26,195
Russia hit an Orthodox Church that was established in 1051, well before Moscow was founded even if we believe Russian falsified history. This is pure envy. Russia wants to forget that their Church originated from Kyiv. They are embarrassed to admit that their language, faith and culture all were brought from Ukraine.
🚨 BREAKING: Russian strike hits the historic Kyiv Pechersk Lavra, a world-famous Orthodox monastery and UNESCO site, igniting fire on the roof of the Dormition Cathedral.
1
4
151
Add to my CLAUDE.md: - **Trust nothing.** The whole stack (uchen-core, training binaries, rollup math, eval kernels, WASM inference) is home-grown. A clean `decision.json`, a tight CI, a converging loss curve are hypotheses until cross-checked against bit-identicality probes, sha256 pins, expected wall-time ranges, and an independent path. Surprising numbers (good or bad) are first a bug suspicion. Single-line bzl / shuffle / seed bugs silently destroy eval signal while loss curves look fine; that has happened before and the §0.1 / §0.2 rules exist because of it.
1
60
Finally finished Invincible. Started reading it around 20 years ago.
1
55
Eugene Ostroukhov retweeted
Literally the worst cable management I've ever seen in my life
124
473
5,999
147,926
Model is still pretty stupid (basically have to do a brute-force walk). The MCTS counter is the number of model inferences. As the model becomes smarter I will be doing fewer inferences, which will make its turns much quicker. Also, this is a training model that does some unnecessary computations. The shipped model will have fewer layers.
79
When I was starting UchenML I wanted to do fun stuff - C++ templates, multi-process, memory management. And not all this "Gumbel MCTS, multi-head loss, SGD-momentum, loss.h."
9
555
I use Claude Code all the time. I do not see a difference in outputs between different models or effort levels. I choose Opus just in case, but a few times I had sessions that defaulted to Sonnet. I only knew it because I saw it somewhere in the UI.
1
4
284
Asked ChatGPT what my model definition would look like in PyTorch and in a hypothetical Rust version of UchenML. C++ wins! My goal was explicitly not to have to include the same information twice. I know what the previous layer outputs - why do I need to specify it again?
5
329
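A minimal sketch of the "do not specify it twice" idea, assuming C++17. This is not UchenML's actual API: `Input`, `Linear`, and the `k...` constants are hypothetical names made up for illustration. Each layer spells out only its output size and reads its input size from the previous layer's type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical input description: a fixed-size vector of N floats.
template <std::size_t N>
struct Input {
  static constexpr std::size_t kOutputs = N;
  static constexpr std::size_t kTotalParameters = 0;
};

// Hypothetical dense layer. Only the output size is written by the user;
// the input size is taken from the previous layer's type.
template <typename Prev, std::size_t Outputs>
struct Linear {
  static constexpr std::size_t kInputs = Prev::kOutputs;
  static constexpr std::size_t kOutputs = Outputs;
  // Weights plus biases for this layer alone.
  static constexpr std::size_t kParameters = kInputs * kOutputs + kOutputs;
  // Parameters of this layer and every layer before it.
  static constexpr std::size_t kTotalParameters =
      Prev::kTotalParameters + kParameters;
};

// 784 appears once, 128 appears once, 10 appears once.
using Model = Linear<Linear<Input<784>, 128>, 10>;

// The last layer's input size was never typed in; it was derived.
static_assert(Model::kInputs == 128);
static_assert(Model::kOutputs == 10);
```

Changing `128` to another width updates the next layer's input size at compile time, so the two can never drift apart.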
My Gmail inbox now shows a large Upgrade button. Sorry Google but I am already paying for 2 AI subscriptions.
1
163
Write maintainable code.
Jun 12
Name a thing you can do better than Claude
5
154
If you need this, you are probably doing something wrong. Also, see std::visit
You have a variant<Dog, Cat, Raccoon>. You also have an enum { Dog, Cat, Raccoon } that you manually keep in sync with it. One day someone adds Fox to the variant but forgets the enum. The switch compiles and the bug ships. What if the enum wrote itself? 🧵👇
3
347
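A minimal sketch of the std::visit route mentioned in the reply, assuming C++17. The `Dog`, `Cat`, and `Raccoon` types come from the quoted example; `Animal`, `name_of`, and `always_false_v` are hypothetical names used only here. No parallel enum is needed, and an unhandled alternative fails the build.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Types named after the example in the quoted tweet.
struct Dog {};
struct Cat {};
struct Raccoon {};

using Animal = std::variant<Dog, Cat, Raccoon>;

// False for every T, so the static_assert below fires only when the final
// else-branch is actually instantiated for some alternative.
template <typename T>
inline constexpr bool always_false_v = false;

// Returns a name for whichever alternative the variant currently holds.
std::string name_of(const Animal& animal) {
  return std::visit(
      [](const auto& value) -> std::string {
        using T = std::decay_t<decltype(value)>;
        if constexpr (std::is_same_v<T, Dog>) {
          return "dog";
        } else if constexpr (std::is_same_v<T, Cat>) {
          return "cat";
        } else if constexpr (std::is_same_v<T, Raccoon>) {
          return "raccoon";
        } else {
          // Adding Fox to Animal without a branch above lands here and
          // stops compilation instead of shipping the bug.
          static_assert(always_false_v<T>, "unhandled alternative");
        }
      },
      animal);
}
```

The variant itself is the single source of truth: the visitor is instantiated once per alternative, so there is nothing to keep in sync by hand.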
No need for the debugger.
2
150
One should really review the code LLMs write. They make really subtle bugs. Also MaxLanes is not the "worse case scenario".
3
383
Eugene Ostroukhov retweeted
Regardless of what I say about AI, somebody always considers my post overly negative or overly positive. It's a polarizing topic for sure. If you are in the middle, everybody hates you on both sides. I have a pragmatic view about AI. It's a good tool, but it needs a lot of micro-management to be viable, and even then it often frustrates me. Still I use it because it improves my overall productivity. But the quality is slightly worse, even with constant micro-management.
29
15
394
16,749
I once wrote a project without loops. Map/filter/reduce only. In Python.
3
299
Rendering is usually not slow. What is slow is a huge number of network requests. Many web sites are composed of disjointed pieces (e.g. advertisements) that access data from different servers. E.g. send a request for the list items. Then for each list item fetch a thumbnail. Scrolling may reveal more items.
A modern video game can render billions of triangles per second. Meanwhile, add 1,000 divs to a webpage and suddenly we're in performance-budget territory. I'm sure there's a good technical explanation for this.
3
1
12
1,676
Claude: "Land the gradcheck fix. After ~6 attempts at analytical debugging I could not localize the bug; the math is structurally correct per the statistician, the ReLU mask placement is correct, the scratch is 64-byte aligned, the parameter spans are disjoint. Both reviewers and I agree: it's likely a subtle SIMD interaction the debugger skill or a print-instrumented A/B against a scalar reference can crack quickly, but I couldn't. What I did not do:" Is it for realz? Expecting me to debug its code?
2
293
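For context, a gradient check against a scalar reference can be sketched roughly as below, assuming C++17. This is not the code from the session: the loss function and all names (`loss`, `analytical_gradient`, `max_gradient_error`) are hypothetical. It compares a hand-written gradient with central finite differences of a plain scalar implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference: loss(x) = sum over i of relu(x[i])^2.
double loss(const std::vector<double>& x) {
  double total = 0.0;
  for (double v : x) {
    const double r = v > 0.0 ? v : 0.0;
    total += r * r;
  }
  return total;
}

// Hand-derived gradient of the loss above: 2 * relu(x[i]).
std::vector<double> analytical_gradient(const std::vector<double>& x) {
  std::vector<double> g(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    g[i] = x[i] > 0.0 ? 2.0 * x[i] : 0.0;
  }
  return g;
}

// Largest absolute gap between the analytical gradient and a central
// finite-difference estimate, taken over all coordinates.
double max_gradient_error(std::vector<double> x, double eps = 1e-5) {
  const std::vector<double> g = analytical_gradient(x);
  double worst = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) {
    const double saved = x[i];
    x[i] = saved + eps;
    const double up = loss(x);
    x[i] = saved - eps;
    const double down = loss(x);
    x[i] = saved;  // restore before moving to the next coordinate
    const double numeric = (up - down) / (2.0 * eps);
    worst = std::max(worst, std::fabs(numeric - g[i]));
  }
  return worst;
}
```

Running the same check with the SIMD path in place of `analytical_gradient` is one way to do the A/B the message describes; test points should stay away from the ReLU kink at zero, where finite differences are unreliable.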
Claude started doing this too. I stopped using Codex because of this junk.
1
16
2,340