10X Builder | AI Performance Engineer | Co-founded @mybridgecard(YC S22)

Joined July 2018
Festus retweeted
A single OCR error almost killed a $1.7M energy infrastructure build for a Fortune 500 company powering AI data centers. The code said T12C3. OCR read 712C3. False alerts cascaded. Pipeline nearly stopped. This is why we built AwaitVerify. (1/5)
TLDR: I built @usetagAI, a local-first intention inbox for the things you save because they might matter later. Built mostly with @GoogleDeepMind Gemma 4 models, @cactuscompute, and @OpenAI Codex. More details below. 🧵 x.com/usetagAI/status/205646…

May 18
Introducing Tag. A local-first intention inbox for the things you save because they might matter later. Saved screenshot to detected intention to source-backed card to action. No login. No cloud sync. Open source: github.com/Ayobami-00/tag Beta waitlist: forms.gle/1mPQuYme2nKVytps9
A huge shoutout to @OpenAI Codex. @usetagAI was built with Codex: AI-assisted code generation, plus AI-assisted image-gen iteration on the final logo. The part that surprised me most was the demo. I needed something quick, so I thought, why not try Codex? All I had to do was tell Codex to use my local simulator to drive the app and take screenshots, and to use the @hume_ai API for the voice-over. That led to the beautiful result here: youtu.be/hwiTligAAxU 🧵
I’d love to know what you think. Do you also save things and forget why? If you want to try @usetagAI before the beta, it’s open source! Head over to github.com/Ayobami-00/tag and build locally. Contributions are welcome! For the beta waitlist, join up here: forms.gle/1mPQuYme2nKVytps9 Looking forward to how @usetagAI helps you never lose the reason you saved something.
I work with GPUs from time to time, whether it’s benchmarking with @vllm_project, @sglang, or @nvidia Dynamo, or doing kernel optimization work for open-source models. Again and again, I find myself asking the same question as this tweet: x.com/snwy_me/status/2037070… So I built Capacitor: github.com/Ayobami-00/capaci…, an open-source Rust CLI for watching scarce GPU capacity across cloud GPU providers.

Mar 26
where the fuck are all of the GPUs going?!? i need literally one 8xH100 node and i cannot for the life of me get one ANYWHERE
It currently supports Vast.ai, Lambda Cloud, and Runpod, with cross-provider search from one terminal command:

cap watch --providers vast,lambda,runpod --gpu H100 --max-price 9 --once

I also created @gpucapacitor, which will post GPU availability signals, rare capacity sightings, price notes, and general GPU market observations.
The goal for Capacitor is simple: find capacity, compare providers, track patterns, and eventually reserve GPUs and run workloads. If you use GPUs, I’d love feedback. What would make this genuinely useful in your workflow?
Whaoooo! Love it!
Did a very different format with @reinerpope – a blackboard lecture where he walks through how frontier LLMs are trained and served. It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk. It’s a bit technical, but I encourage you to hang in there - it’s really worth it. There are less than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner. It was a real delight to learn from him. Recommend watching this one on YouTube so you can see the chalkboard.
0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, “As we now know, pipelining is not wise.”
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography
You’re on @huggingface, you find an open-source model, and you hit the "Files and versions" tab. Instead of one clean app file, you see a list of JSONs and something called safetensors. Most people just copy the AutoModel.from_pretrained snippet and move on, but what are these files actually doing? 🧵 We're going to use the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model as a motivating example. You can find the link to the Hugging Face repo below: huggingface.co/TinyLlama/Tin…
End to end, this is how inference engines use the files. When you run the from_pretrained snippet, the engine reads config.json to build the model's skeleton, then loads the safetensors weights into it and, if you're running on a GPU, moves them into your VRAM. It also loads the tokenizer to translate your words into token IDs, and follows generation_config.json to decide how to sample and when to stop talking.
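To make that flow concrete, here is a minimal sketch using only the Python standard library. It is not how transformers or any inference engine is actually implemented. It assumes the published safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes) and builds a toy model folder with made-up values. The helper names (write_safetensors, read_header, read_f32_tensor, build_toy_model_dir, describe_model_dir) are hypothetical, and the tokenizer step is left out for brevity.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_safetensors(path, tensors):
    """Write {name: (shape, floats)} as F32 tensors in the safetensors layout."""
    header, blobs, offset = {}, [], 0
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": shape,
            # Offsets are relative to the start of the data section.
            "data_offsets": [offset, offset + len(raw)],
        }
        blobs.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
        f.write(header_bytes)                          # JSON header
        f.write(b"".join(blobs))                       # raw tensor bytes


def read_header(path):
    """Return (header_length, parsed JSON header) without touching tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return header_len, json.loads(f.read(header_len))


def read_f32_tensor(path, name):
    """Read one F32 tensor's values by seeking to its byte range."""
    header_len, header = read_header(path)
    start, end = header[name]["data_offsets"]
    with open(path, "rb") as f:
        f.seek(8 + header_len + start)
        raw = f.read(end - start)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def build_toy_model_dir(model_dir):
    """Create a tiny stand-in for a Hub repo (all values are illustrative)."""
    model_dir = Path(model_dir)
    (model_dir / "config.json").write_text(
        json.dumps({"hidden_size": 4, "num_hidden_layers": 1, "vocab_size": 2}))
    (model_dir / "generation_config.json").write_text(
        json.dumps({"eos_token_id": 1, "max_length": 16}))
    write_safetensors(model_dir / "model.safetensors", {
        "model.embed_tokens.weight": ([2, 4], [0.5] * 8),
        "model.norm.weight": ([4], [1.0] * 4),
    })


def describe_model_dir(model_dir):
    """Read the files in the order described above: config, weights, generation."""
    model_dir = Path(model_dir)
    config = json.loads((model_dir / "config.json").read_text())
    _, header = read_header(model_dir / "model.safetensors")
    generation = json.loads((model_dir / "generation_config.json").read_text())
    shapes = {k: v["shape"] for k, v in header.items() if k != "__metadata__"}
    return {"config": config, "tensor_shapes": shapes, "generation": generation}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        build_toy_model_dir(d)
        print(describe_model_dir(d))
        print(read_f32_tensor(Path(d) / "model.safetensors", "model.norm.weight"))
```

The point of the sketch is that the JSON files are small and human-readable, while the safetensors header lets a loader find each tensor's shape and byte range before it reads any weights.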
While we used TinyLlama as the example here, this "package" structure is the standard for almost every model on the Hugging Face Hub, from 1B parameters to 70B. Next time you see those "boring" JSONs, remember they’re the glue holding the AI together.