Festus


Pinned Tweet

Festus

@_enfinity

Apr 16

I recently got a PR merged into @nvidia's Dynamo (A Datacenter Scale Distributed Inference Serving Framework) and the process was a great A/B test of KV data transfer over TCP vs KV data transfer over RDMA. PR link: github.com/ai-dynamo/dynamo/… 🧵

feat(recipes): add Qwen3-32B-FP8 vLLM disaggregated single-node recipe by Ayobami-00 · Pull Request...

Overview: Add a production-ready vLLM disaggregated single-node recipe for Qwen3-32B-FP8. This fills the gap where recipes/qwen3-32b-fp8/ previously only provided TensorRT-LLM configurations and ad...

github.com

Festus retweeted

Tunde | Building AwaitHumans

@llm_money

Jun 10

A single OCR error almost killed a $1.7M energy infrastructure build for a Fortune 500 company powering AI data centers. The code said T12C3. OCR read 712C3. False alerts cascaded. Pipeline nearly stopped. This is why we built AwaitVerify. (1/5)

Festus

@_enfinity

May 18

TLDR: I built @usetagAI, a local-first intention inbox for the things you save because they might matter later. Built mainly with @GoogleDeepMind Gemma 4 models, @cactuscompute, and @OpenAI Codex. More details below. 🧵 x.com/usetagAI/status/205646…

Tag

@usetagAI

May 18

Introducing Tag. A local-first intention inbox for the things you save because they might matter later. Saved screenshot to detected intention to source-backed card to action. No login. No cloud sync. Open source: github.com/Ayobami-00/tag Beta waitlist: forms.gle/1mPQuYme2nKVytps9

Festus

@_enfinity

May 18

A huge shoutout to @OpenAI Codex. @usetagAI was built with Codex: AI-assisted code generation, plus AI-assisted image-gen iteration on the final logo. The part that surprised me most was the demo. I needed something quick, so I thought, why not try Codex? All I had to do was tell Codex to use my local simulator to drive the app and take screenshots, and to use the @hume_ai API for the voice-over. That led to the beautiful result here: youtu.be/hwiTligAAxU 🧵

Tag: Never Lose the Reason You Saved Something

Introducing Tag. Tag is a local-first intention inbox for the thin...

youtube.com

Festus

@_enfinity

May 18

I’d love to know what you think. Do you also save things and forget why? If you want to try @usetagAI before the beta, it’s open source! Head over to github.com/Ayobami-00/tag and build locally. Contributions are welcome! For the beta waitlist, join up here: forms.gle/1mPQuYme2nKVytps9 Looking forward to how @usetagAI helps you never lose the reason you saved something.

GitHub - Ayobami-00/tag: Never lose the reason you saved something. A local-first intention inbox...

Never lose the reason you saved something. A local-first intention inbox for screenshots, links, notes, and saved posts. Source-backed cards. No login. No cloud sync. - Ayobami-00/tag

github.com

Festus

@_enfinity

May 9

I work with GPUs from time to time, whether it’s benchmarking with @vllm_project, @sglang, or @nvidia Dynamo, or doing kernel optimization work for open-source models. Again and again, I find myself asking the same question as this tweet: x.com/snwy_me/status/2037070… So I built Capacitor: github.com/Ayobami-00/capaci…, an open-source Rust CLI for watching scarce GPU capacity across cloud GPU providers.

snwy

@snwy_me

Mar 26

where the fuck are all of the GPUs going?!? i need literally one 8xH100 node and i cannot for the life of me get one ANYWHERE

Festus

@_enfinity

May 9

It currently supports Vast.ai, Lambda Cloud, and Runpod, with cross-provider search from one terminal command:

cap watch --providers vast,lambda,runpod --gpu H100 --max-price 9 --once

I also created @gpucapacitor, which will post GPU availability signals, rare capacity sightings, price notes, and general GPU market observations.

Festus

@_enfinity

May 9

The goal for Capacitor is simple: find capacity, compare providers, track patterns, and eventually reserve GPUs and run workloads. If you use GPUs, I’d love feedback. What would make this genuinely useful in your workflow?

Festus

@_enfinity

Apr 29

Whaoooo! Love it!

Dwarkesh Patel

@dwarkesh_sp

Apr 29

Did a very different format with @reinerpope – a blackboard lecture where he walks through how frontier LLMs are trained and served.

It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk.

It’s a bit technical, but I encourage you to hang in there - it’s really worth it.

There are less than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner. It was a real delight to learn from him.

Recommend watching this one on YouTube so you can see the chalkboard.

0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, “As we now know, pipelining is not wise.”
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography

Festus

@_enfinity

Apr 23

You’re on @huggingface, you find an open source model, and you hit the "Files and versions" tab. Instead of one clean app file, you see a list of JSONs and something called safetensors. Most people just copy the AutoModel.from_pretrained snippet and move on, but what are these files actually doing? 🧵 We're going to use the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model as a motivating example. You can find the link to the Hugging Face repo below: huggingface.co/TinyLlama/Tin…
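For a feel of what "something called safetensors" looks like on disk, here is a minimal stdlib-only sketch. It relies on the published safetensors layout: an 8-byte little-endian header length, a JSON header describing each tensor, then the raw tensor bytes. The helper names and the tiny demo.weight tensor are made up for illustration; this is not how the safetensors library itself is implemented, and a real model file would hold far more entries.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_tiny_safetensors(path: Path) -> None:
    """Write a minimal safetensors file with one 2x2 float32 tensor of zeros.

    The tensor name "demo.weight" is a hypothetical placeholder.
    """
    header = {
        "demo.weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}
    }
    header_bytes = json.dumps(header).encode("utf-8")
    payload = bytes(16)  # 4 float32 values * 4 bytes each, all zero
    with path.open("wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte little-endian length
        f.write(header_bytes)                          # JSON header
        f.write(payload)                               # raw tensor bytes


def read_safetensors_header(path: Path) -> dict:
    """Read only the JSON header; the tensor bytes are never touched."""
    with path.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "model.safetensors"
    write_tiny_safetensors(demo_path)
    parsed = read_safetensors_header(demo_path)

for name, info in parsed.items():
    if name == "__metadata__":  # optional free-form metadata entry in real files
        continue
    print(name, info["dtype"], info["shape"])
```

Pointing read_safetensors_header at a locally downloaded model.safetensors should list every tensor name, dtype, and shape without loading any weights into memory.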

Festus

@_enfinity

Apr 23

End to end, this is how inference engines use the files. When you run the from_pretrained snippet, the engine reads config.json to build the model's skeleton, then loads the safetensors weights into it (and into VRAM, if you're running on a GPU). It also loads the tokenizer to turn your text into token IDs, and follows generation_config.json to decide how to generate and when to stop talking.
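The flow above can be sketched with nothing but the standard library. This is a rough illustration, not what transformers actually does internally: describe_load_order is a hypothetical helper, the field names (num_hidden_layers, hidden_size, tokenizer_class, eos_token_id, max_length) follow common Llama-style repos and may differ for other architectures, and the values written below are small placeholders rather than TinyLlama's real settings.

```python
import json
import tempfile
from pathlib import Path


def describe_load_order(model_dir: Path) -> list:
    """Walk a local model folder in the order described above (hypothetical helper)."""
    steps = []

    # 1. config.json tells the engine what architecture to build.
    config = json.loads((model_dir / "config.json").read_text())
    steps.append(
        f"config.json: build {config['num_hidden_layers']} layers, "
        f"hidden size {config['hidden_size']}"
    )

    # 2. The safetensors file(s) hold the weights that fill that skeleton.
    weight_files = sorted(p.name for p in model_dir.glob("*.safetensors"))
    steps.append(f"safetensors: load weights from {weight_files}")

    # 3. The tokenizer files map text to token IDs and back.
    tok = json.loads((model_dir / "tokenizer_config.json").read_text())
    steps.append(f"tokenizer: use {tok['tokenizer_class']}")

    # 4. generation_config.json sets default generation behaviour.
    gen = json.loads((model_dir / "generation_config.json").read_text())
    steps.append(
        f"generation_config.json: stop at token id {gen['eos_token_id']}, "
        f"max length {gen['max_length']}"
    )
    return steps


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    # Placeholder contents, NOT the real TinyLlama values.
    (model_dir / "config.json").write_text(
        json.dumps({"num_hidden_layers": 2, "hidden_size": 8})
    )
    (model_dir / "model.safetensors").write_bytes(b"")  # stand-in for real weights
    (model_dir / "tokenizer_config.json").write_text(
        json.dumps({"tokenizer_class": "LlamaTokenizer"})
    )
    (model_dir / "generation_config.json").write_text(
        json.dumps({"eos_token_id": 2, "max_length": 64})
    )
    steps = describe_load_order(model_dir)

for step in steps:
    print(step)
```

To try it on a real repo, download the files locally and pass that folder to describe_load_order; expect to adjust the field names for models that don't use Llama-style configs.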

Festus

@_enfinity

Apr 23

While we used TinyLlama as the example here, this "package" structure is the standard layout for almost every model on the Hugging Face Hub, from 1B parameters to 70B. Next time you see those "boring" JSONs, remember they’re the glue holding the AI together.