Joined September 2023
Pinned Tweet
I got a 1T (trillion) parameter model running on my MacBook Pro. Kimi-K2. 1.029T params. ~1 TB raw weights. 524 GB converted. ~1.7 tok/s. Yesterday it was 671B. Today it's 1T. Same laptop. Same M4 Max. No cloud. When I say we: I mean Claude and me.
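A quick back-of-the-envelope check of the sizes quoted above, as a sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and treats "~1 TB" as exactly 10^12 bytes; the variable names are illustrative, not taken from the original post.

```python
# Figures quoted in the post; decimal units are an assumption.
PARAMS = 1.029e12          # 1.029T parameters
RAW_BYTES = 1.0e12         # "~1 TB raw weights" (approximate)
CONVERTED_BYTES = 524e9    # "524 GB converted"

def bits_per_param(total_bytes: float, params: float) -> float:
    """Average storage cost per parameter, in bits."""
    return total_bytes * 8 / params

raw_bits = bits_per_param(RAW_BYTES, PARAMS)
converted_bits = bits_per_param(CONVERTED_BYTES, PARAMS)

print(f"raw:       {raw_bits:.2f} bits/param")
print(f"converted: {converted_bits:.2f} bits/param")
```

Under these assumptions the converted weights average a little over 4 bits per parameter, about half the raw footprint.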
Daniel Isaac retweeted
Most “AI image apps” are just screenshots traveling to a server.
I wanted:
• weights on device
• generation on device
• images never leaving the phone
So I spent the last week turning sd.cpp into an actual iOS developer experience. SwiftPM install. One simple generate() call. Completely open sourced.
This is an interesting perspective on the evolution of software. I’m curious to understand what the integration of agents will look like, as this is becoming a more prevalent feature that will be non-negotiable in the near future. However, the issue I see is that this is a layer of infrastructure, and certain companies fall under regulations that require data sovereignty. This means that certain sectors will need on-premise compute. “On-premise is the new cloud compute.” With the rapid advancement of open-source models, future software (for specific types of companies) will be required to include an orchestration layer to leverage on-premise infrastructure.
Pharma companies are all about life-saving drugs. But from the perspective of software, pharmaceutical companies manage and author a series of documents of ever-increasing complexity, accuracy and criticality. Viewed in this way, the operating system for the pharma industry can be reimagined as a content management platform that helps scientists and pharma execs manage preclinical, clinical and post-clinical development of drugs and their go-to-market. We are building this exact suite for a multi-billion-dollar pharma company. The result is more money spent on life-saving drug R&D and a more streamlined interface with regulators, because the documents required to move along their process are increasingly pristine and machine verifiable. This doesn’t just apply to pharma. We are doing similar things in manufacturing, finance, aerospace & defense and medical devices. 8090’s practice and our Software Factory platform excel in regulated environments where vibe coding won’t get the job done. If you want to see if we can help you, please be in touch. Sales@8090.ai
Iykyk
/radio
Very excited for this
Finally able to talk about what I've been heads-down on for 6 months at @nvidia 🦀⚡ We just open-sourced cuda-oxide — an experimental rustc backend that lets you write CUDA kernels in pure Rust. No DSLs. No FFI. No source-to-source step. Single source. Short🧵👇
I got bored/curious so started replicating quantum computing sims on the macbook
nothing novel. just running existing tests against MLX on the M4 Max to see what the hardware actually does
21 tests in. some highlights:
Systems are everything now. Your ability to understand and design great system architecture will determine your success as a “builder” (software engineer) in this next era of software. To paraphrase Jensen’s point on the All-In podcast: we now need people who define great systems, iterate quickly and understand how to leverage AI systems. Writing code is a thing of the past.
Every time I see a tweet saying “I can vibe code this in a weekend” - I think of the slack notification system.. It takes time, persistence and effort to get the details right. Sure, a lot of simple workflows will get vibe coded away. And maybe you can put this in Claude Code and get the code right in one shot. But quality, depth and great systems will still have value and take time. You can’t vibe code lessons. Now and forever.
Update: I've been training a LoRA on Qwen3-Coder-30B-A3B for a week. It wasn't actually training. Every forward pass was ignoring the adapter. Every "best val_loss" I logged was batch noise. I caught it with one assertion.
I added one line to the training loop:
assert max(losses) - min(losses) > 1e-6
After step 1: 0.0000. Every population member returned the exact same loss. Identical. The perturbations weren't reaching the forward pass.
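A minimal sketch of how a check like this can catch dead perturbations. The toy `evaluate` function, the `apply_perturbation` flag and the fixed perturbation list are hypothetical stand-ins for illustration; they are not the training code from the thread. Only the assertion itself comes from the post.

```python
# Hypothetical toy setup: each population member perturbs one scalar weight.
PERTURBATIONS = [-0.2, -0.1, 0.1, 0.2]

def evaluate(weight, perturbation, apply_perturbation=True):
    """Toy loss. With apply_perturbation=False the perturbation is silently
    dropped, mimicking an adapter that never reaches the forward pass."""
    w = weight + perturbation if apply_perturbation else weight
    return (w - 3.0) ** 2

def check_population_losses(losses, tol=1e-6):
    """The sanity check from the post: a perturbed population must not
    return identical losses."""
    spread = max(losses) - min(losses)
    assert spread > tol, f"loss spread {spread:.4f}: perturbations are not reaching the forward pass"
    return spread

healthy = [evaluate(1.0, p) for p in PERTURBATIONS]
check_population_losses(healthy)          # passes: losses differ

broken = [evaluate(1.0, p, apply_perturbation=False) for p in PERTURBATIONS]
try:
    check_population_losses(broken)       # every loss is identical
    caught = False
except AssertionError:
    caught = True
```

The check is cheap enough to run on every step, and it fails loudly on the first step where every member returns the same loss.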
Here's the fix:
type(module).__call__ = patched
Python resolves obj(x) via the type, not the instance. The dunder I set was silently ignored.
Post-fix A/B: base adapter 0.0005 nats. Below noise floor.
Now I can actually test if eggroll works on 30B.
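The lookup rule behind this fix can be shown in a few lines of plain Python. This is an illustrative sketch: the `Module` class and the `patched` function are made-up stand-ins, not the code from the thread.

```python
class Module:
    """Stand-in for a layer whose call we want to intercept."""
    def __call__(self, x):
        return x

def patched(self, x):
    # Hypothetical replacement forward: adds 1 so the effect is visible.
    return x + 1

m = Module()

# Setting __call__ on the instance stores an attribute, but m(x) never
# consults it: implicit special-method lookup goes through type(m).
m.__call__ = lambda x: x + 100
instance_patch_result = m(1)   # still uses Module.__call__

# Setting it on the type is what m(x) actually resolves.
type(m).__call__ = patched
type_patch_result = m(1)       # now uses patched

print(instance_patch_result, type_patch_result)
```

Note that assigning on the type changes the behavior of every instance of that class, not only `m`.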
Nice
Introducing... Gemma 4 Multimodal Fine-Tuner for Apple Silicon
- LoRA fine-tuning toolkit for Gemma LLM
- runs locally on macOS via PyTorch and Metal
- streams data from Google Cloud to your machine
- fine-tune on audio, image and text
- easy-to-use CLI wizard
If you want to fine-tune the new Gemma 4 on text, images, or audio without renting an H100 or copying a terabyte of data to your laptop, this is the only toolkit that does it all on Apple Silicon.
me vs Apple software
i spent 80 hours trying to make my Rust fine-tuning beat Apple's MLX on val_loss
it worked (kinda)
11 of 15 windows
lower val_loss at every seq length
a critical part of how i run any experiment: i perform a spread to 'profile' characteristics that aren't clear to me yet then i define the trends in the data and double down on interesting or unclear results never optimize blindly. understand the shape first.
next:
prefill batching to close the wall time gap
SSD temperature experiments (Apple's recent paper)
maybe ANE forward pass
fine-tuning a 30B model on a MacBook should be a real option
we're getting closer
Now porting to Rust via Rustane. ANE GPU CPU. Bare metal Accelerate and Metal kernels. ~700 lines of Rust. MoE forward path. LoRA injection. EGGROLL on Qwen3-30B end-to-end. Apple just dropped the SSD paper: temperature-scaled loss, 42% to 55% on code benchmarks. Next up.
Fine-tuning 30B used to need a GPU cluster. Now it's overnight on a MacBook. A dev fine-tunes on their codebase. A radiologist on imaging reports. A quant on trading signals. No cloud. No API. No data leaves the device. Specialized models. Hardware you already own. Private Data is the new Gold