Joined April 2023
Pinned Tweet
Feb 2
Aquin is out. vibe code your LLM in 2 mins.
Ash retweeted
aquin dot app pip install aquin
Introducing Aquin CLI, bringing mechanistic interpretability and model analysis tooling to the terminal! Install aquin on a CUDA machine, connect a session, load a model, and run inspections, simulations, and evals, or watch external training runs. Every result mirrors live to the web dashboard at aquin.app. Get early access: aquin.app/waitlist pip install aquin
Jun 12
for the first time, you can simulate fine-tuning with @AquinF03 before committing any compute: see exactly which features strengthen or get suppressed, which samples hurt generalization, and which layers go dead!
Jun 12
Everything (LiSSA, NTK diagonal, sharpness) runs on one primitive: the Hessian-vector product. One extra backward pass per call, and no matrix is ever stored. Four hard invariants are checked on every run; they're enforced, not guidelines, and the simulation fails loudly if any of them break. Pass 2 is the heavy lift, pass 3 is cheap, and everything downstream unlocks in parallel the moment the checkpoint saves. Simulate, read, clean, repeat until you'd ship it.
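LiSSA is a good illustration of why a Hessian-vector product is enough: it estimates an inverse-Hessian-vector product using nothing but repeated HVP calls. Below is a minimal pure-Python sketch, not Aquin's implementation. `hvp` is any callable returning H·v; for a real PyTorch model it would typically be a second `torch.autograd.grad` call over gradients built with `create_graph=True`, which is the "one extra backward pass". The `damping`, `scale`, and `iterations` values are illustrative assumptions, and the small explicit matrix in `matrix_hvp` exists only to check the answer.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def lissa_inverse_hvp(
    hvp: Callable[[Vector], Vector],
    v: Sequence[float],
    damping: float = 0.01,
    scale: float = 10.0,
    iterations: int = 500,
) -> Vector:
    """Estimate (H + damping * scale * I)^-1 v using only Hessian-vector products.

    Recursion: h <- v + (1 - damping) * h - hvp(h) / scale.
    Its fixed point satisfies (H + damping * scale * I) h = scale * v, so
    h / scale is the damped inverse-HVP. For symmetric H it converges when every
    eigenvalue of H / scale, plus damping, lies in (0, 2). H is never built.
    """
    h = list(v)
    for _ in range(iterations):
        hv = hvp(h)  # one HVP per iteration: the only access to the Hessian
        h = [vi + (1.0 - damping) * hi - hvi / scale for vi, hi, hvi in zip(v, h, hv)]
    return [hi / scale for hi in h]


def matrix_hvp(H: Sequence[Sequence[float]]) -> Callable[[Vector], Vector]:
    """Demo-only HVP oracle backed by an explicit matrix, used to verify the estimate."""
    def hvp(x: Vector) -> Vector:
        return [sum(hij * xj for hij, xj in zip(row, x)) for row in H]
    return hvp


if __name__ == "__main__":
    H = [[2.0, 0.5], [0.5, 1.0]]
    print(lissa_inverse_hvp(matrix_hvp(H), [1.0, 1.0]))
```

The damping term keeps the recursion stable when H has near-zero or negative curvature directions, at the cost of solving a slightly regularized system rather than H⁻¹v exactly.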
Jun 12
the simulation answers one question: should I train? by the time it completes, you know more than you would after a real training run, because real training doesn't score your samples by influence. commit the compute when the prediction looks good, not before. read the full lit here: aquin.app/research/simulatio…
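Scoring samples by influence can be sketched with the textbook influence-function formula: score_i = −s_test · ∇L(z_i), where s_test = H⁻¹∇L_val (for example, a LiSSA estimate). This is a minimal illustration under that assumption, not necessarily how Aquin scores samples; the toy gradients in the test are made up.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def influence_scores(
    s_test: Sequence[float], train_grads: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Classic influence-function score for each training sample.

    s_test is H^-1 times the gradient of the validation loss.
    score_i = -(s_test . grad_i) approximates the rate of change of validation
    loss as sample i is up-weighted: positive means the sample hurts
    generalization, negative means it helps. Returned most harmful first.
    """
    scores = [(i, -dot(s_test, g)) for i, g in enumerate(train_grads)]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Because s_test is computed once and reused, scoring each extra sample costs one gradient and one dot product, which is why ranking a whole dataset this way is affordable.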
Jun 7
I hate reading books, but Marcus Aurelius's Meditations is valuable to me, especially because he never intended to publish it. The book is basically private notes written by the most powerful man in the world.
Jun 4
Built a work agent for myself today
Jun 3
I think I'm in love with graphical/visual representations of ML model internals. They're so attractive and charming that it's love at first sight every time.
May 31
At @AquinF03, we're continuing to make all existing evals and benchmark tools obsolete:
1/3 Custom evals: write your own scorer in Python with access to activations and SAE features, so you can do things like "check whether a specific feature fired above threshold on a response", which no external eval harness can do!
2/3 Benchmark Builder can now weight evals differently within a suite and export results in multiple formats.
3/3 Auto-suggestions: the agent observes your work and proactively suggests the most relevant evals, each runnable with one click.
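The example check from 1/3 can be sketched as a plain function. Aquin's actual scorer interface isn't shown in the post, so everything here is hypothetical: the function name, the per-token `{feature_id: activation}` input format, and the `ScoreResult` type are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoreResult:
    passed: bool
    max_activation: float
    firing_tokens: List[int]


def feature_fired_scorer(
    token_features: List[Dict[int, float]],
    feature_id: int,
    threshold: float,
) -> ScoreResult:
    """Hypothetical scorer: did one SAE feature fire above `threshold` on a response?

    token_features[t] maps SAE feature ids to their activation at token t.
    The dicts are sparse: features that did not fire are simply absent.
    """
    acts = [feats.get(feature_id, 0.0) for feats in token_features]
    firing = [t for t, a in enumerate(acts) if a > threshold]
    return ScoreResult(
        passed=bool(firing),
        max_activation=max(acts, default=0.0),
        firing_tokens=firing,
    )
```

Returning the firing token positions, not just a boolean, makes a failed or passed check easy to inspect against the response text.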
May 30
At @AquinF03, we just shipped SAE support for embedding models.
- Feature decomposition: see which concepts are firing and how strongly. Contrastive mode shows what differs between two texts.
- Feature browser: ranks concepts by how much they fire across your corpus, with auto-generated labels and top examples.
- Co-activation network: concepts don't fire alone, they cluster. Tight clusters are semantic domains; loose ones are general-purpose.
- Circuit tracing: see where in the stack each concept appears and how it builds up. Some grow steadily from early layers; others snap on right at the end.
- Steering: boost a concept and your embedding pulls toward it; suppress it and the embedding moves away. Your retrieval corpus is re-ranked so you can see exactly how results shift.
- Absorption and polysemy diagnostics: absorption is when two concepts always fire together; polysemy is when one concept fires on completely unrelated things. Aquin catches both automatically.
- Retrieval faithfulness: zeroes out each concept and measures how much retrieval drops. High activation doesn't mean high importance.
- Cross-model feature matching: finds which concepts two models share and which are unique to each.
Updated literature: aquin dot app slash embeddings
May 25
Glad to announce that @AquinF03 now supports embedding models: Geometry inspection, retrieval evaluation, fine-tuning monitoring, and embedding diff across checkpoints. here's how we support them:
May 28
After 2 months of building and researching interpretability tooling at @AquinF03, I've found that our users fall into two groups:
1. People working on interpretability
2. People leveraging interpretability in their ML work
The first group builds on top of our tooling and experiments. The second group uses the tooling in existing pipelines and to debug/improve their ML work. At @AquinF03, we care about both. We're shipping a lot, and every release could turn into an experiment, a study, or a paper. Come build and research with us: aquin.app/bounty
May 26
Introducing @AquinF03's Devkit! It's basically Aquin.app's interpretability tooling, available locally through an SDK and CLI. The Aquin SDK records training runs locally, including metrics, config, and checkpoints; the CLI then packages and pushes them to Aquin for post-hoc analysis. Once pushed, the run appears under CLI runs with full inspection: loss curves, learning rate, grad norm, epoch summaries, SAE diff, and model diff. The SDK is framework-agnostic: it works with any Python training loop that produces a PyTorch model. For HuggingFace Trainer and TRL, a TrainerCallback pattern wires everything in without touching training logic. pip install aquin!
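A framework-agnostic local recorder can be sketched in a few lines of standard-library Python. This is a hypothetical stand-in, not the Aquin SDK API: the class name, method names, and on-disk layout (`config.json`, `metrics.jsonl`, `checkpoints/`) are assumptions made for illustration.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, List


class LocalRunRecorder:
    """Hypothetical local run recorder (NOT the Aquin SDK API).

    Layout under run_dir:
      config.json    hyperparameters, written once at start
      metrics.jsonl  one JSON object per logged step
      checkpoints/   checkpoint blobs registered via log_checkpoint()
    """

    def __init__(self, run_dir: str, config: Dict[str, Any]) -> None:
        self.run_dir = Path(run_dir)
        (self.run_dir / "checkpoints").mkdir(parents=True, exist_ok=True)
        (self.run_dir / "config.json").write_text(json.dumps(config, indent=2))
        self._metrics_path = self.run_dir / "metrics.jsonl"

    def log(self, step: int, **metrics: float) -> None:
        """Append one step's metrics (loss, lr, grad_norm, ...) as a JSON line."""
        record = {"step": step, "time": time.time(), **metrics}
        with self._metrics_path.open("a") as f:
            f.write(json.dumps(record) + "\n")

    def log_checkpoint(self, step: int, state_bytes: bytes) -> Path:
        """Store serialized model state (e.g. torch.save into an io.BytesIO)."""
        path = self.run_dir / "checkpoints" / f"step_{step}.bin"
        path.write_bytes(state_bytes)
        return path

    def read_metrics(self) -> List[Dict[str, Any]]:
        """Load every logged step back, e.g. for packaging and upload later."""
        if not self._metrics_path.exists():
            return []
        with self._metrics_path.open() as f:
            return [json.loads(line) for line in f if line.strip()]
```

With HuggingFace Trainer, a `TrainerCallback` subclass could forward metrics from its `on_log(self, args, state, control, logs=None, **kwargs)` hook to `recorder.log(state.global_step, **logs)`, leaving the training logic itself untouched.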
May 25
Embedding diff: Aquin's embedding diff compares two checkpoints on centroid positions, similarity distributions, anisotropy, and nearest-neighbor ranks. A composite drift score captures the tradeoffs, penalizing fine-tunes that improve one cluster by degrading another's geometry even if overall recall looks fine.
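The post doesn't say how the composite drift score is computed, so here is a hedged sketch of one way to combine the named ingredients. The weights, the 1 − cosine centroid shift, neighbor-overlap@k, and taking the worst cluster are all assumptions, not Aquin's formula. `old[i]` and `new[i]` must embed the same item under the two checkpoints.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def anisotropy(embs: Sequence[Vector]) -> float:
    """Mean pairwise cosine similarity: near 0 is spread out, near 1 means
    every embedding is crowded into one narrow cone."""
    pairs = list(combinations(embs, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def centroid(embs: Sequence[Vector]) -> List[float]:
    return [sum(col) / len(embs) for col in zip(*embs)]


def knn(embs: Sequence[Vector], i: int, k: int) -> List[int]:
    """Indices of the k items most cosine-similar to item i (excluding i)."""
    others = [j for j in range(len(embs)) if j != i]
    others.sort(key=lambda j: cosine(embs[i], embs[j]), reverse=True)
    return others[:k]


def neighbor_overlap(old: Sequence[Vector], new: Sequence[Vector], k: int) -> float:
    """Average fraction of each item's k nearest neighbors that survive the update."""
    n = len(old)
    return sum(len(set(knn(old, i, k)) & set(knn(new, i, k))) / k for i in range(n)) / n


def centroid_shift(old: Sequence[Vector], new: Sequence[Vector],
                   labels: Sequence[str]) -> Dict[str, float]:
    """Per-cluster 1 - cosine(old centroid, new centroid); 0 means unmoved."""
    shifts = {}
    for label in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        shifts[label] = 1.0 - cosine(centroid([old[i] for i in idx]),
                                     centroid([new[i] for i in idx]))
    return shifts


def drift_score(old: Sequence[Vector], new: Sequence[Vector], labels: Sequence[str],
                k: int = 2, w_nn: float = 1.0, w_aniso: float = 1.0,
                w_shift: float = 1.0) -> float:
    """Hypothetical composite drift score (weights are illustrative).

    Using the worst cluster shift rather than the mean means a large move in any
    single cluster is not averaged away by clusters that barely changed.
    """
    nn_loss = 1.0 - neighbor_overlap(old, new, k)
    aniso_change = abs(anisotropy(new) - anisotropy(old))
    worst_shift = max(centroid_shift(old, new, labels).values())
    return w_nn * nn_loss + w_aniso * aniso_change + w_shift * worst_shift
```

Nearest-neighbor overlap is the term closest to what retrieval users feel; the centroid and anisotropy terms catch geometry damage that aggregate recall can hide.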