AaronCQL

Tweets

Pinned Tweet

AaronCQL

@AaronCQL

7 Oct 2022

Everybody and their mums building the infra. So who's building the damn apps?

AaronCQL

@AaronCQL

19h

Incredibly weird take. Just customise your harness to fit the model as much as possible?

Max Katz

@maxktz

Jun 13

life lesson: never bet on a custom harness like Pi
been loving my custom Pi setup for the last few weeks, the fact that I can build any extension, use any models
but things are moving too fast today
huge teams behind Claude / Codex change the way we develop almost every month
so by building and maintaining a custom agent you're more likely to get left behind
most models perform better in their native harnesses anyway, and using external ones is likely to get banned
so I recommend betting your workflow on portable primitives, like prompts, skills or scripts, instead of custom agents

AaronCQL

@AaronCQL

Jun 9

Hashline is almost never the cheapest edit tool to use. I benchmarked 3 different edit tools across 5 different models to find out.
1) Replace: plain old string replacement
2) Patch: OpenAI's V4A patch format
3) Hashline: references lines via content hash anchors
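To make the difference between the first and third tools concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the benchmarked implementation: the function names, the `line:hash|content` read format, and the 4-character SHA-1 anchor are all hypothetical. The V4A patch format is left out rather than guessed at.

```python
import hashlib


def replace_edit(text: str, old: str, new: str) -> str:
    """Plain string replacement: the model must reproduce `old` verbatim."""
    if text.count(old) != 1:
        raise ValueError("old string must match exactly once")
    return text.replace(old, new, 1)


def line_hash(line: str) -> str:
    """Hypothetical anchor: first 4 hex characters of the line's SHA-1."""
    return hashlib.sha1(line.encode("utf-8")).hexdigest()[:4]


def render_with_anchors(text: str) -> str:
    """What the model would see on read: every line carries its anchor."""
    lines = text.split("\n")
    return "\n".join(
        f"{number}:{line_hash(line)}|{line}"
        for number, line in enumerate(lines, start=1)
    )


def hashline_edit(text: str, line_no: int, anchor: str, new_line: str) -> str:
    """Replace one line, addressed by line number plus content hash."""
    lines = text.split("\n")
    if not 1 <= line_no <= len(lines):
        raise IndexError("line number out of range")
    if line_hash(lines[line_no - 1]) != anchor:
        raise ValueError("stale anchor: line content changed since the read")
    lines[line_no - 1] = new_line
    return "\n".join(lines)
```

In this sketch the replace edit has to emit the old text again on output, while the hashline edit only emits a short anchor, but every read is longer because each line is prefixed.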

AaronCQL

@AaronCQL

Jun 9

Hashline was the one I most wanted to like given the hype. But across the full benchmark, it failed to beat replace on dollar cost. The hash anchor references do reduce some output tokens, but they add too much input overhead during reads to be worth it.

AaronCQL

@AaronCQL

Jun 9

The best edit tool depends on your model and what you're optimising for:
- Replace: as the sensible default
- Patch: if the model was trained on it
- Hashline: when edit density is high enough to amortise the anchor tax
Full writeup and charts: aaroncql.com/writings/harnes…

The Harness Problem Is Also A Training Problem

No, you really shouldn't default to hashline edits just because.

aaroncql.com
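The "anchor tax" trade-off can be sketched as a toy break-even check in Python. This is only an assumed cost model, not the one used in the writeup: the function name, its parameters, and any numbers passed to it are hypothetical and are not taken from the benchmark.

```python
def hashline_saves_money(
    lines_read: int,
    edits: int,
    anchor_tokens_per_line: float,
    output_tokens_saved_per_edit: float,
    input_price_per_token: float,
    output_price_per_token: float,
) -> bool:
    """Return True when output savings outweigh the extra input cost.

    Assumption: every line read costs a fixed number of extra input tokens
    for its anchor, and every edit saves a fixed number of output tokens
    compared with a plain replace edit.
    """
    extra_input_cost = lines_read * anchor_tokens_per_line * input_price_per_token
    output_savings = edits * output_tokens_saved_per_edit * output_price_per_token
    return output_savings > extra_input_cost
```

Under this model, the more edits a session makes per line read, the more likely the anchors pay for themselves, which is one way to read "edit density".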

AaronCQL

@AaronCQL

Jun 6

Interesting, Opus starting from 4.7 seems to be trained on the apply_patch tool call that GPT uses.

AaronCQL

@AaronCQL

Jun 4

Can it run doom?

Candy樂兒

@candyyueliu

Jun 3

Introducing Monako Glass 👓 The world's first wearable Linux computer in glasses form. Run Claude Code, Codex, and any coding agent — anywhere.

AaronCQL

@AaronCQL

Jun 4

An open source tool getting acquired by a larger corporation usually doesn't bode well. But Cloudflare has a good reputation amongst open source projects, so I'm cautiously optimistic!

VoidZero

@voidzerodev

Jun 4

VoidZero is joining Cloudflare.
Our mission stays the same: to make JavaScript developers more productive than ever before.
Vite, Vitest, Rolldown, Oxc, and Vite+ remain MIT-licensed. Evan and the VoidZero team will continue leading them.
Cloudflare shares our commitment to open source. Together, we can keep investing in the tooling developers rely on every day, while bringing the Vite ecosystem and Cloudflare's platform even closer together.

AaronCQL

@AaronCQL

Jun 4

Any terminal benchmark scores out yet for Gemma 4 12B?

AaronCQL

@AaronCQL

Jun 2

Pretty pretty please open source 35B and 27B next!!! 🥹

Qwen

@Alibaba_Qwen

Jun 1

👏👏 Introducing Qwen3.7-Plus, a multimodal agent model that unifies vision and language into one versatile agent foundation.
✅ Multimodal interactive hybrid agent: unified GUI & CLI operation across visual and text tasks
✅ Versatile coding agent & productivity assistant with full-modality input
✅ Visual Agent: perception, reasoning, grounding, and search-augmented QA
✅ Cross-harness generalization across diverse agent frameworks
One model. Sees, thinks, codes, acts. 🙌🙌
Now available via API on Alibaba Cloud Model Studio. Try it and let us know what you build. 😎
🔗🔗⬇️⬇️
Blog: qwen.ai/blog?id=qwen3.7-plus
Qwen Studio: chat.qwen.ai/?models=qwen3.7…
API: modelstudio.console.alibabac…

AaronCQL retweeted

Qwen

@Alibaba_Qwen

Jun 1

AaronCQL

@AaronCQL

May 31

If you work with multiple different models, you can almost certainly map them to people you've known.
GPT: principled, meticulous, never wants to be wrong, but replies are slightly autistic.
Opus: gets your vague brief immediately, but occasionally goes rogue and destroys half the work.
Gemini: the creative one who just wants to rest and vest.
Qwen/Deepseek: the intern grinding 80hr weeks who never quite hits the mark.

AaronCQL

@AaronCQL

May 31

People of Pi! Made a Bun-native extension pack, with web access, subagents, revamped core tools, ANSI-compatible themes, fzf-style completions, Telegram mode, and more. Called it Pim (Pi IMproved) 😏.
Its goal is to improve the out-of-the-box experience for both users and agents, without sacrificing composability with other Pi extensions.
Pim with Qwen3.6-35B averaged 37.8% (peak of 41.6%!) on Terminal-Bench 2.0 over 3 full runs. Pretty damn amazing that a local model running on my MacBook can rival the performance of Claude Code Sonnet 4.5 (40.1%) and outperform Codex GPT-5-Mini (31.9%)...
Open source and MIT. Try it out and tell me what you think: github.com/AaronCQL/pim-agen…!
PS: huge thanks to @badlogicgames and @mitsuhiko for creating Pi and making it so easy to customise, it was an absolute joy to work with!

AaronCQL

@AaronCQL

May 29

TIL Jira MCP costs 12K tokens on startup, even when you don't use it... Audit your installed MCPs, folks.

AaronCQL

@AaronCQL

May 29

We finally solved a pet peeve of ours: simulated balance changes on Jupiter Wallet for Jito bundles show accurately across all txs now! Thanks @PierreArowana for the PR at github.com/jito-foundation/j…, and @0xTsathir for helping out.

AaronCQL

@AaronCQL

May 29

Who approved this epileptic animation?

AaronCQL

@AaronCQL

May 26

Ok, Claude legit feels kinda lobotomised now

AaronCQL

@AaronCQL

May 26

One of the few things I'm extremely proud of in Jupiter is RTSE. Working with the team day in, day out to fine-tune how we best estimate expected output and slippage, scouring tons of data to find the optimal calculation - the magical point that maximises output amount while minimising swap failures.
This article is a great read and sums up quite succinctly the battles we've had to fight over the past few years.
Special shoutout to @melvinzzy, @gn_dnomsed (and many more behind the scenes) - the brains behind it all. And special shoutout to all of our users, for always giving us the feedback that we need to improve the system. Even now, we're still iterating and making it better.
If figuring out slippage is still a chore to you, come speak to us!

Jupiter Developers

@JupDevRel

May 26

Route A quotes 100 tokens, executes 95
Route B quotes 150, executes 50
Metis v8 picks Route A.
It routes on what each path will actually execute at, not what it quotes.
The best quote isn't the best swap.
Read the full article by @melvinzzy here: developers.jup.ag/blog/why-y…
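The Route A versus Route B comparison can be written out as a small Python sketch. It only illustrates ranking by expected executed output instead of quoted output; the `Route` type and `pick_route` function are hypothetical names, and nothing here describes how Metis v8 actually estimates execution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    quoted_out: float    # tokens the route promises in its quote
    executed_out: float  # tokens the route is expected to actually deliver


def pick_route(routes: list[Route]) -> Route:
    """Rank by expected executed output, ignoring the headline quote."""
    return max(routes, key=lambda route: route.executed_out)


# Figures taken from the tweet above.
routes = [
    Route("A", quoted_out=100, executed_out=95),
    Route("B", quoted_out=150, executed_out=50),
]
best = pick_route(routes)
```

Ranking the same list by `quoted_out` would pick Route B, which is the failure mode the tweet describes.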
