bro, last week we got neck deep in support tickets for Bodega. users were saying their cpu was spiking hard during tasks that shouldn't even touch the gpu.
turned out our task scheduler was the problem. every time an agent spawned inference jobs or the voice pipeline queued audio processing, we were hammering redis. even when the worker was running on the same damn machine.
we didn't track it down manually. we used axe, the coding agent we built for codebases where things breaking actually matters. it's our internal tool, been using it for close to 6 months now. some of the LLMs we use are trained specifically on our codebases too. we don't use claude code. it's incentivized to waste tokens and cost you more, plus we can't let our stuff get trained into their next models.
anyways, so we dug in. traced the bottleneck through 7 layers of our task queue. took 25 minutes. used way less tokens than you'd think. shipped the fix.
before: ~400-2500µs overhead per task. production speed around 2,000 tasks/sec.
after: ~0.5-5µs per task. 20,000 tasks/sec.
do the math lil bro!!!
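if you actually want to do the math, here's a rough sanity check. big assumption: a single worker where scheduling overhead is the only cost per task. that's not true in real life, so treat these as ceilings, not measurements. the function name is just made up for the example.

```python
def max_tasks_per_sec(overhead_us: float) -> float:
    """Upper bound on tasks/sec for one worker if overhead were the only cost."""
    return 1_000_000 / overhead_us


# redis path: 400-2500 microseconds of overhead per task
before_best = max_tasks_per_sec(400)     # best case for the old path
before_worst = max_tasks_per_sec(2500)   # worst case for the old path

# local queue path: even the slow end (5 microseconds) leaves tons of headroom
after_worst = max_tasks_per_sec(5)

# ratio of the production throughput numbers quoted above
throughput_gain = 20_000 / 2_000

print(before_best, before_worst, after_worst, throughput_gain)
```

point being: on the old path the overhead alone caps a worker at a few thousand tasks/sec. on the new path the overhead basically stops being the bottleneck.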
cpu usage dropped back to normal. and actually, forgot to mention lol, axe also suggested this zero-copy optimization that made things even faster. wait i did mention the overhead speed above, anyways so yeah users happy now.
okay so why would u care? we maintain bodega, a voice os running entirely on your device. inside it there's an inference engine powering ai agents, voice pipelines, real-time audio processing, and a distributed task scheduler handling thousands of jobs per second. when something breaks here, people feel it immediately. we built bodega to run on arch and on all three major os platforms, so things can obv still break even after 2 years of fool proofing as best we can.
the problem was our scheduler kept serializing every job, making a network call to redis, then deserializing it on the other end. for local tasks where the worker's literally sitting right there? completely wasted cycles. that's what was murdering cpus.
here's the flow axe traced:
before (redis path):
serialize args/kwargs with cloudpickle (~50µs)
create redis message dict
inject opentelemetry context
XADD to redis stream (~100µs)
worker XREADGROUP from redis (~180µs)
deserialize args/kwargs (~50µs)
extract opentelemetry context
execute task
XACK + XDEL to redis (~100µs)
total overhead: 400-2500µs per task
after (local queue path):
check local_queue exists? yes
queue.put(execution) (~0.2µs)
event.set() to wake worker (~0.2µs)
worker queue.get() (~1µs)
execute task (same execution object!)
total overhead: ~0.5-5µs per task
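here's a minimal sketch of that fast path, not our actual scheduler. assumptions: the class and function names (Execution, Scheduler, worker, double) are made up for illustration, stdlib pickle stands in for cloudpickle, and a plain list stands in for the redis stream so it runs with no server.

```python
import pickle
import queue
import threading
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Execution:
    fn: Callable[..., Any]
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)


class Scheduler:
    def __init__(self, local_queue: Optional[queue.Queue] = None):
        self.local_queue = local_queue
        self.wake = threading.Event()
        self.remote_outbox = []  # stand-in for XADD to a redis stream

    def submit(self, execution: Execution) -> str:
        if self.local_queue is not None:
            # local path: hand over the object itself, no serialization
            self.local_queue.put(execution)
            self.wake.set()
            return "local"
        # remote path: pay for serialization (plus the network hop in real life)
        payload = (execution.fn.__name__, execution.args, execution.kwargs)
        self.remote_outbox.append(pickle.dumps(payload))
        return "remote"

    def shutdown(self) -> None:
        self.local_queue.put(None)  # sentinel telling the worker to exit
        self.wake.set()


def worker(sched: Scheduler, results: list) -> None:
    while True:
        sched.wake.wait()
        sched.wake.clear()  # clear first, then drain, so no wakeup gets lost
        while True:
            try:
                ex = sched.local_queue.get_nowait()
            except queue.Empty:
                break
            if ex is None:
                return
            results.append((ex, ex.fn(*ex.args, **ex.kwargs)))


def double(x):
    return x * 2


sched = Scheduler(local_queue=queue.Queue())
results = []
t = threading.Thread(target=worker, args=(sched, results))
t.start()
job = Execution(double, (21,))
path = sched.submit(job)
sched.shutdown()
t.join()
print(path, results[0][1], results[0][0] is job)
```

the `is` check at the end is the whole point: the worker runs the exact object the caller created, nothing got copied or pickled on the way.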
here's where most coding agents just don't work for us. claude code dumps 100 files into context, burns through tokens like scar, still misses how functions actually call each other, and can't tell you what breaks when you change stuff. they're optimized for demos, for looking good in screenshots, not for production systems where downtime costs us our life. ngl, actually our life. we treat each other as players at srswti. this is a football team lil bro, not a family. each one of us has to perform at our best.
axe does it differently. precision retrieval.
pulled exactly what we needed. who's calling the scheduler, what the scheduler calls, how task data flows from creation through the queue to workers, plus impact analysis showing what breaks if we bypass redis for local tasks. way fewer tokens. complete picture of what's happening.
before touching any code, axe showed us 23 call sites hitting task scheduling, 11 files getting affected, the exact worker polling logic needing updates, and which tests would fail. we shipped zero-copy local queues. 8x performance improvement. it did miss one thing: it didn't realize we can just raise our local queue size to the max. that can cause problems, but that's fine. that's what us engineers are here for as well.
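to make "impact analysis" concrete, here's a toy version using python's ast module. this is not how axe does it internally, just a sketch of the idea: build a map of who calls what, then walk it backwards from the function you want to change. the sample source and every function name in it are made up.

```python
import ast
from collections import defaultdict

SOURCE = '''
def schedule(task): ...
def spawn_inference(job):
    schedule(job)
def queue_audio(chunk):
    schedule(chunk)
def run_agent(job):
    spawn_inference(job)
def render_ui():
    pass
'''


def build_callers(source: str) -> dict:
    """Map each called function name to the set of functions that call it."""
    callers = defaultdict(set)
    for fn in ast.walk(ast.parse(source)):
        if isinstance(fn, ast.FunctionDef):
            for node in ast.walk(fn):
                # only plain-name calls like schedule(x); method calls are ignored here
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    callers[node.func.id].add(fn.name)
    return callers


def impacted_by(target: str, callers: dict) -> set:
    """Everything that directly or transitively calls `target`."""
    seen, stack = set(), [target]
    while stack:
        for caller in callers.get(stack.pop(), ()):
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen


callers = build_callers(SOURCE)
print(sorted(callers["schedule"]))              # direct call sites
print(sorted(impacted_by("schedule", callers))) # transitive blast radius
```

a real codebase needs way more than this (methods, imports, dynamic dispatch), but the shape is the same: direct call sites first, then everything upstream of them.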
this is what you need when maintaining large codebases. not websites or excel stuff or faking gcc compilers in demos. it's actual safe refactoring, built on understanding call signatures, dependencies, and how data moves through your system.
axe does three things other slop agents can't:
understands large codebases through call graphs, not just dumping files
typical numbers: 99% fewer tokens on function refactors, 89% on codebase overviews, 95% on deep call chains. but here's the interesting part. sometimes axe uses MORE tokens than other tools. because we actually see full call graphs, data flow analysis, program dependencies, impact stuff. tracking down some nasty bug through 7 layers of code? we'll pull 150,000 tokens if that's what it takes. getting it right matters more than saving tokens.
works really well with local llms because we built it for hardware people actually own
bodega runs on all devices (rn only available for apple silicon though, others coming in a few weeks), gaming laptops, whatever you've got. local llms are different though. slower prefill times, smaller context windows, no per-token costs. so we had to build precise retrieval from the ground up. turns out precision helps cloud models too. faster responses, lower costs, better understanding. we tested with our srswti/blackbird-she-doesnt-refuse-21b on m4 max. full agent capabilities, spawning subagents, 120 tok/s inference, everything running local. and yeah, now we're open sourcing it, because precision is something even cloud llms like opus or gpt5.2 can benefit from as well.
so the version we released indexes the codebase up front, then runs semantic search that finds what code does, not what it's named. it's not rag or anything.
you ask "find code handling task queue consumption".
grep finds variable names, comments, anything with "consume_task" in it.
axe finds the actual function polling redis consumer groups, deserializing task data, spawning worker threads. even if someone named it _internal_loop() or something random.
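quick toy to show the difference between matching on names and matching on behavior. loud disclaimer: this is a crude stand-in, not axe's retrieval. it just checks which functions actually call a given method, using python's ast module. the sample source and helper names (grep_like, calls_method) are made up. xreadgroup is only parsed here, never executed.

```python
import ast

SOURCE = '''
def consume_task_docs():
    """notes about consume_task, nothing else"""
    return None

def _internal_loop(r):
    while True:
        batch = r.xreadgroup("workers", "w1", {"tasks": ">"})
        handle(batch)
'''


def grep_like(source: str, keyword: str) -> list:
    """Functions whose source text mentions the keyword (roughly what grep gives you)."""
    tree = ast.parse(source)
    return sorted(
        fn.name
        for fn in ast.walk(tree)
        if isinstance(fn, ast.FunctionDef)
        and keyword in ast.get_source_segment(source, fn)
    )


def calls_method(source: str, method: str) -> list:
    """Functions that actually call `.method(...)`, whatever they happen to be named."""
    hits = set()
    for fn in ast.walk(ast.parse(source)):
        if isinstance(fn, ast.FunctionDef):
            for node in ast.walk(fn):
                if (
                    isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and node.func.attr == method
                ):
                    hits.add(fn.name)
    return sorted(hits)


print(grep_like(SOURCE, "consume_task"))   # text match lands on the docs helper
print(calls_method(SOURCE, "xreadgroup"))  # behavior match lands on the real consumer
```

the text match lands on a function that only talks about consuming tasks. the behavior match lands on the one that actually polls the stream, random name and all.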
how it works: axe-dig, our 5-layer retrieval process. ast for structure, call graphs for relationships, control flow for complexity, data flow for how values transform, program dependence for what affects what. every function gets embedded with what it actually does.
look, if you're building something new from scratch, use whatever works for you. but maintaining production systems? refactoring code someone wrote 3 years ago? working on distributed stuff where you need to understand things before shipping? want powerful agents running on your own machine?
yeah, we built axe for that.
open source now. terminal native. built by the most lethal team working on the fastest retrieval and inference platform