Sidhant Kabra

@kabra_sidhant

Jun 10

Self-improving eval loops have a known failure mode: they hit 100% on the failing set by writing the test data into the prompt. Verbatim user phrases, scenario IDs, case clauses that never generalize. It looks fixed. Two weeks later a call that was passing breaks.

The version we ship at Cekura scans every prompt edit against transcript-leakage signatures and strips the verbatim quotes before redeploy. A "fix" that hardcodes a customer utterance into the system prompt is not a fix. It is the test data leaking into the model.

The other choice that matters: full-set eval after every iteration, not just the failing subset. The healthcare case study runs a 20-scenario set from 75% to 100% over five iterations, and the regression sweep catches two scenarios the edits broke along the way. A loop that re-ran only the failures would have shipped those regressions.

Before any edit, every failure is classified into one of five categories: Gap, Conflict, Ambiguity, CodeBug, or Upstream. Only Gap and Ambiguity respond to prompt changes. CodeBug needs orchestration changes. Upstream needs a tool, knowledge base, or infra fix. Most manual iteration treats all five as Gap, which is why teams burn five rounds on a problem the prompt cannot solve.

If iteration N fails the same way as N-1, the loop stops editing the prompt entirely and escalates: restructure the flow into named states, add deterministic code-level guards, or move to a stronger model. Endless prompt rewording against a structural limit is the manual-loop default.

Lavish wrote up the seven-phase architecture and the iteration-by-iteration walkthrough in the healthcare case study. cekura.ai/blogs/self-improvi…
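
The leakage check described in the post can be sketched roughly as follows. This is a minimal illustration, assuming "leakage" means a run of five or more consecutive words copied from a test transcript into the prompt. The function name `leaked_ngrams`, the five-word threshold, and the sample strings are hypothetical and are not taken from Cekura's implementation.

```python
import re


def _words(text: str) -> list[str]:
    # Lowercase and keep only alphanumeric tokens so punctuation
    # and casing differences do not hide a copied phrase.
    return re.findall(r"[a-z0-9]+", text.lower())


def leaked_ngrams(prompt: str, utterances: list[str], n: int = 5) -> set[str]:
    """Return every n-word run from the test utterances that also
    appears verbatim in the prompt (hypothetical leakage signature)."""
    prompt_words = _words(prompt)
    prompt_grams = {
        " ".join(prompt_words[i:i + n])
        for i in range(len(prompt_words) - n + 1)
    }
    leaked: set[str] = set()
    for utterance in utterances:
        words = _words(utterance)
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            if gram in prompt_grams:
                leaked.add(gram)
    return leaked


# Illustrative data only: a prompt edit that hardcodes a caller's phrase.
edited_prompt = (
    "If the user says 'I need to move my appointment to Friday', reschedule."
)
test_utterances = ["Hi, I need to move my appointment to Friday please"]

for phrase in sorted(leaked_ngrams(edited_prompt, test_utterances)):
    print("leaked:", phrase)
```

A prompt edit that states the behavior in general terms ("When a caller asks to reschedule, offer the next available slots.") produces no matches under this check, which is the distinction the post draws between a fix and test data leaking into the prompt.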

Self-Improving Voice Agents: Closing the Eval Loop Automatically

Learn how to build a self-improving voice agent loop that automatically diagnoses failing evals, applies prompt fixes, catches regressions, and iterates to 100% pass rate.

cekura.ai
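
The routing decisions in the post above (classify first, sweep the full set, escalate on a repeated failure) can be sketched as below. This is a hedged illustration only: the five category names come from the post, but `find_regressions`, `repeated_failures`, `next_step`, the string labels they return, and the idea of a per-scenario "failure signature" string are all assumptions made for the example.

```python
from enum import Enum


class Category(Enum):
    GAP = "Gap"
    CONFLICT = "Conflict"
    AMBIGUITY = "Ambiguity"
    CODE_BUG = "CodeBug"
    UPSTREAM = "Upstream"


# Per the post, only these two categories respond to prompt changes.
PROMPT_FIXABLE = frozenset({Category.GAP, Category.AMBIGUITY})


def find_regressions(before: dict[str, bool], after: dict[str, bool]) -> set[str]:
    """Scenarios that passed before the edit and fail after it.
    Requires results for the full set, not just the failing subset."""
    return {sid for sid, ok in before.items() if ok and not after.get(sid, False)}


def repeated_failures(prev: dict[str, str], curr: dict[str, str]) -> set[str]:
    """Scenarios failing with the same signature in iterations N-1 and N."""
    return {sid for sid, sig in curr.items() if prev.get(sid) == sig}


def next_step(category: Category, prev: dict[str, str], curr: dict[str, str]) -> str:
    if category not in PROMPT_FIXABLE:
        return "fix_outside_prompt"  # orchestration, tool, KB, or infra work
    if repeated_failures(prev, curr):
        return "escalate"  # named states, code-level guards, or a stronger model
    return "edit_prompt"


# Illustrative results: s3 was fixed, but the edit broke s2.
before = {"s1": True, "s2": True, "s3": False}
after = {"s1": True, "s2": False, "s3": True}
print(find_regressions(before, after))
```

A loop that re-ran only `s3` here would report success and miss `s2`, which is the case the full-set sweep is meant to catch.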

Drakian retweeted

Umedyn 🐙 @umedyn

May 25

Codebug Mascotchi Idle Animation #elliestrations Egg art by @/soulgluttony

Drakian retweeted

ty999999 @Ty999999_VT

Jun 7

The Lab Brats Mega Mascot quartet is finally complete! What started out with just wanting a really big codebug has turned into a fun little unexpected adventure of printing all the mascots! I can't wait to print even more mascots! #elliestrations #idrewmimi #chrchieart #minyart

☸️1manfund

@bvlldhist_alt

Jun 3

I woke up seeing a comment on Uday Kotak today that “he hasn’t made a screw in his life.” So I decided today’s weapon of choice is the Codebug FireMace.

ㆍ˳͡🍎゛𝓦ren ♡s 𝓢even . ֺ ꒷꒦ @007n7sboyfriend

Jun 2

yumeslop interactionbait adnd Some other buzzwords . . . PLS SEND ME UR YUMESHiPS iN THE REPLiES TOO i WANNA SEE THWM ! ! ! #yumetwt #riakotwt #codebug lalala

RetroWinnipeg @RetroWinnipeg

May 27

Replying to @JIRO_draws_cats @dumb_lil_robot

But was it made from codebug milk?

chrchie 🦎⭐️ lizard girl

@chrchie

May 24

Replying to @dumb_lil_robot

codebug nails incoming

Pix0r @Vulpix0r

May 22

Replying to @FutureSightJker

Damn that codebug is bigger than I thought.

Sarah Jones @SarahJonesvoxs

Apr 29

a language so elegant it could lull a codebug into a peaceful slumber

🇪🇺🌴

@richydasilva

Apr 22

Replying to @ImAntCalabrese

People don't realise that AI is devaluing all the wizardry around technical ability. The math geek or codebug that could get a job just by proving how good they were at those things is being outperformed by some AI. The boomers that got sparky eyed at someone who could code Python are dying off & retiring. Millennials who grew up around technology & understand the shift are moving into leadership positions. Now it's all about who you know, whether you can craft an interesting narrative around your experience & whether you can draw the right ppl's attention to you.
