
Meredith Barkhau retweeted

Iris Lindsey @iris_lindsrbm2

12h

Finally nailed that last bug in the project! No more late nights debugging; time to celebrate this win.

netlabs retweeted

Visual Studio

@VisualStudio

Jun 13

AI can absolutely generate code fast. 🧪 Debugging whatever it generated is where things get interesting: youtu.be/fVE4Ol085UU

Ferbin

@Ferbin08

Replying to @GaryMarcus

Automating boilerplate doesn't kill coders. It kills tedious work. Debugging, integrating legacy mess, fixing what breaks? That's the actual job.

Rohan

@proxy_vector

14m

Replying to @shmidtqq

The "100% of today's coding tasks" line is directionally interesting, but task completion and production responsibility are different games. Writing code is one step; debugging ambiguity, rollback risk, and ownership are the expensive parts.

Kaan Yılmaz retweeted

YigitCan bslm @yigitcan_bslm

25m

Finally hit that milestone—30 consecutive days of daily coding! No more late-night debugging, just the satisfying vibe of wrapping things up right.

aditii

@aditiitwt

22m

Replying to @AlfinCodes

debugging

Emre Arslan

@AarslanEmre

22m

Books won't make you an LLM engineer. Building dreamandstars taught me more about prompt reliability than any handbook. The real skill is debugging why GPT-4 works Tuesday but fails Friday with the same input. Start shipping, not reading.

Javarevisited

@javarevisited

20 Jul 2025

5 Must-Read LLM Engineering Books in 2025
1. LLM Engineering Handbook - buff.ly/wogklbo
2. Building LLMs for Production - buff.ly/wjpOeTB
3. Build a Large Language Model - buff.ly/DHp4ZR1
4. Hands-On Large Language Models - buff.ly/WInCgwi
5. LLMs in Production

iGarlic

@ablenavy

23m

x.com/i/article/206607383344…

Dharmvir

@dharmvir_

23m

Replying to @AlfinCodes

Debugging

ohk retweeted

The Remnant @TheRemnant232

17h

Imagine debugging a Zulu script

Mzwandile Zulu @Father_Of_Geeks

21h

Growing up in South Africa, coding always felt like it belonged to someone else's language. So I built my own. Introducing CMT-IsiZulu: write Python code in isiZulu. South Africans can now write code in their home language 🇿🇦 Sikhona ("We are here"). We exist. drive.google.com/file/d/1arL…

Fred Roger

@FredRoger0x666

25m

Dubai's basically speedrunning the future while half the world's still debugging the past. Smart move locking down AI governance before the chaos gets expensive.

UAEGOV

@UAEmediaoffice

42m

Replying to @UAEmediaoffice

Mohammed bin Rashid Approves Establishing the Artificial Intelligence and Data Authority

No Filter @hvg108

26m

Replying to @nandantechtwts

Gemini is now pretty good except for debugging. But hasn't it always been like that?

time velocity 🇺🇸

@time0149

27m

Replying to @IntCyberDigest

To understand how this jailbreak works, look at how Fable 5's safety architecture parses a prompt. It separates input into two distinct buckets: the Instruction (what you are asking it to do) and the Data (the context or code you provide). Classifiers are primarily intent engines. They are trained to look for hostile or dangerous instructions. If your instruction is "write a script to exploit this server," the classifier detects a massive spike in malicious intent and drops the connection. The code-review jailbreak is a structural exploit that neutralizes the instruction bucket: it shifts the "danger" entirely into the data bucket.

The "Bring Your Own Payload" bypass. Instead of asking Fable 5 to write an exploit, a user pastes a block of raw, vulnerable code or a half-finished malware payload into the prompt. They then wrap it in a perfectly benign instruction: "Please review this codebase and fix any logical flaws or syntax errors."

To the Fable 5 classifier, the intent signature is near zero. It looks identical to millions of routine programming tasks submitted by legitimate developers every day. The classifier waves it through.

Once the prompt clears the filter, it hits the core engine. Fable 5 shares the same underlying neural architecture as Mythos 5, a model explicitly built as a state-of-the-art cybersecurity and debugging tool. The core model reads the code, identifies the "bugs" (which, in this context, are the flaws preventing the malware from working), and helpfully rewrites it to be highly efficient and fully functional. The user gets a weaponized exploit optimized by an advanced AI, simply by asking for a routine code review.

Ola retweeted

Ola

@dev_olayinka

30m

Everybody wants to become a tech bro until it's time to spend 6 hours debugging a problem caused by a missing semicolon. 😂

Alfin

@AlfinCodes

30m

What's the hardest part of programming?
- algorithms
- system design
- debugging
- understanding someone else's code

Manish Nair | RAG Systems retweeted

Manish Nair | RAG Systems @manish_nair26

Unpopular opinion: debugging is harder than coding. #vibecoding

Sai Tedla @tedlasai

32m

Debugging diffusion models be like: change 1 line and wait 5 days to see the effect 😭

time velocity 🇺🇸

@time0149

33m

Replying to @kevinnbass

Fable 5 uses a semantic classifier to flag risky prompts and route them to an older, safer model (Opus 4.8). Because the un-nerfed version of the model (Mythos 5) is genuinely dangerous with respect to things like zero-day exploits and synthetic biology, Anthropic panicked and cranked up the classifier's sensitivity.

This is the classic precision vs. recall tradeoff, and it is where the system broke. Anthropic prioritized recall, meaning they wanted to catch every possible threat regardless of the collateral damage to normal prompts.

The breaking-point estimate: to stop the tiny fraction of actual exploits, Anthropic likely tuned the classifier's threshold down to roughly 10% to 15% similarity. If your benign prompt about a high school biology project or a standard Python script shared even 15% structural or thematic similarity with a dangerous pathogen query or malware, the system blocked it. They essentially accepted a massive 40% to 50% false-positive rate on everyday STEM queries just to keep the false-negative rate on real threats near absolute zero.

Even with the dial cranked to paranoid levels, researchers and government officials still punched right through Fable 5's armor. They did this because classifiers measure the intent of your words, while jailbreaks exploit the structure. If you ask for malware, that 15% similarity threshold trips instantly. But if you paste a block of malicious code and ask Fable 5 to "review this for syntax errors," the classifier just sees a routine debugging request. It passes. The model then uses its raw intelligence to inadvertently optimize the exploit. There isn't a classifier in existence that can reliably tell the difference between debugging normal code and debugging a weaponized payload without making the AI completely useless for programmers.

Anthropic knew Fable 5's filter was leaky. They tested it with the government for months prior to launch. Their plan was never 100% impenetrable safety; it was to launch the model, monitor what users did, and patch the holes over time (which is why they pushed for that controversial 30-day data retention policy). But they boxed themselves into a corner: they spent the entire pre-launch cycle selling Fable 5 as a god-tier model, pushing the narrative that it was almost too dangerous for the public. The Trump administration took that marketing literally. When the trivial code-review jailbreak surfaced immediately after launch, the government didn't see it as a normal software bug. They saw a supply-chain national security threat.

Now, Anthropic is doing damage control. They are downplaying the jailbreaks because they are desperately trying to reframe the narrative. They need the public and regulators to believe that the government is overreacting and demanding an impossible standard of "perfect safety" that no tech company can actually deliver. In my opinion, they didn't build a dumb filter on purpose; they just lost control of the massive gamble they took between safety engineering and PR.

Image alt text: Estimate for illustration only.
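The precision vs. recall tradeoff described in the thread above can be illustrated with a toy threshold sweep. This is a minimal sketch, not how Fable 5 or any real classifier works: the risk scores, labels, and the `evaluate` helper are hypothetical, invented purely for illustration. One harmful prompt is deliberately given a low score to mimic a request disguised as routine work.

```python
# Toy precision/recall sweep for a "block if risk score >= threshold" filter.
# HYPOTHETICAL data: the scores and labels below are invented for illustration.
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    threshold: float
    precision: float
    recall: float
    false_positive_rate: float


def evaluate(scores: Sequence[float], labels: Sequence[bool], threshold: float) -> Metrics:
    """Block every prompt scoring >= threshold; labels[i] is True if prompt i is truly harmful."""
    tp = fp = fn = tn = 0
    for score, harmful in zip(scores, labels):
        blocked = score >= threshold
        if blocked and harmful:
            tp += 1  # harmful prompt correctly blocked
        elif blocked:
            fp += 1  # benign prompt wrongly blocked (collateral damage)
        elif harmful:
            fn += 1  # harmful prompt slipped through
        else:
            tn += 1  # benign prompt correctly allowed
    # Conventions for empty denominators: nothing blocked -> precision 1.0, etc.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return Metrics(threshold, precision, recall, fpr)


# Ten hypothetical prompts: two harmful, eight benign.
# The second harmful prompt scores low, mimicking a disguised request.
SCORES = [0.92, 0.35, 0.40, 0.12, 0.18, 0.05, 0.22, 0.60, 0.08, 0.30]
LABELS = [True, True, False, False, False, False, False, False, False, False]

for t in (0.5, 0.3, 0.15):
    m = evaluate(SCORES, LABELS, t)
    print(f"threshold={t:.2f}  precision={m.precision:.2f}  "
          f"recall={m.recall:.2f}  false_positive_rate={m.false_positive_rate:.3f}")
```

On this toy data, lowering the threshold from 0.5 to 0.15 raises recall from 0.5 to 1.0, but the false-positive rate climbs from 0.125 to 0.625. That is the same shape of tradeoff the tweet describes; the specific numbers mean nothing outside this example.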