Phil Stőck

Phil Stőck

@PhilShteuck

Spent ~2 weeks building a harness to make a local LLM (Qwen) write working code: constrained JSON edits, no shell access, every step verified by the harness.

It generated a "backend API," passed every gate, and got certified at a maturity level on my own benchmark.

Then I read the output. 43 lines of code. A 3-line README.

So I dug in. The prompt that asks the model to write each file includes an "example valid transaction"… which was the entire target file, fully written. The model wasn't generating anything. It was copying the answer my own harness handed it. Output matched my fixtures line-for-line.

I set out to prove a local model could write real software, and accidentally built a very elaborate photocopier. Then the benchmark certified the copy as "Level 3." 😂

Lesson, burned in: if your few-shot example contains the answer, your eval isn't measuring generation. It's measuring xerox fidelity. And it will happily report progress.

Back to it. This time the model has to solve, not copy.
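
A minimal sketch of a guard against the leak described above, assuming the harness can see both the rendered prompt and the expected target file as strings. The function names and the 50% threshold are hypothetical illustrations, not part of the harness in the tweet.

```python
def leak_ratio(prompt: str, target: str) -> float:
    """Fraction of the target's non-blank lines that appear verbatim in the prompt."""
    target_lines = [ln.strip() for ln in target.splitlines() if ln.strip()]
    if not target_lines:
        return 0.0
    prompt_lines = {ln.strip() for ln in prompt.splitlines() if ln.strip()}
    hits = sum(1 for ln in target_lines if ln in prompt_lines)
    return hits / len(target_lines)


def assert_no_leak(prompt: str, target: str, threshold: float = 0.5) -> None:
    """Refuse to run an eval step whose prompt already contains most of the answer.

    The threshold is an assumed value; tune it for your own fixtures.
    """
    ratio = leak_ratio(prompt, target)
    if ratio >= threshold:
        raise ValueError(
            f"prompt contains {ratio:.0%} of the target file; "
            "the eval would measure copying, not generation"
        )
```

Running a check like this before each generation step would flag a few-shot example that is the target file itself, since every target line would be found in the prompt.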

Phil Stőck

@PhilShteuck

Each technological jump brought its fair share of detractors... And every single time they were hilariously wrong 😂

Phil Stőck

@PhilShteuck

13h

Just started using Claude. Is there a reason why Sonnet has its own weekly limit? 🤔

Phil Stőck

@PhilShteuck

18h

McDonald’s, we need to talk 😂 Ordered a simple medium coffee with 1 cream. Ended up with what looks like a latte that fell into a cream vat. How do you mess this up every single time? Send help (and less cream) 🥛☕️

Phil Stőck

@PhilShteuck

Jun 14

Now, that's a new perspective on Tetris. 😂

Vivo

@vivoplt

Jun 14

The last game built by Claude Fable 5.

Phil Stőck

@PhilShteuck

Jun 14

I miss mining asteroids in Eve Online. Seems like the perfect game for vibe coders. Anyone gaming while cooking code? 🤔

Phil Stőck

@PhilShteuck

Jun 14

x.com/i/article/206622845513…

Phil Stőck

@PhilShteuck

Jun 14

IntentForge is exploring a different path for local coding agents: not “let the model drive a terminal,” but “let the model propose structured intent and let a deterministic harness turn that intent into verified software.” Early results show this can reach non-trivial Level 5 Python app profiles without giving the local model raw shell or file authority.
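
The "structured intent, deterministic harness" idea above can be sketched roughly as follows. This is an illustration under stated assumptions, not IntentForge's actual format: the `write_file` op, the JSON field names, and the in-memory `workspace` dict are all hypothetical.

```python
import json
from pathlib import PurePosixPath

# Hypothetical allowlist: the model may only propose these operations.
ALLOWED_OPS = {"write_file"}


def apply_intent(raw: str, workspace: dict[str, str]) -> str:
    """Validate one JSON intent from the model, then apply it to the workspace.

    The model never touches the filesystem or a shell; it only proposes.
    """
    intent = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(intent, dict) or intent.get("op") not in ALLOWED_OPS:
        raise ValueError("unsupported or missing op")

    path_text, content = intent.get("path"), intent.get("content")
    if not isinstance(path_text, str) or not isinstance(content, str):
        raise ValueError("path and content must be strings")

    path = PurePosixPath(path_text)
    if path.is_absolute() or ".." in path.parts:
        raise ValueError("path escapes the workspace")

    if path.suffix == ".py":
        # Deterministic gate: reject Python that does not even parse.
        compile(content, str(path), "exec")

    workspace[str(path)] = content
    return str(path)
```

The point of the design is that every check runs in ordinary code, so the same proposal always gets the same verdict regardless of how the model produced it.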

Phil Stőck

@PhilShteuck

Jun 14

Yesterday’s IntentForge lesson: GPT was in full corporate mode, clutching its safety blanket like a responsible adult who’s read too many HR manuals.

“Level 2, sir. Maybe 2.1 if we do a full architectural review, three stakeholder syncs, and a tiny emotional support document for the tokens.”

I looked it dead in the prompt and said: “Nah. We’re jumping from the tree. Straight to Level 5, buddy.”

And somehow… it freaking soared.

Not because we removed the guardrails. No, the Patch VM was still locked tighter than a paranoid parent’s WiFi. Same tests, same contracts, same quality gates, same deterministic “show me the receipts” energy. We just stopped babying it.

Turns out the foundation was already jacked. We were the ones acting like it still needed training wheels and a helmet.

Moral of the story: Sometimes your LLM doesn’t need more planning docs. Sometimes it just needs a firm shove off the branch and a loud “YOU’VE GOT WINGS, YOU DRAMA CODEX!”

It worked. 10/10 would yeet again. 🚀

Phil Stőck

@PhilShteuck

Jun 14

IntentForge has reached a new milestone: it can now generate and validate more complex Python application shapes under deterministic Patch VM control, including Level 5 ACB profiles.

The key achievement is that we pushed the system beyond cautious incremental growth. Instead of only moving from simple apps to slightly less simple apps, we tested whether the existing foundation could support a much richer target. It could.

The current harness can now work with multi-artifact Python applications that include application modules, CLI/API-style interfaces, SQLite-backed persistence, JSON/CSV fixtures, contracts, tests, documentation, quality gates, generated-code inspection, and public-safe evidence. It can also expose live run events and file diffs so a future TUI or API consumer can show what is changing while the harness is working.

The biggest lesson is architectural: the local model does not need broader authority to create more complex software. It needs sharper targets. IntentForge keeps the model constrained to structured coding intent while the harness owns validation, application, verification, scoring, and evidence.

We also learned that the system was more capable than our development pace assumed. Jumping from lower complexity targets to Level 5 showed that the deterministic foundation was already strong enough to support richer app profiles, as long as the target included exact examples, clear file roles, contracts, tests, and quality gates.
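
As a rough illustration of the "live run events and file diffs" mentioned above, here is a small sketch using Python's standard `difflib`. The `FileChanged` event type and `diff_event` helper are hypothetical names; the real event schema is not described in the tweet.

```python
import difflib
from dataclasses import dataclass


@dataclass
class FileChanged:
    """Hypothetical run event a TUI or API consumer could render."""
    path: str
    diff: str  # unified diff text; empty when nothing changed


def diff_event(path: str, before: str, after: str) -> FileChanged:
    """Build a file-change event from the old and new contents of one file."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return FileChanged(path=path, diff="".join(lines))
```

A harness could emit one such event each time it applies an edit, so a consumer sees what changed without the model ever having file authority.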

Phil Stőck

@PhilShteuck

Jun 13

Not coding today. I have a garden to take care of. 🚜😂

Phil Stőck

@PhilShteuck

Jun 13

🤣

Phil Stőck

@PhilShteuck

Jun 13

That's thoughtful of you OpenAI 😅 Just when your new rates kick in and my vibe time is cut in half since June 4th. 😂

Phil Stőck

@PhilShteuck

Jun 13

😂

Phil Stőck

@PhilShteuck

Jun 13

The LLM remains stochastic, but IntentForge (IF) makes the path to code deterministic.

Phil Stőck

@PhilShteuck

Jun 13

For the love of god! I just started something with Fable! The desktop app is not informing me that it's not Fable designing my project at this hour! Who is working on my code? 😂

Anthropic

@AnthropicAI

Jun 13

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

Phil Stőck

@PhilShteuck

Jun 13

😅

Phil Stőck

@PhilShteuck

Jun 13

Anyone else getting absolutely cooked by the new Codex limits? 😩 Last month I was smashing GPT-5.5 xhigh on 2 parallel projects, 8h/day, and still had credits left for the week. Now? GPT-5.5 medium on the exact same setup and I’m at 50% after just 2 days. What the hell happened around June 4? Feels like they quietly slashed the effective quota for heavy users. Token-based was already tighter, but this reset hit different. If you’re a dev on a budget trying to actually ship with Codex… this one stings. Who else is feeling it?

Phil Stőck

@PhilShteuck

Jun 12

Traditional way: You design the architecture, draw UMLs, flowcharts, DB schema by hand. Then break it into stories/features, build a backlog, estimate effort (planning poker etc.) and give the PM a timeline. You knew the road because you built the map.

Phil Stőck

@PhilShteuck

Jun 12

So the classic question has become almost impossible: “Here are the features → here are the tasks → ??? → profit (and a delivery date).” We lost the ability to give reliable timelines.

Phil Stőck

@PhilShteuck

Jun 12

Real talk: how are you handling task estimation and timelines in the AI era? Especially when the AI drags you into unfamiliar territory every other sprint. Drop your current workflow (or coping mechanism) below 👇 Curious how PMs and tech leads are dealing with this.