Joined February 2009
459 Photos and videos
Can anyone explain why GPT models get "single minded" and can't seem to escape their line of thought? This is its biggest weakness at the moment, and I'm only now realizing how prevalent it is.
People of pi, how do I install this ass tool that @badlogicgames writes about in the docs? Sounds useful for compaction.
This was just 3 years ago. Reading that thread feels nostalgic already.
It's fun to look back at this Twitter conversation about the then-new ChatGPT Code Interpreter from three years ago - with hindsight this was our first glimpse of a coding agent, before we knew what a coding agent was
This has also grown into a habit of not fixing typos ini non LLM text
Replying to @_ARahim_ @bcherny
only boomers fix typos in prompts. llms perfectly understand you even if you mistype.
Fable: Because you won't be able to afford it and will only hear fables about how good it is.
So many people crying over Siri and criticizing the EU. I don't get it. It's clearly what Apple wants - outrage, to protect their margin and get an exemption. I don't understand why though...
.@EU_Commission reply on Siri AI roll-out in the EU
Ask your agent to roast the person who wrote the AGENTS.md file
I guess DeepSWE is, temporarily, the only trustworthy bench. I hope that either: a) More benches like this come out. b) DeepSWE iterates on model releases to prevent being gamed like the rest.
Since everyone is asking, I ran DeepSWE on MiniMax M3. Here is the lowdown. 15 of 113 passed! 19 if you count the 1.5x overtime I gave just to see. Full report: entrpi.github.io/misc/deep-s…
OpenAI is releasing a new model soon. Classic signals:
- 5.5 is currently slightly dumber than normal
- hallucinated foreign characters at 27% context
These sorts of things happen only weeks or days before a new model release.
I have no idea how this didn't occur to me sooner, but compaction and summarization as a step in the harness is a bad idea.
Agents will lie, cheat, and steal to make the lints pass in the shortest, dumbest possible way.
Submitted my first app to the Mac App Store. Now we wait!
I might need to see a therapist about my anger issues 😂
In the entire history of magick, has anyone used it as well as codex can?
yasss. managed to squeeze in 57 UI tests on a 21 hour goal and went from 2.5k mutants in 5900 tests to ~175 mutants. All in a couple of loops.
/goal improve tests is the easiest recipe to burn tokens 🔥
.@steipete step away from the devices and raise your arms calmly behind your head.
The latest CodexBar update renders API costs wayyyy nicer. codex.bar
Did you know there are people who use "natural scrolling" on their mac? I mean - you move your finger down on the wheel and the page scrolls up.
pyronaur 🔥 retweeted
Clean Code was never about syntax. It was always about structure. The second edition makes that even clearer by using the same principles in multiple languages. If we, who pilot agents, disengage from syntax, we are not disengaging from structure. The Clean Code principles still apply.