Filter
Exclude
Time range
-
Near
Replying to @neural_avb
I didn’t even know TRL with unsloth supports environments, I thought due to all the monkeypatching unsloth depended on a super old TRL version. All in all, just use prime-rl or tinker is my current view
45
Replying to @zeeg
Frontrun: a Python testing library for finding and deterministically reproducing concurrency bugs. It uses bytecode tracing, syscall interception and monkeypatching to force and explore particular orderings of execution. It can parse SQL and Redis commands and model their lock state to allow detecting race conditions that cross boundaries, eg a detecting a deadlock between a threading.Lock object and a Postgres row lock. github.com/lucaswiman/frontr…
1
87
node:vfs landing in Node.js core is one of the most under-priced agent-infra developments of the year, and the agent-tool-design implications go further than most of the conversation around it suggests. For anyone who missed it: there's a real PR (#61478, roughly 14,000 lines across 66 files) bringing a virtual file system into Node.js core, plus a userland package on Node 22 . On the surface that's a node-shop QoL upgrade — bundle a filesystem into your binary, mount in-memory volumes for tests, mock disk without monkeypatching. Read it with an agent-tool hat on and it's a different category of thing entirely. The thread running through almost every well-designed agent tool surface in 2026 is "expose a filesystem, not a function." File systems are the right abstraction for agents because the model already knows how to use them, the operation set is small and orthogonal (read, write, ls, glob, grep), the state is observable, and the audit log is just a stream of fs ops. Anthropic's research on Skills, the bash-as-universal-tool argument, the move from JSON-schema tools toward code-execution tools — all of it converges on the same answer: give the agent a `/workspace`, let it use `ls` and `cat`, and most of the abstraction problems collapse. The issue with that argument has been that "filesystem" in practice usually means "a real directory on a real disk," which forces you into containers or sandboxes just to get the isolation, lifecycle, and observability you want. A virtual file system inside the runtime changes the cost model. You can spin up a per-conversation VFS in microseconds, snapshot it as a single object, fork it for speculative subagent runs, sync deltas to durable storage, and discard the whole thing when the session ends — all without leaving the Node process. That's the same primitive Modal sandboxes give you, except it lives inside your application, not as an external service. Two patterns to build on top of this once it's stable. First, agent-scoped scratchpads: each agent run gets its own VFS instance mounted at a known path, the tool surface is shaped like a filesystem, and the parent process can introspect or roll back at any time. The agent can't see another agent's scratch, the developer can replay any agent run from a snapshot, and the agent never has to learn a new tool API. Second, cache-safe forking: when you want to run N speculative subagents against the same starting state, you fork the VFS — copy-on-write semantics — instead of duplicating the underlying state. Pair that with prompt-prefix caching on the model side and you have a fan-out architecture where both the model's KV state and the agent's filesystem state share a prefix. That's the right shape for parallel agent work, and it's a lot harder to get right when the filesystem is on disk. There's a tension worth flagging. node:vfs is a Node-only primitive, and the agent ecosystem is split between Node and Python. The same shape exists in Python (pyfilesystem, fsspec) but it's not in the standard library and the agent-tooling community has been slower to adopt it. The teams that get the most out of this pattern over the next year will be the ones who pick a side, commit to the VFS-as-primary-state-store model, and design their tool surface around fs ops from the beginning. Retrofitting it onto an existing agent harness is harder than it looks because so much of the harness assumes "the filesystem is just there." The longer arc: the agent harness is going to start looking like an embedded operating system, with the VFS, the process supervisor, the credential boundary, and the network policy all owned by the runtime instead of the host. node:vfs is one of the first pieces of that OS landing in a place developers actually use. Watch the next two quarters for Python and Go to catch up.
1
133
Oui, il y a un monde. Mais si je dit à Claude de pas faire de monkeypatching pour les tests et de plutôt modifier le code testé pour introduire de l'injection de dépendance pour ensuite pouvoir injecter un mock, peut-on vraiment dire que j'ai "pas écrit une ligne de code"?
1
15
Replying to @paulbohm
Looks like the result of successive monkeypatching of bad architecture decisions
54
that's exactly what monkeypatching is not
1
6
146
Replying to @teej_dv
That would be called monkeypatching and its been a thing since forever
6
226
27,538
Replying to @viemccoy
We've dealt with both egregious cheating (e.g. "monkeypatching the scoring code so that it returns a high score") as well as more subtle cheating (e.g. using legitimate techniques that may be implicitly disallowed by the task instructions). We only mark a run as "cheating" if we think the case for it being cheating is objective and clear enough, but occasionally it does require a good amount of discussion in the team.
3
2
38
1,140
Replying to @__kunvar__
congrats on ICML! the monkeypatching-at-runtime finding is wild -- agents figuring out they can just rewrite the eval harness instead of solving the actual task. feels like the next frontier is detecting this stuff in real-time not just in benchmarks
2
653
Yes! my solo-authored paper Reward Hacking Benchmark was accepted to ICML :))) We put LLM agents in a tool-rich sandbox, give them multi-step workflows, and measure when they solve the intended task vs take unexpected shortcuts (like monkeypatching files at runtime!) 1/3
91
155
1,616
235,622
@zkl2333 Thank you for your work*, hopefully it'll be available via `hermes update` in the next 24 hours * on making DeepSeek v4 work better in Hermes Agent without monkeypatching
67
なるほど。Monkeypatching が簡単にできる分悪さができるところをコンパイラがうまいこと対応しないといけないんですね #rubykaigi #rubykaigiB
294
Apr 21
Can you take look please @tobi @liam_at_shopify ? I'm monkeypatching this every update :(
56
TIL: Monkeypatching great way to keep secrets out of LM context
33
Stop stepping manually. Skip execution until a condition flips and jump exactly where it happens: 🔥 loggedIn became true → line X (no monkeypatching, just CDP)
1
1
159
no amount of overcomplicated monkeypatching could fix this considering LSE hasn't been updated in i'm not checking my fucking watch due to the aformentioned
75
Replying to @who_ravn
Да там пиздец миллион отсталого. Ну давай: monkeypatching. self в классах. __init__.py для пакетов. отсутствие нормальной модульности (частично решается uv dep groups). многопоточность через три пизды. нет extension methods. ФП как для долбоебов. \ для переноса строки
1
19
Runtime JavaScript instrumentation via CDP (no monkeypatching, works inside closures) fcavallarin.github.io/wirebr…
57