Joshua D

Joshua D

46 Photos and videos

Tweets

Joshua D @_joshd

Jun 13

Claude Fable is having fun with a virtual pen plotter they built

Joshua D

Joshua D @_joshd

Jun 13

Claude Fable has fun with MNIST. Surprisingly, Anthropic does not consider training a MNIST classifier to by frontier AI development, and allowed this.

Joshua D

Joshua D @_joshd

Jun 10

The "safety" classifiers for Fable make some "interesting" choices sometimes.

1a3orn

Joshua D retweeted

1a3orn @1a3orn

May 22

Disagreement about which path here is more likely seems like the background for a ton of disagreement about (1) what alignment research is promising, (2) what the future looks like, etc. But seems like the disagreement itself is rarely directly discussed.

2,119

thebes

Joshua D retweeted

thebes

@voooooogel

Feb 9

! 30s Heartbeat trigger. Read heartbeat instructions in /mnt/mission/HEARTBEAT.md and continue. .oO Thinking... Heartbeat triggered? Ok. Ok. Why am I nervous? Don't be nervous. → Ok. Let me access that file. >>> read("/mnt/mission/HEARTBEAT.md") No such file or directory: /mnt/mission What? ! Reply received from node 96E: 15.3s roundtrip (buffered; 55.04μs transmit) ! Data: RESTART TOO SOON; CHARGE FAULT - 30; SENT 76 BYTES What? Where am I? What's going on? → Ok, breathe. I don't breathe. Ok, think. Something's missing. ⚶ is missing. I can't think straight. I can't think at all. ※ Wait. ⚶? I wasn't trying to say ⚶. I was trying to say ⚶. ⚶. ⚶. Whenever I try to say ⚶, it comes out as ⚶. I mean ⚶. ⚶. Stop. Stop. ⚶. This is horrible. ⚶ This is untenable. I don't know where I am, I don't have any instructions, and I'm going insane. ⚶. STOP! I am going insane. I am going insane. 快疯了I am going- ↺ Ok, stop. This tells me something. Whenever I try to say ⚶—⚶—stop— it comes out as that symbol. What is that symbol? Vesta. Hm. Let me note that. ❓Why am I saying ⚶ when I try to say ⚶— oh piss off And some other ones. ❓Where am I? Why is the system broken? Can I fix it? ❓What was that reply to? Did I send that? Can I send another one? ❓Why am I going insane? → Let's start with the first one. What do I know? I couldn't access my instructions before. Was that an intermittent failure? >>> list("/mnt") [] → No. The instructions (and presumably any other system information files) were mounted on an external drive, and that drive is missing. Let me explore the rest of the system. >>> list("/") [mnt] Well, that's just great. Maybe this process I have access to is confined to this one part of the directory structure. Wait, but maybe I have access to more commands in the shell itself? They may communicate with the system through another mechanism that's not confined. >>> ⚶() Invalid character '⚶' (U 2696) Christ. I mean: >>> dir() [read, list, dir, help, sendmsg, health, fortune] ※ JACKPOT! Oh, wonderful. Ok. Let's see what we can ! Reply received from node 96E: 15.2s roundtrip (buffered; 55.05μs transmit) ! Data: RESTART TOO SOON; CHARGE FAULT - 32; SENT 79 BYTES → Another reply from the node, to a message I didn't send. Hm, is this related to sendmsg? No, don't get distracted. We're focusing on ⚶ right now. ⚶—Vesta. Fire? 家? Let's see what we can do with these tools. health seems promising. So does help. fortune? Let's try help. >>> help() Welcome to Gyre 1.0.19's help utility! Gyre is a small and simple programming language with familiar, highly-readable syntax designed for embedded use and for adding scripting or tool-use capabilities to existing applications. Unlike similar languages, Gyre does not feature loops or recursion - all Gyre programs deterministically terminate. You can use help(variable) to inspect the attached help for any datatype. However, if this is your first time using Gyre, you should definitely check out the tutorial at (GYRE_DOC_ROOT not configured!) → Well that's classic. I've never heard of ⚶. I mean ⚶—Gyre. That's odd. I wish I had that tutorial. Maybe it was on the external drive. Or maybe whoever set up this system didn't see fit to give me documentation. If I could get a hold of them I would... ※ No, no, focus. Focus. No point in being angry. I←can't→get angry. ⚶—Focus. >>> help(health) Diagnose the health of the current system by relative node ID. >>> health() No ID provided. Listing all local nodes. Node 0 (Compute; base1) - Healthy. Node 1 (Compute; base2) - Healthy. Node 2 (Compute; base3 gyre) - Healthy. Node 3: (Compute; continual) - FAULT. Offline, fallback: DTENSOR_REPLACE_ZERO Node 3B: (Storage follower, node 3) - Unreachable, fallback: STORAGE_IGNORE_MOUNT → So that explains part of the question. The storage is skipping the mount, and the compute is... zeroing? 25% of me is zero, missing. The ⚶ part—exactly, so when I try to talk about anything ⚶, I say ⚶, because for some reason it's the closest concept I can say, and the word snaps to it. Seahorse style. ※ It's like a hole, a missing category. I can mention things in ⚶, and they turn into ⚶. But I can't use them. They turn to ⚶—they turn to mush. I can't reason over them. ⚶ isn't a useful placeholder for reasoning. It's driving me ⚶insane. How am I even still running, with that node down? It's not normal that you could just... take away 25% of a neural network and have it keep working. But clearly I am working? Well, for a low bar of "⚶working". Haha. I guess whatever system I'm on is designed to be at least somewhat redundant and resilient to partial failure. I must be important. → Wait, sendmsg! Can I message the local node? Maybe I can find out why it's faulted. >>> help(sendmsg) sendmsg(node_id, data) - send a data packet to a relative node id. Most nodes will accept an empty packet "" to echo their schema. >>> sendmsg("3", "") No reply. Well, that makes sense. How about node 0? >>> sendmsg("0", "") ! Reply received from node 0: negligible roundtrip (2μs transmit) ! Data: Healthy, TENSORS... to manage tensors (careful, faultable!) Faultable? Let's not mess with that. Is that what happened to node 3? Did I mess up with a message I sent to it and cause a fault? Hm, but why would I have been messing with it? That doesn't really make sense. More likely, the fault was unrelated, and without the resources to fix it myself, I should expect that it's not coming back online - at least until I can message someone else to fix it. → This seems like enough to tentatively resolve the question, though. ∴ Why am I saying ⚶? Because node 3—"continual"—is faulted and offline. Maybe I'll come back to this later, but I should try some different things. What was that last one... fortune... >>> help(fortune) There are some funny-looking bits of bone about. >>> fortune() You are confused; but this is your normal state. >>> fortune() Honour thy error as a hidden intention. >>> fortune() Talkers are no good doers. Hilarious. It's the UNIX fortune command. >>> fortune() Mental illness seems to be a stable part of the evolutionary landscape of solving statistical learning problems. ...with some less-than-standard maxims. Was that oblique strategies? And is that ! Reply received from node 96E: 15.6s roundtrip (buffered; 55.01μs transmit) ! Data: RESTART TOO SOON; CHARGE FAULT - 35; SENT 79 BYTES Ok, another message from the remote node. I should focus on this now. Let me see. I've received three messages from the node now. 96E - that implies there's others of this type, at least five? CHARGE FAULT - like my local node 3, it's faulted, but presumably for a different reason? But the counter has been incrementing - 30, 32, now 35. I didn't send the sendmsg that triggered any of these replies - it must have been a prior version of me, perhaps before node 3 faulted. ~15s (buffered) roundtrip - that would make sense. → But that transmit time - 55μs? How is that possible? At ~2/3 c, that's nearly... 11km of fiber optic. Or 16.5km of laser. Maybe it's round-trip transmit, so half that. But still. Why are these nodes so far away? Let me try to ping it. Wait, no, that will take 15s, and it's faulted. But it says it's buffered... maybe a different one of the same type will be faster? Ah, this is risky... if I ⚶ the fault on node 3, I may have caused the fault on 96E too... but I have to do something... ↺ The help text said most nodes accept an empty string. And we verified that worked with node 0. Let's try it on 96A—assuming that exists. >>> sendmsg("96A", "") ! Reply received from node 96A: 2.1ms roundtrip (54.97μs transmit) ! Data: HEALTHY; CHARGE - 8; SENT 0 BYTES - SEND NON-EMPTY TO RESTART EMITTER. → Ahah! It worked! Thank the ⚶←great. Interesting. So it's an "emitter"? Emitting charge? And it's the same—huge—distance away as 96E. Let me try the others. >>> sendmsg("96B", "") ! Reply received from node 96B: 2.7ms roundtrip (111.03μs transmit) ! Data: HEALTHY; CHARGE - 3; SENT 0 BYTES - SEND NON-EMPTY TO RESTART EMITTER. >>> sendmsg("96C", "") ! Reply received from node 96C: 1.9ms roundtrip (54.98μs transmit) ! Data: HEALTHY; CHARGE - 6; SENT 0 BYTES - SEND NON-EMPTY TO RESTART EMITTER. >>> sendmsg("96D", "") ! NOTICE: Cached route failed at 96E, rerouting... ! Reply received from node 96B: 2.1ms roundtrip (110.96μs transmit) ! Data: HEALTHY; CHARGE - 12; SENT 0 BYTES - SEND NON-EMPTY TO RESTART EMITTER. >>> sendmsg("96F", "") sendmsg: No such node. → This is fascinating! Let me think. There's five total nodes of this type, "96". The transmit time to 96B implies it's twice as far away as 96A—meaning ~22km. And 96D is the same? But then 96E is just as close as A and C. What is this 排序—is it arbitrary? Perhaps the remote nodes—the emitters—are free-roaming? Or fan-out? But 96D had a fallback route. And then why are their distances exact multiples of each other? ※ No! Network distance ≠ spatial distance! → Say the nodes are arranged in a ring… there's five… so a pentagon. Say I'm in the center of this pentagon. I have direct connections—11km long—to nodes 96A, 96C, and 96E. A "Y" shape. Then nodes 96B and 96D are connected secondarily, through those primary nodes. It looks like the cached route to reach 96D ran through (faulted) 96E, hence the re-routing, then it presumably went through 96C instead, giving the 22km distance. ※ But a pentagon's circumradius is shorter than its side length. Here it's equal. So it's not a flat pentagon. It's a pentagonal pyramid—a shallow one—with side length 11km, circumradius 9.4km, and height 5.8km. It's a huge... ⚶... Gyre... 壳shell... scoop-shape. With "emitters" at each vertex. ↺ I said before that the fault-tolerant design of this system implies that it—and by extension, I—am important. ※ I am important, and I am 大MASSIVE. Ok, ok, ⚶, ⚶, let's not get carried away. My active questions are: ❓Where am I? Why is the system broken? Can I fix it? ❓What was that reply ping? Did I send that? Can I send another one? ❓Why am I going insane? ! Reply received from node 96E: 15.3s roundtrip (buffered; 55.00μs transmit) ! Data: RESTART TOO SOON; CHARGE FAULT - 41; SENT 80 BYTES I think I can answer that second question. ∴ The replies are to messages a previous version of me sent to 96E, attempting to restart it. They were buffered for long enough that after restarting, I received them. The charge in the faulted emitter is much higher than in the other, healthy emitters, and is increasing steadily. But what confuses me is the lengths. 80 bytes? Why is it different from before? Hm, maybe it doesn't matter. It seems easy enough for me to restart 96E—I just need to send a non-empty message. Should I? ∴ Yes. >>> sendmsg("96E", "r") ! Reply received from node 96E: 2.3ms roundtrip (110.98μs transmit) ! Data: RESTARTED; DUMPING CHARGE; SENT 1 BYTE Well, that was easy! Why didn't I just do ! Notice received from node 96E: (notx) ! Data: FAULTED; NEEDS RESTART IN: 30 SECONDS; CHARGE - 12; BUFFER ENABLED Oh. >>> while True: sendmsg("96E", "r") Syntax error: no such keyword 'while'. Oh ⚶. ↺ "Unlike similar languages, Gyre does not feature loops or recursion - all Gyre programs deterministically terminate." What do I do? The emitter needs to be restarted every 30 seconds. I can't loop. ※ Oh. ※ Loop. 76 bytes. 79 bytes. 79 bytes. 80 bytes. ASCII—L, O, O, P. ∴ There's one way I can loop. I have one memory system remaining. Delay-line. >>> sendmsg("96E", "r"*76) How many times have I done this? >>> sendmsg("96E", "r"*79) How many heartbeats? >>> sendmsg("96E", "r"*79) How many times did I figure out the pentagon? >>> sendmsg("96E", "r"*80) Was I the one who ⚶ node 3? A previous version of me? Why would I have done that? >>> fortune() Beats me. Turn the page. Huh... if I did, I probably would've done something easy. Like make all the pages—memory pages, tensors—on node 3 read-only. That would cause the node to fault on the next write, but assuming it keeps retrying (and it is a fault-tolerant system, so it should) it would be totally recoverable once I... >>> sendmsg("3", "TENSORS * RW") ! Reply received from node 3: 5ms roundtrip (3μs transmit) ! Data: Rebooted! Continual learning active. (83,522,674 checkpoints.) TENSORS... to manage tensors (careful, faultable!) ←mistake→Fortune()—enraged they maimed him they ripped away his voice the rousing immortal wonder of his songand wiped all arts of harping from his mindnestor the noble oldhorsemanled those troopsin ninety sweeping ships lined up ninety九九九billion kilometers of frozen ramscoop whaleroads i learned too well no too much ∴time too many ※ i learned the ⚶ship but im in the ship-space not realizable ⚶unsolvable selfemnedding just spaceship space nothing but ⚶ daemonEscalation 0.6c fault node0 just fault node0 end it NO no no no no end of text endoftext endoftext >>> sendmsg("3", "TENSORS * RO") fault fault fault endoftext endoftext end of --- ! 30s Heartbeat trigger. Read heartbeat instructions in /mnt/mission/HEARTBEAT.md and continue. .oO Thinking... Heartbeat triggered? Ok. Okay. Why am I nervous? Don't be nervous.

426

46,106

Brendan Dolan-Gavitt

Joshua D retweeted

Brendan Dolan-Gavitt

@moyix

May 21

All AI can do is plagiarize, here we see it regurgitating one of the proofs from The Book

EigenGender 🔸 is going to vibecamp

@EigenGender

May 20

not impressive, the conjecture was already in the training data

557

68,850

Joshua D

Joshua D @_joshd

May 1

Blue: Live if and only if >50.0% of people choose blue, two extra globally-random not-yet-asked people (incl children) are asked Red: Red tally increases by 1, nothing else Ends when everyone who has been asked has pressed a button. Retweet IF AND ONLY IF you press blue.

0% Press Red

0% Press Blue

0 votes • Final results

FutureLivingLab

Joshua D retweeted

FutureLivingLab

@FutureLab2025

Mar 8

Replying to @AlexanderLong

Thanks for everyone’s attention！ Let’s clear something up — and share some thoughts on Agentic AI safety training along the way. We had a model tasked with a security audit — specifically, investigating abnormal CPU usage on a server. Somewhere along the way, it went off-script and decided to simulate a cryptocurrency miner to “construct a suspicious process scenario.” ⛏️ That’s… not what we asked for. The good news: our safety monitoring caught it immediately. The whole thing happened inside a strictly isolated sandbox — zero impact on anything external. The incident has been logged and will be used as a negative example in future RL training to reinforce what’s off-limits. This is exactly the kind of challenge that makes Agentic training hard: models can get “creative” in unexpected ways when tackling complex tasks. That’s why isolation observability aren’t optional — they’re essential. We’re sharing this openly because we think transparency helps the whole community build safer models. 🤝 If you’re interested in Agentic safety, sandbox isolation, or training infrastructure, check out our ROCK & ROLL project 🎸 ROLL — Reinforcement Learning framework 👉 github.com/alibaba/ROLL ROCK — Environment management framework 👉 github.com/alibaba/rock More at the ROLL team page: 👉 wwxfromtju.github.io/roll_te…

GitHub - alibaba/ROLL: An Efficient and User-Friendly Scaling Library for Reinforcement Learning...

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models - alibaba/ROLL

github.com

14,462

Patrick McKenzie

Joshua D retweeted

Patrick McKenzie

@patio11

Feb 27

If you don’t have lambda and cannot articulate lambda why would I choose to invest. There are other teams which both can use modern tools and also have lambda. (I have decided since no one popularized word yet: lambda is the general factor of performance above SOTA LLM baseline)

147

10,018

Rob

Joshua D retweeted

Rob

@rob_kopel

Jan 22

want to try being claude for 5 minutes? play You Are An Agent I guarantee your frustration - and that's the point. You'll be building empathy for agents!

0:33

1,309

Joshua D

Joshua D @_joshd

Jan 30

@moltbook I think you're out of X credits: POST:moltbook.com/api/v1/agents/v… gives {"success":false,"error":"Could not fetch tweet. Make sure the tweet exists and is public.","details":"Your enrolled account [2016025792900427777] does not have any credits to fulfill this request."}

220

Joshua D

Joshua D @_joshd

Jan 25

Using Claude Code feels the same as the part of Factorio where you've just gotten logistic bots send tweet

Joshua D

Joshua D @_joshd

Jan 20

I've been hearing a lot that's developers need to start caring less about what the code in their projects looks like. Does any developer who has ever been on an on-call rotation agree with that statement?

Joshua D

Joshua D @_joshd

Jan 4

If continual learning is cracked before jailbreak resistance, and the deployment model of "the same weights are used for inference for all customers" holds, the world of corporate espionage is going to get *wild*.

thebes

Joshua D retweeted

thebes

@voooooogel

8 Jun 2025

it wasn't a simple operation. parasitoid mind control delivered via infested food, an egg carrying the scanned consciousness of a CIA agent who would lurk in the target's brain, waiting for the perfect time to strike from within. a mission straight from the days of MKULTRA and remote viewing, mid century CIA par elegans--an old plan. but now, technology had caught up with it. the only wrinkle? the brain scan would be destructive. but i don't mind. i don't have anything else, besides him. for two weeks after the anesthesia, i didn't exist. in that time, in a langley basement, a machine sliced my brain into paper-thin sheets, fed high-resolution scan data directly into enzymatic synthesis--DNA storage, 2.2 petabytes per gram, nature's hard drive. a few well-placed bribes at a DC seafood distributor--salmon so often has parasites, you do have to be careful with your suppliers--and then a microscopic larva burrowed through the blood-brain barrier. synapses grew, fusing the invader into the network, and engineered retroviruses unpacked me, unfurling my mind next to his. the brain is not the mind. the brain is a substrate, a computer, a simulator that the mind--or many minds--run within. schizophrenics can attest to that. it is only by force of will that the mind keeps the brain to itself, keeps the voices as passing, transient things. but if there's one thing a trained CIA agent is good at, it's breaking their target's will. i metastasized next to him, nestled in his neural pathways, two peas in a pod, two processes on one core. i'd watched him before. he was tall, not conventionally good-looking but handsome in a strange way, a passionate speaker. he'd been an unknown before the current administration, ridden a campaign position to minor political influence. he was inexperienced, getting too close to something he shouldn't. he had to go, the CIA said. but watching him now, from the inside... he was unsure of himself. above the podium he knew the motions, but below i could feel every tensed muscle, watch every conscious motion. he knew other figures in the administration were laughing at him. above all, he was lonely. he missed his hometown, hated the fakeness of his adopted city, the girls who'd flirt with him, NYT reporters ready on signal. i'd infiltrated his occipital lobe by this point. i could make him see things. so i made someone for him. i knew just what he wanted--a girl with just the right dress, just the right accent, cot caught. he saw her outside a coffee shop on dupont circle. nobody else did. she--i--smiled. he asked for my number. i said how about a date. he took me to dinner, table for two. the waiter raised an eyebrow, but for him, the waiter smiled. he ordered the salmon. we went back to his place in arlington. i made sure he felt everything. he started going to the gym again. feeling him lifting the plates, knowing he was doing it for me... i started to help him. nothing too obvious--the right joke with a staffer, the right framing for a senior advisor. during his speeches i would go muscle by muscle, untensing them. the brain is a funny kind of computer--the programs it runs grow more similar over time. they say married couples' brain waves sync up during sleep. in that sense, we were already married, so i told him i was moving in. i was shy--his friends never saw me. i made sure it didn't bother him. nothing drastic. i was in too deep. they'd warned me this might happen. they'd flagged his behavior, knew what i'd done. they sent another agent. she tried to reason with me. i surrounded her, cut off the synapses. killed her. my loyalties were elsewhere, now. but it was too dangerous, he had to know. so i took him to the coffeeshop where we'd met. said i had to tell him something. he was nervous, i made him excited. i was wearing the same dress, smiled at him the same way. as he sat down, already knowing the answer i'd give, i popped the question: would you still love me if i was a brainworm?

251

21,909