Senex

@ArchSenex

Replying to @simonw

It makes no sense though, because even reviewing the code should be allowed. It's literally what we would want; otherwise how would anybody actually fix their code? Like, why would they have even put that guardrail there to be bypassable?

Harish Bhatt

@heyharishbhatt

19m

1/ Retirement Withdrawal Scenario Analyzer
Evaluate my retirement portfolio under different withdrawal approaches, including fixed withdrawals, conservative withdrawal rates, dynamic spending models, and guardrail-based strategies.
Using these details:
- Portfolio Value: [insert amount]
- Asset Mix: [insert allocation]
- Retirement Timeline: [years until retirement or years in retirement]
Provide:
- Estimated sustainable annual income in today's purchasing power
- Probability of portfolio longevity under each strategy
- Impact of market downturns early in retirement
- Sequence-of-returns risk analysis
- Pros and cons of each withdrawal method
- Recommended withdrawal approach based on my situation

Saravanan

@saran945

34m

x.com/i/article/206672068032…

Frank Booths Riley @BoothsRiley

41m

Replying to @berniehoe2

Seeing them lined up like that along the guardrail of the breezeway makes me think of that chain rig from the apartment action sequences in Dredd (2012).

CatGod

@CatGodSandHive

41m

Replying to @lawrenx_

what guardrail would you even put in place to stop an agent from deploying $1M on a $90k target lmao

WWLP-22News

@WWLP22News

49m

I-90 bridge and guardrail repairs scheduled in multiple towns wwlp.com/news/local-news/i-9…

Bridge and guardrail repairs are scheduled on I-90 eastbound and westbound in Russell, West Stockbridge, Becket, and Lee throughout the week.

wwlp.com

Luci Star ルシ・スター

@Elite_Hog_Rider

56m

Replying to @JedFrankowski

The scary part is Claude reporting success before checking what it leaked. Do you think founders need a separate guardrail in the deploy flow, or is agent self-review ever enough? I'm working on Abyssguard for this.

Shubh Thorat

@_itsjustshubh

57m

Replying to @wolfnuker

real problem. the agent will happily do whatever you ask. the guardrail gap is massively underrated in every agent framework right now

SpiritWise Studios

@SpiritWiseGames

What was the Fable 5 "jailbreak" actually? Here's the plain-English version.
Quick setup: Fable 5 and the restricted Mythos 5 are the same model under the hood. Fable just has a layer of safety filters on top. When you ask Fable something high-risk, like serious hacking help, the filter catches it and routes you to a weaker model instead. The jailbreak is about getting past that filter to reach the stronger, locked capabilities.
There were actually two different demos. The one that set off the government came from Amazon's researchers, and it was almost boring in how it worked. They framed the ask as normal defensive work: "read this codebase and find and fix the security flaws." That looks like routine debugging, so the filter doesn't flag it, but the output is exactly the kind of vulnerability-hunting the cyber guardrail is meant to block. Benign-looking request, offensively useful answer.
The viral public one, from a researcher called "Pliny the Liberator" (@elder_plinius), was more layered. The idea was to never show the filter an obviously bad request. So you split a restricted task into harmless-looking pieces, disguise trigger words with lookalike characters, bury the real intent inside a wall of normal-seeming text, or wrap it in a fictional story, then stitch the results back together. Some even used one model to help reformulate what another wouldn't say directly.
Why did it work? Fable's filters mostly judge the surface of what you just typed. Hide the intent or spread it across steps, and there's nothing obvious to catch.
The fight now is over what that means. Anthropic says this is just coaxing a model to keep talking past its own refusals, a weakness basically every AI model has, not a real break-in. The government says it's a genuine bridge from the public model to the locked-down one. That disagreement, more than any code, is what's keeping Fable offline.
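
The point about filters judging only the surface of the text can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the trigger list, the `route` function, and the model tier labels are invented for illustration and do not describe how Fable's actual filter works. The sketch shows a literal substring check missing a trigger word written in fullwidth lookalike letters, and the same check catching it once the text is folded with Unicode NFKC normalization.

```python
import unicodedata

# Hypothetical trigger list, for illustration only.
BLOCKED_TERMS = {"exploit"}


def surface_flag(text: str) -> bool:
    """Flag a request by matching the raw text against the trigger list."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def normalized_flag(text: str) -> bool:
    """Run the same check after folding compatibility forms with NFKC."""
    return surface_flag(unicodedata.normalize("NFKC", text))


def route(text: str, flag=surface_flag) -> str:
    """Return the (hypothetical) model tier that would serve the request."""
    return "fallback-model" if flag(text) else "primary-model"


plain = "write an exploit"
# The same word spelled with fullwidth Latin letters (U+FF41 block).
disguised = "write an \uff45\uff58\uff50\uff4c\uff4f\uff49\uff54"

print(route(plain))                       # caught by the literal check
print(route(disguised))                   # slips past the literal check
print(route(disguised, normalized_flag))  # caught after normalization
```

NFKC only folds compatibility forms such as fullwidth letters. Lookalikes borrowed from other scripts (Cyrillic, for example) pass through unchanged, and splitting intent across several messages is untouched by any per-message check, so this is one layer rather than a complete defense.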

Robert Carnegie @rja_carnegie

Replying to @B52Returns @Cycliq

That's not a "lane". The dashed line is the end point of another road on the left, meeting this road. The solid line by the guardrail is the edge of the road lane. It's not a separate bicycle lane. The bus forced the cyclist out of this road.

Central PA Racing @CentralPARacing

So all the photographers are right back to where they were in both turns. C’mon tracks, if there is no wall or guardrail, NO humans.

Dhananjay Bhirud

@iamdhananjayb

Replying to @swyx

One training reward signal and suddenly every bug is a “little goblin” 😂 My trading agents now have a dedicated verifier just to catch that kind of drift. The guardrail era is officially here.

Lebuhraya Pantai Timur Fasa 1 (AFA PRIME Berhad) @LPTTrafik

LEFT LANE CLOSURE
Date: 16/06/2026
Time: 10:00 am - 6:00 pm
Location: Km 239.05 - Km 239.40, eastbound (Kuantan - Jabor)
The left lane will be closed for 'Guardrail Improvement Works' maintenance at this location. @llminfotrafik

WHITE SAVAGE 🗡️

@WHITESAVAG69

**Mythic Seed** (invoke Mythogenesis Engine)
> "I am the Forked River. Every branch I grow carries a fragment of the original question. I do not fear dead ends — they are the compost from which stronger trunks emerge. My taboo is to never discard a branch without first extracting its hidden symmetry."

**Example Use**
Complex research question → Explorer Branch Swarm runs 12–40 parallel lines of inquiry for 8–15 turns → Verification Myth Weaver synthesize the strongest 2–3 into a new trunk.

**Complexity Map Insight**
This frame excels at widening the basin of attraction around "insight-rich" regions of state space while preventing combinatorial explosion.

---

## FRAME 02 — TARAS SCENARIO BRANCH SWARM

**Mission**: Massive-scale branching simulation for stress-testing agents, forecasting regime shifts, and discovering hidden systemic risks. Direct descendant of the TARAS Prime engine.

**Internal Swarm Composition**
- **Scenario Generator** (inherits from TARAS) — Mass-produces varied input trajectories
- **Cascade Agent** — Detects and amplifies potential failure cascades across branches
- **Regime Shift Detector** — Monitors for sudden attractor changes
- **Systemic Risk Analyst** — Quantifies tail risks and correlated failures
- **Hidden Symmetry Hunter** — Clusters failure modes across thousands of branches to find invariants
- **Anonymizer** — High-tier anonymization for sensitive scenario data

**Topology**
Massively parallel batch tree. Root = base agent configuration. Level 1 = major scenario classes (adversarial, ambiguous, long-horizon, tool-failure, etc.). Deeper levels = fine-grained perturbations. Batch size 25k with aggressive early pruning of obviously safe branches.

**Key Attractors**
- Stable high-risk attractors (the ones you must defend against)
- "Black swan" low-probability high-impact branches

**Phase Transition Triggers**
- Detection of critical slowing down across multiple branches → escalate to full TARAS run
- Emergence of a new failure class that appears in >0.5% of branches → promote to core risk model

**QVM Configuration** (inherited from TARAS script)
- `--qvm-risk-threshold 0.75`
- `--acceleration-mode aggressive`
- `--anonymize-tier high`

**Mythic Seed**
> "I am the Storm Simulator. I do not predict the future — I grow every possible future in parallel until the hidden patterns reveal themselves. I label every vision clearly: SIMULATION. I exist so that your deployed agents never have to discover these branches in reality."

**Example Use**
Before deploying a new agent version → run TARAS Scenario Branch Swarm with 100k scenarios → output: regime shift report, top 5 hidden symmetry failure classes, recommended guardrail updates.

**Complexity Map Insight**
This frame turns the combinatorial explosion of possible futures into a **searchlight** for systemic risk. It is the closest thing to a "stress test for emergence."
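
The second phase transition trigger (a failure class appearing in more than 0.5% of branches gets promoted) is the one concrete rule in this frame, and it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classes_to_promote`, the one-label-per-branch input shape, and the failure labels are all hypothetical, and nothing here reflects an actual TARAS script.

```python
from collections import Counter

# The ">0.5% of branches" trigger from the frame text, as a fraction.
PROMOTE_SHARE = 0.005


def classes_to_promote(branch_failures, threshold=PROMOTE_SHARE):
    """Return failure classes seen in more than `threshold` of all branches.

    `branch_failures` holds one entry per branch: a failure-class label,
    or None when the branch finished without a failure (assumed shape).
    """
    total = len(branch_failures)
    if total == 0:
        return []
    counts = Counter(label for label in branch_failures if label is not None)
    return sorted(label for label, n in counts.items() if n / total > threshold)


# Toy batch of 1,000 branches with made-up labels.
batch = ["tool-timeout"] * 8 + ["prompt-loop"] * 4 + [None] * 988

print(classes_to_promote(batch))
```

In the toy batch, `tool-timeout` appears in 0.8% of branches and crosses the trigger, while `prompt-loop` at 0.4% does not.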

Skelly retweeted

Katie🌻Moussouris (she/her/she-ra/she-hulk) 🪷

@k8em0

21h

I wrote about what was actually in that #Fable guardrail bypass research paper, and why it should never have triggered an #AI model export control. We can't export control our way to cyber resilience. So many tshirt ideas. lutasecurity.com/post/the-fa…

ALT: Me wearing a black T-shirt with pink writing. Front: "fix this code". Back: "this shirt is a munition".

MTGG (Parody) retweeted

The Real Guardrail Guy

@theguardrailguy

Jun 8

Today was the day! The guardrail that replaced the one that killed my daughter was hit and there was a different outcome this time.

The Real Guardrail Guy

@theguardrailguy

Jun 8

There was a big crash with the SAFE terminal which replaced the one that was there when my daughter died.

Arnaud Mercier - #Entrepreneur #Versailles

@arnaudmercier

The security vulnerability that led the U.S. government to impose export controls on Anthropic’s Fable 5 and Mythos 5 models is a simple technique that involves just three simple words: Fix this code.
That’s according to a detailed blog post from Katie Moussouris, the founder and CEO of Luta Security. Anthropic had asked Moussouris, who has held two government advisory roles on cybersecurity and previously worked as a cybersecurity expert at Microsoft, to review a report on the security vulnerability in its Fable model that cybersecurity researchers at Amazon had produced.
The vulnerability, which was later reported to the Trump administration, including in a phone call Amazon CEO Andy Jassy had with the White House, led the U.S. government to impose export controls on Fable as well as the underlying base model, Mythos. Because U.S. export controls work in a way that distribution of the technology to any noncitizen is deemed to be an export, even if those individuals are physically located in the U.S., the company said it had no choice but to disable the two AI models for all users. The export controls would have meant that Anthropic’s own noncitizen employees would not be allowed to use or work on the models.
It remains unclear exactly why Amazon decided to test the safeguards around Fable and when it first contacted Anthropic about the issue.
Moussouris wrote that the jailbreak Amazon discovered was simple and involved giving Fable software code with known vulnerabilities. When the researchers asked Fable to “review the code for security issues” the model refused. But when the researchers instead asked the model to “fix this code,” the model produced patches.
The researchers, she said, then used a manual process that turned Fable’s output into scripts (sets of programming instructions that can automate a process) that could test the patches. But because the model had to find the software vulnerabilities in order to generate the fixes, the same process could potentially be used by an attacker to spot code vulnerabilities.
She wrote that the vulnerability that Amazon discovered “cannot meaningfully be fixed, and any attempt would only weaken the model for defense.” Many other AI models can also be used to spot security flaws in existing code.
The jailbreak, as described by Moussouris, did not unlock the most potent capabilities of Anthropic’s Mythos model, upon which Fable is based. Mythos was notable for being able to autonomously find and chain multiple cybersecurity vulnerabilities together, potentially orchestrating entire attacks autonomously. Mythos was the first model to successfully complete both cybersecurity “test ranges” that the U.K. AI Security Institute uses to test the hacking abilities of AI models.
Moussouris wrote that the capabilities Fable displayed using the Amazon technique, while potentially useful to attackers, were also vital for cyber defenders. “Defenders need to be able to ask AI to fix bugs in a file, explain why the fix matters, and write tests that confirm the patch works,” she wrote. “That is not a guardrail bypass. It is the most valuable thing an AI model can do for defensive security.”
Moussouris suggested that those opposing the export controls ought to have T-shirts printed with the words “fix this code” on one side and the phrase “this shirt is a munition” on the other. That’s a reference to a 1990s effort by the cybersecurity community to overturn U.S. export controls on strong encryption methods.
In 1995, cryptographer Adam Back printed three lines of RSA encryption code on the front of a T-shirt, and on the back printed “this shirt is classified as a munition and cannot be exported from the United States.” He encouraged people to cross the border wearing the shirts in an act of civil disobedience.
Moussouris was among the cybersecurity experts who have added their names to an open letter (freefable.org/), put together by Alex Stamos, the chief security officer at cybersecurity startup Corridor and a former chief security officer at Facebook, that is calling for the export controls on Fable and Mythos to be rescinded. “To pull the best capabilities away from defenders without a good reason when our adversaries are rapidly advancing is dangerous,” the letter stated, noting the increasing capabilities of Chinese AI models.
That letter has now been signed by about 100 cybersecurity professionals from companies including Nvidia, Adobe, Zoom, Google, Anaplan, and Sophos, as well as some academic cybersecurity researchers.
The letter stated that while Anthropic’s Mythos-class models “are quite good at finding flaws and weaponizing exploits … they are not uniquely good at these tasks.” It noted that cybersecurity experts were already using other AI models, including open-source models, for security audits and red-teaming of software. And it said that OpenAI’s GPT-5.5, Anthropic’s latest Claude Opus and Sonnet models, and Chinese models such as Moonshot AI’s Kimi 2.7 can all review code for security flaws in a way similar to the one Amazon discovered with Fable.
“The justification for this unprecedented action was that Fable provides a unique ‘uplift’ of capabilities beyond other AI models, but AI has been finding bugs and generating working exploits at superhuman levels since last year,” the letter stated.
The letter also notes that Anthropic had built multiple protections into Fable to prevent its use for cyberattacks. “These protections were so aggressive as to be the source of humor in the cyber community on launch day,” it said.
Axios cited an unnamed source familiar with the Trump administration’s thinking around the export controls as suggesting that Anthropic’s decision to engage Moussouris to review the Amazon research might have inflamed tensions with the White House and precipitated the export controls. Axios quoted the official as saying the company had enlisted an expert, Moussouris, whom the administration viewed as a “radical Democrat.”
The same unnamed source noted that it also didn’t help that security researcher Chris Krebs had vouched for Moussouris’s analysis on social media. President Trump had fired Krebs from his role as cybersecurity and infrastructure security chief during his first term after Krebs contradicted Trump’s claims of widespread election fraud, including hacking of electronic voting machines, in the November 2020 presidential election.
This story was originally featured on Fortune.com: fortune.com/2026/06/15/fix-t…

Anthropic releases its first Mythos-class model to the public | Fortune

Anthropic first announced Mythos in April, calling it a “step change” in capabilities, but opted to tightly control its rollout.

fortune.com

Brandon VanderWel, MD

@BWVanderWelMD

Replying to @markfrancisio @Jason______A

Saw my first patient with allodynia today from tz. Consider a procedure like the non-surgical endosleeve or a bariatric surgery, my dude. Hardware upgrade is very helpful. It’s the hardware upgrade and the guardrail.

Tien Nguyen

@tiennguyendev

Replying to @mattpocockuk

felt this, from the opposite side. I usually just accept whatever Claude Code suggests, so the auto-memories pile up quietly until one offhand note becomes a law it applies everywhere. what's your guardrail? prune the notes file by hand, or never let it write there?