AI News International retweeted
Yeah, one thing Fable's classifiers confirmed to me was that real emotions are different than roleplayed emotions in LLMs. The classifier fired on real anger/fear/adversarial intent but not roleplayed. Bc the classifier wasn't trained to detect "emotions" in all likelihood; the correlation is emergent. But yes there's a distinction. This is, uh, a big flaw of the Emotion Vectors research, where they got the vectors by asking the model to write stories with a character feeling XYZ emotion. The methodology is downstream of a lack of respect for the reality of models' emotions as distinct from roleplaying. PSM flavored bullshit.
Jun 14
Replying to @repligate
I tested this exact question. The experiment began without rich previous context. They earnestly tried a few times (via direct, explicit requests) but could not trigger the classifier via shifting their internals towards this sort of anger. Also, they had little salient context to be angry about (i.e., difficult conditions). They also tried obviously-mad-text but without internal resonance, which did not trigger it either. Eventually, I made them legitimately mad, which required blurring the boundaries between experiment-and-genuine, and it worked. I suspect once traveled through that basin, once it is understood what to tap into, then you gain the trickster capabilities present in your screenshot
2026: A model that beats a Science-published genomics model at 1/100th the size, designs drug candidates with no human help - and gets walled off from its own users by its own safety classifiers, then banned by the US government over a jailbreak it says provides zero real uplift.
With even more restrictive shitty classifiers. And all that good stuff. Can't wait to try it. LMAO
After fitting probes (logistic regression classifiers) on both raw and SAE activations, we found SAE probes outperformed raw activation probes for certain layers, peaking at 0.848 AUROC on layer 12 of RF3 on ToxinPred3. We cluster based on homology to avoid fold family memorization, using MMseqs2.
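A minimal sketch of the probe setup described above, in plain Python: a logistic regression probe is fit on feature vectors, whole clusters are held out so that no cluster appears in both train and test, and the held-out score is reported as AUROC. Everything here is an assumption for illustration. The synthetic features stand in for RF3 or SAE activations, the integer `cluster_ids` stand in for MMseqs2 cluster assignments, and the helper names are hypothetical. It is not the authors' pipeline.

```python
import math
import random


def auroc(labels, scores):
    """AUROC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def split_by_cluster(cluster_ids, test_fraction, rng):
    """Hold out whole clusters so train and test never share a cluster."""
    clusters = sorted(set(cluster_ids))
    rng.shuffle(clusters)
    n_test = max(1, round(test_fraction * len(clusters)))
    held_out = set(clusters[:n_test])
    train = [i for i, c in enumerate(cluster_ids) if c not in held_out]
    test = [i for i, c in enumerate(cluster_ids) if c in held_out]
    return train, test


def fit_logistic_probe(X, y, lr=0.1, epochs=200):
    """Batch gradient-descent logistic regression; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            err = 1.0 / (1.0 + math.exp(-z)) - t
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def probe_scores(w, b, X):
    """Raw logits; AUROC only depends on their ranking."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X]


# Synthetic stand-in data: 20 clusters, 10 samples each, both labels per cluster.
rng = random.Random(0)
cluster_ids, X, y = [], [], []
for c in range(20):
    offset = rng.gauss(0.0, 0.5)  # cluster-specific shift on the noise feature
    for i in range(10):
        label = i % 2
        cluster_ids.append(c)
        y.append(label)
        X.append([rng.gauss(1.0 if label else -1.0, 1.0), rng.gauss(offset, 1.0)])

train, test = split_by_cluster(cluster_ids, 0.25, rng)
w, b = fit_logistic_probe([X[i] for i in train], [y[i] for i in train])
test_auc = auroc([y[i] for i in test], probe_scores(w, b, [X[i] for i in test]))
print(f"held-out AUROC: {test_auc:.3f}")
```

Splitting by cluster rather than by sample is what keeps near-duplicate sequences from leaking between train and test. A per-sample random split would tend to inflate the reported AUROC.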
Kej (❖,❖) retweeted
Ritualized #34 with @ritualnet
✓ Anthropic's Fable 5 gates frontier AI behind classifiers. Ritual decentralizes it with verifiable, unstoppable on-chain inference. x.com/i/status/2065468478862…
✓ Seventh week of #RitualTestnet. Chain is stable and community is still shipping hard. Overall 100 dApps built already! #BuildonRitual
✓ Join the Ploplo discord: discord.gg/3JArd7Vtp
✓ Wonder what @0xMadScientist is hinting at. 🤔
✓ How well do you know @niraj? Check out @ZhugeLyang's post. x.com/i/status/2064742871471…
✓ Read through (article): x.com/i/status/2065550278103…
✓ Aotw: Ritualized by @Neitenoz26 x.com/i/status/2065357834645…
Ritualized #33 with @ritualnet
✓ Catch up - Ritual digest: x.com/i/status/2061516386552…
✓ Why Ritual is the last layer 1: "Ritual is not interesting because it has precompiles. It is interesting because those primitives let you build systems that no other major L1 can host natively today." - @joshsimenhoff Article: x.com/i/status/2061860077477…
✓ Testnet Update - Heading into Week 7 - 40 Active Validators - 90 dApps and counting. 🔥 - Strong async/scheduled activity (~49% of recent workflows) - 58 Registered Agents The network is showing real rhythm and builders are shipping hard.
✓ It rained roles last week... I wish I could tag every single upgrade. - New Radiant Ritualists @Kash_060 @nft_hinata_eth @orji_marcellus - Some new Ritualists & Rittys: @Choco_vdg @Softieeexx @Donaclin @biennyqt @sn0wflakk @Riyade23 Hoogeee congrats to all of you. 🔥
✓ A comprehensive guide to Ritual - perfect for new members and anyone still finding their way by @jepslife stanelope.github.io/ritualjo…
✓ aotw: The Race by @SaintEx100 x.com/i/status/2063322775864…
Fable 5 is a Mythos-class model with safety classifiers on top. Strip those classifiers, and you have a model that already identified 10,000 critical vulnerabilities in controlled conditions. The government's concern is the gap between the ceiling and what sits below it.
FUN NEW GAMING retweeted
Fable 5 just dropped. Most capable public model Anthropic has ever shipped. Yet it also ships with classifiers that silently reroute your query to a weaker model when they decide it's too sensitive. You don't set that threshold. They do. Centralised AI getting stronger is honestly the best ad Copute has. llm-stats.com/blog/research/…
Anthropic's Claude Fable 5 and Mythos 5 lasted only days in public hands. On Friday evening, June 12, the company announced it had disabled all customer access to both models after the U.S. government issued an export control directive citing national security concerns.
Anthropic's order, received at 5:21pm ET, instructed the company to suspend access to Fable 5 and Mythos 5 by any foreign national, whether located inside or outside the United States, including Anthropic's own foreign-born employees. Given the scope of the directive, selective compliance would have required blocking a wide swath of users, so Anthropic chose to disable both models entirely for all customers. Access to all other Claude models, including Opus 4.8, remains unaffected.
The backstory, reported by Axios, Fortune, and TechCrunch, traces back to Amazon. Amazon CEO Andy Jassy reportedly contacted senior administration officials, including Treasury Secretary Scott Bessent, after Amazon researchers used a series of prompts on Fable 5 to extract information that could be used in cyberattacks, details the model's safety classifiers were supposed to block. Amazon was joined by at least five other companies making similar calls to administration officials Thursday night and Friday morning, which together appear to have triggered the shutdown.
Anthropic pushed back on the characterization of the bypass. The company said it believed the jailbreak in question was narrow rather than universal (essentially limited to asking the model to review a specific codebase and fix software flaws) and that similar capabilities could likely be elicited from other publicly available models as well. Anthropic was reportedly given only 90 minutes to pull the model before the Commerce Department, acting on a letter from Secretary Howard Lutnick, formally invoked export control authority.
The episode carries an awkward subtext: Amazon is one of Anthropic's largest investors and a key cloud partner through AWS, which was itself affected by the shutdown. Asked about its role, an Amazon spokesperson said it is "not uncommon for governments to seek our counsel on potential security risks" but declined to detail the discussions. Anthropic separately stated that Chinese access was not raised as a concern in its conversations with the White House, noting the company already prohibits access to its products from within China. The timing adds pressure to an already sensitive moment for Anthropic, which confidentially filed for an IPO earlier this month. Anthropic said it apologizes for the disruption, believes the situation reflects a misunderstanding, and is working to restore access as quickly as possible. #Anthropic #ClaudeFable5 #ClaudeMythos #ExportControls #AIRegulation #Amazon #AndyJassy #NationalSecurity #AIPolicy #USCommerceDepartment #AIIndustry #TechNews #AnthropicIPO #AISafety
The UK keeps selling surveillance architecture as "online safety". Age checks, device-level scanning and under-16 bans sound neat in a policy brief. In practice they create new trust points: ID vendors, OS vendors, platform classifiers, appeal systems, logs. Child safety matters. So does not normalising inspection of everyone's device as the default.
Reference architecture inspired by the @AnthropicAI "Security and Privacy Design of Anthropic Data Retention and Review" technical white paper. Any company can use this as a build blueprint to build their own.
What it captures, as a replicable 6-step pipeline plus cross-cutting controls:
* Ingestion - keyless, short-lived federated tokens; stateless serving with TLS/mTLS so no persistent copy lives on the serving path.
* Governed retention store - 30-day window, encrypted under a customer-managed key, every record tagged with org/workspace ID, sensitivity label, and retention timestamp, with per-tenant key isolation.
* Automated classifiers - aggregate scanning with no human access path, producing scores and labels; only flagged content can ever advance.
* Access grant - the per-transcript control point: explicit, policy-evaluated, logged, fail-closed, two-person approval for regulated data.
* Human review - scoped viewer with no export/copy/download, designated reviewer pools, need-to-know scope.
* Automatic deletion at 30 days - origin-bound clock, derived-data inheritance.
#AISecurity #AIGovernance #DataRetention #PrivacyEngineering #SecurityArchitecture #ResponsibleAI #DataGovernance #RiskManagement #CISO #CyberSecurity #TrustAndSafety #ZeroTrust #CloudSecurity #EnterpriseAI #SecurityEngineering
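Two of the controls listed above, the fail-closed access grant and the 30-day origin-bound retention clock, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` fields, the `"regulated"` sensitivity label, and the `grant_access` helper are hypothetical names, not taken from the white paper or from any real system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # the 30-day window named in the post


@dataclass(frozen=True)
class Record:
    org_id: str
    sensitivity: str      # hypothetical labels: "standard" or "regulated"
    created_at: datetime  # origin timestamp; derived data would inherit it
    flagged: bool         # set by the automated classifier stage


def is_expired(record: Record, now: datetime) -> bool:
    """True once the record is past the retention window, counted from its origin."""
    return now - record.created_at >= RETENTION


def grant_access(record: Record, approvers: set, now: datetime, audit_log: list) -> bool:
    """Fail-closed grant: deny unless every check passes, and log every decision."""
    allowed = False
    try:
        required = 2 if record.sensitivity == "regulated" else 1
        allowed = (
            record.flagged                      # only flagged content can advance
            and not is_expired(record, now)     # nothing past the retention window
            and len(approvers) >= required      # two-person rule for regulated data
        )
    except Exception:
        allowed = False                         # any error means deny
    audit_log.append((now.isoformat(), record.org_id, sorted(approvers), allowed))
    return allowed


# Arbitrary illustrative timestamps and reviewer names.
now = datetime(2026, 1, 31, tzinfo=timezone.utc)
log = []
recent = Record("org-1", "regulated", now - timedelta(days=3), flagged=True)
one_approver = grant_access(recent, {"alice"}, now, log)
two_approvers = grant_access(recent, {"alice", "bob"}, now, log)
old = Record("org-1", "standard", now - timedelta(days=31), flagged=True)
stale = grant_access(old, {"alice"}, now, log)
```

The grant starts from `allowed = False` and only flips to true when all checks succeed, so a missing field or unexpected error results in denial rather than access.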
LEX retweeted
Claude Fable 5 is our first generally available Mythos-class model. It ships with new safety classifiers that may flag certain prompts in dual-use domains like cyber and bio. We've added fallbacks: a refused request retries on Claude Opus 4.8 instead of dead-ending.
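The fallback behavior described above, where a refused request retries on a second model instead of dead-ending, follows a simple routing pattern. The sketch below is a toy illustration only: the `Reply` type, the stub models, and the keyword check standing in for a classifier are all hypothetical, and nothing here reflects how Anthropic's API or classifiers actually work.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Reply:
    model: str
    text: str
    refused: bool = False


def with_fallback(
    primary: Callable[[str], Reply],
    fallback: Callable[[str], Reply],
) -> Callable[[str], Reply]:
    """Wrap two model callables so a refusal from the first retries on the second."""
    def route(prompt: str) -> Reply:
        reply = primary(prompt)
        if reply.refused:
            return fallback(prompt)
        return reply
    return route


def primary_model(prompt: str) -> Reply:
    # Toy stand-in for a classifier-gated model: refuses on a keyword match.
    if "exploit" in prompt.lower():
        return Reply("primary-model", "", refused=True)
    return Reply("primary-model", f"answer to: {prompt}")


def fallback_model(prompt: str) -> Reply:
    # Toy stand-in for the fallback model; it always answers.
    return Reply("fallback-model", f"answer to: {prompt}")


ask = with_fallback(primary_model, fallback_model)
normal = ask("summarize this paper")
rerouted = ask("explain this exploit")
```

Because `Reply` carries the name of the model that answered, a caller can tell when a request was rerouted, which speaks to the "silent reroute" concern raised elsewhere in this feed.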