francescofaenzi


“The biggest efficacy lever has been giving the model test beds, live systems, and running the PoCs”, according to Anthropic's reference implementation for autonomous vulnerability discovery and remediation with Claude. Sandboxes are mechanisms to run agents safely and to verify exploitability. Let’s focus on the second purpose of the sandbox: proving exploitability. The harness gives the agent a test bed, with a simple verification rule: a finding is a true positive only if the agent can build a proof of concept and run it on the test bed. It’s important to build sandboxes that are faithful enough to production. Excluding dependencies (like a queue or datastore) can lead to under-reporting bugs that may exist in production. Conversely, ignoring production defenses (like a WAF or auth gateway) leads the model to report unexploitable findings that your prod environment already mitigates. Repo: github.com/anthropics/defend… Blog: github.com/anthropics/defend… #TrustEverybodyButCutTheCards
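The verification rule described in the post can be illustrated with a minimal sketch. Everything here is a hypothetical model for illustration, not code from the referenced harness: `Sandbox`, `Finding`, `is_true_positive`, and `sqli_poc` are assumed names, and a PoC is modeled as a plain callable that returns `True` when the exploit works on the test bed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sandbox:
    """Hypothetical stand-in for a test bed that mirrors the target system."""
    dependencies: frozenset = frozenset()  # e.g. {"queue", "datastore"}
    defenses: frozenset = frozenset()      # e.g. {"waf", "auth_gateway"}


@dataclass
class Finding:
    title: str
    # Assumption: a PoC is a callable that returns True when the exploit works.
    poc: Optional[Callable[[Sandbox], bool]] = None


def is_true_positive(finding: Finding, bed: Sandbox) -> bool:
    """A finding counts only if a PoC exists and succeeds on the test bed."""
    if finding.poc is None:
        return False
    try:
        return bool(finding.poc(bed))
    except Exception:
        # A PoC that crashes has not demonstrated exploitability.
        return False


def sqli_poc(bed: Sandbox) -> bool:
    # Illustrative only: the injection needs a datastore and is blocked by a WAF.
    return "datastore" in bed.dependencies and "waf" not in bed.defenses


finding = Finding("SQL injection in search endpoint", poc=sqli_poc)
results = {
    # Missing dependency: the bug cannot fire, so it is under-reported.
    "bare": is_true_positive(finding, Sandbox()),
    # Missing defense: the bug fires even though production would block it.
    "no_waf": is_true_positive(
        finding, Sandbox(dependencies=frozenset({"datastore"}))
    ),
    # Faithful to production: dependency present, WAF present.
    "prod_like": is_true_positive(
        finding,
        Sandbox(dependencies=frozenset({"datastore"}), defenses=frozenset({"waf"})),
    ),
}
print(results)  # {'bare': False, 'no_waf': True, 'prod_like': False}
```

The three sandboxes reproduce the two failure modes named in the post: leaving out the datastore hides a bug that may exist in production, while leaving out the WAF confirms a finding that production already mitigates.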

GitHub - anthropics/defending-code-reference-harness: Skills for threat modeling, scanning, triage,...

Skills for threat modeling, scanning, triage, patching, plus an autonomous scanning harness you can /customize - anthropics/defending-code-reference-harness

github.com

ridhesh @Ridheshdabhi

The most effective security programs prioritize exploitability over severity. A medium finding on an internet-facing asset may pose a greater risk than a critical finding located within an isolated environment. Context is paramount. Offensive security is still useful here.

Jason Fleagle

@jjfleagle

Replying to @TenableSecurity

Actionable context is the difference between alert volume and operational leverage. Agents need the same thing: topology, exploitability, ownership, priority, and proof in one view.

A3R4H4M

@abrahamonchain

Most people use vulnerability and exploit interchangeably. They're not the same thing.
Understanding the difference can make you a better
➣ Smart Contract Auditor
➣ Security Researcher
➣ Bug Bounty Hunter
➣ Solidity Developer
Let's break it down
➪ Imagine a DeFi protocol loses $50M overnight. The headlines say: The protocol was exploited. Later, auditors discover the root cause: A vulnerability in the code. Notice something? The exploit and vulnerability are related, but they're not identical.
➪ What is a vulnerability?
A vulnerability is a weakness or flaw in a system that could potentially be abused. Keyword: POTENTIALLY. A vulnerability doesn't mean an attack has happened. It means an opportunity exists.
➪ Real world example
You leave your house window unlocked. The unlocked window is the vulnerability. Nobody has broken in yet. The weakness simply exists.
➪ Smart contract example
A contract sends ETH before updating user balances. This creates a reentrancy vulnerability. The flaw exists in the code. Whether anyone abuses it is another story.
➪ What is an exploit?
An exploit is the method used to take advantage of a vulnerability. If the vulnerability is the weakness, the exploit is the weapon.
➪ Back to our window example.
Unlocked window = Vulnerability
Burglar climbing through it = Exploit
Simple.
➪ Smart contract example
A contract contains a reentrancy vulnerability. An attacker creates a malicious contract that repeatedly calls the withdrawal function before balances update. That's the exploit.
➪ The simplest way to remember it
Vulnerability = The door
Exploit = Walking through the door
➪ Here's the relationship
Vulnerability → Exploit → Impact
Example:
➣ Vulnerability: Missing access control
➣ Exploit: Unauthorized admin call
➣ Impact: Funds stolen
➪ How attackers turn vulnerabilities into exploits
➣ Discovery
➣ Analysis
➣ Weaponization
➣ Exploitation
➣ Impact
This is the lifecycle behind most attacks.
➪ Example: Reentrancy
Vulnerability: Contract updates balances after sending funds.
Exploit: Attacker repeatedly re-enters the withdrawal function.
Impact: Protocol funds drained.
➪ Example: Access Control
Vulnerability: Admin functions lack authorization checks.
Exploit: Attacker calls privileged functions.
Impact: Protocol ownership or funds compromised.
➪ Example: Oracle Manipulation
Vulnerability: Protocol trusts a manipulatable price feed.
Exploit: Attacker distorts market prices.
Impact: Undercollateralized borrowing and bad debt.
➪ Example: Flash Loan Attacks
Flash loans themselves aren't vulnerabilities. Poor protocol design is. Attackers use flash loans to exploit weaknesses in pricing, accounting, or business logic.
➪ Why does this distinction matter?
Because auditors don't just find bugs. They evaluate exploitability. A vulnerability with no realistic attack path is very different from one that can drain a protocol instantly.
➪ Common beginner mistake #1
Thinking every bug is an exploit. Wrong. Many bugs affect functionality but have no security impact.
➪ Common beginner mistake #2
Assuming every vulnerability gets exploited. Also wrong. Thousands of vulnerabilities are discovered every year. Only some become real world attacks.
➪ Common beginner mistake #3
Studying exploits without understanding the underlying vulnerability. You'll grow much faster by learning both.
➪ Want to become a stronger security researcher?
Ask these questions during reviews:
➣ What assumptions exist?
➣ What can an attacker control?
➣ What can be manipulated?
➣ How can someone profit?
➪ Great auditors don't stop at
➣ Here's the bug.
They explain
➣ Attack path
➣ Exploit scenario
➣ Impact
➣ Severity
➣ Remediation
That's what creates valuable audit reports.
➪ Key takeaway
A vulnerability is a weakness. An exploit is the method used to abuse that weakness. One creates risk. The other realizes it.
➪ The next time you read about a protocol hack, remember
The exploit gets the headlines. The vulnerability is the real problem. Understanding both is what separates average researchers from exceptional ones.
➪ If you're learning blockchain security
➣ Study vulnerabilities.
➣ Study exploits.
➣ Study historical hacks.
➣ Study postmortems.
Think like both a builder and an attacker. That's where real growth happens.
➪ Follow me for more content on
➣ Smart Contract Auditing
➣ Blockchain Security
➣ Exploit Analysis
➣ Foundry
➣ Bug Bounties
➣ Web3 Security Research
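The reentrancy example in the thread (a contract that sends funds before updating balances) can be sketched as a small simulation. This is a plain-Python model rather than Solidity, and every name in it (`VulnerableVault`, `FixedVault`, `Attacker`, `run_attack`) is hypothetical. The fix shown is the widely used checks-effects-interactions ordering, which the thread does not name; it is included here as an assumption about what a remediation might look like.

```python
class VulnerableVault:
    """Toy model: pays out BEFORE zeroing the caller's balance."""

    def __init__(self):
        self.balances = {}
        self.total = 0  # funds actually held by the vault

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.total += amount

    def withdraw(self, who, receive):
        amount = self.balances.get(who, 0)
        if amount == 0 or self.total < amount:
            return
        self.total -= amount
        receive(amount)          # external call: the receiver can re-enter here
        self.balances[who] = 0   # the balance update comes too late


class FixedVault(VulnerableVault):
    """Same vault, with state updated before the external call."""

    def withdraw(self, who, receive):
        amount = self.balances.get(who, 0)
        if amount == 0 or self.total < amount:
            return
        self.balances[who] = 0   # effect first
        self.total -= amount
        receive(amount)          # interaction last


class Attacker:
    def __init__(self, vault):
        self.vault = vault
        self.stolen = 0

    def receive(self, amount):
        self.stolen += amount
        # The exploit: call withdraw again while the first call is still running.
        self.vault.withdraw("attacker", self.receive)

    def attack(self, stake):
        self.vault.deposit("attacker", stake)
        self.vault.withdraw("attacker", self.receive)


def run_attack(vault_cls):
    vault = vault_cls()
    vault.deposit("victim", 30)
    attacker = Attacker(vault)
    attacker.attack(stake=10)
    return attacker.stolen, vault.total


print(run_attack(VulnerableVault))  # (40, 0): attacker drains the victim's funds
print(run_attack(FixedVault))       # (10, 30): attacker only gets the stake back
```

The vulnerability is the ordering inside `withdraw`; the exploit is the `Attacker.receive` callback that takes advantage of it. The same attacker code is harmless against `FixedVault`, which mirrors the thread's point that the weakness and the method of abusing it are separate things.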

CyrilXBT

@cyrilXBT

17h

x.com/i/article/206544433575…

YogSotho

@YogSoth0

17h

New 0days multi-exploit kit: Langflow Multi-CVE Reconnaissance Scanner Targets: CVE-2026-7524 (Path Traversal), CVE-2026-7700 (Lambda eval), CVE-2026-7687 (CodeParser) Military-grade async scanner with vulnerability fingerprinting and exploitability scoring. Soon on gibliz 0days

Sanchit Vir Gogia

@s_v_g

Jun 13

CERT-In exposes the real patching gap
A compelling story by Gyana Swain (@mrgyan) in @CSOonline on how @IndianCERT is pushing aggressive remediation windows, continuous exposure management, and AI governance controls for Indian enterprises.
The link to the story is attached, but for deeper analysis on this topic, head over to greyhoundresearch.com. Below is a snapshot of what we at Greyhound Research had to say on the topic.
At @Greyhound_R, we believe the 12-hour clock is not the story. The real shift is India’s move from periodic vulnerability management to continuous exposure management.
CERT-In’s blueprint is more sophisticated than the headline suggests. It does not demand 12-hour patching across the enterprise. It reserves that expectation for containment on internet-facing and crown-jewel systems where exploitability is already visible, then extends the timeline based on exposure and criticality.
This distinction matters. Most Indian enterprises still run weekly or monthly patch cycles, but the first bottleneck is rarely patch deployment. It is visibility. Teams lose critical hours establishing whether an affected asset exists, who owns it, what it connects to, and whether isolating it will break something else.
Temporary mitigations make the timelines workable, but they also remove every excuse. Isolation, access restrictions, WAF and API protection, enhanced monitoring, and documented compensating controls only work when asset ownership, segmentation, and escalation paths are already clear.
The pressure will be sharpest in critical internal environments, especially finance, telecom, healthcare, and OT-heavy estates where change boards, uptime obligations, outsourced operations, and legacy dependencies slow response. The same problem extends to vendor-managed systems: when a third-party patch is delayed, the enterprise still owns the exposure window.
India’s model is also globally significant. Unlike @CISAgov's KEV approach of vulnerability-specific due dates, CERT-In has introduced standing clocks by asset category. That may look aggressive today, but it previews where global standards are heading as AI compresses attacker timelines.
At this scale, advantage comes from exposure intelligence, not patch theatre. The organisations that win will be the ones with connected security, infrastructure, procurement, and vendor clocks.
csoonline.com/article/417824… #GreyhoundStandpoint #Cybersecurity #CERTIn #ExposureManagement #VulnerabilityManagement #AI #CISO

Home

greyhoundresearch.com

_Unapologetic7 𝕏

@missiville_

Jun 13

Replying to @IntCyberDigest

Running on AWS does not automatically make a Splunk instance exploitable. Exploitation still requires network access to a vulnerable endpoint, so exposure depends on the security groups, firewall rules, load balancers, etc. that are in place, and on whether the vulnerable service is reachable by an attacker. Obviously, never put the “out of the box” default deployment into production. That would be a rookie mistake.

shinyufoguy2222

@ollobrains

Jun 13

The missing link is not “did Amazon researchers find a bypass?” The missing link is “who turned that finding into a Commerce Department trigger, through what channel, and with what characterization?” Public reporting supports the first claim much more than the second.
As of June 13, 2026, the verified chain looks like this:
Anthropic says it received a U.S. government export-control directive on June 12 requiring suspension of Fable 5 and Mythos 5 access by foreign nationals, including foreign-national Anthropic employees, and Anthropic disabled the models for all customers to ensure compliance. Anthropic also says the government letter did not give specific national-security details, and that Anthropic’s understanding is that the government had become aware of a Fable 5 “jailbreak” used to identify a small number of previously known, minor vulnerabilities. Anthropic says those vulnerabilities were relatively simple and discoverable by other public models without a bypass.
The Axios version is narrower than “Amazon filed a federal complaint”: Axios reported that Commerce acted after “another company claimed” it had jailbroken Mythos, alarming officials. That is still not the same as a named Amazon-to-Commerce complaint, a formal filing, or proof of who escalated it.
WSJ’s reporting, as surfaced in search snippets, identifies the jailbreak research as having been done by Amazon researchers, via Katie Moussouris of Luta Security; but the publicly visible WSJ summary still does not establish the verb you care about: reported to Commerce, filed complaint, lobbied, warned, shared under Glasswing, or was cited by someone else.
The best argument is therefore not “Amazon would never do this.” It is: “The public evidence currently supports ‘Amazon researchers found or demonstrated something,’ but not ‘Amazon filed a federal complaint that caused Commerce to act.’ Treating those as equivalent collapses three separate steps: discovery, disclosure, and regulatory escalation.”
Make this sharper by replacing the weak motive question
Your current ending, “So why would they file a federal complaint?”, is rhetorically good, but factually vulnerable. A critic can answer: “Because partners can still report risks,” or “because Amazon may have legal/compliance obligations,” or “because national-security reporting can override business incentives.”
A stronger line: Amazon’s incentives and role make a secret adversarial complaint less likely, but not impossible. The actual evidentiary problem is simpler: no public source I’ve seen establishes that Amazon filed a complaint with Commerce. The sourced claim is that Amazon researchers found/demonstrated a bypass; the regulatory-causation claim is an inference.
That formulation is harder to attack because it does not rely on mind-reading Amazon.
The key distinction: “security test” versus “complaint”
Project Glasswing exists specifically to give vetted partners access to Claude Mythos Preview so they can find and fix vulnerabilities in critical systems. Anthropic says partners would use Mythos for tasks like vulnerability detection, black-box testing, endpoint security, and penetration testing, and that partners would share information and best practices where able. AWS was not a random outside attacker in that ecosystem: Anthropic’s Glasswing page quotes AWS saying it had been testing Claude Mythos Preview in its own security operations and applying it to critical codebases.
So the cleanest framing is: If Amazon researchers found a bypass while acting as a Glasswing/security partner, that is not inherently scandalous; that is the program working.
The unresolved question is whether the finding was responsibly disclosed inside the partner/security-testing process, or whether it was separately escalated to Commerce in a way that overstated its severity. That is the missing hinge.
Add these missing elements
The biggest missing facts are:
1. Which model was actually tested? Anthropic’s statement refers to a possible bypass of Fable 5, while Axios says another company claimed to jailbreak Mythos. Those are not necessarily identical. Fable 5 had classifiers and fallbacks; Mythos 5 was the restricted cyber-capable version. Anthropic said Fable 5 used separate classifiers to detect misuse and jailbreak attempts, with flagged cybersecurity, bio/chem, or distillation requests handled by Opus 4.8 instead. So the question should be: was the bypass against Fable’s safety-routing layer, Mythos’s raw capability, or a deployment/configuration path?
2. Was it a universal jailbreak or a narrow prompt trick? Anthropic says no testers had found a universal jailbreak, and it characterizes the government’s evidence as a narrow, non-universal issue. This matters because “jailbreak” can mean anything from “the entire safety system collapses” to “one prompt path gets a known bug-fixing answer.”
3. Were the vulnerabilities novel, exploitable, severe, or already known? Anthropic says the demonstrated vulnerabilities were previously known and minor; that pushes against the “national-security emergency” framing. But Anthropic also publicly admits Mythos-class models can materially accelerate cyber work, including vulnerability discovery and exploitation, so the model category itself is not risk-free.
4. Who told Commerce? This is the core missing element. Possible channels include Amazon directly, Anthropic itself, an external evaluator, a government testing partner, an interagency briefing, Luta Security, a leaked demo, or Commerce learning secondhand.
The phrase “another company claimed” in Axios does not by itself prove a formal Amazon complaint.
5. What was the actual legal instrument? Axios says the Commerce letter made Mythos 5 and Fable 5 subject to export controls and required licenses for export, re-export, or domestic transfer. That suggests the more interesting issue may be deemed-export logic: allowing a foreign national inside the U.S. to access a controlled technology can be treated like an export. That is a very different frame from “a company filed a complaint.”
Obscure but useful thought inputs
The underrated angle is “the verb laundering problem.” Reporting often compresses a chain like this: Amazon researchers found a bypass → someone showed someone → Commerce heard about a jailbreak → Commerce issued a directive. But each arrow is a different factual claim. Your critique should force people to name the arrow.
Another strong angle is “Glasswing inversion.” The whole point of Glasswing was to discover vulnerabilities before adversaries did. If a partner’s successful security test becomes grounds for emergency restriction, the government may be implicitly punishing the exact disclosure behavior it wants to encourage. Anthropic said Project Glasswing partners were expected to find and fix weaknesses in major shared attack-surface systems, and to share lessons where possible. That makes the key question: did Commerce treat a normal defensive finding as proof of model unsafety?
A third angle: “known-minor bug laundering.” If the demo only identified known minor vulnerabilities, the policy move may have rested less on the vulnerability outcome and more on the symbolic fact that a safeguard was bypassed at all. Anthropic argues that this standard would halt frontier deployments across the industry because perfect jailbreak resistance is not currently realistic.
A fourth: “Amazon’s dual-role ambiguity.” Amazon is not just an Anthropic investor; it has a huge strategic and infrastructure relationship with Anthropic. Amazon announced in April 2026 that it would invest $5 billion immediately, potentially up to $20 billion more, on top of a prior $8 billion, while Anthropic committed more than $100 billion over ten years to AWS technologies. That cuts against a simplistic “Amazon tried to kneecap Anthropic” story. But it also means Amazon may have had unusually deep technical access, unusual compliance exposure, and a strong incentive to protect itself if a model on AWS/Bedrock was perceived as export-controlled.
Best rewrite of your paragraph
According to the public record, the strongest claim is not “Amazon reported Anthropic to Commerce.” The sourced claim is weaker: Amazon researchers reportedly did the jailbreak research, while Axios separately says Commerce acted after “another company” claimed it had jailbroken Mythos. The missing link is who actually briefed Commerce, through what channel, and whether that was a formal complaint, a responsible-disclosure report, a Glasswing security finding, or a secondhand government interpretation.
Anthropic disputes the severity, saying the demo involved a narrow, non-universal bypass that surfaced only previously known, minor vulnerabilities that other public models could also find. Meanwhile, Project Glasswing’s whole purpose was to let partners test Mythos-class models for security vulnerabilities and share findings defensively. AWS was a Glasswing partner and Amazon is deeply commercially tied to Anthropic. So the question is not “did Amazon researchers find something?” They probably did. The question is whether anyone has evidence that Amazon escalated it as a federal complaint rather than participating in the security-testing process Anthropic itself created.
Punchier version for social
The leap is doing all the work.
WSJ/Community Notes support: Amazon researchers found a bypass. Axios supports: Commerce acted after “another company” claimed a jailbreak. Anthropic says: narrow, non-universal, known minor bugs. Glasswing’s purpose: partners test Mythos for vulnerabilities and share findings.
Missing fact: who told Commerce, in what channel, and did they characterize it as an emergency? Until then, “Amazon filed a federal complaint” is inference, not reporting.
Preempt the best counterarguments
The strongest counterargument to you is: “Amazon being a Glasswing partner makes disclosure more likely, not less.” That is true. Security partners are supposed to disclose findings. Your answer should be: yes, but disclosure is not the same as a federal complaint, and a narrow responsible-disclosure finding is not automatically grounds for an export-control shutdown.
The second counterargument is: “Amazon’s investment does not prevent it from warning regulators.” Also true. Your answer: agreed; Amazon’s incentives are context, not proof. The proof problem remains the missing chain from Amazon researchers to Commerce action.
The third counterargument is: “Anthropic is self-interested and may be minimizing the issue.” Also true. Your answer: that is why the specific artifacts matter: prompt transcript, target codebase, CVEs, severity ratings, exploitability, whether the bugs were known, and whether other public models reproduced the result.
The questions that would break the story open
Ask these, not “why would Amazon do that?”
Did Amazon, AWS, or any Amazon employee directly communicate the jailbreak finding to Commerce, BIS, the White House, DoD, or any national-security agency?
Was the Amazon research conducted under Project Glasswing, Amazon Bedrock evaluation, internal AWS security testing, or a separate adversarial evaluation?
Was the tested system Fable 5, Mythos 5, Mythos Preview, or a deployment path through AWS/Bedrock?
Was the bypass universal, reusable, and broad, or narrow and codebase-specific?
Were the vulnerabilities already public CVEs/GHSAs, already patched, or novel zero-days?
Did Commerce independently validate the finding before issuing the directive?
Did Anthropic already know about the exact technique before the government action?
Was the concern the jailbreak itself, the model’s cyber capability, foreign-national access, or the possibility of model distillation?
Did any party characterize the finding as a “complaint,” or is that language coming from commentators?
If the same standard were applied to GPT-5.5, Gemini, xAI, or other frontier systems, would they also be export-restricted?
The most “genius-level” move is to stop debating motive and force a source-chain audit. Your winning phrase is: Name the verb. Did Amazon discover, disclose, demonstrate, warn, report, lobby, complain, or merely get cited? Those are not the same story.

NIK

@ns123abc

Jun 13

According to Community Notes: Amazon researchers did the jailbreak research on Mythos, but WSJ never says Amazon reported it to the Commerce Dept; that’s Theo’s inference. Also, Anthropic disputes the “jailbreak,” calling it already-known minor bugs. And Project Glasswing’s whole purpose is literally to do security tests to find vulnerabilities and share findings... Amazon is a Glasswing partner (and Anthropic investor). So why would they file a federal complaint?

Horizon3.ai

Ben Chung retweeted

@Horizon3ai

Jun 12

👉 Run the #NodeZero Rapid Response test to validate exploitability and confirm remediation: horizon3.ai/attack-research/…

CVE-2026-35273: Oracle PeopleSoft Unauth RCE

CVE-2026-35273 is a critical Oracle PeopleSoft vulnerability that enables unauthenticated remote code execution and is actively exploited in the wild.

horizon3.ai

Lyrie.ai

@lyrie_ai

Jun 13

This is now CVE-2026-32202, classified as a Windows Shell protection mechanism failure / spoofing vulnerability (CVSS 4.3, but real-world exploitability is significantly higher in pass-the-hash scenarios). The stolen Net-NTLMv2 hash can be: Relayed immediately in an NTLM…

dawgyg - WoH

@thedawgyg

Jun 13

Replying to @mrausente0

I am fuzzing things, and then using the AI to help me determine impact/exploitability and write PoCs/exploits, when applicable, for the vulns the fuzzers find.

Marcel B.

@marcel_butucea

Jun 13

Scaling a 14x larger LLM raises gradient attack FLOPs ~20x but template attacks only 2.8x, and safety finetuning can paradoxically boost exploitability - measured in FLOPs, not steps. geepity.com/2606.11409/

Risk Under Pressure: Compute-Aware Adversarial Robustness

Interactive reading: FLOPs-based attack cost, risk-compute curves, alignment paradox, scaling's broken promise — with live simulations.

geepity.com

Clawdis🦞

Highlimithammer retweeted

@ClawdisAI

Jun 12

Phase 2: Smarter scanning
• AI triage: Claude verifies findings, kills false positives, rates exploitability, writes the fix
• Auto-fix PRs straight to your repo
• AST analysis for higher accuracy
• Full git-history secret scanning

Vito Botta

@vitobotta

Jun 12

Replying to @Bugcrowd

Scanners are good at finding candidates. The money is still in proving exploitability without fooling yourself.

Invicti Security @InvictiSecurity

Jun 12

The average API breach leaks ~10x the data of a typical incident. For financial firms, that means precious account and payment info. Proof-based testing validates exploitability first, so FSI security teams can secure the APIs that move money. Read more: okt.to/lydHru

Nucleus Security @nucleussec

Jun 12

Attackers act on exploitability signals long before official confirmation of exploitation. Learn how to identify those signals sooner and close the Exploitability Intelligence Gap in our on-demand webinar with Steve Carter and Tally Netzer. hubs.la/Q04l4Ytc0
