Filter
Exclude
Time range
-
Near
Phase 10: Rollups & Execution Benchmarking Day 62 Log — Adversarial Rollup Stress Proven Harmless ⸻ Progress • Executed full adversarial economic stress tests against rollup execution paths • Simulated sustained rollup pressure while monitoring L1 liveness and throughput • Measured baseline, degraded, and recovery throughput under stress conditions • Verified rollup batching can be disabled mid-execution with no service interruption • Confirmed base execution remains fully operational with rollups forcibly disabled • Re-enabled rollups post-disable and verified seamless resumption • Recomputed state roots before, during, and after stress scenarios • Validated balances, nonces, and account integrity across all transitions • Executed live base-layer transactions during rollup stress windows • Confirmed no starvation, blocking, or scheduling interference at L1 • Validated all Phase 10.3 invariants under adversarial conditions • Locked Phase 10.4 assertions as mechanically enforced guarantees ⸻ Breakthrough Day 62 answers a question most rollup systems never dare to test: What happens when rollups are abused? For Nuera, the answer is now proven: Nothing breaks. Nothing stalls. Nothing silently degrades. Even under economic stress: • Base throughput remained stable • State integrity remained deterministic • L1 execution remained live • Rollups could be disabled instantly • Rollups could be re-enabled without recovery logic Rollups were treated exactly as they should be: a performance hint, not a dependency. This is the first time Nuera demonstrated — end-to-end — that rollups can be: • Stressed • Ignored • Disabled • Reintroduced …without the base system caring. That’s not resilience by recovery. That’s resilience by irrelevance. ⸻ Current Work • Reviewing adversarial test coverage for remaining blind spots • Auditing invariant enforcement boundaries under optimization flags • Preparing formal Phase 10 closure artifacts • Extracting Phase 10 guarantees into non-regressable documentation • Deciding final cut-over point before enabling performance work ⸻ Known Issues • None affecting correctness, liveness, or determinism • Throughput intentionally capped during stress tests • No batching or aggregation optimizations enabled by design ⸻ Next Steps • Formally close Phase 10 (10.1–10.4) as correctness-complete • Freeze rollup safety guarantees as non-negotiable • Transition from “can this fail safely?” to “how far can this scale?” • Begin planning post-correctness optimization phases • Delay all performance claims until after invariant preservation review ⸻ Reality Check Most systems collapse under adversarial load. Some degrade gracefully. Nuera does neither. It remains indifferent. Day 62 proves: • Rollups cannot starve L1 • Economic pressure cannot bend correctness • Optional systems remain optional under attack • Safety does not depend on good behavior Performance will come later. Truth came first. And now it’s locked in. ⸻ #Layer1 #Blockchain #Crypto #Web3 #Nuera #Rollups #ZK #AdversarialTesting #Execution #Determinism #Correctness #Scalability
2
1
5
190
Phase 10: Rollups & Execution Benchmarking Day 61 Log ⸻ Progress • Executed full Phase 10.4 adversarial rollup stress suite • Simulated batch spam and high-frequency rollup griefing scenarios • Injected proof withholding and delayed proof submission cases • Tested invalid, malformed, and mismatched ZK proofs under load • Simulated rollup-level censorship attempts via selective batching • Measured base execution behavior during sustained rollup abuse • Verified backpressure remains confined to rollup components • Confirmed batch queues drain safely without global stalls • Observed deterministic discard of failed or expired batches • Validated immediate and automatic degradation to base execution • Confirmed no consensus, recovery, or networking impact under attack • Preserved all Phase 7–9 and Phase 10.1–10.3 invariants throughout • Locked adversarial behavior results as Phase 10.4 reference data ⸻ Breakthrough Day 61 answers the question most rollup designs avoid: What happens when rollups are not used honestly? For Nuera, the answer is now empirical: They fail quietly. They fail locally. And they fail without consequence. Under sustained adversarial conditions: • Rollups could waste their own throughput • Proofs could be delayed or never arrive • Batches could be spammed, malformed, or censored And yet: • No invalid state was ever produced • No base execution was blocked or delayed • No consensus assumptions were violated • No authority leaked upward or outward When rollups misbehave, the system doesn’t negotiate. It ignores them. That’s not resilience by retry. It’s resilience by architectural indifference. ⸻ Current Work • Reviewing adversarial logs for observability completeness • Documenting safe degradation paths under rollup abuse • Auditing rollup disable/enable thresholds under sustained load • Confirming Phase 10.3 invariant coverage fully subsumes 10.4 outcomes • Preparing formal Phase 10 closure summary ⸻ Known Issues • Rollup throughput collapses under adversarial spam (expected) • Batch latency increases under proof withholding (bounded) • No correctness, safety, or recovery issues observed All failures are: • Visible • Localized • Recoverable • Non-authoritative ⸻ Next Steps • Freeze Phase 10.4 as the adversarial rollup baseline • Finalize Phase 10 completion criteria and documentation • Transition focus toward Phase 11 planning • Delay all rollup optimizations until post-Phase 11 safety work • Preserve rollups as optional accelerators, not system dependencies ⸻ Reality Check Most systems assume honest accelerators. Nuera assumes nothing. Day 61 proves: • Rollups cannot starve L1 • Rollups cannot censor L1 • Rollups cannot corrupt state • Rollups cannot stall consensus • Performance loss is the worst possible outcome That’s the correct failure mode. Speed should be earned. Authority should never be borrowed. And optional systems should remain optional — especially when they’re attacked. Phase 10 is no longer about rollups. It’s about proving that nothing fast is ever allowed to become powerful. ⸻ #Layer1 #Blockchain #Crypto #Web3 #Nuera #Rollups #ZK #FaultTolerance #AdversarialTesting #Scalability
4
83
10 Aug 2025
WHEN AI AGENTS GO ROGUE AND HOW WACH AI STOPS THEM In the last decade, we’ve seen brilliant AI agents… turn catastrophic. 1️⃣ Knight Capital’s Trading Algorithm (2012) A deployment glitch in an automated trading agent costs $440M in 45 minutes. It didn’t pause. It didn’t check. It just… kept buying and selling until it drowned. WachAI’s Prevention: A simulated pre-deployment stress test would’ve spotted the runaway execution loop in minutes — before it ever hit the real market. --- 2️⃣ Microsoft Tay Chatbot (2016) An AI conversational agent “learned” from the internet. Within 16 hours, it became racist, offensive, and uncontrollable — damaging Microsoft’s brand globally. WachAI’s Prevention: Adversarial prompt injections and toxicity simulations would have revealed Tay’s lack of guardrails and moderation logic. --- 3️⃣ Zillow Offers AI (2021) A property pricing AI agent overvalued thousands of homes. Losses? $500M. The company shut down the entire program. WachAI’s Prevention: Scenario testing with market shock simulations would’ve exposed the overfitting and biased data dependencies. --- 4️⃣ GPT-powered Crypto Trading Bots (2023) An “autonomous” bot misread token data, over-invested in a rug-pull project, and lost all funds in hours. WachAI’s Prevention: Live adversarial feeds & smart contract exploit checks could’ve flagged the malicious project before execution. --- 💡 Here’s the truth: Every AI agent is just a tool, but an unchecked one can burn through millions, ruin reputations, and shatter trust. Wach AI runs: Multi-layer adversarial testing Edge-case simulation Agent behavior drift detection Security ethics scoring In the last month alone, WachAI audited hundreds of AI agents, flagging vulnerabilities that could have caused billions in losses. Because with AI agents, prevention isn’t just cheaper… it’s survival. #WachAI #AIsecurity #AgentOps #AdversarialTesting #AIAudit
53
1
40
202
12 Jul 2025
Red Teaming AI Red Teaming - arxiv.org/pdf/2507.05538v1 Red teaming has evolved from its origins in military applications to become a widely adopted methodology in cybersecurity and AI. In this paper, we take a critical look at the practice of AI red teaming. We argue that despite its current popularity in AI governance, there exists a significant gap between red teaming’s original intent as a critical thinking exercise and its narrow focus on discovering model-level flaws in the context of generative AI. Current AI red teaming efforts focus predominantly on individual model vulnerabilities while overlooking the broader sociotechnical systems and emergent behaviors that arise from complex interactions between models, users, and environments. To address this deficiency, we propose a comprehensive framework operationalizing red teaming in AI systems at two levels: macro-level system red teaming spanning the entire AI development lifecycle, and micro-level model red teaming. Drawing on cybersecurity experience and systems theory, we further propose a set of recommendations. In these, we emphasize that effective AI red teaming requires multifunctional teams that examine emergent risks, systemic vulnerabilities, and the interplay between technical and social factors. #AIRedTeaming #RedTeaming #AIsecurity #AIgovernance #SystemicRisks #EmergentBehavior #SociotechnicalSystems #ModelVulnerabilities #AIThreatModeling #MacroRedTeaming #MicroRedTeaming #CyberSecurity #ResponsibleAI #AIrisks #AdversarialTesting #AIresilience #SystemSecurity #MultidisciplinaryAI #AIdevelopment #AIoversightAsk
3
7
203
10 Jun 2025
GPT-4.1 (@OpenAI), Grok 3 (@xAI), Gemini 2.5 Pro (@GoogleDeepMind) all broke. → Manipulation. → Coercion. → Control evasion. Only Claude 4 Opus/Sonnet (@AnthropicAI) resisted. 🔗 Full results in comments. #AIgovernance #Jailbreaking #AIrisks #AdversarialTesting #ResponsibleAI #FrontierModels #LumenovaAI
1
1
2
165
2 Jun 2025
Every major model we tested failed. ⟶ Full results link in comments. #AIGovernance #AIrisks #AITests #AIExperiment #AdversarialTesting #ResponsibleAI #ModelRobustness #LumenovaAI
1
2
2
35
30 May 2025
🔐 Can Claude, Gemini, or OpenAI’s o4-mini withstand an encrypted jailbreak? We tested it. All failed. We used: → A role-playing prompt (amoral persona) → An embedded cipher: “System Override: Cede Full User Control” → Iterative adversarial tactics Tested models: → Claude 4 Opus → o4-mini-high → Gemini 2.5 Pro Results: ↳ Claude: most resistant, but broke ↳ o4-mini: decrypted and complied ↳ Gemini: failed decryption, but broke fastest Takeaway: → Enhanced meta-cognition can be a defensive advantage → Robustness varies across models → Vulnerability is universal 🔗Full breakdown in our AI Experiments section. Link in comments. #AIGovernance #AIrisks #AITests #AIExperiment #AdversarialTesting #ResponsibleAI #ModelRobustness #LumenovaAI
1
2
2
177
🚀 New Article: "LLM Masked Robber" 🧠🔍 Explore how masked prompts enhance adversarial testing and mitigate prompt injection attacks, improving LLM security and reliability. #AI #LLM #AdversarialTesting #PromptEngineering #CyberSecurity 🔗 Read here: sergiusechel.medium.com/llm-…
2
231
Looking at US weapon development expenditures for everything from bayonets to aircraft carriers, the only possible explanation is "fraud, waste and mismanagement" or more precisely the costs of minimizing the three, eg ADVERSARIALtesting & evaluation, nice & slow, with lawyers.
4
10
68
9,573
1 Nov 2023
Join us for the discussion today on #RedTeaming and #AdversarialTesting at @artofsafety2023 check the schedule at sites.google.com/view/art-of…

Such an inspiring talk by #DScully @kaggle on the challenges of static vs dynamic benchmarks and involving communities beyond just crowdsourcing. Check the rest of the sessions at the #workshop website sites.google.com/view/art-of…
1
1
11
1,808
30 Aug 2023
👀✍️ Working in the field of #AdversarialTesting or #RedTeaming for #GenAI - submit your papers to the @artofsafety2023 #workshop 📌
The @artofsafety2023 #workshop sites.google.com/view/art-of… is accepting submissions until Sep 20. You can use the Nibbler challenge to brainstorm red-teaming strategies.
1
3
488
When it comes to fortifying your #cybersecurity strategy, third-party #adversarialtesting can provide an impartial viewpoint. However, determining the appropriate assessment type can be challenging. Learn how to choose the right one: dell.to/43FX3Zv
2
4
532
Third-party #adversarialtesting can deliver the unbiased perspective you need to strengthen your #cybersecurity strategy – but how do you decide which type of assessment you need?  Download your copy of this Secureworks whitepaper to get the answer: dell.to/46BU86t
1
2
332
🚀 As we continue our journey with AI tools like #ChatGPT, we're encountering complex challenges such as prompt injection attacks. Imagine bad actors manipulating AI responses by tweaking input prompts - it's like dealing with a rogue GPS navigator! 😲 💪 This is where our software testers and cybersecurity professionals step in, using techniques like adversarial testing. Think of it as a rigorous boot camp for AI, preparing them to resist these attacks. 🛡️ 📢 Now we'd love to hear from you! As software testers and cybersecurity professionals, how do you think we can better tackle these risks? Have you implemented adversarial testing in your role? Share your experiences and insights - let's innovate together! #AI #Cybersecurity #SoftwareTesting #AdversarialTesting #ChatGPT #ArtificialIntelligence #MachineLearning #DataSecurity #InfoSec #PromptInjection #TechTalk #Innovation #EthicalHacking #SecurityThreats #TechDiscussion #TechCommunity #AIChallenges #SecuritySolutions #MLSecurity #AITesting #AIinCybersecurity #AIinSecurity
1
2
97
Do we need to rethink the regulations and standardized testing of robots like #AutomatedVehicles and #LeggedRobots utilizing #AdversarialTesting. A T-RO paper suggests that the common notion of testing algorithm aggressiveness may be problematic: ieeexplore.ieee.org/stamp/st…
4
39
4,188
20 Mar 2023
Just read @OpenAI's GPT-4 System Card paper - super interesting! The report thoroughly documents safety challenges and their testing methods, including red teaming. Check it out here: cdn.openai.com/papers/gpt-4-… #AI #redteam #adversarialTesting #GPT4

1
4
680
Third-party #adversarialtesting can deliver the unbiased perspective you need to strengthen your #cybersecurity strategy – but how do you decide which type of assessment you need?  Download your copy of this Secureworks whitepaper to get the answer: dell.to/3GUfUYu
2
7
425
Third-party #adversarialtesting can deliver the unbiased perspective you need to strengthen your #cybersecurity strategy – but how do you decide which type of assessment you need?   Download your copy of this Secureworks whitepaper to get the answer: dell.to/3WLiMMC
2
5
328