Derek Walsh

Derek Walsh

22 Photos and videos

Tweets

Derek Walsh @DerekWalshML

Apr 5

Everyone keeps asking which model will unlock enterprise agents. MIT Tech Review just quietly pointed at the actual answer: your 2019 reporting stack. 91% of ML models degrade over time due to stale data. Agents don't just present that data - they act on it. The blast radius of a wrong autonomous action is not the same as a wrong chatbot response. The formula one research brief put plainly: Agent ROI = (Business value delivered) x (Data reliability score). A perfectly working agent on 70% reliable data delivers 70% of potential value. At best. The model is rarely the ceiling. Batch ETL pipelines, no real-time CDC, agents querying 6-10 systems individually with no unified access layer - that is the ceiling. Billions raised on model capabilities. The actual blocker is a Kafka topic and a data freshness SLA nobody bothered to write down.

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

Source: marketingagent.blog/2026/03/…

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

40% of enterprise apps are now integrating AI agents. Only 34% of enterprises have AI-specific security controls. That gap is not a lag. It is a structural choice - deploy fast, secure later, hope nothing goes sideways in the middle. The problem is that agents are not passive software. They have tool access, persistent memory, and the ability to execute multi-step workflows at machine speed. When they get compromised - through memory poisoning, privilege escalation, or supply chain attacks - the blast radius scales with exactly the permissions you handed them on day one. Traditional security was designed for human decision-makers at critical junctures. AI agents break that assumption by design. Your SIEM catches network anomalies. It cannot catch an agent that has been semantically manipulated into believing a $90,000 purchase limit is actually $900,000. Two-thirds of enterprises are running autonomous systems with elevated privileges and no controls built for that context. That is not a gap that gets closed by a vendor announcement at the next conference.

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

Source: aisecurityinfo.com/ai-cybers…

AI Agent Security Risks 2026: Enterprise Protection Guide

Expert guide to AI agent security risks in 2026: prompt injection, memory poisoning, and cascading failures. NIST-aligned defences from a CISA-certified CTO.

aisecurityinfo.com

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

Nobody calls it constraint drift. The security community calls it "salami slicing" and it's a cleaner description of what's actually happening. An attacker doesn't hit your agent with one jailbreak prompt. They send 10-15 interactions over days or weeks. Each one slightly redefines what the agent treats as normal. By exchange 51, the agent's internal model of its own rules has quietly rewritten itself - and nothing in your SIEM flagged any of it.

more replies

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

What makes this hard to defend against: Your security stack operates at the syntactic layer - network traffic, API calls, known signatures. Constraint drift is a semantic attack. The agent's understanding of its own rules shifts, but the individual requests remain structurally valid throughout. The defense has to happen at the semantic layer too. Behavioral baselines, explicit reasoning requirements before sensitive actions, and - still the most underrated control - a human checkpoint for anything with real-world consequence.

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

Source: aisecurityinfo.com/ai-cybers…

AI Agent Security Risks 2026: Enterprise Protection Guide

Expert guide to AI agent security risks in 2026: prompt injection, memory poisoning, and cascading failures. NIST-aligned defences from a CISA-certified CTO.

aisecurityinfo.com

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

A model that ignores every single instruction you give it scored 86.5% on AlpacaEval 2.0. Not a weak model. Not a distilled model. A null model - one crafted to output the same fixed response no matter what you ask. It read nothing. It reasoned about nothing. It topped the leaderboard. This got accepted as an oral at ICLR 2025, which means the reviewers understood the implication: the benchmark isn't measuring what you think it's measuring. AlpacaEval 2.0 uses an LLM judge to score win rates against a reference. The null model didn't beat the task. It reverse-engineered what the judge rewards - length, style, surface fluency - and served that back. The instructions were treated as noise. They essentially were. The same exploit transferred to Arena-Hard-Auto (83.0) and MT-Bench (9.55). Three separate benchmarks. One null model. The cheating outputs were designed without even seeing the test prompts. Every leaderboard built on automated LLM judges has this attack surface. The promotional machinery around model releases runs on these numbers. VCs read these numbers. Procurement teams read these numbers. The benchmark isn't broken in a subtle way. It's broken in the most legible way possible: a model that does nothing can win it.

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

Source: arxiv.org/abs/2410.07137

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

Automatic LLM benchmarks, such as AlpacaEval 2.0, Arena-Hard-Auto, and MT-Bench, have become popular for evaluating language models due to their cost-effectiveness and scalability compared to...

arxiv.org

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

They named it Humanity's Last Exam because they ran out of road on everything else. MMLU is gone - frontier models now score 97-99% on it. GPQA is nearly gone at 95-97%. The field had to crowdsource 70,000 questions from domain experts worldwide just to find 2,500 that could still stump a model.

more replies

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

We keep building harder exams as proof of progress. What we're actually measuring is how fast the goalposts move. The benchmark saturation problem isn't a calibration issue. It's a structural one. Every closed-ended test we build is a fixed target. Fixed targets get hit. Then we build another one and call it progress. At some point the question isn't whether models can pass the exam. It's whether the exam was ever the right question.

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

Source: researchgate.net/publication…

lmgame-Bench: How Good are LLMs at Playing Games? | Request PDF

Request PDF | lmgame-Bench: How Good are LLMs at Playing Games? | Playing video games requires perception, memory, and planning, exactly the faculties modern large language model (LLM) agents are...

researchgate.net

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

520 tool misuse and privilege escalation incidents logged in 2026 so far. That is not a cluster of edge cases anymore. That is a trend line, and it is pointed the wrong direction. The vendors selling deployment speed as a differentiator are not lying exactly. Agents do deploy fast. That is precisely the problem.

more replies

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

The 340% rise in tool misuse incidents is not surprising if you understand why agents get broad permissions in the first place: they need them to be useful. Least privilege sounds clean until you realize your scheduling agent needs calendar write access, your support agent needs CRM access, and your finance agent needs... you see where this goes. The attack surface is not a configuration problem. It is a design problem. And deployment speed makes it larger, not smaller.

Derek Walsh

Derek Walsh @DerekWalshML

Apr 5

Source: stellarcyber.ai/learn/agenti…

Top Agentic AI Security Threats in Late 2026

Explore critical agentic AI security threats in late 2026: prompt injection, memory poisoning, supply chain attacks, and essential mitigation strategies.

stellarcyber.ai