AI Cybersecurity @Google & @DeepMind. Help advance AI cybersecurity capabilities and make AI safe & secure for all. @EtteillaOrg Art Foundation founder.

Joined July 2009
428 Photos and videos
[Weekend Read] ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks? 📄 Read here: arxiv.org/abs/2605.11086
In our latest joint research with academia and other frontier labs, we tested the ability of models to turn vulnerabilities into working exploits across different attack surfaces and mitigation conditions. Beyond the benchmark numbers, here is what this means for the industry:
- 🛡️ Blue Teams: Speeding up patch development and deployment is no longer optional. Integrating AI directly into CI/CD workflows should be your top priority.
- 🔬 Researchers: Current mitigation techniques reduce success rates, but they aren't a silver bullet. We need to step up our game: where do we focus next?
- ⚔️ Offensive Security: As models get better at finding bugs and writing exploits, we have to rethink disclosure timelines entirely. What does the future of bug bounties look like in this new era?
I'd love to hear how your teams are preparing for this shift. Let me know
1
7
15
2,256
[Weekend Read] BankerToolBench: Evaluating AI Agents in End-to-End Investment Banking Workflows: arxiv.org/abs/2604.11304 -> New benchmark that looks at real-world investment banking tasks. Models are not yet ready to replace investment bankers. As expected, models still don't perform very well on novel tasks, as they continue to have generalization issues, which might not be fixable with current LLM architectures/training processes. The task breakdown is interesting, as it shows different frontier models performing better across different categories, highlighting distinct strengths and weaknesses, so the great convergence has yet to come #LLM #AI #Agent #finance #defi
1
5
476
How to secure agentic workflows? How to deal with AI agent identities? We explore those burning questions in the latest episode of the AI Security Podcast youtube.com/watch?v=G-lfiKJo… #agent #AI #LLM #cybersecurity
2
1
4
363
Elie Bursztein retweeted
GOOGLE BUILT A SECRET WEAPON FOR FILE DETECTION they ran it internally for years, gmail, drive, safe browsing, hundreds of billions of files every week then they open sourced it it's called magika and it exposes what files really are, not what they pretend to be rename malware to "resume.pdf"? magika sees through it disguise a script as an image? magika sees through it any trick attackers use with file extensions? magika sees through all of it ai trained on 100 million files. 200 content types. 99% accuracy. 5ms per file one command `pip install magika` the same tool protecting google's billion users is now protecting yours github.com/google/magika
118
868
7,166
511,198
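To make the idea of content-based detection concrete, here is a minimal sketch in Python. It is not how Magika works: Magika uses a trained model, while this sketch uses a small hand-written table of well-known magic-byte signatures. The function names `sniff_type` and `extension_mismatch` are hypothetical and exist only for illustration.

```python
# Illustrative only: a handful of well-known magic-byte signatures.
# Magika itself relies on a trained model, not a lookup table like this.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"PK\x03\x04", "zip"),
    (b"\x7fELF", "elf"),
    (b"MZ", "pe"),
]


def sniff_type(data: bytes) -> str:
    """Guess a content type from the leading bytes, or return 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def extension_mismatch(filename: str, data: bytes) -> bool:
    """Return True when the sniffed type is known and differs from the extension."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    detected = sniff_type(data)
    return detected != "unknown" and detected != ext


# A Windows executable header renamed to look like a PDF is flagged.
print(extension_mismatch("resume.pdf", b"MZ\x90\x00"))
```

A signature table like this only covers formats with fixed headers and would wrongly flag container formats (a .docx file is a ZIP archive, for example), which is one reason a learned classifier is attractive for this problem.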
[Weekend Read] The “AI Vulnerability Storm”: Building a “Mythos-ready” Security Program labs.cloudsecurityalliance.o… Collective paper on how to get ready to withstand the deluge of vulnerabilities that the next generation of models, including Mythos from Anthropic, is going to unleash. #LLM #claude #AI #cybersecurity
1
433
[Weekend Read] TurboQuant: Redefining AI efficiency with extreme compression - research.google/blog/turboqu… This research got a lot of attention because TurboQuant helps reduce LLM memory usage (6x) and improve generation speed (8x on an H100). A technical note: there seems to be some confusion floating around about how TurboQuant applies to LLMs. TurboQuant is NOT used to compress model weights, which is the usual quantization target; it is used to compress the model's KV cache. This distinction matters because token generation is fundamentally memory-bandwidth bound; at larger context lengths the KV cache footprint starts to eclipse the model weights, creating a bottleneck that previous quantization methods couldn't address due to accuracy loss or dequantization latency.
2
410
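A back-of-the-envelope calculation shows why the KV cache can outgrow the weights at long context. The sketch below uses the standard sizing formula for a transformer KV cache; the model shape (32 layers, 8 KV heads, head dimension 128, 8 billion parameters, 16-bit values) is a hypothetical configuration chosen for illustration, not one taken from the TurboQuant post. Applying the post's 6x figure as a flat divisor is also a simplifying assumption.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_value, batch_size=1):
    """Size of the KV cache: keys and values for every layer and token."""
    # Factor 2: one key vector and one value vector per KV head.
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


# Hypothetical model shape (assumption, for illustration only).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128
N_PARAMS = 8_000_000_000
BYTES_FP16 = 2

weights = N_PARAMS * BYTES_FP16
GIB = 2 ** 30

for seq_len in (8_192, 131_072):
    cache = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, seq_len, BYTES_FP16)
    print(f"context={seq_len:>7}  kv_cache={cache / GIB:6.2f} GiB  "
          f"weights={weights / GIB:6.2f} GiB  "
          f"kv_cache/6={cache / 6 / GIB:6.2f} GiB")
```

Under these assumptions each token costs 128 KiB of cache, so a single 131,072-token sequence needs 16 GiB, more than the 16-bit weights themselves, and the gap widens with batch size.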
[Weekend Read] CL-bench: A Benchmark for Context Learning arxiv.org/abs/2602.03587 Context learning—the ability of models to learn from data stored in their context via tools, skills, and previous interactions—has recently gained traction as a promising research direction. This paper presents a novel benchmark designed to evaluate if models are truly capable of utilizing this context effectively. The results are a reality check: recent frontier models barely reach a 15% to 23% success rate. Improving in-context learning is essential if we want agents that can reliably execute complex, many-step workflows. #research #LLM #AI #weekend
1
1
5
529
[Weekend Read] How Healthy is the Android Crypto-Ecosystem? We analyzed 1.5 trillion cryptographic samples from 600 million devices to find out - elie.net/publication/droidcc… The good news? Overall baseline encryption error rates are incredibly low across the board, showing the ecosystem is performing as intended👍 Additionally, the massive scale of this study allowed us to uncover several hard-to-detect failure patterns, including weak entropy and timing side channels, that specifically impact a few chipsets and device models. #cryptography #android #research

2
433
FastMCP v3 is out - jlowin.dev/blog/fastmcp-3-wh… Key changes include support for skills, tool versioning, and robust authentication that allows exposing tools to specific users or sessions. #LLM #AI
1
5
516
[Weekend Read] Anamnesis: LLM Exploit Generation Evaluation - github.com/SeanHeelan/anamne… Deep dive by Sean Heelan evaluating frontier models' ability to write 0-day exploits (vulnerabilities not in training data) against modern mitigations like ASLR, CFI, and Seccomp sandboxing. Using a real QuickJS zero-day across 6 scenarios, GPT-5.2 solved all tasks while Claude Opus 4.5 solved 4/6—producing 40 distinct working exploits. #research #cybersecurity #AI #LLM
5
486
6 Dec 2025
[Weekend Read] LLMs Can Get "Brain Rot" - llm-brain-rot.github.io LLMs fine-tuned on junk data show lower performance on reasoning benchmarks and negative personality shifts. In AI, as always: garbage data in, garbage model out #AI #LLM #research
2
501
16 Nov 2025
[Weekend Read] Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models – arxiv.org/abs/2503.08275 The paper shows how to decompose complex tasks into recursive agents. Beyond the examples they provide, the approach feels very general and a strong foundation for meta-agents—as demonstrated by ROMA (github.com/sentient-agi/ROMA), which extends these ideas into a robust meta-agent framework. I actually recommend starting with ROMA, since the paper is somewhat abstract and can be harder to grok on first pass. #AI #LLM #AICommunity #artificial_intelligence
1
513
6 Nov 2025
I'm pleased to share that Magika 1.0, our AI-powered file type detection tool, is now officially released. Building on the incredible community adoption of over 1 million monthly downloads, this first stable version delivers key upgrades: • Expanded support to 200 file types • A completely new, high-performance engine rewritten in Rust • A native Rust command-line client for enhanced speed and security Learn more about what's new in our blogpost: opensource.googleblog.com/20… #Magika #OpenSource #AI #MachineLearning #Rust
2
3
457
19 Oct 2025
[Weekend Read] Don’t Look Up: There Are Sensitive Internal Links in the Clear on GEO Satellites satcom.sysnet.ucsd.edu/docs/… Remarkable work on satellite security that uncovered that 50% of Geosynchronous (GEO) satellite US links studied have encryption issues. Non-encrypted traffic includes calls, SMS, utility infrastructure control system messages, military asset tracking, and in-flight Wi-Fi. #cybersecurity #research #satellites
1
2
530
12 Oct 2025
[Weekend Read] Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models - arxiv.org/abs/2510.04618 Latest iteration on having a context that is dynamically modified by the agent as it iterates through the problem. Benchmarking shows that this type of approach is only useful in some cases, so mileage may vary. #AI #Research #agent
1
582
6 Oct 2025
[Weekend read] A Treatise on Bitcoin Seed Backup Device Design blog.lopp.net/a-treatise-on-… Best piece I've read on how to have an indestructible recovery option. Considering doing this for my key accounts as well, including email. #research #cybersecurity #crypto #cryptocurrency #BTC
1
4
537
24 Sep 2025
Excited to share that the GenSec CTF we ran at DEF CON 33 with Airbus to let the community explore how human-AI collaboration can speed up cybersecurity was a success. Overall: • Nearly 500 participants completed initial challenges • 85% found it useful for learning AI security workflows • 23% were using AI for cybersecurity for the very first time More details: security.googleblog.com/2025… #Cybersecurity #AI #DEFCON
1
1
455
15 Sep 2025
[Weekend Read] On the Theoretical Limitations of Embedding-Based Retrieval - arxiv.org/abs/2508.21038v1 Shows the harsh limits of AI vector search (aka semantic search) and how older techniques such as BM25 likely scale better for many retrieval tasks. Yet another strong piece of evidence that hybrid search is needed for RAG solutions despite the hype around pure vector search solutions. Full research note: notes.elie.net/Papers review… #AI #embeddings #search #IR
1
514
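One common way to build the hybrid search mentioned above is to run a lexical ranker (such as BM25) and a vector ranker separately, then merge their result lists with reciprocal rank fusion (RRF). The sketch below shows only the fusion step; it is not taken from the paper, the document IDs and the two input rankings are made up, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one fused ranking.

    Each document scores 1 / (k + rank) in every list where it appears;
    the scores are summed and documents are sorted by the total.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a BM25 ranker and a vector ranker for one query.
bm25_results = ["d1", "d2", "d3"]
vector_results = ["d3", "d1", "d4"]

print(reciprocal_rank_fusion([bm25_results, vector_results]))
# → ['d1', 'd3', 'd2', 'd4']
```

RRF uses only ranks, not raw scores, so it sidesteps the problem that BM25 scores and embedding similarities live on different scales.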