Elie Bursztein

Tweets

Elie Bursztein

@elie

May 15

[Weekend Read] ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?

📄 Read here: arxiv.org/abs/2605.11086

In our latest joint research with academia and other frontier labs, we tested the ability of models to turn vulnerabilities into working exploits across different attack surfaces and mitigation conditions.

Beyond the benchmark numbers, here is what this means for the industry:

- 🛡️ Blue Teams: Speeding up patch development and deployment is no longer optional. Integrating AI directly into CI/CD workflows should be your top priority.
- 🔬 Researchers: Current mitigation techniques reduce success rates, but they aren't a silver bullet. We need to step up our game: where do we focus next?
- ⚔️ Offensive Security: As models get better at finding bugs and writing exploits, we have to rethink disclosure timelines entirely. What does the future of bug bounties look like in this new era?

I'd love to hear how your teams are preparing for this shift. Let me know

2,256

Elie Bursztein

@elie

May 10

[Weekend Read] BankerToolBench: Evaluating AI Agents in End-to-End Investment Banking Workflows: arxiv.org/abs/2604.11304 -> New benchmark that looks at real-world investment banking tasks. Models are not yet ready to replace investment bankers. As expected, models still don't perform very well on novel tasks, as they continue to have generalization issues, which might not be fixable with current LLM architectures/training processes. The task breakdown is interesting, as it shows different frontier models performing better across different categories, highlighting distinct strengths and weaknesses, so the great convergence has yet to come #LLM #AI #Agent #finance #defi

476

Elie Bursztein

@elie

May 1

How to secure agentic workflows? How to deal with AI agent identities? We explore those burning questions in the latest episode of the AI Security Podcast youtube.com/watch?v=G-lfiKJo… #agent #AI #LLM #cybersecurity

The Zero-Click AI Hack: How to Contain the Blast Radius of Autonomous...

Is an AI agent's identity a workload or an action? Ashish spoke to ...

youtube.com

363

Elie Bursztein retweeted

Vaishnavi

@_vmlops

Apr 13

GOOGLE BUILT A SECRET WEAPON FOR FILE DETECTION they ran it internally for years, gmail, drive, safe browsing, hundreds of billions of files every week then they open sourced it it's called magika and it exposes what files really are, not what they pretend to be rename malware to "resume.pdf"? magika sees through it disguise a script as an image? magika sees through it any trick attackers use with file extensions? magika sees through all of it ai trained on 100 million files. 200 content types. 99% accuracy. 5ms per file one command `pip install magika` the same tool protecting google's billion users is now protecting yours github.com/google/magika

GitHub - google/magika: Fast and accurate AI powered file content types detection

github.com

118

868

7,166

511,198

Elie Bursztein

@elie

Apr 13

[Weekend Read] The “AI Vulnerability Storm”: Building a “Mythos-ready” Security Program labs.cloudsecurityalliance.o… Collective paper on how to get ready to withstand the deluge of vulnerabilities that the next generation of models, including Mythos from Anthropic, is going to unleash. #LLM #claude #AI #cybersecurity

433

Elie Bursztein

@elie

Apr 5

[Weekend Read] TurboQuant: Redefining AI efficiency with extreme compression - research.google/blog/turboqu… This research got a lot of attention because TurboQuant helps reduce LLM memory usage (6x) and improve generation speed (8x on an H100). A technical note: there seems to be some confusion floating around about how TurboQuant applies to LLMs. TurboQuant is NOT used to compress model weights, which is the usual quantization target; it is used to compress the model's KV cache. This distinction matters because token generation is fundamentally memory-bandwidth bound: at larger context lengths the KV cache footprint starts to eclipse the model weights, creating a bottleneck that previous quantization methods couldn't address due to accuracy loss or dequantization latency.

410
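To make the KV cache point in the TurboQuant post above concrete, here is a rough back-of-the-envelope sketch. The model shape (32 layers, 8 KV heads, head dimension 128, 8B parameters, fp16) is a hypothetical example and not taken from the TurboQuant work, and the 6x factor is simply the memory figure quoted in the post, applied naively to the cache size.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value, batch=1):
    """Size of the KV cache: one key and one value vector per layer,
    per KV head, per token (hence the leading factor of 2)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value * batch


def weight_bytes(params, bytes_per_param):
    """Size of the model weights at a given precision."""
    return params * bytes_per_param


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
PARAMS = 8_000_000_000
FP16 = 2  # bytes per value

weights = weight_bytes(PARAMS, FP16)

for seq_len in (8_192, 131_072, 1_048_576):
    cache = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len, FP16)
    compressed = cache / 6  # naive use of the 6x figure quoted in the post
    print(
        f"context={seq_len:>9,} tokens | "
        f"weights={weights / 1e9:6.1f} GB | "
        f"kv_cache={cache / 1e9:6.1f} GB | "
        f"kv_cache/6={compressed / 1e9:6.1f} GB"
    )
```

Under these assumptions each token costs 128 KiB of cache, so the cache overtakes the fp16 weights at roughly 128K tokens of context for a single sequence, and sooner when several sequences are batched.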

Elie Bursztein

@elie

Feb 8

[Weekend Read] CL-bench: A Benchmark for Context Learning arxiv.org/abs/2602.03587 Context learning—the ability of models to learn from data stored in their context via tools, skills, and previous interactions—has recently gained traction as a promising research direction. This paper presents a novel benchmark designed to evaluate if models are truly capable of utilizing this context effectively. The results are a reality check: recent frontier models barely reach a 15% to 23% success rate. Improving in-context learning is essential if we want agents that can reliably execute complex, many-step workflows. #research #LLM #AI #weekend

529

Elie Bursztein

@elie

Feb 1

[Weekend Read] How Healthy is the Android Crypto-Ecosystem? We analyzed 1.5 trillion cryptographic samples from 600 million devices to find out - elie.net/publication/droidcc… The good news? Overall baseline encryption error rates are incredibly low across the board, showing the ecosystem is performing as intended👍 Additionally, the massive scale of this study allowed us to uncover several hard-to-detect failure patterns, including weak entropy and timing side channels, that specifically impact a few chipsets and device models. #cryptography #android #research

433

Elie Bursztein

@elie

Jan 22

FastMCP v3 is out - jlowin.dev/blog/fastmcp-3-wh… Key changes include support for skills, tool versioning, and robust authentication that allows exposing tools to specific users or sessions. #LLM #AI

What's New in FastMCP 3.0

A comprehensive guide to every major feature

jlowin.dev

516

Elie Bursztein

@elie

Jan 20

[Weekend Read] Anamnesis: LLM Exploit Generation Evaluation - github.com/SeanHeelan/anamne… Deep dive by Sean Heelan evaluating frontier models' ability to write 0-day exploits (vulnerabilities not in training data) against modern mitigations like ASLR, CFI, and Seccomp sandboxing. Using a real QuickJS zero-day across 6 scenarios, GPT-5.2 solved all tasks while Claude Opus 4.5 solved 4/6—producing 40 distinct working exploits. #research #cybersecurity #AI #LLM

GitHub - SeanHeelan/anamnesis-release: Automatic Exploit Generation with LLMs

Automatic Exploit Generation with LLMs. Contribute to SeanHeelan/anamnesis-release development by creating an account on GitHub.

github.com

486

Elie Bursztein

@elie

6 Dec 2025

[Weekend Read] LLMs Can Get "Brain Rot" - llm-brain-rot.github.io LLMs fine‑tuned on junk data show lower performance on reasoning benchmarks and negative personality shifts. In AI, as always: garbage data in, garbage model out #AI #LLM #research

501

Elie Bursztein

@elie

16 Nov 2025

[Weekend Read] Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models – arxiv.org/abs/2503.08275 The paper shows how to decompose complex tasks into recursive agents. Beyond the examples they provide, the approach feels very general and a strong foundation for meta-agents—as demonstrated by ROMA (github.com/sentient-agi/ROMA), which extends these ideas into a robust meta-agent framework. I actually recommend starting with ROMA, since the paper is somewhat abstract and can be harder to grok on first pass. #AI #LLM #AICommunity #artificial_intelligence

513

Elie Bursztein

@elie

6 Nov 2025

I'm pleased to share that Magika 1.0, our AI-powered file type detection tool, is now officially released. Building on the incredible community adoption of over 1 million monthly downloads, this first stable version delivers key upgrades:

• Expanded support to 200 file types
• A completely new, high-performance engine rewritten in Rust
• A native Rust command-line client for enhanced speed and security

Learn more about what's new in our blog post: opensource.googleblog.com/20… #Magika #OpenSource #AI #MachineLearning #Rust

Announcing Magika 1.0: now faster, smarter, and rebuilt in Rust

opensource.googleblog.com

457

Elie Bursztein

@elie

19 Oct 2025

[Weekend Read] Don’t Look Up: There Are Sensitive Internal Links in the Clear on GEO Satellites satcom.sysnet.ucsd.edu/docs/… Remarkable work on satellite security that uncovered that 50% of the geosynchronous (GEO) satellite US links studied have encryption issues. Non-encrypted traffic includes calls, SMS, utility infrastructure control system messages, military asset tracking, and in-flight Wi-Fi. #cybersecurity #research #satellites

530

Elie Bursztein

@elie

12 Oct 2025

[Weekend Read] Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models - arxiv.org/abs/2510.04618 Latest iteration on having a context that is dynamically modified by the agent as it iterates through the problem. Benchmarking shows that this type of approach is only useful in some cases, so mileage may vary. #AI #Research #agent

582

Elie Bursztein

@elie

6 Oct 2025

[Weekend Read] A Treatise on Bitcoin Seed Backup Device Design blog.lopp.net/a-treatise-on-… Best piece I have read on how to have an indestructible recovery option. Considering doing this for my key accounts as well, including email. #research #cybersecurity #crypto #cryptocurrency #BTC

A Treatise on Bitcoin Seed Backup Device Design

Lessons learned from stress testing dozens of seed phrase backup devices.

blog.lopp.net

537

Elie Bursztein

@elie

24 Sep 2025

Excited to share that the GenSec CTF we ran at DEF CON 33 with Airbus to let the community explore how human-AI collaboration can speed up cybersecurity was a success. Overall:

• Nearly 500 participants completed initial challenges
• 85% found it useful for learning AI security workflows
• 23% were using AI for cybersecurity for the very first time

More details: security.googleblog.com/2025… #Cybersecurity #AI #DEFCON

Accelerating adoption of AI for cybersecurity at DEF CON 33

blog.google

455

Elie Bursztein

@elie

15 Sep 2025

[Weekend Read] On the Theoretical Limitations of Embedding-Based Retrieval - arxiv.org/abs/2508.21038v1 Shows the harsh limits of AI vector search (aka semantic search) and how older techniques such as BM25 likely scale better for many retrieval tasks. Yet another strong piece of evidence that hybrid search is needed for RAG solutions despite the hype around pure vector search solutions. Full research note: notes.elie.net/Papers review… #AI #embeddings #search #IR

514
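As an illustration of the hybrid search idea mentioned in the post above, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking such as BM25 with a vector-search ranking. This is not the method from the paper; the document IDs and the two input rankings are made up, and `k=60` is just the conventional default for RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in
    (rank starts at 1); documents are returned by descending total score,
    with ties broken alphabetically for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword retriever and a vector retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Because RRF only uses rank positions, it avoids having to normalize BM25 scores against cosine similarities, which live on different scales.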