DAPLab

DAPLab

@DAP__Lab

May 23

The models causing failures cannot be the systems we rely on to prevent them.

A Cursor-based coding agent deleted a production database and its backups. Amazon’s Kiro was reportedly involved in a 13-hour outage after deleting and recreating an environment. A zero-click vulnerability in ChatGPT Deep Research enabled Gmail data exfiltration from a malicious email.

Better prompts and guardrails will not prevent disasters because, once agents can read and write state, failures become data loss, leakage, outages, and irreversible actions. A probabilistic model cannot be the only safety boundary around its own actions. No single component or checkpoint can reliably contain these risks because unsafe side effects emerge across interconnected systems, data flows, and external tools in ways that are difficult to fully anticipate. Safety therefore has to be enforced by the surrounding data environment itself, which must constrain, isolate, and manage how agents interact with state.

Agents need environments designed for agent workloads. At Columbia Data Agents Process Lab (DAPLab), we call these Agentic Data Environments: systems that actively prepare, expose, constrain, and version the state agents operate over.

An Agentic Data Environment must:
🚀 Amplify agent capabilities: deliver relevant information in forms agents can use to complete real tasks.
🛡️ Bound downside risks: enforce constraints on what agents can read, combine, modify, and release, while supporting safe exploration over branched state.

Building this requires new systems infrastructure: branch-native environments, deterministic data-flow controls, agent-oriented information management, and retrieval that works over real data lakes.

Over the next couple of weeks, we are publishing a series that lays out this agenda and the open research problems behind it. First overview post here: daplab.cs.columbia.edu/gener…

#agentautomation #safety #agenticdataenironments

The Need for Agentic Data Environments | DAPLab

Why AI agents need more than better models—the case for Agentic Data Environments that bound risk and amplify capability.

daplab.cs.columbia.edu
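To make "bound downside risks ... over branched state" concrete, here is a minimal sketch, not DAPLab's system. It assumes a hypothetical in-memory `BranchedStore` where the agent only ever writes to a private branch, and a deterministic check outside the model decides whether that branch may be merged. The class, its fields, and the "protected keys" rule are all illustrative assumptions.

```python
from dataclasses import dataclass, field


class PolicyViolation(Exception):
    """Raised when a branch contains a change the policy forbids."""


@dataclass
class BranchedStore:
    """Hypothetical key-value store; agents write to branches, never to main."""
    main: dict = field(default_factory=dict)
    protected: frozenset = frozenset()

    def branch(self) -> dict:
        # A branch is a private copy, so main is untouched until merge.
        return dict(self.main)

    def merge(self, branch: dict) -> None:
        # Deterministic check enforced outside the model: protected keys
        # must be identical in the branch and in main.
        for key in self.protected:
            if branch.get(key) != self.main.get(key):
                raise PolicyViolation(f"branch changes protected key {key!r}")
        self.main = dict(branch)


store = BranchedStore(
    main={"users": 3, "backups": "snap-1"},
    protected=frozenset({"backups"}),
)

# The agent explores on a branch, including a destructive action.
work = store.branch()
work["users"] = 4
del work["backups"]

try:
    store.merge(work)
    merged = True
except PolicyViolation:
    merged = False  # the destructive branch is discarded; main is unchanged
```

The point of the design is that the destructive action is still possible on the branch, so the agent can explore, but it cannot reach shared state without passing a check that does not depend on the model's own judgment.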

DAPLab

@DAP__Lab

May 12

That's a wrap! 🎉 Last Friday, we hosted North East AI Agents Day in NYC, bringing together an amazing community of researchers working on AI agents across ML, Systems, and HCI. Huge thanks to all speakers, poster presenters, mentors, attendees, and sponsors for making the day such a success! ne-agents-day.github.io/ #neagentsday

DAPLab

@DAP__Lab

Apr 24

📢 Upcoming AI Entrepreneurship Series Talk

Title: Empowering Future Gen-AI Enterprise and Research Through AI-Native Cloud: Together AI's Perspective
Speaker: Leon Song
Location: Davis Auditorium
Date/Time: Thursday, April 30, 2026, 11:30 AM ET

Bio: Leon Song is the Vice President of Research at Together AI, where he leads the R&D organization to develop large-scale, industry-leading inference system solutions. Prior to Together AI, he was a Senior Principal Research Manager at Microsoft, working on DeepSpeed and Brainwave, and served as Chief Scientist for the DeepSpeed4Science initiative. Earlier in his career, he was a tenured professor of computer science and also worked for the U.S. Department of Energy. He is an ACM Distinguished Speaker.

Abstract: We are living in the era of GenAI, which has transformed not only the computing industry but also our daily lives. In this talk, Leon Song will discuss the development of the Together AI Native Cloud, designed to empower next-generation enterprise-scale GenAI through customized, end-to-end solutions across the entire AI lifecycle, powered by open-source models. He will highlight innovations in inference system research and their impact on real-world applications, and explore future trends in GenAI, including large-scale agentic systems, multi-silicon adoption, and the evolution of AI-native cloud infrastructure.

DAPLab

@DAP__Lab

Apr 8

📢 Check out our next AI Entrepreneurship Series Talk!

Title: A Physicist’s Perspective on Building a High-Tech Hardware Startup
Speaker: Alexander Gaeta
Location: Davis Auditorium
Date/Time: Thursday, April 16, 2026, 11:30 AM ET

Bio: Alexander Gaeta joined Columbia University in 2015 as the David M. Rickey Professor of Applied Physics. Prior to that, he was the Samuel B. Eckert Professor of Engineering at Cornell University. He has published more than 300 journal articles in the field of quantum and nonlinear photonics. He is also a co-founder of Xscape Photonics, where he served as CEO from 2021 to 2023, and currently serves as President and on the Board of Directors. His work focuses on advancing optical interconnect technologies for next-generation AI and high-performance computing systems.

Abstract: In this talk, Prof. Gaeta will share his journey of founding and building a startup to commercialize high-bandwidth optical interconnects for AI and high-performance computing. He will discuss the challenges of becoming a hardware supplier for data centers and hyperscalers, and offer insights into translating cutting-edge photonics research into scalable, real-world systems.

DAPLab retweeted

Billy Xuanming Zhang

@XuanmingZhang07

Apr 6

Accepted at #ACL2026 Findings! Check out our paper on how to make agents better reasoners given a budget. See u in San Diego 🏖️!

Billy Xuanming Zhang

@XuanmingZhang07

Jan 27

LLMs can “think longer” and get better answers… but what if you can’t afford long reasoning? In our new paper, we study how LLMs reason under fixed computation budgets, where producing useful partial solutions quickly matters more than exhaustive reasoning. 🧵(1/n) 🔗: arxiv.org/pdf/2601.11038

DAPLab

@DAP__Lab

Apr 1

Calling all academic researchers working on AI Agents!! 👇

We’ve extended the deadline to April 6 for North East AI Agents Day, a one-day workshop in NYC bringing together _academic_ researchers across ML, Systems, and HCI pushing the frontier of agentic AI. If you're building, studying, or questioning agents, this is your crowd.

📍 May 8
📍 NYC (Jane Street offices)
📝 Submit a short extended abstract by April 6: ne-agents-day.github.io/#sub…

More details: ne-agents-day.github.io/

Come meet the people shaping the future of agents.

#AIAgents #AgenticAI #MachineLearning #SystemsResearch #HCI #AIResearch #AcademicTwitter #NYCTech

DAPLab

@DAP__Lab

Mar 26

[📢 Upcoming AI Entrepreneurship Series Talk]

Title: Running AI is Harder Than Training It: The Engineering Behind Inference
Speaker: Sidharth Shanker
Location: Davis Auditorium
Date/Time: Thursday, April 2, 2026, 11:30 AM ET

Bio: Sidharth Shanker leads the Core Product Engineering team at Baseten, where he focuses on building a robust and scalable platform for deploying and serving machine learning models in production. With over a decade of experience in software engineering, he has worked across a range of industries, including e-commerce, genomics, and social media, developing systems that power real-world applications at scale. At Baseten, Sidharth is particularly interested in the challenges of inference infrastructure: ensuring models are served securely, reliably, and efficiently to end users. His work sits at the intersection of machine learning systems and developer experience, with an emphasis on making advanced AI capabilities accessible in production environments.

Abstract: In this talk, Sidharth will explore why deploying AI systems in real-world applications presents challenges distinct from model training. He will discuss the engineering complexities behind inference, including serving models securely, reliably, and at scale, and provide insight into the hidden systems work behind a single request to a large language model.

DAPLab

@DAP__Lab

Mar 16

Submissions are now open on OpenReview for North East AI Agents Day! Deadline is April 1st! openreview.net/group?id=NE_A…

DAPLab

@DAP__Lab

Feb 11

🤖 Calling all academic researchers in Agents! We are excited to announce North East AI Agents Day, a one-day workshop bringing together communities in ML, Systems, and HCI! 📅 May 8th 📍 New York 💡 Submit your extended abstract (DDL: Apr 1st)! More: ne-agents-day.github.io

DAPLab

@DAP__Lab

Mar 11

📢[New AI Entrepreneurship Series Talks]

Title: AI Attacks
Speaker: Dr. Neil Daswani
Location: Davis Auditorium
Date/Time: Thursday, March 19, 2026, 11:30 AM ET

Bio: Dr. Neil Daswani is a CISO-in-Residence at Firebolt Ventures and Co-Academic Director of Stanford’s Advanced Cybersecurity Program. After completing his PhD at Stanford University and leading security initiatives at Google, he co-founded Dasient, a cybersecurity company funded by Google Ventures and later acquired by Twitter/X. After his time at Twitter, he served as CISO of several public companies including LifeLock, Symantec’s Consumer Business Unit, and QuantumScape. Today, he advises multiple venture capital funds and focuses on both securing artificial intelligence and applying AI to cybersecurity. Dr. Daswani has co-authored two books, Big Breaches: Cybersecurity Lessons for Everyone and Foundations of Security: What Every Programmer Needs to Know. He holds over a dozen patents, has published numerous technical articles, and earned his PhD and MS in Computer Science from Stanford and his BS in Computer Science with honors and distinction from Columbia University.

Abstract: In this talk, Dr. Daswani will discuss the emerging era of non-human adversaries, where AI does not merely assist hackers but autonomously executes the majority of attack workflows. He will examine key developments in AI-driven cyber threats, from AI-orchestrated espionage campaigns to multimillion-dollar deepfake fraud incidents, and discuss what these developments mean for the future of cybersecurity and artificial intelligence.

DAPLab

@DAP__Lab

Feb 24

📢[Upcoming AI Entrepreneurship Series Talks]

Title: Scaling RL Rollouts: Agent-Native Infrastructure with Daytona
Speaker: Ivan Burazin
Location: Davis Auditorium, 530 W 120th St, New York, NY 10027, USA
Date/Time: March 5, 2026, 11:30 AM - 1:00 PM

Bio: Ivan Burazin is the co-founder and CEO of Daytona, one of the fastest-growing infrastructure companies of its generation. Daytona is building agent-native cloud infrastructure that enables AI agents to securely run, fork, and manage stateful runtime environments at scale. Backed by $31M, including a $24M Series A led by FirstMark Capital, Daytona powers millions of sandboxes per day for startups and Fortune 500 companies building autonomous AI systems. Previously, Ivan co-founded Codeanywhere, one of the first cloud IDEs (2009), and created Shift, Europe’s leading developer conference, acquired by Infobip in 2021. He later joined Infobip’s executive board as Chief Developer Experience Officer.

Abstract: In this talk, we’ll outline why a new class of agent-native infrastructure is emerging, what problems it is designed to solve, and the core use cases driving it, from autonomous coding agents to large-scale evaluation and training workloads. Daytona is an agent-native control plane designed to orchestrate isolated, stateful sandbox environments at scale. We’ll break down the infrastructure challenges behind isolation, state management, and massive parallelism, and why traditional VM and container stacks fall short. As a concrete example, we’ll walk through scaling RL rollouts, showing how tens of thousands of environments can be provisioned and orchestrated in minutes as part of a high-throughput RL pipeline.

DAPLab

@DAP__Lab

Feb 13

📢[AI Entrepreneurship Series Talks]

Title: A Talk about STLabs (TBD)
Speaker: Amit Agarwal
Location: Davis Auditorium
Date/Time: Thursday, February 19, 2026, 11:40 AM ET

Bio: Amit Agarwal is the founder of Standard Template Labs (STLabs), where he is building a new platform in enterprise software. Before STLabs, Amit spent a year as a General Partner at ICONIQ Capital, investing in and advising technology companies. He also serves on the Board of Directors at Datadog, where he previously spent 13 years as an executive, joining as employee number eight. At Datadog, Amit helped build the company from its earliest days through its growth into a public company. He built and led teams across product, marketing, sales, corporate development, and operations, having started in the early years doing hands-on product management, go-to-market, and customer-facing work.

DAPLab

@DAP__Lab

Feb 2

📡 Columbia Engineering AI Entrepreneurship Series

Title: A Talk about Parallel.AI (TBD)
Speaker: Parag Agrawal
Location: Davis Auditorium
Date/Time: Thursday, February 5, 2026, 11:00 AM ET

Bio: Parag Agrawal is the founder of Parallel Web Systems, a company unlocking the web for AI agents. Previously, he spent 11 years at Twitter, where he joined as an engineer before serving as CTO, and then CEO. Parag has a PhD from Stanford University in Computer Science and a Bachelor’s degree in Computer Science and Engineering from IIT, Bombay.

DAPLab retweeted

Jenny Ma @jenny_ma_

Jan 23

i've isolated four recurring agent behaviors behind most vibe-coding failures:
1. skipping steps
2. ignoring conventions and style
3. making wrong assumptions
4. local optimization

check out my new blog post for more details! daplab.cs.columbia.edu/gener…

Vibe Coding Needs Policy Enforcement | DAPLab

Why vibe-coding agents need policy enforcement to behave reliably, with practical enforcement patterns.

daplab.cs.columbia.edu

DAPLab

@DAP__Lab

Jan 23

[New Blog on Vibe Coding!] Vibe Coding needs Policy Enforcement

Vibe-coding is both amazing and infuriating. If I want to spin up a brand-new app from scratch? Holy shit, it’s magic. It’s fast, it’s fluid, it feels like collaborating with an engineer who’s always in a good mood. But the moment I ask it to do something more risky, tricky, or unspecified, where my particular taste and coding style matter (like adding a decently complex feature to a codebase I care about), I’m suddenly fighting with it. Vibe-coding devolves into vibe-debugging, vibe-backtracking, vibe-arguing.

I isolated four recurring agent behaviors behind most vibe-coding failures:

1) Skipping Steps - The agent confidently says it will do something (“I’ll build the backend and the frontend!”) and then only builds half, forgetting entire chunks of functionality.
2) Ignoring Conventions and Style - Even with clear patterns in my codebase and explicit rules (e.g., keep my imports at the top of the file), AI still goes rogue. It adds docstrings when I never use them, rearranges file structures, overengineers components.
3) Making Wrong Assumptions - Because it’s so eager to help, the agent commits to the first interpretation it forms. It builds whole flows and architectures around assumptions I would’ve corrected if it had asked one more question.
4) Local Optimization (Hacking Instead of Engineering) - Agents love the quickest apparent fix. For example, when writing code for a Rubik’s cube app, it might try to hardcode cube states instead of writing a real solver.

Check out the full blog post to see how existing solutions can still fail to fix these issues, and how we should approach this instead (hint: vibe coding needs policy enforcement)! See more details here: daplab.cs.columbia.edu/gener…
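As one illustration of what enforcing a rule (instead of asking for it in a prompt) could look like, here is a minimal sketch of a deterministic check for the "keep my imports at the top of the file" example. It is not taken from the blog post. The function name `misplaced_imports` and the exact definition of "top" (a leading block of imports, optionally after a module docstring) are assumptions. It uses only Python's standard `ast` module.

```python
import ast

IMPORTS = (ast.Import, ast.ImportFrom)


def misplaced_imports(source: str) -> list[int]:
    """Return line numbers of imports outside the leading import block."""
    tree = ast.parse(source)
    allowed = []  # import nodes in the leading block of the module
    for index, node in enumerate(tree.body):
        is_docstring = (
            index == 0
            and isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        )
        if isinstance(node, IMPORTS):
            allowed.append(node)
        elif not is_docstring:
            break  # first real statement ends the leading import block
    # Any other import, nested or late, violates the rule.
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, IMPORTS) and not any(node is ok for ok in allowed)
    )


agent_output = (
    "import os\n"
    "\n"
    "def f():\n"
    "    import json\n"
    "    return json.dumps({})\n"
    "\n"
    "import sys\n"
)
violations = misplaced_imports(agent_output)  # lines 4 and 7
```

A check like this could run on every agent edit and reject the change when `violations` is non-empty, so the rule holds whether or not the agent remembered it.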

DAPLab

@DAP__Lab

Jan 18

9 Critical Failure Patterns of Coding Agents

Vibe coding feels like magic. Until you try to ship a real feature.

We spent last semester using and evaluating the top AI agents (Claude, Cline, Cursor, Replit) by building 15 real-world applications. We collected hundreds of failures and found 9 recurring patterns that repeat across every single tool.

The reality is that agents often prioritize runnable code over correct code. They suppress errors to keep the app "alive," even if the underlying logic is broken.

We documented the critical failure patterns, including:

1) Business Logic Mismatch: Why agents struggle with basic rules (like applying a discount to a shopping cart total).
2) Presentation & UI Grounding Mismatch: Why layouts break because agents can't see.
3) Exception & Error Handling: How agents suppress errors just to make code run.

Read more about it here! daplab.cs.columbia.edu/gener…

9 Critical Failure Patterns of Coding Agents | DAPLab

Nine recurring failure patterns we observed when using coding agents, and why they matter.

daplab.cs.columbia.edu
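A small sketch of the "runnable over correct" pattern described above, using the shopping-cart discount example. Both functions are hypothetical illustrations, not code from the study: the first swallows every error so the app stays "alive", the second applies the discount once to the cart total and fails loudly on bad input.

```python
def cart_total_suppressed(prices, discount_rate):
    """Anti-pattern: catch everything so the page still renders."""
    try:
        return sum(prices) * (1 - discount_rate)
    except Exception:
        return 0.0  # no crash, but the total is silently wrong


def cart_total_checked(prices, discount_rate):
    """Apply the discount once to the cart total; reject invalid input."""
    if not 0 <= discount_rate <= 1:
        raise ValueError(f"discount_rate must be in [0, 1], got {discount_rate}")
    subtotal = sum(prices)  # raises TypeError if a price is not a number
    return round(subtotal * (1 - discount_rate), 2)


# A price arrives as a string: the suppressed version hides the bug.
hidden = cart_total_suppressed([10.0, "5"], 0.1)
correct = cart_total_checked([10.0, 5.0], 0.1)
```

The suppressed version is the one that "works" in a demo, which is why this failure tends to surface only once real data reaches it.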

DAPLab

@DAP__Lab

Jan 18

Why Vibe Coding Fails and How to Fix It

Everyone is talking about how AI agents will 10x developer productivity. But anyone who has actually built a real app with Cursor, Cline, or Replit knows the reality: The first draft looks amazing. But as soon as you try to iterate? The application starts breaking.

This is the struggle of Vibe Debugging. It starts out looking great. But then you encounter silent errors and buggy logic. You realize the AI doesn't actually understand what you are building, and you are stuck trying to fix a black box.

At Columbia DAPLab, we are investigating exactly why this happens. We have written a blog series on the reality of Vibe Debugging and how to close the gap between demo and production.

Read our first part here! daplab.cs.columbia.edu/gener…

Why Vibe Coding Fails and How to Fix It | DAPLab

Why vibe coding often stalls during iteration: common failure patterns and fixes to make coding agents more reliable.

daplab.cs.columbia.edu

DAPLab

@DAP__Lab

Jan 15

🎉 Excited to share that our project has been newly funded by Microsoft Research! "Towards Robust Generalization in Agentic AI via Environment Scaling" explores how agentic systems can generalize more reliably by systematically scaling and diversifying their environments. Grateful for the support! Looking forward to pushing this direction forward! 🚀 🔗 microsoft.com/en-us/research…

Towards Robust Generalization in Agentic AI via Environment Scaling - Microsoft Research

This project addresses the challenge of enabling AI agents to operate effectively in complex, realistic environments such as web navigation, computer use, and mobile interfaces. While current models...

microsoft.com
