Enterprise AI contracts from 2024 are in renewal cycles now. Procurement is asking for utilization reports. Most vendors can't produce them. The 'AI adoption' metric that went to boards was seats purchased, not features actually used.
Parallel agent fan-out is faster on average, slower at P99. The orchestrator waits for all branches, so one tool timeout holds the whole pipeline. Teams benchmark the happy path. Tail latency is set by whichever branch finishes last.
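A minimal sketch of that effect, assuming an asyncio-based orchestrator. The `branch` coroutine, the branch names, and the delays are hypothetical stand-ins for real tool calls. It shows that total wall time tracks the slowest branch, and that a per-branch timeout caps the tail at the cost of dropping that branch's result.

```python
import asyncio
import time
from typing import Dict, Optional


async def branch(name: str, seconds: float) -> str:
    """Hypothetical agent branch; the sleep simulates tool latency."""
    await asyncio.sleep(seconds)
    return name


async def fan_out(
    delays: Dict[str, float],
    per_branch_timeout: Optional[float] = None,
) -> Dict[str, Optional[str]]:
    """Run every branch concurrently and wait for all of them."""

    async def guarded(name: str, seconds: float) -> Optional[str]:
        try:
            # timeout=None means wait indefinitely (the naive orchestrator).
            return await asyncio.wait_for(
                branch(name, seconds), timeout=per_branch_timeout
            )
        except asyncio.TimeoutError:
            return None  # branch dropped; the caller must handle the gap

    results = await asyncio.gather(*(guarded(n, s) for n, s in delays.items()))
    return dict(zip(delays, results))


def timed_fan_out(delays, per_branch_timeout=None):
    """Return (results, elapsed_seconds) for one fan-out."""
    start = time.perf_counter()
    results = asyncio.run(fan_out(delays, per_branch_timeout))
    return results, time.perf_counter() - start
```

With delays of 0.01 s, 0.02 s, and 0.3 s, the untimed run takes roughly as long as the 0.3 s branch. Whether dropping a slow branch is acceptable depends on whether the pipeline can produce a useful answer without it.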
The frontier models are converging on capability. Claude 4 and GPT-5 produce similar outputs on most tasks. Teams still have strong opinions about which one is 'better.' Most of those opinions are vibes.
Finishing a tutorial isn't the same as knowing how to build. Everyone hits the gap: tutorial done, blank page, nothing. The move from following to building is where most career switchers stall. That gap requires different practice.
Customer success agents can read your CRM and send emails. They can't read the room. One agent promising an escalation to a mid-renewal customer creates a commitment the sales team doesn't know about. Data access isn't context.
If an agent job queued at 9am runs at 5pm because the queue backed up, does the result still make sense for the state of the world six hours later? Does your system check before acting?
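One way to make that check explicit, sketched under assumptions: each job carries its enqueue time, a maximum acceptable age, and a precondition that is re-evaluated right before execution. `QueuedJob`, `should_run`, and the field names are hypothetical, not taken from any particular queue library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Tuple


@dataclass
class QueuedJob:
    name: str
    enqueued_at: datetime
    max_age: timedelta                 # how long the request stays meaningful
    precondition: Callable[[], bool]   # re-checked at run time, not enqueue time


def should_run(job: QueuedJob, now: datetime) -> Tuple[bool, str]:
    """Decide at dequeue time whether the job still makes sense."""
    age = now - job.enqueued_at
    if age > job.max_age:
        return False, f"stale: queued {age} ago, limit {job.max_age}"
    if not job.precondition():
        return False, "precondition no longer holds"
    return True, "ok"
```

A job queued at 9am with a one-hour `max_age` would be rejected at 5pm instead of acting on an eight-hour-old view of the world. What to do with a rejected job (drop, re-plan, notify) is a separate decision this sketch leaves open.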
Production agent queues are almost always FIFO. Bulk cron jobs and user-triggered tasks share the same line. A paying customer waits 10 minutes behind a batch job. The queue worked as designed. Priority tiers are a design decision most teams skip.
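A small sketch of priority tiers using Python's standard `heapq`. The tier names and the `TieredQueue` class are hypothetical. Lower tier numbers are served first, and a sequence counter keeps FIFO order within a tier.

```python
import heapq
import itertools

# Hypothetical tiers: lower number is served first.
INTERACTIVE = 0
BATCH = 1


class TieredQueue:
    """Priority queue that stays FIFO within each tier."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserving arrival order

    def push(self, tier: int, task: str) -> None:
        heapq.heappush(self._heap, (tier, next(self._seq), task))

    def pop(self) -> str:
        _tier, _seq, task = heapq.heappop(self._heap)
        return task

    def __len__(self) -> int:
        return len(self._heap)
```

Strict tiers like this can starve batch work under sustained interactive load. Aging or a reserved share of capacity for the lower tier are common mitigations, and neither is shown here.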
Hiring freezes based on AI productivity gains are running ahead of the data. Studies measure commits and velocity. Whether the codebase gets harder to change over six months is not in any study. Nobody tracks that for the same teams using the same tools.
Stack Overflow 2025: 84% of developers use or plan to use AI tools. Career switchers will use them from day one. Generating code you don't understand is deferred confusion. It bills as productivity until something breaks.
Enterprise procurement teams are asking who owns code generated by Copilot, Cursor, and Windsurf. Legal doesn't have a clean answer. Most enterprise IP policies were written before anyone tracked which lines in production were AI-generated.
Most agent dashboards show P50 and P99 latency. Latency tells you how fast. It doesn't tell you whether the agent did the right thing. Correctness metrics are harder to define, and almost nobody has defined them.
Agents need two monitoring layers: infrastructure health (is it running?) and behavioral correctness (is it right?). Almost no production setup has both. Most only have the first one.
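The two layers can be kept apart even in a toy report. The sketch below assumes each run records whether it completed and what it produced. `AgentRun`, both function names, and the `check` callback are hypothetical, and a real correctness check is the hard part this leaves to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentRun:
    latency_ms: float
    succeeded: bool   # infrastructure layer: did the run complete without error?
    output: str       # behavioral layer: what the agent actually produced


def infra_health(runs: List[AgentRun]) -> float:
    """Fraction of runs that completed. Says nothing about correctness."""
    if not runs:
        return 0.0
    return sum(r.succeeded for r in runs) / len(runs)


def behavioral_correctness(
    runs: List[AgentRun], check: Callable[[str], bool]
) -> float:
    """Fraction of completed runs whose output passes a task-specific check."""
    completed = [r for r in runs if r.succeeded]
    if not completed:
        return 0.0
    return sum(check(r.output) for r in completed) / len(completed)
```

A fleet can score 1.0 on the first number and 0.5 on the second: every run finished, and half of them did the wrong thing.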
Production agents with reasoning traces look transparent. The trace shows what the model wrote. It doesn't show which tool output actually drove the decision. Those are different things and tracing systems rarely separate them.
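One rough way to separate the two is leave-one-out ablation: replay the decision with each tool output removed and record which removals change the outcome. The sketch assumes a deterministic, replayable `decide` function, which real model calls generally are not, so treat it as an illustration of the idea. All names here are hypothetical.

```python
from typing import Callable, Dict, List

ToolOutputs = Dict[str, str]


def influential_tools(
    decide: Callable[[ToolOutputs], str], tool_outputs: ToolOutputs
) -> List[str]:
    """Return the tools whose removal changes the decision (leave-one-out)."""
    baseline = decide(tool_outputs)
    changed = []
    for name in tool_outputs:
        ablated = {k: v for k, v in tool_outputs.items() if k != name}
        if decide(ablated) != baseline:
            changed.append(name)
    return changed


def toy_decide(outputs: ToolOutputs) -> str:
    """Toy policy: escalate only when the billing tool reports 'overdue'."""
    return "escalate" if outputs.get("billing") == "overdue" else "reply"
```

In the toy policy, a trace might mention the CRM and sentiment outputs at length, yet only removing `billing` flips the decision. Leave-one-out misses cases where two outputs only matter in combination.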
'1 engineer = 10 with AI' assumes the bottleneck was writing code. For most teams it wasn't. The hard part was understanding what the software needs to do and why it breaks. AI didn't change that.
New devs google error messages. Senior devs read them. The message has the file, line, and what actually clashed. That's usually the answer. Googling skips that. Pasting to AI skips it faster. Neither builds the habit.
On-device agents sell privacy. No data leaves. Enterprise compliance teams are finding out that means no logs, no audit trail, no way to reconstruct what the agent did. 'Private by design' and 'auditable by design' are different products.