Yesterday scientists published proof that AI has a fundamental cognitive weakness.
Not on a hard problem. Not on a complex benchmark.
On a test designed for undergraduate psychology students.
The paper was published June 10, 2026 in PNAS Nexus, one of the most rigorous scientific journals in the world. Three researchers from Texas Tech University and the City University of New York. One finding that reframes every confident claim about AI capability you have heard in the last two years.
Here is the test they used.
The Stroop task. Invented in 1935. Used in psychology labs for 90 years. You see a word. The word says RED, but it is printed in blue ink. Your job is to name the ink color. Not read the word. Just name the color.
Your brain fights itself. The obvious answer, RED, is wrong. The correct answer, blue, requires suppressing the thing your mind wants to do automatically.
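For readers who want to see the rule in concrete terms, here is a minimal Python sketch of a single Stroop trial. It is an illustration only, not the paper's materials: the color set and the names `make_trial` and `is_correct` are assumptions made up for this example.

```python
import random

# Hypothetical color set for illustration; the paper's stimuli may differ.
COLORS = ["red", "blue", "green", "yellow"]

def make_trial(rng, congruent=False):
    """Return (word, ink) for one Stroop trial."""
    ink = rng.choice(COLORS)
    if congruent:
        word = ink
    else:
        # Incongruent trial: the word names a different color than the ink.
        word = rng.choice([c for c in COLORS if c != ink])
    return word.upper(), ink

def is_correct(trial, response):
    """A response counts only if it names the ink, not the word."""
    _word, ink = trial
    return response.strip().lower() == ink

rng = random.Random(0)
trial = make_trial(rng)
word, ink = trial
print(f"word={word} ink={ink}")
print(is_correct(trial, ink))   # naming the ink: True
print(is_correct(trial, word))  # reading the word: False
```

Reading the word is the automatic response. Naming the ink is the one that needs control.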
That suppression is called executive control. It is one of the most fundamental cognitive functions humans possess. It underlies everything important: following complex instructions, maintaining a rule across a long task, catching when a later piece of information contradicts an earlier one, and noticing when you are about to give the wrong answer because the obvious answer is wrong.
Researchers gave top AI models the classic attention test and found a major flaw. While the models could correctly name colors in short lists, their performance deteriorated sharply as the task became longer and more complex.
Short list. The AI is fine. Gets it right. Looks capable. Impressive even.
Longer list. Performance collapses.
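If you want to check this kind of length effect yourself, a rough harness could look like the sketch below. It is an assumption-laden illustration, not the researchers' protocol: `make_list`, `accuracy_by_length`, and the two toy responders are hypothetical names, and a real test would replace the responder with a function that sends the list to a model and parses its answers.

```python
import random

# Hypothetical color set for illustration; the paper's stimuli may differ.
COLORS = ["red", "blue", "green", "yellow"]

def make_list(rng, length):
    """Build `length` incongruent (word, ink) pairs."""
    items = []
    for _ in range(length):
        ink = rng.choice(COLORS)
        word = rng.choice([c for c in COLORS if c != ink])
        items.append((word.upper(), ink))
    return items

def accuracy_by_length(answer_fn, lengths, lists_per_length=20, seed=0):
    """Map each list length to the fraction of items answered with the ink color."""
    rng = random.Random(seed)
    results = {}
    for n in lengths:
        correct = 0
        total = 0
        for _ in range(lists_per_length):
            items = make_list(rng, n)
            answers = answer_fn(items)
            for idx, (_word, ink) in enumerate(items):
                # A missing answer counts as wrong.
                ans = answers[idx] if idx < len(answers) else ""
                correct += ans.strip().lower() == ink
                total += 1
        results[n] = correct / total
    return results

# Toy responders, used only to show the harness runs. They model no real AI system.
def name_the_ink(items):
    return [ink for _word, ink in items]

def read_the_word(items):
    return [word for word, _ink in items]

print(accuracy_by_length(name_the_ink, [5, 50, 500]))
print(accuracy_by_length(read_the_word, [5, 50, 500]))
```

The first toy responder scores 1.0 at every length and the second scores 0.0. The interesting question is what the curve looks like for a real model as the list grows.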
The paper describes the finding as deficient executive control in transformer attention.
Not slower. Not less accurate. Deficient.
Here is what makes this alarming beyond the benchmark score.
Every enterprise AI deployment in the world right now is built on an assumption. The assumption is that if AI performs well on the demonstration (the controlled test, the curated benchmark, the polished proof of concept), it will perform comparably on the real task.
The Stroop finding breaks that assumption at its foundation.
Short task. Looks fine.
Long task. Deficient.
The demo is always short. The real work is always long.
A legal AI reviewing a 300-page contract needs to maintain a rule, such as "flag this clause type," across hundreds of pages of text. A medical AI analyzing a complex patient history needs to hold context across dozens of symptoms, test results, and medications without letting the obvious pattern override the correct one. A financial AI auditing a large dataset needs to catch the exception buried on page 47 of a 60-page report.
These are Stroop tasks. They are tasks where the obvious answer is wrong and executive control is the only thing that catches it.
And the paper published yesterday says that executive control in transformer-based AI models is deficient under exactly the conditions where it matters most.
The AI does not know the task is getting harder. It does not experience cognitive load increasing. It does not know it is failing. It generates the wrong answer in the same confident tone it used for all the right ones.
There is no uncertainty signal. No "this is getting complicated." No slowdown that would prompt a human to pause and check. Just a confident answer that is increasingly wrong as the task grows.
AI passed the bar exam. Scored 90% on elite mathematical competitions. Achieved human-level performance on medical licensing exams. Every benchmark the AI industry uses to demonstrate capability is a short Stroop task.
The real work is the long one.
Yesterday researchers published proof that the long one is where AI breaks.
Source: Patel, Wang, Fan · Texas Tech University, City University of New York · "Deficient Executive Control in Transformer Attention" · PNAS Nexus
(Link in the comments)