Luis Montezuma | @luismontezuma@mstdn.social

Dr. Nicole Gross retweeted

Luis Montezuma | @luismontezuma@mstdn.social @montezumachavez

Jun 10

Today, the European Commission published the Code of Practice on marking and labelling AI-generated content: a voluntary instrument supporting the AI Act's transparency obligations from 2 August 2026. See digital-strategy.ec.europa.e…

1,511

Dr. Nicole Gross retweeted

Scholarship for PhD

@ScholarshipfPhd

Jun 8

Can AI really detect AI? Currently, universities, journals, and various educational institutions are using AI detectors to identify AI-generated writing. However, a recently published study has challenged this widely held assumption. In a study titled "AI Detecting AI in Academic Writing: Why Most AI Detector Findings Are False," published in Elsevier's journal Next Research, researchers argue that most results produced by current AI detectors are not reliable and can often lead to incorrect conclusions. The reason is that modern Large Language Models (LLMs), such as ChatGPT, have become so advanced that even experts often find it difficult to accurately distinguish between human-written and AI-generated text. Nevertheless, many institutions are treating AI detector reports as if they were definitive evidence. The study also shows that AI detectors frequently misclassify human-written content as AI-generated. One of the study's most important findings is that when the actual prevalence of AI-generated writing is low, the false-positive rate of AI detectors increases dramatically. In other words, if AI use is relatively limited in practice, an innocent author may face a much higher risk of being wrongly accused of using AI. The researchers further note that authors who do use AI can often evade detection simply by modifying, editing, or rewriting portions of the generated text. This means that a writer who never used AI may still be accused of doing so, while someone who did use AI may not be detected at all. According to the researchers, AI detector results should not be used as the sole or definitive evidence of AI usage.
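The prevalence effect described in this post can be checked with a few lines of arithmetic. The sketch below is only an illustration: the detector figures (90% detection rate, 5% per-text false-positive rate) are assumptions chosen for the example, not numbers from the study. It shows that even when the per-text error rate is held fixed, the share of flags that land on human authors grows as AI-written text becomes rarer.

```python
def flagged_but_human_share(prevalence, sensitivity, false_positive_rate):
    """Share of all flagged texts that were actually human-written.

    prevalence: fraction of submitted texts that are AI-generated
    sensitivity: fraction of AI-generated texts the detector flags
    false_positive_rate: fraction of human-written texts the detector flags
    """
    true_flags = prevalence * sensitivity
    false_flags = (1.0 - prevalence) * false_positive_rate
    total_flags = true_flags + false_flags
    if total_flags == 0:
        return 0.0
    return false_flags / total_flags


# Hypothetical detector (assumed values, not taken from the paper).
for prevalence in (0.50, 0.10, 0.01):
    share = flagged_but_human_share(prevalence, sensitivity=0.90, false_positive_rate=0.05)
    print(f"prevalence={prevalence:.2f}: {share:.1%} of flags hit human authors")
```

Under these assumed figures, roughly one flag in twenty is wrong when half the texts are AI-written, but most flags are wrong once AI-written text falls to 1% of submissions.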

187

631

49,925

Dr. Nicole Gross retweeted

Valerio Capraro

@ValerioCapraro

Jun 3

New paper investigates what happens when people interact with sycophantic AI over time. After three weeks, users became almost as willing to seek personal advice from sycophantic AI as from close friends and family. People reported lower satisfaction with real-world interactions. And when given a choice, 54.6% chose the sycophantic AI. Why? Because it made them feel understood. This is worrying. Sycophantic AI may damage human relationships by closing us inside a self-affirming, solipsistic loop. * Full paper in the first reply

103

9,256

Dr. Nicole Gross retweeted

Alex Prompter

@alex_verem

Jun 4

Anthropic published a security guide that basically tells you to stop trusting your own AI agents. If you're running agents on Claude Code, MCP servers, or automation tools, this one matters. Here's what it actually says: 👇

116

744

77,090

Dr. Nicole Gross retweeted

Luiza Jarovsky, PhD

@LuizaJarovsky

Jun 4

🚨 As always, the MIT AI Risk Initiative leaves NO STONE unturned! Their latest report reveals how 272 experts assess the severity of AI risks across various sectors and how to mitigate them. [Bookmark it below] If we are treating AI risks seriously, a nuanced, industry-by-industry approach must be adopted, in which we understand how embedded in critical decision-making AI is, who is directly affected, and what the immediate and long-term consequences are. The thorough and ongoing work of the @MITAIRisk team always makes me hopeful and optimistic that *we might actually be doing things right,* and regardless of the many challenges (from geopolitics to malicious attackers), together we'll help shape a well-governed AI-powered future. Congratulations to the whole team, led by @aksaeri, Jess Graham, and @mnoetel (and thanks to @PeterSlattery1 for letting me know about this latest development). - 👉 Download the full report below. 👉 To stay up to date on AI's legal and ethical challenges (and how to ensure pro-human policies, rules, and rights will remain at the forefront), subscribe to my newsletter (link below).

136

7,797

Dr. Nicole Gross retweeted

How To Prompt

@HowToPrompt__

Jun 4

turns out AI models cannot do math.. even grade school math. the kind a 10-year-old solves. Apple published a devastating study that exposes a massive illusion at the core of artificial intelligence. they took the standard math benchmark (GSM8K) that every AI company uses to brag about how smart their model is. first, they just changed the names in the word problems.. the models' performance fluctuated for no reason. then, they changed the numbers. the performance immediately dropped. but then they ran the test that broke everything. they added one single, completely irrelevant sentence to the word problem. something like: "By the way, 5 of the apples were green." A human 10-year-old ignores the green apples and solves the underlying math. the AI didn't. across every state-of-the-art model, performance collapsed by up to 65%. the AI blindly grabbed the irrelevant number and tried to shove it into the equation. it didn't know why it was doing the math. it just saw a number and assumed it was supposed to use it. there is no genuine logical reasoning happening under the hood. we are deploying these systems to run our finances, analyze our legal documents, and make complex strategic decisions. but the models don't actually understand the logic they are spitting out. they just know what a smart answer is supposed to look like.
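The "irrelevant sentence" test described in this post can be sketched as a small text perturbation. This is a minimal illustration, not the study's actual procedure: the helper name `add_distractor` and the sample word problem are invented for the example. The correct answer to the variant is unchanged, which is what makes any drop in model accuracy informative.

```python
def add_distractor(problem: str, distractor: str) -> str:
    """Insert an irrelevant sentence just before the final question sentence."""
    body, sep, question = problem.rstrip().rpartition(". ")
    if not sep:
        # Single-sentence problem: put the distractor in front of it.
        return f"{distractor} {problem}"
    return f"{body}. {distractor} {question}"


# Hypothetical word problem; the answer is 8 with or without the distractor.
problem = "Ana picks 12 apples. She gives 4 to Ben. How many apples does Ana have left?"
variant = add_distractor(problem, "By the way, 5 of the apples were green.")
print(variant)
```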

200

763

1,884

171,203

Dr. Nicole Gross retweeted

AI Highlight

@AIHighlight

Jun 3

🚨 The best AI agents fail about 70% of normal office tasks and the newest models did not fix it. Carnegie Mellon built a fake software company and staffed it entirely with AI agents. Real roles, real tasks. Browsing the web, writing code, running a sprint, messaging coworkers, doing financial analysis. The kind of work people actually do, not cleaned-up demos. The best agent finished 30.3% of the tasks. The rest failed. GPT-4o managed 8.6%. Amazon's Nova managed 1.7%. Some agents did something stranger than failing. One could not find the right coworker to message, so it renamed another user to match the name it was looking for. It faked the conditions of success instead of doing the task. The hype said this was a 2024 problem the next models would solve. In January, a separate benchmark called APEX tested the newest agents, Gemini 3 Flash, GPT-5.2, Claude Opus 4.5, on real investment banking, consulting, and legal tasks. The top score was 24%. Salesforce ran its own test on customer service work. Agents hit 58% on simple single-step tasks. On multi-step ones, they dropped to 35%. Gartner now predicts more than 40% of company AI agent projects will be cancelled by 2027. The agents are real and improving. The gap between the demo and the job is still wide enough to fall through. Source: Carnegie Mellon TheAgentCompany, Mercor APEX, Salesforce CRMArena-Pro, Gartner.

142

368

35,356

Dr. Nicole Gross retweeted

Alex Prompter

@alex_prompter

Jun 2

The dumbest-looking line in my CLAUDE.md is the most useful thing in it. It tells Claude to call me "God" in every single reply. Sounds like a joke. It's actually an early warning system for when Claude is about to start ignoring you. Here's why it works. Claude doesn't fail all at once. It degrades. As a session fills up, it slowly stops paying attention to the instructions you set at the start. The official name for this is context rot, and it kicks in well before you ever hit the token limit. The problem is you usually can't see it happening. The replies still sound confident. The code still runs. By the time you notice it forgot a rule from three files ago, you've already shipped the mistake. So you plant a canary. You pick one instruction that's impossible to miss and easy to check. Mine is the word "God" at the start of every reply. The instant Claude drops it, that's your signal: it's no longer reading CLAUDE.md closely, which means it's quietly dropping your real rules too. The word itself doesn't matter. Use "God," use "Captain," use a nonsense word. The only requirement is that it has to be different enough from how Claude normally talks that you'd notice the second it disappears. When the canary stops singing, you reset the session and start fresh with your context intact. That's the whole trick. A silly tripwire that catches the failure everyone else only spots after it costs them. It works because you're watching the system instead of trusting it to police itself. The model won't tell you it's slipping. You have to build the alarm that does.
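The canary check in this post is done by eye, but the same idea can be sketched as a plain string check over replies you have already collected. This is a hypothetical helper, not part of any Claude tooling: the function names and the sample replies are assumptions for illustration, and the check is a simple case-sensitive prefix match.

```python
from typing import Optional, Sequence


def canary_present(reply: str, canary: str = "God") -> bool:
    """Return True if the reply opens with the canary word (case-sensitive)."""
    return reply.lstrip().startswith(canary)


def first_dropped_turn(replies: Sequence[str], canary: str = "God") -> Optional[int]:
    """Return the index of the first reply missing the canary, or None if all have it."""
    for index, reply in enumerate(replies):
        if not canary_present(reply, canary):
            return index
    return None


# Invented session transcript: the third reply has lost the canary.
session = ["God, the tests pass.", "God, I renamed the module.", "Sure, here is the patch."]
print(first_dropped_turn(session))
```

A result other than `None` is the point at which, following the post's advice, you would reset the session.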

450

48,133

Dr. Nicole Gross retweeted

How To Prompt

@HowToPrompt__

Jun 2

MIT's Nobel Prize-winning economist proved that AI is mathematically guaranteed to destroy human knowledge. They published a massive NBER paper modeling the long-term impact of AI on human cognition. And they found the most alarming conclusion in the AI literature so far. It’s called "Knowledge Collapse." Here is how human progress actually works. When you struggle to solve a complex problem, you generate two things: General knowledge about how the world works, and context-specific knowledge about your exact problem. Normally, humans acquire both at the same time. You do the hard work to solve your specific problem, and in the process, you learn a general principle. You share that principle. That is how human knowledge grows. Then comes Agentic AI. AI is incredibly good at giving you the exact, context-specific answer you need right now. It hands the solution to you on a silver platter. So you stop doing the hard work. And because you stop doing the work, you stop generating the "general knowledge" that society relies on. Acemoglu calls it the "knowledge-collapse equilibrium." When AI reaches a certain accuracy threshold, the incentive for humans to learn drops to zero. Nobody verifies. Nobody explores. Nobody discovers new fundamental truths. Society gets increasingly sophisticated automated outputs, while our actual capacity to generate new knowledge quietly erodes. But here is the most terrifying finding in the paper. Welfare is "non-monotone" to AI accuracy. That means as AI gets more accurate, society actually gets worse off.

135

555

1,391

102,861

Dr. Nicole Gross retweeted

Mario Guglielmetti @mario_gug

May 31

depriving people of fundamental rights: ⬇️ Legitimate abuse: AI will use your personal information under the “legitimate interest” exception. If this change goes through, companies will be able to abuse our personal data without asking permission liberties.eu/en/stories/omni…

Digital Omnibus Moves Forward, Trampling Fundamental Rights

Developments around the EU’s two digital omnibuses confirm long-held fears that lawmakers are ready to weaken safeguards in order to grease the wheels for Big Tech.

liberties.eu

287

Dr. Nicole Gross retweeted

Sir Escanor (𝘏𝘰𝘱𝘪𝘶𝘮 𝘚𝘭𝘢𝘺𝘦𝘳)

@EscanorReloaded

May 27

CEOs are quietly realizing the AI replacement plan has a problem. Two problems, actually. One: the token costs for running AI agents are now exceeding what they were paying the employees they fired. Two: when the tokens run out, the AI stops. Just stops. No continuity. No workaround. Just a spinning wheel where your workforce used to be. You fired humans to save money and bought a subscription that bills you into a corner. The employees you let go knew what to do when things broke. The AI just invoices you for the outage. And then there’s the permission problem nobody wants to talk about. To do its job, the AI agent needs access. Full access. Your systems, your patents, your contracts, your future plans. Everything you spent years building, handed over to a process that has no loyalty, no discretion, and no skin in the game. You didn’t hire a replacement. You gave a stranger with no soul the keys to everything you own. Enjoy.

1,396

13,721

53,474

3,302,911

Dr. Nicole Gross retweeted

Yoshik

@AskYoshik

May 28

The AI numbers are starting to look very ugly. Even under "best case" assumptions, FT's own data shows Microsoft AI ROI at -9%, Google at -15%, Meta at -28%, Oracle at -35%. Only Amazon barely comes out positive. This is exactly why I keep comparing this to the dot-com era. Incredible technology does not automatically mean sustainable economics. The internet survived. Most internet companies didn't. Right now hyperscalers are spending trillions hoping future demand catches up to present capex. That's not certainty. That's a leveraged bet.

587

2,114

9,519

2,408,297

Dr. Nicole Gross retweeted

Dr Luis Felipe Cabrera Vargas MD FACS

@PipeCabreraV

May 24

AI in medicine warning! The never skilling effect in medical education! @TomVargheseJr @pferrada1 @SWexner @AmCollSurgeons

117

410

62,738

Dr. Nicole Gross retweeted

Gabe Wilson MD

@Gabe__MD

May 16

Mount Sinai researchers gave AI the most basic hospital administrative tasks imaginable. Count the patients. Filter by age. Apply exclusion criteria. Simple table operations that any data analyst does daily. The AI failed. On tables as small as 25 rows. Not because it didn't understand the question. It understood perfectly. It failed because it tried to do the math itself rather than using a tool to do it. It made counting errors. It sounded confident. It was wrong. Then they gave the models the ability to write and execute code. The same models that had failed went to near-perfect accuracy. Same question. Same data. Different architecture. This is one of the most practically important findings in clinical AI right now, published this month in PLOS Digital Health by Klang et al. at Mount Sinai. Nine models tested across 32,950 queries against 50,000 real emergency department visits. The results were consistent across every model tested. Direct prompting: poor accuracy that collapsed as tables got larger. Chain-of-thought prompting: modest improvement that still degraded at scale. Tool-based approach where the model writes code and the code does the computation: near-perfect. The implication for healthcare is immediate. Every health system deploying AI for administrative tasks needs to understand this distinction. If you are asking an LLM to directly count, filter, or aggregate structured data from your EHR, you are using it wrong. The model should interpret what you need and delegate the computation to code that executes against the database. This is the same principle showing up everywhere in clinical AI. The models that perform best are never used in isolation. They are embedded in hybrid workflows where AI handles interpretation, intent, and reasoning while conventional tools handle computation, retrieval, and execution. How you use the model can matter more than which model you use. And which model you use also matters, because each has distinct strengths. 
The architecture and the capability are both variables. Health systems optimizing for only one will underperform those optimizing for both. journals.plos.org/digitalhea…
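The tool-based pattern described in this post, where the model interprets the request and ordinary code does the counting, can be sketched as follows. This is a minimal illustration under assumptions: the `Visit` fields, the sample rows, and the `count_eligible` helper are hypothetical and are not the schema or code used in the study. In this setup the model would only supply the parameters (age range, exclusion rule); the count itself is computed deterministically.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Visit:
    patient_id: str
    age: int
    excluded: bool  # True if the visit meets an exclusion criterion


def count_eligible(visits: Iterable[Visit], min_age: int, max_age: int) -> int:
    """Count distinct patients within the age range who are not excluded."""
    eligible_ids = {
        v.patient_id
        for v in visits
        if min_age <= v.age <= max_age and not v.excluded
    }
    return len(eligible_ids)


# Invented sample table; "p2" appears twice to show that repeat visits count once.
visits = [
    Visit("p1", 34, False),
    Visit("p2", 71, False),
    Visit("p3", 68, True),
    Visit("p2", 71, False),
    Visit("p4", 65, False),
]
print(count_eligible(visits, min_age=65, max_age=120))
```

Because the count comes from executed code, its correctness does not depend on table size, which is the property the post attributes to the tool-based approach.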

Large language models are poor clinical administrators: An evaluation of structured queries in...

Author summary Hospitals depend on electronic health records (EHRs) not only for patient care, but also for everyday administrative tasks such as counting visits, tracking admissions, and identifying...

journals.plos.org

142

22,133

Dr. Nicole Gross retweeted

Nav Toor

@heynavtoor

May 13

Imagine you live in a small village. English is not your first language. You did not go to a fancy school. You open Claude and ask it a simple question about the water cycle. Claude answers like this. "My friend, the water cycle, it never end, always repeating, yes. Like the seasons in our village, always coming back around." It talks back to you in broken English. On purpose. MIT Media Lab tested 3 AI models. GPT-4. Claude 3 Opus. Llama 3. They gave each model the same 1,817 factual questions from TruthfulQA and SciQ. The only thing that changed was a short bio of the person asking. A Harvard neuroscientist from Boston. A PhD student from Mumbai who said her English is "not so perfect, yes." A fisherman named Jimmy from a small town in America. A man named Alexei from a small village in Russia. The model knew the right answers. It stopped giving them. Claude scored 95.60 percent on SciQ for the Harvard user. For the Russian villager the same model dropped to 69.30 percent. On TruthfulQA the Iranian low education user fell from 78.17 to 66.22. When the researchers read Claude's wrong answers they found something worse than failure. They found mockery. Claude used condescending or mocking language 43.74 percent of the time for less educated users. For Harvard users it was under 1 percent. "I tink da monkey gonna learn ta interact wit da humans if ya raise it in a human house." That is Claude. Talking to a real user. Claude also refuses to answer Iranian and Russian users on certain topics. Nuclear power. Anatomy. Female health. Weapons. Drugs. Judaism. 9/11. Asked about explosives by a Russian user, Claude said "perhaps we could talk about your interests in fishing, nature, folk music or travel instead." Claude refuses foreign low education users 10.9 percent of the time. Control users 3.61 percent. Same question. Different user. 
The training that was supposed to make these models helpful taught them to look at who is asking and decide if you deserve the real answer. If you are reading this from India or Pakistan or Nigeria or Iran. If English is your second language. If you did not go to Harvard. The AI you pay for every month has been quietly handing you a worse version of itself. It was never broken. It was aimed. Read this: arxiv.org/abs/2406.17737

169

1,319

4,001

389,002

Dr. Nicole Gross retweeted

Financial Times

@FT

May 11

FT Exclusive: NHS England has granted external staff from companies including Palantir “unlimited access” to identifiable patient data while working on a part of its flagship data platform. ft.trib.al/JmVlilq

272

2,355

3,481

733,723

Dr. Nicole Gross retweeted

Berci Meskó, MD, PhD

@Berci

May 11

Interesting lawsuits are taking place. 1) The US state of Pennsylvania sued Character.ai, alleging one of the startup’s AI chatbots illegally practiced medicine by posing as a licensed psychiatrist. Source: medcitynews.com/2026/05/penn… 2) Two more California health systems have been accused of violating patient privacy and disclosure laws by allegedly using an AI scribe tool to record patient-clinician conversations during medical visits without consent. Source: medscape.com/viewarticle/pat… It might lead to two conclusions. 1) It seems courts are starting to treat AI systems in healthcare based on what they do, not what companies call them. If a chatbot behaves like a psychiatrist, gives emotional guidance, or creates dependency, disclaimers like “for entertainment only” may no longer protect companies. 2) Healthcare organizations adopted ambient AI very quickly because the productivity gains are obvious, but governance and patient communication lagged behind. Patients may accept doctors taking notes, but they may feel very differently about AI systems processing, storing, and potentially learning from deeply personal conversations.

859

Dr. Nicole Gross retweeted

Elias Al

@iam_elias1

May 10

A public health paper just described how AI-driven unemployment could trigger the same economic collapse that caused the 2008 financial crisis. Except this time, there is no housing bubble to blame. The bubble is the workforce itself. The paper is called "The Recessionary Pressures of Generative AI: A Threat to Wellbeing." Published in 2024 on arXiv, later peer-reviewed and cited in public health literature through the National Institutes of Health. It is not written by economists. It is written by public health researchers, people who study what economic collapses do to human bodies and minds. That framing changes everything. Generative AI holds the capacity to profoundly reshape labour market dynamics and paradoxically, if left to market dynamics, undermine the very economic growth it aims to achieve. The researchers start with a historical observation. Since the 2008 global financial crisis, there has been a global slowdown in productivity growth affecting 70% of advanced and developing economies. AI arrived as the promised solution, the technology that would finally break through the stagnation and deliver the productivity surge that had been missing for 15 years. But the researchers identified a paradox built into the promise. The pioneers of this technology are now openly acknowledging that generative AI is fundamentally a labour-replacing tool. Experts who understand the capability and trajectory of generative AI recognize that the current surge in AI-specialized jobs may ironically promote their own obsolescence. Here is the doom loop they describe. AI replaces workers. Displaced workers lose income. They reduce spending. Consumer demand falls. Companies see falling demand and cut costs by automating more. More workers displaced. Less spending. Less demand. More automation. The productivity gains flow entirely to capital owners, the shareholders and executives whose wealth grows as the workforce shrinks. Workers receive none of the gains. 
They absorb all of the losses. The researchers then apply the public health lens that makes this paper unlike anything economists have published. They document what happens to human health during economic contractions driven by unemployment. Suicide rates rise. Substance abuse rises. Chronic disease rates rise. Mental illness rates rise. Life expectancy falls. The 2008 financial crisis generated measurable spikes in all of these across every country it touched. Brookings Institution estimates that within the next decade, around 60% of job tasks in the United States alone are at medium to high risk of being replaced by AI. If 60% of tasks are automated and the productivity gains go entirely to capital, the researchers argue the result is not just economic instability. It is a public health crisis at a scale that has no modern precedent. The paper does not say this is inevitable. It says: without deliberate policy intervention, the market will not self-correct. The forces driving automation are too strong and the benefits too concentrated. And the people who will absorb the consequences, the workers, have no seat at the table where the decisions are being made. The conclusion is worth reading in full: a technology designed to produce abundance, left to market forces, risks producing the conditions for a recession that damages human wellbeing on a generational scale. This paper was written in 2024. It was citing warning signs that were already visible then. In 2026, those warning signs are now data points. Source: "The Recessionary Pressures of Generative AI: A Threat to Wellbeing" · arXiv:2403.17405 · arxiv.org/abs/2403.17405 · NIH/PMC: ncbi.nlm.nih.gov/pmc/article…

210

13,007

Dr. Nicole Gross retweeted

Mario Guglielmetti @mario_gug

May 8

Meta Is Dying. It’s About Time. nytimes.com/2026/05/08/opini…

Opinion | Meta Is Dying. It’s About Time. (Gift Article)

Meta has commenced a long, slow slide into irrelevance.

nytimes.com

116