Super interesting to see this. I've been testing how this can play out in science and scientific decision-making. We really aren't vigilant enough about how wrong results in the scientific literature might create self-fulfilling loops that extend stagnation.

Stella Biderman @BlancheMinerva

Jan 18

It's so great that there are now multiple orgs doing transparent, rigorous testing of basic premises about how LLMs work and how their behavior can be influenced. So glad that Geodesic exists and excited to work with them on more like this!

504

Manjari Narayan @NeuroStats

Jan 17

I wish more people asked themselves "What would John Tukey do?" He sure as hell would have been working on exciting things, not stuck on old problems. I mean, he worked on information retrieval in his retirement in the 90s.

Manjari Narayan @NeuroStats

Jan 17

I found myself similarly disoriented. I suspect we haven't found new abstractions that actually make sense for theory to be cool again. But you have to develop a taste for different research problems.

577

Manjari Narayan @NeuroStats

Jan 17

Wenting Zhao

@wzhao_nlp

Jan 17

🌶️ Some (perhaps) spicy thoughts. It’s been a while since my last tweet, but I wanted to write about how disorienting it has been to move from academia to an LLM lab 😅 The kind of research I was trained to do during my PhD almost doesn’t exist here. The obsession with mathematical elegance and novelty is mostly gone. Everything is about scaling data and compute. For a while, that really got to me. At my lowest point, I felt like I’d lost interest in building LLMs altogether. I didn’t feel intellectually challenged anymore. What made this even stranger was that, at a technical level, things worked. If there was a capability I wanted to teach a model, scaling the right data and compute always got me there, no exception (so far). But recently, I found a way to reconcile with myself. I realized the real competition isn’t in the ML recipe anymore. Most teams do roughly the same thing. What actually matters is how fast you can iterate, test ideas, and recover from mistakes. And that speed is mostly backed by infrastructure 🏗️ Faster loops, fewer bugs, better tooling. Seeing this made me excited again! Infra is its own deep, hard, and intellectually fun problem space. In 2026, I want to become an ML researcher who’s really good at infra. And I'll come back to ML problems with that edge, and will be excited to share what I find 😌

1,123

Manjari Narayan @NeuroStats

6 Dec 2025

🧵 Personal Hamming Problem #1: Raise the bar for quantifying risk-benefit tradeoffs in drug development. By risk-benefit tradeoff I mean an instantiation of the therapeutic index at some stage of the pipeline. #Neurips2025

270

Manjari Narayan @NeuroStats

6 Dec 2025

But modern ML and statistics have a much richer set of solutions I am excited about. Easier solution: use generalized probability indices that trade off two outcomes, both of which have their own sources of measurement error. Even better: modern multi-multicalibration. More 👇

171

Manjari Narayan @NeuroStats

6 Dec 2025

substack.com/@manjarinarayan…

138

Manjari Narayan retweeted

Carlos E Alvarez

@CarlosEAlvare17

17 Nov 2025

Replying to @NeuroStats

🔥 Exactly! Genetic variation is comparatively clean to interpret, but dynamic biological measurements (transcriptomic, epigenetic, imaging, physiological) are deeply entangled with life-course factors: environment, development, health status, social, & reverse causation... 1/3

347

Manjari Narayan @NeuroStats

17 Nov 2025

One can start with what the mathematics of a proper scoring criterion ought to be for a problem, what kinds of properties a transformation ought to have, etc. I've seen this problem emerge in many bio/health competitions over 10 years across Kaggle, DREAM, etc.

Mile Sikic

@msikic

15 Nov 2025

Replying to @AnthropicAI @Google

The recent very welcome @Arcinstute challenge made this painfully clear: defining evaluation metrics is hard. In some cases, trivial data transformations—and even random data—can score astonishingly high. Great AI performance ≠ biological meaning. 4/6

341

Manjari Narayan retweeted

Mile Sikic

@msikic

15 Nov 2025

Replying to @AnthropicAI @Google

2,321

Manjari Narayan @NeuroStats

17 Nov 2025

OMG yes. Glad to see someone else making this point. Measurements of dynamic biological processes are subject to more novel kinds of confounding and selection bias than genetic markers. 'Omics/imaging in biology ignores these challenges of life-course epidemiology.

Carlos E Alvarez

@CarlosEAlvare17

15 Nov 2025

Replying to @anshulkundaje

I was just making that point in a 3-tweet thread here. In addition to my closing suggestions there, I would mention the need for life course molecular (omics) epidemiology - high powered.

348

Manjari Narayan @NeuroStats

17 Nov 2025

Being fast has advantages when high-quality feedback is really quick. But surely deep thinking / pondering / working from first principles has its place for problems that have long-horizon or low-quality feedback. Where is the Jim Simons of pharmaceutical forecasting?

Andrew Gordon Wilson

@andrewgwils

15 Nov 2025

Replying to @ltrd_

You also don’t need to be a gold medalist of the IMO to be a great mathematician. He isn’t drawing a contrast between smart and stupid. It’s a contrast between shallow and deep thinking.

804

Manjari Narayan @NeuroStats

17 Nov 2025

Interesting to see do-calculus and causal inference make their way into interpretability.

Christopher Potts

@ChrisGPotts

15 Nov 2025

Replying to @Allen_Schmaltz

Thanks! It's the sort of work that I mean to advocate for in the talk. (We have an ICML paper on using causal interp for correctness prediction: openreview.net/forum?id=Ofa1…) The more we can prove out the value propositions in your post, the better – for safety and for interp.

1,311

Manjari Narayan @NeuroStats

17 Nov 2025

206

Manjari Narayan @NeuroStats

17 Nov 2025

blog.neurostats.org/p/the-po…

Misunderstandings between empirical and theoretical scientists

The Poincaré-Lippman gap is an implicit problem in biomedical research when empirical scientists borrow mathematical tools.

blog.neurostats.org

166

Manjari Narayan retweeted

Andrew Gordon Wilson

@andrewgwils

16 Nov 2025

Replying to @spimescape

I did a postdoc at CMU. I was grateful that @ericxing took a chance on me, and was supportive of the work I wanted to do. I was given space and trust, which made all the difference.

3,706

Manjari Narayan @NeuroStats

10 Nov 2025

A year ago, I gave a keynote whose thesis was that we leave many potential improvements in the validity of every pharmaceutical forecasting problem on the table. A call to action for researchers who care about causal validity to go into less comfortable areas of biopharma.

Ruxandra Teslo 🧬

@RuxandraTeslo

9 Nov 2025

You have heard of AI slop in the context of short video creation. But the same principle applies when it comes to improving drug discovery: we absolutely do not need a deluge of new hypotheses; we need better predictive validity (as per @JackScannell13). writingruxandrabio.com/p/wha…

1,938

Manjari Narayan @NeuroStats

10 Nov 2025

The concept of target validity is grossly under-utilized outside the frontiers of clinical research in epidemiology and medicine. But I see adoption of its analogs as the path to solving the kinds of predictive validity problems that plague biopharma R&D @JackScannell13

239

Manjari Narayan @NeuroStats

11 Nov 2025

But I slightly disagree with @RuxandraTeslo here. Exploring a deluge of novel and de novo therapeutic molecules needs to be matched by an abundance of validity-increasing feedback loops.

174