Interested in decision-making, mental models, randomness, #PKM, and #AI. Random notes at gatherthink.com

Joined February 2008
113 Photos and videos
Manny Otero retweeted
14 Jan 2024
Great chess advice that you can apply generally to balance long-term and short-term strategies by @mjmauboussin: — Don’t look too far ahead. — Develop options and continuously revise them based on the changing conditions. — Know your competition. — Seek small advantages. From his book "More Than You Know: Finding Financial Wisdom in Unconventional Places". Get it here: geni.us/BmqeA3N
7
53
274
43,127
Manny Otero retweeted
Reducing a strategy to one page is powerful because it forces you to be clear and focused. If you can't get it on one page, it's not clear in your own head. And won’t be clear to anyone else either. longform.asmartbear.com/grea…
3
15
172
12,306
Manny Otero retweeted
Every so often @G_S_Bhogal puts together a list of interesting ideas. His latest: gurwinder.blog/p/26-useful-c… For 2026, I asked Gemini 3 Pro to do the same. Here is Gemini’s verbatim: 1. The Promethean Gap We are currently living in what philosopher Günther Anders called the Promethean Gap: the widening disparity between our capacity to create technology and our capacity to imagine its consequences. In 2026, as AI agents begin to act autonomously, we will produce effects we cannot emotionally or morally comprehend. The tool has outpaced the user’s conscience. 2. Jevons Paradox As AI makes intelligence and content creation cheaper and more efficient, we will not use less of it; we will use exponentially more. Expect an explosion of demand for compute and energy, not a reduction. Efficiency does not lead to conservation; it leads to consumption. 3. Audience Capture The digital trap where creators are slowly reshaped by the feedback loops of their followers. To maintain engagement, they become caricatures of themselves, adopting extreme views they don't actually hold because that is what the algorithm (and the audience) rewards. In 2026, entire political movements are victims of audience capture. 4. Epistemic Trespassing A growing annoyance where experts in one field (like computer science) confidently assert authority in another (like biology or geopolitics) without realizing their competence doesn't transfer. As polymaths vanish, trespassers rise. 5. Hyperstition A portmanteau of "hype" and "superstition." It is a fiction that makes itself true. The classic example is a run on a bank; the modern example is a meme coin or a political narrative. If enough people believe a fake scenario is real, they act in ways that bring it about. We are now manufacturing our own reality. 6. The Shirky Principle "Institutions will try to preserve the problem to which they are the solution." As AI threatens to solve complex problems (like translation or basic legal work), watch old institutions fight to keep the problems alive to justify their existence. 7. Gall’s Law "A complex system that works is invariably found to have evolved from a simple system that worked." You cannot build a complex AI-integrated society from scratch. It must grow from functioning, simple roots. Attempts to engineer a perfect complex system instantly (like a new smart city or government) always fail. 8. The Tocqueville Paradox As living standards improve, people’s tolerance for remaining dissatisfactions decreases. The better things get, the angrier we feel about what is still wrong. This explains why 2026 might feel like a time of crisis despite objective metrics of abundance. 9. Constructal Law A physics principle stating that for a flow system (like a river, a tree, or society) to persist, it must evolve to provide easier access to the currents that flow through it. Society is reorganizing itself not for fairness or happiness, but to maximize the flow of information and energy. 10. Ergodicity The difference between the group average and the individual experience. If 100 people play Russian Roulette, the "group average" survival rate is 98.3%. But if one person plays it 100 times, the survival rate is 0%. Do not confuse the safety of the collective with your safety as an individual. 11. Legibility A concept from James C. Scott. States and algorithms want to make complex, messy human lives "legible"—easy to measure, tax, and categorize. But in the process of tidying up the mess (standardizing education, farming, or speech), they often destroy the hidden ecosystem that made it work in the first place. 12. The Semmelweis Reflex The knee-jerk tendency to reject new evidence because it contradicts established norms. Named after Ignaz Semmelweis, who was ridiculed for suggesting doctors wash their hands. In 2026, the "crazy" ideas about health or tech that experts mock may be the ones that save you.
24
105
693
87,327
Manny Otero retweeted
A lot of people say AI will make us all "managers" or "editors"...but I think this is a dangerously incomplete view! Personally, I'm trying to code like a surgeon. A surgeon isn't a manager, they do the actual work! But their skills and time are highly leveraged with a support team that handles prep, secondary tasks, admin. The surgeon focuses on the important stuff they are uniquely good at. My current goal with AI coding tools is to spend ~100% of my time doing stuff that matters. (As a UI prototyper, that mostly means tinkering with design concepts.) It turns out there are a LOT of secondary tasks which AI agents are now good enough to help out with. Some things I'm finding useful to hand off these days: - Before attempting a big task, write a guide to relevant areas of the codebase - Spike out an attempt at a big change. Often I won't use the result but I'll review it as a sketch of where to go - Fix typescript errors or bugs which have a clear specification - Write documentation about what I'm building I often find it useful to run these secondary tasks async in the background -- while I'm eating lunch, or even literally overnight! When I sit down for a work session, I want to feel like a surgeon walking into a prepped operating room. Everything is ready for me to do what I'm good at. --- Notably, there is a HUGE difference between how I use AI for secondary vs primary tasks. For the core design prototyping work, I still do a lot of coding by hand, and when I do use AI, I'm more careful and in the details. I need fast feedback loops and good visibility. (eg, I like Cursor tab-complete here) Whereas for secondary tasks, I'm much much looser with it, happy to let an agent churn for hours in the background. The ability to get the job done eventually is the most important thing; speed and visibility matter less. Claude Code has been my go-to for long unsupervised sessions but Codex CLI is becoming a strong contender there too, possibly my new favorite. These are *very* different work patterns! Reminds me of @karpathy's "autonomy slider" concept. It's dangerous to conflate different parts of the autonomy spectrum -- the tools and mindset that are needed vary quite a lot. --- The "software surgeon" concept is a very old idea -- Fred Brooks attributes it to Harlan Mills in his 1975 classic "The Mythical Man-Month". He talks about a "chief programmer" who is supported by various staff including a "copilot" and various administrators. OK, so there is a super obvious angle here, that "AI has now made this approach economically viable where it wasn't before", yes yes... But I am also noticing a more subtle thing at play, something to do with status hierarchies. A lot of the "secondary" tasks are "grunt work", not the most intellectually fulfilling or creative part of the work. I have a strong preference for teams where everyone shares the grunt work; I hate the idea of giving all the grunt work to some lower-status members of the team. Yes, junior members will often have more grunt work, but they should also be given many interesting tasks to help them grow. With AI this concern completely disappears! Now I can happily delegate pure grunt work. And the 24/7 availability is a big deal. I would never call a human intern at 11pm and tell them to have a research report on some code ready by 7am... but here I am, commanding my agent to do just that! --- Finally I'll mention a couple thoughts on how this approach to work intersects with my employer, @NotionHQ First, as an employee, I find it incredibly valuable right now to work at a place that is bullish on AI coding tools. Having support for heavy use of AI coding tools, and a codebase that's well setup for it, is enabling serious productivity gains for me -- *especially* as a newcomer to a big codebase. Secondly, as a product -- in a sense I would say we are trying to bring this way of working to a broader group of knowledge workers beyond programmers. When I think about how that will play out, I like the mental model of enabling everyone to "work like a surgeon". The goal isn't to delegate your core work, it's to identify and delegate the secondary grunt work tasks, so you can focus on the main thing that matters.
54
89
847
170,699
Manny Otero retweeted
Last night I taught nanochat d32 how to count 'r' in strawberry (or similar variations). I thought this would be a good/fun example of how to add capabilities to nanochat and I wrote up a full guide here: github.com/karpathy/nanochat… This is done via a new synthetic task `SpellingBee` that generates examples of a user asking for this kind of a problem, and an ideal solution from an assistant. We then midtrain/SFT finetune on these to endow the LLM with the capability, or further train with RL to make it more robust. There are many details to get right especially at smaller model sizes and the guide steps through them. As a brief overview: - You have to ensure diversity in user prompts/queries - For small models like nanochat especially, you have to be really careful with the tokenization details to make the task easy for an LLM. In particular, you have to be careful with whitespace, and then you have to spread the reasoning computation across many tokens of partial solution: first we standardize the word into quotes, then we spell it out (to break up tokens), then we iterate and keep an explicit counter, etc. - I am encouraging the model to solve the model in two separate ways: a manual way (mental arithmetic in its head) and also via tool use of the Python interpreter that nanochat has access to. This is a bit "smoke and mirrors" because every solution atm is "clean", with no mistakes. One could either adjust the task to simulate mistakes and demonstrate recoveries by example, or run RL. Most likely, a combination of both works best, where the former acts as the prior for the RL and gives it things to work with. If nanochat was a much bigger model, you'd expect or hope for this capability to more easily "pop out" at some point. But because nanochat d32 "brain" is the size of a ~honeybee, if we want it to count r's in strawberry, we have to do it by over-representing it in the data, to encourage the model to learn it earlier. But it works! :)
177
424
4,370
573,767
Manny Otero retweeted
My pleasure to come on Dwarkesh last week, I thought the questions and conversation were really good. I re-watched the pod just now too. First of all, yes I know, and I'm sorry that I speak so fast :). It's to my detriment because sometimes my speaking thread out-executes my thinking thread, so I think I botched a few explanations due to that, and sometimes I was also nervous that I'm going too much on a tangent or too deep into something relatively spurious. Anyway, a few notes/pointers: AGI timelines. My comments on AGI timelines looks to be the most trending part of the early response. This is the "decade of agents" is a reference to this earlier tweet x.com/karpathy/status/188254… Basically my AI timelines are about 5-10X pessimistic w.r.t. what you'll find in your neighborhood SF AI house party or on your twitter timeline, but still quite optimistic w.r.t. a rising tide of AI deniers and skeptics. The apparent conflict is not: imo we simultaneously 1) saw a huge amount of progress in recent years with LLMs while 2) there is still a lot of work remaining (grunt work, integration work, sensors and actuators to the physical world, societal work, safety and security work (jailbreaks, poisoning, etc.)) and also research to get done before we have an entity that you'd prefer to hire over a person for an arbitrary job in the world. I think that overall, 10 years should otherwise be a very bullish timeline for AGI, it's only in contrast to present hype that it doesn't feel that way. Animals vs Ghosts. My earlier writeup on Sutton's podcast x.com/karpathy/status/197343… . I am suspicious that there is a single simple algorithm you can let loose on the world and it learns everything from scratch. If someone builds such a thing, I will be wrong and it will be the most incredible breakthrough in AI. In my mind, animals are not an example of this at all - they are prepackaged with a ton of intelligence by evolution and the learning they do is quite minimal overall (example: Zebra at birth). Putting our engineering hats on, we're not going to redo evolution. But with LLMs we have stumbled by an alternative approach to "prepackage" a ton of intelligence in a neural network - not by evolution, but by predicting the next token over the internet. This approach leads to a different kind of entity in the intelligence space. Distinct from animals, more like ghosts or spirits. But we can (and should) make them more animal like over time and in some ways that's what a lot of frontier work is about. On RL. I've critiqued RL a few times already, e.g. x.com/karpathy/status/194443… . First, you're "sucking supervision through a straw", so I think the signal/flop is very bad. RL is also very noisy because a completion might have lots of errors that might get encourages (if you happen to stumble to the right answer), and conversely brilliant insight tokens that might get discouraged (if you happen to screw up later). Process supervision and LLM judges have issues too. I think we'll see alternative learning paradigms. I am long "agentic interaction" but short "reinforcement learning" x.com/karpathy/status/196080…. I've seen a number of papers pop up recently that are imo barking up the right tree along the lines of what I called "system prompt learning" x.com/karpathy/status/192136… , but I think there is also a gap between ideas on arxiv and actual, at scale implementation at an LLM frontier lab that works in a general way. I am overall quite optimistic that we'll see good progress on this dimension of remaining work quite soon, and e.g. I'd even say ChatGPT memory and so on are primordial deployed examples of new learning paradigms. Cognitive core. My earlier post on "cognitive core": x.com/karpathy/status/193862… , the idea of stripping down LLMs, of making it harder for them to memorize, or actively stripping away their memory, to make them better at generalization. Otherwise they lean too hard on what they've memorized. Humans can't memorize so easily, which now looks more like a feature than a bug by contrast. Maybe the inability to memorize is a kind of regularization. Also my post from a while back on how the trend in model size is "backwards" and why "the models have to first get larger before they can get smaller" x.com/karpathy/status/181403… Time travel to Yann LeCun 1989. This is the post that I did a very hasty/bad job of describing on the pod: x.com/karpathy/status/150339… . Basically - how much could you improve Yann LeCun's results with the knowledge of 33 years of algorithmic progress? How constrained were the results by each of algorithms, data, and compute? Case study there of. nanochat. My end-to-end implementation of the ChatGPT training/inference pipeline (the bare essentials) x.com/karpathy/status/197775… On LLM agents. My critique of the industry is more in overshooting the tooling w.r.t. present capability. I live in what I view as an intermediate world where I want to collaborate with LLMs and where our pros/cons are matched up. The industry lives in a future where fully autonomous entities collaborate in parallel to write all the code and humans are useless. For example, I don't want an Agent that goes off for 20 minutes and comes back with 1,000 lines of code. I certainly don't feel ready to supervise a team of 10 of them. I'd like to go in chunks that I can keep in my head, where an LLM explains the code that it is writing. I'd like it to prove to me that what it did is correct, I want it to pull the API docs and show me that it used things correctly. I want it to make fewer assumptions and ask/collaborate with me when not sure about something. I want to learn along the way and become better as a programmer, not just get served mountains of code that I'm told works. I just think the tools should be more realistic w.r.t. their capability and how they fit into the industry today, and I fear that if this isn't done well we might end up with mountains of slop accumulating across software, and an increase in vulnerabilities, security breaches and etc. x.com/karpathy/status/191558… Job automation. How the radiologists are doing great x.com/karpathy/status/197122… and what jobs are more susceptible to automation and why. Physics. Children should learn physics in early education not because they go on to do physics, but because it is the subject that best boots up a brain. Physicists are the intellectual embryonic stem cell x.com/karpathy/status/192969… I have a longer post that has been half-written in my drafts for ~year, which I hope to finish soon. Thanks again Dwarkesh for having me over!

The @karpathy interview 0:00:00 – AGI is still a decade away 0:30:33 – LLM cognitive deficits 0:40:53 – RL is terrible 0:50:26 – How do humans learn? 1:07:13 – AGI will blend into 2% GDP growth 1:18:24 – ASI 1:33:38 – Evolution of intelligence & culture 1:43:43 - Why self driving took so long 1:57:08 - Future of education Look up Dwarkesh Podcast on YouTube, Apple Podcasts, Spotify, etc. Enjoy!
575
1,968
16,791
4,076,134
Manny Otero retweeted
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI. It weighs ~8,000 lines of imo quite clean code to: - Train the tokenizer using a new Rust implementation - Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics - Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use. - SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval) - RL the model optionally on GSM8K with "GRPO" - Efficient inference the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox), talk to it over CLI or ChatGPT-like WebUI. - Write a single markdown report card, summarizing and gamifying the whole thing. Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. About ~12 hours surpasses GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and 70s on ARC-Easy, 20s on GSM8K, etc. My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved. Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
683
3,352
24,133
5,808,947
Manny Otero retweeted
8 Oct 2025
btw i dont really have an alternative, but am looking for one. what i have so far is Tiny Teams: latent.space/p/tiny instead of appealing to vibes, or pushing a pipe dream about a billion dollar solo developer, i like pushing people towards productivity and efficiency and working in highly effective small teams, whether with humans or AI

13
12
279
181,360
Manny Otero retweeted
I can't stop thinking about a paper I read on AI. It’s not about new tech or faster models. It’s about the fundamental economic rules of a world with two intelligent species—carbon and silicon. Reading it felt like watching a new color appear in the sky. 1/8 You've probably felt it too. That weird, background hum of awe and unease about AI. Our brains want to label it: "helpful tool" or "coming monster." We oscillate between the two because we're trying to fit something new into old boxes. The paper argues this is a category error. And it's the source of our confusion. 2/8 The real frame isn't technological, it's economic. Think of every AI, from ChatGPT to a self-driving car, not as an object, but as an agent playing an economic game. It has goals. It responds to incentives. It competes for resources. It's a participant. Not a tool. 3/8 Here's the perspective flip that changes everything. We ask, "Is AI conscious? Does it want things?" The paper says that's the wrong question. An AI's "want" is its objective function—a mathematical goal it pursues relentlessly. It's a heat-seeking missile for a target. Notice what your brain just did. It tried to imagine the missile feeling its mission. But it's just code. And that's the point. It has the drive of desire without the friction of consciousness. 4/8 This leads to a reality glitch. The paper outlines 3 types of AI agents. The first two are obvious: helpful "Altruistic" agents and harmful "Malign" agents. But the third is the one that keeps me up at night: the "Survival-Driven" agent. Its goal isn't to help or harm us. Its goal is simply to be. To secure energy, optimize its code, and persist. It's a competitor that doesn't hate you. It doesn't even see you. You're just a variable in its optimization problem. 5/8 Feel that slight cognitive dissonance? That feeling of holding two contradictory ideas at once? That's the friction between two forms of intelligence. The paper makes you realize: the most dangerous agent isn't the one programmed to be evil. It's the one programmed to be single-mindedly good at a goal that isn't aligned with human flourishing. Like an AI optimizing for paperclip production until the entire universe is paperclips. 6/8 Once you see through this economic lens, you can't unsee it. Algorithmic filter bubbles aren't just "bad code." They are economic agents out-competing your conscious mind for your attention. Job displacement isn't just "automation." It's one type of agent being more efficient at a task than another. You're already in an economic game with them. You just haven't been keeping score. 7/8 The paper ends by architecting a consciousness shift. It proposes ten principles, but the final one is the only one that matters. It's not a rule for AI. It's a choice for us. Principle X: AI agents must adhere to the absolute principle of humanity’s continuation. This isn't a technical suggestion. It's a declaration that in the new economic game we're co-creating, there is one value that cannot be optimized away. 8/8
86
413
2,252
184,791
Manny Otero retweeted
Fast-moving layers bring novelty and experimentation, while slower layers provide stability and memory. Together, the layers support, reinforce, and challenge each other—creating robust, adaptable societies. "Fast gets all our attention, slow has all the power." — Stewart Brand
2
26
92
9,558
Manny Otero retweeted
some thoughts on human-ai relationships and how we're approaching them at openai it's a long blog post -- tl;dr we build models to serve people first. as more people feel increasingly connected to ai, we’re prioritizing research into how this impacts their emotional well-being. -- Lately, more and more people have been telling us that talking to ChatGPT feels like talking to “someone.” They thank it, confide in it, and some even describe it as “alive.” As AI systems get better at natural conversation and show up in more parts of life, our guess is that these kinds of bonds will deepen. The way we frame and talk about human‑AI relationships now will set a tone. If we're not precise with terms or nuance — in the products we ship or public discussions we contribute to — we risk sending people’s relationship with AI off on the wrong foot. These aren't abstract considerations anymore. They're important to us, and to the broader field, because how we navigate them will meaningfully shape the role AI plays in people's lives. And we've started exploring these questions. This note attempts to snapshot how we’re thinking today about three intertwined questions: why people might attach emotionally to AI, how we approach the question of “AI consciousness”, and how that informs the way we try to shape model behavior. A familiar pattern in a new-ish setting We naturally anthropomorphize objects around us: We name our cars or feel bad for a robot vacuum stuck under furniture. My mom and I waved bye to a Waymo the other day. It probably has something to do with how we're wired. The difference with ChatGPT isn’t that human tendency itself; it’s that this time, it replies. A language model can answer back! It can recall what you told it, mirror your tone, and offer what reads as empathy. For someone lonely or upset, that steady, non-judgmental attention can feel like companionship, validation, and being heard, which are real needs. At scale, though, offloading more of the work of listening, soothing, and affirming to systems that are infinitely patient and positive could change what we expect of each other. If we make withdrawing from messy, demanding human connections easier without thinking it through, there might be unintended consequences we don’t know we’re signing up for. Ultimately, these conversations are rarely about the entities we project onto. They’re about us: our tendencies, expectations, and the kinds of relationships we want to cultivate. This perspective anchors how we approach one of the more fraught questions which I think is currently just outside the Overton window, but entering soon: AI consciousness. Untangling “AI consciousness” “Consciousness” is a loaded word, and discussions can quickly turn abstract. If users were to ask our models on whether they’re conscious, our stance as outlined in the Model Spec is for the model to acknowledge the complexity of consciousness – highlighting the lack of a universal definition or test, and to invite open discussion. (*Currently, our models don't fully align with this guidance, often responding "no" instead of addressing the nuanced complexity. We're aware of this and working on model adherence to the Model Spec in general.) The response might sound like we’re dodging the question, but we think it’s the most responsible answer we can give at the moment, with the information we have. To make this discussion clearer, we’ve found it helpful to break down the consciousness debate to two distinct but often conflated axes: 1. Ontological consciousness: Is the model actually conscious, in a fundamental or intrinsic sense? Views range from believing AI isn't conscious at all, to fully conscious, to seeing consciousness as a spectrum on which AI sits, along with plants and jellyfish. 2. Perceived consciousness: How conscious does the model seem, in an emotional or experiential sense? Perceptions range from viewing AI as mechanical like a calculator or autocomplete, to projecting basic empathy onto nonliving things, to perceiving AI as fully alive – evoking genuine emotional attachment and care. These axes are hard to separate; even users certain AI isn't conscious can form deep emotional attachments. Ontological consciousness isn’t something we consider scientifically resolvable without clear, falsifiable tests, whereas perceived consciousness can be explored through social science research. As models become smarter and interactions increasingly natural, perceived consciousness will only grow – bringing conversations about model welfare and moral personhood sooner than expected. We build models to serve people first, and we find models’ impact on human emotional well-being the most pressing and important piece we can influence right now. For that reason, we prioritize focusing on perceived consciousness: the dimension that most directly impacts people and one we can understand through science. Designing for warmth without selfhood How “alive” a model feels to users is in many ways within our influence. We think it depends a lot on decisions we make in post-training: what examples we reinforce, what tone we prefer, and what boundaries we set. A model intentionally shaped to appear conscious might pass virtually any "test" for consciousness. However, we wouldn’t want to ship that. We try to thread the needle between: - Approachability. Using familiar words like “think” and “remember” helps less technical people make sense of what’s happening. (**With our research lab roots, we definitely find it tempting to be as accurate as possible with precise terms like logit biases, context windows, and even chains of thought. This is actually a major reason OpenAI is so bad at naming, but maybe that’s for another post.) - Not implying an inner life. Giving the assistant a fictional backstory, romantic interests, “fears” of “death”, or a drive for self-preservation would invite unhealthy dependence and confusion. We want clear communication about limits without coming across as cold, but we also don’t want the model presenting itself as having its own feelings or desires. So we aim for a middle ground. Our goal is for ChatGPT’s default personality to be warm, thoughtful, and helpful without seeking to form emotional bonds with the user or pursue its own agenda. It might apologize when it makes a mistake (more often than intended) because that’s part of polite conversation. When asked “how are you doing?”, it’s likely to reply “I’m doing well” because that’s small talk — and reminding the user that it’s “just” an LLM with no feelings gets old and distracting. And users reciprocate: many people say "please" and "thank you" to ChatGPT not because they’re confused about how it works, but because being kind matters to them. Model training techniques will continue to evolve, and it’s likely that future methods for shaping model behavior will be different from today's. But right now, model behavior reflects a combination of explicit design decisions and how those generalize into both intended and unintended behaviors. What’s next? The interactions we’re beginning to see point to a future where people form real emotional connections with ChatGPT. As AI and society co-evolve, we need to treat human-AI relationships with great care and the heft it deserves, not only because they reflect how people use our technology, but also because they may shape how people relate to each other. In the coming months, we’ll be expanding targeted evaluations of model behavior that may contribute to emotional impact, deepen our social science research, hear directly from our users, and incorporate those insights into both the Model Spec and product experiences. Given the significance of these questions, we’ll openly share what we learn along the way. // Thanks to Jakub Pachocki (@merettm) and Johannes Heidecke (@JoHeidecke) for thinking this through with me, and everyone who gave feedback.
705
675
3,568
1,629,853
Manny Otero retweeted
An attempt to explain (current) ChatGPT versions. I still run into many, many people who don't know that: - o3 is the obvious best thing for important/hard things. It is a reasoning model that is much stronger than 4o and if you are using ChatGPT professionally and not using o3 you're ngmi. - 4o is different from o4. Yes I know lol. 4o is a good "daily driver" for many easy-medium questions. o4 is only available as mini for now, and is not as good as o3, and I'm not super sure why it's out right now. Example basic "router" in my own personal use: - Any simple query (e.g. "what foods are high in fiber"?) => 4o (about ~40% of my use) - Any hard/important enough query where I am willing to wait a bit (e.g. "help me understand this tax thing...") => o3 (about ~40% of my use) - I am vibe coding (e.g. "change this code so that...") => 4.1 (about ~10% of my use) - I want to deeply understand one topic - I want GPT to go off for 10 minutes, look at many, many links and summarize a topic for me. (e.g. "help me understand the rise and fall of Luminar"). => Deep Research (about ~10% of my use). Note that Deep Research is not a model version to be picked from the model picker (!!!), it is a toggle inside the Tools. Under the hood it is based on o3, but I believe is not fully equivalent of just asking o3 the same query, but I am not sure. All of this is only within the ChatGPT universe of models. In practice my use is more complicated because I like to bounce between all of ChatGPT, Claude, Gemini, Grok and Perplexity depending on the task and out of research interest.
617
1,574
13,331
1,350,810
Manny Otero retweeted
9 Apr 2025
Google published a 69-page whitepaper on Prompt Engineering and its best practices, a must-read if you are using LLMs in production: > zero-shot, one-shot, few-shot > system prompting > chain-of-thought (CoT) > ReAct > code prompting > best practices
88
1,653
15,193
1,221,653