Satyabrata pal

Pinned Tweet

Satyabrata pal @TheCodingProjec

23 Apr 2024

I'm thrilled to announce my newest YouTube tutorial! Dive into key NLP concepts, tackle real-world datasets, and attempt your first Kaggle competition 🚀 Watch here: youtu.be/ZqUI1CPaITw 🎉 Remember to like, subscribe, and ring that notification bell! 🔔

Satyabrata pal retweeted

Andrej Karpathy

@karpathy

10 Dec 2025

nanoGPT - the first LLM to train and inference in space 🥹. It begins.

Adi Oltean

@AdiOltean

10 Dec 2025

We have just used the @Nvidia H100 onboard Starcloud-1 to train the first LLM in space! We trained the nano-GPT model from Andrej @Karpathy on the complete works of Shakespeare and successfully ran inference on it. We have also run inference on a preloaded Gemma model, and we plan to try more exciting models in the future. Getting the first H100 to work in space required a lot of innovation and hard work from the incredible Starcloud team to make this breakthrough. This is a significant first step toward moving almost all computing off Earth to reduce the burden on our energy supplies and take advantage of abundant solar energy in space! 🚀

319

847

11,032

1,084,428

Satyabrata pal retweeted

Lex Fridman

@lexfridman

30 Nov 2025

Here's my conversation with Michael Levin (@drmichaellevin) about the nature of intelligence in biological systems, including unconventional & alien intelligence, agency, memory, consciousness, and life in all its forms here on Earth and beyond. It's here on X in full and is up everywhere else (see comment).
Timestamps:
0:00 - Introduction
0:44 - Biological intelligence
9:17 - Living vs non-living organisms
14:30 - Origin of life
18:15 - The search for alien life (on Earth)
51:19 - Creating life in the lab - Xenobots and Anthrobots
1:04:21 - Memories and ideas are living organisms
1:18:02 - Reality is an illusion: The brain is an interface to a hidden reality
2:03:48 - Unexpected intelligence of sorting algorithms
2:29:26 - Can aging be reversed?
2:33:17 - Mind uploading
2:51:57 - Alien intelligence
3:06:52 - Advice for young people
3:13:21 - Questions for AGI

3:18:09

270

496

2,734

990,233

Satyabrata pal retweeted

Andrej Karpathy

@karpathy

22 Nov 2025

As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is
1) dispatched to multiple models on your council using OpenRouter, e.g. currently: "openai/gpt-5.1", "google/gemini-3-pro-preview", "anthropic/claude-sonnet-4.5", "x-ai/grok-4",
2) then all models get to see each other's (anonymized) responses and they review and rank them, and
3) then a "Chairman LLM" gets all of that as context and produces the final response.
It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses. Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally.
For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain.
That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored.
I pushed the vibe coded app to github.com/karpathy/llm-coun… if others would like to play. ty nano banana pro for fun header image for the repo
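The three-stage data flow described in the post above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the linked repo: the `Model` type, `run_council`, and `make_stub` are hypothetical names, and the stub models stand in for real API calls (for example through OpenRouter).

```python
from typing import Callable, Dict

# Hypothetical interface: a "model" is any function from prompt text to reply text.
Model = Callable[[str], str]


def run_council(query: str, council: Dict[str, Model], chairman: Model) -> str:
    # Stage 1: dispatch the query to every council member.
    answers = {name: model(query) for name, model in council.items()}

    # Anonymize: reviewers see "Response A", "Response B", ... not model names.
    labels = {name: f"Response {chr(ord('A') + i)}" for i, name in enumerate(answers)}
    anonymized = "\n".join(f"{labels[n]}: {a}" for n, a in answers.items())

    # Stage 2: each member reviews and ranks the anonymized responses.
    review_prompt = f"Question: {query}\n{anonymized}\nRank these responses."
    reviews = [model(review_prompt) for model in council.values()]

    # Stage 3: the chairman sees the answers and reviews, then writes the final reply.
    final_prompt = (
        f"Question: {query}\n{anonymized}\n"
        + "\n".join(f"Review {i + 1}: {r}" for i, r in enumerate(reviews))
        + "\nWrite the final response."
    )
    return chairman(final_prompt)


def make_stub(tag: str) -> Model:
    # Stand-in for a real LLM call; it only reports how many prompt lines it saw.
    return lambda prompt: f"[{tag}] saw {len(prompt.splitlines())} lines"


council = {"model-1": make_stub("m1"), "model-2": make_stub("m2")}
final = run_council("What is 2 + 2?", council, chairman=make_stub("chair"))
print(final)
```

Keeping each stage as a plain function call makes it easy to experiment with other data flows, such as skipping the review stage or letting the chairman also be a council member.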

Andrej Karpathy

@karpathy

18 Nov 2025

I’m starting to get into a habit of reading everything (blogs, articles, book chapters,…) with LLMs. Usually pass 1 is manual, then pass 2 “explain/summarize”, pass 3 Q&A. I usually end up with a better/deeper understanding than if I moved on. It's growing to be among my top use cases. On the flip side, if you’re a writer trying to explain/communicate something, we may increasingly see less of a mindset of “I’m writing this for another human” and more “I’m writing this for an LLM”. Because once an LLM “gets it”, it can then target, personalize and serve the idea to its user.

906

1,437

16,974

5,306,487

Satyabrata pal retweeted

Sebastian Raschka

@rasbt

23 Nov 2025

Implemented Olmo 3 from scratch (in a standalone notebook) this weekend! If you are a coder, probably the best way to read the architecture details at a glance: github.com/rasbt/LLMs-from-s…

Sebastian Raschka

@rasbt

20 Nov 2025

Olmo models are always a highlight due to them being fully transparent and their nice, detailed technical reports. I am sure I'll talk more about the interesting training-related aspects from that 100-pager in the upcoming days and weeks. In the meantime, here's the side-by-side architecture comparison with Qwen3.
1) As we can see, the Olmo 3 architecture is relatively similar to Qwen3. However, it's worth noting that this is most likely inspired by the Olmo 2 predecessor, not Qwen3.
2) Similar to Olmo 2, Olmo 3 still uses a post-norm flavor instead of pre-norm, as they found in the Olmo 2 paper that it stabilizes the training.
3) Interestingly, the 7B model still uses multi-head attention similar to Olmo 2. However, to make things more efficient and shrink the KV cache size, they now use sliding window attention (e.g., similar to Gemma 3).
Next, let's look at the 32B model.
4) Overall, it's the same architecture but just scaled up. Also, the proportions (e.g., going from the input to the intermediate size in the feed forward layer, and so on) roughly match the ones in Qwen3.
5) My guess is the architecture was initially somewhat smaller than Qwen3 due to the smaller vocabulary, and they then scaled up the intermediate size expansion from 5x in Qwen3 to 5.4x in Olmo 3 to have a 32B model for a direct comparison.
6) Also, note that the 32B model (finally!) uses grouped query attention.
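To see why the attention choices mentioned above matter for the KV cache, here is a back-of-the-envelope sketch. The size formula (2 tensors x layers x KV heads x head dimension x cached positions x bytes per element) is the usual estimate, but every number below is an illustrative assumption and not the actual Olmo 3 configuration. It also assumes, for simplicity, that the sliding window applies to every layer.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   cached_positions: int, bytes_per_elem: int = 2) -> int:
    # Keys and values are both cached, hence the factor of 2.
    return 2 * layers * kv_heads * head_dim * cached_positions * bytes_per_elem


# Illustrative (assumed) configuration, NOT taken from the Olmo 3 report.
LAYERS, HEADS, HEAD_DIM = 32, 32, 128
CONTEXT, WINDOW = 65_536, 4_096

# Multi-head attention: one KV head per query head, full context cached.
mha = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, CONTEXT)

# Sliding window attention: only the last WINDOW positions are kept.
swa = kv_cache_bytes(LAYERS, HEADS, HEAD_DIM, min(CONTEXT, WINDOW))

# Grouped query attention: here 4 query heads share each KV head.
gqa = kv_cache_bytes(LAYERS, HEADS // 4, HEAD_DIM, CONTEXT)

for name, size in [("MHA", mha), ("MHA + sliding window", swa), ("GQA (4 per group)", gqa)]:
    print(f"{name:22s} {size / 2**30:6.2f} GiB per sequence")
```

With these assumed numbers, the window shrinks the cache by the ratio of context length to window size, while grouped query attention shrinks it by the group size.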

287

1,975

166,151

Satyabrata pal retweeted

Unitree

@UnitreeRobotics

6 Nov 2025

Embodied Avatar: Full-body Teleoperation Platform🥳 Everyone has fantasized about having an embodied avatar! Full-body teleoperation and full-body data acquisition platform is waiting for you to try it out!

1:49

553

1,682

11,221

26,063,931

Satyabrata pal retweeted

Andrej Karpathy

@karpathy

28 Aug 2025

Transforming human knowledge, sensors and actuators from human-first and human-legible to LLM-first and LLM-legible is a beautiful space with so much potential and so much can be done... One example I'm obsessed with recently - for every textbook pdf/epub, there is a perfect "LLMification" of it intended not for a human but for an LLM (though it is a non-trivial transformation that would need human in the loop involvement).
- All of the exposition is extracted into a markdown document, including all latex, styling (bold/italic), tables, lists, etc. All of the figures are extracted as images.
- All worked problems get extracted into SFT examples. Any references made to previous figures/tables/etc. are parsed and included.
- All practice problems are extracted into environment examples for RL. The correct answers are located in the answer key and attached. Any additional information is added as "answer key" for a potential LLM judge.
- Synthetic data expansion. For every specific problem, you can create an infinite problem generator, which emits problems of that type. For example, if a problem is "What is the angle between the hour and minute hands at 9am?", you can imagine generalizing that to any arbitrary time and calculating answers using Python code, and possibly generating synthetic variations of the prompt text.
- All of the data above could be nicely indexed and embedded into a RAG database for later reference, or maybe MCP servers that make it available.
Then just as a (human) student could take a high school physics course, an LLM could take it in the exact same way. This would be a significantly richer source of legible, workable information for an LLM than just something like pdf-to-text (current prevailing practice), which simply asks the LLM to predict the textbook content top to bottom token by token (umm - lame).
As just a quick and crappy example of synthetic variations of the above example, GPT-5 gave me this problem generator (see image), which can now generalize that problem template to many variations:
- When the time is 11:07 a.m., what is the degree measure of the angle between the hands? (Answer: 68)
- Determine the angle in degrees between the clock’s hands at 4:14 a.m. (Answer: 43)
- What angle do the clock hands form when the time reads 11:47 a.m.? (Answer: 71)
- At 7:02 a.m., what angle separates the hour hand and the minute hand? (Answer: 161)
- At 4:14 a.m., calculate the angle made between the two hands. (Answer: 43)
- What angle is formed by the hands of a clock at 4:45 p.m.? (Answer: 127)
- What is the angle between the hour and minute hands at 8:37 p.m.? (Answer: 36)
(infinite practice problems can be created...)
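A generator of the kind described above might look like the sketch below. This is not the GPT-5 output referenced in the post (that was shared only as an image); the function names and prompt templates are hypothetical. The sketch returns the exact angle, so 11:07 yields 68.5 degrees, which suggests the answers quoted above were truncated to whole degrees.

```python
import random


def clock_angle(hour: int, minute: int) -> float:
    """Smaller angle in degrees between the hour and minute hands."""
    hour_deg = (hour % 12) * 30 + minute * 0.5  # hour hand: 30 deg/hour plus 0.5 deg/minute
    minute_deg = minute * 6                     # minute hand: 6 deg/minute
    diff = abs(hour_deg - minute_deg)
    return min(diff, 360 - diff)


# Hypothetical prompt templates used to vary the surface wording.
TEMPLATES = [
    "What is the angle between the hour and minute hands at {t}?",
    "At {t}, what angle separates the hour hand and the minute hand?",
]


def generate_problem(rng: random.Random) -> dict:
    # Sample a random time, then pair a templated prompt with the computed answer.
    hour, minute = rng.randint(1, 12), rng.randint(0, 59)
    suffix = rng.choice(["a.m.", "p.m."])
    time_text = f"{hour}:{minute:02d} {suffix}"
    return {
        "prompt": rng.choice(TEMPLATES).format(t=time_text),
        "answer": clock_angle(hour, minute),
    }


rng = random.Random(0)  # seeded so runs are reproducible
for _ in range(3):
    print(generate_problem(rng))
```

Because the answer is computed rather than looked up, every sampled time yields a verifiable prompt and answer pair, which is what makes this kind of template usable as an RL environment.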

281

651

5,627

721,956

Satyabrata pal retweeted

Curiosity

@CuriosityonX

20 Aug 2025

Tiny Moon 'Daphnis' creating giant waves in Saturn's Rings.

ALT Daphnis in the Keeler Gap. Rendered with Autodesk Maya and Adobe Photoshop by Kevin Gill.

122

917

7,818

249,885

Satyabrata pal @TheCodingProjec

26 Jun 2025

41 years of waiting, and India has come far since then.

वरिष्ठ प्र बंधक

@Bebasbankerz

25 Jun 2025

Congratulations, and a proud moment for India: after 41 years, Shubhanshu Shukla is in space, opening a new chapter in the Indian Space Mission. 👏👏👏🌌🛸🇮🇳🇮🇳 #ISRO #NASA

Satyabrata pal retweeted

Andrej Karpathy

@karpathy

27 May 2025

So so so cool. Llama 1B batch one inference in one single CUDA kernel, deleting synchronization boundaries imposed by breaking the computation into a series of kernels called in sequence. The *optimal* orchestration of compute and memory is only achievable in this way.

Benjamin F Spector

@bfspector

27 May 2025

(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces. So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in single kernel. Megakernels are faster & more humane. Here’s how to treat your Llamas ethically: (Joint with @jordanjuravsky, @stuart_sul, @OwenDugan, @dylan__lim, @realDanFu, @simran_s_arora, and @HazyResearch)

229

2,006

268,032

Satyabrata pal retweeted

Andrew Ng

@AndrewYNg

27 May 2025

Agentic Document Extraction just got much faster! From previous 135sec median processing time down to 8sec. Extracts not just text but diagrams, charts, and form fields from PDFs to give LLM-ready output. Please see the video for details and some application ideas.

4:01

561

3,706

290,373

Satyabrata pal retweeted

Dr. S. Jaishankar

@DrSJaishankar

7 May 2025

The world must show zero tolerance for terrorism. #OperationSindoor

6,385

43,535

278,268

8,035,886

Satyabrata pal retweeted

elvis

@omarsar0

16 Apr 2025

BREAKING: OpenAI introduces new o-series models o3 and o4-mini. OpenAI claims that these are models that can produce novel and useful ideas. Here is all you need to know:

309

67,704

Satyabrata pal @TheCodingProjec

7 Mar 2025

Great thought by @Thom_Wolf on what it may take to create an AI that can create new things instead of just generating stuff from its training data.

Thomas Wolf

@Thom_Wolf

6 Mar 2025

I shared a controversial take the other day at an event and I decided to write it down in a longer format: I’m afraid AI won't give us a "compressed 21st century".
The "compressed 21st century" comes from Dario's "Machines of Loving Grace" and if you haven’t read it, you probably should, it’s a noteworthy essay. In a nutshell the paper claims that, over a year or two, we’ll have a "country of Einsteins sitting in a data center", and it will result in a compressed 21st century during which all the scientific discoveries of the 21st century will happen in the span of only 5-10 years.
I read this essay twice. The first time I was totally amazed: AI will change everything in science in 5 years, I thought! A few days later I came back to it and, re-reading it, I realized that much of it seemed like wishful thinking at best. What we'll actually get, in my opinion, is “a country of yes-men on servers” (if we just continue on current trends).
Let me explain the difference with a small part of my personal story. I’ve always been a straight-A student. Coming from a small village, I joined the top French engineering school before getting accepted to MIT for a PhD. School was always quite easy for me. I could just get where the professor was going, where the exam's creators were taking us and could predict the test questions beforehand.
That’s why, when I eventually became a researcher (more specifically a PhD student), I was completely shocked to discover that I was a pretty average, underwhelming, mediocre researcher. While many colleagues around me had interesting ideas, I was constantly hitting a wall. If something was not written in a book I could not invent it unless it was a rather useless variation of a known theory. More annoyingly, I found it very hard to challenge the status-quo, to question what I had learned. I was no Einstein, I was just very good at school. Or maybe even: I was no Einstein in part *because* I was good at school.
History is filled with geniuses struggling during their studies. Edison was called "addled" by his teacher. Barbara McClintock got criticized for "weird thinking" before winning a Nobel Prize. Einstein failed his first attempt at the ETH Zurich entrance exam. And the list goes on.
The main mistake people usually make is thinking Newton or Einstein were just scaled-up good students, that a genius comes to life when you linearly extrapolate a top-10% student. This perspective misses the most crucial aspect of science: the skill to ask the right questions and to challenge even what one has learned. A real science breakthrough is Copernicus proposing, against all the knowledge of his day (in ML terms we would say “despite all his training dataset”), that the earth may orbit the sun rather than the other way around.
To create an Einstein in a data center, we don't just need a system that knows all the answers, but rather one that can ask questions nobody else has thought of or dared to ask. One that writes 'What if everyone is wrong about this?' when all textbooks, experts, and common knowledge suggest otherwise.
Just consider the crazy paradigm shift of special relativity and the guts it took to formulate a first axiom like “let’s assume the speed of light is constant in all frames of reference”, defying the common sense of those days (and even of today…). Or take CRISPR, generally considered to be an adaptive bacterial immune system since the 80s until, 25 years after its discovery, Jennifer Doudna and Emmanuelle Charpentier proposed to use it for something much broader and general: gene editing, leading to a Nobel prize.
This type of realization ("we've known XX does YY for years, but what if we've been wrong about it all along? Or what if we could apply it to the entirely different concept of ZZ instead?") is an example of outside-of-knowledge thinking, or paradigm shift, which is essentially what makes science progress.
Such paradigm shifts happen rarely, maybe 1-2 times a year and are usually awarded Nobel prizes once everybody has taken stock of the impact. However rare they are, I agree with Dario in saying that they take the lion’s share in defining scientific progress over a given century while the rest is mostly noise.
Now let’s consider what we’re currently using to benchmark recent AI model intelligence improvement. Some of the most recent AI tests are for instance the grandiosely named "Humanity's Last Exam" or "Frontier Math". They consist of very difficult questions (usually written by PhDs) but with clear, closed-ended answers. These are exactly the kinds of exams where I excelled in my field. These benchmarks test if AI models can find the right answers to a set of questions we already know the answer to.
However, real scientific breakthroughs will come not from answering known questions, but from asking challenging new questions and questioning common conceptions and previous ideas. Remember Douglas Adams' Hitchhiker's Guide? The answer is apparently 42, but nobody knows the right question. That's research in a nutshell.
In my opinion this is one of the reasons LLMs, while they already have all of humanity's knowledge in memory, haven't generated any new knowledge by connecting previously unrelated facts. They're mostly doing "manifold filling" at the moment - filling in the interpolation gaps between what humans already know, somehow treating knowledge as an intangible fabric of reality.
We're currently building very obedient students, not revolutionaries. This is perfect for today’s main goal in the field of creating great assistants and overly compliant helpers. But until we find a way to incentivize them to question their knowledge and propose ideas that potentially go against past training data, they won't give us scientific revolutions yet.
If we want scientific breakthroughs, we should probably explore how we’re currently measuring the performance of AI models and move to a measure of knowledge and reasoning able to test if scientific AI models can for instance:
- Challenge their own training data knowledge
- Take bold counterfactual approaches
- Make general proposals based on tiny hints
- Ask non-obvious questions that lead to new research paths
We don't need an A student who can answer every question with general knowledge. We need a B student who sees and questions what everyone else missed.
---
PS: You might be wondering what such a benchmark could look like. Evaluating it could involve testing a model on some recent discovery it should not know yet (a modern equivalent of special relativity) and explore how the model might start asking the right questions on a topic it has no exposure to the answers or conceptual framework of. This is challenging because most models are trained on virtually all human knowledge available today but it seems essential if we want to benchmark these behaviors. Overall this is really an open question and I’ll be happy to hear your insightful thoughts.

Satyabrata pal retweeted

Thomas Wolf

@Thom_Wolf

27 Feb 2025

I want to share a bit of context on today's new releases from DeepSeek: three very small (0-500 lines of code), self-contained, yet fascinating newly open-sourced repositories. Let's dive in!
1. The first one is just data: DeepSeek/Profile-data (links at the end). While this repo doesn't contain any code files, it's still extremely interesting. This is profile data that shows in detail and with real recorded data how low-level operations are scheduled to make sure GPUs are kept busy at all times during the training and inference of DeepSeek V3/R1 (see the "profiling session" in the Ultra-Scale Playbook: link at the end). It serves as the organizational blueprint (essentially a Gantt diagram) of the most efficient open-source pretraining to date. A great example to study and something I would love to see released more often by open-source teams: scheduling operations in the most efficient way is the core of large-scale LLM training nowadays.
2. The second one is a very small code snippet (164 LoC) on how to balance the load of experts in mixture-of-experts (MoE): see link at the end. It's impressive that they extracted and released such a core technique for efficiently balancing the load among experts in a self-contained format. You can read more about this in the "Expert Parallelism" section of the Ultra-Scale Playbook. This will make the technique easy to incorporate into most distributed codebases. Congrats
3. Today's last release is larger (500 LoC) and perhaps covers the most fascinating technical part of DeepSeek V3/R1 training: the new DualPipe pipeline parallelism (PP) approach (link at the end). For the first time in large-scale training, the DeepSeek team was able to train using what they called a "zero-bubble regime" in PP, something never before reported in a SOTA large-scale training. If you don't know what a "bubble" or "pipeline parallelism" is, you can check the Pipeline Parallelism section in the Ultra-Scale Playbook. This is perhaps the most impressive part of the DeepSeek technical report! Having a small, standalone codebase that was apparently able to reach this regime is fascinating. I'm so excited to try it!
---
Overall, I really like their focus on open-sourcing many independent code modules, each on a specific technique and with examples. Now it remains to be seen whether other teams can use and integrate these code samples and reproduce the results claimed by DeepSeek in their paper. Looking forward to seeing more extremely efficient training available for everyone!
---
Links:
1. profile-data: github.com/deepseek-ai/profi…
2. Expert balancing: github.com/deepseek-ai/EPLB
3. Dualpipe: github.com/deepseek-ai/DualP…
and the Ultra-Scale Playbook: huggingface.co/spaces/nanotr…
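For readers unfamiliar with the "bubble" mentioned in the post above, here is a small worked calculation. It uses the textbook idle-time estimate for a simple synchronous pipeline schedule (GPipe-style): with p stages and m micro-batches, the idle fraction is (p - 1) / (m + p - 1). It says nothing about how DualPipe works internally; the function name and the example numbers are illustrative assumptions.

```python
def bubble_fraction(stages: int, micro_batches: int) -> float:
    """Idle fraction of a simple synchronous pipeline schedule.

    Each stage is busy for `micro_batches` time slots out of
    `micro_batches + stages - 1` total slots.
    """
    if stages < 1 or micro_batches < 1:
        raise ValueError("stages and micro_batches must be positive")
    return (stages - 1) / (micro_batches + stages - 1)


# Illustrative numbers only: 8 pipeline stages, varying micro-batch counts.
for m in (1, 8, 32, 128):
    print(f"stages=8 micro_batches={m:3d} bubble={bubble_fraction(8, m):.1%}")
```

In this simple schedule the bubble only shrinks as more micro-batches are added, which is why a schedule that reaches a near-zero bubble by overlapping work differently is notable.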

104

601

51,310

Satyabrata pal retweeted

Asheesh Arora @AsheeshAroraEV

7 Jan 2025

Shoutout to @Glida for their incredible #satellitechargers at #DLFcyberPark! Even in low temperatures, they perform #flawlessly, with no signs of #colgating, delivering peak performance instantly. A true #GameChanger for #EVcharging in extreme conditions. #GoGreen #LoyalChargers

0:08

1,321

Satyabrata pal retweeted

Logan Kilpatrick

@OfficialLoganK

19 Dec 2024

Just when you thought it was over... we’re introducing Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more 🧵

286

495

5,352

978,988

Satyabrata pal retweeted

Andrej Karpathy

@karpathy

19 Dec 2024

The new Gemini 2.0 Flash Thinking model (Gemini version of GPT o1 that takes a while to think before responding) is very nice and fast and now available to try on Google AI Studio 🧑‍🍳👏. The prominent and pleasant surprise here is that unlike o1 the reasoning traces of the model are shown. As a user I personally really like this because the reasoning itself is interesting to see and read - the models actively think through different possibilities, ideas, debate themselves, etc., it's part of the value add. The case against showing these is typically a concern of someone collecting the reasoning traces and training to imitate them on top of a different base model, to gain reasoning ability possibly and to some extent.

Jeff Dean

@JeffDean

19 Dec 2024

Introducing Gemini 2.0 Flash Thinking, an experimental model that explicitly shows its thoughts. Built on 2.0 Flash’s speed and performance, this model is trained to use thoughts to strengthen its reasoning. And we see promising results when we increase inference time computation!

128

418

5,014

627,039

Satyabrata pal retweeted

Shubhabrata ‘Shumi’ Marmar @shumar

15 Dec 2024

Almost exactly two years into this adventure w @kartiksinghee... the MotorInc app is ready! Easiest way to sign up is at motorinc.com

MotorInc — Vehicles. Trips. Experiences.

Discover cars, motorcycles, and scooters in India. Expert reviews, video guides, and price comparisons.

motorinc.com

MotorInc @themotorinc

15 Dec 2024

Here's the last (really) episode of #ThisConnect Season 02, and we've an #announcement. The #MotorInc app is now ready, and we'd like you to take it for a spin. ▶️ youtu.be/ImWPUe_QYd4 ~ Download the #MotorincApp from motorinc.com - #BuiltForYou #BuiltNotBought

The most important episode (something new for you!) | ThisConnect S02E31 https://youtu.be/up92sVhmA_Y

Here's the last (really) episode of Season 02, and we've an announcement. The MotorInc app is now ready, and we'd like you to take it for a spin.

▶️ https://youtu.be/ImWPUe_QYd4
~
Download the Motorinc app from www.motorinc.com
~
ThisConnect is our podcast. If you haven't already, check out Season 01 on our channel. On the pod, Kartikeya and Shumi discuss a range of topics of automotive interest, from trends to important things we need to think about to just celebrating the automotive lifestyle.
~
#MotorInc #MotorIncThisConnect #ThisConnect #Podcast #App #TheMotorIncApp #BuiltForYou #BuiltNotBought #MoveBetter #TheMovementAboutMovement

3,123

Satyabrata pal retweeted

Humans Of EV - Community @humans_of_Ev

18 Nov 2024

🚀 Exciting News! 🚀 Join our webinar "Accelerating EV Adoption: Policies, Hurdles, & Solutions" Nov 21, 3 PM Learn from EV experts on challenges like charging infra, policy gaps, & consumer confidence! Register Now: eventyay.com/e/87d6231b #Sustainability #Webinar

855

Satyabrata pal retweeted

Asheesh Arora @AsheeshAroraEV

27 Nov 2024

Delhi is choking on pollution, and yet, the city lacks an active and updated EV policy to combat it effectively. @AamAadmiParty must urgently act to push EV adoption. It's time for @MORTHIndia, @MoHFW_INDIA, and @MoRTHRoadSafety to step up. #DelhiPollution #EVPolicy #SaveDelhi

5,102