:|

Joined March 2017
79 Photos and videos
Pinned Tweet
Golden rule of the second foundation: do nothing unless you must, and when you must act - hesitate
6
1,884
jere:D 고수 retweeted
A technical dive inside our new "Midjourney Scanner"
851
2,044
17,644
5,154,079
jere:D 고수 retweeted
Dinner conversation today: @AlecRad revealing his tips and tricks for distinguishing CNN/RNN/RL training by the sound the GPU makes.
9
35
366
jere:D 고수 retweeted
Jun 8
what post do you want to see next? - how our LoRA kernels work - how we made PyTorch faster than MLX on MPS - how we built faster image resizing than PIL/pyvips - me ranting about blackwell
Jun 4
Wrote a post about how Photon (Moondream's inference engine) hides GPU bubbles using pipelined decoding. Speeding up inference by up to 35%.
34
13
390
24,047
jere:D 고수 retweeted
One professor at the University of Bonn quietly put his entire robotics curriculum on YouTube: SLAM. Sensor fusion. State estimation. Probabilistic robotics. Self-driving cars. Motion planning. Photogrammetry. Cyrill Stachniss has been uploading full university lectures for years! Each topic is a complete playlist; the kind of material that normally costs a semester of tuition. He's one of the most cited researchers in mobile robotics and mapping. His students go on to build the navigation stacks powering real autonomous systems. If you're serious about understanding how robots know where they are... this is the place to start. Free. On YouTube. 📌 [youtube.com/@CyrillStachniss] —— Weekly robotics and AI insights. Subscribe free: 22astronauts.com
1
117
770
21,880
jere:D 고수 retweeted
Blog post about my recent optimal tokenizer exploration blog.aqnichol.com/2026/06/10…

4
5
46
3,913
jere:D 고수 retweeted
All 15 @CVPR 2026 Paper Finalists can now be easily explored here paperswithcode.co/conference… Find GitHub links, @huggingface artifacts, and evals
Replying to @CVPR
Finalists 👏
3
24
130
34,826
jere:D 고수 retweeted
OpenCV 5 is here! This is the biggest update in years for computer vision: >Brand new DNN engine with 80% ONNX coverage >Built-in LLM & VLM support >Faster performance (often beating ONNX Runtime) >Better 3D vision, Python integration, and hardware acceleration OpenCV is not just a computer vision library but the settle stone for millions of projects.
27
307
1,672
113,343
jere:D 고수 retweeted
ETH Zurich just open-sourced their entire 2026 robot learning course. Not a MOOC. The actual course. Slides, lecture recordings, coding assignments, GitHub repo. The curriculum goes from imitation learning and RL all the way to Vision-Language-Action models and foundation models for robotics. Guest lectures from the co-founder of Physical Intelligence. The creator of Diffusion Policy. Pieter Abbeel. Dieter Fox. 12 weeks. Free. No signup. If you want to understand where robot intelligence is actually heading… this is the reading list the field is using right now. 📍[cvg.ethz.ch/lectures/Robot-L…] —— Weekly robotics and AI insights. Subscribe free: 22astronauts.com
21
312
2,115
116,551
jere:D 고수 retweeted
I think this is my favorite paper this CVPR: Magician. before they explore in active view selection, they imagine how gaussians and occupancy map would look like and then compute a coverage metric based on that. during planning, they try 10 views like that in 10 steps in a tree search with pruning and get planning for free. they even have real-world experiments with a drone and a toy car. how are they not an award candidate, it blows my mind.
4
21
202
20,148
jere:D 고수 retweeted
If you used pretrained vision encoders like DINO, this is for you––we found a simple post-training recipe to improve DINO features! CVPR Highlight Paper: cvpr.thecvf.com/virtual/2026… Code: github.com/google-deepmind/r… Poster #63 on Sunday, June 7 at 3-5:30pm. Details in thread.
2
10
128
7,297
jere:D 고수 retweeted
こんな面白い研究あったのね データセットの重複しないサブセット2つ用意してそれぞれで別の拡散モデル訓練する時、データ数増やしてゆくと同じノイズが似たような画像を作るようになる、と openreview.net/forum?id=ANvm…
2
20
184
15,898
jere:D 고수 retweeted
Opus 4.8 just dropped and I ran it through our CAD tasks. 4.6 → 4.7 → 4.8 side by side. The results are unexpected!
198
193
3,532
707,928
jere:D 고수 retweeted
1/ Introducing GPIC: a Giant Permissive Image Corpus and benchmark for visual generation! 🚀100M VLM-captioned image-text pairs for training 📊1M image-text pairs for benchmarking 🖼️~28 trillion pixels 🤗Centrally Hosted ✅Fully permissive for research commercial use Dataset, benchmark and models🧵👇 Co-led with @KyleSargentAI
15
84
372
145,967
jere:D 고수 retweeted
if you are interested in taking a sneak peek at what might be going on in claude code dynamic workflow feature check out this 2h classic
New in Claude Code (research preview): dynamic workflows. Claude writes an orchestration script on the fly, then spins up a large fleet of coordinated subagents in parallel to take on your most complex tasks. Use the word "workflow" in a prompt to get started.
12
39
462
47,177
jere:D 고수 retweeted
NVIDIA's LocateAnything is a new vision model for grounding and detection. Very performant and accurate! > 10x faster than Qwen3-VL > 138M queries 785M boxes > GUI, OCR, docs, dense detection > Free & open source research.nvidia.com/labs/lpr…
33
253
2,270
120,342
jere:D 고수 retweeted
For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (arxiv.org/abs/2506.14202), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.
Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation pub.sakana.ai/diffusionblock… What if we didn’t have to hold an entire neural network in memory to train it? Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network. In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance. With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block. How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently. We validated this across five different architectures: • ViT • DiT • Masked diffusion • Autoregressive transformers • Recurrent-depth transformers In each case, performance is competitive with end-to-end training while using a fraction of the memory. This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training. Read our paper and code, to learn more. Paper: arxiv.org/abs/2506.14202 GitHub: github.com/SakanaAI/Diffusio… 🐟
154
638
5,758
743,421
jere:D 고수 retweeted
Flash-KMeans was only the beginning. Today, from the Flash-KMeans team, we are releasing FlashLib — a GPU library for fast, predictable, agent-ready classical ML operators. Up to 26× on KMeans, 19× on KNN, 40× on HDBSCAN, 208× on TruncatedSVD, 47× on PCA, 147× on exact t-SNE, and 49× on MultinomialNB over state-of-the-art (cuML). Blog: flashml-org.github.io/ Code: github.com/FlashML-org/flash…
47
237
1,606
866,470
jere:D 고수 retweeted
new in-depth blog post time: Inside the Transformer: The Life of a Token a deep dive into a modern dense transformer, i cover YaRN (why does pairwise coordinate rotation induce positional information?), hybrid attention (getting to 160k context length), soft capping, QK normalization, etc. as the token flows through the transformer bonus transformer math: FLOPs/token formula (and when is 6N formula broken), cluster sizing (how big of a cluster do you need given the model/data size and experiment throughput of interest), and more
22
143
1,015
49,168