jere:D 고수

Pinned Tweet

jere:D 고수 @CoolMFcat

30 Jun 2023

Golden rule of the Second Foundation: do nothing unless you must, and when you must act, hesitate.

jere:D 고수 retweeted

Midjourney

@midjourney

13h

A technical dive inside our new "Midjourney Scanner"

jere:D 고수 retweeted

Andrej Karpathy

@karpathy

17 May 2017

Dinner conversation today: @AlecRad revealing his tips and tricks for distinguishing CNN/RNN/RL training by the sound the GPU makes.

jere:D 고수 retweeted

vik

@vikhyatk

Jun 8

what post do you want to see next?
- how our LoRA kernels work
- how we made PyTorch faster than MLX on MPS
- how we built faster image resizing than PIL/pyvips
- me ranting about Blackwell

vik

@vikhyatk

Jun 4

Wrote a post about how Photon (Moondream's inference engine) hides GPU bubbles using pipelined decoding. Speeding up inference by up to 35%.

jere:D 고수 retweeted

Ilir Aliu

@IlirAliu_

Jun 11

One professor at the University of Bonn quietly put his entire robotics curriculum on YouTube: SLAM. Sensor fusion. State estimation. Probabilistic robotics. Self-driving cars. Motion planning. Photogrammetry.

Cyrill Stachniss has been uploading full university lectures for years! Each topic is a complete playlist: the kind of material that normally costs a semester of tuition.

He's one of the most cited researchers in mobile robotics and mapping. His students go on to build the navigation stacks powering real autonomous systems.

If you're serious about understanding how robots know where they are... this is the place to start. Free. On YouTube.

📌 [youtube.com/@CyrillStachniss]

Weekly robotics and AI insights. Subscribe free: 22astronauts.com

jere:D 고수 retweeted

Alex Nichol @unixpickle

Jun 11

Blog post about my recent optimal tokenizer exploration blog.aqnichol.com/2026/06/10…

jere:D 고수 retweeted

Niels Rogge @NielsRogge

Jun 7

All 15 @CVPR 2026 Paper Finalists can now be easily explored here: paperswithcode.co/conference… Find GitHub links, @huggingface artifacts, and evals.

#CVPR2026 @CVPR

Jun 5

Replying to @CVPR

Finalists 👏

jere:D 고수 retweeted

Marco Franzon

@mfranz_on

Jun 9

OpenCV 5 is here! This is the biggest update in years for computer vision:
> Brand-new DNN engine with 80% ONNX coverage
> Built-in LLM & VLM support
> Faster performance (often beating ONNX Runtime)
> Better 3D vision, Python integration, and hardware acceleration
OpenCV is not just a computer vision library but the cornerstone of millions of projects.

jere:D 고수 retweeted

Ilir Aliu

@IlirAliu_

Jun 10

ETH Zurich just open-sourced their entire 2026 robot learning course. Not a MOOC. The actual course: slides, lecture recordings, coding assignments, GitHub repo.

The curriculum goes from imitation learning and RL all the way to Vision-Language-Action models and foundation models for robotics.

Guest lectures from the co-founder of Physical Intelligence. The creator of Diffusion Policy. Pieter Abbeel. Dieter Fox.

12 weeks. Free. No signup.

If you want to understand where robot intelligence is actually heading… this is the reading list the field is using right now.

📍[cvg.ethz.ch/lectures/Robot-L…]

jere:D 고수 retweeted

F. Güney @ftm_guney

Jun 7

I think this is my favorite paper this CVPR: Magician. Before they explore in active view selection, they imagine what the Gaussians and occupancy map would look like and then compute a coverage metric based on that. During planning, they try 10 views like that in 10 steps in a tree search with pruning, and get planning for free. They even have real-world experiments with a drone and a toy car. How are they not an award candidate? It blows my mind.

jere:D 고수 retweeted

Rishabh Kabra @RishabhKabra

Jun 6

If you use pretrained vision encoders like DINO, this is for you: we found a simple post-training recipe to improve DINO features!
CVPR Highlight Paper: cvpr.thecvf.com/virtual/2026…
Code: github.com/google-deepmind/r…
Poster #63 on Sunday, June 7 at 3-5:30pm. Details in thread.

jere:D 고수 retweeted

danb

@dnbt777

Jun 5

Replying to @Laz4rz

2ez youtu.be/ONZcjs1Pjmk?t=90

Eulerian Video Magnification

The video accompanying our SIGGRAPH 2012 paper "Eulerian Video Magn...

jere:D 고수 retweeted

Kent Fujiwara @kentfuji

Jun 4

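What an interesting study, I didn't know this existed: if you take two non-overlapping subsets of a dataset and train a separate diffusion model on each, then as you increase the amount of data, the same noise ends up producing similar images. openreview.net/forum?id=ANvm…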

jere:D 고수 retweeted

vik

@vikhyatk

Jun 4

moondream.ai/blog/popping-th…

Popping the GPU Bubble | Moondream

Photon, Moondream's inference engine, achieves near-realtime VLM inference (~33ms on NVIDIA B200). This is a peek into how it delivers up to 35% higher decode throughput by optimizing how the GPU...

jere:D 고수 retweeted

Michael Rabinovich

@MikushRab

May 29

Opus 4.8 just dropped and I ran it through our CAD tasks. 4.6 → 4.7 → 4.8 side by side. The results are unexpected!

jere:D 고수 retweeted

Keshigeyan Chandrasegaran

@keshigeyan

May 29

1/ Introducing GPIC: a Giant Permissive Image Corpus and benchmark for visual generation!
🚀 100M VLM-captioned image-text pairs for training
📊 1M image-text pairs for benchmarking
🖼️ ~28 trillion pixels
🤗 Centrally hosted
✅ Fully permissive for research and commercial use
Dataset, benchmark and models 🧵👇
Co-led with @KyleSargentAI

jere:D 고수 retweeted

Yacine Mahdid

@yacinelearning

May 28

If you're interested in a sneak peek at what might be going on in Claude Code's dynamic workflows feature, check out this 2h classic

ClaudeDevs

@ClaudeDevs

May 28

New in Claude Code (research preview): dynamic workflows. Claude writes an orchestration script on the fly, then spins up a large fleet of coordinated subagents in parallel to take on your most complex tasks. Use the word "workflow" in a prompt to get started.

jere:D 고수 retweeted

⚡AI Search⚡

@aisearchio

May 27

NVIDIA's LocateAnything is a new vision model for grounding and detection. Very performant and accurate!
> 10x faster than Qwen3-VL
> 138M queries, 785M boxes
> GUI, OCR, docs, dense detection
> Free & open source
research.nvidia.com/labs/lpr…

jere:D 고수 retweeted

hardmaru

@hardmaru

May 27

For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (arxiv.org/abs/2506.14202), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.

DiffusionBlocks: Block-wise Neural Network Training via Diffusion...

End-to-end backpropagation requires storing activations throughout all layers, creating memory bottlenecks that limit model scalability. Existing block-wise training methods offer means to...

Sakana AI

@SakanaAILabs

May 27

Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
pub.sakana.ai/diffusionblock…

What if we didn't have to hold an entire neural network in memory to train it?

Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network. In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.

With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block. How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently.

We validated this across five different architectures:
• ViT
• DiT
• Masked diffusion
• Autoregressive transformers
• Recurrent-depth transformers

In each case, performance is competitive with end-to-end training while using a fraction of the memory.

This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.

Read our paper and code to learn more.
Paper: arxiv.org/abs/2506.14202
GitHub: github.com/SakanaAI/Diffusio… 🐟

jere:D 고수 retweeted

Shuo Yang

@Andy_ShuoYang

May 27

Flash-KMeans was only the beginning. Today, from the Flash-KMeans team, we are releasing FlashLib: a GPU library for fast, predictable, agent-ready classical ML operators. Up to 26× on KMeans, 19× on KNN, 40× on HDBSCAN, 208× on TruncatedSVD, 47× on PCA, 147× on exact t-SNE, and 49× on MultinomialNB over the state of the art (cuML).
Blog: flashml-org.github.io/
Code: github.com/FlashML-org/flash…

jere:D 고수 retweeted

Aleksa Gordić (水平问题)

@gordic_aleksa

May 26

new in-depth blog post time: Inside the Transformer: The Life of a Token

a deep dive into a modern dense transformer. as the token flows through the transformer, i cover YaRN (why does pairwise coordinate rotation induce positional information?), hybrid attention (getting to 160k context length), soft capping, QK normalization, etc.

bonus transformer math: FLOPs/token formula (and when the 6N formula breaks down), cluster sizing (how big a cluster do you need given the model/data size and experiment throughput of interest?), and more
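The 6N rule mentioned in the post is the usual back-of-envelope estimate for dense-transformer training compute: about 2N FLOPs per token for the forward pass (N = parameter count) and roughly twice that for the backward pass. The sketch below is not taken from the blog post. It follows the approximation in Kaplan et al. (2020), whose extra attention term (2 · n_layer · n_ctx · d_attn FLOPs per token in the forward pass) is one place where plain 6N undercounts, especially at long context. The function names and the cluster-sizing inputs (peak FLOP/s per GPU, MFU, the 7B/1T example) are hypothetical illustrations, not figures from the post.

```python
import math


def flops_per_token(n_params: float, n_layer: int = 0, n_ctx: int = 0, d_attn: int = 0) -> float:
    """Approximate training FLOPs per token.

    Kaplan et al. (2020) approximation: forward ~= 2N + 2 * n_layer * n_ctx * d_attn,
    and forward + backward ~= 3x forward. With the attention arguments left at 0
    this reduces to the plain 6N rule of thumb.
    """
    forward = 2.0 * n_params + 2.0 * n_layer * n_ctx * d_attn
    return 3.0 * forward


def training_flops(n_params: float, n_tokens: float, **attn: int) -> float:
    """Total training compute C ~= (FLOPs per token) * D, with D = training tokens."""
    return flops_per_token(n_params, **attn) * n_tokens


def gpus_needed(total_flops: float, days: float, peak_flops_per_gpu: float, mfu: float) -> int:
    """Hypothetical cluster-sizing helper: GPUs required to finish within `days`.

    `mfu` (model FLOPs utilization) is the fraction of peak throughput actually
    achieved; a realistic value is an assumption the caller must supply.
    """
    seconds = days * 24 * 3600
    achieved_per_gpu = peak_flops_per_gpu * mfu  # FLOP/s actually delivered per GPU
    return math.ceil(total_flops / (achieved_per_gpu * seconds))


if __name__ == "__main__":
    # Hypothetical run: 7B parameters trained on 1T tokens.
    c = training_flops(7e9, 1e12)
    print(f"{c:.2e}")  # 4.20e+22
    # Assumed 1e15 peak FLOP/s per GPU, 40% MFU, 30-day budget.
    print(gpus_needed(c, days=30, peak_flops_per_gpu=1e15, mfu=0.4))  # 41
```

Rounding up with `math.ceil` reflects that you cannot provision a fractional GPU; in practice the MFU you assume dominates the answer, so it is worth measuring rather than guessing.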
