Nuno Rodrigues retweeted

Johan Edstedt @Parskatt

Apr 7

Introducing LoMa, the next generation of feature matcher!

294

36,759

SkalskiP

Nuno Rodrigues retweeted

SkalskiP

@skalskip92

15 Nov 2025

pretty crazy what you can build with RF-DETR, supervision and 10 lines of code

0:03

SkalskiP

@skalskip92

13 Nov 2025

RF-DETR paper is finally on arXiv
- real time detection with DINOv2 backbone
- runs neural architecture search (NAS) over about 6000 architecture variants
- uses weight sharing across all configs
- first real-time segmentation DETR to break past top YOLO results
↓ more

0:10

128

1,261

295,985

Nuno Rodrigues

Nuno Rodrigues @nmvrodrigues

4 Nov 2025

Got inspired by @skalskip92 and decided to do a side project on sports analytics to get back into computer vision and learn some new things. Initial version of the Padel-AI, looking for more insights to extract and start on action recognition for the different swings

1:20

2,943

Xiaojian Ma

Nuno Rodrigues retweeted

Xiaojian Ma

@jeasinema

23 Oct 2025

Maybe embodied RAG could be better off? Since our embodied-videoagent.github.i…, glad to see more efforts pouring in for building robot memories.

stash

@stash_pomichter

21 Oct 2025

Introducing Spatial Memory for your robots. Spatiotemporal RAG. Open source. Coming soon.

0:32

345

28,698

stash

Nuno Rodrigues retweeted

stash

@stash_pomichter

21 Oct 2025

Introducing Spatial Memory for your robots. Spatiotemporal RAG. Open source. Coming soon.

0:32

213

1,875

142,081

Andrej Karpathy

Nuno Rodrigues retweeted

Andrej Karpathy

@karpathy

20 Oct 2025

I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter. The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.
Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:
- more information compression (see paper) => shorter context windows, more efficiency
- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
- input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful.
- delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are ugly, separate, not end-to-end stage. It "imports" all the ugliness of Unicode, byte encodings, it inherits a lot of historical baggage, security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look as two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, and all the transfer learning that brings along. The tokenizer must go.
OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision -> text tasks. Not vice versa.
So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.
Now I have to also fight the urge to side quest an image-input-only version of nanochat...
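The point about look-alike characters can be checked directly. The sketch below is only an illustration using Python's standard library, not anything from the tweet or the paper: it compares Latin "A" with Cyrillic "А", which render almost identically but have different code points and different UTF-8 bytes, so a byte-level or subword tokenizer necessarily sees them as different inputs. The helper name `describe` is made up for this example.

```python
import unicodedata

latin_a = "A"          # U+0041 LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"  # U+0410 CYRILLIC CAPITAL LETTER A, visually near-identical


def describe(ch: str) -> dict:
    """Return the Unicode name, code point, and UTF-8 bytes of one character."""
    return {
        "name": unicodedata.name(ch),
        "codepoint": f"U+{ord(ch):04X}",
        "utf8": list(ch.encode("utf-8")),
    }


for ch in (latin_a, cyrillic_a):
    print(describe(ch))
```

The Latin letter encodes to a single byte, while the Cyrillic one encodes to a lead byte plus a continuation byte, which is the kind of byte-level detail the tweet says the tokenizer "imports".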

vLLM

@vllm_project

20 Oct 2025

🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×. 📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens. 🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release — making multimodal inference even faster and easier to scale. 🔗 github.com/deepseek-ai/DeepS… #vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning

558

1,558

13,283

3,329,629

Zhenjun Zhao

Nuno Rodrigues retweeted

Zhenjun Zhao @zhenjun_zhao

20 Oct 2025

CuSfM: CUDA-Accelerated Structure-from-Motion Jingrui Yu, Jun Liu, Kefei Ren, @Joydeepb_robots, Rurui Ye, Keqiang Wu, Chirag Majithia, Di Zeng tl;dr: in title; ALIKED LightGlue arxiv.org/abs/2510.15271

110

24,766

Chester

Nuno Rodrigues retweeted

Chester

@chesterzelaya

6 Oct 2025

< Choosing a Vision Backbone >
your model’s backbone is its perspective
pick ResNet, and it sees in edges
pick a ViT, and it sees in patches
the backbone decides how your model thinks
here are some of the most practical backbones and when you should choose them, from the paper "Battle of the Backbones" (2023):
> ResNet - good for fast prototyping, small models, and edge devices
> ConvNeXt - great all-purpose backbone; strong for detection & segmentation
> Swin Transformer (V2) - best for large-scale detection, segmentation, and high-res inputs
> ViT (Vision Transformer) - good when you have huge datasets; less bias, more global context
> CLIP - best for vision-language, zero-shot, and retrieval tasks
> DINO / MoCo / MAE (SSL) - great when you have little or no labeled data
> MiDaS - surprisingly strong if you care about depth, geometry, or robotics perception
> Stable Diffusion Encoder - useful for creative or aesthetic tasks; not for accuracy-critical CV
> EfficientNet / RegNet / ResNet-18 - good lightweight options for edge or mobile deployment

110

983

58,644

Chester

Nuno Rodrigues retweeted

Chester

@chesterzelaya

1 Oct 2025

v0.2.0 RELEASE
so much work went into this UI/UX overhaul
build autonomous drone agents, wirelessly powered by external GPUs to run the heaviest of AI models
up to 10km range
win / linux version coming later next week!

241

17,626

Gabriele Berton

Nuno Rodrigues retweeted

Gabriele Berton

@gabriberton

24 Sep 2025

[paper release!]
Did you know that you can
- speed up any LLM by 4x
- and reduce its memory footprint by 2x
- and improve its results
- without modifying the model at all
How??? Here is how we do it 🧵

Gabriele Berton

@gabriberton

23 Sep 2025

Did you know that you can
- speed up any LLM by 4x
- and reduce its memory footprint by 2x
- and improve its results
- without modifying the model at all
How??? Paper and code coming out in a couple of days

724

129,094

Sakana AI

Nuno Rodrigues retweeted

Sakana AI

@SakanaAILabs

25 Sep 2025

We’re excited to introduce ShinkaEvolve: An open-source framework that evolves programs for scientific discovery with unprecedented sample-efficiency. Blog: sakana.ai/shinka-evolve/ Code: github.com/SakanaAI/ShinkaEv… Like AlphaEvolve and its variants, our framework leverages LLMs to find state-of-the-art solutions to complex problems, but using orders of magnitude fewer resources! Many evolutionary AI systems are powerful but act like brute-force engines, burning thousands of samples to find good solutions. This makes discovery slow and expensive. We took inspiration from the efficiency of nature. ‘Shinka’ (進化) is Japanese for evolution, and we designed our system to be just as resourceful. On the classic circle packing optimization problem, ShinkaEvolve discovered a new state-of-the-art solution using only 150 samples. This is a big leap in efficiency compared to previous methods that required thousands of evaluations. We applied ShinkaEvolve to a diverse set of hard problems with real-world applications: 1/ AIME Math Reasoning: It evolved sophisticated agentic scaffolds that significantly outperform strong baselines, discovering an entire Pareto frontier of solutions trading performance for efficiency. 2/ Competitive Programming: On ALE-Bench (a benchmark for NP-Hard optimization problems), ShinkaEvolve took the best existing agent's solutions and improved them, turning a 5th place solution on one task into a 2nd place leaderboard rank in a competitive programming competition. 3/ LLM Training: We even turned ShinkaEvolve inward to improve LLMs themselves. It tackled the open challenge of designing load balancing losses for Mixture-of-Experts (MoE) models. It discovered a novel loss function that leads to better expert specialization and consistently improves model performance and perplexity. 
ShinkaEvolve achieves its remarkable sample-efficiency through three key innovations that work together: (1) an adaptive parent sampling strategy to balance exploration and exploitation, (2) novelty-based rejection filtering to avoid redundant work, and (3) a bandit-based LLM ensemble that dynamically picks the best model for the job. By making ShinkaEvolve open-source and highly sample-efficient, our goal is to democratize access to advanced, open-ended discovery tools. Our vision for ShinkaEvolve is to be an easy-to-use companion tool to help scientists and engineers with their daily work. We believe that building more efficient, nature-inspired systems is key to unlocking the future of AI-driven scientific research. We are excited to see what the community builds with it! Learn more in our technical report: arxiv.org/abs/2509.19349
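To make the third idea (a bandit that picks which LLM to query) more concrete, here is a minimal sketch using the textbook UCB1 rule. The announcement only says "bandit-based LLM ensemble"; the choice of UCB1, the model names, and the success probabilities below are all assumptions for illustration, not ShinkaEvolve's actual algorithm or API.

```python
import math
import random


class UCB1:
    """Textbook UCB1 bandit over a fixed set of arms (here: hypothetical LLMs)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}
        self.totals = {a: 0.0 for a in self.arms}

    def select(self):
        # Pull every arm once before applying the confidence bound.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        n = sum(self.counts.values())

        def score(arm):
            mean = self.totals[arm] / self.counts[arm]
            bonus = math.sqrt(2 * math.log(n) / self.counts[arm])
            return mean + bonus  # exploitation + exploration

        return max(self.arms, key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


def simulate(rounds=500, seed=0):
    rng = random.Random(seed)
    # Hypothetical chance that each model's proposal improves the program.
    success = {"model_a": 0.2, "model_b": 0.5, "model_c": 0.35}
    bandit = UCB1(success)
    for _ in range(rounds):
        arm = bandit.select()
        reward = 1.0 if rng.random() < success[arm] else 0.0
        bandit.update(arm, reward)
    return bandit.counts


print(simulate())
```

In a real system the reward would come from evaluating the mutated program rather than from a coin flip; the sketch only shows how pulls drift toward whichever model has been paying off.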

0:12

249

1,390

359,210

tomaarsen

Nuno Rodrigues retweeted

tomaarsen @tomaarsen

9 Sep 2025

ModernBERT goes MULTILINGUAL! One of the most requested models I've seen, @jhuclsp has trained state-of-the-art massively multilingual encoders using the ModernBERT architecture: mmBERT. Stronger than any existing models at their sizes, while also much faster! Details in 🧵

267

27,335

Vector Wang

Nuno Rodrigues retweeted

Vector Wang

@VectorWang2

2 Sep 2025

XLeRobot 0.3.0 Showcases: open the fridge, get drinks, fill ice, wipe the table, clean the room, take care of plants and cats... All for 660$, fully open-sourced, based on HF LeRobot. Teleop with Joy-con, or RL/VLA. Assembly kit ready for purchase soon. Stay tuned! github.com/Vector-Wangel/XLe…

2:00

316

18,416

Rohan Paul

Nuno Rodrigues retweeted

Rohan Paul

@rohanpaul_ai

31 Aug 2025

BRILLIANT @GoogleDeepMind research. Even the best embeddings cannot represent all possible query-document combinations, which means some answers are mathematically impossible to recover. Reveals a sharp truth, embedding models can only capture so many pairings, and beyond that, recall collapses no matter the data or tuning. 🧠 Key takeaway Embeddings have a hard ceiling, set by dimension, on how many top‑k document combinations they can represent exactly. They prove this with sign‑rank bounds, then show it empirically and with a simple natural‑language dataset where even strong models stay under 20% recall@100. When queries force many combinations, single‑vector retrievers hit that ceiling, so other architectures are needed. 4096‑dim embeddings already break near 250M docs for top‑2 combinations, even in the best case. 🛠️ Practical Implications For applications like search, recommendation, or retrieval-augmented generation, this means scaling up models or datasets alone will not fix recall gaps. At large index sizes, even very high-dimensional embeddings fail to capture all combinations of relevant results. So embeddings cannot work as the sole retrieval backbone. We will need hybrid setups, combining dense vectors with sparse methods, multi-vector models, or rerankers to patch the blind spots. This shifts how we should design retrieval pipelines, treating embeddings as one useful tool but not a universal solution. 🧵 Read on 👇
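A toy case shows why dimension caps the set of reachable top-k results. This is not the paper's sign-rank argument, only an illustration under an extreme assumption: with 1-dimensional embeddings and dot-product scoring, any nonzero query can only rank documents in ascending or descending order, so at most two of the possible top-2 sets can ever be returned. The function name and the document values are made up for this sketch.

```python
from math import comb


def reachable_top2(doc_embeddings, queries):
    """Collect every distinct top-2 document set that some query retrieves."""
    seen = set()
    for q in queries:
        ranked = sorted(
            range(len(doc_embeddings)),
            key=lambda i: q * doc_embeddings[i],  # 1-D dot product
            reverse=True,
        )
        seen.add(frozenset(ranked[:2]))
    return seen


docs = [-2.0, -1.0, 0.5, 1.0, 3.0]  # five documents, one dimension each
queries = [x / 10 for x in range(-50, 51) if x != 0]  # dense sweep, skipping q = 0

reached = reachable_top2(docs, queries)
print(len(reached), "of", comb(len(docs), 2), "top-2 sets are reachable")
# prints "2 of 10 top-2 sets are reachable"
```

Higher dimensions raise the ceiling, but the thread's claim is that the ceiling is still there: once the number of required combinations outgrows what the dimension can express, recall drops regardless of training.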

370

2,364

241,362

Nuno Rodrigues

Nuno Rodrigues @nmvrodrigues

29 Aug 2025

TIL it's easier to set up a LoRA adapter and fine-tune Gemma 3 than to run inference on a ConvNeXt for a binary dataset

Nuno Rodrigues

Nuno Rodrigues @nmvrodrigues

27 Aug 2025

In this day and age, is it still worth it to train models at home, provided you have the gpu, considering only electricity costs VS using any cloud provider? For things that can fit under 20GB of VRAM in a single gpu
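One way to frame the question is a per-hour comparison. The sketch below counts electricity only, as the tweet specifies, and every number in it (350 W draw, 0.25 per kWh, 0.50 per cloud GPU-hour) is a placeholder assumption, not a quoted price; plug in your own figures.

```python
def home_cost_per_hour(gpu_watts: float, price_per_kwh: float) -> float:
    """Electricity cost of running the GPU for one hour (hardware cost ignored)."""
    return gpu_watts / 1000.0 * price_per_kwh


def cheaper_at_home(gpu_watts: float, price_per_kwh: float, cloud_per_hour: float) -> bool:
    """True when the electricity bill per hour is below the cloud hourly rate."""
    return home_cost_per_hour(gpu_watts, price_per_kwh) < cloud_per_hour


# Placeholder numbers, purely illustrative.
watts, kwh_price, cloud_rate = 350.0, 0.25, 0.50
print(f"home: {home_cost_per_hour(watts, kwh_price):.4f}/h, cloud: {cloud_rate:.2f}/h")
print("cheaper at home:", cheaper_at_home(watts, kwh_price, cloud_rate))
```

The comparison ignores whole-system power draw, cooling, and differences in training speed between the home GPU and the cloud instance, all of which can shift the answer.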

hardmaru

Nuno Rodrigues retweeted

hardmaru

@hardmaru

25 Aug 2025

Our new GECCO paper builds on our past work, showing how AI models can be evolved like organisms. By letting models evolve their own merging boundaries, compete to specialize, and find ‘attractive’ partners to merge with, we can create adaptive, robust and scalable AI ecosystems.

Sakana AI

@SakanaAILabs

25 Aug 2025

What if we could evolve AI models like organisms in nature, letting them compete, mate, and combine their strengths to produce ever-fitter offspring? Excited to share our new work: “Competition and Attraction Improve Model Fusion” presented at GECCO’25🦎 where it was a runner-up for best paper! Paper: arxiv.org/abs/2508.16204 Code: github.com/SakanaAI/natural_… Summary of Paper At Sakana AI, we draw inspiration from nature’s evolutionary processes to build the foundation of future AI systems. Nature doesn’t create one single, monolithic organism; it fosters a diverse ecosystem of specialized individuals that compete, cooperate, and combine their traits to adapt and thrive. We believe AI development can follow a similar path. What if instead of building one giant monolithic AI, we could evolve a whole ecosystem of specialized models that collaborate and combine their skills? Like a school of fish 🐟, where collective intelligence emerges from the group. This new paper builds on our previous research on model merging, which follows such an evolutionary path. We started by using evolution to find the best “recipes” to merge existing models (our Nature Machine Intelligence paper: nature.com/articles/s42256-0…). Then, we explored how to maintain diversity to acquire new skills in LLMs (our ICLR 2025 paper: openreview.net/forum?id=Kvdh…). Now, we're combining these ideas into a full evolutionary system. A key limitation remained in earlier work: model merging required manually defining how models should be partitioned (e.g., by fixed layer or blocks) before they could be combined. What if we could let evolution figure that out too? Our new paper proposes M2N2 (Model Merging of Natural Niches), a more fluid method, which overcomes this with three key, nature-inspired ideas: 1/ Evolving Merging Boundaries 🌿: Instead of merging models using pre-defined, static boundaries (e.g. fixed layers), M2N2 dynamically evolves the “split-points” for merging. 
This allows for a far more flexible and powerful exploration of parameter combinations, like swapping variable-length segments of DNA rather than entire chromosomes. 2/ Diversity through Competition 🐠: To ensure we have a rich pool of models to merge, M2N2 makes them compete for limited resources (i.e., data points in a training set). This forces models to specialize and find their own “niche,” creating a population of diverse, high-performing specialists that are perfect for merging. 3/ Attraction and Mate Selection 💏: Merging models can be computationally expensive. M2N2 introduces an “attraction” heuristic that intelligently pairs models for fusion based on their complementary strengths—choosing partners that perform well where the other is weak. This makes the evolutionary search much more efficient. Does it work? The results are fascinating: This is the first time model merging has been used to evolve models entirely from scratch, outperforming other evolutionary algorithms. In one experiment, starting with random networks, M2N2 evolved an MNIST classifier that achieves performance comparable to CMA-ES, but is far more computationally efficient. Does it scale? We also showed that M2N2 can scale to large, pre-trained models: We used M2N2 to merge a math specialist LLM with an agentic specialist LLM. M2N2 produced a merged model that excelled at both math and web shopping tasks, significantly outperforming other methods. The flexible split-point was crucial here. Does it work on multimodal models? When we applied M2N2 to text-to-image models, we merged several models by adapting them only for Japanese prompts. The resulting model not only improved on Japanese but also retained its strong English capabilities—a key advantage over fine-tuning, which can suffer from catastrophic forgetting. This nature-inspired approach is central to Sakana AI’s mission to find new foundations for AI based on collective intelligence. 
Rather than scaling monolithic models, we envision a future where ecosystems of diverse, specialized models co-evolve, collaborate, and combine, leading to more adaptive, robust, and creative AI. 🐙 We hope this work sparks more interest in these under-explored ideas! Published in ACM GECCO’25: Proceedings of the Genetic and Evolutionary Computation Conference. DOI: doi.org/10.1145/3712256.3726…

Competition and Attraction Improve Model Fusion

Model merging is a powerful technique for integrating the specialized knowledge of multiple machine learning models into a single model. However, existing methods require manually partitioning model parameters into fixed groups for merging, which restricts the exploration of potential combinations and limits performance. To overcome these limitations, we propose M2N2, an evolutionary algorithm with three key features: 1/ dynamic adjustment of merging boundaries to progressively explore a broader range of parameter combinations; 2/ a diversity preservation mechanism inspired by the competition for resources in nature, to maintain a population of diverse, high-performing models that are particularly well-suited for merging; and 3/ a heuristic-based attraction metric to identify the most promising pairs of models for fusion. For the first time, we also demonstrate that model merging can also be used to evolve models entirely from scratch.
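Two of the ideas above, a movable split-point and an attraction score, can be sketched on flat parameter lists. This is a deliberately simplified reading of the summary, not the paper's operators: the crossover here just takes a prefix from one parent and the suffix from the other, and the attraction score is assumed to be the total per-example margin by which a candidate partner beats the current model. All function names are hypothetical.

```python
def merge_at_split(params_a, params_b, split):
    """Child takes params_a before the split-point and params_b from it onward."""
    if len(params_a) != len(params_b):
        raise ValueError("parents must have the same number of parameters")
    if not 0 <= split <= len(params_a):
        raise ValueError("split-point out of range")
    return params_a[:split] + params_b[split:]


def attraction(scores_a, scores_b):
    """Assumed heuristic: how much b outperforms a on the examples where a is weaker."""
    return sum(max(0.0, b - a) for a, b in zip(scores_a, scores_b))


def pick_partner(scores, index):
    """Choose the population member most complementary to model `index`."""
    candidates = [j for j in range(len(scores)) if j != index]
    return max(candidates, key=lambda j: attraction(scores[index], scores[j]))


# Per-example scores for three hypothetical models on four data points.
scores = [
    [1.0, 1.0, 0.0, 0.0],  # model 0: strong on the first two examples
    [1.0, 0.9, 0.1, 0.0],  # model 1: nearly the same niche as model 0
    [0.0, 0.0, 1.0, 1.0],  # model 2: strong exactly where model 0 is weak
]
partner = pick_partner(scores, 0)
child = merge_at_split([1, 1, 1, 1], [2, 2, 2, 2], split=1)
print(partner, child)
```

In an evolutionary loop the split-point itself would be mutated and selected along with the weights, which is what lets the boundary move away from fixed layer edges.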

391

65,216

Albumentations

Nuno Rodrigues retweeted

Albumentations @albumentations

24 Aug 2025

albumentations.ai/blog/2025/…

Input Normalization: What We Know, What We Don't, and Why It Works Anyway

A deep dive into input normalization: the solid mathematics for simple cases, the empirical evidence for complex networks, and the fascinating gap between what we can prove and what actually works.

albumentations.ai

347

Ahmad

Nuno Rodrigues retweeted

Ahmad

@TheAhmadOsman

21 Aug 2025

a new Agentic model that can run on a single consumer GPU at home: ByteDance Seed OSS 36B
> very strong at coding
> excellent at multi-turn tool calling & Agentic tasks
> 500k context window
as i have been saying, Bytedance is Tier S

104

984

98,495

𝗡𝗼𝗯𝘂.𝗛𝗔𝗡𝗔𝗠𝗜𝗧𝗦𝗨

Nuno Rodrigues retweeted

𝗡𝗼𝗯𝘂.𝗛𝗔𝗡𝗔𝗠𝗜𝗧𝗦𝗨

@873ch

21 Aug 2025

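True, if you have a webcam you can implement heartbeat detection (rPPG), so I vibe-coded it. I had GPT-5 Pro survey the papers and build the app, then touched it up with Claude because there were a few bugs. It's published on GitHub Pages, so you can run it from your browser. hanamitsu.github.io/873ch.rP…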
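For readers unfamiliar with rPPG (remote photoplethysmography), the core signal-processing step can be sketched in a few lines. This is not the linked app's code: it assumes you already have the mean green-channel value of a skin region for each frame, and it simply picks the strongest frequency in a plausible heart-rate band using a plain DFT. The function name, band limits, and synthetic signal are assumptions for illustration.

```python
import math


def estimate_bpm(green_means, fps, lo_hz=0.7, hi_hz=4.0):
    """Return the dominant frequency (in beats per minute) within [lo_hz, hi_hz]."""
    n = len(green_means)
    mean = sum(green_means) / n
    centered = [g - mean for g in green_means]  # remove the DC component
    best_freq, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n
        if not lo_hz <= freq <= hi_hz:
            continue
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0


# Synthetic 10-second clip at 30 fps with a 1.2 Hz (72 bpm) pulse component.
fps = 30
signal = [100 + 0.5 * math.sin(2 * math.pi * 1.2 * t / fps) for t in range(300)]
print(round(estimate_bpm(signal, fps)))  # prints 72
```

A real pipeline would also need face or skin detection, detrending, band-pass filtering, and handling of motion and lighting changes before the frequency estimate is trustworthy.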

0:10

ゆーじ

@yuujii

20 Aug 2025

You can now pick up the other person's racing heartbeat with a camera lol x.com/912_/status/1954665691…

0:06

726

144,491