Senior data scientist at OLX | ex Zendesk; ex PhD @ Champalimaud

Joined December 2014
7 Photos and videos
Nuno Rodrigues retweeted
Introducing LoMa, the next generation of feature matcher!
8
36
294
36,759
Nuno Rodrigues retweeted
15 Nov 2025
pretty crazy what you can build with RF-DETR, supervision and 10 lines of code
13 Nov 2025
RF-DETR paper is finally on arXiv
- real-time detection with DINOv2 backbone
- runs neural architecture search (NAS) over about 6000 architecture variants
- uses weight sharing across all configs
- first real-time segmentation DETR to break past top YOLO results
↓ more
31
128
1,261
295,985
Got inspired by @skalskip92 and decided to do a side project on sports analytics to get back into computer vision and learn some new things. This is the initial version of Padel-AI. Next, I'm looking for more insights to extract and want to start on action recognition for the different swings.
3
30
2,943
Nuno Rodrigues retweeted
23 Oct 2025
Maybe embodied RAG could be better off? Since our embodied-videoagent.github.i…, glad to see more efforts pouring in for building robot memories.
21 Oct 2025
Introducing Spatial Memory for your robots. Spatiotemporal RAG. Open source. Coming soon.
2
51
345
28,698
Nuno Rodrigues retweeted
21 Oct 2025
Introducing Spatial Memory for your robots. Spatiotemporal RAG. Open source. Coming soon.
66
213
1,875
142,081
Nuno Rodrigues retweeted
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.
The more interesting part for me (esp as a computer vision person at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.
Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:
- more information compression (see paper) => shorter context windows, more efficiency
- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
- input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful.
- delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are ugly, separate, not end-to-end stage. It "imports" all the ugliness of Unicode, byte encodings, it inherits a lot of historical baggage, security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look as two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, and all the transfer learning that brings along. The tokenizer must go.
OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision -> text tasks. Not vice versa.
So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.
Now I have to also fight the urge to side quest an image-input-only version of nanochat...
20 Oct 2025
🚀 DeepSeek-OCR, the new frontier of OCR from @deepseek_ai exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G), powered by vllm==0.8.5 for day-0 model support.
🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×.
📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens.
🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release, making multimodal inference even faster and easier to scale.
🔗 github.com/deepseek-ai/DeepS…
#vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning
558
1,558
13,283
3,329,629
Nuno Rodrigues retweeted
CuSfM: CUDA-Accelerated Structure-from-Motion Jingrui Yu, Jun Liu, Kefei Ren, @Joydeepb_robots, Rurui Ye, Keqiang Wu, Chirag Majithia, Di Zeng tl;dr: in title; ALIKED LightGlue arxiv.org/abs/2510.15271
2
21
110
24,766
Nuno Rodrigues retweeted
6 Oct 2025
< Choosing a Vision Backbone >
your model’s backbone is its perspective
pick ResNet, and it sees in edges
pick a ViT, and it sees in patches
the backbone decides how your model thinks
here are some of the most practical backbones and when you should choose them, from the paper "Battle of the Backbones" (2023):
> ResNet - good for fast prototyping, small models, and edge devices
> ConvNeXt - great all-purpose backbone; strong for detection & segmentation
> Swin Transformer (V2) - best for large-scale detection, segmentation, and high-res inputs
> ViT (Vision Transformer) - good when you have huge datasets; less bias, more global context
> CLIP - best for vision-language, zero-shot, and retrieval tasks
> DINO / MoCo / MAE (SSL) - great when you have little or no labeled data
> MiDaS - surprisingly strong if you care about depth, geometry, or robotics perception
> Stable Diffusion Encoder - useful for creative or aesthetic tasks; not for accuracy-critical CV
> EfficientNet / RegNet / ResNet-18 - good lightweight options for edge or mobile deployment
18
110
983
58,644
Nuno Rodrigues retweeted
1 Oct 2025
v0.2.0 RELEASE
so much work went behind this UI/UX overhaul
build autonomous drone agents, wirelessly powered by external GPUs to run the heaviest of AI models
up to 10km range
win / linux version coming later next week!
19
27
241
17,626
Nuno Rodrigues retweeted
[paper release!] Did you know that you can
- speed up any LLM by 4x
- and reduce its memory footprint by 2x
- and improve its results
- without modifying the model at all
How??? Here is how we do it 🧵
Did you know that you can
- speed up any LLM by 4x
- and reduce its memory footprint by 2x
- and improve its results
- without modifying the model at all
How??? Paper and code coming out in a couple of days
22
68
724
129,094
Nuno Rodrigues retweeted
25 Sep 2025
We’re excited to introduce ShinkaEvolve: An open-source framework that evolves programs for scientific discovery with unprecedented sample-efficiency.
Blog: sakana.ai/shinka-evolve/
Code: github.com/SakanaAI/ShinkaEv…
Like AlphaEvolve and its variants, our framework leverages LLMs to find state-of-the-art solutions to complex problems, but using orders of magnitude fewer resources!
Many evolutionary AI systems are powerful but act like brute-force engines, burning thousands of samples to find good solutions. This makes discovery slow and expensive. We took inspiration from the efficiency of nature. ‘Shinka’ (進化) is Japanese for evolution, and we designed our system to be just as resourceful.
On the classic circle packing optimization problem, ShinkaEvolve discovered a new state-of-the-art solution using only 150 samples. This is a big leap in efficiency compared to previous methods that required thousands of evaluations.
We applied ShinkaEvolve to a diverse set of hard problems with real-world applications:
1/ AIME Math Reasoning: It evolved sophisticated agentic scaffolds that significantly outperform strong baselines, discovering an entire Pareto frontier of solutions trading performance for efficiency.
2/ Competitive Programming: On ALE-Bench (a benchmark for NP-Hard optimization problems), ShinkaEvolve took the best existing agent's solutions and improved them, turning a 5th place solution on one task into a 2nd place leaderboard rank in a competitive programming competition.
3/ LLM Training: We even turned ShinkaEvolve inward to improve LLMs themselves. It tackled the open challenge of designing load balancing losses for Mixture-of-Experts (MoE) models. It discovered a novel loss function that leads to better expert specialization and consistently improves model performance and perplexity.
ShinkaEvolve achieves its remarkable sample-efficiency through three key innovations that work together: (1) an adaptive parent sampling strategy to balance exploration and exploitation, (2) novelty-based rejection filtering to avoid redundant work, and (3) a bandit-based LLM ensemble that dynamically picks the best model for the job.
By making ShinkaEvolve open-source and highly sample-efficient, our goal is to democratize access to advanced, open-ended discovery tools. Our vision for ShinkaEvolve is to be an easy-to-use companion tool to help scientists and engineers with their daily work. We believe that building more efficient, nature-inspired systems is key to unlocking the future of AI-driven scientific research. We are excited to see what the community builds with it!
Learn more in our technical report: arxiv.org/abs/2509.19349
30
249
1,390
359,210
Nuno Rodrigues retweeted
9 Sep 2025
ModernBERT goes MULTILINGUAL! One of the most requested models I've seen, @jhuclsp has trained state-of-the-art massively multilingual encoders using the ModernBERT architecture: mmBERT. Stronger than any existing models at their sizes, while also much faster! Details in 🧵
5
45
267
27,335
Nuno Rodrigues retweeted
XLeRobot 0.3.0 Showcases: open fridge, get drinks, fill ice, wipe table, clean room, take care of plants and cats... All for $660, fully open-sourced, based on HF LeRobot. Teleop with Joy-Con, or RL/VLA. Assembly kit ready for purchase soon. Stay tuned! github.com/Vector-Wangel/XLe…
9
49
316
18,416
Nuno Rodrigues retweeted
BRILLIANT @GoogleDeepMind research. Even the best embeddings cannot represent all possible query-document combinations, which means some answers are mathematically impossible to recover.
It reveals a sharp truth: embedding models can only capture so many pairings, and beyond that, recall collapses no matter the data or tuning.
🧠 Key takeaway
Embeddings have a hard ceiling, set by dimension, on how many top‑k document combinations they can represent exactly. They prove this with sign‑rank bounds, then show it empirically and with a simple natural‑language dataset where even strong models stay under 20% recall@100. When queries force many combinations, single‑vector retrievers hit that ceiling, so other architectures are needed. 4096‑dim embeddings already break near 250M docs for top‑2 combinations, even in the best case.
🛠️ Practical Implications
For applications like search, recommendation, or retrieval-augmented generation, this means scaling up models or datasets alone will not fix recall gaps. At large index sizes, even very high-dimensional embeddings fail to capture all combinations of relevant results. So embeddings cannot work as the sole retrieval backbone. We will need hybrid setups, combining dense vectors with sparse methods, multi-vector models, or rerankers to patch the blind spots. This shifts how we should design retrieval pipelines, treating embeddings as one useful tool but not a universal solution.
🧵 Read on 👇
52
370
2,364
241,362
TIL it is easier to set up a LoRA adapter and fine-tune Gemma 3 than to run inference on a ConvNeXt for a binary dataset
1
1
34
In this day and age, is it still worth training models at home, provided you have the GPU, when you weigh only electricity costs against the price of a cloud provider? This is for workloads that fit under 20 GB of VRAM on a single GPU.
31
Nuno Rodrigues retweeted
25 Aug 2025
Our new GECCO paper builds on our past work, showing how AI models can be evolved like organisms. By letting models evolve their own merging boundaries, compete to specialize, and find ‘attractive’ partners to merge with, we can create adaptive, robust and scalable AI ecosystems.
25 Aug 2025
What if we could evolve AI models like organisms in nature, letting them compete, mate, and combine their strengths to produce ever-fitter offspring?
Excited to share our new work: “Competition and Attraction Improve Model Fusion” presented at GECCO’25🦎 where it was a runner-up for best paper!
Paper: arxiv.org/abs/2508.16204
Code: github.com/SakanaAI/natural_…
Summary of Paper
At Sakana AI, we draw inspiration from nature’s evolutionary processes to build the foundation of future AI systems. Nature doesn’t create one single, monolithic organism; it fosters a diverse ecosystem of specialized individuals that compete, cooperate, and combine their traits to adapt and thrive. We believe AI development can follow a similar path.
What if instead of building one giant monolithic AI, we could evolve a whole ecosystem of specialized models that collaborate and combine their skills? Like a school of fish 🐟, where collective intelligence emerges from the group.
This new paper builds on our previous research on model merging, which follows such an evolutionary path. We started by using evolution to find the best “recipes” to merge existing models (our Nature Machine Intelligence paper: nature.com/articles/s42256-0…). Then, we explored how to maintain diversity to acquire new skills in LLMs (our ICLR 2025 paper: openreview.net/forum?id=Kvdh…). Now, we're combining these ideas into a full evolutionary system.
A key limitation remained in earlier work: model merging required manually defining how models should be partitioned (e.g., by fixed layer or blocks) before they could be combined. What if we could let evolution figure that out too?
Our new paper proposes M2N2 (Model Merging of Natural Niches), a more fluid method, which overcomes this with three key, nature-inspired ideas:
1/ Evolving Merging Boundaries 🌿: Instead of merging models using pre-defined, static boundaries (e.g. fixed layers), M2N2 dynamically evolves the “split-points” for merging. This allows for a far more flexible and powerful exploration of parameter combinations, like swapping variable-length segments of DNA rather than entire chromosomes.
2/ Diversity through Competition 🐠: To ensure we have a rich pool of models to merge, M2N2 makes them compete for limited resources (i.e., data points in a training set). This forces models to specialize and find their own “niche,” creating a population of diverse, high-performing specialists that are perfect for merging.
3/ Attraction and Mate Selection 💏: Merging models can be computationally expensive. M2N2 introduces an “attraction” heuristic that intelligently pairs models for fusion based on their complementary strengths, choosing partners that perform well where the other is weak. This makes the evolutionary search much more efficient.
Does it work? The results are fascinating: This is the first time model merging has been used to evolve models entirely from scratch, outperforming other evolutionary algorithms. In one experiment, starting with random networks, M2N2 evolved an MNIST classifier that achieves performance comparable to CMA-ES, but is far more computationally efficient.
Does it scale? We also showed that M2N2 can scale to large, pre-trained models: We used M2N2 to merge a math specialist LLM with an agentic specialist LLM. M2N2 produced a merged model that excelled at both math and web shopping tasks, significantly outperforming other methods. The flexible split-point was crucial here.
Does it work on multimodal models? When we applied M2N2 to text-to-image models, we merged several models by adapting them only for Japanese prompts. The resulting model not only improved on Japanese but also retained its strong English capabilities, a key advantage over fine-tuning, which can suffer from catastrophic forgetting.
This nature-inspired approach is central to Sakana AI’s mission to find new foundations for AI based on collective intelligence. Rather than scaling monolithic models, we envision a future where ecosystems of diverse, specialized models co-evolve, collaborate, and combine, leading to more adaptive, robust, and creative AI. 🐙
We hope this work sparks more interest in these under-explored ideas!
Published in ACM GECCO’25: Proceedings of the Genetic and Evolutionary Computation Conference. DOI: doi.org/10.1145/3712256.3726…
17
47
391
65,216
Nuno Rodrigues retweeted
21 Aug 2025
a new Agentic model that can run on a single consumer GPU at home: ByteDance Seed OSS 36B
> very strong at coding
> excellent at multi-turn tool calling & Agentic tasks
> 500k context window
as i have been saying, ByteDance is Tier S
21
104
984
98,495
Nuno Rodrigues retweeted
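True, if you have a webcam you can implement heartbeat detection (rPPG), so I vibe-coded it. I had GPT-5 Pro survey the papers and build the app, then fixed a few bugs with Claude. It's published on GitHub Pages, so you can run it from your browser. hanamitsu.github.io/873ch.rP…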
20 Aug 2025
You can now pick up the other person's pounding heartbeat with a camera, lol x.com/912_/status/1954665691…
6
94
726
144,491