Personal update: I am starting my PhD @mbzuai, where I look forward to working in the multimodal realm (interpretability, modality imbalance, eval & application) to address foundational gaps with @AlhamFikri and co.
It’s time to JEPA pill the world!
awesome-jepa: A curated list of papers, models, code, datasets, and learning resources for Joint Embedding Predictive Architectures (JEPA), the self-supervised approach to world models proposed by Yann LeCun.
Come to M4-RAG poster @ 154! Presented by me and @DavidAnugraha
We found that in multilingual, multicultural scenarios, larger models tend to ignore the retrieved context ❌️
4.30-6pm
#CVPR2026
The humbling lesson for humans from Alyosha: humans turned out to be much simpler than we thought. 90% of the time we’re just nearest neighbor machines, pastiches from high-school reading lists 🙃 #cvpr2026
Introducing Cosmos 3: Our latest frontier model for Physical AI
Cosmos 3 is the world’s first fully open omnimodel with native vision reasoning, world and action generation.
Today we’re releasing Super (32B) and Nano (8B) variants.
"Learn from your own latents, not tokens: A Sample Complexity Theory"
This paper explains why data2vec and JEPA can learn with much less data.
They showed that when data has hidden hierarchy, token prediction becomes harder as the hierarchy gets deeper. But latent prediction keeps the learning problem simple at every level.
This suggests that models may learn faster when they stop predicting raw tokens and start predicting their own abstractions.
AI can give researchers the freedom to pursue “crazier” ideas.
For Terence Tao, AI creates more room to experiment, test unexpected paths, and discover what might otherwise stay out of reach.
'Agent Harness Engineering: A Survey' just cited my Agent Skills for Context Engineering project in its Context & Memory Management section.
It’s a new paper on OpenReview (authors from CMU, Yale, Johns Hopkins, Amazon, and others). They reviewed 170 open-source projects and pulled real production lessons from OpenAI, Anthropic, and LangChain.
Agent performance in the real world = Model capability × Harness quality
For long-horizon, multi-step, production tasks, the harness has become the main bottleneck. Simple harness tweaks (better tool formats, sandbox changes, automated verification loops) deliver significant gains on benchmarks.
This is the second time my open-source work has been cited in academic research (first was Peking University’s State Key Lab paper on meta context engineering).
I’m genuinely proud of that, but more than anything it reminds me why I love open source. I’m not from academia. I learned this field by building, shipping, writing...
Open source lets your experiments enter the research papers. That is still one of the best parts of this field.
The paper is worth reading. We're moving from “build one agent” to “operate a fleet of long-running agents” and the paper repeatedly shows that the biggest improvements come from turning production traces into regression tests and automated harness fixes.
Paper & Repo: picrew.github.io/LLM-Harness…
After submitting our culture mixing paper to CVPR (arxiv.org/abs/2511.22787), we came across the ConfusedTourist paper, which shares the same motivation but offers a different and interesting analysis!
We’ve put together a joint website to share our findings. Check it out below!
Too much? Come try the samples in our hub!
You can copy our exact prompts and culture-mixed images to test where your VLM's understanding breaks down 🤖
[5/n]