TimDarcet

TimDarcet

199 Photos and videos

Tweets

Pinned Tweet

TimDarcet @TimDarcet

21 Apr 2023

1/ This week we released DINOv2: a series of general vision encoders pretrained without supervision. Good out-of-the-box performance on a variety of domains, matching or surpassing other publicly available encoders.

0:27

112

703

124,737

TimDarcet

TimDarcet @TimDarcet

17h

Don't forget the mad lass that was using 0.4 weight decay in 2021 𝘇𝗲𝗿𝗼 𝗽𝗼𝗶𝗻𝘁 𝗳𝘂𝗰𝗸𝗶𝗻𝗴 𝗳𝗼𝘂𝗿 And nobody batted an eye?!

Lucas Beyer (bl16)

@giffmana

20h

LLM community slowly rediscovering what we in vision found out over half a decade ago. MY SCHMIDHUBER MOMENT IS COMING! Source: S4L paper where i tuned the most sota 10% and 1% ImageNet baselines ever, by far. arxiv.org/abs/1905.03670

204

41,354

TimDarcet

TimDarcet @TimDarcet

17h

Honestly it's one of the most surprising parts of the whole DINO thing imo. I'm always surprised people don't notice it more

1,802

TimDarcet

TimDarcet @TimDarcet

Jun 11

ok time for a pet peeve stop saying exponentially or logarithmically when you mean convex or concave

Josh

@JoshPurtell

Jun 10

Good call out. *Everyone* knows capabilities scale logarithmically with computing power. This is the second paragraph. How on earth does a mistake like this make it in?

2,900

james hawkins

TimDarcet retweeted

james hawkins

@james406

Jun 6

parents: "move out" girlfriend: “quit being such a loser” boss: "work harder" claude: "uber for dogs (the dogs are the drivers) is a great idea, you should absolutely pursue it"

197

3,314

81,036

1,260,576

Oriane Siméoni @CVPR

TimDarcet retweeted

Oriane Siméoni @CVPR @oriane_simeoni

Jun 4

Replying to @TimDarcet @iScienceLuvr

Actually clip>ssl for LLMs is fake news, DINOv2 was just not scaled enough. Presenting those results in 1h at CVPR ;)

2,369

Elouan Gardès

TimDarcet retweeted

Elouan Gardès @elouangardes

Jun 4

What if the metadata from acquisition was enough? 👀 Sometimes the most useful labels aren't labels at all. Introducing📄 Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have

5,785

Ofer Hasson

TimDarcet retweeted

Ofer Hasson @hassonofer_

May 10

I'm releasing Bio-DINO 🧬 A self-supervised image encoder for natural photographs of biodiversity - trained on ~31M curated images spanning plants, fungi, insects, fish, corals, birds, mammals and more. Built for embeddings, retrieval, clustering, probing and transfer learning.

5,758

EIFY

TimDarcet retweeted

EIFY @EIFY

May 30

Replying to @TimDarcet

The Deep Learning Framework of Theseus?

1,002

TimDarcet

TimDarcet @TimDarcet

May 29

What people don't realize is that you can use torch.Tensor but skip torch fsdp, pp, amp, checkpointing, built-in kernels, or even autograd and... at this point is your stack really "based on pytorch" anymore?

elie

@eliebakouch

May 29

composer 2 tech report

6,617

Joséphine Raugel

TimDarcet retweeted

Joséphine Raugel @JRaugel

May 28

1/ We’re so glad to share this new study 💫 Does the brain learn like a Deep Net? 🧠⚙️ - 📄Misalignment Between Backpropagation and the Hierarchy of Brain Responses to Images - 🔗arxiv.org/abs/2605.28693 Thread below 🧵

176

24,957

Thomas Kipf

TimDarcet retweeted

Thomas Kipf

@tkipf

May 13

Confession: I never had a single work-related sleepless night or ever pulled an all-nighter during my career incl. PhD. Don’t sacrifice your health. Sleep is a superpower — your brain on 8hrs of sleep is a lot smarter than your brain on sleep deprivation. Don’t listen to people who tell you to chronically sacrifice sleep for work. Sacrificing sleep for your kids/family is a different story.

Sarvesh Gharat @SarveshGharat12

May 12

Replying to @npparikh

I doubt all those things are really possible. Infact I believe, you are not doing a good PhD unless you have sleepless nights. Definitely just working on your thesis is possible if you follow a 9-6 schedule, but a good PhD which involves exploring, colabs, etc needs extra hours

1,076

105,801

Tom Sander (Ph.D.)

TimDarcet retweeted

Tom Sander (Ph.D.)@RednasTom

May 13

1/9 Excited to share TextSeal, our new state-of-the-art watermark for large language models at FAIR / Meta Superintelligence Labs (@AIatMeta) 🔐 Paper: arxiv.org/abs/2605.12456 Code: github.com/facebookresearch/…

4,219

Thinking Machines

TimDarcet retweeted

Thinking Machines

@thinkymachines

May 11

With the model's simultaneous speech capability, Horace has gotten a lot easier to work with recently.

0:38

1,215

275,640

Quentin Berthet

TimDarcet retweeted

Quentin Berthet @qberthet

May 11

🚨 New paper: Introducing MIND (Monge Inception Distance) Everyone agrees that FID is broken, requires too many samples, slowing down evals. MIND requires 10x fewer samples, is more robust, faster to compute. Our new drop-in replacement for evaluating generative models. 🧵👇

460

55,339

fedaykin reepicheep

TimDarcet retweeted

fedaykin reepicheep

@mauddweeb

May 7

“And if a computer can grade a task, it can do that task.” Hey guys, @TIME says P=NP. Time to pack it up.

Marilyn Burns @mburnsmath

May 7

Interesting article: time.com/article/2026/05/05/…

201

4,743

259,847

TimDarcet

TimDarcet @TimDarcet

May 7

In this field you should aim for your h-index to be always equal to your number of papers

Gabriele Berton

@gabriberton

May 7

And you realize that Kaiming He is the GOAT when you see that he wrote only 96 papers (vs people with his h-index usually have hundreds)

2,585

Mozes Jacobs

TimDarcet retweeted

Mozes Jacobs @mozesjacobs

Feb 10

Are ViTs secretly RNNs? #ICLR2026 Our 2-block recurrent transformer recovers 96% of DINOv2’s IN-1k accuracy & reproduces its activations 1-to-1, motivating the Block-Recurrent Hypothesis: arxiv.org/abs/2512.19941 w/ @thomas_fel_ @RichieHakim @ABrondetta Demba Ba @t_andy_keller

134

720

108,308

Alexis Marouani

TimDarcet retweeted

Alexis Marouani @Alexis1097657

Apr 23

#ICLR2026 Frictions in Vision Transformers 1/ ViTs use a [CLS] for global understanding and patch tokens for local details. Despite their different roles, we've been processing them with the exact same math. Looking forward for discussions ! Sat 25 10:30 AM – 1 PM P4 -#3303

2,235

TimDarcet

TimDarcet @TimDarcet

Apr 21

periodic reminder that FSDP does exactly 0 additional communication compared to DDP

François Fleuret

@francoisfleuret

Apr 20

Nothing shockingly dumb?

17,190

TimDarcet

TimDarcet @TimDarcet

Apr 17

in other words, you need 300 tokens/gpu to not be HBM bandwidth-bound on H100

François Fleuret

@francoisfleuret

Apr 16

Is it reasonable to consider that since the HBM3 memory of a H100 has a bandwidth of ~3Tb/s and the chip can do ~900TFlops, a rule of thumb is that every bfloat16 should be reused ~600 times?

2,838