Computer vision researcher | prev @USC

Joined December 2022
Pinned Tweet
11 Dec 2024
𝖜𝖊𝖑𝖈𝖔𝖒𝖊 𝖙𝖔 𝖒𝔂 𝖜𝖊𝖇𝖗𝖎𝖓𝖌☢️𝖈𝖑𝖎𝖈𝓴 𝖋𝖔𝖗 𝖒𝔂 𝖙𝖗𝖊𝖆𝖘𝖚𝖗𝖊
I like working on robotics cause it’s pretty much the most erudite way you can play with toys
Jun 11
I'm not the best judge of robotics simulators, but this is easily the most user-friendly one I've used. It actually runs well on a Mac. Feels more like the Unity editor / game engine than other simulators. luckyrobots.com/#engine
How to be stupid the right way: say one single thing
“Talking past one another” is so common in high-abstraction conversation topics like technology or philosophy. In absence of a solid contention point, we just kinda fling fluent language at each other
Much of “that didn’t make sense” gets folded into “I didn’t make sense of that”, oftentimes on both sides, so the narrative of the conversation goes unchecked when it breaks
This means that a great, incisive learning strategy is to try saying one single thing. Forget whether you are wrong or right: your feedback will be funneled towards one single claim & you will be legible to *why* you are wrong, even if you are talking to Albert Einstein
Don’t underestimate your ability to talk with Albert Einstein & learn absolutely NOTHING. If you can’t keep a grip on the dialectic, all you will learn is the distance between you two. The worst thing you can learn from someone is “gee, that fella sure is smart!” Conclusions like these will pile up when you are in proximity to intelligence, but you yourself won’t develop at all
A central sticking point solves this. This is how I (a Best Buy employee, cognitively speaking) can learn anything. At this point I can even watch full VSauce shorts without pausing.
Thanks for reading
Damn the solar panels on this thing are big as fuuuuuuuuuuck. LEO is gonna be BRUTAL on these guys. About 600 m^2, way bigger than an NBA court. And this is supposedly the "mini" version! Lotta money in repair and inspection here. Power systems are already the main failure point for sats, let alone at this size and power density. It's gonna be hard to sell access to a commoditized resource like compute if the datacenters in space need downtime to replace their massive arrays wholesale while the datacenters on Earth can hot-swap parts easily. So you probably monitor it like crazy. I've even heard of demand for insurance claims backed by space vision for stuff like this
JUST IN: SpaceX officially unveils AI1, its first AI satellite, with up to 150,000 watts of compute payload.
The coolest version of repair would be on-orbit robotic panel replacement. But I've found that space robotics applications can be a little sheepish. I get the sense that NASA lost enthusiasm for robotic arms in space recently. Not for any bad reason; it sounds hard as fuck
Pronouncing the word “auxiliary” is the worst thing about ML by far
Does anyone else feign aloofness at intersections as a pedestrian so that cars will just fucking go, thereby maximizing civilizational throughput
Or am I the only one going to rationalist heaven (hottest hell)
Made a moon tierlist with my coworker. Lotta clunkers
Earth is the perfect planet man. We take a million things for granted. Consider the chudmoons of Mars
A million slights & a lifetime of struggle & strife collapses onto a single phrase of “I love my girl.” It could be a tough day at work & so I love my girl. The S&P500 crashes I love my girl. Famine, locusts; girl. It’s arbitrary & that’s why it’s stronger than anything smart
nahhhh my TA is fucking buggginnnnnn
I like some of the Bryan Johnson healthmaxx shit but it's only for the landed gentry. As a young man finding his place, I'd be a shell of who I am without ~20 midnight-5am work sessions over my whole life. Jeff Bezos was prolly eating potato chips & shit when he was in the cut. Idk
people in SF access the comedic concept of 'riffing' only through the line of thought "Imagine a fraudulent business"
There's a lot of hubbub over how much "intermediate representations" from computer vision can inform robotic model training in lieu of dumping massive scales of video data into pretraining and just learning those represented modalities implicitly.
I figured isolating finetunes of LeWorldModel across several modalities would be a handy way for me to run an experiment from home and figure out which signals actually help
This Sitzmann post was the big inspiration. But I ended up not probing 3D much at all! I'm VERY interested in finding out how 3D best informs robotic models but this wasn't the right way to test it. If you work on this, I would love to talk further! x.com/vincesitzmann/status/2…

In my recent blog post, I argue that "vision" is only well-defined as part of perception-action loops, and that the conventional view of computer vision - mapping imagery to intermediate representations (3D, flow, segmentation...) is about to go away. vincentsitzmann.com/blog/bit…
I also released the finetuning testbed I used for LeWorldModel if anyone else would like to try out some stuff x.com/kvdozer/status/2042398…

Putting out a super simple testbed for finetuning LeWorldModel because it doesn't seem to exist yet.
LeWM is very lightweight at 15M params so it's great for testing stuff out or finetuning a WM for the first time.
I've been experimenting to see which 2D/3D vision signals increase performance the most and I figured I'd tidy it up for general use. Break a leg: github.com/kevdozer1/leWN_fi…
I keep trying to visualize JEPA-based stuff and getting met with fucked up dementor things. LeWorldModel predicts the next latent state from the current scene and action, instead of directly predicting the resulting image or the action itself. The LeWorldModel paper mentions a pixel decoder that translates this predicted latent back into human-legible videos, but doesn’t actually supply one in the codebase. So I'm trying to make one myself, but look at the evil ghost thing in the right two sequences. It’s been much easier to instead use a retrieval strategy where the predicted latent is matched to the most similar real image in the full train/test reference corpus
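Rough sketch of the retrieval idea, in case it helps anyone picture it. This is not the LeWorldModel code: the function names, the toy 3-dimensional latents, and the choice of cosine similarity are all assumptions made for illustration (the post doesn't say which similarity metric is used). In practice each reference latent would come from encoding a real frame, and the returned index would point back at that frame.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_nearest(predicted_latent, reference_latents):
    """Return (index, similarity) of the reference latent most similar to the prediction.

    reference_latents is a hypothetical list of latents, one per real image
    in the reference corpus; the index identifies which image to display.
    """
    if not reference_latents:
        raise ValueError("reference corpus is empty")
    scores = [cosine_similarity(predicted_latent, ref) for ref in reference_latents]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy stand-ins for encoded reference frames (assumed values, not real data).
reference_latents = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
predicted = [0.9, 0.8, 0.1]  # pretend output of the world model

index, score = retrieve_nearest(predicted, reference_latents)
print(f"closest reference frame: {index} (cosine similarity {score:.3f})")
```

The upside of doing it this way is that every visualized frame is a real image, so no ghosts. The downside is that it can only ever show states that already exist in the corpus.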
my Bug Name would be “Centipedro” What would YOUR Bug Name be???