Taelin

Taelin

1 Photos and videos

Tweets

Cosmic Labs retweeted

Taelin

@VictorTaelin

Apr 30

seriously, working with AI is MISERABLE for one and only one reason: having to re-explain the same thing "oh yeah this new session obviously doesn't know what proper case trees are, so let me explain it for the 5000th time in my life" I'm tired AGENTS.md doesn't solve this because it is impossible to fit the entire domain knowledge without nuking the context - it would be 1m tokens worth RAGs don't solve this, the agent won't search unknown unknowns SKILLs don't solve this unless I keep like a collection of 1750 skills with specific cuts of domain knowledge for each possible subset of my domain that I might need in a given chat, but that's a lot of manual work recursive LLMs or whatever don't solve this for the same reason, you can't dump a domain book and expect the AGENT will magically guess that it is supposed to search for a specific bit knowledge. unknown unknowns fine tuning doesn't solve this (OSS models suck and OpenAI / Anthropic gave up on user fine tuning) I honestly think a good product around fine tuning on your domain would be a major hit and an underdog lab should take this opportunity

666

178

3,506

254,066

Cosmic Labs

Cosmic Labs

@CosmicLabsTech

Apr 15

220TB of SOCAMM2 per VR300 unit. When NVIDIA scales that at volume, LPDDR5X supply collapses and smartphone OEMs can't build phones even if they want to. Who actually saw this coming?

Jukan

@jukan05

Apr 15

People who think the DRAM supercycle will end in 1H next year don't know what they're talking about. The GB200 uses 17TB of SOCAMM2, but the VR200 uses over 50TB of SOCAMM2. That alone has NVIDIA hoovering up so much LPDDR5X that even smartphone OEMs are panic-buying — and do you know how much SOCAMM goes into the VR300? 220TB. A lot of people have only been focused on the VR300's 1TB of HBM4E, but you need to look at the SOCAMM too. Once NVIDIA starts consuming 220TB of SOCAMM, the LPDDR shortage could become so severe that it reaches a point where OEMs can't build smartphones even if they want to.

514

Cosmic Labs

Cosmic Labs

@CosmicLabsTech

Apr 13

Robots running a full marathon in Beijing. 40% fully autonomous this year. 10s per 100m and still accelerating. No demo setup, no shortcuts. Just 42km of real world breaking things. How long before they outlast us?

Eren Chen

@ErenChenAI

Apr 12

The 2nd Robot Marathon has officially begun in Beijing. This year feels different. 1. Around 40% of teams are running fully autonomous, no remote control. 2. Top robots are already hitting ~10s per 100 meter, getting surprisingly close to human sprint limits. 3. You can also see much better safety design upfront. Way more structured than last year’s chaos. 4. Still, failures happen. Marathon distance pushes motors, structure, and control to the limit. What works in short demos breaks down over longer runs. Overall, a big step forward, but also a reminder that real-world robotics is still far from the polished demo videos you get fed from companies.

0:14

351

Garry Tan

Cosmic Labs retweeted

Garry Tan

@garrytan

Apr 11

The most underrated thing in AI agents right now is: OpenClaw/Hermes Agent is just more free than other locked down AI agents (the standard out-of-box Claude/ChatGPT route) "Free the Claw" is not a vibe I understood until I tried it. Now that I have it, I don't want to go back.

124

1,186

114,597

ℏεsam

Cosmic Labs retweeted

ℏεsam

@Hesamation

Apr 11

AMD Senior AI Director confirms Claude has been nerfed. She analyzed Claude's session logs from Janurary to March: > median thinking dropped from ~2,200 to ~600 chars > API requests went up 80x from Feb to Mar. less thinking and failed attempts meaning more retries, burning more tokens, and spending more on tokens > reads-per-edit dropped from 6.6x → 2.0x. model stops researching code before touching it. > model tried to bail out or ask "should i continue" 173 times in 17 days (0 times before March 8). > self-contradiction in reasoning ("oh wait, actually...") tripled. > conventions like CLAUDE.md get ignored because there's less thinking budget to cross-check edits > 5pm and 7pm PST are the worst hours, late night is significantly better. this means the thinking allocation is most likely GPU-load-sensitive.

321

1,035

9,486

3,865,235

Kislay Parashar

Cosmic Labs retweeted

Kislay Parashar

@KislayParashar1

Apr 10

Replying to @NoFilterSkin

Yeah, that’s the part most people miss. Rockets aren’t reusable round trips like planes. Re-entry needs a heat shield controlled descent, so capsules are designed to survive the physics, not the launch vehicle.

346

Kislay Parashar

Cosmic Labs retweeted

Kislay Parashar

@KislayParashar1

Apr 10

Replying to @Pirat_Nation

This is why official site = safe is a bad assumption. Supply chain attacks hit trusted tools first. Always verify hashes or use mirrors until things are confirmed clean.

253

18,615

Kislay Parashar

Cosmic Labs retweeted

Kislay Parashar

@KislayParashar1

Apr 10

Replying to @GTAVI_Countdown

Budgets aren’t really comparable. One is years of content, marketing, and global distribution. The other is hardware, safety margins, and mission risk. Similar numbers, completely different cost structures and constraints.

6,660

Meg McNulty

Cosmic Labs retweeted

Meg McNulty

@meggmcnulty

Apr 5

A single GPU failure in a 1,024-GPU cluster costs over $300,000 per month in lost compute. Most teams are still checkpointing every few hours. When one node fails during distributed training, the entire job stops. The other 1,023 GPUs go idle while the system detects the failure, replaces the node, restarts the job, and loads the last checkpoint. Everything computed since that checkpoint is gone. The math is concrete. At $3/GPU/hour with a 1,024-GPU cluster, a 7.9-hour mean time to failure, 1.5-hour checkpoint frequency, and 15 minutes of recovery overhead, the wasted compute adds up to $307,000. That number comes from real production data, not a theoretical model. MTTF decreases as cluster size increases. A failure that happens once a week on a 128-GPU cluster can happen multiple times per day at 1,024 GPUs. The checkpoint frequency that worked at smaller scale becomes ruinously expensive at larger scale. This is the single biggest reason GPU infrastructure ROI falls short of what finance teams modeled. The hardware works. The GPUs are fast. But every failure rolls back hours of progress and burns money on idle compute that produces nothing. Checkpoint strategy is not an engineering detail. It is a financial decision. How often does your training pipeline checkpoint, and have you calculated what a single rollback actually costs?

620

169,799

Meg McNulty

Cosmic Labs retweeted

Meg McNulty

@meggmcnulty

Apr 6

Replying to @Pirat_Nation

Sounds great on paper, but real-time decode cost matters. If reconstruction eats too much GPU time, gains shrink. Still, big win for VRAM and install sizes if overhead stays low.

1,081

Meg McNulty

Cosmic Labs retweeted

Meg McNulty

@meggmcnulty

Apr 6

Replying to @LundukeJournal

Those numbers are just minimums on paper. Real-world Windows 11 usage often needs way more to feel smooth, while Ubuntu can still run fine on lighter setups depending on what you install.

533

Meg McNulty

Cosmic Labs retweeted

Meg McNulty

@meggmcnulty

Apr 6

Replying to @pmddomingos

Maybe, but something will still look like a GPU under the hood. Dense math doesn’t go away, it just gets packaged differently.

268

Meg McNulty

Cosmic Labs retweeted

Meg McNulty

@meggmcnulty

Apr 6

There is a 1,500x bandwidth gap hiding in the inference memory hierarchy. And it is about to become everyone's problem. NVIDIA's Vera Rubin platform now treats the NVMe SSD as part of the inference memory hierarchy. Their CMX architecture (launched at GTC last month with BlueField-4 STX) offloads KV cache from GPU memory to flash so that evicted context survives instead of being recomputed. The performance gain is up to 5x higher tokens per second. Dell, HPE, VAST Data, WEKA, and a dozen other vendors are building products around it. CoreWeave, Lambda, and Oracle are early adopters. That means the SSD is no longer background storage. Its read latency directly affects time to first token. On a deployment serving tens of thousands of requests per hour, even a small increase in drive latency compounds across every single one. The technical detail worth understanding: KV cache stores the attention state for every token in context. As context windows grow into the hundreds of thousands of tokens, the cache overflows GPU HBM (22 TB/s on Rubin) into host DRAM (~300 GB/s) and then onto NVMe (7-14 GB/s on PCIe Gen5). That is a 1,500x bandwidth drop from HBM to SSD. The system works when the drive is healthy. When it is not, every cache read slows down, and the latency shows up in token generation without any GPU-side signal that something is wrong. KV cache recycling is also a write-heavy workload. Every eviction and reload burns SSD write cycles, and the drive's internal garbage collection multiplies those writes further depending on drive quality and whether you are running enterprise TLC or cheaper QLC flash. As the NAND wears, read latency creeps up gradually. The degradation looks like a model problem or a network issue long before anyone checks the drive. Most inference monitoring tracks GPU utilization, temperature, and token throughput. SSD wear, write amplification, and drive-level latency are almost never in the stack. Is drive health part of your inference observability, or is the SSD still invisible?

8,472

Meg McNulty

Cosmic Labs retweeted

Meg McNulty

@meggmcnulty

Apr 7

Replying to @VadimYuryev

If they actually get CPU and GPU tightly integrated with low latency, that’s huge. The missing piece is still memory. Without fast shared memory, it won’t feel like a true Apple Silicon competitor.

360

Meg McNulty

Cosmic Labs retweeted

Meg McNulty

@meggmcnulty

Apr 7

Replying to @Techjunkie_Aman

If it really uses the same servers, the client was the bottleneck all along. Extra control over bitrate and latency stats alone makes a big difference for real-world streaming quality.

2,175

Kislay Parashar

Cosmic Labs retweeted

Kislay Parashar

@KislayParashar1

Apr 6

Replying to @LLMJunky @centralcomputer

Because VRAM is still 24GB. For LLMs, capacity matters more than newer architecture. 3090s are cheaper, well-supported, and already optimized, so the real-world gain here just isn’t that compelling.

5,490

Kislay Parashar

Cosmic Labs retweeted

Kislay Parashar

@KislayParashar1

Apr 6

Replying to @LLMJunky @centralcomputer

Supported, yes. But for LLMs you hit VRAM limits before speed. Faster cores don’t help if the model doesn’t fit, so 24GB vs 24GB feels the same in practice.

996

Kislay Parashar

Cosmic Labs retweeted

Kislay Parashar

@KislayParashar1

Apr 7

Replying to @TheCatholicEngr

Bit misleading. The Apollo Guidance Computer handled real-time control onboard, humans supported planning and simulation. It wasn’t humans as CPUs, it was a hybrid system working together.

1,325

Kislay Parashar

Cosmic Labs retweeted

Kislay Parashar

@KislayParashar1

Apr 7

Replying to @SynthPotato

If this holds up in real gameplay, it’s basically the first time console RT starts feeling PC-class instead of a scaled-down version. The real test will be how stable those modes stay during dense city scenes and combat. PSSR could be the real game-changer here.

1,738

Meg McNulty

Cosmic Labs retweeted

Meg McNulty

@meggmcnulty

Apr 7

Replying to @Pirat_Nation

Makes sense. Carrying legacy support forever slows everything down. At some point, removing it actually helps keep the kernel cleaner and easier to maintain.

2,614