Digital Supply Chain - Artificial Intelligence retweeted
The Best Kaizen Quotes... “The message of the Kaizen strategy is that not a day should go by without some kind of improvement being made somewhere in the company.” ~Masaaki Imai #kaizenquotes #continuousimprovementquotes #TaiichiOhno supplychaintoday.com/the-bes…
2
2
36
maicolciotti retweeted
PORTA A PORTA... #Salvini: I agree with Theresa May (United Kingdom), who proposes doubling taxes on companies that hire foreigners..... SOMEONE TELL THE LEGA BUFFOON THAT 300,000 ITALIANS WORK IN ENGLAND...
23
57
301
4,182
jyfc76 retweeted
Debuted so young, he was really looking for peers the same age as him. The difference between Chang's and Kyu's first impressions though haha, Kyu's nonchalant exterior
One of my fav stories from this ep was finding out that changmin bought bada’s CD and listened to her songs on purpose just so he could get close to ryeowook and befriend him. The way changmin made an effort to learn about the things ryeowook liked was so sweet🥹
97
391
17,967
Alexander piston retweeted
.@RicardoMonrealA is one of the politicians who most enjoys the spotlight. He speaks when he should, when he shouldn't, when it is imprudent, and when he has nothing to say. He rarely refers to the law, even though that is his job. This time, in his priestly tone, he is meddling in the palace intrigues of @PartidoMorenaMx. Monreal suffers from incontinence. Via: @azucenau
67
531
1,880
54,122
Thinking Robots retweeted
Robotics RL finetuning recipe: Pretrain → BC/VLA → Online RL Finetuning → Continuous Improvement.

How to efficiently improve pretrained robot policies with minimal online interaction?

• EXPO-FT
RL finetuning can make VLA models practically useful on real robots. Instead of training from scratch, EXPO-FT starts from pretrained VLA models and performs stable RL finetuning, achieving 30/30 success on challenging manipulation tasks (string-light routing, pool-ball striking, flower-in-bottle insertion) using only ~19.1 minutes of online robot data on average. This directly targets the reliability gap between VLA generalization and deployment-level robustness, via residual action editing and Q-guided action selection.

• Q2RL (When Life Gives You BC, Make Q-functions)
Behavior Cloning already contains implicit value information. Instead of discarding BC and starting RL from scratch, Q2RL uses a small amount of online interaction to estimate a Q-function around a pretrained BC policy, then performs online improvement with Q-Gating, dynamically selecting between BC actions and RL actions according to estimated values. Results show up to 3.75× improvement over the original BC policy and successful on-robot learning for contact-rich assembly tasks within only 1–2 hours.

• Flow Reversal Steering (FRS)
Rather than updating policy weights, FRS exploits the latent structure already present inside flow-matching robotic generalists. Given a "reasonable but suboptimal" action, FRS reverses the flow process to recover latent noise and then steers that noise toward nearby policy-consistent action modes. This effectively turns semantic guidance from humans or VLMs into improved robot actions. Even more interesting: the resulting improvements can be distilled into a lightweight policy with BC, producing success-rate gains in under one minute of training. FRS also enables RL to bootstrap from semantic priors when conventional RL fails.

• OGPO
Don't freeze the generative policy. Fully finetune it. OGPO introduces Off-policy Generative Policy Optimization, combining:
- Off-policy critics for aggressive data reuse
- Modified PPO objective for full generative-policy finetuning
- Backpropagation through the entire diffusion/flow process
The paper also identifies several practical stabilizers:
- Success-buffer regularization
- Conservative advantages
- Q-variance reduction
Notably, OGPO can recover poorly initialized BC policies to near-complete success even without expert demonstrations in the online replay buffer.

• RECAP (π*0.6)
Advantage-conditioned policy learning that lets VLAs learn from successes, failures, and human corrections at scale. Then specialize with on-robot data (doubles throughput, halves failure rates on espresso, laundry, box assembly).

• RLT (RL Token)
Adds a compact learned token interface to the frozen VLA so a lightweight actor-critic head can do fast online adaptation on precise sub-skills in minutes to hours.

• DICE-RL
Treats RL finetuning as distribution contraction: starting from a broad generative prior, selectively amplifying successful behaviors through residual RL and value-guided updates, improving robustness and stability on challenging real-world manipulation tasks.

Summary: Robotics is converging toward the same paradigm shift that happened in LLMs. Pretraining provides broad priors, bottleneck becomes post-training.
• Sample-efficient online RL
• Value extraction from pretrained policies
• Steering latent action manifolds instead of learning from scratch
• Full-policy finetuning for diffusion/flow controllers
• Closing the pretraining–posttraining gap

π0.5/π0.6, RLT, DICE-RL, OGPO, EXPO-FT, Q2RL, and FRS are all pieces of the same puzzle: How can robots continuously self-improve after deployment without requiring another massive pretraining run?

The next generation of robotics systems may be defined less by bigger world models and more by better post-training loops.
How do we improve it after deployment with minutes of interactions?
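
The Q-gating idea mentioned for Q2RL above (choosing between a BC action and an RL action according to estimated values) can be illustrated with a toy sketch. This is not the paper's implementation: the names `q_gate` and `toy_q`, the `margin` parameter, and the quadratic critic are all hypothetical, chosen only to show the selection rule.

```python
from typing import Callable, Sequence, Tuple

Action = Sequence[float]
State = Sequence[float]


def q_gate(
    state: State,
    bc_action: Action,
    rl_action: Action,
    q_fn: Callable[[State, Action], float],
    margin: float = 0.0,
) -> Tuple[Action, str]:
    """Pick the RL action only if its estimated value beats the BC action.

    `margin` is an assumed safety knob (not from the post): the RL action
    must exceed the BC value by at least this much to be selected.
    """
    q_bc = q_fn(state, bc_action)
    q_rl = q_fn(state, rl_action)
    if q_rl > q_bc + margin:
        return rl_action, "rl"
    # Fall back to the pretrained BC action when RL is not clearly better.
    return bc_action, "bc"


def toy_q(state: State, action: Action) -> float:
    """Hypothetical critic: value is higher when the action is closer to the state."""
    return -sum((a - s) ** 2 for a, s in zip(action, state))


if __name__ == "__main__":
    state = [1.0, 0.0]
    bc = [0.5, 0.0]   # pretrained policy's proposal
    rl = [0.9, 0.0]   # online RL policy's proposal
    print(q_gate(state, bc, rl, toy_q))
    print(q_gate(state, bc, rl, toy_q, margin=0.5))
```

With a positive `margin`, the gate stays with the BC action unless the critic estimates a clear gain, which is one plausible way to keep early online learning conservative.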
1
4
84
M. Saeed Sulehria retweeted
Replying to @oneill_c
Hey Charlie, yes looking forward to understanding what I can fix, the performance numbers could definitely improve with that. What's the best way to reach out? Could you DM with those pointers?
1
1
6
Dandan 🇵🇹🫡 retweeted
Luís Diaz. The way he decides games for Colombia is ridiculous. Raphinha can't manage 10% of what this guy does when he plays for Brazil. Impressive.
4
38
894
zuzi78 retweeted
👠💋❤️ This cute girl blows him first, then lets him cum inside her. #中出し #中だし #孕ませ #種付け #子作り #膣内射精 #妊活 #breeding #impregnation #creampie #insemination #seeding
9
61
3,375
James The Luddite retweeted
Computer scientists use data to improve rock climbing in the Canadian Rockies. The ThreeTopo project lays the groundwork to improve pre-climb planning and on-the-wall decision making. goo.gl/alerts/wH6Qgf #UCalgary #AbSci #science #SciChat
5
2
70