rentry.org/LocalModelsLinks
Various links for ML and local models (not just LLMs), kept fairly up to date.
rentry.org/LocalModelsPapers
ML papers I've read that I think are interesting. I also keep a text file of all the abstracts at the top for easy searching.
Claude has made it through Victory Road for the first time. All hints were removed for this latest iteration as well. It also caught a legendary (Articuno) for the first time.
Woosh: A Sound Effects Foundation Model
From Sony AI. Optimized for sound effects, with a high-quality audio encoder/decoder model, a text-audio alignment model for conditioning, and text-to-audio and video-to-audio generative models.
Links below
Came across what could be an interesting benchmark: an old Famicom game called Radical Bomber: Jurai-Kun. It's an asymmetrical board game with 1 runner and 4 chasers. The runner can bomb certain connections and has a limited number of double turns. There are some special blocks too.
youtube.com/watch?v=A8mPtwdT…
VoXtream2: Full-stream TTS with dynamic speaking rate control
Combines a distribution matching mechanism over duration states with CFG across conditioning signals to improve controllability and synthesis quality. Runs 4 times faster than real time on a consumer GPU.
Links below
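To put the "4 times faster than real time" figure in concrete terms, here is a small arithmetic sketch. The function names and the 10-second clip are made up for illustration; only the 4x speedup comes from the summary above.

```python
def synthesis_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time needed to generate `audio_seconds` of speech."""
    return audio_seconds / speedup


def real_time_factor(speedup: float) -> float:
    """Real-time factor: compute time per second of audio (lower is faster)."""
    return 1.0 / speedup


SPEEDUP = 4.0          # "4 times faster than real time", from the summary
CLIP_SECONDS = 10.0    # hypothetical clip length, chosen for illustration

clip_time = synthesis_seconds(CLIP_SECONDS, SPEEDUP)
rtf = real_time_factor(SPEEDUP)
print(f"{CLIP_SECONDS:.0f} s of audio in {clip_time} s (RTF {rtf})")
```

So a 4x speedup corresponds to a real-time factor of 0.25, which leaves headroom for streaming output.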
Speculative Speculative Decoding
Draft model predicts likely verification outcomes and prepares speculations pre-emptively for them. If the actual verification outcome is in the predicted set, a speculation can be returned immediately, eliminating drafting overhead.
Links below
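A toy sketch of the lookup idea described above, not the paper's implementation. The names (`prepare_speculations`, `next_speculation`, `toy_draft`) and the outcome encoding are assumptions for illustration, and the sketch runs sequentially, so it does not model preparing speculations while verification is still in flight.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical encoding of a verification outcome:
# (number of accepted draft tokens, correction token from the verifier).
Outcome = Tuple[int, str]
Speculation = List[str]


def prepare_speculations(
    predicted: Iterable[Outcome],
    draft_fn: Callable[[Outcome], Speculation],
) -> Dict[Outcome, Speculation]:
    """Pre-compute one speculation for each predicted verification outcome."""
    return {outcome: draft_fn(outcome) for outcome in predicted}


def next_speculation(
    actual: Outcome,
    prepared: Dict[Outcome, Speculation],
    draft_fn: Callable[[Outcome], Speculation],
) -> Tuple[Speculation, bool]:
    """Return (speculation, cache_hit). A hit skips drafting entirely."""
    if actual in prepared:
        return prepared[actual], True
    # Miss: fall back to drafting after verification, as in ordinary
    # speculative decoding.
    return draft_fn(actual), False


def toy_draft(outcome: Outcome) -> Speculation:
    """Stand-in for a draft model: emits three placeholder tokens."""
    _, correction = outcome
    return [f"{correction}+{i}" for i in range(1, 4)]


cache = prepare_speculations([(3, "the"), (2, "a")], toy_draft)
hit_spec, hit = next_speculation((3, "the"), cache, toy_draft)
miss_spec, miss_hit = next_speculation((1, "an"), cache, toy_draft)
print(hit, hit_spec)
print(miss_hit, miss_spec)
```

The benefit depends on how often the actual outcome lands in the predicted set. On a miss, the cost is the same drafting step that would have happened anyway, plus the wasted pre-computation.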
Aletheia tackles FirstProof autonomously
From DeepMind. Autonomously solved 6 of the 10 problems (2, 5, 7, 8, 9, 10) according to majority expert assessment; the authors note that Problem 8 was the only one on which the experts were not unanimous.
Links below
Adam Improves Muon: Adaptive Moment Estimation with Orthogonalized Momentum
Scales orthogonalized momentum using a single adaptive stepsize, preserving orthogonality while improving upon Muon at negligible additional cost.
Links below
HiFloat4 Format for Language Model Inference
Packs 64 4-bit elements with 32 bits of shared scaling metadata, averaging 4.5 bits per value. Achieves higher average accuracy than the state-of-the-art NVFP4 format across multiple models and diverse downstream tasks.
Links below
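The 4.5 bits-per-value figure follows directly from the block layout in the summary. A quick check, where the helper name and the 7B-parameter model size are assumptions for illustration only:

```python
ELEMENTS_PER_BLOCK = 64        # from the summary
BITS_PER_ELEMENT = 4           # from the summary
METADATA_BITS_PER_BLOCK = 32   # shared scaling metadata, from the summary


def bits_per_value(elements: int, element_bits: int, metadata_bits: int) -> float:
    """Average storage cost per element, including the shared metadata."""
    return (elements * element_bits + metadata_bits) / elements


hifloat4_bpv = bits_per_value(
    ELEMENTS_PER_BLOCK, BITS_PER_ELEMENT, METADATA_BITS_PER_BLOCK
)
print(hifloat4_bpv)  # (64 * 4 + 32) / 64

# Hypothetical 7B-parameter model, weights only, to show the scale involved.
PARAMS = 7_000_000_000
weight_gigabytes = PARAMS * hifloat4_bpv / 8 / 1e9
print(f"{weight_gigabytes:.4f} GB")
```

The metadata adds 32 / 64 = 0.5 bits per value on top of the 4-bit payload.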
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
From Nvidia. Foundation world model that learns diverse interactions and dexterous controls from 44k hours of egocentric human videos.
Links below