Released a second audio2face teacher dataset on Hugging Face today. Different supervision signal from our first dataset. MediaPipe FaceLandmarker. CC-BY-NC-4.0. Link in the comments 👇 #HuggingFace #OpenSource #DigitalHumans #ARKit #MediaPipe #ModelDistillation
winbuzzer.com/2026/06/07/xai… xAI appears to have used a workaround to train its Grok AI with outputs of Anthropic's Claude model after an Anthropic access cutoff in January. #xAI #Grok #Anthropic #Claude #ModelDistillation #AITraining #AICoding #AIModels #ElonMusk
Many recent academic studies focus on applying lightweight model distillation within distributed networks. Traditional distillation methods mostly rely on centralized servers for parameter compression. That design does not fit decentralized architectures with scattered nodes, and it tends to lose information during knowledge transfer, which weakens reasoning performance. Newer academic proposals establish a collaborative distributed distillation structure that splits the model compression process across different nodes step by step. Cryptographic tools are used throughout the process to prevent the original model's core knowledge from being leaked. Because no centralized compute scheduling is required, the approach greatly lowers the operating threshold for edge nodes while retaining core model capabilities. This research further expands the application scope of lightweight intelligent models in DeSci scenarios. It enables more low-cost distributed nodes to take on scientific reasoning tasks and lays a technical foundation for an open and inclusive decentralized research ecosystem. #HETU #Setu #ModelDistillation #DeSciTech #DistributedAI
May 14
On-Policy Distillation (OPD) is the go-to technique for LLM post-training, but it often mysteriously fails. Is a "smarter" teacher model enough to guarantee success? The answer is NO. 🤔
Today, we dive into a comprehensive study on OPD by @TsinghuaNLP (OpenBMB member) alongside researchers from ShanghaiTech, UIUC, and RUC. This paper systematically unpacks the phenomenology, mechanism, and practical recipes behind successful On-Policy Distillation.
🤗 Paper: huggingface.co/papers/2604.1…
📄 arXiv: arxiv.org/abs/2604.13016
💻 Code: github.com/thunlp/OPD
Why it matters:
1️⃣ The Two Rules of Success: A high-scoring teacher isn't a magic bullet. OPD success depends strictly on two factors: Thinking Pattern Consistency (student and teacher must share compatible reasoning styles) and Information Gain (the teacher must offer truly new, out-of-distribution knowledge). 🧠
2️⃣ The Reverse Distillation Paradox: We tried using an ultra-strong teacher (R1-Distill-7B) to distill a strong RL-tuned student (JustRL-1.5B). Surprisingly, the student regressed to its pre-RL state! Why? Because OPD strictly mimics the teacher's thinking pattern, effectively overwriting the student's existing RL behaviors regardless of the final reward. 🔄
3️⃣ The Overlap Token Mechanism: Zooming into token-level dynamics, we found that successful OPD is driven by "overlap tokens." The model heavily optimizes these shared, high-probability regions, while non-overlapping tokens contribute almost zero useful gradients. 🔍
4️⃣ The Winning Recipe: How do we fix a "failed" OPD when the thinking patterns don't match? Our recipe: apply Supervised Fine-Tuning (SFT) on teacher rollouts before starting OPD. This elegantly bridges the thinking-pattern gap and dramatically raises the performance ceiling! 🛠️
5️⃣ The Token-Level Reward Mirage: OPD's token-level reward looks like a free lunch, but it isn't. Reward quality decays with trajectory depth. Instability originates at later tokens and propagates backward. Even failing teachers produce globally informative reward; the bottleneck is local optimization geometry, not signal quality. 📉
Stop guessing why your distillation failed and start aligning thinking patterns! Read the full paper to master LLM post-training. #AI #THUNLP #OpenBMB #LLM #ModelDistillation #ReinforcementLearning
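To make the "token-level reward" in the post concrete, here is a minimal sketch. It assumes one common formulation of on-policy distillation, in which the student samples the tokens and each sampled token is scored by the teacher-to-student log-probability ratio (a single-sample estimate of the negative reverse KL). The function names and toy logits are hypothetical, and the paper's exact objective may differ.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def opd_token_rewards(sampled_tokens, student_logits, teacher_logits):
    """Per-token reward on student-sampled tokens (assumed formulation):
    log p_teacher(token) - log p_student(token)."""
    rewards = []
    for tok, s_row, t_row in zip(sampled_tokens, student_logits, teacher_logits):
        rewards.append(log_softmax(t_row)[tok] - log_softmax(s_row)[tok])
    return rewards

def reverse_kl(student_row, teacher_row):
    """Exact KL(student || teacher) at a single position."""
    s_lp = log_softmax(student_row)
    t_lp = log_softmax(teacher_row)
    return sum(math.exp(s) * (s - t) for s, t in zip(s_lp, t_lp))

# Toy rollout: 3 positions, vocabulary of 4 tokens (all values hypothetical).
student_logits = [[2.0, 0.5, 0.1, -1.0], [0.0, 1.5, 0.2, 0.3], [1.0, 1.0, 1.0, 1.0]]
teacher_logits = [[2.0, 0.5, 0.1, -1.0], [0.0, 0.2, 2.5, 0.3], [3.0, 0.0, 0.0, 0.0]]
sampled_tokens = [0, 1, 0]  # tokens the student actually sampled

rewards = opd_token_rewards(sampled_tokens, student_logits, teacher_logits)
kls = [reverse_kl(s, t) for s, t in zip(student_logits, teacher_logits)]
```

A positive reward means the teacher gives the sampled token more probability than the student does. At the first position both models agree, so the reward and the reverse KL are both zero.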
The irony? Training on others' content was called innovation, but training on their model is called theft. The real debate is where we draw the line between learning, copying, and getting paid for it. 🤔 #AI #LLMs #ModelDistillation #AIethics
We've identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models.
🔥🎯 Tesla is recruiting AI Distillation Engineers to accelerate on-vehicle inference model deployment
1️⃣ Latest hiring update: Tesla has officially posted the role “AI Engineer, Reinforcement Learning & Distillation,” based in Palo Alto, California.
2️⃣ Role significance: This position focuses on “distillation”: compressing large training models into lightweight versions that can run directly on vehicle inference hardware. It signals Tesla’s push to strengthen on-car AI execution, not just cloud-based processing.
3️⃣ Underlying logic:
• On-vehicle inference faces tighter constraints than cloud training, including compute limits, latency, and power efficiency.
• Distillation transfers the “knowledge” of a large model into a smaller network suited for automotive hardware.
• The move aligns with Tesla’s broader strategy in autonomous driving, robotics, and in-car intelligence.
4️⃣ Strategic implications:
• Tesla is investing not only in training but also in “how to make AI run efficiently inside the car.”
• Stronger on-vehicle inference could reinforce Tesla’s moat in autonomy, robotics, and the future vehicle-edge AI ecosystem.
• For investors, this marks Tesla’s transition from an AI training powerhouse toward an AI deployment platform.
5️⃣ Risks & challenges:
• Compressing large models while preserving performance is technically demanding.
• While distillation helps, heavy reliance on specific hardware may leave Tesla exposed if competitors win on software-hardware integration.
• Expectations for Tesla’s autonomy and in-car intelligence remain high; slow progress could disappoint in the short term.
6️⃣ Key points to watch:
• Whether Tesla later discloses details on its on-vehicle inference framework or hardware specs.
• Compensation levels and talent-competition signals tied to this role (e.g., whether it triggers industry movement).
• Real-world use cases of distillation in autonomy, such as how a vehicle inherits capabilities from a large model but executes them with a smaller one.
🔍 Summary: Tesla’s move to hire “AI Distillation Engineers” is a clear indicator that its AI strategy is shifting toward large-scale deployment inside the vehicle. For investors tracking $TSLA or its broader AI ecosystem, this development deserves close attention.
📬 How long do you think it will take for Tesla’s distilled on-vehicle models to scale across the fleet? Share your view in the comments. #Tesla #AI #AutonomousDriving #MachineLearning #ModelDistillation
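The "knowledge transfer" the post describes is commonly implemented as a soft-target loss in the style of classic knowledge distillation: the student is trained to match the teacher's temperature-softened output distribution. The sketch below is a generic illustration in plain Python. The temperature, the logits, and the function names are assumptions, and nothing here reflects Tesla's actual pipeline.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (higher T gives a softer distribution)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(q))
        for p, q in zip(p_teacher, p_student)
        if p > 0
    )
    return kl * temperature ** 2

# Hypothetical logits for a 3-class output.
teacher_logits = [4.0, 1.0, 0.2]
close_student = [3.5, 1.2, 0.1]   # roughly agrees with the teacher
far_student = [0.1, 1.0, 4.0]     # ranks the classes in the opposite order

loss_close = distillation_loss(close_student, teacher_logits)
loss_far = distillation_loss(far_student, teacher_logits)
loss_same = distillation_loss(teacher_logits, teacher_logits)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge, which is the signal a smaller on-device network would be trained against.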
AI NEWSWIRE: DeepSeek Reveals Low AI Training Cost, Challenges US Giants
TECHNOLOGY NEWS: China's DeepSeek revealed its R1 AI model cost $294,000 to train, challenging US AI giants and intensifying global competition.
BIG DATA BREAKTHROUGH: This low-cost approach redefines AI development economics.
PEOPLE'S REPUBLIC OF CHINA: DeepSeek has reignited the global AI race.
MORE BANG FOR THE BUCK: DeepSeek's stunning revelation is that its R1 model cost just $294,000 to train. This figure, published in the academic journal Nature, stands in stark contrast to the “much more than $100 million” estimated by OpenAI CEO Sam Altman for foundational model training. It directly challenges the perceived dominance of US AI giants and is sending ripples through global tech markets. The Hangzhou-based developer’s disclosure, its first estimate of R1’s training costs, initially prompted global investors to dump tech stocks, fearing the new models could threaten leaders like Nvidia. DeepSeek stated R1 was trained for 80 hours on 512 Nvidia H800 chips. However, the company also acknowledged for the first time using more powerful A100 chips in preparatory development stages, a detail that fuels ongoing debate given US export controls on advanced AI chips to China. DeepSeek also responded to assertions that it “distilled” OpenAI’s models. While consistently defending distillation as a method for better, cheaper AI, the company now states its V3 model’s training data, sourced from crawled web pages, contained a “significant number of OpenAI-model-generated answers.” This, it explained, could lead to the base model indirectly acquiring knowledge, but was incidental, not intentional. This lower-cost approach, whether through efficient training or indirect knowledge acquisition, has profound implications. It suggests a potentially more accessible pathway to advanced AI, intensifying competition and forcing a re-evaluation of development economics in the rapidly evolving global AI landscape.
#AI #ArtificialIntelligence #DeepSeek #AINewswire #R1Model #AIRevolution #TechNews #ChinaTech #AICompetition #USAI #GlobalAI #LowCostAI #AITraining #BigData #TechBreakthrough #AIDevelopment #Nvidia #H800Chips #A100Chips #BusinessNews #AIChips #TechMarkets #AIInnovation #MachineLearning #DeepLearning #AIEconomics #TechStocks #GlobalTech #HangzhouTech #NatureJournal #OpenAI #SamAltman #AIInvestment #DataCrawling #WebData #ModelDistillation #AIEthics #TechCompetition #AIAdvancements #TrainingCosts #AIAccessibility #FutureOfAI #TechRace #AIlandscape #Innovation #TechDisruption #AIPower #GlobalInnovation
Heterogeneous Ensemble Enables a Universal Uncertainty Metric for Atomistic Foundation Models
1. A new universal uncertainty metric has been introduced for atomistic foundation models (uMLIPs), which provides a reliable measure of prediction errors without needing reference DFT calculations. This metric is based on a heterogeneous ensemble of existing uMLIPs, leveraging their diversity to quantify uncertainty effectively.
2. The metric shows a strong correlation with true prediction errors across diverse datasets, including metals, alloys, inorganic compounds, and complex materials. It can accurately identify high-risk configurations and filter out numerical noise, leading to improved accuracy in some cases compared to DFT reference labels.
3. An uncertainty-aware model distillation framework is proposed, which uses the metric to create system-specific potentials with significantly reduced computational cost. For tungsten (W), comparable accuracy to full-DFT training is achieved using only 4% of DFT labels, while for MoNbTaW alloys, no additional DFT calculations are required.
4. The study demonstrates that the uncertainty metric can guide data selection and fine-tuning strategies, enabling cost-efficient development of accurate interatomic potentials. This approach also facilitates the expansion of datasets and the construction of more reliable foundation models.
5. The framework is validated on a wide range of materials, including elemental tungsten and high-entropy alloys like MoNbTaW, showing its broad applicability. The results highlight the potential for the metric to enhance the safety and reliability of uMLIPs in critical applications.
📜 Paper: arxiv.org/abs/2507.21297v1 #MachineLearning #MaterialsScience #UncertaintyQuantification #AtomisticSimulations #ModelDistillation
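The general idea of using ensemble disagreement as an uncertainty signal, and spending DFT labels only where disagreement is high, can be sketched as follows. The post does not give the metric's formula, so this sketch substitutes a plain standard deviation across models. The energies, the threshold, and the function names are all hypothetical.

```python
import statistics

def ensemble_uncertainty(predictions):
    """Disagreement across models for one configuration (population std dev).
    Stand-in for the paper's metric, whose exact form is not given in the post."""
    return statistics.pstdev(predictions)

def select_for_dft(per_config_predictions, threshold):
    """Indices of configurations whose ensemble disagreement exceeds the threshold."""
    return [
        i for i, preds in enumerate(per_config_predictions)
        if ensemble_uncertainty(preds) > threshold
    ]

# Energies (eV/atom) from three hypothetical uMLIPs for four configurations.
per_config_predictions = [
    [-8.90, -8.91, -8.89],
    [-8.50, -8.10, -8.95],
    [-7.20, -7.21, -7.20],
    [-6.00, -6.60, -5.70],
]

# High-disagreement configurations would be sent to DFT for reference labels.
needs_dft = select_for_dft(per_config_predictions, threshold=0.05)

# Low-disagreement configurations reuse the ensemble mean as a distillation label.
distilled_labels = {
    i: statistics.fmean(preds)
    for i, preds in enumerate(per_config_predictions)
    if i not in needs_dft
}
```

With these toy numbers, only the two configurations where the models disagree are flagged for DFT, and the remaining two are labeled by the ensemble itself.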
AI NEWSWIRE: OpenAI Boosts Security After DeepSeek Model Copying Claims
OpenAI has reportedly enhanced its security protocols to safeguard against potential corporate espionage. The increased security measures were implemented following the release of a competing model by the Chinese startup DeepSeek in January. OpenAI alleges that DeepSeek improperly copied its models through “distillation” techniques. The new security measures include “information tenting” policies, which restrict employee access to sensitive algorithms and new products. During the development of OpenAI’s o1 model, only verified team members who had been briefed on the project were permitted to discuss it in shared office spaces. Furthermore, OpenAI now isolates proprietary technology within offline computer systems. Biometric access controls, such as fingerprint scanning, have been implemented for office areas. The company also maintains a “deny-by-default” internet policy, requiring explicit approval for all external connections. Physical security at data centers has been increased, and the company has expanded its cybersecurity personnel. These changes are reportedly driven by concerns regarding potential intellectual property theft by foreign entities.
#AI #OpenAI #DeepSeek #AISecurity #Cybersecurity #AIIntellectualProperty #ModelDistillation #AIEthics #CorporateEspionage #DataProtection #AIInnovation #TechSecurity #InformationTenting #BiometricSecurity #FingerprintScanning #OfflineSystems #DenyByDefault #DataCenterSecurity #CyberThreats #AICompetition #IntellectualProperty #AIResearch #TechNews #AIStartups #ChineseAI #USAI #TechRivalry #AIGovernance #DataPrivacy #SecurityProtocols #AIAdvancements #TechIndustry #CyberDefense #AITheft #ProprietaryTech #AIRevolution #TechEthics #CyberSec #AIDevelopment #SecurityMeasures #TechInnovation #DataBreaches #AIChallenges #TechPolicy #CyberRisks #IPProtection #TechCompetition #AIStrategy #DigitalSecurity #TechTrends
Big Moments from LlamaCon 2025
Meta made waves at #LlamaCon2025 with major announcements shaping the future of AI development:
🔹 A standalone Meta AI app with a social "Discover" feed - positioned to rival ChatGPT
🔹 Llama API enters free preview, making Llama models more accessible to devs
🔹 New safety tools: Llama Guard 4 (12B), LlamaFirewall, and Prompt Guard
🔹 Hardware collabs with Groq and Cerebras for lightning-fast inference
During a conversation between Mark Zuckerberg and Satya Nadella, some eye-opening stats and predictions surfaced:
🧠 20–30% of Microsoft’s code is now AI-generated
📈 Meta expects AI to handle 50% of its software development within a year
The buzz wasn’t just about speed - model distillation took center stage in the open source discussions. The ability to compress massive models into efficient, smaller versions (while preserving most of the intelligence) is being hailed as one of open source's most "magical" strengths. Imagine retaining 90 to 95% of a giant model’s power - optimized for your laptop or phone. This is AI progress at scale, and it's only just getting started.
#AI #Meta #Llama #OpenSource #DevTools #AIInfrastructure #LlamaAPI #LlamaCon #Groq #Cerebras #SoftwareDevelopment #ModelDistillation #GenAI #TechNews #Llamacon2025
19 Feb 2025
Make AI smarter, faster, and more efficient without the high cost & complexity! Model Distillation is doing just that. But how? 🤔 And what does it mean for your business? 👇 👉 Read More: zurl.co/rttgB #AI #ModelDistillation #Innovation #Opensource #VE3 #AImodels
6 Feb 2025
🔥 Build a GPT-like AI for just $50, no coding required 😱! (🔔 Please follow me for more exciting updates on cutting-edge AI and tech! 🎥 Continue watching the full video on my YouTube: youtube.com/shorts/KcJOmjenD… )
Researchers at Stanford and the University of Washington have developed S1, an AI model that rivals industry giants like OpenAI's GPT and DeepSeek's R1. Discover how model distillation and tools like Kiln AI are democratizing advanced AI development.
__________________________________________
Discover more in my latest blog: makayis.co/model-distillatio…
__________________________________________
Visit Our Website: Makayis.co
Revolutionize Your Business with Our Custom Chatbot 🤖 No matter your industry, our Custom Chatbot makes your business smarter, faster, and more efficient. Perfect for:
👉 Retail: Manage inquiries and suggest products seamlessly.
👉 Healthcare: Schedule appointments and handle patient questions.
👉 Hospitality: Simplify bookings and enhance customer satisfaction.
👉 E-commerce: Offer 24/7 support and tailored recommendations.
👉 Small Businesses: Automate tasks to save time.
👉 Education: Answer queries and streamline processes.
✅ Top Features:
👉 24/7 Customer Support: Never miss a query with round-the-clock assistance.
👉 Appointment Booking: Effortless scheduling for customers and clients.
👉 Order Management: Recommend products and manage customer orders with ease.
👉 Auto-Scheduling Promotions: Plan and send offers to customers via WhatsApp, email, or SMS, boosting engagement and sales.
🌟 Why Choose Us? Affordable, easy to set up, and designed to help mid-level businesses grow without breaking the bank.
👉 Learn More & Sign Up: chatbot.makayis.co/custom-ch…
__________________________________________
Let’s Build the Future Together! 🌟 Stay Connected with Us:
🐦 X (formerly Twitter): x.com/MAKAYIS2024
📸 Instagram: instagram.com/makayisai/f
🎥 YouTube: youtube.com/@MAKAYIS2024
Facebook: facebook.com/profile.php?id=…
👥 Reddit: reddit.com/user/makayis2024/
📌 Pinterest: es.pinterest.com/makayis2024…
🎶 Tiktok: tiktok.com/@makayis2024
__________________________________________
#AI #TechRevolution #ModelDistillation #S1AI #BigTech #MachineLearning #ArtificialIntelligence #getkiln #makayis
3 Dec 2024
🔊 I'm super excited about the new model distillation capability in Amazon Bedrock to easily transfer knowledge from a large, complex model to a smaller one. More details in @channyun's post! aws.amazon.com/blogs/aws/bui… via @awscloud #AWS #reInvent #ModelDistillation #AmazonBedrock
Meta's Llama, an open generative AI model, is available in multiple versions and platforms, offering diverse capabilities and tools for developers. #GenerativeAi #Llama #ModelDistillation haywaa.com/article/meta-llam…
๐Ÿ“ How does #ModelDistillation, #FineTuning & #RLHF come together for computer vision use cases? ๐Ÿ“Œ ๐Ÿ™Œ๐Ÿป Recently my colleague Rahul Sharma & I co-authored an end-to-end tutorial showing how easy it is for anyone to create a smaller, efficient computer vision model using a combination of model distillation and fine-tuning.
🎓 Knowledge Extraction: Distillation techniques help smaller models mimic the network, enabling offline evaluation. #ModelDistillation $TAO #bittensor
AutoDistill: An End-to-End Fully Automated Distillation Framework for Hardware-Efficient Large-Scale NLP Models | bit.ly/3H6o4uj #AI #ML #ArtificialIntelligence #MachineLearning #NLP #PretrainedModel #ModelDistillation #AutoML