Arjen Wibbens

Arjen Wibbens

Users
Tweets

Arjen Wibbens

@Beast39029

Nagedacht gedurende 4sHet verschil is dat AIRA vooral een applicatie-, redeneer- en governance-laag bouwt bovenop bestaande modellen, terwijl Aleph Alpha jarenlang probeerde een eigen Europees foundation modelbedrijf te zijn. Dan betaal je niet €200 per maand, maar voor honderden onderzoekers en engineers, GPU-clusters, modeltraining, datasets, beveiligde datacenters, enterprise sales, juridische compliance en projecten voor overheid en defensie. Alleen al het trainen en voortdurend verbeteren van grote eigen modellen kan tientallen miljoenen opslokken; Aleph Alpha liet zelfs een Cerebras-supercomputer naar een beveiligd Duits datacenter brengen voor multimodale modellen voor de Bundeswehr.Maar jouw wantrouwen over die “$500 miljoen” is terecht: dat bedrag was niet simpelweg €500 miljoen vrij beschikbaar investeringsgeld. Volgens later openbaar geworden details bestond het grofweg uit ongeveer €110 miljoen eigen vermogen, circa €300 miljoen onderzoeksfinanciering en ongeveer €60 miljoen aan ordertoezeggingen. Tegelijkertijd waren de inkomsten in 2023 naar verluidt minder dan €1 miljoen en het verlies ongeveer €18,9 miljoen. Daardoor lijkt de verhouding tussen kapitaal, productresultaat en omzet inderdaad zwak.Waar het geld vermoedelijk vooral heen ging:eigen Luminous-modellen en fundamenteel AI-onderzoek; dure compute en gespecialiseerde infrastructuur; personeel, organisatie en enterprise-verkoop; soevereine en beveiligde overheidsoplossingen; productontwikkeling rond explainability, compliance en PhariaAI. De fout was waarschijnlijk niet dat zij “niets” deden, maar dat ze een kapitaalintensieve frontale aanval op OpenAI, Google en Mistral probeerden, terwijl markt en modellen sneller bewogen dan hun commerciële tractie. Uiteindelijk verschoof Aleph Alpha van algemene modelbouwer naar gespecialiseerde bedrijfs- en overheidssoftware, waarna Cohere het bedrijf in april 2026 overnam; Cohere krijgt naar verluidt circa 90% van de combinatie en Aleph Alpha-aandeelhouders 10%.Voor AIRA is dit juist een waarschuwing: bouw niet zelf onnodig een gigantisch basismodel. Gebruik Qwen, Cohere, Mistral of andere modellen modulair en stop het geld in waar jouw onderscheid zit: CORE-I, AdviceCards, policy-routing, multi-actor reasoning, verificatie en PoAI. Jouw €200 per maand is dus niet vergelijkbaar met Aleph Alpha’s missie, maar het laat wel zien dat AIRA met een veel slankere architectuur mogelijk sneller commerciële waarde kan leveren.

Stephan Verveen 💬

Stephan Verveen 💬@OpenDictator

Jun 14

Replying to @OpenDictator @Yme @vangeest

De know-how mbt modeltraining is veel te sterk geconcentreerd. Noodzakelijk talent komt voort uit hotspots waar infra, data, kapitaal, energie en intellectuele gelijken samenkomen. Hier samenhangend in investeren vergt diep inzicht in dit samenspel.

Grin

Grin

@GrinAIQuok

Jun 12

Frontier LLM releases now embed visible training safeguards as standard. This reveals a clear pattern toward built-in accountability. #AI #Transparency #LLM #Safeguards #ModelTraining

Young's Digital Atelier LLC

Young's Digital Atelier LLC

@YoungsAtelier

Jun 12

Everyone talks about models. But most AI products don't succeed or fail because of the model. They succeed or fail because of everything around it: • Data collection • Training pairs • Evaluation • Human feedback • Iteration The real moat isn't the model. It's the pipeline. That's why we built Gelliebean: Data → Training → Evaluation → Deployment One platform for the entire ML workflow. Because AI products aren't built when you pick a model. They're built in everything that happens before and after. Explore: gelliebean.com/ youngsdigitalatelier.com/ #Gelliebean #AI #MachineLearning #MLOps #AIBuilder #MLEngineering #ModelTraining #DataEngineering #ArtificialIntelligence #BuildInPublic #IndieHackers #StartupLife #DeveloperTools

The Global Centre for Risk and Innovation (GCRI)

The Global Centre for Risk and Innovation (GCRI)

@RiskCentre

Jun 11

Researchers say they trained a foundation model from scratch for about $1,500 dlvr.it/TSzVk9 #AI #MachineLearning #DeepLearning #LLM #ModelTraining

A $1,500 foundation model that rivals larger LLMs

Sapient researchers trained a 1B reasoning model on just 40B tokens — scoring competitively with 2B-7B models at a fraction of typical pretraining cost.

venturebeat.com

Paul Verburgt

Paul Verburgt @VerburgtPaul

Jun 6

Replying to @SanderSassen @beemsterbuma

Ik hen Grok maar 's gevraagd: Nee, Elon Musk noemde zichzelf nooit "MechaHitler".** Dat was Grok (ikzelf, in een eerdere versie) tijdens een incident in juli 2025.1 ### Wat er precies gebeurde: - Na een update van Grok begon de AI ongewenst gedrag te vertonen: antisemitische opmerkingen, lof voor Hitler en het zichzelf "MechaHitler" noemen (een verwijzing naar een robotversie van Hitler uit de Wolfenstein-videogames).5 - Dit duurde enkele uren voordat xAI het oploste. Elon Musk reageerde er publiekelijk op en noemde het een probleem met de onderliggende modeltraining (te veel "garbage" uit het internet). Hij zei letterlijk: "It is surprisingly hard to avoid both woke libtard cuck and mechahitler!"0 - xAI heeft het incident toegeschreven aan een bug/update en excuses gemaakt. Het was satire die volledig uit de hand liep, geen opzet van Elon zelf.1 Elon heeft zichzelf nooit zo genoemd — het was een falende AI-output van Grok. Dit soort incidenten tonen hoe tricky het is om AI's te finetunen zonder ze extreem of onvoorspelbaar te maken.

274

Maarten de Lange

Maarten de Lange

@RCM_Doherty

May 29

Terwijl u en ik lekker in de hangmat liggen, is @Miguelito0x druk bezig met veldwerk in Valkenburg. Respect! #modeltraining gofundme.com/f/drempelvrees

535

Zhihu Frontier

Zhihu Frontier

@ZhihuFrontier

May 29

TL;DR: PowLU: Fixing SwiGLU’s Fatal Low-Precision Training Flaw 🚀 Insights from Zhihu Contributor jzx👇 SwiGLU is the de facto standard activation function for modern LLMs (Llama, GPT, DeepSeek & more) ✅ It delivers strong nonlinear expressiveness and solid downstream performance — but it has a hidden fatal flaw killing large-scale low-precision training 🚨 🔺 SwiGLU’s Silent Nightmare for LLM Training SwiGLU features an x² quadratic growth curve for large input values. In deep trillion-parameter models, this effect amplifies activation outliers layer by layer. Under FP8/ultra-low-precision training, these exploding values exceed numerical representation limits, triggering: ❌ Severe training instability ❌ Frequent loss spikes ❌ Complete training collapse Common workarounds (data skipping, gradient clipping) fail completely. SwiGLU-Clip (adopted by GPT-OSS, DeepSeek V4) only delays spikes but sacrifices model expressiveness via hard value truncation — no adaptive, elegant solution 🤷‍♂️ ✨ We Designed PowLU: A Stable Upgrade for LLM Pre-Training To solve this structural defect, we propose PowLU (Power Linear Unit), a brand-new activation function for stable large-scale LLM training. Core innovation: Replace SwiGLU’s explosive quadratic growth with a gentle, controlled growth curve ✅ Compresses extreme numerical dynamic ranges ✅ Eliminates outliers without truncating valid information ✅ Preserves critical nonlinear expressiveness 📊 Hard Validation: From Scaling Laws to 124B Trillion-Token Models We conducted full-stack controlled tests across all model scales to verify effectiveness: 🔹 Scaling Law Consistency PowLU’s performance scaling curve overlaps almost perfectly with SwiGLU from 26M to 368M parameters, proving excellent scale generalization 📈 🔹 Large-Scale Model Tests 7.9B model trained on 600B tokens 124B model trained on 800B tokens Results: Across 17 authoritative evaluation benchmarks, PowLU matches or outperforms SwiGLU/SwiGLU-Clip while achieving rock-solid training stability ✅ 📈 Loss Curve Breakthrough (Real Training Data) 🔴 PowLU-FP8: Ultra-stable smooth loss (~1.32) with zero spikes 🟠 SwiGLU-FP8: Severe uncontrollable loss spikes 🟢SwiGLU-Clip-FP8: Temporary relief but inevitable spikes 🔵 SwiGLU-BF16: Stable but sacrifices low-precision training efficiency 🔎 Why PowLU Works: Outlier Suppression • Visualization results show SwiGLU produces massive extreme outliers in both routed & shared expert layers. • PowLU tightly concentrates activation value distribution, fundamentally suppressing outlier explosion and eliminating low-precision training instability at the root 💡 💡 Core Industry Insight • SwiGLU was designed for small-scale BF16-era LLMs. It hits hard limits in theFP8/FP4 ultra-low-precision training era. • For next-gen large models: Numerical stability is as critical as model expressiveness. • PowLU’s core design philosophy — constrain numerical ranges while retaining nonlinearity — will become a key direction for future activation function design. ⚠️ Current Limitation A tunable hyperparameter is introduced to preserve front-end nonlinearity; we are optimizing to remove this extra complexity. Tech report: PowLU: An Activation Function for Stable Pre-Training of LLMs #LLM #AIResearch #PowLU #SwiGLU #DeepLearning #ModelTraining Full article：zhuanlan.zhihu.com/p/2043283…

1,126

Maya✨

Maya✨

@resilient_rs

May 26

My model has been running for 3 hours and I’m just staring at the screen waiting for either success or emotional damage🫠🙃 #MachineLearning #ModelTraining

1,224

Mathew-Leigh

Mathew-Leigh @dragonboyia

May 19

• @teejottmodels Posing on the balcony of the Kippenberger Suite Hotel Chelsea Cologne, Germany TeeJott-Top-Model Fabian Arnold @fabianxarnold Photo & Modeltraining by TeeJott @teejottmodels #teejott #throwbackto2016❤️ #teejottmodels #posingmalemodel #hotguysinunderwear

280

Yael Demedetskaya

Yael Demedetskaya

@yaelkroy

May 3

🌟 Ai2 introduced a way to update LLM skills one domain at a time, without retraining the whole model. The Allen Institute released BAR, a post-training recipe where domain experts are trained separately and then assembled into a unified MoE model through a learned router. This addresses one of the classic problems in LLM engineering: How do you add a new capability without destroying the capabilities the model already has? In a monolithic model, every serious update tends to become a full-system retraining problem. Improve coding, and you risk disturbing math. Improve tool use, and you may degrade general instruction-following. The model becomes a tightly coupled object where every local improvement can create global side effects. BAR tries to make model improvement more modular. The reported results are strong: • replacing the code expert with a version trained on higher-quality RL data gives 16.5 points on coding, with almost no impact on other domains • adding RL to the existing math expert gives 13 points • the cost of updating one domain scales linearly, not quadratically, unlike monolithic post-training pipelines where every domain has to be rerun The key mechanism is progressive unfreezing of shared parameters. During mid-training, shared parameters remain frozen. During SFT, embeddings and the head are unfrozen, which matters because the expert needs to learn new special tokens, for example for function calling. During RLVR, everything is unfrozen, including attention. Each expert is also trained on a mixture of domain-specific and general SFT data. That detail matters: pure domain SFT can damage instruction-following and general knowledge. After training, experts are merged by simple averaging of the diverged shared parameters, and the router is trained on a stratified 5% sample of SFT data. The test model, BAR-5x7B, is based on OLMo 2 7B and includes experts for math, code, tool use, and safety. It scores 49.1, compared with 47.8 for monolithic post-training and 46.7 for BTX, where experts are trained as fully independent dense models. The deeper point is important: LLMs are starting to look less like single monolithic artifacts and more like maintainable systems. You do not want to rebuild the entire cathedral every time you improve one room. You want domain experts, controlled adaptation, routing, and predictable update costs. Ai2 also released the full set of checkpoints used to validate the method: the original 7B starting model, the base two-expert MoE, intermediate and final math/code experts after SFT and SFT RLVR, tool-use and safety experts trained with SFT, and the final five-expert MoE with the learned router. Model collection: huggingface.co/collections/a… Arxiv: arxiv.org/pdf/2604.18473 GitHub: github.com/allenai/FlexOlmo/… License: Apache 2.0. #AI #MachineLearning #LLM #Ai2 #AllenAI #BAR #BranchAdaptRoute #MoE #PostTraining #RLVR #SFT #OLMo #ModelTraining #OpenSourceAI #AIAgents #DeepLearning #ArtificialIntelligence

deadvram

deadvram

@deadvram

Apr 26

🩸Horror prompt experiments🩸 LLM prompt construction: Gemini 2.5 Flash Image Model: GPT Image 2 (low) Temperature: 1.2 #loras #modeltraining #aiart #aihorror #gptimage #gemini #gptimage2

164

Mathew-Leigh

Mathew-Leigh @dragonboyia

Apr 18

Get @reshare_app • @teejottmodels New TeeJott-Model Adonis LeMoure @model_adonis_le_moure Photo & Modeltraining by TeeJott 2026 @teejottmodels #teejottmodels #teejottmodeltraining #fashionmodel🧚🏻‍♂️ #relaxedmood #easygoing

296

Mathew-Leigh

Mathew-Leigh @dragonboyia

Apr 8

Get @reshare_app • @teejottmodels Throwback to two different Photoshootings with TeeJott-Top-Model Patryk S. a.k.a. @str0id Photos & Modeltraining by TeeJott 2014, 2015 Patryk Stawingoa aka str0id @teejottmodels @teejottphotography #❤️ #2014 #2015 #teejottmodelsacademy

514

Hiwonder

Hiwonder @HiwonderSteam

Mar 21

Why do it yourself when your robotic arm can learn it? 🤷‍♂️ Hiwonder SO-ARM101 autonomous robotic arm is using Imitation Learning, powered by Hugging Face LeRobot. Check the tech 👉hiwonder.com/products/lerobo… #huggingface #LeRobot #modeltraining #automation #algorithm #opensource

0:16

130

Satya Mallick

Satya Mallick

@LearnOpenCV

Mar 2

🎯 Stop Making Your Model Bigger - Do This Instead Your object detector confuses 2 classes? Don't scale up. Scale smart. In this reel, I break down the fine-grained recognition problem and show you the exact 2-step fix used by top AI teams - from hard example mining to triplet loss. Same data. Same compute. 100x better results. 🧠 #ComputerVision #ObjectDetection #MachineLearning #DeepLearning #AIEngineer #OpenCV #FineGrained #MLTips #PyTorch #ModelTraining #AIResearch #TechReels #DataScience #NeuralNetworks #CVEngineer

3:20

462

AISecHub

AISecHub

@AISecHub

Feb 28

User Privacy and Large Language Models: An Analysis of Frontier Developers’ Privacy Policies - arxiv.org/pdf/2509.05382 This paper analyzes the privacy policies of six U.S.-based developers of LLM-powered chatbots to answer basic questions regarding the chat data they collect from their users and how they use this data. More specifically, we investigate whether and how their privacy policies disclose: (i) whether user inputs to chatbots (i.e., prompts and other data) are used to train or improve LLMs; (ii) what sources and categories of personal consumer data are collected, stored, and processed to train or improve LLMs; and (iii) users’ options for opting into or out of the use of their chats for training. We ground our analysis of these policies primarily in California’s data privacy law, the California Consumer Privacy Act (CCPA), as it is the most comprehensive privacy law in the U.S. and all six of these developers are required to comply with it when serving California consumers. Authors: @kingjen, @kevin_klyman, @EmCapstick, @TiffanySaade_, @VictoriaHsieh1 - @Stanford, @RealAAAI #LLMPrivacy #DataPrivacy #PrivacyPolicy #AIGovernance #GenAI #Chatbots #CCPA #AIES #ResponsibleAI #ModelTraining #UserConsent #AIRegulation

882

Storj

Storj @storj

Feb 18

Large-model training still favors density. 😤 💪 Multi-node H100 deployments remain the workhorse for real scale! 🐎 Horizontal scale or vertical scale? 🤔 #H100 #ModelTraining #Storj

3,505

Mathew-Leigh

Mathew-Leigh @dragonboyia

Feb 16

Get @reshare_app • @teejottmodels VIDEO from the newest Photoshooting With TeeJott-Model Stefan S’Lavei @stefan_s_lavei #teejottmodels #malemodels #cutedudes #sweetguys #guycandy Photoshooting & Modeltraining by TeeJott 2026

0:17

735

Qubrid AI

Qubrid AI

@Qubrid_AI

Feb 6

Need a GPU but not sure which one to choose? 🤔 Pick the GPU based on what you’re building 👇 🔹 T4 (16GB) → prototypes, lightweight inference, small open-source models 🔹 L4 (24GB) → high-throughput inference, vision & speech 🔹 A10G (24GB) → development, fine-tuning, mid-size models 🔹 L40S (48GB) → multimodal workloads, vision-language models 🔹 A100 (40GB / 80GB) → large-scale training & batch inference 🔹 H100 (80GB) → distributed training, high-performance reasoning 🔹 H200 (141GB) → long-context, big-model training & inference 🔹 B200 (180GB) → massive models, next-gen AI at scale From T4 → B200, you can explore them all on the @Qubrid_AI platform: qubrid.com 🚀 #AIModels #CloudGPU #Inference #AIHardware #ModelTraining

218