MLCommons

Tweets

MLCommons @MLCommons

Jun 9

The security world's "find it → patch it → disclose it" model doesn't work for AI. You can't patch a released open-weight model. The weights are already out there — forever. MLCommons is building the disclosure standard AI evaluation actually needs. bit.ly/43t8R3t

MLCommons @MLCommons

Jun 2

AI systems co-design is too fragmented. Enter MLCommons Chakra (#MLSys2026): an open execution trace ecosystem to bridge software & hardware without exposing IP. Native in @PyTorch, NVIDIA NeMo, & vLLM. bit.ly/4vkYZEP

MLCommons @MLCommons

Jun 1

Meet GeoCroissant. Built on MLCommons Croissant, it adds Earth observation-specific metadata—from coordinate systems to spatial resolution—to give you better traceability and more reproducible workflows for agentic AI pipelines. bit.ly/3PTLywz

MLCommons @MLCommons

May 19

Introducing the 2026 @MLCommons Rising Stars! 🌟 We’ve selected 39 outstanding early-career researchers from 26 global institutions who are shaping the future of ML systems, hardware-software co-design, and trustworthy AI. Meet the cohort: bit.ly/3Ru3ONl #AI #MLCommons

MLCommons @MLCommons

May 19

The median AI benchmark longevity score is 5/100. AILuminate scored 75—but even that degrades over time. To fix this, the @MLCommons AIRR team built the Continuous Prompt Stewardship System to keep risk evaluation fresh and reliable. bit.ly/3On4jrz

MLCommons @MLCommons

May 18

What does AI reliability actually require? It comes down to consistently following the right behavioral rules—even under adversarial attack. Meet the AI Reliability Map to guide pre-deployment testing. Explore the framework: bit.ly/4mG7erO #AIReliability #AI

MLCommons @MLCommons

May 14

Do tools like OpenClaw signal a turning point for mainstream AI adoption? MLCommons' Dave Graham debated that and more on the Utilizing AI podcast. What do you think? bit.ly/4uJj4Va #AgenticAI #AI

The Future of Agentic AI: Opportunities, Risks, and Society |...

Email your questions, comments, or topics to the podcast: Utilizing...

youtube.com

MLCommons @MLCommons

May 14

MLPerf Training v6.0 has added GPT-OSS 20B. With 21B total parameters (but only 3.6B active per token), this new sparse MoE pretraining benchmark is designed specifically for accessibility—it can run on a single 8-GPU node. bit.ly/4noRr14

MLCommons @MLCommons

May 13

Mixture-of-Experts (MoE) architectures like DeepSeek-V3 are the new standard for scaling frontier LLMs. Now, that architecture is part of MLPerf Training v6.0. bit.ly/3QSteEj

MLCommons @MLCommons

May 13

AI Risk and Reliability certification shouldn't be a self-assessment. That's the premise behind the AILuminate Global Assurance Program (GAP). GAP gives organizations an independent path to certify that their AI systems meet established safety standards. bit.ly/4kIS18x

MLCommons @MLCommons

May 13

MLPerf Endpoints: decoupled client, any endpoint, zero-effort integration. Cloud or bare-metal — evaluated equally. Built for API-first GenAI. bit.ly/3Pjx34u #MLPerf

MLCommons @MLCommons

May 12

Great to see Microsoft highlighting the need for global collaboration on AI safety testing—and shouting out the MLCommons community’s ongoing work to expand the AILuminate benchmarks for multilingual and multimodal testing. bit.ly/3RdYFZG

Advancing AI evaluation with the Center for AI Standards and Innovation (US) and the AI Security...

Today, Microsoft is announcing new agreements with the Center for AI Standards and Innovation (CAISI) in the US and the AI Security Institute (AISI) in the UK to advance the science of AI testing and...

blogs.microsoft.com

MLCommons @MLCommons

May 12

The New Wave of AI in Healthcare 2026 symposium kicks off today in NYC! On 5/13 at 10:50 AM, MLCommons' Andrew Gruen, PhD, will take the stage. If you're attending, don't miss this conversation on trust, accountability, and AI validation in medicine. lnkd.in/efz2t-Ja

MLCommons @MLCommons

May 12

AI software optimization is now moving faster than hardware cycles. To capture these rapid gains, MLPerf is shifting to a rolling submission cadence. David Kanter explains why this speed matters for enterprise buyers via Nutanix: bit.ly/3R24FVt #MLPerf #AI

MLCommons’ AI performance benchmarking evolves to help IT teams make wise purchase decisions....

MLCommons cofounder David Kanter explains how the MLPerf benchmark has been overhauled to measure AI performance via API endpoints, reflecting the shift toward rented and hybrid AI infrastructure.

nutanix.com

MLCommons @MLCommons

May 11

Submissions for MLPerf Training v6.0 are open! This round brings updates, including the introduction of large-scale MoE pretraining architectures. Whether you are benchmarking on a single 8-GPU node or a massive cluster, we want your results in this round. bit.ly/4uG3vNS

MLCommons @MLCommons

May 11

We're thrilled to welcome @flwrlabs to MLCommons to help shape standards for federated AI at scale. First up: MedPerf is integrating with Flower, enabling researchers to run federated clinical AI studies without moving sensitive patient data. More: bit.ly/4nt1x0T

MLCommons @MLCommons

May 8

Measuring today’s production workloads is getting harder. The Inference working group stepped up by adding GPT-OSS 120B, DeepSeek-R1, and our first text-to-video generation benchmark. mlcommons.org/2026/04/mlperf…

MLCommons @MLCommons

May 7

MoE benchmarking doesn't require a supercomputer. MLPerf Training v6.0 introduces GPT-OSS 20B: a sparse Mixture-of-Experts pretraining benchmark that can run on a single 8-GPU node. See how the task force engineered away statistical variance (CV < 5%): bit.ly/3QLwvVU

MLCommons @MLCommons

May 5

Mixture-of-Experts (MoE) is coming to MLPerf Training v6.0. The new DeepSeek-V3 large-scale pretraining benchmark captures critical innovations like MLA, fine-grained expert segmentation, and MTP at production scale (671B parameters). Technical details: bit.ly/49bRabO

MLCommons @MLCommons

Apr 30

Security theater vs. rigorous AI benchmarking - the difference is methodology. AILuminate Jailbreak v0.7: a mechanism-first taxonomy for single-turn jailbreak attacks. Defensible. Reproducible. Auditable. mlcommons.org/2026/02/jailbr… #AILuminate #AISecurity
