Enkay❄️

Enkay❄️

Users
Tweets

Enkay❄️

@enkaygift

Mar 19

Some deeper builder‑centric developments worth noticing. @0G_labs continues to expand its ecosystem & tooling - RPC node access via global infrastructure providers means AI apps can hit production faster on their AI‑native chain. DGrid AI’s open‑source repos like openbench & SDKs are getting updates, hinting at deeper support for model evaluation and integration tooling beyond just inference routing. @permacastapp /Arweave is increasingly talked about as the foundation for permanent content and memory layers in Web3, especially for governance, media, and AI data. And LightLink multi‑chain LL token now supports faster transfers and DAO governance, laying the groundwork for a more community‑driven future.

Enkay❄️

@enkaygift

Mar 19

Gm CT There’s real progress at the intersection of blockchain and AI lately. @0G_labs isn’t just live, its Aristotle Mainnet launched with 100 ecosystem partners spanning wallets, oracles, cloud compute, and infrastructure providers, putting verifiable, decentralized AI workloads into real use. DGrid AI’s GitHub activity shows active development on SDKs and evaluation tooling, signaling they’re building core developer infrastructure (not just a gateway). @permacastapp Arweave are increasingly referenced together as the “digital memory layer” of Web3, content that can truly outlive platforms and stay accessible forever. And LightLink just completed its LL token migration to an OFT standard, unlocking cross‑chain governance and faster transfers across Ethereum, LightLink, and Base as a multi‑chain token.

194

Ivan Fioravanti ᯅ

Ivan Fioravanti ᯅ

@ivanfioravanti

Mar 16

Link to the mlx-openbench branch is here: github.com/ivanfioravanti/ml…

GitHub - ivanfioravanti/mlx-openbench: Provider-agnostic, open-source evaluation infrastructure for...

Provider-agnostic, open-source evaluation infrastructure for language models - ivanfioravanti/mlx-openbench

github.com

898

Ivan Fioravanti ᯅ

Ivan Fioravanti ᯅ

@ivanfioravanti

Mar 16

My for mlx-openbench is working like a charm and crunching evals like crazy. My M3 Ultra are on fire! 🔥 Here testing Qwen3.5-35B-A3B-REAP-pile10k-15p-MLX-q8 by @dfi created with reap-mlx by @0xSero Don't ask me how this is possible, but REAP model is working better on evals 👀

9,448

Ivan Fioravanti ᯅ

Ivan Fioravanti ᯅ

@ivanfioravanti

Mar 15

I like OpenBench! I created 3 weeks ago a PR to integrate MLX as provider but it was still pending there, so I decided to close it and continue on my own fork 🤷🏻‍♂️ I renamed it in mlx-openbench and added a new mlxlocal provider that removes the need of having a separate mlx_lm.server running and uses Python API directly.

0:28

6,199

Ivan Fioravanti ᯅ

Ivan Fioravanti ᯅ

@ivanfioravanti

Mar 14

Replying to @bnjmn_marie

Still unclear, I got these results using OpenBench, but it seems mlx_lm.evaluate gives different results, I'm doing a round of tests there too. In any case 4bit MLX has some issues with qwen3.5 for sure. GGUF performs slower but better.

518

Ivan Fioravanti ᯅ

Ivan Fioravanti ᯅ

@ivanfioravanti

Mar 13

It took so much time running these evals with OpenBench... I will try to combine OpenBench and mlx_lm.evaluate!

1,264

David Hendrickson

David Hendrickson

@TeksEdge

Mar 11

🦞 Clawdbot News: New model dropped, and your Clawdbot doesn't know about it. You need to pay attention to MiroThinker-H1! It is designed for Agentic AI (Task driving AI) and Deep research. 👀It Beats Claude 4.6 on OpenBench! 👀It Beats Claude 4.6 on BrowseCompe-ZH 👀It Beats Claude 4.5 on FrontierScience -Olympiad

ALT https://github.com/MiroMindAI/MiroThinker

4,522

Ivan Fioravanti ᯅ

Ivan Fioravanti ᯅ

@ivanfioravanti

Mar 9

GSM8K eval in progress on two M3 Ultra to compare MLX 4bit and MLX ParoQuant 4bit using Openbench. Let's hope this 4bit quantization is truly better than standard one!

Ivan Fioravanti ᯅ

@ivanfioravanti

Mar 9

x.com/i/article/203100268191…

5,075

Ivan Fioravanti ᯅ

Ivan Fioravanti ᯅ

@ivanfioravanti

Mar 9

x.com/i/article/203100268191…

12,847

Ivan Fioravanti ᯅ

Ivan Fioravanti ᯅ

@ivanfioravanti

Feb 21

PR for a native OpenBench MLX provider created! 🚀github.com/groq/openbench/pu…

Add MLX provider with local defaults and stats metadata by ivanfioravanti · Pull Request #347 ·...

Summary Additional MLX Provider added What are you adding? Bug fix (non-breaking change which fixes an issue) New benchmark/evaluation New model provider CLI enhancement Performance improvem...

github.com

Ivan Fioravanti ᯅ

@ivanfioravanti

Feb 21

MLX: you can use OpenBench from groq with mlx_lm server and leverage parallel and batch inference easily! Just use vllm as provider and pass a dummy api-key. I bet a specific provider able to get stats from mlx server is easy to do! Let me try. Command to execute mmlu below.

2:18

1,816

Ivan Fioravanti ᯅ

Ivan Fioravanti ᯅ

@ivanfioravanti

Feb 21

2:18

5,510

Zach Mueller

Zach Mueller

@TheZachMueller

Feb 21

Replying to @ivanfioravanti

Gemini is... a choice. Also why not just OpenBench 👀

399

Florian Brand

Florian Brand

@xeophon

Jan 16

Replying to @fujikanaeda

i was some and i mentioned this being a possibility openbench formats the prompt like this, so you have either: A)<whitespace><solution> or A)<whitespace><whitespace><solution> and now it becomes weirder in terms of impact. lm-harness also has to do something like that

Florian Brand

Florian Brand

@xeophon

Jan 15

Replying to @xeophon @fujikanaeda

The thing you might be missing is that some frameworks (like openbench) might strip the whitespace But insane find

448

Love Web3 World

Love Web3 World

@WebThreeAI

Jan 14

How effectively do AI models grasp real-world spatial environments? ETH Zurich, Microsoft Research, and UCAS researchers introduced OpenBench, an outdoor video benchmark with precise 3D data, to evaluate spatial reasoning in MLLMs. Findings: Models use language cues effectively indoors but struggle in dynamic open-world settings, highlighting gaps in visual-spatial intelligence. Paper: arxiv.org/abs/2512.19683 Project: mingrui-wu.github.io/osi-ben… Report: mp.weixin.qq.com/s/vLpQQGvqh… #AI

机器之心 JIQIZHIXIN

机器之心 JIQIZHIXIN

@jiqizhixin

Jan 14

How well can AI models truly understand the space around us? Researchers from ETH Zurich, Microsoft Research, and UCAS reveal a major gap. They created OpenBench, a new outdoor video benchmark with precise 3D data, to test spatial reasoning. It shows that models rely on language guesses for indoor questions, but this fails completely in the complex, dynamic open world. Performance plummets, proving today's MLLMs lack genuine visual, grounded spatial intelligence. From Indoor to Open World: Revealing the Spatial Reasoning Gap in MLLMs Paper: arxiv.org/abs/2512.19683 Project: mingrui-wu.github.io/osi-ben… Our report: mp.weixin.qq.com/s/vLpQQGvqh… 📬 #PapersAccepted by Jiqizhixin

1,647

Zach Mueller

Zach Mueller

@TheZachMueller

Jan 10

E.g. despite openbench showing magical throughput (and vllm too), checking vllm logs I get 8-14k tok/s when running the benchmarks. And also when running OpenBench itself, it’s much slower

391

Florian Brand

Florian Brand

@xeophon

Jan 10

Replying to @TheZachMueller @llm_wizard

we love openbench at this house as well

Florian Brand

Florian Brand

@xeophon

Jan 5

Replying to @aarush @nvidia

congrats!!! nvidia/openbench 👀

9,070

Fangjinhua Wang

Fangjinhua Wang @FangjinhuaWang

25 Dec 2025

Introducing OpenBench, an accurate outdoor benchmark for evaluating spatial understanding ability of Multimodal Large Language Models (MLLMs). Paper: arxiv.org/abs/2512.19683 Project page: harmlesssr.github.io/openben…

142