Filter
Exclude
Time range
-
Near
Some deeper builder‑centric developments worth noticing. @0G_labs continues to expand its ecosystem & tooling - RPC node access via global infrastructure providers means AI apps can hit production faster on their AI‑native chain. DGrid AI’s open‑source repos like openbench & SDKs are getting updates, hinting at deeper support for model evaluation and integration tooling beyond just inference routing. @permacastapp /Arweave is increasingly talked about as the foundation for permanent content and memory layers in Web3, especially for governance, media, and AI data. And LightLink multi‑chain LL token now supports faster transfers and DAO governance, laying the groundwork for a more community‑driven future.
Gm CT There’s real progress at the intersection of blockchain and AI lately. @0G_labs isn’t just live, its Aristotle Mainnet launched with 100 ecosystem partners spanning wallets, oracles, cloud compute, and infrastructure providers, putting verifiable, decentralized AI workloads into real use. DGrid AI’s GitHub activity shows active development on SDKs and evaluation tooling, signaling they’re building core developer infrastructure (not just a gateway). @permacastapp Arweave are increasingly referenced together as the “digital memory layer” of Web3, content that can truly outlive platforms and stay accessible forever. And LightLink just completed its LL token migration to an OFT standard, unlocking cross‑chain governance and faster transfers across Ethereum, LightLink, and Base as a multi‑chain token.
21
20
194
My for mlx-openbench is working like a charm and crunching evals like crazy. My M3 Ultra are on fire! 🔥 Here testing Qwen3.5-35B-A3B-REAP-pile10k-15p-MLX-q8 by @dfi created with reap-mlx by @0xSero Don't ask me how this is possible, but REAP model is working better on evals 👀
8
2
62
9,448
I like OpenBench! I created 3 weeks ago a PR to integrate MLX as provider but it was still pending there, so I decided to close it and continue on my own fork 🤷🏻‍♂️ I renamed it in mlx-openbench and added a new mlxlocal provider that removes the need of having a separate mlx_lm.server running and uses Python API directly.
6
5
61
6,199
Replying to @bnjmn_marie
Still unclear, I got these results using OpenBench, but it seems mlx_lm.evaluate gives different results, I'm doing a round of tests there too. In any case 4bit MLX has some issues with qwen3.5 for sure. GGUF performs slower but better.
5
518
It took so much time running these evals with OpenBench... I will try to combine OpenBench and mlx_lm.evaluate!
8
1,264
🦞 Clawdbot News: New model dropped, and your Clawdbot doesn't know about it. You need to pay attention to MiroThinker-H1! It is designed for Agentic AI (Task driving AI) and Deep research. 👀It Beats Claude 4.6 on OpenBench! 👀It Beats Claude 4.6 on BrowseCompe-ZH 👀It Beats Claude 4.5 on FrontierScience -Olympiad
7
1
31
4,522
GSM8K eval in progress on two M3 Ultra to compare MLX 4bit and MLX ParoQuant 4bit using Openbench. Let's hope this 4bit quantization is truly better than standard one!
4
1
48
5,075
PR for a native OpenBench MLX provider created! 🚀github.com/groq/openbench/pu…
MLX: you can use OpenBench from groq with mlx_lm server and leverage parallel and batch inference easily! Just use vllm as provider and pass a dummy api-key. I bet a specific provider able to get stats from mlx server is easy to do! Let me try. Command to execute mmlu below.
1
1
12
1,816
MLX: you can use OpenBench from groq with mlx_lm server and leverage parallel and batch inference easily! Just use vllm as provider and pass a dummy api-key. I bet a specific provider able to get stats from mlx server is easy to do! Let me try. Command to execute mmlu below.
3
4
36
5,510
Replying to @ivanfioravanti
Gemini is... a choice. Also why not just OpenBench 👀
2
2
399
Replying to @fujikanaeda
i was some and i mentioned this being a possibility openbench formats the prompt like this, so you have either: A)<whitespace><solution> or A)<whitespace><whitespace><solution> and now it becomes weirder in terms of impact. lm-harness also has to do something like that
1
3
46
The thing you might be missing is that some frameworks (like openbench) might strip the whitespace But insane find
9
448
How effectively do AI models grasp real-world spatial environments? ETH Zurich, Microsoft Research, and UCAS researchers introduced OpenBench, an outdoor video benchmark with precise 3D data, to evaluate spatial reasoning in MLLMs. Findings: Models use language cues effectively indoors but struggle in dynamic open-world settings, highlighting gaps in visual-spatial intelligence. Paper: arxiv.org/abs/2512.19683 Project: mingrui-wu.github.io/osi-ben… Report: mp.weixin.qq.com/s/vLpQQGvqh… #AI
1
1
43
How well can AI models truly understand the space around us? Researchers from ETH Zurich, Microsoft Research, and UCAS reveal a major gap. They created OpenBench, a new outdoor video benchmark with precise 3D data, to test spatial reasoning. It shows that models rely on language guesses for indoor questions, but this fails completely in the complex, dynamic open world. Performance plummets, proving today's MLLMs lack genuine visual, grounded spatial intelligence. From Indoor to Open World: Revealing the Spatial Reasoning Gap in MLLMs Paper: arxiv.org/abs/2512.19683 Project: mingrui-wu.github.io/osi-ben… Our report: mp.weixin.qq.com/s/vLpQQGvqh… 📬 #PapersAccepted by Jiqizhixin
3
30
1,647
E.g. despite openbench showing magical throughput (and vllm too), checking vllm logs I get 8-14k tok/s when running the benchmarks. And also when running OpenBench itself, it’s much slower
1
1
4
391
we love openbench at this house as well
2
53
Replying to @aarush @nvidia
congrats!!! nvidia/openbench 👀
1
21
9,070
Introducing OpenBench, an accurate outdoor benchmark for evaluating spatial understanding ability of Multimodal Large Language Models (MLLMs). Paper: arxiv.org/abs/2512.19683 Project page: harmlesssr.github.io/openben…
1
4
142