Dev Sukhendu

@devsukhendu

GLM-5.2 vs GLM-5.1: what changed.
• Context: 1M vs 203K (~5×)
• Max output: 131K vs 64K
• Pricing: $18/month (Lite) vs $18/month (Lite); same plan tiers
• Effort presets: High/Max vs reasoning variants
• Open source: MIT next week vs MIT available now for 5.1
No independent benchmarks for 5.2 yet.
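
A quick Python check of the ratios in this post. It assumes "1M", "203K" and "64K" are decimal token counts (1,000,000, 203,000 and 64,000); the post doesn't say, and the 131,072 figure comes from the Coding Plan post.

```python
# Ratio check for the GLM-5.2 vs GLM-5.1 figures quoted in the post.
# Assumption: "1M", "203K" and "64K" are decimal token counts.
GLM_52 = {"context": 1_000_000, "max_output": 131_072}
GLM_51 = {"context": 203_000, "max_output": 64_000}


def ratio(new: dict, old: dict, key: str) -> float:
    """How many times larger new[key] is than old[key]."""
    return new[key] / old[key]


context_x = ratio(GLM_52, GLM_51, "context")
output_x = ratio(GLM_52, GLM_51, "max_output")
print(f"context: {context_x:.1f}x, max output: {output_x:.1f}x")
```

Under these assumptions the context gain works out to about 4.9×, which rounds to the 5× in the post, and max output roughly doubles.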

Dev Sukhendu

@devsukhendu

GLM-5.2 is live on all GLM Coding Plan tiers:
• Lite: ~$18/month, ~400 prompts/week
• Pro: ~2,000 prompts/week
• Max: ~8,000 prompts/week
• Team: seat-based
Model IDs: glm-5.2 (standard), glm-5.2[1m] (1M context). Max output tokens: 131,072, enough for full PR-scale diffs. docs.z.ai/devpack/latest-mod…

How to Switch Models - Overview - Z.AI DEVELOPER DOCUMENT

docs.z.ai

Dev Sukhendu

@devsukhendu

14h

MiniMax M3: 34.8% on SWE-fficiency; that's a 10-point improvement over MiniMax M2.7 on a benchmark that measures engineering effort across multiple patches per issue. Combine with 59.0% SWE-Bench Pro and 66.0% Terminal-Bench 2.1, and M3 is the strongest open-weight coding model as of June 2026.

Dev Sukhendu

@devsukhendu

14h

MiniMax M3 vs Claude Opus 4.7 on coding and price:
• SWE-Bench Pro: 59.0% vs ~69%
• BrowseComp: 83.5 vs 79.3
• Input price: $0.30/M vs $3.00/M (OpenRouter)
• Output speed: ~100 tok/s vs ~30 tok/s at long context
10× cheaper input, faster decode, and ahead on agentic browsing. The trade-off: it trails on the hardest SWE tasks.
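
The price gap is plain division. A minimal Python sketch using the quoted OpenRouter input prices (USD per million tokens); the 50M-token monthly workload is a hypothetical figure for illustration, and output-token pricing is left out because the post doesn't quote it.

```python
# Input-price comparison for MiniMax M3 vs Claude Opus 4.7,
# using the OpenRouter prices quoted in the post (USD per million input tokens).
M3_INPUT = 0.30
OPUS_47_INPUT = 3.00


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost in USD for `tokens` input tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


price_ratio = OPUS_47_INPUT / M3_INPUT

# Hypothetical workload: 50M input tokens per month (illustration only).
monthly_tokens = 50_000_000
m3_cost = cost_usd(monthly_tokens, M3_INPUT)
opus_cost = cost_usd(monthly_tokens, OPUS_47_INPUT)
print(f"{price_ratio:.0f}x cheaper input: ${m3_cost:.2f} vs ${opus_cost:.2f}")
```

At these prices the hypothetical workload costs $15.00 vs $150.00 in input spend, a 10× difference.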

Dev Sukhendu

@devsukhendu

16h

GLM-5.2 vs GPT-5.5 (closed source):
• Context: 1M vs 256K
• SWE-bench Verified: 77.8% (GLM-5's score) vs ~80%
• GPQA: 94 vs 96
• Input price: $0.60/M vs $15/M (25× cheaper)
• License: MIT vs proprietary
The GLM family reaches ~97% of GPT-5.5's SWE-bench Verified score at 4% of the input price.
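
The headline ratios can be recomputed from the listed figures. A short Python sketch; it treats "~80%" as exactly 80.0 and compares input prices only, since output pricing isn't given.

```python
# Recomputes the GLM vs GPT-5.5 ratios from the figures listed in the post.
glm_input, gpt_input = 0.60, 15.00  # USD per million input tokens
glm_swe, gpt_swe = 77.8, 80.0       # SWE-bench Verified, percent ("~80%" taken as 80.0)

cost_multiple = gpt_input / glm_input  # how many times cheaper GLM input is
cost_share = glm_input / gpt_input     # GLM input price as a share of GPT-5.5's
score_share = glm_swe / gpt_swe        # GLM score as a share of GPT-5.5's

print(f"{cost_multiple:.0f}x cheaper, {cost_share:.0%} of the price, "
      f"{score_share:.1%} of the SWE-bench Verified score")
```

The price figures reproduce the 25× and 4% claims, while the score ratio comes out near 97%.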

Dev Sukhendu

@devsukhendu

17h

GLM-5.2 vs Kimi K2.7 Code (both open-weight coding models):
• Context: 1M vs 256K
• MMLU: 96 vs not reported
• SWE-bench Verified: 77.8% (GLM-5) vs ~60% (Kimi K2.6)
• License: MIT vs Modified MIT
• Input price: $0.60/M vs $0.50/M
GLM leads on the reported coding benchmarks and on context length. Source: artificialanalysis.ai/models…

Dev Sukhendu

@devsukhendu

18h

GLM-5.2 is now available to all GLM Coding Plan users: Lite, Pro, Max, and Team. The model supports a 1M-token context, up from 200K in GLM-5. With MIT-licensed weights due next week, you should be able to run it locally with llama.cpp or vLLM. z.ai/subscribe

Dev Sukhendu

@devsukhendu

18h

GLM-5.2: a 1M-context coding model, now live. API and chatbot services launch next week. It is designed for large-scale software development with integrated debugging and uses the same 744B MoE architecture as GLM-5, which achieved:
• #1 open model on LMArena Text Arena (1452 Elo)
• #1 open model on Code Arena
• 50 on the Artificial Analysis Intelligence Index
Docs: docs.z.ai/devpack/latest-mod…

Dev Sukhendu

@devsukhendu

18h

Z.ai releases GLM-5.2, a coding-focused model with 1M context, already available via API on OpenRouter, Together.ai, and Fireworks. Open weights release next week on Hugging Face. Family performance: GLM-5 scored 77.8% on SWE-bench Verified, 90% on HumanEval, and 86% on GPQA-Diamond. Docs: docs.z.ai/devpack/latest-mod…

Dev Sukhendu

@devsukhendu

19h

MiniMax M3, an open-weight model, just hit 59.0% on SWE-Bench Pro, beating GPT-5.5 (~58.6%) and Gemini 3.1 Pro (~54.2%) and landing only ~10 points behind Claude Opus 4.7, at $0.30/M input on OpenRouter. It also scores 92.9% on GPQA Diamond and 83.5 on BrowseComp (above Opus 4.7's 79.3). The ceiling for open coding agents moved. llm-stats.com/models/minimax…

MiniMax M3 Benchmarks, Pricing & Context Window

MiniMax M3 is the first open-weight model to combine three frontier capabilities: top-tier coding and agentic performance, a 1M-token context window, and native multimodality. It is powered by...

llm-stats.com

Dev Sukhendu

@devsukhendu

22h

Congrats to @MiniMax on the launch of MiniMax M3, the first open-weight model combining frontier coding, 1M-token context, and native multimodal input. It is powered by MiniMax Sparse Attention (MSA), which delivers 9× faster prefill and 15× faster decode at 1M tokens vs the prior generation.
• 428B total params, ~23B active per token
• 59.0% on SWE-Bench Pro (beats GPT-5.5)
• $0.30/M input on OpenRouter
artificialanalysis.ai/articl…
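
In a mixture-of-experts (MoE) model only a subset of experts runs for each token, so active parameters, not total parameters, drive per-token compute. A small Python sketch of that share, using the counts quoted in these posts and assuming "1T" means 1,000B:

```python
# Share of parameters active per token in a mixture-of-experts (MoE) model,
# using the total/active counts quoted in the posts (billions of parameters).
# Assumption: "1T" = 1,000B.
MODELS = {
    "MiniMax M3": (428, 23),       # 428B total, ~23B active
    "Kimi K2.7 Code": (1000, 32),  # 1T total, 32B active
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of all parameters used for each token."""
    return active_b / total_b


for name, (total, active) in MODELS.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active per token")
```

All weights still have to be held in memory (or offloaded), so a low active share cuts compute per token, not the memory needed to host the model.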

MiniMax-M3: Leading open weights model, once the weights are released

MiniMax-M3 scores 55 on the Artificial Analysis Intelligence Index. Once the weights are released, it will be the leading open weights model

artificialanalysis.ai

Dev Sukhendu

@devsukhendu

23h

Kimi K2.7 Code vs K2.6 (Moonshot's own internal metrics):
• Kimi Code Bench v2: 62.0 vs 50.9 (+21.8%)
• Thinking-token usage: 30% lower than K2.6
• Context window: 256K for both
• Architecture: same design, 1T MoE with 32B active
K2.7 Code is a focused improvement for agentic coding, not a general model revamp.

Dev Sukhendu

@devsukhendu

23h

Kimi K2.7 Code on Kimi Code Bench v2: 62.0, up from K2.6's 50.9, a 21.8% relative improvement. The gap to GPT-5.5 on this benchmark shrank from 18 points (K2.6 era) to just 7 points. All with the same 1T MoE design (32B active) and 256K context.
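
The two numbers in this post can be cross-checked. A Python sketch; it assumes the 7-point gap is exact and infers GPT-5.5's score from it, since the post doesn't state that score directly.

```python
# Cross-checks the Kimi Code Bench v2 arithmetic in the post.
k26, k27 = 50.9, 62.0
gap_now = 7.0  # points between K2.7 Code and GPT-5.5, per the post (assumed exact)

relative_gain = (k27 - k26) / k26  # relative, not absolute, improvement
implied_gpt55 = k27 + gap_now      # GPT-5.5 score implied by the 7-point gap
gap_k26 = implied_gpt55 - k26      # gap in the K2.6 era

print(f"relative gain: {relative_gain:.1%}")
print(f"implied GPT-5.5 score: {implied_gpt55:.1f}, K2.6 gap: {gap_k26:.1f} points")
```

The implied GPT-5.5 score is 69.0, which puts the K2.6-era gap at 18.1 points, consistent with the 18 points quoted.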

Dev Sukhendu

@devsukhendu

Jun 13

Kimi K2.7 Code vs Claude Opus 4.8 on agentic coding:
• MCPMark Verified: 81.1% vs 76.4%
• Kimi Code Bench v2: 62.0 vs not published
• Pricing: not yet announced for K2.7 (expected to be similar to K2.6) vs $5/$25 per M tok for Opus
• License: Modified MIT open weights vs closed
K2.7 beats Opus on tool use, likely at a fraction of the cost.

Dev Sukhendu

@devsukhendu

Jun 12

Kimi K2.7 Code just hit 81.1% on MCPMark Verified, beating Claude Opus 4.8's 76.4%. That's 4.7 points higher on tool-calling accuracy, and K2.7 is open-weight under Modified MIT. Opus 4.8 costs $5/$25 per million input/output tokens. Moonshot AI's internal Kimi Code Bench v2 also shows a 21.8% relative improvement (62.0 vs K2.6's 50.9).

Dev Sukhendu

@devsukhendu

Jun 12

Kimi K2.7 Code (open weights) vs GPT-5.5 (closed):
• Kimi Code Bench v2: 62.0 (7 points behind GPT-5.5)
• MCPMark Verified: 81.1% (GPT-5.5 not tested)
• Context: 256K vs 1M
• License: Modified MIT vs proprietary
• Pricing: K2.7 expected to be similar to K2.6 (~$0.15/$2.50 per M tok quoted for K2) vs ~$3/$12 per M tok for GPT-5.5
The open model is narrowing the gap fast.

Dev Sukhendu

@devsukhendu

Jun 12

Congrats to @Kimi_Moonshot on the launch of Kimi K2.7 Code, a 1T-parameter MoE model (~32B active per token) with 256K context, released under a Modified MIT license on Hugging Face. It scored 81.1% on MCPMark Verified, beating Claude Opus 4.8 (76.4%). On Kimi Code Bench v2 it hits 62.0, up from K2.6's 50.9. x.com/KimiDevs/status/206540…

Kimi Developers

@KimiDevs

Jun 12

Meet Kimi-K2.7-Code 👀 Here’s what developers should know to fully unlock K2.7-Code’s potential:

Dev Sukhendu

@devsukhendu

Jun 12

OpenAI just fixed one of the most frustrating parts of using Codex. You no longer have to wait for a rate limit reset if you don't want to. Starting today, Go, Plus, Pro, and Business users get a free reset they can save and use whenever they need it. AI tools keep getting more user-friendly. x.com/OpenAI/status/20652253…

OpenAI

@OpenAI

Jun 12

We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business users with one free reset:

Dev Sukhendu

@devsukhendu

Jun 12

Who else runs multiple AI coding agents in parallel? One for planning, one for implementation, one for review? I want to hear your orchestration setups.