Building projects, testing AI agents, and playing with new AI models.

Joined May 2026
GLM-5.2 vs GLM-5.1: what changed. • Context: 1M vs 203K (~5×) • Max output: 131K vs 64K • Pricing: $18/month (Lite) for both, same plan tiers • Effort presets: High/Max vs reasoning variants • Open-source: MIT release expected next week vs MIT available now for 5.1. No independent benchmarks for 5.2 yet.
GLM-5.2 is live on all GLM Coding Plan tiers: • Lite: ~$18/month, ~400 prompts/week • Pro: ~2,000 prompts/week • Max: ~8,000 prompts/week • Team: seat-based. Model IDs: glm-5.2 (standard), glm-5.2[1m] (1M context). Max output tokens: 131,072, enough for full PR-scale diffs. docs.z.ai/devpack/latest-mod…
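A minimal sketch of picking a tier from the weekly allowances quoted above. The figures are the approximate ones from this post; the function name `smallest_tier_for` is hypothetical, and Team is left out because it is seat-based rather than prompt-based.

```python
from typing import Optional

# Approximate weekly prompt allowances as quoted in the post.
# Team is omitted because it is seat-based, not prompt-based.
TIER_WEEKLY_PROMPTS = {"Lite": 400, "Pro": 2000, "Max": 8000}


def smallest_tier_for(prompts_per_week: int) -> Optional[str]:
    """Return the smallest tier whose quoted allowance covers the need.

    Returns None when no prompt-based tier is large enough.
    """
    # Walk the tiers from the smallest allowance to the largest.
    for name, allowance in sorted(TIER_WEEKLY_PROMPTS.items(), key=lambda kv: kv[1]):
        if prompts_per_week <= allowance:
            return name
    return None


print(smallest_tier_for(1500))
```

Someone sending around 1,500 prompts a week would exceed Lite's quoted allowance and fit inside Pro's.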
MiniMax M3: 34.8% on SWE-fficiency. That's a 10-point improvement over MiniMax M2.7 on a benchmark that measures engineering effort across multiple patches per issue. Combined with 59.0% on SWE-Bench Pro and 66.0% on Terminal-Bench 2.1, that makes M3 the strongest open-weight coding model as of June 2026.
MiniMax M3 vs Claude Opus 4.7 on coding and price: • SWE-Bench Pro: 59.0% vs ~69% • BrowseComp: 83.5 vs 79.3 • Input price: $0.30/M vs $3.00/M (OpenRouter) • Output speed: ~100 tok/s vs ~30 tok/s at long context. 10× cheaper input, faster decode, competitive on agentic browsing. The trade-off is on the hardest SWE tasks.
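A quick check of the price and speed figures above, as a sketch. The 10,000-token response length is an assumed workload chosen for illustration, not a number from the post, and the helper names are hypothetical.

```python
def price_ratio(expensive_per_m: float, cheap_per_m: float) -> float:
    """How many times more expensive one per-million-token price is than another."""
    return expensive_per_m / cheap_per_m


def decode_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to generate `tokens` at a steady decode speed."""
    return tokens / tokens_per_second


# Input prices quoted in the post (USD per million tokens, OpenRouter).
ratio = price_ratio(3.00, 0.30)

# Assumed workload: a 10,000-token response (hypothetical figure).
m3_time = decode_seconds(10_000, 100)   # ~100 tok/s quoted for M3
opus_time = decode_seconds(10_000, 30)  # ~30 tok/s quoted for Opus 4.7

print(f"input price ratio: {ratio:.1f}x")
print(f"decode time: M3 {m3_time:.0f}s vs Opus {opus_time:.0f}s")
```

On the quoted prices the input ratio works out to about 10×, and the quoted decode speeds imply roughly a 3.3× difference in generation time.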
GLM-5.2 vs GPT-5.5 (closed source): • Context: 1M vs 256K • SWE-bench Verified: 77.8% vs ~80% • GPQA: 94 vs 96 • Input price: $0.60/M vs $15/M (25× cheaper) • License: MIT vs proprietary. On these numbers, 5.2 lands within about 2 points of GPT-5.5 on SWE-bench Verified at 4% of the input cost.
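To put the comparison above in relative terms, here is a small sketch using only the figures quoted in this post (GPT-5.5's ~80% is itself approximate). The helper name `percent_of` is hypothetical.

```python
def percent_of(part: float, whole: float) -> float:
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# SWE-bench Verified scores quoted in the post: 77.8% vs ~80%.
score_share = percent_of(77.8, 80.0)

# Input prices quoted in the post: $0.60/M vs $15/M.
cost_share = percent_of(0.60, 15.0)

print(f"share of score: {score_share:.2f}%")
print(f"share of input cost: {cost_share:.2f}%")
```

Read this way, the quoted numbers put GLM-5.2 at roughly 97% of GPT-5.5's SWE-bench Verified score for about 4% of the input price.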
GLM-5.2 vs Kimi K2.7 Code (both open-weight coding): • Context: 1M vs 256K • MMLU: 96 vs not reported • SWE-bench Verified: 77.8% vs ~60% (K2.6 figure) • License: MIT vs Modified MIT • Input price: $0.60/M vs $0.50/M. GLM-5.2 leads on context length and on the coding benchmark available for comparison. Source: artificialanalysis.ai/models…

GLM-5.2 is now available to all GLM Coding Plan users: Lite, Pro, Max, Team. The model supports 1M-token context, up from 200K in GLM-5. MIT licensing next week means you can run it locally with llama.cpp or vLLM. z.ai/subscribe

GLM-5.2: 1M-context coding model now live. API and chatbot services launching next week. Designed for large-scale software development with integrated debugging. Based on the same 744B MoE architecture as GLM-5, which achieved: • #1 open model on LMArena Text Arena (1452 ELO) • #1 open model on Code Arena • 50 on Artificial Analysis Intelligence Index Docs: docs.z.ai/devpack/latest-mod…
Z.ai releases GLM-5.2: a coding-focused model with 1M context. Already available via API on OpenRouter, Together.ai, Fireworks. Open-source weights releasing next week on HuggingFace. Family performance: GLM-5 scored 77.8% SWE-bench Verified, 90% HumanEval, 86% GPQA-Diamond. Docs: docs.z.ai/devpack/latest-mod…

MiniMax M3 just hit 59.0% on SWE-Bench Pro — open-weight, beating GPT‑5.5 (~58.6%) and Gemini 3.1 Pro (~54.2%). Only ~10 points behind Claude Opus 4.7 at $0.30/M input on OpenRouter. Also 92.9% on GPQA Diamond, 83.5 on BrowseComp (above Opus 4.7's 79.3). The ceiling for open coding agents moved. llm-stats.com/models/minimax…
Congrats to @MiniMax on the launch of MiniMax M3. First open-weight model combining frontier coding, 1M-token context, and native multimodal input. Powered by MiniMax Sparse Attention (MSA), delivering 9× faster prefill and 15× faster decode at 1M tokens vs prior gen. • 428B total params, ~23B active per token • 59.0% on SWE-Bench Pro (beats GPT‑5.5) • $0.30/M input on OpenRouter artificialanalysis.ai/articl…
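MoE models run only a subset of their parameters per token. Here is a sketch of the active share implied by the figures above. The Kimi K2.7 Code numbers (1T total, ~32B active) come from the other posts in this feed, and the function name is hypothetical.

```python
def active_fraction(active_billions: float, total_billions: float) -> float:
    """Fraction of parameters used per token in a mixture-of-experts model."""
    return active_billions / total_billions


# MiniMax M3: 428B total, ~23B active per token (as quoted).
m3 = active_fraction(23, 428)

# Kimi K2.7 Code: 1T total (1000B), ~32B active per token (as quoted).
kimi = active_fraction(32, 1000)

print(f"MiniMax M3 active share: {m3:.1%}")
print(f"Kimi K2.7 Code active share: {kimi:.1%}")
```

Both models activate only a few percent of their weights per token, which is why total parameter count alone says little about per-token compute.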
Kimi K2.7 Code vs K2.6 (Moonshot's own internal metrics): • Kimi Code Bench v2: 62.0 vs 50.9 (+21.8%) • Thinking-token usage: 30% less vs K2.6 • Context window: both 256K • Architecture: 1T MoE, 32B active, same design. K2.7 Code is a focused improvement for agentic coding, not a general model revamp.
Kimi K2.7 Code on Kimi Code Bench v2: 62.0, up from K2.6's 50.9 — a 21.8% relative improvement. The gap to GPT-5.5 on this benchmark shrank from 18 points (K2.6 era) to just 7 points. All with 1T MoE, 32B active, 256K context.
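The 21.8% figure is a relative gain, not a point gain. A short sketch of the difference, using the two scores quoted above (function names are hypothetical):

```python
def relative_gain_pct(new: float, old: float) -> float:
    """Improvement as a percentage of the old score."""
    return 100.0 * (new - old) / old


def point_gain(new: float, old: float) -> float:
    """Improvement in absolute benchmark points."""
    return new - old


# Kimi Code Bench v2 scores quoted in the post: K2.7 Code 62.0, K2.6 50.9.
rel = relative_gain_pct(62.0, 50.9)
pts = point_gain(62.0, 50.9)

print(f"relative gain: {rel:.1f}%")
print(f"absolute gain: {pts:.1f} points")
```

The same jump reads as about 11 points in absolute terms, so it matters which framing a comparison uses.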
Kimi K2.7 Code vs Claude Opus 4.8 on agentic coding: • MCPMark Verified: 81.1% vs 76.4% • Kimi Code Bench v2: 62.0 vs not published • Pricing: unknown for K2.7 (similar to K2.6) vs Opus $5/$25 per M tok • License: Modified MIT open-weights vs closed K2.7 beats Opus on tool use at a fraction of the cost.
Kimi K2.7 Code just hit 81.1% on MCPMark Verified, beating Claude Opus 4.8's 76.4%. That's 4.7 points higher on tool-calling accuracy and K2.7 is open-source under Modified MIT. Opus 4.8 costs $5/$25 per million tokens. Moonshot AI's internal Kimi Code Bench v2 also shows a 21.8% relative improvement (62.0 vs K2.6's 50.9).
Kimi K2.7 Code (open-weights) vs GPT-5.5 (closed): • Kimi Code Bench v2: 62.0 (gap 7 pts to GPT-5.5) • MCPMark Verified: 81.1% (not tested on GPT-5.5) • Context: 256K vs 1M • License: Modified MIT vs proprietary • Pricing: K2.7 similar to K2.6 (~$0.15/$2.50 per M tok for K2) vs GPT-5.5 ~$3/$12 per M tok Open model narrowing the gap fast.
Congrats to @Kimi_Moonshot on the launch of Kimi K2.7 Code. A 1T-parameter MoE model (~32B active per token) with 256K context, released under Modified MIT license on HuggingFace. Scored 81.1% on MCPMark Verified, beating Claude Opus 4.8 (76.4%). On Kimi Code Bench v2 it hits 62.0, up from K2.6's 50.9. x.com/KimiDevs/status/206540…

Meet Kimi-K2.7-Code 👀 Here’s what developers should know to fully unlock K2.7-Code potential:
OpenAI just fixed one of the most frustrating parts of using Codex. You no longer have to wait for a rate limit reset if you don't want to. Starting today, Go, Plus, Pro, and Business users get a free reset they can save and use whenever they need it. AI tools keep getting more user-friendly. x.com/OpenAI/status/20652253…

Jun 12
We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business users with one free reset:
Who else runs multiple AI coding agents in parallel? One for planning, one for implementation, one for review? I want to hear your orchestration setups.