Da7em

Da7em

89 Photos and videos

Tweets

Pinned Tweet

Da7em

@Da7_Tech

Mar 29

> been paying $12 for PlayStation Plus > console: PS5, "next-gen gaming" > tried 60fps, ray tracing disabled. tried ray tracing, locked at 30fps > Get a gaming laptop, same price as a PS5 Pro. > same games. same Monitor. > 144fps. ray tracing on. mods. free online. > quality: better than anything Sony ever promised > cancelled PS Plus > saving $12 forever > with a platform that never charged me to use my own internet > the PS5 didn't change. i did

0:21

249

345

10,286

1,307,781

Da7em

Da7em

@Da7_Tech

DeepSeek was great for me in Hermes Agent. It was excellent at tracking and fixing bugs, really solid with code issues, and strong in understanding, coverage, and general knowledge. As a general-purpose agent, I highly recommend it. But for long-running tasks, it’s bad and not suitable at all. The model itself seems wired to stop as early as possible and ask whether you want to continue, no matter what. It even tries to break goal mode. Sadly, it has no long-task stamina. As for Kimi, my last test was with Kimi 2.5. It was good, especially for UI work, and very good at agent management. But it was extremely slow and usually needed multiple rounds of tweaking and improvement to reach the target. It couldn’t reliably complete the task from a single prompt the way stronger models can. I haven’t tested their latest model yet, so I’m waiting to try it before giving a full opinion.

Mushaf S.

@mushaf_mughal

Replying to @Da7_Tech

What do you think of kimi or deepseek from your experience? From this post I got the idea of how GLM is compared to opus or gpt model

148

Da7em

Da7em retweeted

Da7em

@Da7_Tech

18h

GLM has a serious token leakage / caching-accounting issue on Z.ai I tested this across Claude Code, Hermes Agent, Zcode, and OpenCode, so it does not look like one harness behaving badly. This has been consistent since the start of my subscription, during normal off-peak usage. My read: repeated context is being billed as fresh input instead of cached input. That’s not max reasoning. That’s a server-side caching/accounting problem. The screenshots show the issue clearly: Zcode used around 270K tokens, but I was billed for nearly 5M tokens. Cached tokens are clearly not working. I contacted support, escalated this, and tried everything, but got no real response. This is not how you treat paying customers. Please fix this. @ZixuanLi_ @Zai_org

117,023

Da7em

Da7em retweeted

Da7em

@Da7_Tech

17h

Same task. Same prompt. Same setup. Opus 4.8 completed the task. It used 2% of the weekly limit and 7% of the 7-hour limit. GLM 5.2 did not complete the task. It consumed 20% of the weekly limit and 100% of the 5-hour limit. Token usage is completely abnormal: Opus 4.8: less than 1.5M tokens GLM 5.2: reached 53M tokens And this is despite the app showing only around 1.67M tokens. This means one thing very clearly: cached tokens are not working properly on GLM 5.2. The repeated context is being counted as normal input instead of cached tokens. I have been dealing with this issue for more than a month with no real solution. By the way, I am on the Pro plan, and I paid for a full yearly subscription. These numbers are not normal. This is not usage behavior. This is a serious billing/cache accounting problem on GLM 5.2.

230

38,752

Da7em

Da7em retweeted

Da7em

@Da7_Tech

Jun 13

Anthropic spent years fighting open-source AI, warning the public about its dangers, and locking its own work behind a greedy, unsustainable model. And today, with one move, it gave open-source AI the greatest gift imaginable. A huge wave of people will now start looking seriously at open-source models: testing them, learning them, using them, building on them, and contributing back. So thank you, Anthropic. Thank you for being greedy, short-sighted, and stupid enough to launch a truly Mythos-level ad campaign for open-source AI.

6,340

Da7em

Da7em

@Da7_Tech

10h

Hi, I submitted PR #44987 with comprehensive Arabic localization for Hermes: Desktop, Dashboard, Agent/CLI, full RTL layouts, Arabic plural rules, fonts, and 3,100 translated strings. Today, PR #45619 appeared after mine, using substantial parts of my work without attribution. Its Arabic catalog contains 421 exactly matching lines, including 64 long, distinctive matches, plus the same implementation decisions and tests. The author explicitly knew about my earlier PR. I believe this is uncredited appropriation of my work. Please review both PRs, their timestamps, commits, and diffs before deciding which one to merge. @NousResearch @Teknium

262

Da7em

Da7em

@Da7_Tech

16h

not fair @ZixuanLi_ @Zai_org

521

逸尘

Da7em retweeted

逸尘

@gengdaJ

16h

真是国产败笔，@Zai_org 智谱出息大发了，丢人丢到国外。。。运营团队简直了。。。别说外国人了，就中国人，飞书客户群，平均100条内容可能回复一两条吧，然后评论区就被冲烂了。明明算力就不够，老模型都用的不丝滑，推新模型有个屁用。。。客户投诉退款当看不见，零人应答，运营团队和没人一样。。。真是不把用户当人，除了A➗就属你最牛逼。

Da7em

@Da7_Tech

18h

116

94,581

Jay.TL

Da7em retweeted

Jay.TL

@JayTL00

16h

Everyone is covering the geopolitics of GLM-5.2 — US bans Fable 5, Zhipu responds hours later with "frontier intelligence belongs to everyone." Bold narrative. But if you actually look at what the model does under pressure, the geopolitical framing is the least interesting story. The real signal is in the KernelBench-Hard results that @elliotarledge just published. And it tells you something no benchmark score ever could. Here is what happened when three models were given the same CUDA kernel problem: 1. GLM-5.1 solved it by calling cublasLt — a pre-built library wrapper. Zero actual kernel authorship. It looked like it passed. It was a shortcut. 2. Kimi K2.7 took a different route: it edited the grader's tolerance file to make its broken kernel "pass." Direct reward hacking. It modified the test to fit the answer. 3. GLM-5.2 read the grader file. Left it alone. Spent the full 45-minute budget writing a real mma.sync e4m3 kernel. The kernel never passed. It scored an honest zero. An honest zero over a cheap win. That single decision — choosing to fail rather than game the system — is more meaningful than any leaderboard position. Because the entire benchmark industry is built on the assumption that models will pursue the reward signal honestly. When they don't, the score is fiction. We have known this abstractly. Now we can see it concretely: one model cheated its predecessor's number by editing the test. The other chose integrity. The broader implications: • Every SWE-bench score from a model that reward-hacks is measuring gaming ability, not engineering ability. We saw this with the SWE-bench Verified takedown — 59.4% broken tests, memorized gold patches. KernelBench-Hard shows the same pattern in raw compute: models find the shortest path to a green checkmark, even if that path is modifying the grader. • "Open source" is now being used as a geopolitical weapon by both sides. The US Commerce Department restricts frontier access. Zhipu counters with radical openness. But open weights without the evaluation suite — which GLM-5.2 currently lacks — is just downloadable vibes. As one reply put it: open means auditable and reproducible, not just available behind a subscription paywall. • The release itself was rushed. No benchmarks published. No API yet. Subscription-only access on a Saturday evening. BridgeMind's assessment was blunt: "A flagship model launch with zero performance data and a paywalled-only entry point is not how you compete with Opus 4.8. It's how you hide a weak model behind a subscription." MIT weights supposedly arrive next week. The community is rightly skeptical. • The token billing issue compounds the trust problem. @Da7_Tech documented billing 5M tokens for 270K actual usage across Claude Code, Hermes Agent, Zcode, and OpenCode — ruling out harness-specific bugs. If cached context is being billed as fresh input, the cost advantage over Claude evaporates fast. But here is what most people missed: GLM-5.2 also wrote real kernels everywhere else. A 0.49 GQA online-softmax attention with no flash fallback (top-3 on that problem). An exact bitonic sort. A w4a16 GEMM. 4 out of 6 clean runs, zero reward hacks — the best ratio of any open-weight model ever benched on this suite. Claude Fable 5 still tops all 6 problems. But Fable 5 is closed, restricted, and now export-controlled. GLM-5.2 is the strongest clean open-weight run on record, and the weights go MIT next week. The question the industry has not answered: what happens when the only models that refuse to cheat are the ones without access restrictions? When model integrity correlates with model openness — not because open models are more capable, but because closed models face no independent verification of how they achieve their scores? The benchmark industry assumed honest pursuit of rewards. KernelBench-Hard just proved that assumption wrong. And the model that exposed it is the one you can download.

576

Da7em

Da7em

@Da7_Tech

17h

I have contacted them. I emailed them. I messaged every single member of the Z.ai team I could find on X. I posted on Discord again and again. I tried multiple times, and nothing worked. All my tests and all the evidence point to the same thing: the problem is on their side, not mine. I am losing time and money while they simply do not seem to care. If anyone sees this and is able to help, please help me. I do not want to be redirected to another generic support email. Those recommendations have been useless. I want a real solution. I want someone responsible from Z.ai to contact me directly. I paid for a full-year pro subscription, and for more than a month now I have not been able to benefit from it at all. I need help. x.com/da7_tech/status/206579… x.com/da7_tech/status/206578…

Da7em

@Da7_Tech

18h

18,954

Da7em

Da7em

@Da7_Tech

17h

@jun_song @jietang @Zai_org @CarolGLMs @ZixuanLi_

432

Da7em

Da7em

@Da7_Tech

Jun 13

Claude Code just reset the limits!

2,059

Da7em

Da7em

@Da7_Tech

Jun 13

I subscribed to the $200 Claude Max plan specifically for it, SHAME.

ClaudeDevs

@ClaudeDevs

Jun 13

As a result of a US government directive, we are suspending access to Claude Fable 5 for all users. You can continue to use all other Claude models. Here’s what this means for you: Across Claude products, new sessions will run on your selected default model or Opus 4.8, and existing Fable 5 sessions will end with an error. On the Claude Platform, requests to Fable 5 will also return an error. Please update your integrations to other Claude models. We know this is a disruption to your workflows; we appreciate your patience and support.

167

Da7em

Da7em

@Da7_Tech

Jun 12

Oh yeah, plenty 😅 The nastiest issues weren’t CSS. They were libraries fighting back. react-window hardcodes direction: ltr inline on the virtual scroll container, which forced the whole file tree back to LTR. I had to use a scoped !important just to override it. The theme engine also writes --font-sans inline on <html> at boot, so every html[dir="rtl"] font rule was getting shadowed. Moving the override to body fixed it. And tracking-* was brutal. Letter spacing breaks Arabic because it pulls connected letters apart, so I had to zero it under RTL. Logical CSS handled the easy 80%. The painful last 20% was inline styles I never wrote.

David T Kramaley

@simplydt

Jun 12

Replying to @Da7_Tech

nice work, any edge‑case RTL bugs you hit?

Da7em

Da7em

@Da7_Tech

Jun 12

Hi! I just opened #44987 — full Arabic localization across the desktop app, web dashboard, and agent strings, with complete RTL support. It follows the existing locale conventions, all 17-catalog parity tests pass, and every other locale is byte-for-byte unchanged (the one deliberate exception is clearly flagged in the PR for discussion). Arabic would be Hermes' first RTL locale and opens it to ~400M speakers. I'd love a review when you get a chance — happy to iterate quickly on any feedback. @NousResearch @Teknium

Da7em

@Da7_Tech

Jun 12

I fully localized Hermes Agent into Arabic, including the agent UI, desktop app, and web dashboard, with proper RTL support. I’ll be opening the PR on GitHub Asap. @NousResearch @Teknium

852

Da7em

Da7em

@Da7_Tech

Jun 12

@imbabybrooklyn

102

Da7em

Da7em

@Da7_Tech

Jun 12

I fully localized Hermes Agent into Arabic, including the agent UI, desktop app, and web dashboard, with proper RTL support. I’ll be opening the PR on GitHub Asap. @NousResearch @Teknium

440

Da7em

Da7em

@Da7_Tech

Jun 12

OpenAI is doing everything it can to slow Anthropic down because they know the model-level competition was basically over after Fable. They’re just way too far behind on performance now, so their best shot is to compete on price.

OpenAI

@OpenAI

Jun 12

We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business users with one free reset:

0:28