Miyamoto Musashi

Miyamoto Musashi

63 Photos and videos

Tweets

Miyamoto Musashi @solminded

Jun 2

Hey @OpenAI all codex cli and codex app returns the same error response (with 5.5 model only) { "type": "error", "error": { "type": "image_generation_user_error", "code": "invalid_value", "message": "The model 'gpt-image-2' does not exist.", "param": "tools" }, "status": 400 }

486

Wes Bos

Miyamoto Musashi retweeted

Wes Bos

@wesbos

Mar 31

Claude Code leaked their source map, effectively giving you a look into the codebase. I immediately went for the one thing that mattered: spinner verbs There are 187

721

1,872

26,402

2,210,441

Miyamoto Musashi

Miyamoto Musashi @solminded

Mar 24

crewai, hermes, openclaw.. if you are using any of these or an agent framework make sure not to have installed version 1.82.8 of litellm

Andrej Karpathy

@karpathy

Mar 24

Software horror: litellm PyPI supply chain attack. Simple `pip install litellm` was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, database passwords. LiteLLM itself has 97 million downloads per month which is already terrible, but much worse, the contagion spreads to any project that depends on litellm. For example, if you did `pip install dspy` (which depended on litellm>=1.64.0), you'd also be pwnd. Same for any other large project that depended on litellm. Afaict the poisoned version was up for only less than ~1 hour. The attack had a bug which led to its discovery - Callum McMahon was using an MCP plugin inside Cursor that pulled in litellm as a transitive dependency. When litellm 1.82.8 installed, their machine ran out of RAM and crashed. So if the attacker didn't vibe code this attack it could have been undetected for many days or weeks. Supply chain attacks like this are basically the scariest thing imaginable in modern software. Every time you install any depedency you could be pulling in a poisoned package anywhere deep inside its entire depedency tree. This is especially risky with large projects that might have lots and lots of dependencies. The credentials that do get stolen in each attack can then be used to take over more accounts and compromise more packages. Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've been so growingly averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.

BURKOV

Miyamoto Musashi retweeted

BURKOV

@burkov

Feb 17

LLMs process text from left to right — each token can only look back at what came before it, never forward. This means that when you write a long prompt with context at the beginning and a question at the end, the model answers the question having "seen" the context, but the context tokens were generated without any awareness of what question was coming. This asymmetry is a basic structural property of how these models work. The paper asks what happens if you just send the prompt twice in a row, so that every part of the input gets a second pass where it can attend to every other part. The answer is that accuracy goes up across seven different benchmarks and seven different models (from the Gemini, ChatGPT, Claude, and DeepSeek series of LLMs), with no increase in the length of the model's output and no meaningful increase in response time — because processing the input is done in parallel by the hardware anyway. There are no new losses to compute, no finetuning, no clever prompt engineering beyond the repetition itself. The gap between this technique and doing nothing is sometimes small, sometimes large (one model went from 21% to 97% on a task involving finding a name in a list). If you are thinking about how to get better results from these models without paying for longer outputs or slower responses, that's a fairly concrete and low-effort finding. Read with AI tutor: chapterpal.com/s/1b15378b/pr… Get the PDF: arxiv.org/pdf/2512.14982

386

1,108

11,558

2,989,252

fmdz

Miyamoto Musashi retweeted

fmdz

@fmdz387

Jan 25

Clawd disaster incoming if this trend of hosting ClawdBot on VPS instances keeps up, along with people not reading the docs and opening ports with zero auth... I'm scared we're gonna have a massive credentials breach soon and it can be huge This is just a basic scan of instances hosting clawdbot with open gateway ports and a lot of them have 0 auth

409

750

6,971

1,627,000

JingyuanLiu

Miyamoto Musashi retweeted

JingyuanLiu

@JingyuanLiu123

13 Sep 2025

I was lucky to work in both China and the US LLM labs, and I've been thinking this for a while. The current values of pretraining are indeed different: US labs be like: - lots of GPUs and much larger flops run - Treating stabilities more seriously, and could not tolerate spikes in large flops run, thus invented so many stability-related tricks, including all kinds of soft-cap, MuP, and spectral norm control tricks - Treats predictabilities more seriously. Check GPT 4 report for reference, even trying to predict the eval task performances - Because of the stability and predictability ask, treats hyper-params and optimization more seriously - Generally believe more in data, optimization than arch China labs be like: - has very limited GPUs, e.g. k2 in 4k GPU and v3 in 2k GPU - as a result, pushing for the limit of pretrain modeling-infra co-design, see so many tricks in V3, and K2 has some cool stuff too (the offload trick helps remove the stupid MoE gating constrain and only uses EP 16) - cares model arch/token efficiency over optimization, stability - cares more about data quality than data quantity - taking inference into consideration day 0, even before the training starts In general, China labs are trying to use <4e 24 flops models to catch up with >1e 25 flops models. It is hard or impossible, but they are making good progress. I am actually very happy to see Qwen's new try on model archs, they used to be focusing more on data side rather than on model arch side. They developed linear attn, not just for people to think they are innovating, it is actually considering pushing the limit for test time scaling. Llama4 failed for many reasons, but qwen-next is different. They just used very limited flops and it is a brave try for good reasons.

Charuru Charuru @CharuruCha14310

13 Sep 2025

Replying to @teortaxesTex @JingyuanLiu123

I bet OpenAI/xAI is laughing so hard, this result is obvious tbh, they took a permanent architectural debuff in order to save on compute costs.

331

3,153

525,922

Miyamoto Musashi

Miyamoto Musashi @solminded

2 Aug 2025

Here is the best practice for making Claude Code smarter: Remove all the MCPs from global settings and use MPCP servers based on project and only if you have to. Use cli apps instead whenever you can and include the basic instructions at the /docs. MCPs are eating up your context

307

Miyamoto Musashi

Miyamoto Musashi @solminded

2 Aug 2025

5 - Here is the best part: Ask your agent “how can I optimize your current context window?” and it will tell you what is repetitive and redundant. Small context = smart agent.

Miyamoto Musashi

Miyamoto Musashi @solminded

2 Aug 2025

4 - Keep your Claude.Md file simple and short and use links to /docs instead of long verbose text.

Overview - Claude Code Docs

Claude Code is an agentic coding tool that reads your codebase, edits files, runs commands, and integrates with your development tools. Available in your terminal, IDE, desktop app, and browser.

code.claude.com

Miyamoto Musashi

Miyamoto Musashi @solminded

2 Aug 2025

3- Do not use @filename unless you need the complete codebase. Instead make claude agent run search for snippets.

Miyamoto Musashi

Miyamoto Musashi @solminded

2 Aug 2025

2- Do not let your context being auto-compacted. Always start fresh before that stage with a habdoff handon command that reviews the previous task progress, code changes(bash script) and git diff.

127

Simon Willison

Miyamoto Musashi retweeted

Simon Willison

@simonw

10 May 2025

llama.cpp shipped new support for vision models this morning, including macOS binaries (albeit quarantined so you have to take extra steps to run them) that let you run vision models in a terminal or as a localhost web UI

ALT macOS terminal running a Gemma 3 4B vision model, which can then load a pelican.jpg photo and describe it

ALT The localhost:8080 web UI for llama.cpp describing a different photo of some pelicans

257

27,032

Raphaël Erba

Miyamoto Musashi retweeted

Raphaël Erba

@raphael_erba

26 Mar 2025

Ghibli trend…

0:10

942

31,512

287,415

15,791,834

AshutoshShrivastava

Miyamoto Musashi retweeted

AshutoshShrivastava

@ai_for_success

8 Mar 2025

Manus AI is freaking insane and I have not used anything like this before. Disclaimer: I am not paid by Manus AI to write this, I just feel lucky to have gotten access. For the given prompt, Manus AI conducted online research, generated Python code, validated all results, and published a detailed report in 47 minutes. Prompt: "Calculate the optimal Hohmann transfer orbit for a spacecraft traveling from Earth to Mars, including the required delta-v (change in velocity) at each maneuver, considering the current positions of Earth and Mars in their orbits as of March 2025. Also, account for gravitational influences from the Sun and other planets." Note: I am not sure if these calculations are correct; someone needs to verify them. I am publishing the report on GitHub. @manusai

0:37

935

227,485

Deedy

Miyamoto Musashi retweeted

Deedy

@deedydas

8 Mar 2025

Manus, the new AI product that everyone's talking about, is worth the hype. This is the AI agent we were promised. Deep Research Operator Computer Use Lovable memory. Asked it to "Do a professional analysis of Tesla stock " and it did ~2wks of professional-level work in ~1hr!

1:20

253

758

7,385

1,519,535

Miyamoto Musashi

Miyamoto Musashi @solminded

19 Feb 2025

When I look at build in public communities all I see is attention bait posts, fake metrics, lies about revenue - users - marketing, typical x grifter hustle

Miyamoto Musashi

Miyamoto Musashi @solminded

14 Feb 2025

this is why you should not listen to influencers hiding the fact that they are being sponsored. @mckaywrigley blocked me when I pointed out that lately his tweets were pushing @OpenAI “products are better” narrative too far and gave me the impression that he was sponsored.

Miyamoto Musashi

Miyamoto Musashi @solminded

14 Feb 2025

“AI influencer” 😄

Sohail

Miyamoto Musashi retweeted

Sohail

@sohailhusn

10 Nov 2024

Replying to @IHaveAFreeWill

This might help

2,359

Miyamoto Musashi

Miyamoto Musashi @solminded

16 Feb 2024

Seems like OpenAI will wreck all AI generative related startups one by one. They have just killed @runwayml today.

235