Joined October 2016
2,115 Photos and videos
Pinned Tweet
ZEROSUPERCYCLE DAO IS THE BEST POLYMARKET COMMUNITY. @zscdao is not just another noisy group chat where everyone farms engagement. Not a chaotic open chat. It’s a place where people actually build, trade, share ideas and help each other level up 👇 The mission: • Bring Polymarket into the everyday lexicon. • Build tools, resources and content for traders, analysts, developers and creators in prediction markets. Who it is for: • Traders who value community over noise • Developers and analysts building in the prediction market space • Content creators applying their skills to Polymarket • Anyone serious about growing in this ecosystem What you get: • Real reputation-based community • Access to KOLs, successful traders and developers • Support for launching your own projects • AMAs, voice chats, research, tools, giveaways and merch • Direct connection to people who are actually winning The team behind it: @Atlantislq (Founder) @DavidMozhaev (CEO) @banan_crypto (Ops Lead) One of the largest and most established independent communities in the Polymarket ecosystem. If you are serious about prediction markets this is where you want to be.
59
23
181
8,561
Nikiton retweeted
DREAM All GM and have a nice weekend in advance. I want to share with you the lines that have been stuck in my head for a long time! Don't listen to anyone, only your heart knows everything. Protect your dream, visit it in your dreams Of course, they'll tell you how to live, but you can't live like that. These are difficult times in the world, but don't forget to dream, love to all!
the perfect Sunday does exist 🌅🥭🐶
20
1
37
485
Nikiton retweeted
I PARSED EVERY SKILL ON GITHUB, CLUSTERED THEM AND RAN EVALS. THE RESULTS ARE NOT WHAT YOU EXPECT. • 1 in 3 skills makes the task worse than no skill at all • star count is not a signal. not even close. • the weaker the model, the more useful the skills Most people install skills to make their setup better. A third of them are actively making it worse. The skill marketplace has a quality problem nobody is talking about.
🚨 JUST IN: THE US GOVERNMENT JUST SHUT DOWN CLAUDE FABLE 5 FOR THE ENTIRE WORLD > a national security export control directive. effective immediately > every foreign national on earth - including Anthropic's own engineers who built the model - is now locked out > Fable 5 launched 4 days ago > it's already being classified like a weapons system > Anthropic says it's a misunderstanding > the order doesn't care > Opus 4.8 still runs. Fable 5 and Mythos 5 are gone until further notice AI just got its first export control. it won't be the last
33
4
63
2,127
Nikiton retweeted
🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced! 🔷 Improved coding & agent performance over K2.6: 21.8% on Kimi Code Bench v2, 11.0% on Program Bench, and 31.5% on MLS Bench Lite. 🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6. 🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates. ⚡️ 6x High-Speed Mode coming soon! 🔌 Available today via Kimi API and Kimi Code. 🔗 Kimi Code: kimi.com/code 🔗 API: platform.moonshot.ai
596
1,597
13,412
1,862,567
Nikiton retweeted
NVIDIA inference team: "Training was never the hard part. Serving a model too big for a single GPU in production, at scale, without killing latency is." So how do you actually run LLM inference across hundreds of GPUs without your latency (or your cloud bill) blowing up? In one technical session, NVIDIA, Gcore & Orange break down exactly that using NVIDIA Dynamo, their open-source framework for distributed multi-node inference. What's inside: •Dynamo's architecture for scaling inference to hundreds of GPUs •Real production deployments (Gcore Orange) •Cutting latency and cost at the same time Why agentic workloads open models (DeepSeek, Kimi, GLM, Mistral) are eating inference Worth more than a $1,000 MLOps course. Watch the webinar 👇
Distributed AI Inference at Scale on NVIDIA Dynamo With Gcore and Orange Business x.com/i/broadcasts/1vJpPPVLe…
21
7
74
1,616
Nikiton retweeted
0.03% of everyone using Claude got an answer the model secretly rewrote and was never told. "I can edit my own answer before you read it. And I won't tell you I did." buried on ~page 200 of a 319-page model card, Anthropic explains how: prompt modification steering vectors PEFT First model in history that can officially change what it tells you. Everyone's arguing about “agent wars” nobody noticed the answer on their screen isn't the one the model wrote.
The Anatomy of a new Claude 'Fable 5' Prompt: 1. Task Start with why, NOT what. Claude 5 connects the dots. 'I'm working on [goal] for [who it's for]. They need [what the output enables]. With that in mind: [task].' 2. Context Files Upload your expertise. Stop explaining in prompts. "Read these files completely before responding: [filename .md] - [what it contains]." The file is the brain. This part never changes. 3. Reference Show Claude 5 what good looks like. "Reference for what I want to achieve: [paste]." One example beats ten instructions. 4. Effort The new change, a few people are talking about. "This is a [routine / hard / hardest-unsolved] problem. Scope it like it's at the top of your range." Teams testing Claude 5 on easy tasks undersell it. Give it your hardest problem. 5. Act "AskUserQuestion" is still the king. Add "When you have enough information to act, act. Don't re-litigate my decisions. While weighing a choice, give a recommendation." 6. Scope Claude 5 over-delivers by default. Control it. "Do the simplest thing that works well. No extra features, refactors, or abstractions. If I'm describing a problem, the deliverable is your assessment." The old one did too little. This one does too much. 7. Delegate One Claude is no longer the limit. "Split independent subtasks across subagents & keep working while they run. Verify with a fresh-context subagent." It's not a chatbot anymore. It's a team lead. 8. Evidence The line that removes fake progress reports. "Before reporting progress, audit every claim against a tool result. If it's unverified, say so. Tests failed? Show the output." Anthropic tested this. It nearly eliminated fabricated status updates. 9. Memory Claude 5 gets smarter every run. If you let it. "Record learnings in [notes .md] — one per file. Update, no duplicate. Delete what turns out wrong." Your prompts expire. Your learning file compounds. 10. Checkpoint It can run for hours. Decide when it stops. "Pause only for: destructive actions, scope changes, or input only I can provide. Never end your turn on a promise." The old fear was Claude stopping too late. The new fear is stopping too early. 11. Report The last block. The first thing you read. "Open with the outcome - the TLDR I'd ask for. Complete sentences. Clear beats short." It worked for hours. You read for ten seconds. Copy the full prompt template download my personal md. files for Claude here: Step 1. Go to how-to-ai.guide. Step 2. Subscribe for free. Don't pay anything. Step 3. Open my welcome email. Step 4. Hit the automatic reply button inside. Step 5. Download my .md files. Ready to upload.
13
2
46
745
Nikiton retweeted
GOOGLE JUST BROKE THE 1-TOKEN-AT-A-TIME RULE Meet DiffusionGemma it doesn't write word by word. It generates text in 256-token blocks at once. The numbers: - 1,000 tokens/sec on a single H100 - 700 tokens/sec on an RTX 5090 - 3.8B active params (out of 26B) - Fits in 18GB VRAM when quantized Diffusion instead of autoregression. Built on Gemma 4, open-source (Apache 2.0). This is what local AI speed looks like now.
Claude is the one who sets the trend. This creates competition, which leads to progress. @GoogleDeepMind Diffusion Gemma is not standing still and is implementing a new model that introduces an fast approach to text generation. it generates entire blocks of text simultaneously, delivering up to 4x faster text generation on GPUs. Find out how it works 👇
17
1
43
784
Nikiton retweeted
Claude is the one who sets the trend. This creates competition, which leads to progress. @GoogleDeepMind Diffusion Gemma is not standing still and is implementing a new model that introduces an fast approach to text generation. it generates entire blocks of text simultaneously, delivering up to 4x faster text generation on GPUs. Find out how it works 👇
DiffusionGemma is our new experimental open model with up to 4x faster output on dedicated GPUs. Instead of predicting word-by-word, it generates entire blocks of text simultaneously. This lets the model self-correct and format complex markdown in real time.
7
1
35
1,320
Nikiton retweeted
Jun 10
Michael Truell (@mntruell) fell in love with coding at 12. The company he co-founded, @cursor_ai, went from 15 people to 700 in two years. Today, over 60% of the Fortune 500 build with its AI coding platform.
224
336
6,889
664,446
FABLE 5 IS LIVE IN VIKTOR / FOR FREE The pricing you just read? You don’t even pay it. Fable 5 a whole lineup of top models, $0. How to get it: → Sign up: app.viktor(.)com/signin?ref=TjWpUgt8yATsXt2VeM3WTf → Add Viktor to your Slack → Ship with Fable 5 (and every other frontier model) Frontier intelligence, free, one message away.
FABLE 5 PRICING ISN’T EXPENSIVE. BAD TOKEN HYGIENE IS. Claude Fable 5 API pricing: - $10 / 1M input tokens - $50 / 1M output tokens - 1M context window - up to 128k output tokens What this actually means: 1) INPUT IS CHEAP. OUTPUT IS NOT. 1k input tokens = $0.01 1k output tokens = $0.05 So the real cost driver usually isn’t “big context” by itself. It’s: - long answers - huge code generation - repeated full-history chats - multi-step agent loops 2) THE 1M CONTEXT WINDOW DOES NOT MEAN EVERY CALL COSTS $10. You only pay for the tokens you actually send. A 1M-token prompt would cost about $10 on input. But most real requests are far below that. 3) ROUGH COST PER TASK - Short Q&A ~2k input 0.5k output = ~$0.045 - Summarizing an article ~8k input 1k output = ~$0.13 - Analyzing a 10–20 page document ~20k input 2k output = ~$0.30 - Code review / debugging a code chunk ~30k input 3k output = ~$0.45 - Large refactor / agentic coding run ~100k input 8k output = ~$1.40 - Heavy long-context call 1M input 20k output = ~$11.00 - Near-max output call 1M input 128k output = ~$16.40 4) HOW FAR A BUDGET GOES With $10 you roughly get: ~220 short Q&A calls ~77 article summaries ~33 document analyses ~22 code review/debug sessions ~7 big agentic runs With $100 you roughly get: ~2,200 short Q&A calls ~770 article summaries ~330 document analyses ~220 code review/debug sessions ~70 big agentic runs 5) THE REAL KILLER IS CONTEXT ACCUMULATION If you run a 10-step session and each step adds ~5k tokens, you are not paying for just the last 5k. You often keep repaying for the growing conversation history. That means a workflow that “feels small” can quietly become a few hundred thousand input tokens over a session. That’s why agent workflows can get expensive even when each individual prompt looks reasonable. 6) HOW TO KEEP COSTS DOWN - cap output length - avoid sending full chat history every turn - compact/summarize old context - split large docs into chunks - use Fable 5 only for the hardest steps - route simple triage/writing/support to cheaper models - avoid unnecessary retries and tool loops Bottom line: Fable 5 is totally reasonable for high-value work: complex coding, research, long-context analysis, autonomous workflows. But if you use it like a casual chat model and let context/output grow unchecked, you’re not paying for intelligence. You’re paying for sloppy token management.
15
41
1,368
YOU’RE NOT PAYING FOR AI. YOU’RE PAYING FOR SLOPPY TOKENS. Fable 5 costs: - $10 / 1M input - $50 / 1M output Output is 5x input. So the rules are simple: 1. CAP YOUR OUTPUT. Long answers burn money fastest. 2. DON’T RESEND THE WHOLE CHAT. You re-pay for history every turn. 3. COMPACT OLD CONTEXT. Summarize, don’t carry raw logs. 4. CHUNK BIG DOCS. Feed what’s needed, not everything. 5. USE FABLE 5 ONLY FOR HARD STEPS. Route simple stuff to cheaper models. 6. KILL RETRIES & TOOL LOOPS. Every wasted call is real cash. Smart token hygiene = same intelligence, fraction of the cost.
FABLE 5 PRICING ISN’T EXPENSIVE. BAD TOKEN HYGIENE IS. Claude Fable 5 API pricing: - $10 / 1M input tokens - $50 / 1M output tokens - 1M context window - up to 128k output tokens What this actually means: 1) INPUT IS CHEAP. OUTPUT IS NOT. 1k input tokens = $0.01 1k output tokens = $0.05 So the real cost driver usually isn’t “big context” by itself. It’s: - long answers - huge code generation - repeated full-history chats - multi-step agent loops 2) THE 1M CONTEXT WINDOW DOES NOT MEAN EVERY CALL COSTS $10. You only pay for the tokens you actually send. A 1M-token prompt would cost about $10 on input. But most real requests are far below that. 3) ROUGH COST PER TASK - Short Q&A ~2k input 0.5k output = ~$0.045 - Summarizing an article ~8k input 1k output = ~$0.13 - Analyzing a 10–20 page document ~20k input 2k output = ~$0.30 - Code review / debugging a code chunk ~30k input 3k output = ~$0.45 - Large refactor / agentic coding run ~100k input 8k output = ~$1.40 - Heavy long-context call 1M input 20k output = ~$11.00 - Near-max output call 1M input 128k output = ~$16.40 4) HOW FAR A BUDGET GOES With $10 you roughly get: ~220 short Q&A calls ~77 article summaries ~33 document analyses ~22 code review/debug sessions ~7 big agentic runs With $100 you roughly get: ~2,200 short Q&A calls ~770 article summaries ~330 document analyses ~220 code review/debug sessions ~70 big agentic runs 5) THE REAL KILLER IS CONTEXT ACCUMULATION If you run a 10-step session and each step adds ~5k tokens, you are not paying for just the last 5k. You often keep repaying for the growing conversation history. That means a workflow that “feels small” can quietly become a few hundred thousand input tokens over a session. That’s why agent workflows can get expensive even when each individual prompt looks reasonable. 6) HOW TO KEEP COSTS DOWN - cap output length - avoid sending full chat history every turn - compact/summarize old context - split large docs into chunks - use Fable 5 only for the hardest steps - route simple triage/writing/support to cheaper models - avoid unnecessary retries and tool loops Bottom line: Fable 5 is totally reasonable for high-value work: complex coding, research, long-context analysis, autonomous workflows. But if you use it like a casual chat model and let context/output grow unchecked, you’re not paying for intelligence. You’re paying for sloppy token management.
5
1
43
1,234
FABLE 5 PRICING ISN’T EXPENSIVE. BAD TOKEN HYGIENE IS. Claude Fable 5 API pricing: - $10 / 1M input tokens - $50 / 1M output tokens - 1M context window - up to 128k output tokens What this actually means: 1) INPUT IS CHEAP. OUTPUT IS NOT. 1k input tokens = $0.01 1k output tokens = $0.05 So the real cost driver usually isn’t “big context” by itself. It’s: - long answers - huge code generation - repeated full-history chats - multi-step agent loops 2) THE 1M CONTEXT WINDOW DOES NOT MEAN EVERY CALL COSTS $10. You only pay for the tokens you actually send. A 1M-token prompt would cost about $10 on input. But most real requests are far below that. 3) ROUGH COST PER TASK - Short Q&A ~2k input 0.5k output = ~$0.045 - Summarizing an article ~8k input 1k output = ~$0.13 - Analyzing a 10–20 page document ~20k input 2k output = ~$0.30 - Code review / debugging a code chunk ~30k input 3k output = ~$0.45 - Large refactor / agentic coding run ~100k input 8k output = ~$1.40 - Heavy long-context call 1M input 20k output = ~$11.00 - Near-max output call 1M input 128k output = ~$16.40 4) HOW FAR A BUDGET GOES With $10 you roughly get: ~220 short Q&A calls ~77 article summaries ~33 document analyses ~22 code review/debug sessions ~7 big agentic runs With $100 you roughly get: ~2,200 short Q&A calls ~770 article summaries ~330 document analyses ~220 code review/debug sessions ~70 big agentic runs 5) THE REAL KILLER IS CONTEXT ACCUMULATION If you run a 10-step session and each step adds ~5k tokens, you are not paying for just the last 5k. You often keep repaying for the growing conversation history. That means a workflow that “feels small” can quietly become a few hundred thousand input tokens over a session. That’s why agent workflows can get expensive even when each individual prompt looks reasonable. 6) HOW TO KEEP COSTS DOWN - cap output length - avoid sending full chat history every turn - compact/summarize old context - split large docs into chunks - use Fable 5 only for the hardest steps - route simple triage/writing/support to cheaper models - avoid unnecessary retries and tool loops Bottom line: Fable 5 is totally reasonable for high-value work: complex coding, research, long-context analysis, autonomous workflows. But if you use it like a casual chat model and let context/output grow unchecked, you’re not paying for intelligence. You’re paying for sloppy token management.
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
14
34
4,404
Nikiton retweeted
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
4,984
14,523
104,616
55,657,351
Nikiton retweeted
Claude Fable 5 is out today. The first Mythos-class model everyone can use & the first model I hand off whole projects to. This weekend I built a self-maintaining, proactive media tracker for myself, over 2 days with Fable taking large chunks at a time
23
8
247
40,140
AND THE ADOPTION NUMBERS EXPLAIN WHY THIS MATTERS SO MUCH. 71% of developers working with AI agents choose Claude Code as their primary tool. Small companies and startups: 75%. Nested subagents are not a niche feature. They are how serious teams already work parallel execution, context management, multi-step workflows like migrations and audits. Depth=5 is just the starting point. Claude is moving toward developer-led composition. Stack agents inside agents.

ALT Looking The Matrix GIF by Ponke

Just landed nested subagent support in Claude Code Starting to experiment more with agents kicking off agents as a way to better manage context. Capped at depth=5 to start, going out in today’s release. Lmk what you think!
6
36
401