Filter
Exclude
Time range
-
Near
4/ The hardest part wasn't building the UI. It was defining the rules behind every block. Each component needed explicit logic for: Input/output dimensions Parameter counts Compute cost Compatibility with other layers
1
1/ While studying transformer architectures, I noticed I kept doing the same things repeatedly: Sketching architectures Calculating parameter counts Tracking tensor dimensions Estimating compute requirements So I started building a tool for myself: LLM Architecture Lab.
1
Replying to @Invest_AndGrow
Most of them will not move much and are not suitable for investment as the most important parameter is missing in them that is growth , nestle posted good number for 1 quater but need to see the consistency
1
1
5
Ja, det er også en parameter. Plus utryghed for vold og voldtægt.
1
4
HarperAI Guru retweeted
Rio de Janeiro’s city government released Rio 3.5 Open, a roughly 397-billion-parameter open-weight AI model available on Hugging Face. The model was created by merging existing models, including Qwen3.5-397B-A17B, followed by additional training from a stronger model. It is released under an MIT license, allowing developers to download, modify, and deploy it. Its arrival, alongside MiniMax M3, is adding new competition to the open-model frontier while Alibaba’s latest flagship, Qwen3.7-Max, remains proprietary and accessible only through an API. The Rio model page currently warns that an incorrect version was initially uploaded and that the proper final model is being restored. Its performance should therefore be independently tested once the corrected weights are available.
Alibaba Qwen3.7 slowly fading into irrelevance at the frontier due to proprietary stance. In it's place we have Minimax M3 and... *checks notes* Rio 3.5 397b, made by the municipal IT company of Rio de Janeiro's city government. huggingface.co/prefeitura-ri…
1
2
1
391
PARAMETER Rehearsal Tape
2
Replying to @BrooksWhaleX
128GB that both CPU and GPU can access simultaneously is what makes running a 235B parameter model possible in a lunchbox. That's a fundamentally different design than stacking a discrete GPU with its own separate VRAM. Qwen3 235B local is not the same experience as Claude Code Max or GPT-5.5 on frontier tasks. You're trading quality ceiling for cost and privacy. For the hardest reasoning tasks, the frontier models still have a gap.
19
Replying to @vini546
@growth_edge_ Great info! however one major parameter is missing - taxation rule for Multi asset funds. Most multi-asset funds have <65% equity exposure and are taxed according to the investor's income tax slab if held for less than 24 months.
1
1
8
The question is, what is the cost compared to an Apple Mac Studio. Besides speed, the cost is a parameter to compare with too.
5
Replying to @teortaxesTex
You might want to check the experimental setup more carefully. Unique data/repetitions/tokens per parameter might give a hint if this is applicable in practice
68
$5,000 A Year On AI You Don't Own. Lisa Su Just Put The Whole Stack In A Box The Size Of A Hardcover Book. It Sits On A Desk. No Cloud. Nothing Leaves The Room. Inside Is The Ryzen AI Max 395, The First x86 Chip To Run A 235 Billion Parameter Model On One Piece Of Silicon. 128GB Of Memory In A Single Pool, Shared Across CPU, GPU And NPU. On Linux You Get 110GB Of Usable VRAM Out Of That 128. The GMKtec EVO-X2. The Size Of A Lunchbox. $1,700. It Runs Qwen3 235B Fully. DeepSeek V3 Comfortably. Llama 3.3 70B With Room To Spare. AMD Put It Against NVIDIA's RTX 5080, A $1,000 Discrete GPU, On DeepSeek R1 Inference. The Lunchbox Won By 3x. Now Run The Math A Heavy User Already Pays. Claude Code Max, $200. ChatGPT Pro, $200. Cursor, $20. Gemini, $20. That's $440 A Month. $5,280 A Year Going To Servers You'll Never See. Electricity On The Box Runs $9 A Month. Ollama Installs In One Command. Claude Code Points At Localhost. Nothing Leaves The Machine. Nothing Costs Per Request. No Throttling At Peak Hours, No $200 Paywall Deciding How Many Times You Get To Think Today. Pays For Itself In 10 Months. After That The $440 Stops Leaving Your Account Every Month And Just Stays There. The Cloud Made Sense When Nothing On A Desk Could Touch It. It Sits On The Desk Now.
11
🚨 AMD just dropped a lunchbox-sized AI PC that’s making people rethink cloud AI. Lisa Su demoed a 235B-parameter model running locally on the new Ryzen AI Max 395. No datacenter. No cloud. No rented GPUs. The game changer: up to 128GB of unified memory shared between CPU and GPU. For comparison: • RTX 5090 → 32GB VRAM • RTX 4090 → 24GB VRAM • Ryzen AI Max 395 → 128GB unified memory AMD says it delivers over 3× faster DeepSeek R1 inference than an RTX 5080 in certain workloads. Meanwhile, AI subscriptions add up fast: • Claude Code Max: $200/mo • ChatGPT Pro: $200/mo • Cursor: $20/mo • Gemini: $20/mo That's $5,280/year before building anything. The 128GB system starts around $2,399. Install Ollama. Run large models locally. Keep your data private. No token limits. No monthly AI tax. The future of AI may not be in the cloud—it might be sitting on your desk.
1
16
have you tried to code a small-medium codebase using a 27B parameter model with 24GB VRAM ? It will run out of context fast.
1
4
Franzl retweeted
AMD's CEO held a mini PC on stage that runs a 235 billion parameter model. Locally. On a lunchbox-sized box. This is the moment the AI industry's entire business model got challenged. Every major AI company built the same playbook: rent our model, pay per token, send your data to our servers. AMD just showed you can run comparable models on your desk. No cloud. No API key. No ongoing cost. Ryzen AI Max 395. 128GB unified memory. Runs Qwen3 235B fully. Beat an NVIDIA RTX 5080 by 3x on inference. And after this week where the US government pulled Claude Fable 5 overnight and millions lost access with zero warning local AI isn't just cheaper anymore. It's the only AI nobody can take away from you. The question isn't whether local AI is good enough. It clearly is. The question is why you'd keep renting when you can own. vc: @adiix_official
14
25
145
13,393
The 128GB number is the part everyone's repeating. The number that actually decides whether you'd use this box is 256. That's the memory bandwidth, in GB/s. A 5090 moves about 1,800. An H100 moves 3,350. Local token speed is bound by how fast weights get read out of memory, and this APU reads them at roughly a seventh of a gaming GPU. So the headline does something quiet. Qwen3 235B runs here at about 11 tokens a second, which sounds impossible on 256 GB/s until you notice the model is mixture-of-experts: 235B total, ~22B active per token. The chip only moves the 22B it needs. The "235B" on the slide is a storage stat. The 22B is the speed stat. Run something dense and the trick drops. Llama 3.3 70B, where every parameter fires on every token, does about 5 tokens a second on the same box. Readable. Not something you sit in front of for eight hours. That 3x win over a 5080 lives in the same place. A 5080 has 16GB of VRAM and can't hold a 235B model at all, so it spills to system memory and crawls. The APU wins that matchup on capacity. Change the test to a model that fits in 16GB and the 5080 walks away on speed. Now look at the workload in the pitch: point Claude Code at localhost. Agentic coding is the worst possible fit for a bandwidth-starved box. One task is dozens of sequential model round trips, each waiting on the last, each streaming at 11 tokens a second. The exact use case used to sell the $5,280 in savings is the one that exposes the bottleneck. The same Qwen3 235B runs at 1,500 tokens a second on a Cerebras wafer. That's the real comparison: 1,500 versus 11, and how much of your day goes to watching the slow one think. The box is a real deal for what it is. A quiet, private, $1,800 machine that runs big open models at conversational speed for one person. The frontier stack it's sold as replacing answers at 50 to 100 tokens a second with quality no open 235B matches yet. It pays for itself in 9 months only if your time is worth nothing per token.
AMD CEO LISA SU HELD A MINI PC ON STAGE THAT RUNS A 235B MODEL AND REPLACES YOUR $440/MONTH AI STACK amd's ryzen ai max 395 is the first x86 chip that runs a 200 billion parameter model on one piece of silicon. cpu and gpu share 128gb of unified memory, no separate graphics card needed the gmktec evo-x2 runs qwen3 235b fully, deepseek v3 comfortably and llama 3.3 70b with headroom. on linux you get 110gb of usable vram out of 128gb amd claimed the chip beat an nvidia rtx 5080 by more than 3x on deepseek r1 inference. a lunchbox sized pc outrunning a $1,000 discrete gpu on a real ai workload a heavy ai user pays $200 for claude code max, $200 for chatgpt pro, $20 for cursor and $20 for gemini. that's $5,280 a year and the box pays itself off in 9 to 10 months install ollama, pull the model, point claude code at localhost. same interface, nothing leaves the machine, nothing costs per request bookmark this and read the article below
2
4
1,344
alex mercer retweeted
Replying to @Tzu_108_112
We are building smaller models. Currently a 100 B parameter model is in the works.
31
67
879
14,958