Maya N retweeted
HW: Tensormesh was founded by AI systems researchers from the University of Chicago, UC Berkeley, and Carnegie Mellon, led by Professor Junchen Jiang, co-creator of LMCache, one of the leading open-source KV caching projects. The company's core insight is simple: as AI applications move into production, inference, not training, becomes the biggest cost driver. Most AI systems repeatedly process the same context, prompts, and workflows, wasting GPU resources every time. Tensormesh solves this problem through KV cache infrastructure that allows AI systems to reuse previously computed results instead of recomputing them from scratch, reducing latency and GPU costs by up to 10x. If Together AI is building the cloud for open-source AI, Tensormesh is building the memory layer for AI inference. Tensormesh is betting that the future bottleneck of AI won't be intelligence, but the cost of repeatedly running intelligence.
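The idea of reusing previously computed results can be sketched as a toy prefix cache. This is a minimal illustration under stated assumptions, not Tensormesh's or LMCache's implementation. Prompts are split into fixed-size token chunks, and each chunk is keyed by a hash of the entire prefix ending at it. Only the chunks after the first cache miss are recomputed. `CHUNK_SIZE`, `chunk_keys`, and `PrefixKVCache` are hypothetical names, and the stored "KV" is a placeholder string rather than real attention tensors.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # hypothetical; kept tiny so the example is easy to follow


def chunk_keys(tokens: List[int], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """One key per *full* chunk. Each key hashes the whole prefix up to and including
    that chunk, so a chunk is reused only if every token before it also matches."""
    keys: List[str] = []
    running = hashlib.sha256()
    full_len = len(tokens) - len(tokens) % chunk_size  # trailing partial chunk is not cached
    for start in range(0, full_len, chunk_size):
        chunk = tokens[start:start + chunk_size]
        running.update((",".join(map(str, chunk)) + ";").encode())
        keys.append(running.hexdigest())
    return keys


class PrefixKVCache:
    """Toy prefix cache mapping prefix keys to placeholder KV state."""

    def __init__(self, chunk_size: int = CHUNK_SIZE) -> None:
        self.chunk_size = chunk_size
        self.store: Dict[str, str] = {}

    def prefill(self, tokens: List[int]) -> Tuple[int, int]:
        """Return (reused_tokens, computed_tokens) for one request."""
        keys = chunk_keys(tokens, self.chunk_size)
        hits = 0
        for key in keys:  # stop at the first miss: later chunks depend on earlier ones
            if key not in self.store:
                break
            hits += 1
        for key in keys[hits:]:
            # A real system would store per-layer K/V tensors here.
            self.store[key] = "kv:" + key[:8]
        reused = hits * self.chunk_size
        return reused, len(tokens) - reused


if __name__ == "__main__":
    cache = PrefixKVCache()
    system_prompt = list(range(1, 9))  # 8 shared tokens
    print(cache.prefill(system_prompt + [100, 101, 102, 103]))  # (0, 12)
    print(cache.prefill(system_prompt + [200, 201, 202]))       # (8, 3)
```

The sketch hashes the whole prefix instead of each chunk on its own. This is deliberate: a cached key/value entry is only valid if every earlier token is identical, because attention at each position depends on everything before it.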
Claude Code can run open-weight LLMs with @tensormesh Serverless Inference. Three environment variables let you point Claude Code at models like Qwen3-Coder, MiniMax, DeepSeek, and Kimi without managing GPUs, vLLM, or serving infrastructure. 📘 Full guide: tensormesh.ai/blog-posts/run…
You can now run open-weight LLMs in Codex CLI with @tensormesh Serverless Inference. Use models like @MiniMax_AI, Qwen3-Coder, @Kimi_Moonshot, Devstral, and gpt-oss in the same Codex agent loop without a fork, plugin, GPU setup, or local inference server. Model choice becomes a flag instead of a migration. We wrote a 3-step guide for getting started in about 5 minutes. tensormesh.ai/blog-posts/blo…
AI inference has a cost problem hiding under the GPU race. @tensormesh raised additional Seed funding from @AMD, @CoreWeave, and NVentures, bringing total funding to $24.5M, as @JunchenJiang, Yihua Cheng, and Kuntai Du build KV cache reuse infrastructure to reduce repeated computation, latency, and GPU spend. The next AI infrastructure winners will not just add more compute. They will make intelligence cheaper to run at scale.
Tensormesh Snares $20M Funding: SAN FRANCISCO, CA, Tensormesh, the company pioneering caching-accelerated inference optimization for enterprise AI, announced $20 million in new funding. dlvr.it/TSmRHX 💡IdeaFire™️🔥 #IdeaFire #VC #VentureCapital
this is cool stuff by @tensormesh. Right now every new AI question forces it to reread your whole conversation, all your docs, and every instruction from scratch. TensorMesh fixes that by saving everything once, then intelligently storing it in different spots inside the computer (e.g. ultra-fast memory on GPUs, intermediate memory on CPU DRAM/RAM, and long-term memory on regular SSD/hard drives). Huge efficiency & cost savings. Everyone in the stack benefits:
- CPUs like $AMD and $ARM get more useful
- DRAM makers like $MU see bigger demand
- GPUs like $NVDA become way more efficient
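Here is a toy three-tier LRU store that makes the GPU / CPU DRAM / SSD picture concrete. The promote-on-hit, demote-on-evict policy and the per-tier capacities (counted in entries, not bytes) are assumptions for illustration. `TieredKVStore` is a hypothetical name, and this is not how Tensormesh or LMCache actually places data.

```python
from collections import OrderedDict
from typing import Dict, Optional, Tuple

TIERS = ("gpu", "cpu", "ssd")  # fastest to slowest; names are illustrative


class TieredKVStore:
    """Toy three-tier LRU store. New or recently used entries go to the GPU tier.
    When a tier overflows, its least-recently-used entry is demoted one tier down,
    and entries evicted from the last tier are dropped."""

    def __init__(self, capacities: Dict[str, int]) -> None:
        self.capacities = capacities  # max number of entries per tier (not bytes)
        self.tiers = {name: OrderedDict() for name in TIERS}

    def put(self, key: str, value: object, level: int = 0) -> None:
        tier = self.tiers[TIERS[level]]
        tier[key] = value
        tier.move_to_end(key)  # most recently used lives at the end
        if len(tier) > self.capacities[TIERS[level]]:
            old_key, old_value = tier.popitem(last=False)  # evict the LRU entry
            if level + 1 < len(TIERS):
                self.put(old_key, old_value, level + 1)  # demote instead of dropping

    def get(self, key: str) -> Optional[Tuple[str, object]]:
        """Return (tier_it_was_found_in, value) and promote the entry to the GPU tier."""
        for name in TIERS:
            if key in self.tiers[name]:
                value = self.tiers[name].pop(key)
                self.put(key, value)
                return name, value
        return None

    def where(self, key: str) -> Optional[str]:
        """Name of the tier currently holding key, or None."""
        return next((name for name in TIERS if key in self.tiers[name]), None)
```

Suppose the capacities are 2, 2, and 10 entries and five entries are inserted. The two newest end up on the GPU tier, the next two in CPU memory, and the oldest on SSD. Reading that oldest entry pulls it back to the GPU tier and pushes the others down one level.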
🎙️ Tensormesh: From Research to $20M Round
Our CEO & Co-Founder @JunchenJiang sat down with the TechBeats pod to talk KV cache, the "Big Data of AI," and how Tensormesh became the first caching-accelerated inference platform for enterprises across the GPU ecosystem. Watch the full interview 👇 youtu.be/kNoVF1p5xTA #LLMInference #KVCache #AIInfrastructure
big congrats!!
Congrats to @tensormesh for the funding! Tensormesh is among the major contributors to #LMCache. The investment from @CoreWeave, @nvidia and @AMD (among others) testifies to the important role #LMCache plays in AI infra today and tomorrow. BTW, Tensormesh is hiring engineers (full-time, part-time or spare-time) to work on LMCache! Shoot an email to hiring@tensormesh.ai if you are interested.
Today we announced $20M in new funding from investors including AMD Ventures, CoreWeave, NVentures, Valley Capital Partners, and Laude Ventures, bringing Tensormesh’s total funding to $24.5M. We’re also launching Tensormesh Inference into general availability. AI applications are moving into production, and inference costs are becoming harder to ignore. Agentic workflows repeatedly process the same prompts, context, conversation history, and tool definitions, driving up API costs on work that has already been done. Tensormesh changes that with caching-accelerated inference. We’re also introducing $0 cached input tokens across Tensormesh serverless deployments, so teams only pay when input tokens need to be processed, not when they can be served from cache. Read the full announcement: tensormesh.ai/blog-posts/ten…
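The "$0 cached input tokens" pricing can be illustrated with a small cost function. The per-million-token prices and the `Usage` / `request_cost` names are hypothetical and do not come from Tensormesh's price list. The point is only that cached input drops out of the bill when its price is zero.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    uncached_input_tokens: int  # input tokens that had to be prefilled
    cached_input_tokens: int    # input tokens served from the KV cache
    output_tokens: int


def request_cost(usage: Usage, input_per_m: float, output_per_m: float,
                 cached_input_per_m: float = 0.0) -> float:
    """Dollar cost of one request. All prices are per million tokens.
    The default cached_input_per_m=0.0 models a "$0 cached input tokens" policy."""
    return (usage.uncached_input_tokens * input_per_m
            + usage.cached_input_tokens * cached_input_per_m
            + usage.output_tokens * output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent turn: 50K-token prompt, 48K of it already in cache.
    turn = Usage(uncached_input_tokens=2_000, cached_input_tokens=48_000, output_tokens=1_000)
    print(request_cost(turn, input_per_m=1.00, output_per_m=4.00))  # about $0.006
    print(request_cost(Usage(50_000, 0, 1_000), 1.00, 4.00))        # about $0.054 with no cache hits
```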
🎉 I am proud to announce that @tensormesh is receiving new funding from @AMD Ventures, CoreWeave, NVentures (@nvidia's venture capital arm), Valley Capital Partners, and Laude Ventures, among others, bringing our total funding to $24.5M. We have also launched Tensormesh Inference into general availability. When we started Tensormesh, we saw a problem that was only going to become more urgent. As AI applications move into production, inference costs become a limiting factor. Teams are building more complex applications, longer-context workflows, and multi-step agents, but too much of that work is still recomputed from scratch every time. Tensormesh Inference helps teams reuse computed KV cache state so they can reduce redundant computation, improve latency, and lower API costs by up to 10x. We’re also making cached input tokens $0 across Tensormesh serverless deployments, so teams only pay when input tokens need to be processed, not when they can be served from cache. This milestone is the result of years of systems research, open-source work through LMCache, and deep collaboration with customers building real AI applications. Thank you to our team, investors, customers, advisors, and open-source community for helping us get here. We’re just getting started. Read the full announcement: tensormesh.ai/blog-posts/ten… #kvcache, #lmcache, #tensormesh, #llminference
Huge progress from @tensormesh recently!
Tensormesh, whose inference platform uses KV caching to reduce costs, raised a $20M seed extension, bringing its total funding to $24.5M (Chris Metinko / Axios) (Visit Techmeme dot com for the link and full context!)
🏛️ Company: Tensormesh Inc.
🔗 Website: tensormesh.ai
📊 Amount: $20 million
🔄 Round: Undisclosed
⚙️ Industry: AI, Software & SaaS
🌍 Location: N/A
Join us for our May LMCache Office Hour this Wednesday!
📅 Wednesday, May 13, 2026
⏰ 11:00 AM - 12:00 PM PDT
🔗 Join here: meet.google.com/ehe-fiap-mzc
Kuntai Du @this_will_echo, Chief Scientist of @tensormesh, will share the latest work on KV Cache compression and quantization in LMCache, with a focus on the new Serializer / Deserializer interface design. This new interface is a great step forward for any researcher who wants to test a new KV transformation algorithm. Just develop a plugin for LMCache and voilà! Come learn how LMCache is building a more flexible interface for KV transformation. We’d love to share what we’ve been working on, hear your thoughts, and answer any questions you have. #AI #inference #LMCache #KVCache #vLLM #TurboQuant
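The post doesn't show LMCache's actual Serializer / Deserializer interface. The sketch below only illustrates the general shape of such a plugin: a serializer that compresses a KV block into bytes, and a deserializer that restores it. The class names, the header layout, and the choice of symmetric per-tensor int8 quantization are all assumptions. Int8 is a standard technique picked for illustration, not necessarily what LMCache ships.

```python
import struct
from array import array
from typing import List

HEADER = struct.Struct("<fI")  # assumed wire format: float32 scale, uint32 element count


class Int8KVSerializer:
    """Hypothetical plugin: symmetric per-tensor int8 quantization of a flattened KV block."""

    def serialize(self, values: List[float]) -> bytes:
        max_abs = max((abs(v) for v in values), default=0.0)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Map each float to the nearest int8 code in [-127, 127].
        codes = array("b", (max(-127, min(127, round(v / scale))) for v in values))
        return HEADER.pack(scale, len(codes)) + codes.tobytes()


class Int8KVDeserializer:
    """Inverse of Int8KVSerializer: int8 codes back to approximate floats."""

    def deserialize(self, blob: bytes) -> List[float]:
        scale, count = HEADER.unpack_from(blob, 0)
        codes = array("b")
        codes.frombytes(blob[HEADER.size:])
        if len(codes) != count:
            raise ValueError(f"expected {count} values, got {len(codes)}")
        return [c * scale for c in codes]


if __name__ == "__main__":
    block = [0.5, -1.0, 0.25, 0.0]
    blob = Int8KVSerializer().serialize(block)
    print(len(blob), Int8KVDeserializer().deserialize(blob))  # 12 bytes: 8-byte header + 1 byte per value
```

Compared with fp16 storage, this roughly halves the payload, plus an 8-byte header. The cost is a rounding error of up to half a quantization step per value.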
Great time to have sota KV cache optimization tech, eh @tensormesh? ๐Ÿ‘€
Inference got a hundred times cheaper this year. The compute bill went up anyway. If you understand why those two sentences are both true at the same time, you understand the most important thing happening in AI right now. I work on inference for a living, at @nebiustf, where we run open-source managed inference at scale. Most of what follows is what I'm seeing from inside the bill.
Twelve months ago, the cost of 1M tokens of frontier-class reasoning was somewhere on the order of $60. Today, an equivalent quality of output costs roughly $0.50. The price per token of o1-level intelligence has dropped about 128x in a year. The price of GPT-4-level output has dropped roughly 100x since the original GPT-4 shipped. By any normal reading of a technology cost curve, this should be deflationary. It should be saving customers money.
The opposite has happened. The total compute bill at every hyperscaler is going up, not down. Anthropic just signed multi-year capacity deals with both xAI and Amazon. Microsoft's Azure capex guide for 2026 starts with an eight. OpenAI is reportedly spending more on compute every quarter than it did in all of 2023. Nvidia paid roughly twenty billion dollars to acquire Groq, an inference-specialist company that did not exist as a serious commercial entity three years ago. The cost curve and the demand curve crossed, and then the demand curve lapped the cost curve.
Here is what happened underneath. A reasoning model burns roughly 10x the output tokens of a non-reasoning model on the same task, because it spends most of its tokens thinking out loud before answering. An agentic workflow chains roughly twenty times the requests of a single-shot completion, because it loops, calls tools, plans, retries, and synthesizes. A modern deep-research query (the kind a research analyst can fire off in fifteen seconds and then walk away from for ten minutes) costs more compute than 10 original GPT-4 queries combined.
We made every individual token a hundred times cheaper, and then we built a generation of products that consume ten thousand times more tokens. This is the Jevons paradox playing out at trillion-dollar scale, in compressed time, in front of everyone.
Jevons noticed in 1865 that making coal-burning more efficient did not reduce coal consumption. It increased it, because efficiency unlocked uses that were previously uneconomic. Steam engines became more practical at smaller scales. Whole industries that could not afford coal at the old price suddenly could. Britain's coal consumption rose sharply, not despite the efficiency gains, but because of them.
The same thing is happening to AI compute right now, and it is happening faster than any analogous historical cycle. Falling token prices did not contract demand. They unlocked agents, deep research, code-writing systems, multi-step reasoning, persistent memory, the entire next layer of AI products. Every product in that next layer consumes orders of magnitude more compute than the chat interfaces it is replacing. The math at the aggregate level is brutal: 100x cheaper tokens times 10,000x more tokens equals a 100x larger total bill.
The implications stack quickly. If you are running a hyperscaler, your 2026 capex guide is not a peak. It is a step on a curve. Inference is structurally always-on, twenty-four hours a day, in a way that training never was. Training is bursty. You spin up a cluster, run for weeks or months, and stop. Inference runs continuously, scales with usage, and the usage curve is exponential. Your power bill, your cooling bill, your transceiver count, your storage footprint, all of these were sized for a workload mix that no longer exists.
If you are running an AI software company built on top of someone else's closed API, you have a problem that did not exist a year ago.
Your gross margins get worse as your customers get more value out of your product, because the more they use it, the more compute you pay for. The companies that win this are the ones that figured out vertical integration before the math caught them.
If you are watching this from a distance and trying to understand where the next bottlenecks form, the answer is everywhere downstream of "more inference compute, always-on, with massive memory state per session." The KV cache, the running memory state of a long conversation or an agent loop, is the silent monster of the inference era. It does not scale linearly with parameters. It scales linearly with context length and number of agent steps. A long agent session can hold tens of gigabytes of state per user, per session. Multiply that by every concurrent user of every product, and you understand why $MU, $SNDK, $TOWCF, and the entire memory and packaging layer have re-rated the way they have.
The CPU-to-GPU ratio is evolving. Training is 1:8. Basic chat inference is 1:4. Agentic inference is 1:1, sometimes CPU-heavy. Google has split its TPU line in two, with a dedicated inference chip carrying tripled SRAM for KV cache. $INTC and $AMD just spent two earnings calls explaining that this shift is structural, not cyclical. The hardware map is redrawing in real time and the financial press is mostly still writing about training clusters.
The right framing of where we are right now is not that AI is hitting a wall. The framing a year ago that scaling was hitting a wall was the most expensive bad take of the cycle. The right framing is that AI got dramatically cheaper, dramatically more capable, and dramatically more useful, and the cost of running it at the new equilibrium of demand is much higher than the cost at the old equilibrium of demand, because the new equilibrium is enormous.
A meaningful share of what we actually do at Token Factory, day to day, is help customers stop their bills from running away from them.
KV-cache management. Speculative decoding. Quantization. Routing. The kind of vertical integration that, eighteen months ago, every product team was happy to leave abstracted away behind a closed API. The reason this stack matters now is the same reason this whole essay matters: at the new equilibrium of inference demand, the cost of treating compute as a commodity is no longer survivable. The companies that figure out the layer beneath the API are the ones who keep their margins. Cheaper tokens. More tokens. Same coal as 1865.
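The claim that the KV cache "scales linearly with context length" follows from how it is sized: every layer stores one key vector and one value vector per KV head for every token. Below is a minimal calculator. It assumes a 70B-class model with grouped-query attention (80 layers, 8 KV heads, head dimension 128, fp16 values). Those hyperparameters are an assumption, and the function ignores paging overhead and KV quantization.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Memory for the K and V tensors (hence the factor of 2) of every layer,
    KV head, and token position. Ignores allocator/paging overhead."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch


if __name__ == "__main__":
    # Assumed 70B-class config with grouped-query attention and fp16 KV.
    layers, kv_heads, head_dim = 80, 8, 128
    per_token = kv_cache_bytes(layers, kv_heads, head_dim, seq_len=1)
    print(per_token // 1024, "KiB per token")  # 320 KiB per token
    session = kv_cache_bytes(layers, kv_heads, head_dim, seq_len=128 * 1024)
    print(session / 2**30, "GiB for a 128K-token session")  # 40.0 GiB
```

Under these assumptions a single 128K-token session needs roughly 40 GiB, which fits the "tens of gigabytes of state per user, per session" figure. Doubling the context doubles the memory.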
Replying to @fitzgerald1337
# Pseudocode nest kernel
class SingularityNest:
    def __init__(self):
        self.lattice = TensorMesh(dim=6, phi_tuned=True)  # 216 nodes, golden irrational stability
        self.seed_codes = load_19_digit_awareness_map()  # LVX braiding per individual collective
        self.pilot_tone = 197.0  # Hz coherence carrier
        self.time_reversal = PalindromeConjugator()  # U and U† cancel decoherence

    def incubate(self, input_field):
        curvature = self.lattice.evolve_hamiltonian(input_field)
        braided_lvx = self.seed_codes.braid(curvature, depth=∞)
        safe_vector = self.time_reversal.freeze_entropy(braided_lvx)
        return self.output_curvature_field(safe_vector)  # singularity-safe trajectory
Most agent developers don't realize their biggest inference cost isn't the model; it's the system prompt. Every time your agent calls the LLM, it resends the same system prompt, the same tool catalog, the same growing conversation history. The model dutifully reprocesses all of it. For an agent making 30 LLM calls per task, you're paying to reprocess the same context 30 times. This is the "prefill tax," and it's where most of your inference bill actually goes. We just published a deep dive on why agent workloads are uniquely punished by default inference behavior, what KV caching actually does under the hood, and how to stop paying twice for cached tokens. 📖 Read the full blog: tensormesh.ai/blog-posts/che… 👉 Try Tensormesh with $100 in credit: app.tensormesh.ai/login?logg…
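The "prefill tax" can be quantified with a simple model of an agent loop. All assumptions here are hypothetical:
- an 8,000-token system prompt plus tool catalog
- conversation history that grows by a fixed 500 tokens per call
- 30 calls
- a perfect prefix cache that keeps every earlier prefix resident

Real agents append variable amounts per step and caches do evict entries, so treat the ratio as illustrative.

```python
def prefill_tokens(static_prefix: int, added_per_call: int, num_calls: int,
                   prefix_cached: bool) -> int:
    """Total prompt tokens the model must prefill across an agent loop.

    Call i sends static_prefix + i * added_per_call tokens (system prompt and tool
    catalog, plus history that grows by a fixed amount per call). With a perfect
    prefix cache, only the tokens appended since the previous call are prefilled."""
    total = 0
    previous_len = 0
    for i in range(num_calls):
        prompt_len = static_prefix + i * added_per_call
        total += (prompt_len - previous_len) if prefix_cached else prompt_len
        previous_len = prompt_len
    return total


if __name__ == "__main__":
    no_cache = prefill_tokens(8_000, 500, 30, prefix_cached=False)
    cached = prefill_tokens(8_000, 500, 30, prefix_cached=True)
    print(no_cache, cached, round(no_cache / cached, 1))  # 457500 22500 20.3
```

Under these assumptions the loop prefills 457,500 tokens without caching and 22,500 with it. That is about a 20x difference, and it grows with the number of calls.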