Joined May 2007
Pinned Tweet
2 Nov 2023
Not your weights, not your algo, not open ai. We know that nondeterminism is a feature, but llms randomly leaking training data will be an important vector of attack
Wild: GPT-3.5 leaked a random dude's photo in the output... Lesson: what you upload online will probably become training data.
Community note
ChatGPT itself did not leak this image, as it was already on the internet as of the 7th of December 2016. It was uploaded to Imgur here: imgur.com/YFRoBdF (view on PC to see upload date).
From tokenmaxxing to tokenflation
Does a token buy you more or less now than it did a few months ago? We built a consumer price index (CPI) for AI coding output from Anthropic's Opus 4.6 model in SWE-chat, Feb 5–Apr 15, 2026. What we find looks like tokenflation:
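A rough way to picture a token "CPI", assuming a fixed basket of coding tasks and a Laspeyres-style ratio (the post does not describe the actual methodology, so the function name, the basket, and every number below are hypothetical):

```python
def token_price_index(base_tokens, current_tokens):
    """Laspeyres-style index: tokens needed for the same task basket now
    versus the base period, scaled so the base period equals 100."""
    missing = set(base_tokens) - set(current_tokens)
    if missing:
        raise ValueError(f"tasks missing from current period: {sorted(missing)}")
    base_total = sum(base_tokens.values())
    current_total = sum(current_tokens[task] for task in base_tokens)
    return 100.0 * current_total / base_total


# Hypothetical token counts per task; not data from the post.
base = {"fix_bug": 40_000, "add_test": 25_000, "refactor": 35_000}
now = {"fix_bug": 52_000, "add_test": 27_000, "refactor": 41_000}

index = token_price_index(base, now)
print(index)  # a value above 100 would mean the same work costs more tokens
```

Holding the basket fixed is the design choice that matters: it separates "each task got more expensive" from "people started doing different tasks".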
CUDA nvidy
If you know some Polish, you will appreciate how much of a miracle CUDA is
Robinhood but for real
First white-hat exploit on Ethereum: I unlocked 1,003.62 Ξ ($2,000,000) trapped in a 2016 ICO smart contract for 9 years. The 48 original investors can now claim their funds.
This is the initial prompt: "Write a classical haiku given the provided inputs." The screenshot shows the new version. This is how @DSPyOSS adds clarity: - express intent in logical building blocks - add your eval criteria dataset - GEPA optimization algo
Tokenmaxxing is the symptom 🤫
Misreading the Bitter Lesson is how agents end up burning fortunes rebuilding context. Expensive amnesia, paid to Anthropic in tokens. The fix: semantic state at ingestion, ontology at retrieval, tiny models for traversal, frontier models for judgment.
ukituki retweeted
It's never made sense to me that RL collapses all reward signals to a single scalar. Today, we fix that! Introducing Vector Policy Optimization: we train models to inherently optimize for the varied nature of a reward vector, creating diverse sets of answers ideal for test time search. Website and code coming soon!
Your RL post-training may be sabotaging your LLM’s test-time scaling! Conventional RL pretends that you can collapse all reward signals *upfront* into a single *scalar reward*. We introduce Vector Policy Optimization (VPO), which natively maximizes *vector-valued* rewards, boosting test time search performance, even on the original scalar.
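To see why collapsing rewards upfront can cost diversity, here is a toy comparison of a weighted-sum pick against a Pareto front over reward vectors. This is not the VPO algorithm (the post does not give it); the candidates, weights, and helper names are all made up for illustration.

```python
def scalarize(reward, weights):
    """Collapse a reward vector into one number with fixed weights."""
    return sum(r * w for r, w in zip(reward, weights))


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of candidates whose reward vector no other candidate dominates."""
    return {
        name
        for name, reward in candidates.items()
        if not any(
            dominates(other, reward)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }


# Hypothetical answers scored on two reward dimensions.
candidates = {
    "A": (0.9, 0.2),
    "B": (0.6, 0.6),
    "C": (0.2, 0.9),
    "D": (0.5, 0.5),
}
weights = (0.5, 0.5)

best_scalar = max(candidates, key=lambda n: scalarize(candidates[n], weights))
front = pareto_front(candidates)
print(best_scalar, sorted(front))
```

The scalar view keeps one winner, while the vector view keeps every non-dominated answer, which is the kind of varied set a test-time search can then choose from.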
ukituki retweeted
i'm excited to open source Active Graph: an event-sourced reactive graph runtime for long-running agents 🔄🧠 events/logs project a graph. reactive behaviors react to and affect the graph. fork-and-diff agent runs. no A2A, no workflows, no DAG site: activegraph.ai docs: docs.activegraph.ai github: github.com/yoheinakajima/act… quick start: pip install activegraph this is an early experiment in a new paradigm for agent architecture 🧪
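The ideas in that announcement (a log that projects a graph, behaviors that react to events, fork-and-diff runs) can be sketched in plain Python. This does not use or imitate the activegraph package's API, which is not shown in the post; every name below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str       # "add_node" or "add_edge"
    payload: tuple  # (node,) or (source, target)


def project(events):
    """Fold the event log into (nodes, edges); the log is the source of truth."""
    nodes, edges = set(), set()
    for ev in events:
        if ev.kind == "add_node":
            nodes.add(ev.payload[0])
        elif ev.kind == "add_edge":
            edges.add(ev.payload)
    return nodes, edges


def link_tasks_to_goal(event):
    """Hypothetical behavior: each new task node gets an edge to 'goal'."""
    if event.kind == "add_node" and event.payload[0].startswith("task:"):
        return [Event("add_edge", (event.payload[0], "goal"))]
    return []


def append(log, event, behaviors):
    """Append an event, then append whatever events the behaviors emit in reaction."""
    log.append(event)
    for behavior in behaviors:
        for reaction in behavior(event):
            append(log, reaction, behaviors)


def diff(log_a, log_b):
    """Nodes and edges present in log_b's graph but not in log_a's."""
    nodes_a, edges_a = project(log_a)
    nodes_b, edges_b = project(log_b)
    return nodes_b - nodes_a, edges_b - edges_a


behaviors = [link_tasks_to_goal]
main = []
append(main, Event("add_node", ("goal",)), behaviors)
append(main, Event("add_node", ("task:research",)), behaviors)

fork = list(main)  # forking a run is just copying the log
append(fork, Event("add_node", ("task:write",)), behaviors)

added_nodes, added_edges = diff(main, fork)
print(added_nodes, added_edges)
```

Because the graph is always derived from the log, a fork is cheap and a diff is just a comparison of two projections.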
Zuck quoted in Ben Evans presentation 8k ppl fired = system collapses under higher velocity and oversupply It's either: "we really can't handle and integrate all the new opportunities without sacrificing the revenue" or "we need to do less better" ben-evans.com/presentations
Important, yet intuitive idea: agents need dynamic table of contents to navigate long context tasks
Recent agentic systems (Claude Code, Codex, RLM, etc.) push context out of the prompt and into the environment (e.g., as files). This helps them maintain long-term knowledge about their goals and functionality. 🚨 While this is a good idea, we show a surprising result: systems that use external environments like this perform much better when given a small, fixed-size, in-context, agent-managed cache that "𝘱𝘦𝘦𝘬𝘴 𝘪𝘯𝘵𝘰" these environments. 🚀 Our paper, 𝗣𝗘𝗘𝗞: 𝙖 𝙨𝙮𝙨𝙩𝙚𝙢 𝙛𝙤𝙧 𝙗𝙪𝙞𝙡𝙙𝙞𝙣𝙜 𝙖𝙣𝙙 𝙢𝙖𝙞𝙣𝙩𝙖𝙞𝙣𝙞𝙣𝙜 𝗮𝗻 𝗼𝗿𝗶𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻 𝗰𝗮𝗰𝗵𝗲 𝙛𝙤𝙧 𝙇𝙇𝙈 𝙖𝙜𝙚𝙣𝙩𝙨, introduces this idea. Compared with strong baselines, including RAG, Compaction Agents, and SOTA prompt-learning frameworks, PEEK dominates the cost–quality Pareto frontier: achieving 6.3–34.0% in quality, with fewer iterations and lower cost. Paper: arxiv.org/abs/2605.19932 GitHub: github.com/zhuohangu/peek More in the thread below! (1/N)
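A minimal sketch of what a small, fixed-size, agent-managed cache over an external environment could look like. This is not PEEK's implementation (see the paper and repo for that); the class name, the least-recently-updated eviction rule, and the file names are assumptions made for illustration.

```python
from collections import OrderedDict


class OrientationCache:
    """Fixed-size set of short notes about environment files, kept in the prompt."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def note(self, path, summary):
        """Agent writes or refreshes a note; the stalest note is evicted when full."""
        if path in self._entries:
            self._entries.pop(path)  # re-insert so it counts as most recent
        self._entries[path] = summary
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # assumed policy: drop oldest note

    def render(self):
        """Text block that would be placed in context on every step."""
        return "\n".join(f"{path}: {summary}" for path, summary in self._entries.items())


# Hypothetical agent run over hypothetical files.
cache = OrientationCache(max_entries=2)
cache.note("goals.md", "ship the parser refactor")
cache.note("notes/api.md", "endpoint list, auth still TODO")
cache.note("goals.md", "parser done, now fix auth")
cache.note("logs/run3.txt", "auth test fails on token refresh")

prompt_block = cache.render()
print(prompt_block)
```

The cache stays the same size however large the environment grows, so the prompt cost per step stays bounded while the agent still knows where to look.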
ukituki retweeted
What happens when you post a real Monet and say it’s AI? The coolest art social experiment I’ve seen in a while. Thank you @SHL0MS
A depth-first network is the life hack for the distraction era: a handful of people who get it and have aligned incentives beat wide, shallow networks every time. It’s a meta-heuristic that works everywhere: - friends - feedback from the right ICP - fewer tools, smart defaults
Game theory proves that the size of your network is not the most important factor at all. If you have a network of a thousand weak ties with no mutual dependency, it will produce near-zero results most of the time. And when put under pressure, it collapses immediately. You should focus on a network of twelve people with highly overlapping incentives and clear reciprocity structures. It will outperform a grand network every time. That's because the brain's social cognition system can only maintain a high sense of trust with a limited number of people. Beyond that, everything feels transactional. Depth beats width every time. Aligned people outwork the crowd every time.
Brian Eno’s Oblique Strategies approve of this direction. Another banger piece of research from Omar’s lab, and RLM is not even half a year old 👌
I’ve never been this excited about search. 6-7 years ago, IR got an influx of the paradigms we still use, all enabled by the big headroom MS MARCO and then BEIR created. Then progress slowed. Today, Diane releases perhaps the most ambitious IR benchmark to date: OBLIQ-Bench. Queries in it are meant to be increasingly opaque to current first-stage retrieval paradigms. Oblique queries put the bottleneck very early in the search process, as the relevance of a document to the query is quite latent. I can't wait for core IR research on fundamentally more powerful paradigms for first-stage search to be reignited again. Stay tuned for more stories about this, and read Diane's thread and her paper below!!
Not sure if my feed is full of Jane Street interview stories because the X algo is overfitting or because it's just a smart influencer marketing play. The recurring "0.95? Huh" part makes me lean towards the latter. With 0.95 probability. It's still engaging and fun, though
Apr 26
Had a Jane Street interview in 2013 that still bothers me. It was my 6th round. Final interview. The guy walks in carrying no laptop, no notebook, just a cold brew and what I later realized was a single IKEA tea candle. He writes on the whiteboard: food: $200 rent: $800 utilities: $150 candles: $3,600 family: dying Then he turns around and says, “Optimize.” I laughed because I thought it was a culture-fit bit. He did not laugh. So I said, “Well, obviously you spend less on candles.” He says, “Assume candles are non-discretionary.” Okay. I start building a model. Basic constraint satisfaction. Family survival as a soft penalty. Candles as a state variable. Maybe there’s an arbitrage where you buy wholesale paraffin and convert the $3,600 line item into inventory. He stops me. “You’re thinking like a consultant.” That’s when I knew I was in trouble. He says, “Give me a bid-ask on family dying.” I say, “What?” He says, “You’re long candles, short family. Where do you make markets?” I try to recover. I say the real issue is liquidity: rent and utilities are fixed, food is elastic, candles are emotionally inelastic. Therefore the optimal strategy is to securitize future candle enjoyment and borrow against it. He nods for the first time. Then he asks, “What time do you sell the candles?” I say, “Whenever the market is liquid?” He says, “Be more specific.” I say, “Uh… 10 a.m. Eastern?” For the first time, he smiles. He goes, “Every day?” I say, “Every day.” He says, “In size?” I say, “In size.” He says, “And what do we call that?” I say, “Market manipulation?” The room gets very quiet. He looks disappointed and writes something down. “No. We call it providing liquidity to candle ETFs during the U.S. cash open.” I try to save it. “Right. Of course. The family isn’t dying because we underfunded them. They’re just experiencing temporary price discovery.” He nods again. Then he points back at the board. I had missed it. The utility bill was $150, but candles provide light. 
You can zero out utilities. I update the budget: food: $200 rent: $800 utilities: $0 candles: $3,750 family: still dying, but now in a more capital-efficient way He says, “How confident are you?” I say, “0.95.” He smiles and circles candles. “0.95 huh?” Then he asks me to estimate how many leveraged longs get liquidated if we dump $3,750 of candles at 10:00:01 every morning for 90 consecutive trading days. Needless to say I did not get the offer.
Those are the same picture
So this is what the quant job market looks like in 2026
Cursor 🫶🇵🇱 - Kamil keeps booking larger and larger venues for those meetups and builders just keep showing up
Cafe @cursor_ai Krakow is back! 🇵🇱☕️ Time to burn some tokens and buuuuuild. So on Tuesday May 12th, we’re taking over Targowa2 at Stare Podgórze for a full day of building, networking, and high-quality caffeine. What exactly is Cafe Cursor? It’s a global series of pop-up co-working events where we "take over" a local cafe to bring the Cursor community together. Originally started in San Francisco and expanded to tech hubs like London and New York, it’s a space where developers can bring their laptops, work on their latest projects, and exchange ideas in a relaxed, high-energy environment. The Plan: 💻 Co-work: Grab a table and build alongside other local developers. ☕️ Fuel: Coffee is on us for the duration of your stay! 💳 Perks: Exclusive Cursor credits for those who come to work Last time we hit 200% capacity in 48h so this time we are taking over a slightly bigger space. When: Tuesday, May 12 | 09:30 - 16:00 Where: Targowa2, Targowa 2 Kraków How to join: We have limited co-working slots (9:30-13:00 or 13:00-16:00). Please register via the Luma link below to secure your spot. If you just want to pop by for a quick coffee and a chat, feel free to drop in anytime! 👉 Link to luma: luma.com/u7bv7clp Again, special thanks to @benln & @ftnabeelah for your incredible support in making this happen!
ukituki retweeted
ok so everyone on here is hyping USVC like it's the second coming of VC access for retail. let me ruin it real quick the pitch: 1% fee, 0% carry, $500 min, back the next OpenAI before it's obvious the reality, from their own prospectus: gross expense ratio is 3.61%. the "no carry" is cope - it's a fund-of-funds, so the underlying VC funds still charge 2/20 and you pay it. they just bury it under "acquired fund fees." the 2.5% rate is a temp waiver that expires Oct 2026 "before it's obvious" - the portfolio is xAI (20% weight, already acquired by SpaceX), OpenAI, Anthropic, Vercel, Crusoe. these are the most obvious names in tech. your uber driver knows them 44% of the fund is deployed. rest sits in cash charging you fees liquidity: no public listing. exit = quarterly tender offers, max 5% NAV, board discretion, can be cancelled. In 2029 when AI craters and everyone wants out, guess what gets capped first Ankur is a solid operator but has never returned a VC fund. Vibe I and II are both unrealized. zero '40 Act experience. solo PM with Naval as nominal chairman the comp is DXYZ - retail private tech fund that traded 900% over NAV at launch. same playbook, different wrapper this isn't access, more like cosplay access. marketing is A, the actual deal is mid at best
Announcing: USVC AngelList exists to power the innovation economy. To date, we have powered $125 billion in assets, 25,000 funds, and 13,000 startups. Today, we’re opening it for retail access. @usvc_ is a regulated fund that holds stakes in promising private companies. There are no accreditation requirements and anyone can get started with as little as $500. Early portfolio includes xAI, Anthropic, OpenAI, Sierra, Vercel, Crusoe, and Legora. Own a stake in the companies defining the future. Learn more: usvc.com/
Community note
The post may imply more direct access than the filings support. Per the fund’s own filings, liquidity is board-controlled, fees include up to a 3% sales load, a 1.00% management fee, and acquired fund fees, and exposure may be via SPVs/VC funds, not direct stakes. usvc.com/documents/USVC…
Productivity gains are one side of the coin; the other is that rituals and all the layers of coordination overhead are important, if not the key, drivers of status, sense of meaning, job satisfaction, and relationships. The agentic economy removing those rituals is a huge tectonic shift
every paradigm humans built, like finance, communication, time management, all of it, has to be reinvented from first principles for agents. if you really think about it, the systems that were built weren’t necessarily designed around the underlying problem. they were designed around human constraints around the problem, which are limited attention, slow reading, status signaling, the need to make things comprehensible to other humans, etc. if you strip those constraints, nearly 80% of the scaffolding that exists collapses. what you’re left with is the actual function, which is almost always smaller than the ritual around it. the transition is going to be fascinating for the economy, for how people interact, & for what the thing we call “work” today even means.
ukituki retweeted
The more I program with LLMs using DSPy Signatures as my core, the less bugged I am about wanting more powerful models. I’m also getting a lot more out of good old Sonnet as a result. I think RLM and DSPy are really just showing us examples on how to put these models on a tight leash and make them reliable. Even the less powerful ones. As I go deeper into debugging territory of my little AI financial analyst project, especially after writing the specs and tests cleanly, I find MOST bugs were just me giving ambiguous instructions OR not giving optimal instructions at the right time. Models are only as good as the context given to them. Some models, the really expensive ones, can do multiple tasks fairly well. The older ones, cheaper ones, are pretty good and fast at just one well defined job. And when a program stitches together a number of small well defined jobs well, the whole is great and reliable and cost effective. The terms change. Wrapper. Harness. Context engineering. For me it’s just: give the least amount of most specific instructions at the right time. So in my case I have 3 stages: router, lane, analyst. Router decides which tables to point to for a given query and which statements plus key words. Lanes then query database iteratively. Schema details and list of canonical only given to lane. Never to router. And so on. I’m finding defining boundaries between these stages and testing out edge cases very useful. Also finding there’s no substitute to having 50-100 gold set input and output for each stage so it’s super easy swapping out models. Just need to run a GEPA optimizer once to update the relevant context blocks. I think super powerful LLM addiction can spoil us a bit. Makes us a bit lazy. It’s like those companies that often choke on too much capital. Humans can choke on too much compute. Scarcity isn’t all bad. Pushes us to be more innovative. On that front, Jensen is right.
I hope Trump sees the discourse before his Xi meeting and says yo I did a deal. A great deal for NVDA. Damn imagine the market pump if Trump sees the Jensen angle. Lot of them hate Anthropic so could just work. Most of us don’t need a Mythos. Just need focus to write clean specs and tests and intentionally debug. Not be sloppy. Not abuse token space. Use deterministic code for forecasting and calcs. Use probabilistic code and LLM power sparingly with well defined inputs and outputs. Read the spec daily. Slow it down.
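The router, lane, analyst split described in that post can be sketched as three functions with deliberately narrow inputs. In the author's project each stage is presumably a model call (for example a DSPy module); here they are keyword stubs so the boundaries are easy to see, and the tables, columns, and gold examples are all hypothetical.

```python
SCHEMAS = {  # hypothetical schemas; only the lane stage reads these
    "income_statement": ["period", "revenue", "net_income"],
    "balance_sheet": ["period", "cash", "debt"],
}
ROWS = {  # hypothetical data standing in for a database
    "income_statement": [{"period": "2024", "revenue": 120, "net_income": 18}],
    "balance_sheet": [{"period": "2024", "cash": 40, "debt": 25}],
}
KEYWORDS = {
    "revenue": "income_statement",
    "income": "income_statement",
    "cash": "balance_sheet",
    "debt": "balance_sheet",
}


def router(query, table_names):
    """Stage 1: pick a table from names and keywords only, never schema details."""
    text = query.lower()
    for word, table in KEYWORDS.items():
        if word in text and table in table_names:
            return table
    raise LookupError(f"no table matched query: {query!r}")


def lane(table, query):
    """Stage 2: the only stage that sees the schema; returns just the needed columns."""
    text = query.lower()
    wanted = [c for c in SCHEMAS[table] if c != "period" and c in text]
    return [{c: row[c] for c in ["period", *wanted]} for row in ROWS[table]]


def analyst(rows):
    """Stage 3: deterministic formatting of the rows the lane returned."""
    return "; ".join(", ".join(f"{k}={v}" for k, v in row.items()) for row in rows)


def answer(query):
    table = router(query, list(SCHEMAS))
    return analyst(lane(table, query))


# Tiny gold set for the router stage (the post suggests 50-100 per stage).
GOLD_ROUTER = [
    ("What was revenue in 2024?", "income_statement"),
    ("How much debt do we carry?", "balance_sheet"),
]
router_accuracy = sum(
    router(q, list(SCHEMAS)) == want for q, want in GOLD_ROUTER
) / len(GOLD_ROUTER)

print(answer("What was revenue in 2024?"), router_accuracy)
```

Scoring each stage against its own gold set is what makes swapping a model behind one stage a contained change: only that stage's score needs rechecking.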