ukituki

Tweets

Pinned Tweet

ukituki @ukituki

2 Nov 2023

Not your weights, not your algo, not open ai. We know that nondeterminism is a feature, but LLMs randomly leaking training data will be an important attack vector.

Alex Ker 🔭

@thealexker

2 Nov 2023

Wild: GPT-3.5 leaked a random dude's photo in the output... Lesson: what you upload online will probably become training data.

Community note

ChatGPT itself did not leak this image, as it was already on the internet as of the 7th of December 2016. It was uploaded to Imgur here: imgur.com/YFRoBdF (view on PC to see upload date)

ukituki @ukituki

Jun 8

From tokenmaxxing to tokenflation

Christopher Potts

@ChrisGPotts

Jun 8

Does a token buy you more or less now than it did a few months ago? We built a consumer price index (CPI) for AI coding output from Anthropic's Opus 4.6 model in SWE-chat, Feb 5–Apr 15, 2026. What we find looks like tokenflation:

ALT Line chart titled "Token purchasing power across the engineering basket — Opus 4.6 in SWE-chat." Subtitle: each line shows units of the good 1 token buys, relative to a Feb 5–24 baseline (1.00×); knowledge capture (teal) erodes least. The full-basket composite ends at 0.23× purchasing power (95% CI [0.18, 0.28]) = 4.38× more tokens per unit. The y-axis is log-scaled "output per token," from ~0.15× to 1×. The x-axis spans five time windows A–E (Feb 05 to Apr 15), labeled as phases: Pre-mystery climb, Mystery climb, Post-mystery climb. Colored lines track five goods: agent-drafted code, PR shipped, file touched, agent-drafted docs, and knowledge capture; a thick black "COMPOSITE" line with a gray 95% CI band trends downward from 1× to 0.23×. Knowledge capture rebounds to 0.37×. Reasoning effort shifts from "high" to "medium" to "high" across phases.
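
A quick sanity check on the chart's headline figures: purchasing power and tokens-per-unit are reciprocals of each other. The sketch below is illustrative only; the helper name is made up here and nothing in it comes from the SWE-chat analysis itself. Note that 1/0.23 is about 4.35, so the quoted 4.38× presumably reflects an unrounded composite of roughly 0.2283.

```python
def tokens_per_unit_multiplier(purchasing_power: float) -> float:
    """Return how many times more tokens one unit of output costs.

    purchasing_power is output per token relative to the baseline (1.00x).
    Hypothetical helper for illustration, not part of the original analysis.
    """
    if purchasing_power <= 0:
        raise ValueError("purchasing power must be positive")
    return 1.0 / purchasing_power


# Figures quoted in the chart description: composite 0.23x, 95% CI [0.18, 0.28].
composite = 0.23
ci_low, ci_high = 0.18, 0.28

multiplier = tokens_per_unit_multiplier(composite)
# The interval flips when inverted: low purchasing power means a high token cost.
multiplier_ci = (
    tokens_per_unit_multiplier(ci_high),
    tokens_per_unit_multiplier(ci_low),
)

print(f"{multiplier:.2f}x more tokens per unit")
print(f"95% CI: [{multiplier_ci[0]:.2f}, {multiplier_ci[1]:.2f}]")
```

Inverting the confidence bounds swaps their order, which is why the upper purchasing-power bound becomes the lower bound on token cost.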

ukituki @ukituki

Jun 5

CUDA nvidy

Jerry Tworek

@MillionInt

Jun 4

If you know some Polish, you will appreciate how much of a miracle CUDA is

ukituki @ukituki

May 31

Robinhood but for real

0xflorent.eth

@0xFlorent_

May 31

First white-hat exploit on Ethereum: I unlocked 1,003.62 Ξ ($2,000,000) trapped in a 2016 ICO smart contract for 9 years. The 48 original investors can now claim their funds.

ukituki @ukituki

May 31

It rhymes well with the classic 12 leverage points for system intervention: donellameadows.org/archives/…

Leverage Points: Places to Intervene in a System - The Donella Meadows Project

By Donella Meadows~ Folks who do systems analysis have a great belief in “leverage points.” These are places within a complex system (a corporation, an economy, a living body, a city, an ecosystem)...

donellameadows.org

Visa is doing marketing consults (see pinned!)

@visakanv

May 31

here is some vague abstract advice, may it be weirdly relevant to whatever specific thing you're stuck on

ukituki @ukituki

May 31

This is the initial prompt: "Write a classical haiku given the provided inputs." The screenshot shows the new version. This is how @DSPyOSS adds clarity:
- express intent in logical building blocks
- add your eval criteria dataset
- GEPA optimization algo

ALT Example of DSPy-optimized haiku writing prompt using GEPA learning algorithm

ukituki @ukituki

May 31

Docs (full tutorial): dspy.ai/getting-started/prog…
Docs (optimization): dspy.ai/getting-started/gepa…
Haiku scoring metric (with 20 sub-checks) by @dbreunig: gist.github.com/dbreunig/228…

Program, don't prompt - DSPy

The framework for programming—rather than prompting—language models.

dspy.ai
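
To make the "eval criteria" step in the haiku thread above concrete, here is a minimal sketch of what a haiku scoring metric with sub-checks could look like. It is not @dbreunig's actual metric (that one has 20 sub-checks and is not reproduced here). The three checks, the function names, and the vowel-group syllable heuristic are all assumptions chosen for illustration, and the heuristic will miscount some English words.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, discount a silent final 'e'."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def haiku_checks(text: str) -> dict:
    """Run each sub-check and return its pass/fail result by name."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    syllables = [sum(count_syllables(w) for w in ln.split()) for ln in lines]
    return {
        "three_lines": len(lines) == 3,
        "no_blank_lines": all(lines),
        "syllables_5_7_5": syllables == [5, 7, 5],
    }


def haiku_score(text: str) -> float:
    """Fraction of sub-checks passed, between 0.0 and 1.0."""
    checks = haiku_checks(text)
    return sum(checks.values()) / len(checks)


sample = "An old silent pond\nA frog jumps into the pond\nSplash! Silence again"
print(haiku_checks(sample))
print(haiku_score(sample))
```

Returning a fraction rather than a single pass/fail keeps partial credit visible, which is the kind of signal an optimizer can work with.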

ukituki @ukituki

May 23

Tokenmaxxing is the symptom 🤫

Ashwin Gopinath

@ashwingop

May 22

Misreading the Bitter Lesson is how agents end up burning fortunes rebuilding context. Expensive amnesia, paid to anthropic in tokens. The fix: semantic state at ingestion, ontology at retrieval, tiny models for traversal, frontier models for judgment.

ukituki retweeted

Isha Puri

@ishapuri101

May 22

It's never made sense to me that RL collapses all reward signals to a single scalar. Today, we fix that! Introducing Vector Policy Optimization: we train models to inherently optimize for the varied nature of a reward vector, creating diverse sets of answers ideal for test time search. Website and code coming soon!

Ryan Bahlous-Boldi

@RyanBoldi

May 22

Your RL post-training may be sabotaging your LLM’s test-time scaling! Conventional RL pretends that you can collapse all reward signals *upfront* into a single *scalar reward*. We introduce Vector Policy Optimization (VPO), which natively maximizes *vector-valued* rewards, boosting test time search performance, even on the original scalar.

ukituki retweeted

Yohei

@yoheinakajima

May 20

i'm excited to open source Active Graph: an event-sourced reactive graph runtime for long-running agents 🔄🧠
events/logs project a graph. reactive behaviors react and affect the graph. fork-and-diff agent runs. no A2A, no workflows, no DAG
site: activegraph.ai
docs: docs.activegraph.ai
github: github.com/yoheinakajima/act…
quick start: pip install activegraph
this is an early experiment in a new paradigm for agent architecture 🧪

ukituki @ukituki

May 21

Zuck quoted in Ben Evans' presentation. 8k ppl fired = system collapses under higher velocity and oversupply. It's either "we really can't handle and integrate all the new opportunities without sacrificing the revenue" or "we need to do less better". ben-evans.com/presentations

ukituki @ukituki

May 20

An important yet intuitive idea: agents need a dynamic table of contents to navigate long-context tasks

Joshua Gu

@astrogu_

May 20

Recent agentic systems (Claude Code, Codex, RLM, etc.) push context out of the prompt and into the environment (e.g., as files). This helps them maintain long-term knowledge about their goals and functionality.
🚨 While this is a good idea, we show a surprising result: systems that use external environments like this perform much better when given a small, fixed-size, in-context, agent-managed cache that "peeks into" these environments.
🚀 Our paper, PEEK: a system for building and maintaining an orientation cache for LLM agents, introduces this idea.
Compared with strong baselines, including RAG, Compaction Agents, and SOTA prompt-learning frameworks, PEEK dominates the cost–quality Pareto frontier: achieving gains of 6.3–34.0% in quality, with fewer iterations and lower cost.
Paper: arxiv.org/abs/2605.19932
GitHub: github.com/zhuohangu/peek
More in the thread below! (1/N)

ukituki retweeted

Jediwolf

@Jediwolf

May 14

What happens when you post a real Monet and say it’s AI? The coolest art social experiment I’ve seen in a while. Thank you @SHL0MS

ukituki @ukituki

May 9

Depth-first network is the life hack for the distraction era: a bunch of folks that get it and have aligned incentives beat wide and shallow networks all the time. It’s a meta heuristic that works everywhere:
- friends
- feedback from the right ICP
- fewer tools, smart defaults

Incentivising

@incentivising

May 8

Game theory proves that the size of your network is not the most important factor at all. If you have a network of a thousand weak ties with no mutual dependency, it will produce near-zero results most of the time. And when put under pressure, it collapses immediately. You should focus on a network of twelve people with highly overlapping incentives and clear reciprocity structures. It will outperform a grand network every time. That's because the brain's social cognition system can only maintain a high sense of trust with a limited number of people. Beyond that, everything feels transactional. Depth beats width every time. Aligned people outwork the crowd every time.

ukituki @ukituki

May 7

Brian Eno’s Oblique Strategies approve of this direction. Another banger piece of research from Omar’s lab, and RLM is not even half a year old 👌

Omar Khattab

@lateinteraction

May 6

I’ve never been this excited about search. 6-7 years ago, IR got an influx of the paradigms we still use, all enabled by the big headroom MS MARCO and then BEIR created. Then progress slowed. Today, Diane releases perhaps the most ambitious IR benchmark to date: OBLIQ-Bench. Queries in it are meant to be increasingly opaque to current first-stage retrieval paradigms. Oblique queries put the bottleneck very early in the search process, as the relevance of a document to the query is quite latent. I can't wait for core IR research on fundamentally more powerful paradigms for first-stage search to be reignited again. Stay tuned for more stories about this, and read Diane's thread and her paper below!!

ukituki @ukituki

Apr 27

Not sure if my feed is full of Jane Street interview stories because the X algo overfits or because it's just a smart influencer marketing play. The recurring "0.95? Huh" part makes me lean towards the latter. With .95 probability. It's still engaging and fun though

poof

@poof_eth

Apr 26

Had a Jane Street interview in 2013 that still bothers me. It was my 6th round. Final interview. The guy walks in carrying no laptop, no notebook, just a cold brew and what I later realized was a single IKEA tea candle. He writes on the whiteboard: food: $200 rent: $800 utilities: $150 candles: $3,600 family: dying Then he turns around and says, “Optimize.” I laughed because I thought it was a culture-fit bit. He did not laugh. So I said, “Well, obviously you spend less on candles.” He says, “Assume candles are non-discretionary.” Okay. I start building a model. Basic constraint satisfaction. Family survival as a soft penalty. Candles as a state variable. Maybe there’s an arbitrage where you buy wholesale paraffin and convert the $3,600 line item into inventory. He stops me. “You’re thinking like a consultant.” That’s when I knew I was in trouble. He says, “Give me a bid-ask on family dying.” I say, “What?” He says, “You’re long candles, short family. Where do you make markets?” I try to recover. I say the real issue is liquidity: rent and utilities are fixed, food is elastic, candles are emotionally inelastic. Therefore the optimal strategy is to securitize future candle enjoyment and borrow against it. He nods for the first time. Then he asks, “What time do you sell the candles?” I say, “Whenever the market is liquid?” He says, “Be more specific.” I say, “Uh… 10 a.m. Eastern?” For the first time, he smiles. He goes, “Every day?” I say, “Every day.” He says, “In size?” I say, “In size.” He says, “And what do we call that?” I say, “Market manipulation?” The room gets very quiet. He looks disappointed and writes something down. “No. We call it providing liquidity to candle ETFs during the U.S. cash open.” I try to save it. “Right. Of course. The family isn’t dying because we underfunded them. They’re just experiencing temporary price discovery.” He nods again. Then he points back at the board. I had missed it. The utility bill was $150, but candles provide light. 
You can zero out utilities. I update the budget: food: $200 rent: $800 utilities: $0 candles: $3,750 family: still dying, but now in a more capital-efficient way He says, “How confident are you?” I say, “0.95.” He smiles and circles candles. “0.95 huh?” Then he asks me to estimate how many leveraged longs get liquidated if we dump $3,750 of candles at 10:00:01 every morning for 90 consecutive trading days. Needless to say I did not get the offer.

ukituki @ukituki

Apr 27

Those are the same picture

Piotr Pomorski

@PtrPomorski

Apr 27

So this is what the quant job market looks like in 2026

ukituki @ukituki

Apr 27

Cursor 🫶🇵🇱 - Kamil keeps booking larger and larger venues for those meetups and builders just keep showing up

Kamil Stanuch

@KamilStanuch

Apr 27

Cafe @cursor_ai Krakow is back! 🇵🇱☕️ Time to burn some tokens and buuuuuild.
So on Tuesday May 12th, we’re taking over Targowa2 at Stare Podgórze for a full day of building, networking, and high-quality caffeine.
What exactly is Cafe Cursor? It’s a global series of pop-up co-working events where we "take over" a local cafe to bring the Cursor community together. Originally started in San Francisco and expanded to tech hubs like London and New York, it’s a space where developers can bring their laptops, work on their latest projects, and exchange ideas in a relaxed, high-energy environment.
The Plan:
💻 Co-work: Grab a table and build alongside other local developers.
☕️ Fuel: Coffee is on us for the duration of your stay!
💳 Perks: Exclusive Cursor credits for those who come to work
Last time we hit 200% capacity in 48h so this time we are taking over a slightly bigger space.
When: Tuesday, May 12 | 09:30 - 16:00
Where: Targowa2, Targowa 2 Kraków
How to join: We have limited co-working slots (9:30-13:00 or 13:00-16:00). Please register via the Luma link below to secure your spot. If you just want to pop by for a quick coffee and a chat, feel free to drop in anytime!
👉 Link to luma: luma.com/u7bv7clp
Again, special thanks to @benln & @ftnabeelah for your incredible support in making this happen!

ukituki retweeted

gemchanger

@gemchange_ltd

Apr 22

ok so everyone on here is hyping USVC like it's the second coming of VC access for retail. let me ruin it real quick
the pitch: 1% fee, 0% carry, $500 min, back the next OpenAI before it's obvious
the reality, from their own prospectus:
gross expense ratio is 3.61%. the "no carry" is cope - it's a fund-of-funds, so the underlying VC funds still charge 2/20 and you pay it. they just bury it under "acquired fund fees." the 2.5% rate is a temp waiver that expires Oct 2026
"before it's obvious" - the portfolio is xAI (20% weight, already acquired by SpaceX), OpenAI, Anthropic, Vercel, Crusoe. these are the most obvious names in tech. your uber driver knows them
44% of the fund is deployed. rest sits in cash charging you fees
liquidity: no public listing. exit = quarterly tender offers, max 5% NAV, board discretion, can be cancelled. In 2029 when AI craters and everyone wants out, guess what gets capped first
Ankur is a solid operator but has never returned a VC fund. Vibe I and II are both unrealized. zero '40 Act experience. solo PM with Naval as nominal chairman
the comp is DXYZ - retail private tech fund that traded 900% over NAV at launch. same playbook, different wrapper
this isn't access, more like cosplay access. marketing is A, the actual deal is mid at best

AngelList

@AngelList

Apr 22

Announcing: USVC AngelList exists to power the innovation economy. To date, we have powered $125 billion in assets, 25,000 funds, and 13,000 startups. Today, we’re opening it for retail access. @usvc_ is a regulated fund that holds stakes in promising private companies. There are no accreditation requirements and anyone can get started with as little as $500. Early portfolio includes xAI, Anthropic, OpenAI, Sierra, Vercel, Crusoe, and Legora. Own a stake in the companies defining the future. Learn more: usvc.com/

Community note

The post may imply more direct access than the filings support. Per the fund’s own filings, liquidity is board-controlled; fees include an up-to-3% sales load, a 1.00% management fee, and acquired fund fees; and exposure may be via SPVs/VC funds, not direct stakes. usvc.com/documents/USVC…

ukituki @ukituki

Apr 20

Productivity gains are one side of the coin; the other is that rituals and all the layers of coordination overhead are important, if not the key, drivers of status, sense of meaning, job satisfaction, and relationships. The agentic economy removing the rituals is a huge tectonic shift.

signüll

@signulll

Apr 19

every paradigm humans built, like finance, communication, time management, all of it, has to be reinvented from first principles for agents. if you really think about it, the systems that were built weren’t necessarily designed around the underlying problem... they were designed around human constraints around the problem, which are limited attention, slow reading, status signaling, the need to make things comprehensible to other humans, etc. if you strip those constraints, nearly 80% of the scaffolding that exists collapses. what you’re left with is the actual function, which is almost always smaller than the ritual around it. the transition is going to be fascinating for the economy, for how people interact, & for what “work” as we refer to it today even means.

ukituki retweeted

Asfi

@AsfiShaheen

Apr 17

The more I program with LLMs using DSPy Signatures as my core, the less bugged I am about wanting more powerful models. I’m also getting a lot more out of good old Sonnet as a result. I think RLM and DSPy are really just showing us examples on how to put these models on a tight leash and make them reliable. Even the less powerful ones. As I go deeper into debugging territory of my little AI financial analyst project, especially after writing the specs and tests cleanly, I find MOST bugs were just me giving ambiguous instructions OR not giving optimal instructions at the right time. Models are only as good as the context given to them. Some models, the really expensive ones, can do multiple tasks fairly well. The older ones, cheaper ones, pretty good and fast at just one well defined job. And when a program stitches together a number of small well defined jobs the well the whole is great and reliable and cost effective. The terms change. Wrapper. Harness. Context engineering. For me it’s just : give the least amount of most specific instructions at the right time. So in my case I have 3 stages: router, lane, analyst. Router decides which tables to point to for a given query and which statements plus key words. Lanes then query database iteratively. Schema details and list of canonical only given to lane. Never to router. And so on. I’m finding defining boundaries between these stages and testing out edge cases very useful. Also finding there’s no substitute to having 50-100 gold set input and output for each stage so it’s super easy swapping out models. Just need to run a GEPA optimizer once to update the relevant context blocks. I think super powerful LLM addiction can spoil us a bit. Makes us a bit lazy. It’s like those companies that often choke on too much capital. Humans can choke on too much compute. Scarcity isn’t all bad. Pushes us to be more innovative. On that front. Jensen is right. 
I hope Trump sees the discourse before his Xi meeting and says yo I did a deal. A great deal for NVDA. Damn imagine the market pump if Trump sees the Jensen angle. Lot of them hate Anthropic so could just work. Most of us don’t need a Mythos. Just need focus to write clean specs and tests and intentionally debug. Not be sloppy. Not abuse token space. Use deterministic code for forecasting and calcs. Use probabilistic code and LLM power sparingly with well defined inputs and outputs. Read the spec daily. Slow it down.
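
The router, lane, analyst split described in the post above can be sketched roughly as follows. This is a toy under loud assumptions: the in-memory `DB`, the `Route` type, and all three function bodies are hypothetical stand-ins written for illustration. In the post the router and lane stages are LLM-backed (DSPy Signatures); here they are plain functions so the boundaries between stages are easy to see and test.

```python
from dataclasses import dataclass

# Hypothetical in-memory "database": table name -> rows. Made-up numbers.
DB = {
    "income_statement": [
        {"year": 2023, "revenue": 120.0},
        {"year": 2024, "revenue": 150.0},
    ],
    "balance_sheet": [{"year": 2024, "cash": 40.0}],
}


@dataclass
class Route:
    table: str
    field: str


def router(query: str) -> Route:
    """Stage 1: decide where to look. It never sees schema details or rows."""
    if "revenue" in query.lower():
        return Route("income_statement", "revenue")
    return Route("balance_sheet", "cash")


def lane(route: Route) -> list[dict]:
    """Stage 2: the only stage that touches the data and its schema."""
    return [row for row in DB[route.table] if route.field in row]


def analyst(route: Route, rows: list[dict]) -> str:
    """Stage 3: deterministic calculation, then a short answer."""
    rows = sorted(rows, key=lambda r: r["year"])
    if len(rows) < 2:
        return f"{route.field}: {rows[-1][route.field]}"
    growth = rows[-1][route.field] / rows[0][route.field] - 1
    return (
        f"{route.field} grew {growth:.0%} "
        f"from {rows[0]['year']} to {rows[-1]['year']}"
    )


def answer(query: str) -> str:
    route = router(query)
    return analyst(route, lane(route))


print(answer("How did revenue change?"))
print(answer("What is the cash position?"))
```

Because each stage has a narrow input and output, a small gold set of input/output pairs per stage is enough to test it in isolation and to swap the implementation behind it.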
