qrdl

Tweets

Pinned Tweet

qrdl

@QRDL

31 Dec 2024

@VitalikButerin @shayne_coplan @mansourtarek_ @giancarloMKTS @aosipovich @Polymarket @metaculus @kalshi I'm a huge believer in Prediction Markets. It is free speech, but with accountability. Messengers from the future, warning us of the consequences of our actions. But the markets have significant negative social utility when they are subjective and opaque. It cannot and must not be just about gambling. Prediction Markets have perverse incentives to spread disinfo and manipulate outcomes, so more work needs to be done to surface authentic signal and increase transparency to stop bad-faith actors. Polymarket, Metaculus, and Kalshi are the leaders in this industry. Polymarket is arguably the leader and best in terms of transparency and free speech, but has serious problems with bad actors and poorly written subjective markets. Metaculus's lack of trading means it doesn't have bad actors, and the rulesets are very well done, but their poor transparency is devastating and destroys a great deal of potential value. They also have poor accuracy and no emergent signal / breaking news because of their minimal incentives. Kalshi has reasonable rulesets, but too many bad actors, which is a result of their weak transparency. Through no fault of their own, DCM status suppresses their ability to embrace free speech. I am hopeful, and perhaps even a little optimistic, that 2025 will be the year when Prediction Markets grow up. They will realize their greatest strength is their openness, that they have a profound responsibility to make the world better, and they are not here to just facilitate the transfer of wealth from the naïve and gullible to the sharp and cunning.

qrdl

@QRDL

Jun 15

Very weird: ChatGPT is using the term "sovereign" with negative connotations when I am trying to use the term "autonomous". Could be just a strange coincidence, though. Anyone else seeing this?

qrdl

@QRDL

Jun 14

lol, yeah, what happens when Bessent realizes you could probably do all this with just more tokens on DeepSeek?

Simon Willison

@simonw

Jun 14

I'm just glad nobody at the US government thought to try that Fable 5 "jailbreak" against Opus 4.x or GPT 5.x, or I wouldn't be getting anything useful done this weekend at all

qrdl

@QRDL

Jun 14

Could it be that the USG shut Fable down because Anthropic refused to add an exception for zero-days that the USG itself was stockpiling?

Teknium 🪽

@Teknium

Jun 13

Replying to @DavidSacks

Also FYI thousands of "Cyber Weapons" (aka, viruses, trojans, hacking tools, etc) are all available for free and open source. It has historically and rightly been a thing we allow and encourage so that defense can outpace offense. I know the government loves hiding zero day exploits from companies and the public so they too can exploit people, but this is just silly.

qrdl retweeted

Clash Report

@clashreport

Jun 13

Canadian PM Mark Carney: Canada, Ireland, and Europe can be pivotal, powerful, and purposeful, a force for good. Together we are powerful because we have the capacity to act together. Combined, the population is more than twice that of the United States. We have a larger cultural export industry and a more diverse one. A similarly sized GDP. Comparable R&D spend. Our collective defense budgets are twice that of China’s. We're home to the majority of the world's top 100 universities and over half of the world's Nobel Prize winners. So together we are one of the largest economic, cultural, technological, financial blocs in the world. We are and can be a force for good because we safeguard the values of human rights, dignity, and pluralism that our people hold dear.

qrdl

@QRDL

Jun 13

That's always the trick, microdose the Kool-Aid

Matt Shumer

@mattshumer_

Jun 13

Most people have too little AI psychosis. A few have way too much. The ones who find the sweet spot will make miracles.

qrdl

@QRDL

Jun 13

Wait, what? So mathematicians aren't doing anything valuable?

Alex Kontorovich

@AlexKontorovich

Jun 12

My prediction from last summer was that the number of frontier AI models getting a gold medal at this summer’s IMO will be… zero! The reason is that they won’t bother to compete, it’ll simply be beneath them. If anyone can now push a button on Codex / Claude Code and get a perfect score, what’s the point? No, they’ll just leave the 17 year olds to take the test on their own. (The open source models will still compete for another year or so. That’s my guess!) Similarly, I think the labs pushing “research math” is also a fad that will expire soon enough. Think about it. GPT solved a major problem (Erdos unit distance); what they’re not reporting is the 1000 other problems they attacked and failed to make progress. [That’s not exactly deception; I also don’t report the dozens of things I tried to prove and failed…] They’re also not reporting the millions of dollars all of this cost them, and for what? Right now the “for what” is advertising: they’re signaling that they’re the best model for math, so you should use them for whatever your reasoning task is. Math departments also spend millions of dollars and produce theorems, but that is their actual end goal. A tech company is happy with a million-dollar theorem only if it predicts a billion-dollar application somewhere else. Once the bubble bursts, investors will want “real” applications from AI, new drugs, self driving / flying cars, etc etc. Nobody will care that the systems are also useful at proving theorems. Nobody but us mathematicians. So like the IMO, I think the frontier labs will get bored of theorems, and will leave us humans alone to keep doing math (and they’ll give us an amazing tool with which to do it!). Does that make sense? What do you think?

qrdl

@QRDL

Jun 13

ofc, there is a very, very simple explanation for all this and everyone is overthinking it, including Anthropic: Fable 5 did something that legitimately freaked someone out.

qrdl

@QRDL

Jun 12

Better: chatgpt.com/c/6a25de89-e8d0-… My top five by likely human leverage:
1. P vs NP, one-way functions, and average-case hardness, because they define the boundary between feasible computation and infeasible search.
2. Navier-Stokes, turbulence, and hydrodynamic limits, because fluids control climate, transport, energy systems, and engineering.
3. Post-quantum cryptographic hardness, because civilization-scale digital security is already migrating toward assumptions that still need deeper mathematical grounding.
4. Mathematics of reliable AI, because the gap between empirical capability and provable reliability is becoming a central safety and engineering problem.
5. Quantum many-body theory, especially quantum PCP, area laws, and Yang-Mills, because these determine what can be simulated, what quantum computers can do, and how rigorously we understand fundamental physics.

This tweet is unavailable

qrdl

@QRDL

Jun 13

And Acer blocked me for this... Bizarro.

qrdl

@QRDL

Jun 12

It says Acer has blocked me, but that would be very weird, right?

qrdl retweeted

clem 🤗

@ClementDelangue

Jun 10

Concentration of power, capabilities and economic wealth is the biggest risk in AI. We need open science and open-source more than ever!

qrdl

@QRDL

Jun 8

Yes, except now do it for revenue. I mean... facepalm.

nxthompson

@nxthompson

Jun 7

This is a pretty striking shift toward Chinese models by American AI startups since the start of the year. substack.com/@profgmarkets/p…

qrdl retweeted

nxthompson

@nxthompson

Jun 7

This is a pretty striking shift toward Chinese models by American AI startups since the start of the year. substack.com/@profgmarkets/p…

qrdl retweeted

Daniel Litt

@littmath

Jun 6

My general view has always been that it's not worth rushing to be the first to do work that someone else could do next week; clearly the marginal impact of such work is low. Worth thinking about this in the rush to publish results where the intellectual labor was done by AI.

qrdl retweeted

prinz

@deredleritt3r

May 19

I/O 2026 should help us decipher Google's direction in the wake of the recent wave of model releases by Anthropic and OpenAI. Those who have been paying attention know that Demis Hassabis has been generally skeptical of the research direction being pursued by Anthropic and OpenAI - i.e., coding agents leading to acceleration and eventually full automation of AI research. Instead, Google has been pursuing its own separate 5-to-10-year track to AGI, to be achieved through continual learning, world models and a link to the physical world (robotics). Just 4 months ago, in Davos, Hassabis spoke about the "limit[s on] how fast the self-improvement systems [i.e., those being pursued by Anthropic and OpenAI] can work" (see the screenshot below). But now the pace of releases by Anthropic and OpenAI has become relentless. It is clear that AI (Codex and Claude Code in particular) is significantly accelerating the pace of AI research at these two labs. And we have recently heard rumors that an important faction at Google - led by none other than Sergey Brin - is not happy about these developments. Brin has allegedly formed a "strike team" at Google, tasked with achieving “AI takeoff or AI that can improve itself" through improvement in Google's AI coding abilities. For those paying attention, this is the exact path to fully automated AI research and RSI that is currently being pursued by Anthropic and OpenAI. And here lies the tension. Two paths are open to Google now. Will Google turn away from the "Hassabis path" and pursue RSI? Or will Google stay on its current path, knowing full well that if OpenAI and Anthropic are wrong and the approach of fully automating AI research does not turn out as fruitful as they had hoped, then Google's lead in areas like world models and robotics may prove to be decisive? Or, finally, is there room (talent, resources, compute) to pursue *both* of these approaches simultaneously? 
How Google's leadership answers these questions may very well prove decisive to the outcome of the race to AGI. We might get some hints as to where Google is headed starting tomorrow.

qrdl

@QRDL

May 10

People keep saying calculators weren't the end of anything when talking about AI. But what about the slide rule factories? And sure, cars didn't stop humans progressing, but they did lead to a lot of redundant horses getting slaughtered...

qrdl retweeted

Tibo

@thsottiaux

Apr 22

Team is hard at work together with @steipete to make OpenAI models and ecosystem be the obvious way to enjoy your claw. A lot more to come next week, but a reminder that you can use OpenClaw as part of your ChatGPT subscription today already. (also still having too much fun with ChatGPT Images 2.0 today)

pash

@pashmerepat

Apr 22

I've embarked on a new sprint. My mission is to make OpenAI models feel magical in OpenClaw in the next few weeks. Diving in today, I noticed a bug. When you configured OpenClaw to use the Codex harness with OpenAI models, auth was broken, and the system was silently falling back to the Pi harness. So nobody knew it was broken. Two PRs later (fix the auth bridge, stop the silent fallback), the Codex harness actually works. And the difference is night and day (pic related). Before: the agent didn't feel magical or proactive. It did the exact same shallow loop every heartbeat. Read the heartbeat file, check Discord, see nothing, say HEARTBEAT_OK. It ignored the rest of its instructions. Sometimes it would even reason about doing work and then just... not issue the tool calls. After: full agent loops. It reads its workspace context, interprets the entire checklist, inspects the repo, makes real edits, tries to verify them, and gives honest status reports when things are blocked. Later heartbeats show continuity, it doesn't repeat work, it picks up where it left off. I didn't change any prompting or scaffolding. Just swapped in the codex harness for pi. Lesson here is use the codex harness if you're building with OAI models. A lot more to do but this is a strong start.

qrdl retweeted

Tibo

@thsottiaux

Apr 21

I don't know what they are doing over there, but Codex will continue to be available both in the FREE and PLUS ($20) plans. We have the compute and efficient models to support it. For important changes, we will engage with the community well ahead of making them. Transparency and trust are two principles we will not break, even if it means momentarily earning less. A reminder that you vote with your subscription for the values you want to see in this world.

Amol Avasare

@TheAmolAvasare

Apr 21

For clarity, we're running a small test on ~2% of new prosumer signups. Existing Pro and Max subscribers aren't affected.

qrdl retweeted

clem 🤗

@ClementDelangue

Apr 8

"But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug." aisle.com/blog/ai-cybersecur…

AI Cybersecurity After Mythos: The Jagged Frontier

When AISLE tested Mythos's showcase vulnerabilities on small, cheap, open-weights models, most found the same bugs. Here's what that means for cyber.

aisle.com

qrdl retweeted

Yann LeCun

@ylecun

Apr 9

Replying to @ClementDelangue @guillaumgrallet

Mythos drama = BS from self-delusion.
