Strategy | innovation | knowledge creation in science, engineering, and technology | organization | cognitive science | AI

Joined July 2011
1,233 Photos and videos
Brian Gordon retweeted
space exploration technologies might be the hardest company name of all time. insane cultural victory for tech that the company with the most definite optimist vision of the future is the biggest ipo in history
9
14
212
8,591
Fable 5 shows the BigAI response to commodification, I think. The strategy is 2-fold: Teecean control of complementary assets - inference & gated access to the frontier. Access will increasingly require shared economics as models become a fourth factor of production.
I got a lot of flack when I said this before, but I continue to think that model providers will end up being a lot like airlines: Critical for the global economy but highly commoditized, with huge capex requirements and low (and often negative) margins.
1
167
This is predicated on their being hard to replicate data advantages arising out of the exhaustion of the public corpus and restricted distillation, two big assumptions. But if we are moving to a world where there are isolating mechanisms with respect to model capabilities, then…
86
Brian Gordon retweeted
There’s a big, under-appreciated reason why people may have very different experiences and opinions about using AI for work — are they using it for tasks they’re already an expert at, or tasks they can’t do themselves? The former leads to a *growth cycle* and the latter leads to a *dependence spiral*. When I use AI to do something I’m an expert at, like coding, I treat it as a tool. I can build quickly, maintaining an understanding of the code, knowing that if necessary, I can fix the code myself. It feels empowering. It frees up my time to think about the complex, judgment-oriented parts of software engineering that I can’t or won’t delegate to AI. That means my own skills improve rapidly, and I get to climb the ladder of complexity and develop higher-level skills, much more so than when I write the code myself. I feel in control. I can lock in and achieve a flow state — when AI is working, I’m reviewing, building understanding, and planning the next steps. I never get the feeling that the tool is about to replace me. This is the growth cycle. (Of course, the growth cycle is not automatic. I still need to exercise agency to use AI responsibly. But it’s the same challenge with any productivity-enhancing technology, and those who’ve navigated such transitions before are well-equipped to navigate it with AI as well.) On the other hand, if I use it for tasks I don’t understand and haven’t learned to perform myself, I have no choice but to treat it as a superintelligence. If something breaks, the best I can do is ask AI to fix it and hope for the best. I generally can’t evaluate the quality of the output myself. The only way to find out if it's any good is if and when the work is ultimately reviewed by an actual expert. The experience is confusing, unsettling and disempowering. And forget about flow state. By over-relying on AI, I risk losing whatever skill I had at the task in the first place, even if it boosts productivity in the short term. This is the dependence spiral. It’s no wonder that entry-level workers and students preparing to enter the workforce find themselves in a bind. To compete with the AI-enabled productivity of more seasoned workers, they must adopt AI themselves, but doing so risks the dependence spiral. I have some thoughts on solutions that I will share in later posts, but I think having a clear diagnosis of the problem is a useful first step.
36
50
228
29,970
If I had to guess, I would say the BIgLabs seem to moving towards an economic model where access to state of the art capabilities are contingent on some form of partnership with the lab, where governance, oversight, and economic stakes by BigLab are necessary for access.
The issue isn't the existence of safeguards. It is that: - the classifier is terrible, exceedingly trigger happy, unusably so for many - silently degrades responses if it's about AI - captures all user data These are *actively* bad, not just a mistake. There are real tradeoffs in safety, and this release chose none of those. It basically nerfed the model in the most blatant way possible, taking none of the nuances into account. AI safety researchers can't use it. Bio researchers can't use it. Cybersec researchers can't use it. Even by the system cards own admission this isn't in "immediately develop superweapons" territory, which makes it even more egregious. They did it because they can. Which invites scrutiny, how can you trust anthropic to do the right thing when it counts? We just had this argument about their fight with DoW. That anthropic didn't want to be the final arbiter, just wanted safety. This is the opposite, they really do want to be the final arbiter.
2
1
328
Inference and SOTA capabilities are new factors of production, up there with labor, land, and capital. They will get their due, ultimately. Counterfactual AI Safety concerns, likewise, push for deeper governance oversight claims by BigLab and government/regulatory activism.
99
Brian Gordon retweeted
The issue isn't the existence of safeguards. It is that: - the classifier is terrible, exceedingly trigger happy, unusably so for many - silently degrades responses if it's about AI - captures all user data These are *actively* bad, not just a mistake. There are real tradeoffs in safety, and this release chose none of those. It basically nerfed the model in the most blatant way possible, taking none of the nuances into account. AI safety researchers can't use it. Bio researchers can't use it. Cybersec researchers can't use it. Even by the system cards own admission this isn't in "immediately develop superweapons" territory, which makes it even more egregious. They did it because they can. Which invites scrutiny, how can you trust anthropic to do the right thing when it counts? We just had this argument about their fight with DoW. That anthropic didn't want to be the final arbiter, just wanted safety. This is the opposite, they really do want to be the final arbiter.
Seeing a lot of Fable safeguards hate on the timeline, but "what did y'all think [AI safety] meant? vibes? papers? essays?" The reality is that there are real tradeoffs in AI safety. Anthropic deserves credit for aggressive resolution of these tradeoffs in favor of safeguards for a model that it believes (and is in fact) is a step-change in vulnerability research capability. It's kind of difficult to justify coercive proactive harm mitigation, especially in a libertarian-ish society, but we clearly see the value in mandatory vaccination programs or beatcop policing or surveillance cameras. We should applaud Anthropic for being one of the few institutions in American public life that actually follows through on its convictions, including in implementing really aggressive monitoring, squelching of AI development work (already accounted for in its ToS -- I think the clandestinity is cool too), and exclusionary limits on use for information security-related queries. The whole point here is that we do not have herd immunity here: our network edge devices, authentication apps/services, and productivity software are extremely vulnerable, not sandboxed, and lack introspection capabilities. We need programs like Glasswing, better cross-company threat detection, and a more effective APT exploitation strategy before we democratize such a robust vuln research capability. The counterfactual here is that MSS contractors use VPS to access Fable, find jailbreaks for weaker safeguards, and use the system to build an active directory exploit that enables remote access to every O365 app. Not so bueno, huh? This is incredibly hard; Anthropic may not have calibrated every safeguard correctly this time, but there'll be learning. Model release cycles are getting more concise: they will adapt as they better understand and mitigate risks and competitive pressures manifest. Histrionic claims of anti-competitive behavior and safetyist hysteria are victim to precisely the error that is being alleged.
6
10
131
6,492
Brian Gordon retweeted
Sorry. I got to say this publicly. I really agree with @karpathy point here. My wife @mioana is a leading economist and we discuss this all the time. The singularity think of the AI community is rather misguided.
Andrej Karpathy thinks AGI's impact on the economy will just be folded into the existing rate of growth. AI will be barely noticeable in GDP statistics. When he came on the show, I pushed back, saying AGI will cause a massive jump in productivity and growth. Watch our back-and-forth on this:
18
21
190
35,622
Brian Gordon retweeted
Scientific research is fundamental to advancing civilization and helping people globally to solve the most critical problems, from medicine to materials, from brain science to physics, and much beyond. This is only possible when scientists have access to the best tools of the time to conduct scientific research, including having access to AI-based tools.
119
467
3,075
189,611
Brian Gordon retweeted
Imagine building a computer and not allowing its use in CS research. Thats some dystopian shit.
43
116
2,169
90,359
Brian Gordon retweeted
Degrading performance on ML research *without telling the user* is shockingly hostile and a terrible look. That could silently damage all sorts of work, including some of my own. Also the type of thing that could raise the eyebrows of antitrust enforcers worldwide.
Labs starting to pull up the ladders on the ability to diffuse AI was inevitable. Doing it without telling the user is misaligned.
46
122
1,442
107,214
Brian Gordon retweeted
you're not dreaming big enough
8
36
261
9,425
Many such cases
Replying to @rao2z
Those who don't know the history of their field are doomed to repeat it 🙄
1
1
2
191
I am confused as to why it is surprising that throwing oodles of test time compute can increase the solvability horizon (modulo a good verifier in the loop).. wasn't this known even from as far back as depth-limited minimax?
5
3
28
5,340
There are two AGI problems, the 0 to 1 problem (building an intelligent machine > some capability threshold) and the 1 to n problem (building heterogeneous artificial intelligences). Solving the first doesn’t necessarily solve the second.
Students without access to LLMs are 2 to 8 times more creative than students with access. That is the finding of a new paper comparing 2,200 college admissions essays written by humans before ChatGPT with essays generated by GPT-4. The key point is not individual creativity. GPT-4 can write well, sometimes better than individual students. The problem is collective creativity. Each new human essay added new semantic territory. New ideas. New angles. New experiences. New combinations. Each new GPT-4 essay added much less. The authors call this the diversity growth rate: how much novelty each additional text contributes to the collective pool of ideas. Humans kept expanding the pool. GPT-4 made the pool converge. Even when the authors pushed GPT-4 to be more creative, changed parameters, or used chain-of-thought prompting, the homogenizing effect remained. This is the real danger of AI in education. Not that students will write worse. That everyone will write the same. * Full paper in the first reply
2
271
And a series of autocatalytic, but not recursively self-improving, improvements can have massive economic consequences. The 1st, 2nd, and 3rd industrial revolutions, printing, urbanization, agriculture.
Appreciate Anthropic’s intellectual honesty in their observed autocatalytic effects. Autocatalytic does not presage RSI. Computers, compilers, and the Internet all had autocatalytic properties.
1
2
101
Brian Gordon retweeted
Appreciate Anthropic’s intellectual honesty in their observed autocatalytic effects. Autocatalytic does not presage RSI. Computers, compilers, and the Internet all had autocatalytic properties.
5
9
102
7,869
Brian Gordon retweeted
A pause continues to be utter and complete nonsense and it always will be. 1. Let's make planes safer by not making planes! 2. What exactly happens in a "pause"? Do labs get to keep working and we all just get to sit on our hands waiting for this eureka moment? I guess someone gets to keep their checks while we just bring the economy to a crashing halt based on a few people's vague ideas about some imaginary future problems that they came up with while huffing glue and reading Dune! 3. Who will fund the labs when they are not putting out new products? I guess VCs will continue to just give them 100s of billions out of the goodness of their hearts! 4. What actually justifies a pause? Apparently, the wonderful world of imagination! Theoretical future problems that haven't happened yet. Like massive job losses! Um, jobs are increasing including in the areas most affected by AI, like coding, so I guess not that. So what? I know the jobs apocalypse is coming because I can imagine it and imagination is reality, right? Maybe advanced AI weapons? Ah, so the government will pause weapons research too? Well no, they will keep doing that anyway because they always do that. It's what governments do. Okay so we're going to ban Chat Bots while the government keeps making weapons? That should solve everything we're worried about with AI! Well then what about recursively improving models that grow to superintelligence overnight? Yeah that's not really a thing. That's the plot of an Avengers movie. Models are bound by the same real world constraints we are like compute (brains/chips) and time (will this drug have side effects in twenty years can only be known in twenty years) and fuzzy multiplicity (not right or wrong but right-ish and wrong-ish means you can't make a reward signal for "is this the right decision for my business") and the subject/object paradox (the thing improving is judging its own improvement. Yeah chew on that one for a bit.) But I imagined AI overcoming every real world constraint instantly so it's true! 5. How would we know we did everything we needed to do in a pause? How do we know when it's over? We dont. We just want a pause now because we want it! Don't you see my pause ⏸️ emoji? It's nice right! 6. Who gets to decide we are ready to unpause? The government or the people! Great, because vague ass platitudes like this always go well for concrete policy design. Looks, none of this is real. It's theater. It's not real policy. It has no basis in reality. It's a mass hallucination. It's pushed by people who believe in magic and magical solutions. And anything that comes out of it will do infinitely more damage than the imaginary thing they were trying to protect us from in the first place.
Jun 8
now on the eve of RSI it seems everyone is more mutual conditional pause agreement pilled than they used to be and that seems like a good development
19
21
132
39,913
I guess I would say 2 things. First, it’s not like we know we have a dominant design yet. Second, there are foundational parts of the human cognitive repertoire that haven’t been cracked via scaling and there is no reason to be optimistic they will. Innovation remains imperative,
I feel like the obsession with continual learning / sample efficiency leads the field in the wrong direction. It's the bad career strategy of focusing on addressing your weaknesses instead of maximizing your strengths. Yes, there is an existence proof in the human brain, but it doesn't by any means guarantee that that'll be the most interesting AI. It may require $100T of R&D on chips and AI methods to get that unlock. On the other side of things, it's obvious that the coming models are extremely transformative and built on technologies that we already have. There's great reason to focus on just maximizing this. In reality, this is what the frontier labs are doing. They're going as fast as possible down the current development tree. This is good for progress and mixed for safety/geopolitics. Things like "automate white color work" and "replace the AI researcher job" are the guesses of labs because it's super hard to imagine futures for what these dramatic technologies will be. Don't take the labs too seriously about this being the exact goal. The exact goal is to push the frontier and monetize later. Solving continual learning, sample efficiency, etc would be great, but its trying to predict when a scientific breakthrough will come instead of trying to grapple with how the 100% sure thing coming technological revolution will change our lives. This isn't to say the Dwarkesh post is bad, it addresses some reasonable critiques, but it is the least bitter lesson pilled thing to be obsessed with human intelligence and how that can inform AI. We are in the AGI era of research. This is about embracing the unknown, scaling resources, and seeing what is enabled by making a series of magical tweaks to complex recipes that build frontier models. Lean into the alchemy. (it should be pretty clear that I personally, investing in open research agree we need fundamental science -- just not agreeing that this is what the "cutting edge of the frontier" is governed by)
86
Brian Gordon retweeted
You’d be shocked by how many people in think tanks/academia/government/“strategic classes,” including in the U.S., are convinced that Chinese models are now “good enough” and leading the world in adoption. Meanwhile, the reality I see is a fairly wide, and still widening, gap.
A thread with a good collection of hard/private/OOD evals where the Western frontier is comprehensively dunking on Chinese/open source models and it's not remotely close.
24
21
322
60,136