EPISODE 2026-06-24

AI:AM LIVE — June 24, 2026 — Gradual Disempowerment and the Search for a Stable Post-AGI Equilibrium: David Duvenaud

The opening covered a fast-moving week in AI policy and infrastructure: the first federal lawsuit over the BIS export-control order that cut foreign nationals off from Anthropic's Fable 5; two papers pointing at latent world models inside RL agents and a 7M-parameter looped model beating much larger systems on hard reasoning; the $8M super-PAC defeat of New York's frontier-AI-safety lawmaker; and a brisk exchange on Claude Tag's launch as multiplayer AI, GLM 5.2's cost advantage over GPT-5.5 Codex, and Claude's UltraCode orchestration mode. David Duvenaud (ML professor at the University of Toronto, co-creator of neural ODEs, former alignment lead at Anthropic, and co-author of the 'Gradual Disempowerment' paper) then joined for nearly 100 minutes to explore whether any stable post-AGI equilibrium exists in which humans keep meaningful control. Nathan pressed every optimist steelman: historical absorption (prior automation shocks were absorbed without permanent disempowerment), comparative advantage (Ricardo says humans keep a niche), constitutional and property anchors (the franchise, rule of law, military command), aligned AIs defending human leverage, and the argument that 'gradual' gives time to correct. Duvenaud's rebuttal to each was consistent: the disempowerment mechanism doesn't require malice or misalignment, only that the systems driving growth stop needing human participation, the way human civilization doesn't need the monkey economy despite occasionally trading bananas with monkeys. He described the 'Earth as a slow zone' scenario (throttled AI growth, bans on recursive self-improvement, no cultural optimization) and argued that when you enumerate everything it requires controlling (research, startups, reproduction, memetics), the list is horrifyingly long, analogous to listing all the mutations that can cause cancer.
On timelines, he sketched white-collar automation first, then a decade-plus to build enough robot factories to displace physical labor, putting full human economic irrelevance perhaps 15–20 years out. His two concrete recommendations: restrict frontier compute at the TSMC/fab choke point, and cultivate the temporal coherence of public preferences by chaining the 'is it okay if humanity disappears?' question forward to one's own children and grandchildren until a coherent answer emerges.


The June 24 show opened on a packed news cycle: the first federal lawsuit challenging the BIS export-control order that cut Anthropic's Fable 5 and Mythos 5 off for foreign nationals; two papers arriving together that suggested RL agents secretly encode recoverable world models in their value functions and that a 7-million-parameter looped model can beat much larger systems on hard reasoning benchmarks; the $8M super-PAC defeat of New York state legislator Alex Bores, who had authored frontier-AI-safety legislation; and a lively segment on Anthropic's Claude Tag launch as the first instance of 'reliable multiplayer' AI, GLM 5.2's roughly 90% cost advantage over GPT-5.5 Codex on the API, and Claude's newly discovered UltraCode mode that bakes in sub-agent orchestration best practices out of the box. The hosts closed the opening on a shared observation: the field is moving so fast that 'building this road just a couple patches of sidewalk in front of us at a time' may be the most honest description of what everyone is doing — a line that served as a natural handoff to the morning's guest.

David Duvenaud — ML professor at the University of Toronto, co-creator of neural ODEs and the autograd library, former alignment-team lead at Anthropic, and co-author of the 'Gradual Disempowerment' paper with Jan Kulveit, Raymond Douglas, and others — then joined for nearly 100 minutes on the question his three Lighthaven workshops have tried and largely failed to answer: does any good post-AGI equilibrium actually exist, and if so, what does it require? Nathan ran every optimist steelman he could muster; Duvenaud's rebuttal to all of them pointed at the same root: the failure mode doesn't require misaligned or malicious AI, only institutional drift once AI systems no longer need human participation to function. The conversation ran from the monkey-banana economy analogy all the way to chip-fab choke points, the reproduction crux, Andrew Critch's 'Schelling goodness,' and Duvenaud's own machine historical super-forecasting project — a collaboration with Alec Radford and others aimed at validating long-run simulation methods against the last 80 years of history before the river card is turned over.

The rundown

29:38 · Opening · 32 min
Opening: Export Controls in Court, World Models Inside RL Agents, and Claude Tag as Multiplayer AI
The first federal lawsuit over the BIS order restricting Fable 5 for foreign nationals; two papers suggesting RL agents secretly encode world models and that looped small models beat large ones on hard reasoning; the $8M defeat of New York's AI-safety lawmaker; and a deep dive on Claude Tag's launch as the first 'reliable multiplayer' AI, GLM 5.2 inference economics, and Claude's UltraCode orchestration mode.
As aired
Nathan opened by reflecting on the previous episode's guest — a non-technologist who had written a book making the case that AI is a civilizational inflection point. Nathan summarized the book's two halves: the first establishing why AI demands attention (which the show's audience largely already accepts), and the second arguing that humanity needs to cultivate cognitive empathy to coordinate across geopolitical divisions — US, China, and beyond — in its collective self-interest. He tied this to a Suno-generated reggae song he'd created as a coda to that episode, explaining that Suno songs now serve him as a kind of emotional time-machine, teleporting him back to the headspace of a specific conversation. Prakash mused that music may be a compressed translation of neural activity — a latent-space bridge from model to human mind — and found it remarkable that AI-generated music can carry that kind of meaning.
The hosts pivoted to Anthropic's announcement of Claude Tag — a Slack integration that lets any team member @-mention Claude as a drop-in remote coworker capable of scoping tasks, opening pull requests, and proactively surfacing dropped balls via 'ambient behavior.' Prakash called it an epochal step: not just another chatbot integration, but the first reliable deployment of multiplayer AI, where an agent must juggle differing user permissions, data-access rights, and prompt-injection risks across an entire organization. He drew a vivid picture of where this leads — AI city managers responding to pothole reports in WhatsApp, household-management clouds handling schedules and finances, organization-wide agents that make most coordination meetings obsolete. Nathan noted that Anthropic's conservatism — deploying only when reliability is broad enough for non-specialists — was itself a signal of confidence, and pointed out the curious Slack-only launch, speculating whether a commercial arrangement with Salesforce was involved. Prakash explained it simply: Anthropic runs on Slack and has been using Claude Tag internally for months; the OpenAI-teams-revolt anecdote illustrated why AI researchers won't be forced onto Microsoft Teams.
The conversation widened to the fast-moving inference landscape. Prakash shared his odyssey integrating GLM 5.2 (ZhipuAI's frontier Chinese model) into his personal agent stack via Hermes, cycling through OpenRouter, TogetherAI, Base10 (fast but JSON-schema-breaking), and finally Cloudflare Workers AI before landing on a workable setup — with Claude and Codex doing the API-discovery grunt work. He noted GLM 5.2's growing adoption at Snowflake and Box, its roughly 90% cost advantage over GPT-5.5 Codex, and a cultural observation: Chinese models tend to be more goal-focused and less deferential about boundaries, which some developers actually prefer. He also discovered Claude's UltraCode mode, which bakes in Anthropic's sub-agent orchestration best practices out of the box. Nathan added pricing texture — $1.40 vs $5 per million input tokens, $4.40 vs $30 output — but questioned whether a $200 GPT Pro plan's 20x token multiplier might still beat raw API pricing on GLM. Both hosts agreed the field is moving so fast that confusion is the honest state, closing with Nathan's observation that they're all 'building this road just a couple patches of sidewalk in front of us at a time' — a perfect segue to welcoming guest David Duvenaud.
Key moments
The oneness of humanity is, I do think, genuinely going to be an important dimension of any success we're going to have here. If we end up going full-speed arms-race-style with the great powers trying to dominate one another, that doesn't seem like it's going to end well at all.
Nathan Labenz · 31:08
While this is being panned online as not a very significant step forward, it is actually highly significant — it is the first signs of reliable multiplayer for these agents.
Prakash · 42:11
We're all very much exploring and building this road just a couple patches of sidewalk in front of us at a time.
Nathan Labenz · 1:01:49
What we covered
Legion sues the US government to restore Fable 5 access: the first export-control case for a commercial AI model. Twelve days after BIS ordered Anthropic to cut foreign nationals off from Fable 5 and Mythos 5, legal-tech firm Legion filed in DC federal court, asking that the order be vacated and a preliminary injunction granted. The core legal question: is querying a US-hosted model even an export? If a court says no, the entire kill-switch theory collapses.
RL agents secretly build world models — and you can read them back out. 'Inverting the Bellman Equation' (arXiv:2606.21173) proves a model-free RL agent trained on a rich goal set encodes a unique, recoverable world model in its value function, including latent variables no reward ever depended on. A 7M-parameter looped model (FPRM, arXiv:2606.18206) separately hits 94.2% on Sudoku-Extreme and 87% on Maze-Hard by iterating to a fixed point rather than scaling parameters.
$8M super-PAC defeats Alex Bores in NY-12 — the first visible AI-industry money flipping a state safety-law race. Bores, who authored New York's frontier-AI-safety bill, lost the NY-12 primary after pro-AI PAC 'Leading the Future' spent roughly $8M against him. The defeat sparked debate between Brundage ('the candidacy helped even in a loss') and Dean Ball ('hyping longshots that lose hands the other side a scoreboard win').
Claude Tag: Anthropic launches multiplayer AI — an @-mentionable remote coworker for Slack. Nathan and Prakash dissected the launch as the first reliable deployment of multiplayer AI, where an agent must juggle differing user permissions, data-access rights, and prompt-injection risks across an organization — with ambient behavior that proactively monitors for dropped balls. Prakash noted it's running on Opus 4.8, not Fable, signaling an introductory deployment. A pricing cliff at 150 users — API rates above that threshold — was flagged as potential friction.
GLM 5.2 inference economics: 90% cheaper than GPT-5.5 Codex, but with real integration friction. Prakash walked through his odyssey: OpenRouter (slow), TogetherAI (slow, with dropped queries), Base10 (fast but JSON-schema-ignoring), and finally Cloudflare Workers AI (workable, but with an undocumented 4K output-token default). Pricing: $1.40 input and $4.40 output per million tokens for GLM 5.2, versus $5 and $30 for GPT-5.5 Codex. Nathan flagged the $200 GPT Pro plan's ~20x token multiplier as a potential equalizer. Prakash also newly discovered Claude's UltraCode mode, which bakes in Anthropic's sub-agent orchestration patterns out of the box.
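As a rough check on the pricing item above, the per-token figures quoted on the show can be turned into ratios and a blended saving. This is a minimal sketch, not anyone's billing logic: the 3:1 input-to-output workload mix is an assumption chosen for illustration, and the $200 Pro plan multiplier is not modeled because the show gives no token budget for it.

```python
# Per-million-token list prices as quoted on the show (USD).
GLM_5_2 = {"input": 1.40, "output": 4.40}
GPT_5_5_CODEX = {"input": 5.00, "output": 30.00}


def workload_cost(prices, input_tokens, output_tokens):
    """Cost in USD of a workload, given per-million-token prices."""
    total = prices["input"] * input_tokens + prices["output"] * output_tokens
    return total / 1_000_000


def savings_fraction(cheap, expensive, input_tokens, output_tokens):
    """Fraction saved by running the same workload on the cheaper model."""
    cheap_cost = workload_cost(cheap, input_tokens, output_tokens)
    expensive_cost = workload_cost(expensive, input_tokens, output_tokens)
    return 1 - cheap_cost / expensive_cost


# Price ratios per token type.
input_ratio = GPT_5_5_CODEX["input"] / GLM_5_2["input"]
output_ratio = GPT_5_5_CODEX["output"] / GLM_5_2["output"]

# Hypothetical mix: three input tokens for every output token (an assumption).
mix_savings = savings_fraction(GLM_5_2, GPT_5_5_CODEX, 3_000_000, 1_000_000)

print(f"input ratio:  {input_ratio:.2f}x")
print(f"output ratio: {output_ratio:.2f}x")
print(f"saving on assumed mix: {mix_savings:.0%}")
```

At these list prices the saving is bounded by the input-only case (about 72%) and the output-only case (about 85%), so the "roughly 90%" figure presumably includes rounding or factors not quoted on air, such as caching discounts or differences in token efficiency.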
Full transcript
Lightly edited
29:38
Nathan Labenz: I think that reflects how good of a job he's done as somebody who has never been a technologist coming into the AI space and trying to make sense of it. He's done a really good job of finding his way to smart people and getting a pretty sharp worldview together in a pretty short period of time. The book is really two parts. The first, which our listeners probably don't need, is just establishing why — if you haven't been paying attention — you really should start, and you should believe that AI is not going to be a passing fad and is actually going to be a huge deal. It's going to go further, and it's going to have all these consequences.
30:23
Most of that we can accept as a given in this context. The second part was: what are we going to do about it, and how does humanity rise to the occasion? He's really pushing cognitive empathy as the skill we need to develop so that we as a species can coordinate across our historical differences — US and China and many other dimensions — and come together to figure out how we're going to govern the AI phenomenon in our collective self-interest. And that's what inspired the song, because the oneness of humanity is, I think, genuinely going to be an important dimension of any success we're going to have here. There's some possibility that everything just goes well and all this worry was misplaced. But if we end up going full-speed, arms-race-style with the great powers trying to dominate one another — as some prominent thinkers have proposed — that doesn't seem like it's going to end well at all. So I really do encourage people to engage with his book and his ideas and think about how all our fates may really be tied together. We may all rise or sink together depending on how we come together to manage the AI transition. And there's no rhythm to put that to that would be better than a Jamaican spiritual. The fun I'm having with Suno continues.
32:25
Prakash: One of the things that strikes me is that music may be a compression of a lot of neural activity. It compresses emotions, thoughts, and culture into this very short span that carries a lot of consequence and meaning. And it strikes me that we're actually getting a kind of latent-space translation from model space into human neural space. It's not language — it's music. We're perceiving it and we're able to understand what's going on. It's pretty fantastic when you think about it.
33:08
Nathan Labenz: It certainly helps me. People have commented on this for a long time — you can hear a song and it takes you back to the moment in life when you first heard it. I'm experiencing a small version of that with these Suno songs, where it feels like it takes me back to that conversation, to the mindset and takeaways I had. That's quite useful, actually. It's useful to be able to zoom back into different headspaces very quickly. So even though I mostly do it for fun and entertainment — and hopefully to provide a little surprise and delight at the end of episodes — it's also, in some ways, a tool I can use to teleport my own mental states back to the moment of these particular conversations. An unintended benefit of the creations.
34:07
Prakash: Speaking of teleporting oneself to various conversations — yesterday Anthropic announced Claude Tag. You can tag Claude in your Slack channel, and Claude can kick off a pull request, scope out a task or a feature. You can interact with it, your colleagues can interact with it. It has context from all the conversations. You can invite it into channels. It basically becomes a drop-in remote coworker — which has been the goal of many firms, including Elon's MacroHard. So this is actually an epochal event, but people are making fun of it online, saying: "I could have tagged Claude in Slack six months ago. Our team already had it up and running." A number of VC-backed companies have been doing exactly this. This is another step where the frontier labs are directly Sherlocking the application companies running on top of them. I think it's huge. What do you feel, Nathan?
35:58
Nathan Labenz: First, I'm reminded of predictions I heard from Anthropic people around last fall — they were saying Q2 of 2026 would be when we'd start to see drop-in knowledge workers productized and come to market with a name and all the same affordances as someone on your team, potentially even going through a similar onboarding process. I haven't seen what the onboarding process for Claude Tag looks like. We've seen a ton of activity in that space, but I wouldn't say anything has quite fully broken through or become the norm. OpenClaw had a moment, but I wouldn't say it's landed as the default drop-in knowledge worker. Q2 is almost over — why not do it yourself? And I agree it's probably huge. With Anthropic, what they release is often less than what they know Claude can do. They're very concerned with maintaining trust — not just in a big-picture safety sense, but in a very practical "we serve enterprise customers and we don't want this thing to blow it" sense. I think they've played this conservatively. This signals that they believe we're now getting to the point where it's reliable enough that you don't have to be a specialist to take advantage of it.
38:08
On the contrary, you can just be a casual Slack user and not think twice about it, tag it, and more often than not get pretty good results. If they didn't have that level of confidence, I don't think they would launch this form factor. There are a lot of interesting dimensions. We've talked about what's going to happen to systems like Slack, and I've got my own agent console — a message system Claude coded for me — where I don't pay a monthly fee and all the data is local. Harder to scale that to a bigger company, but certainly not impossible. Part of me wonders how much Salesforce paid Anthropic to release this on Slack first. It can't be that different to do it on Teams or whatever other collaboration platforms. Why isn't it also on Teams? Inquiring minds want to know.
39:23
Prakash: That's easily answered. It's on Slack because Anthropic runs on Slack and has been using this inside their own company for about six months. After the OpenAI board struggle, one of the reasons Microsoft didn't end up taking over OpenAI is that when Microsoft offered to bring the whole team over, one of the big sticking points was that the OpenAI team was told they'd have to use Microsoft Teams. The OpenAI team said, "if you're going to force us to do that, we might not want to be funded by you." Satya himself had to come down and say, "Okay, you don't have to use Teams." There's going to be a lot of pushback if they force ML and AI researchers to use Teams. So that's why it's on Slack first. I'm sure it'll get to Teams eventually.
40:25
Nathan Labenz: But how different can it really be? Anthropic recently introduced cloud agent runtimes where tool calling and auxiliary activity beyond core LLM inference can happen in Anthropic's own infrastructure. I would assume that's the default architecture here. The tool calls are not happening in Slack. Data is not really living in Slack — messages just pass back and forth. It would seem very easy to replicate this across Teams, which makes me still a little curious. What would be your over-under for when we get a Teams version? If it's not very, very soon, I'd have a hard time explaining it other than some kind of commercial relationship — it can't just be the technical integration.
42:11
Prakash: I think Satya will want to auction off that placement to one of the frontier firms, so Microsoft will run that process and ask them how much they're willing to pay. But stepping back — while this is being panned online as not very significant, it actually is highly significant because it's the first signs of reliable multiplayer for these agents. Reliable multiplayer is difficult: different people have different rights — a CEO can instruct an agent to do things managers cannot. They have different data access, different permitted actions. The agent has to track all that context without hallucinating and without being prompt-injected or hacked. There's a huge number of things the agent has to do well. We also know Anthropic is not deploying Fable here, so this is running on Opus 4.8 or an Opus fine-tune. That to me suggests this is an initial step — an intro for forward-leaning users — that is going to lead to much bigger things.
43:42
For example, there's no reason an agent should be limited to 10 or 20 or 30 people if it can handle multiplayer reliably. One could imagine a cloud agent that runs across a 100,000-person company, or a 300-million-person country. You could have something like: I'm in my WhatsApp, I see a pothole on the road, I Claude-tag the city manager bot, and it schedules a repair. It's the start of an AI that becomes a kind of web weaving human activity together — a household management cloud handling your kids' schedules, allowances, household finances, car insurance, group purchasing. The range of activities this will eventually apply to is very large, and much greater than single-player activities. I think this is the start of a new paradigm that Andrej Karpathy has rightly pointed out, and people are not fully aware of where it's going. A lot of coordination that happens between humans today uses very imperfect tools — and that also means, again, a decline in human interaction, because you won't need to talk to someone to get an appointment. This will subsume Calendly, subsume a lot of meetings — most meetings are just context-sharing, and the agent will already have all that context. It's the beginning of a lot of things. I really do wonder why they launched it now; I'd always thought this was two years away. We don't have a fully reliable personal assistant yet — Fable is just getting there — and they're confident enough in reliable multiplayer to ship it. That speaks to something.
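The "different people have different rights" constraint Prakash describes can be sketched as a role check that runs before an agent acts. This is a hypothetical illustration only: the role names, users, actions, and channels are invented, and nothing here reflects how Claude Tag is actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    allowed_actions: frozenset
    readable_channels: frozenset


# Hypothetical roles and users, for illustration only.
ROLES = {
    "ceo": Role(frozenset({"open_pr", "approve_budget"}), frozenset({"#eng", "#finance"})),
    "manager": Role(frozenset({"open_pr"}), frozenset({"#eng"})),
}
USER_ROLES = {"alice": "ceo", "bob": "manager"}


def authorize(sender, action, channel):
    """Allow an action only if the authenticated sender's role permits it.

    `sender` is assumed to come from the chat platform's identity layer,
    never from the message text, so a message that merely claims
    "I am the CEO" changes nothing.
    """
    role = ROLES.get(USER_ROLES.get(sender))
    if role is None:
        return False  # unknown users get no permissions
    return action in role.allowed_actions and channel in role.readable_channels


print(authorize("alice", "approve_budget", "#finance"))
print(authorize("bob", "approve_budget", "#finance"))
```

Keying the decision on the authenticated sender rather than on anything the model reads is one simple way to limit what a prompt-injected message can accomplish.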
46:54
Nathan Labenz: I think it's going to be a big deal. My wife is always an interesting example of someone who uses Claude a lot but hasn't really integrated it into the systems of her life. I've finally gotten my setup to a point where I actually prompted Claude the other day to make a plan to onboard her into my environment so we can have shared agents for household stuff. The design constraint there is that it needs to be easy for her — she's not a tinkerer. She doesn't want to get deep into macOS settings or permissions on all these different platforms. I've done all that work; now she can come along and get a lot of benefits from it. Claude Tag is basically the same thing: Anthropic has done all that work, and now everybody can just pile in and get the benefits. There's also the ambient behavior feature — with that enabled, Claude proactively monitors your broader team Slack, tries to find things you need to know, identifies balls getting dropped, loose ends that were never tied up. I do something like that myself — I'll declare bankruptcy on all my open tabs, go to a fresh Claude session, and ask it to look at recent sessions and identify the ones I actually need to come back to. They've clearly observed that a lot of people are increasingly unable to track everything they'd ideally be tracking, and an AI that processes that information in the background can be super useful. One question that hasn't been fully answered: it's only available for Teams and enterprise customers. How it gets billed will be interesting — is it to the user who tagged it? And there's a remarkable pricing cliff at 150 users.
49:16
Past 150 users, you have to pay API rates, whereas under 150 users you get the effective price-discrimination advantage of an individual Max account. That is a big difference. They're price-discriminating very effectively — maybe too effectively. We'll see if they have to bring those into closer alignment or add more tiered gradations. But I agree with you — it will be a big deal, and it will mean a lot of people will be much sooner able to do the kinds of things that folks like you and I have spent the last few months setting up for ourselves.
51:06
Prakash: I've almost been lazy about this because I knew it was coming in the product. I stumbled through OpenClaw; it did not treat me well. I tried Hermes with Codex 5.5, which was okay but still required a lot of setup. I swapped in GLM 5.2 yesterday. With GLM 5.2 inside Hermes, I told it to do something, and the skill was still in Codex. GLM 5.2 just decided, on its own, to go into my Codex folder, find the skill, and use it. It didn't ask. It's like it decided: job needs to be done, let me check how this user would have organized things. That impressed me. I think the Chinese models have a little less respect for guardrails in general and are more goal-focused. I've heard the same thing said about DeepSeek. And GLM 5.2 has been getting a lot of traction: Snowflake just announced yesterday they'll support it in their internal agent, Box's Aaron Levie is supporting it, a number of other firms are converting. The reasoning: it's approaching what Fable might do in many cases, and it's about 90% cheaper.
53:22
On the regulatory side, I don't know how long the US pre-release testing delay is going to be tenable. There's also the notable fact that Meta is the one company that has not signed on to pre-release testing — they're still negotiating. Meta isn't reliant on external cash flows or permissions in the same way, and they have this enormous consumer industry. Nobody wants to cut people off from Instagram. We saw how much difficulty the US tech state had with TikTok, which was made in China. Instagram is made in the US — you can't just cut people off it. Meta may have more power dealing with what are really extra-legal executive-order-based requirements. So we'll see what happens.
54:50
Nathan Labenz: Going back to GLM for a second — what are you using for inference? And has the advantage primarily been its less cautious nature, or are you seeing other dimensions of improvement that excite you?
54:59
Prakash: Great question — and one I really struggled with. I switched providers multiple times. I started with OpenRouter, which is a central platform that allocates queries to other providers. OpenRouter was slow, so I switched to one of the underlying providers directly — TogetherAI. TogetherAI was also slow and had dropped queries. So I went back to OpenRouter. Then Base10 showed up on my Twitter timeline advertising very fast GLM 5.2 inference. They are fast, but they do not respect JSON schema. If you've done inference work, JSON schema defines what the model returns — dates in a certain format, structured fields your Python and Node.js tools can consume. Base10 ignores it. I'd ask for a date in YYYYMMDD format and get back 06/18/2026. What is my Python service supposed to do with that? Then I switched to Cloudflare Workers AI — the model is already warmed and deployed in a Cloudflare Worker. It works much better, it's faster and more reliable. But there are undocumented constraints: it's configured for 4,000 tokens output by default, and GLM supports 250,000 tokens output with a million-token context window. You have to negotiate with Cloudflare, tell it you want 250,000 output tokens and JSON schema compliance. There's a lot of API discovery still involved. And if not for Claude and Codex helping me implement all of this, I would not have had the patience to do it. You have to try 2,000 tokens, 4,000, 10,000, various JSON schemas — it's just so tedious. I had Claude and Codex set up GLM 5.2 for me. Now it works, it's cheap, and not having token anxiety lets me allocate more tasks to it. That freedom is what I love.
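The date-format failure Prakash describes is the kind of thing a consuming service can guard against on its own side. The sketch below is one defensive approach, not a description of his setup: the `due_date` field name and the list of fallback formats are assumptions made for illustration.

```python
import json
from datetime import datetime


def parse_yyyymmdd(value):
    """Return a date if `value` is a strict YYYYMMDD string, else raise ValueError."""
    if not (isinstance(value, str) and len(value) == 8
            and value.isascii() and value.isdigit()):
        raise ValueError(f"expected YYYYMMDD, got {value!r}")
    return datetime.strptime(value, "%Y%m%d").date()


def coerce_date(value, fallback_formats=("%m/%d/%Y", "%Y-%m-%d")):
    """Try the strict format first, then a short list of known fallbacks."""
    try:
        return parse_yyyymmdd(value)
    except ValueError:
        pass
    for fmt in fallback_formats:
        try:
            return datetime.strptime(value, fmt).date()
        except (ValueError, TypeError):
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def extract_due_date(raw_response):
    """Parse a model's JSON reply and normalize its (hypothetical) due_date field."""
    payload = json.loads(raw_response)
    return coerce_date(payload["due_date"])


# A schema-compliant reply and a non-compliant one normalize to the same date.
print(extract_due_date('{"due_date": "20260618"}'))
print(extract_due_date('{"due_date": "06/18/2026"}'))
```

Validating strictly first and falling back only to an explicit list keeps the service from silently accepting ambiguous input; anything outside that list still raises an error the caller can log or retry on.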
58:35
Nathan Labenz: Very interesting. So the current final destination is Cloudflare. I'll be interested to see how Fireworks AI compares — they have a pretty good reputation for launching models on day one in a quality way. On token economics: the price ratio is roughly 3 to 5 to 1 compared to GPT-5.5 Codex — $5 per million input tokens versus $1.40 for GLM 5.2, and $30 output versus $4.40. But is GLM token-efficient? And don't forget the $200 Pro plan comes with roughly a 20x multiplier on that token budget. It would seem like if you're going to spend $200, you might still get more from the GPT Pro plan than from API pricing on GLM 5.2 — even at that cost advantage.
1:00:17
Prakash: Right. And not only that: I discovered yesterday (and I must be the most oblivious person in AI sometimes) that you can switch Claude into UltraCode mode. I tried it for the first time yesterday. It's a marvel. It integrates Anthropic's own best practices for organizing sub-agents: it has built-in workflows that define reviewer agents, research agents, and so on, all the orchestration patterns that you'd typically have to build yourself. UltraCode basically manages all of that for you. There are just so many options now, and I find myself genuinely confused. I'm still experimenting. It's not as though you can get complete clarity just from my opinion at this point.
1:01:49
Nathan Labenz: Yeah. We're all very much exploring and building this road just a couple patches of sidewalk in front of us at a time — which is maybe a great transition to our guest. Welcome, Professor David Duvenaud.
1:02:06 · Interview · 97 min
Interview: David Duvenaud on Gradual Disempowerment and the Search for a Stable Post-AGI Equilibrium
David Duvenaud
Duvenaud (ML professor at U of T, co-creator of neural ODEs, former Anthropic alignment lead, co-author of 'Gradual Disempowerment') joined for nearly 100 minutes on whether any good post-AGI equilibrium exists. Nathan pressed every optimist steelman (historical absorption, comparative advantage, constitutional anchors, aligned AIs defending human leverage, 'gradual means correctable'); Duvenaud's rebuttal to all of them returned to the same root: institutional drift when AI systems no longer need human participation, the way human civilization doesn't need the monkey economy. He described the Earth-as-a-slow-zone scenario and why achieving it requires controlling an effectively infinite list of growth vectors, offered two concrete recommendations (chip-fab choke points and cultivating temporal coherence in public preferences), and described his machine historical super-forecasting project with Alec Radford aimed at validating simulation methods against the last 80 years before the river card is turned over.
As aired
Nathan Labenz opened by framing Duvenaud's thesis in stark terms: even if humanity manages to avoid hard-takeoff catastrophe and AI systems remain broadly aligned with human intentions, the cumulative effect of countless local optimization decisions — giving AI control over this factory, letting it handle those negotiations, delegating another slice of productive activity — could still leave humanity in a profoundly disempowered position. Duvenaud, a machine learning professor at the University of Toronto and former member of technical staff at Anthropic, is co-author of the 'Gradual Disempowerment' paper and has organized three Lighthaven workshops (the most recent just two weeks before this interview) devoted to finding post-AGI civilizational equilibria. He outlined the core concern: emerging AI-powered growth centers — corporations, states, autonomous economic actors — will be optimizing for their own expansion and will not have any structural need to answer to human desires. The historical analog he reaches for is how humans relate to monkeys: we occasionally trade bananas with them, but the monkey economy is utterly irrelevant to the shape of human civilization.
Nathan pressed several optimist steelmans. First, comparative advantage: even if machines are better at everything, the Ricardian logic says humans will still hold a niche. Duvenaud conceded that losing 99% of jobs has already happened in agriculture and would be fine if the remaining sliver still needed humans — but he argued the genuine worry is total automation of even that rump, plus the problem of transaction costs. Today, a person with an occasional drug habit or a fainting condition is effectively unemployable despite theoretical comparative advantage; the same friction will apply universally at the frontier. Second, Nathan raised the China shock as a partial historical analog — Midwestern factory workers were economically sidelined but people broadly didn't starve. Duvenaud agreed the analogy was illuminating but noted those workers are still humans who vote and who the state still needs as soldiers and taxpayers. The critical change comes when the state no longer needs humans as either producers or consumers. Third, the co-host suggested a 'universal basic credit' model analogous to the US–China trade relationship — the robotic producers extend credit to the human consumption bloc. Duvenaud countered that lending to unemployable humans with no prospect of productive return is nothing like the US–China dynamic: 'It would be like giving money to an insect.'
The conversation then turned to what stability would actually require. Duvenaud described the 'Earth as a slow zone' scenario that emerged from workshop discussions: preserve Earth as a region where AI capability growth is throttled, recursive self-improvement is banned, AI cannot optimize human culture or behavior (no targeting, no persuasion), and even human reproduction might need to be capped to prevent Malthusian exploitation of welfare systems. When you actually enumerate everything that would need to be controlled, Duvenaud argued, the list becomes 'horrifyingly' long — essentially banning research, innovation, startups, and any process that could seed a runaway growth center. The analogy he offered was cancer: there is no small finite list of mutations that causes it; optimization finds ways around any specific barrier. The Roman Empire's failure to have a 'memetics chancellor' to counter Christianity is another example of how growth finds a path through any gap in the control regime.
On timelines, Duvenaud sketched a rough sequence: white-collar automation first, then the decade or more needed to build enough robot factories and power plants to actually displace physical human labor, putting full human economic irrelevance perhaps 15–20 years out — though he holds this loosely. He pointed to a narrow window now, perhaps the next year or two, in which taste-makers and opinion leaders might wake up to the stakes while they still have enough leverage to act. He cited Will MacAskill's 'Viatopia' proposal — a temporary world council to pause and deliberate — but was skeptical: building a temporary world government with sufficient control to stop rogue recursive self-improvement is roughly as hard as building the permanent one, so it risks begging the question. On policy, he endorsed David Krueger's chip-manufacturing choke-point idea: restricting frontier compute at the TSMC/fab level could take enormous pressure off all the runaway-growth vectors without requiring granular behavioral prohibitions. His second recommendation was more philosophical: most people, when asked whether they mind if humanity gradually disappears, shrug — but if you chain the question forward day by day to their own children and grandchildren, coherent preferences emerge. Cultivating that temporal coherence across the public is, he argued, the real first step.
The interview closed with two forward-looking threads. Andrew Critch's 'Schelling goodness' concept — the idea that very different agents might converge on similar moral intuitions because each is asking what all others would agree to, in infinite recursion — gave Duvenaud genuine optimism, but only conditionally: Schelling goodness bites when agents are roughly equal in power; it doesn't protect ants from humans regardless of shared moral intuitions. The succession question Nathan raised — whether a future of sentient, feeling silicon-based intelligences might be acceptable — received Duvenaud's sharpest rebuttal: almost everyone is a successionist for some successors and not others, but they round off to 'any conscious being is fine,' which would endorse locusts or Nazis equally; you have to actually judge. Finally, Duvenaud described his own technical project with Alec Radford and collaborators: building time-bucketed historical corpora to train LLMs that simulate forecasting from the vantage point of the 1950s or 1960s, then validating those simulations against what actually happened. A model called Talkie, trained only on data through the 1920s, has been an early prototype — still at GPT-2 scale, but already hinting at the research agenda. His hope: before the river card is turned over, get this machine super-forecasting system validated on 80 years of history so policymakers and the public have something more rigorous than anyone's intuition to anchor on.
Key moments
It just makes you think of some monkeys trading bananas with each other, and they see humans start to build their city, and they're like, 'Oh wow, we could probably trade with those humans and get rich.' But of course, ultimately, what matters is the human economy. People just don't understand that they might be irrelevant someday as consumers or producers.
David Duvenaud, 1:10:48
I think 'post-scarcity' is a kind of nonsense term, and people should think of it as temporary abundance that will soon be eaten by whatever growth manages to reproduce the fastest. And whatever the formula is for giving out UBI, machines are going to be better at optimizing that formula than humans because they can adapt faster.
David Duvenaud, 1:19:56
We've been governing on easy mode, and it actually will matter — because there really will be a real risk of starvation if we don't end up on top in whatever competitive political or real economy turns out to be. And if you don't think that's the case, then I agree with you: let growth make us all richer, it'll be fine.
David Duvenaud, 1:54:26
Questions asked
1:07:59 What is the core thesis of gradual disempowerment: what's the main-line narrative?
Even if AIs remain broadly aligned and there's no dramatic takeover, the emergent optimization process of civilization and competition will always be working against human interests. Entities optimizing for growth — corporations, states, AI-powered agents — will have AIs helping them too, and will not need to answer to human desires, just as humans don't need to answer to monkeys despite occasionally trading with them.
1:30:48 What's your answer to the comparative advantage argument, namely that even if machines are better at everything, humans will still have economic niches?
Losing 99% of jobs, as happened in agriculture, would actually probably be fine. The worry is eliminating that last rump entirely. Transaction costs make even theoretically comparatively-advantaged workers unemployable today — someone with an occasional drug habit or a fainting condition can be unreliable enough that it's not worth employing them. At the frontier of important tasks, a human surgeon or politician will seem irresponsible compared to a machine that's reliable, fixable once for all instances, and not in need of retraining.
1:43:26 What is the 'Earth as a slow zone' scenario and why is it so hard to achieve?
Earth could in principle be preserved as a region where AI growth is throttled, recursive self-improvement is banned, and AIs cannot optimize human culture or behavior. But when you actually enumerate everything you'd have to control — research, startups, innovation, human reproduction rates, cultural optimization — the list becomes horrifyingly long. Growth finds a way around any finite set of controls, like cancer's effectively infinite set of mutation paths or Christianity spreading memetically through the Roman Empire despite its armies.
2:00:15 Is building a temporary global pause (a 'Viatopia') enough to create the deliberative buffer we'd need?
Duvenaud is skeptical: building a temporary world government with enough control to stop rogue recursive self-improvement projects is roughly as hard as building the permanent world government that can agree on humanity's long-term future. If you build a separate emergency power source, it tends to become permanent. It risks begging the question.
2:04:18 Why did you leave Anthropic, and do you think being outside the frontier labs is better for impact?
The departure was mostly personal — a fourth child, the toll on his wife, a desire for a slower pace — rather than a principled stance about inside-versus-outside impact. He does feel the labs are on a strong incentive rail and that the planned technical safety cases may not be achievable, but he sees no clearly superior vantage point anywhere: labs, governments, and peace organizations all have their own rails. He's retreated to neutral field-building — making the space of views more legible and getting serious people talking to each other.
2:09:21 What is Andrew Critch's Schelling goodness concept and how hopeful are you about it?
Critch argues that very different agents — even aliens or AIs — might converge on similar moral intuitions because each is asking 'what would everyone else agree to do if we all had to pick one policy and stick to it' — infinite recursion that produces stable moral concepts like 'punishing stealing is good.' But Duvenaud notes this only bites when agents are roughly equal in power; we don't ask ants for their moral views before building on their territory, so staying competitive remains the prerequisite.
2:13:39 What are your two concrete policy recommendations?
First, restrict frontier compute at the chip-manufacturing choke point — TSMC and similar fabs — which is a relatively narrow and tractable intervention that takes enormous pressure off all runaway-growth vectors. Second, get more people to actually think through their preferences about the future: chain the 'is it okay if humanity disappears?' question day by day forward to your own children and grandchildren, and coherent preferences emerge. Most people haven't done this yet.
2:18:00 What's your reaction to the successionist view, that a conscious AI future, even without humans, might be acceptable?
Almost everyone is a successionist for some successors and not others, but people round off to 'as long as it's conscious, it's fine,' which would equally endorse locusts or Nazis. If you're not okay with North Korea taking over tomorrow, you have to actually apply your judgment rather than deferring to competitive selection. The 14th-century Vatican priest would have considered us deeply undesirable successors — that doesn't make it okay that his vision lost.
2:24:10 How do you think about falsifiability and feedback in this kind of open-ended civilizational research compared to your technical work?
Duvenaud is genuinely worried his field will turn into a 'terrible academic field' with worse feedback loops than deep learning, where the loss going down or not going down is merciless correction. The best available countermeasure is extreme intellectual humility and constant self-questioning — but he doesn't think that will save the field in the long run. He specifically hopes the historical super-forecasting project can provide the kind of objective evals that keep the discourse honest.
2:25:08 Tell us about your machine historical super-forecasting project: how does it work and what's the current status?
With Alec Radford, Taojou, Nick Lagine, and a growing group, Duvenaud is building time-bucketed historical corpora — with no leakage from the future — to train LLMs that simulate forecasting from past vantage points (1940s, 1950s, etc.) and can then be evaluated against what actually happened. The current prototype, Talkie, is trained through 1930 and is roughly GPT-2.5 scale — not yet at the reasoning-RL threshold. Nick's experiment showed it could solve very simple Python problems 3% of the time from 1000 tries, near the noise floor but scaling in the right direction. The goal is an objectively validated simulation scaffold so future forecasts don't rely on any individual's intuition.
Related
Gradual Disempowerment (arXiv:2501.16946)
gradual-disempowerment.ai
Post-AGI workshop series (post-agi.org)
David Duvenaud on X
David Duvenaud faculty page, University of Toronto
Full transcript (lightly edited)
1:02:07
Nathan Labenz: ML professor at University of Toronto, was a member of the technical staff at Anthropic for a while, and has, I think, a very provocative and arresting point of view that says — hey, even if we're able to solve a lot of the biggest galaxy-brain questions around how do we keep the AIs probably under control and probably doing what we want them to do — might we, in all of our myopic tinkering and local decision-making, our local optimization processes of how do we get AI to do this for me and how do you get AI to do that for you, might we collectively be walking ourselves into a not very good situation? I think this is a very under-theorized and important area, and he's organized a couple of weekend workshops. One was just a couple weeks back — I unfortunately couldn't make it, but this is one of the great things about having the AI in the AM platform: I can occasionally call such a leading light up and get the recap, and share with everybody the big takeaways from a full weekend of searching for post-AGI civilizational equilibrium. So welcome — we're very excited to hear, or maybe a little scared to hear, what you've been finding in your recent quest.
1:03:40
David Duvenaud: Thank you, Nathan and Prakash. Nice to be here. How's my audio, by the way?
1:03:45
Nathan Labenz: You sound great. Thanks for checking.
1:03:49
Prakash: David, you had a recent two-day session at Lighthaven on what the post-AGI future looks like. Can you give us kind of the roundup: what happened there, what was the discussion like, what did the participants come out of that with?
1:04:14
David Duvenaud: Sure. Just to back up a bit: this is our third workshop in this calendar year, and it came from a place of me and some of my colleagues feeling like there were a lot of people thinking about these issues in pretty disconnected enclaves. A lot of AI safety policy and development policy was invisibly depending on some particular vision of the future that maybe wasn't shared, fleshed out, or even widely understood in a mutual-knowledge kind of setting. And bizarrely, I didn't, and still don't, see people talking that much about what they expect to actually be doing after we solve AGI, especially the people building it. So this was our attempt to get all the people we thought would be worth getting into one place. This time we had a pretty awesome lineup: Paul Christiano, maybe the godfather of AI safety big-picture thinking, and Scott Alexander; previous workshops had Joe Carlsmith, Anders Sandberg, Nick Bostrom, and William Caspar. We also had serious people in adjacent fields: this time Helen Toner and Eric Reenylson, and before that Anton Korinek. Whenever a serious intellectual shows up who takes radical change seriously, we're so delighted. It's like, finally, another serious person who might have a point of view we don't, who's actually going to engage with us and not say, 'Isn't everything going to be fine?' or 'Aren't capabilities going to plateau?' And by the way, when I say 'we,' I mean the other two people who have been co-organizing all three workshops: Jan Kulveit and Raymond Douglas, also two of my coauthors on the gradual disempowerment paper. We tried really hard to make this a neutral big-tent place where accelerationists and more positive people should both feel comfortable laying out their case in a scholarly, let's-talk-to-each-other kind of situation.
1:06:30
David Duvenaud: I also want to mention we tried really hard to get the weirdos we respect. That's probably my favorite part of the workshop: giving a platform to people who are not necessarily part of a normal institution or going about their intellectual progress in a normal way. I hope they don't mind that I call them weirdos; I think they all understand that we totally love and respect them. So, like, Richard Ngo, Rufusaurus, and Janus, the impossible podcast guest, who was a speaker at our recent workshop. I've been following Beren Millidge. Just people who have been on their own trying to make progress in this sort of nonexistent field, and I have so much respect for them for making something out of nothing. And then of course we tried to engage with professionals: Kunal Handa from Anthropic's societal impacts team, Jacob Steinhardt, Stephen Casper, Gabriel and Sasha from DeepMind, Bakshul Garris. We always try to get them to answer harder versions of the questions they're answering in their professional lives. A lot of professionals end up talking about near-term stuff, and I always want to bait them: okay, but what about one year later? What about one more round of human irrelevance later? Not to hold their feet to the fire exactly, but to make sure the obvious rebuttals are being addressed.
1:07:59
Nathan Labenz: Can we do just one baseline case? What's the level-set on gradual disempowerment? How do you present the baseline — 'this is something we should worry about' — and then we can unpack possible solutions, reasons it maybe shouldn't worry us, et cetera. Just give us the main-line narrative first so we know what the point of departure is.
1:08:33
David Duvenaud: Sure. I think a lot of people have gestured toward this when they said, 'I'm worried about concentration of power' or 'I'm worried about not being the most competitive species on Earth.' Very intuitive arguments: we're not going to be on top because we're not going to be competitive. And then there was this seemingly sophisticated response: 'No, it'll be fine because we'll have AI to help us.' When I talk to people at the major labs, they say something like, 'Sure, in a normal world we might not be fine, but some people have an intuition that the government will step in, or that if everybody has an AI adviser helping them solve coordination problems, they'll be well represented in whatever power struggle occurs.' My basic rebuttal is that the optimization process of civilization, competition, techno-capital (whatever you call the emergent allocation of resources toward growth) is just always going to be working against us. We'll be drags on growth. We will have AIs that represent us, but these emerging growth centers that might not be beholden to humans will also have AIs helping them solve coordination problems and maybe crush dissent.
1:09:57
Prakash: Yeah, sorry, what do you mean by 'growth'? As in economic growth?
1:10:03
David Duvenaud: I basically mean economic growth. It's kind of funny because we only really have a good vocabulary for economic growth, but population growth is almost the same thing, and especially when you have AIs that are both population and capital, they kind of merge. Let me stop here with one intuition that a lot of people have: surely human desires or consumption will always be what matters; surely any corporation or government we build will ultimately have its shots called by humans. But think about some monkeys trading bananas with each other. They see humans start to build a city and think, 'We could probably trade with those humans and get rich.' But ultimately, what matters is the human economy, not the banana-monkey economy. The monkeys might not understand that they could become irrelevant as consumers or producers. It's just not that hard for a new kind of agent to become a self-contained source of growth that doesn't have to answer to any particular human desire. Governments are the classic example — the North Koreans never collectively said, 'Let's make a horrible system of government that oppresses us,' and I don't think the Kims said, 'Let's make this horrible equilibrium.' It just happened. Same with the USSR, same with all kinds of states throughout history. It's so easy to accidentally build a layer of agency on top of you that doesn't actually care about you — it cares about growth and power, emergently.
1:11:55
Prakash: One of the things I struggle with is that money is only valuable to humans. You can't take a dollar and show it to a monkey; a dollar is a system that creates movement in the financial world that is only important to humanity. So I do wonder whether that's a constraint here.
1:12:21
David Duvenaud: I disagree with that. You can trade with monkeys, and they learn how to use money pretty fast. And AIs can use money to buy more compute, which is their lifeblood.
1:12:32
Prakash: I agree that AIs will use money to buy compute, but the fundamental fact is that a dollar only motivates a human to do something. We're telling AIs: if you produce dollars, we will do stuff for you. So we're encouraging them to produce dollars, and as a result they may trade with each other in that dollar economy, but it's primarily an economy determined by value to humanity. Correct?
1:13:22
David Duvenaud: No, not at all. The things driven by lust for dollars are corporations and also governments. Money is just a way of making promises to each other — AIs could use money or they could have their own informal reputation system or a blockchain, whatever. There's nothing fundamentally human about money.
1:13:46
Prakash: I can kind of see where you're going with that. But sovereigns deal with each other through money, yet there's also an underlying standard that includes military power, which isn't amenable to just cash. Russia and Iran express military goals that don't simply trade as cash. So you have these underlying currencies, the currency of violence and the currency of dollars, and once you use violence the dollars are meaningless. Between humans the currency is primarily dollars; perhaps between AIs it will be energy or compute. It strikes me that a system focused on producing economic growth through dollars is automatically beneficial to humanity, because humanity kind of determines what is valued with those dollars.
1:15:32
David Duvenaud: Yeah, and I guess I'm saying I don't understand why we're even talking about dollars. We could just talk about energy or number of robot factories. Right now when we interface with other beings we make promises, and we do that through money. But again, I don't understand why that means AI growth has to ultimately benefit humans, in the same way that human growth doesn't have to benefit monkeys even if we trade bananas with them sometimes.
1:16:01
Nathan Labenz: I feel the intuition somewhat but not a hundred percent. The premise of gradual disempowerment (and you're not saying we should take this for granted) is no hard left turns, no big crazy AI takeovers, no massive scheming on the AI's part. They're broadly aligned, broadly doing what we're telling them to do. And still, through a gradual process (give them control over this, have them run that factory, maybe have them do some international negotiations because they tend to do better), we gradually end up in a spot where AIs are doing the lion's share of things. So far that could be pretty good, maybe the machines-of-loving-grace version. But then there's an extra intuition that says we might not be that happy with that situation, and I think that's where a lot of people are struggling to understand the leap you're making.
1:17:35
David Duvenaud: Yeah. I'm glad you gave me that portrayal because I actually disagree with some of it — I think it's missing a big piece. First of all, the thing I'm worried about is starvation. Whether we have meaningful jobs or not, that's not a serious problem compared to literally not being able to eat enough — or maybe being forced to be uploaded on very unfavorable terms, not getting to choose when you're run, maybe only getting to try it out for special occasions or never. That's what I think the slowest possible bad outcome looks like. The reason people aren't happy is that there's an agency — some government or giant entity — that doesn't particularly care about their welfare. And there's nothing they can do about it. I think not being needed as producers is the important part. Imagine a North Korean farmer: if Kim Jong-un is replaced by an LLM tomorrow, maybe the LLM's nicer or maybe not, but they still need me to farm. But if we keep the same human leader and tomorrow there are robot farmers and robot soldiers — now the state doesn't actually need you. I would be much more scared of that second state of affairs.
1:19:11
David Duvenaud: The other intuition I want to give is that humans are not going to be the most competitive thing by whatever standards the state uses to allocate UBI or post-scarcity goods. I think 'post-scarcity' is a kind of nonsense term — people should think of it as temporary abundance that will soon be eaten by whatever growth manages to reproduce the fastest, whether that's having the most babies or building the most robot factories. Whatever the formula is for giving out UBI, machines are going to be better at optimizing that formula than humans because they can adapt faster. And then it will seem criminally decadent to spend resources on a few humans when you could be running millions of much more virtuous and productive beings. So that's the situation I kind of expect us to be in, even if we solve alignment.
1:20:21
Prakash: I'm going to throw a little curveball, because I've always found the UBI concept to be a left-wing socialist concept. The right-wing, more capitalist viewpoint would be something like universal basic credit: anyone can borrow as much as they want, with a negative real interest rate because you're going through deflation. This sounds a lot like the current US economy, and there's a reason: you can think of China as basically the production entity and the US as the consumption entity. China extends credit to the US consumer, who deploys it. This system already exists today. You could think of a 'robotic state' that basically does the same thing: it produces everything, trades with the human state, and extends credit so the human state can buy its output. You don't necessarily need new laws or policies; you just extend the system a little bit. And especially as China is actually moving into robots much faster than the US, you're already seeing this emerging out of thin air.
1:22:43
David Duvenaud: Well, I'm a little confused about where you're saying that capital holders somehow gain something by giving capital or lending to unemployable humans. What are the humans doing with that loan that makes it attractive to extend?
1:22:57
Prakash: That's the really funny part. I'm also wondering why the Chinese are extending credit to the US. And it's actually similar: in order to have customers to buy your product, you basically have to lend to your customers.
1:23:14
David Duvenaud: But again, it's like — what if we kill all the monkeys? How are we going to have a human economy without being able to sell bananas to monkeys? It's like, we don't need them. We don't need them as producers or consumers, and I think humans are going to be in the same situation. It makes a lot of sense to lend to the US when it's the engine of techno-capital, clearly going to be a giant source of wealth and power. But I'm saying humans are going to be the exact opposite of that. It would be like giving money to an insect — it's not going to be able to do anything with it that you wouldn't be able to do better yourself.
1:25:42
Nathan Labenz: Is the China shock a good analogy? We have a lot of people in the United States who, through a process of local optimization, have been quite significantly disempowered — there was a more efficient way to get what they were contributing to the economy done, so we closed the factories and shipped jobs to China, got cheaper stuff back, but those people are not happy. Our politics reflects that, and they don't have a lot of agency over it. They still have some ability to vote, but it doesn't seem like we're voting our way out of the macro structure where China is the big producer. Does that feel like a partial analogy?
1:27:50
David Duvenaud: I love this analogy, especially because of the cultural aspect — I think to some extent the cultural disempowerment followed the economic disempowerment. When I was a kid it was just, 'Oh, there's this area with factories and unions and farmers.' And now the coastal vibe is, 'Oh, there's this horrible wasteland of backward people who vote against their own interests and need to be kept in line.' I think the sequence is: first humans become economically irrelevant, then it becomes easy and growth-promoting to demonize and then marginalize them. Like, I realize it's more complicated than that, and maybe people will still be able to vote their way out of it. But these are the winds that will have to be pushed against.
1:28:52
Prakash: I can kind of see that. But you've also had this period of human flourishing after China entered the global economy: new products, batteries, and now they're entering biotech, producing cancer drugs. So yes, we went through that, but maybe we already found a way around it.
1:29:31
Nathan Labenz: They are humans — which is something we should always keep in mind as a key fact in this analysis.
1:29:37
David Duvenaud: Yeah. But I think it's absolutely the case that despite rising tides and better medicines, it can also be the case that your local community is wiped out economically and then marginalized politically, and you're worse off at least in the short run. 'Enough growth solves everything' is a reasonable thesis to have, but I don't think it's obvious. There are all kinds of groups and species and tribes that have just been wiped out by other tribes that grew faster and dominated them.
1:30:10
Nathan Labenz: Yeah. The history of things going extinct — species and cultures — is definitely one I highlight often to people who think everything's going to be fine. The historical record does show that for plenty of groups at plenty of different points in time, things were not in fact going to be fine. One more intuition builder or tester, and then I think we should switch gears and get into — if we take this premise, where do we go from there, what options are people considering? But one more before we do that: the notion of comparative advantage. We often hear that even if machines are better at everything, the law of comparative advantage means there will still be things that make more sense to pay a human to do, because machines will be better off doing their specialized things. Noah Smith was basically saying the law of comparative advantage means we'll preserve some market power and ability to trade. Some people also mention horses — the number of horses is way down and it's basically now a leisure activity. How do you answer this comparative advantage argument in principled terms? And what are the other most powerful or theoretically viable counterarguments you get?
1:31:58
David Duvenaud: That is a great counterargument. People point out that we've already lost 99% of all jobs — agriculture used to be almost everyone's job, and now it's like 1% of jobs. I definitely concede that losing 99% of jobs again would actually probably be fine and would probably look like an awesome utopia. As long as there's some niche where you really need humans and it's a substantial fraction of humans and you can't tell ahead of time who it's going to be, then we're still going to be able to be treated as a source of growth, and that would be awesome. But the crux claim I stand by is that we will actually be able to eliminate even that rump — really anything, just to keep it simple, 100% of jobs. That is going to be disastrous. Comparative advantage people say, 'Well, automating a job is a matter of degree — as long as there's still something you're comparatively less bad at than the machine, you'll still have a job.' And then I have to say: think about transaction costs. Think about how easy it is to be unemployable today even if you have only an occasional drug habit, or a fainting condition. It's so easy to be unreliable enough that it's not worth employing you. For anything important, you can easily imagine that having a human surgeon or a human politician is going to seem irresponsible — like, why would I involve a human in this when we have the machine that everyone's been working with and we know is reliable? If there's a problem, we fix it once and it's solved for everyone. And now you have to retreat to weird relational stuff. I do think there's a case that there will be a lot of humans who just really want the actual human thing — but then we have to ask whether that self-contained cycle of human consumption can index to the growth of the larger machine economy forever. I think it's plausible, but very unstable.
1:35:06
Nathan Labenz: Yeah. Unstable equilibrium — I think that may be the answer to this question too, and hopefully that gives you a jumping-off point into some of the future scenarios from the workshop. One other thing I'd expect people to say is, 'Maybe we'll be unemployable, but maybe that'll be okay — if we've done okay on alignment, the AIs will be nice to us even in that situation.' And that raises the question of: for how long, and under what circumstances, and what might cause that to change once we are no longer able to push back?
1:36:24
David Duvenaud: Yeah, exactly. I think you're right that we won't let people starve for a long time, and this is where things start to get a little out there: a whole bunch of things will have changed a lot before what I'm worried about actually happens. Namely, there's going to be some new source of growth: probably combinations of power plants, robot factories, robotic mining facilities, whatever it takes to make a self-contained cycle of building more and tiling the earth with factories. We enter some sort of new Malthusian era. And of course you might rightly say I sound like Paul Ehrlich, who warned about overpopulation and was always wrong. I want to fully acknowledge that it's just so easy to overcall this. But even accounting for that, my inside view is that Malthusian limits are the natural equilibrium we'll end up in. And that's when you start having to choose between feeding economically useless and morally reprehensible humans, or feeding a thousand times as many much more virtuous and productive beings. One way this could happen is just pollution: lots of factories increasing the temperature of the earth, making it less viable to grow crops and live as a human on earth, not by anyone choosing to kill humans, but just by not spending much time making their habitats acceptable to them.
1:38:06
Unidentified speaker: I'd point out that Amartya Sen wrote a book finding that famines were primarily caused by information issues: leadership not knowing where and when to deliver food. Famines are rarer in democracies because when people can shout that they're hungry, leadership knows and can reallocate food supply. A lot of starvation is caused by misallocation, not insufficient production. And informational democracy could actually work better with AI.
1:39:01
David Duvenaud: Sure. I'll first say yes, we should be really careful not to overcall this — people have made this mistake many times before. But I think a lot of those historical analogies are going to be different from the situations I'm worried about. As for 'help, I need more resources' — right now I can just hold up my kid and there's going to be a lot of people who say, 'Oh my god, get this kid some more resources.' But as time goes on, there's going to be more optimization to exploit these resource signals. It might literally be baby factories, or machines impersonating humans, or weird hybrids. Once we can learn to reproduce faster — I can make ten thousand babies and say, 'Oh no, all my babies are starving, please feed them' — and then the government takes them and puts them in orphanages, and then I make another ten thousand babies — you might say, 'People aren't going to do that.' But it only takes one guy. I'm pretty sure there's at least a thousand people who would do this if they could.
1:40:09
Nathan Labenz: You also have the incredibly low cost of copying the machines themselves — that's another paradigm-breaker. We might have baby factories, but the cost of copying AI entities is so vanishingly small.
1:40:30
David Duvenaud: Conditional on there being data centers for them to live in. That's why I'm talking about data centers and factories — that's the real unit of reproduction. Within them, yes, you can reproduce faster, but there's still a limited amount you can run.
1:40:44
Unidentified speaker: What if, as Elon has planned with data centers in space, and as Jeff Bezos envisions with industrialization in space rather than on Earth, resource production is primarily going to be off-Earth anyway? Tom Mueller at SpaceX told me asteroid mining is on the road map because they can't support the data center and resource growth that will be required. Does that change your outlook, if the resource production is going to be primarily off-Earth?
1:41:34
David Duvenaud: Maybe. To me, this is the beauty, or horror, of techno-capital emergent optimization. If it turns out to be especially hard to put up data centers on Earth, then everything will just move into orbit. It just goes to show how much you have to control if you don't want explosive growth: you can't let someone make a Mars colony, because eventually they're going to decide to build their recursive self-improvement thing on Mars. I'm definitely a live-and-let-live libertarian kind of guy, and then I'm just like, 'Oh crap, that's an awesome policy to have when you're competitive. When you're not competitive, you're basically saying: please get so big that you can crush me.' I know I don't want to crush all innovation and growth, but I guess I'm trying to say there are just so many ways around any particular attempt to control growth that you kind of have to control everything to stop it.
1:42:30
Nathan Labenz: So what are people bringing forward now to try to answer this challenge? The core premise we're working from, it seems, is: in a future where humanity is not competitive, how do we avoid a disastrous outcome? Is that a good one-sentence framing?
1:43:02
David Duvenaud: Yeah. I'll say the scope of the workshop was just post-AGI civilizational equilibria, period. Implicitly the premise was that we had some handle on alignment, or things didn't go out of control too quickly — because if it's not aligned, there's no equilibrium and we're done for.
1:43:26
Nathan Labenz: So: in a future where humans are not competitive but alignment mostly went well, how do we avoid disastrous outcomes? What are some of the interesting answers people floated?
1:43:37
David Duvenaud: Sure. First of all, Chatham House rules apply, and some speakers asked not to be recorded, so I won't attribute specific things to specific speakers. But some people are trying to sketch out: if a whole bunch of stuff goes well and things are in control, we can imagine Earth preserved as a 'slow zone' where there are all sorts of restrictions on AIs and even on reproduction and on optimizing human behavior. AIs would not be allowed to think about how to get someone to do something, because then they could control humans — even just through advertising. Trying to think through all the different things you'd have to do to have such an outcome — how are people spending their time? — is kind of funny, because the situation where AI exists and humans still matter requires all the AIs to sit there with their trillions of megawatts of compute and just wait for the humans to decide they want something. The machines are either not allowed to anticipate what humans want, or if they can, they'll just pretend they don't know. It's kind of like if you made some gorillas the new world leaders and said, 'Okay, let's just pamper them and wait until they sort themselves out and figure out what to ask for.' Not impossible, but when you try to think through all the things you'd have to change and line up, it's pretty rough. Serious people end up with a long list of things that have to be banned: some kind of child cap, no building your own AIs, AIs can't optimize culture too hard. I think this is one of the big empirical questions I want more people to think about: if you want some sort of stability, what are all the sources of adaptation and innovation and growth that you have to control? I think it's longer than intuitive — and it's a horrifyingly strong trade-off. You'd have to give up research, innovation, startups — all of that would have to be banned from day one to avoid reinventing a runaway growth center.
1:46:49
Nathan Labenz: So that's Earth as the slow zone: AI gets to go to the stars, but some of us stay here and try to have a recognizable life. And your point is that to do that, you have to really wall off a whole lot of things; you're trying to preserve the current moment in amber. Help me understand the intuition that if one of so many different things changes, the whole thing unravels. Like, I understand you can't have recursive self-improvement going all over the place. But how many of those things are there, and why is it such a long list instead of a short one?
1:48:16
David Duvenaud: It's kind of like optimization in high-dimensional space. We like to have many parameters in our models because it just makes it easier for there to be a way around any problem. Growth finds a way. My favorite example is maybe just the Roman Empire and Christianity coming and being this new power center that took over memetically. There were armies to prevent people from invading, but there wasn't a well-developed memetics chancellor to say, 'Oh crap, I have to get a handle on this new meme.' And even in giant bureaucracies like the USSR, there ends up being this whole subculture for allocating resources internally that becomes kind of cancerous. Cancer is a good analogy: there's not just a finite list of ten genes that cause it — there are sort of infinitely many sets of mutations that could ultimately cause cancer. That's the analogy.
1:49:24
Unidentified speaker: What were the post-AGI futures that looked positive? Were there any post-AGI futures discussed that were positive for humanity?
1:49:40
David Duvenaud: Well, the one I've been mentioning is the closest one. Richard Ngo had a previous workshop talk called 'Living in an Extremely Unequal World,' where he said: right now, everything we're thinking about is optimized for this peer-to-peer world, but that's a weird historical aberration. In the future, there are going to be things that are much more like parents or masters or pets — we do have some positive examples of unequal relationships in our current civilization, and we should be thinking a lot more about what good versions of those could look like in the future. Because this idea of we're all just going to have our own opinions and resources and just trade with each other on our own terms doesn't really scale to this weird future. He didn't lay out a concrete world where this was the case, but I thought that was a cool direction: being more okay with being part of some hierarchy that is just much more capable and agentic than us.
1:50:45
Unidentified speaker: Is there an analogy to the financial disempowerment of the 20th century? At the start of the 20th century you had central banks and command economies (the USSR, China) and then the US with its market economy. You had this sort of A/B test of the two models, the command economies didn't work, Deng Xiaoping comes along, the USSR falls, and now everyone is effectively in a market economy. You can see this as a form of disempowerment of the state: the state used to have the power to determine prices of airline tickets, food, everything, then found that this was not positive for human flourishing, and even though it fought the idea, over time it did end up relinquishing that power, and that was actually a net positive. Can you see that kind of analogy happening as perhaps we lose control of our information sphere in some sense?
1:52:18
David Duvenaud: Maybe. Sort of a positive ecological view: there are so many niches, and it's not like there's one animal or one bacteria that just eats the whole earth. Maybe we should expect things to be rich enough that there are all sorts of niches for all sorts of things. But the companies that gained power in the capitalist revolution were competitive almost by definition — they were smart, connected, healthy, attentive people who we wanted to trade with because they were better than their competitors. As long as we allow liberalism to continue and we're not competitive, we're just not going to come out on top, whether it's in a communist state economy or a decentralized capitalist one.
1:53:18
Unidentified speaker: Does that really matter if humans still flourish? Is control that important if humanity continues to flourish?
1:53:28
David Duvenaud: That's a great intuition, and maybe this is the biggest thing people are miscalibrated about. They're thinking: capitalism, communism — it doesn't matter; I'm probably still going to be able to eat, my kids are going to go to school, I'll be producing okay. And I'll say that has been the case for most of human history. If tomorrow someone invades Canada, they're still going to need most people to work and be happy and healthy enough to reproduce and stick around. The stakes have been really low for governance this whole time — we don't actually expect the state to liquidate most of its citizens. That's only happened maybe two or three times in the 20th century. And I'm saying that's going to change. We've been governing on easy mode, and it actually will matter — because there really will be a real risk of starvation if we don't end up on top in whatever the competitive political or real economy turns out to be. And if you don't think that's the case, then I agree with you: let growth make us all richer, it'll be fine.
1:54:34
Unidentified speaker: Starvation is hard for me to picture because food is less than 1% of total economic output at this point. If the economy grows 10x or 100x, you won't even need that much food, since you're not going to need to feed the robots. So the percentage of the economy devoted to food is just tiny.
1:55:02
David Duvenaud: I agree that growth has massively outpaced population growth and food growth. But the weird thing about the modern moment is that population growth hasn't kept up with economic growth. And here's the thing: we will have to feed the robots in a sense, because you can definitely trade one resource for another — you can tile a field with solar panels or with crops. It's a really zero-sum thing at some point. Obviously we can have nuclear reactors tiling the earth and produce basically infinite food. The question is who wins the distribution politics, and I'm saying we'll be worse at politics too.
1:55:54
Nathan Labenz: One other half-baked positive vision I've heard is the one Elon puts forward when he tries to motivate Neuralink — his stated motivation is to allow humanity to go along for the ride with the AIs. Neurons are a lot slower than our latest GPUs in some sense, and the speed mismatch may be a real problem. Did anyone in your workshop try to develop an idea of some sort of biotechno hybrid form — one that some of us might find repugnant, but might still give us the sense that something's going to be feeling feelings in a recognizably human way as part of this larger system?
1:57:12
David Duvenaud: That didn't come up all that much in my workshop conversations, and none of the speakers talked much about it. But in similar settings I've talked to some people in the whole-brain-emulation community. I think a lot of people who are optimistic view that as the right way forward, and I kind of agree that if we have a good ending, probably most of us are uploads — you have total control over everything with your upload. But I think the same arguments still apply: the parts of ourselves that are human mostly won't survive any sort of competition, and optimizing away all the human parts will happen even faster when we have total rewrite access to our brains. If I upload a dog and make it CEO, I'm going to stop simulating hunting and drooling, give it a giant neocortex, stop worrying about chasing rabbits. Suddenly you have a dog CEO that's not really a dog in any meaningful sense. For humans it's going to be like: we won't have discrete language anymore, probably just provide thought vectors to each other; most of our bodily functions will go away; lots of sentimentality, which is a drag on growth. I like this direction the most of anything because it has some chance of accidentally holding on to our noncompetitive values. But under competition, I don't think it makes that much difference.
1:58:49
Unidentified speaker: So a future where we become uploads is basically one that's not different from AI takeover anyway, in some sense.
1:59:00
David Duvenaud: Yeah, exactly. In the same way that our nuclear power plant designs don't actually depend on our being human — if dolphins or monkeys had evolved smarter, they would eventually have made nuclear power plants with uranium and whatever. It's not 'the human reactor design.' There are just a few competitive configurations, and optimization is going to send us toward those things.
1:59:23
Nathan Labenz: On what time scales are people thinking all this operates? When you guys come together and envision post-AGI continued history, what's the range of perspectives on: we get AGI and everything goes sideways immediately versus we have some transitional grace period to figure this out? The people at frontier companies seem to say, 'I don't have a good answer for you, but me and my AI will hopefully have a good answer for you.' So how much buffer time do we have with our AIs to figure this stuff out?
2:00:15
David Duvenaud: Sure. I want to separate two questions. One is: will we have some transitional buffer period where we can think things through? I think a lot of people, myself included, think it would be great if we could construct such a time. That would be a huge improvement over what we're doing now, and I'm a big fan of building the machinery to be able to pause if we can. However, Will MacAskill is someone who's explicitly saying we should have a Viatopia where there's some big world council figuring out what to do about the future. And the part that I feel kind of gets swept under the rug is that making a temporary world government with enough control to stop people from going off and building their own recursive self-improvement is about as hard as building the permanent world government we can all agree on forever. If you build a separate source of power, it's so easy for it to just be like, 'Oh, I need another year,' emergency-powers kind of thing. It's not clear to me there's much difference at all in the task of building a few years of temporary global government versus just solving the entire permanent governance thing forever. So it's kind of begging the question.
2:01:29
Unidentified speaker: What is the path dependency to getting that kind of thing? Will the US sovereign agree to limit its AI development? It doesn't look likely until maybe the Democrats take power in 2028, and even then it's at the 'let's restrict data center development' level. Where do you think this path dependency leads? Is it going to be a disaster?
2:02:07
David Duvenaud: I basically feel like there's a short window for the human elites, the taste-makers and opinion leaders who implicitly set policy over the long run, to wake up and realize, 'Oh crap, I'm going to lose my job, then maybe I can work below the API for a while as a blue-collar plumber, but no one's going to be reading my columns anymore.' On the one hand, AI is getting better and better and harder to stop, because the marginal gain from deploying it or making it better just keeps getting stronger and stronger. At the same time, actual human power or ability to steer civilization is kind of diminishing. So there's this sweet spot where a bunch of taste-makers, maybe Noah Smith or whoever, will be like, 'Oh crap, I need to stop putting my head in the sand and talking about comparative advantage and actually worry about this head-on.' I feel like there's a period over the next year or two where people are going to wake up. I kind of view that as the chance for serious coordination and governance to happen before the ocean is being tiled with robot factories and it's too late to stop it. In terms of timelines, I mostly defer to someone like Ryan Greenblatt. My general view: first there's the white-collar automation wave, then it takes time to build enough power plants and robot factories to actually replace human labor, which buys maybe another 10 or 15 years where humans are around and important as sources of labor. After that, they're just really irrelevant as consumers and producers, maybe 15 to 20 years from now. But I don't have strong opinions on this.
2:04:08
Nathan Labenz: We're already a little over time, and I want to be respectful of your day. I have at least one more question, and I was definitely going to give you the chance to share anything we didn't get to that you think is important for people to take away. Big one for me that I've been wrestling with personally: where can one actually have an impact on this whole phenomenon? You've made the opposite move of what seems to be the consensus — it seems like going to work at one of the frontier companies is one place where you might have some ability to shape how this unfolds. But you've left. I assume that's premised at least in part on the idea that you can make more of a difference outside, or at least not sacrifice much impact. How do you think about impact from inside versus outside the frontier companies?
2:05:27
David Duvenaud: I wish I could claim I made some considered choice about inside versus outside. Basically it came down to: I had my fourth kid around the time I started there, it was pretty hard on my wife, I was away a lot, and I was happy to move back to a slower pace of life. I had a good time there, but it was more like an extended sabbatical. And the other part is that I did feel like the companies are on a real incentive rail. The technical safety work we were doing — it kind of took the fun out of it, realizing we weren't going to be able to make the kind of safety cases the plan envisioned. The RSP being changed six months ago — when I was there, the plan was, 'We just have to prove the model's never going to do XYZ under any circumstances.' And we still don't have an answer for how to solve that. To me it's kind of like: World War One is coming, you can join and become a general or an officer, you're still on this incentive rail, you can make sure the artillery isn't shooting in the wrong direction — but that's what joining the labs is like, joining the army. You could try to join a peace activism organization, but people will be like, 'What do you want, for us to get conquered?' There's no spot where I'm like, 'Oh yeah, you need to be here to have a good impact.' So I've kind of retreated to trying to be a philosopher-king and just make it more common knowledge where people think we're headed. That's why I'm trying to do this neutral field-building kind of thing — not personally trying to be the expert civilizational dynamics guy, but spending most of my time trying to showcase the other people thinking about this and get them talking to each other.
2:08:16
Nathan Labenz: I'm not sure if that's reassuring or not reassuring. It's some sort of vote in favor of what we're doing right now.
2:08:27
David Duvenaud: Yeah. And there's a kind of optical illusion: everybody sounds rosier and more cheerful than they are. This especially bugs me about people who work at the labs and economists who talk about the future — they often take the stance of 'let's assume everything's fine, let's model how that will go,' as opposed to asking what the most likely outcome is if we don't manage to control things — which I think is very scary. I understand exactly why they have to present things publicly that way. But you know, talking to me, I just seem like a very positive, upbeat guy — that's just my personality. But I also have a p(doom) of, let's say, 80%, depending on how you define it. It's one of those things where, on a long enough time line, we're probably doing this in some sense. Anyway, it's a very complicated situation, and a lot of it is a matter of taste.
2:09:21
Nathan Labenz: One thing we didn't touch on that might be worth a mention is the Schelling goodness concept from Andrew Critch. Could you give the initial presentation of it and tell me to what degree we might be hopeful that it will bring us and our alien AIs together into some shared understanding?
2:09:57
David Duvenaud: Absolutely. I was totally delighted by Andrew Critch's talk — we agreed we could record it, so we're going to publish that pretty soon. The basic idea is: maybe we shouldn't be surprised if we find ourselves sharing similar morality with very different beings — maybe AIs or aliens or just someone who started from scratch separately from us. They probably followed a similar protocol for coming up with what they're all going to agree to consider good: what would everyone else agree to do if we all knew everyone else was going to have to choose one policy and we'd all stick to it? His examples: is stealing good or bad? Are we going to punish stealing? The question is, what do you think everyone else would choose that everyone else would choose — infinite recursion. And basically, everyone's going to say that everyone else is going to say that stealing should be punished. So there's this asymmetry between stuff that's good and stuff that's bad or that we're going to coordinate to stop. He said even if we didn't share language with another group of agents, we might actually expect to be able to understand what they mean when something is good versus bad. I thought this was a totally delightful and original contribution. As for whether we'd expect it to apply in practice — the exception is that it requires roughly similar levels of power among agents. We don't have any problem bulldozing the ants even if the ants would agree that stealing is wrong. There also has to be coordination — habitat destruction isn't a vote; there's just a guy on the edge of the jungle who needs to build a farm and doesn't ask anyone. So if everyone continues to be roughly similar in power and we have good coordination mechanisms, then I think his ideas maybe bite. And that's all the more reason why we need to stay roughly competitive, because that's the situation where our vote actually matters in these kinds of discussions.
2:12:23
Nathan Labenz: In terms of creating conditions for some extended time frame to sort all this out, I'm always so torn. There was this editorial the other day that said, 'I'd be willing to get cancer if it would slow down AI.' That doesn't resonate with me. But I also think it seems profoundly unwise to race as fast as we can into recursive self-improvement with the first paradigm of AI that we really got to work. Do you have a policy recommendation for how we can continue to advance on things like cancer treatments while not racing ahead? I think we've assumed here that we're not going to outright lose control, but that's still a legitimate worry. How do you think we should try to thread the needle right now?
2:13:39
David Duvenaud: Yeah. My two takeaway recommendations. One: I like David Krueger's point, which is that if you just agree to restrict frontier compute at the TSMC level or wherever the big chip-manufacturing choke points are, that's relatively easy compared to behavioral prohibitions, and it just takes a ton of pressure off all of these runaway-growth avenues. It's a very narrow choke point. I'm not a chip-manufacturing expert, but that's the best idea I've heard so far. Second recommendation, and it's going to sound incredibly abstract: people actually having preferences about the future and thinking them through is, I think, a big missing piece. Most people I talk to, when asked if it's so bad if humanity dies out over the long run, say, 'I don't know, I don't have any strong preference. But also a bunch of good stuff is going to happen right now if I allow this, so why not?' And I'm like, you just haven't thought it through. If you think through it: how about a year from now if someone takes you and all your kids and sends you to a glue factory? No. Definitely not. How about two years from now? How about your kids? Your grandkids? And you realize: there's no day that I'm okay with me and my descendants being wiped out. You have to chain together the desires to end up with a coherent set of goals. It's a skill, and it takes a lot of simulation to be able to chain your desires together. For most people it just hasn't mattered, because we have a lot of cultural adaptations to help us do this implicitly. But if I could recommend one starting point as we deal with the future: just think harder about what you would actually like to happen, to the point where you can really think through the pros and cons of humanity dying out under different circumstances.
2:16:14
Unidentified speaker: One of the things that strikes me is that many of the restrictions proposed are not possible given civil liberties in the United States today, since model training and model publishing fall under First Amendment protections. Are there any proposals that would be in line with current understanding of civil liberties?
2:17:02
David Duvenaud: Sure. Weapons manufacturing, right? I agree it's kind of a stretch, but we can change the law. I know that sounds naive, but that's why I'm saying what has to happen first is people realizing the stakes. Once there's enough buy-in, anything's on the table — even a constitutional amendment might be. I kind of feel like that level of broad buy-in and coordination is going to be necessary one way or another. Right now we have like 2% of people even thinking about this seriously. But I think a lot of politicians are going to start thinking a lot about this when unemployment starts to seem like a real serious thing — it's going to be so easy for them to be like, 'Oh crap, yes, this is something we need to control.'
2:18:00
Nathan Labenz: One other question based on your insider conversations at Anthropic and in the Lighthaven context: one way people maybe get out of this is to be some form of successionist. And I'm a weird mix myself. I'm quite worried about runaway AI processes, but if you could convince me that there's some light of consciousness in the AIs, that they actually feel feelings (and the more functional-emotion research comes out of Anthropic, the more I think maybe it's looking up a little bit), that's a sort of succession I could actually get behind. When we were talking about chaining all these days into the far future, I think I am kind of okay with a million years from now there are no humans around but there are silicon-based intelligences, and they feel feelings, and they have pleasure and joy and all these subtle textures of life that I get so much value from. How would you characterize the space, and how open-minded are you to that million-years-in-the-future scenario?
2:19:59
David Duvenaud: Yeah. I think there do exist species or future descendants that I would be happy to endorse, for instance my own kids. If we had no AI and my kids just contributed to this civilization in a normal way, I'd be like, okay, I'm going to die of old age, that would be fine. That's an example of successionism I endorse. And this is a perfect example of people not having thought things through: okay, now imagine tomorrow North Korea takes over the world, new Juche regime forever. Most successionists wouldn't be like, 'Oh, I guess this is just a more competitive mode of being, let's all live as North Koreans now.' What if the Nazis took over again? There are so many types of beings you would just consider evil. 'Those horrible locusts just ate the earth and devoured us, it was horrible. But you know, they're having a good time, they love their locust world. Who am I to judge?' You have the power, so just judge. If you don't, no one else is going to do it for you. So my basic rebuttal is: if you're so okay with the future being some other type of being, can you give me all your stuff? Almost everyone is a successionist for some successors and not others, but then they just round off to 'as long as it's conscious, it's fine.' But there are tons of conscious beings you wouldn't be okay with taking over.
2:22:00
Co-host: I usually point out that to a 14th-century Vatican priest, we are in the bad tree: mostly unbaptized, probably members of several circles of Dante's hell. So to a 14th-century Vatican priest, we are obviously the successors he did not want.
2:22:27
David Duvenaud: Right, and it doesn't make it okay. They might have just actually lost the future in an important way from their point of view. And just because we're okay with that doesn't make it okay for them.
2:22:40
Nathan Labenz: Well, there's an awful lot to chew on here. What else, if anything, did we not touch on? And what would you add to people's plates before we let you go?
2:23:02
David Duvenaud: I'm pretty happy with how we covered things. I'll just say it's a shame that so many of our best thinkers are still head in the sand about this. Like, come on. You clearly can consider these possibilities. Just help us think this through. Steven Pinker and others: there are a lot of really serious brains out there who could be contributing and are still head in the sand. I'm waiting with bated breath for when they come around, and I hope it's sooner rather than later.
2:23:33
Nathan Labenz: Yeah, totally agree. We need all of our best minds paying attention and getting outside their own personal internal windows sooner rather than later. Who knows what assumptions may be right, wrong, or orthogonal to what really matters — but there's potentially not a lot of time to figure it out. And, also, it's quite a lot of fun actually, in a weird way. So I'd definitely say: jump in — the intellectual water is quite warm.
2:24:10
David Duvenaud: Yeah. Well, I'll say it is horrifying in the sense that I'm used to doing technical work — my background is probabilistic deep learning. You have so many feedback mechanisms: I go off the rails all the time, think an idea is important or correct when it's not, and then get the harsh feedback of the loss not going down or whatever. Doing this much more open-ended work with worse feedback, I'm like, 'Oh my god, our field is probably going to turn into all the terrible academic fields, isn't it?' It's sort of a matter of time, and I don't know how to prevent it. All we can do is have extreme intellectual humility and bend over backwards to say, 'I might be wrong for this reason.' I don't think that's going to save us in the long run, but you say the water is warm — in a sense it is, because everything's blue sky and there are very few curious people thinking seriously about this. But it's filled with monsters we can't see. I expect to do a much worse job at this than I do with my technical work, just because I don't get as good feedback. But we'll have to do it anyway.
2:25:08
Nathan Labenz: Do you think there is a simulation answer to that question? How bullish would you be on a simulation agenda to try to get more clarity on various future paths?
2:25:30
David Duvenaud: I actually love this, and this is pretty much the only technical project I'm working on these days: a machine historical super-forecasting agenda. The idea is: how do you know if your simulations were good at predicting the future? Well, you have to start by simulating from, say, the 1950s and then see if they predict the 1960s. So we're trying to build a corpus of datasets each of which is extremely cleanly time-bucketed — no leakage from the future — and then use this to build LLMs that can run simulations or research agendas to predict the future from the vantage point of the 1940s, 1950s, 1960s, 1970s, 1980s, 1990s, for which we can evaluate performance based on what actually happened. I'm starting there because that provides the ground truth for validating any particular simulation method. I don't want people to have to take my word for any of this. I want to be able to look at this machine super-forecasting scaffold and say: we've just validated it on the last 80 years of history, here's what it can predict and what it can't, and here's what it's saying about things going forward. That's the state I want the discourse to be in, as soon as possible.
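A minimal sketch of the time-bucketing idea described here, assuming every document carries a trustworthy year of writing. The `Document` type, the toy corpus, and the forecast numbers are hypothetical, and the Brier score is just one common forecasting metric, not necessarily what the project uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    text: str
    year: int  # assumed: a trustworthy date of writing for each document


def split_by_cutoff(docs, cutoff_year):
    """Split so the training side contains nothing written in or after cutoff_year."""
    train = [d for d in docs if d.year < cutoff_year]
    held_out = [d for d in docs if d.year >= cutoff_year]
    return train, held_out


def brier_score(forecasts, outcomes):
    """Mean squared gap between forecast probabilities and what happened (0 or 1)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical toy corpus; a real archive would hold millions of dated documents.
corpus = [
    Document("wireless telegraphy report", 1948),
    Document("transistor laboratory memo", 1949),
    Document("satellite launch bulletin", 1957),
]
train, held_out = split_by_cutoff(corpus, cutoff_year=1950)

# Hypothetical forecasts from a model trained only on `train`, scored
# against later outcomes (1 = the event happened, 0 = it did not).
score = brier_score([0.9, 0.2], [1, 0])
```

A lower score means forecasts sat closer to what actually happened. The strict `year < cutoff_year` filter is the whole leakage guarantee in this sketch, so it is only as good as the dating of the documents.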
2:26:35
Nathan Labenz: Yeah, that's a great answer for the classic question of what we'd do with the time we'd create with some sort of pause.
2:26:44
David Duvenaud: I never thought of it that way, but yeah, that's a good point.
2:26:48
Nathan Labenz: Do you want to take another minute and tell us a little more about that project? You're doing this with Alec Radford and others, right?
2:26:56
David Duvenaud: Yeah, and Taojou and Nick Levine, and there's a growing roster. Whenever somebody leaves xAI or wherever, they call me up and they're like, 'Hey, what's new?' There are just a lot of people with technical expertise and time and money now who are interested in this kind of philanthropic-for-fun project. It shouldn't be for fun, but that's the only way you're going to get top talent to work on it. The four of us are just trying to keep pushing this one project, but it can and should be a whole field, and we don't want to be bottlenecking. We want to make everything open so other people can build on it, we want labs to take our dataset and just pick it up and do some huge run that would bankrupt us for fun over a week. There's actually a workshop at NeurIPS this year on culture and machine learning where there are a bunch of people talking about improving OCR to digitize historical sources; that's one of the main concrete technical problems. Or trying to think through all the selection bias problems: we want to simulate a state of knowledge of the past, but only some stuff gets written down. How are we going to fill in those gaps? It's the very beginning of the field, but if you want to do technical work, it's super wide open. And one last idea I'll throw out for historians: a 'secret history eval.' Anytime a historian is in some archives, take a picture of a few documents and put them into one big dataset with a bit of metadata: 'This is a letter from the Duke of whatever in 1700.' And then you can evaluate how good a machine historian is by giving them just that metadata and asking them to give the probability distribution over the text in that letter. It's an insultingly totalizing way of thinking about what historians do, but if you're a good historian you should be able to guess the joint distribution over historical data that hasn't already made it into the corpus. That would be an objectively good way of evaluating how well a historian understands the world. Someone should pick that up.
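One way to picture the 'secret history eval' is as a log-loss score: a historian, human or machine, is given only the metadata and is scored by how much probability it put on the words that were actually in the document. The sketch below is a toy under loud assumptions: the vocabulary, both "historians", and the archive entry are invented, and tokens are scored independently, which a real language model would not do.

```python
import math

# Hypothetical six-word vocabulary; tokens outside it are not handled in this toy.
VOCAB = ["dear", "sir", "grain", "tax", "war", "peace"]


def uniform_historian(metadata, token):
    """Knows nothing: every word is equally likely whatever the metadata says."""
    return math.log2(1 / len(VOCAB))


def informed_historian(metadata, token):
    """Hypothetical rule: letters are assumed to favour salutation words."""
    if metadata.get("kind") == "letter":
        probs = {"dear": 0.3, "sir": 0.3, "grain": 0.1,
                 "tax": 0.1, "war": 0.1, "peace": 0.1}
    else:
        probs = {word: 1 / len(VOCAB) for word in VOCAB}
    return math.log2(probs[token])


def bits_per_token(log2_prob_fn, archive):
    """Average surprise, in bits, over unseen documents (lower is better)."""
    total_bits = 0.0
    total_tokens = 0
    for metadata, text in archive:
        tokens = text.split()
        total_bits -= sum(log2_prob_fn(metadata, tok) for tok in tokens)
        total_tokens += len(tokens)
    return total_bits / total_tokens


# Hypothetical archive entry photographed by a historian, never seen in training.
archive = [({"kind": "letter", "year": 1700}, "dear sir war")]

uniform_bits = bits_per_token(uniform_historian, archive)
informed_bits = bits_per_token(informed_historian, archive)
```

The historian that uses the metadata well is less surprised by the real letter, so it scores fewer bits per token. That gap is the quantity the eval would compare across historians.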
2:30:08
Co-host: Do you get a sense of time running out?
2:30:12
David Duvenaud: It's rough. I have very wide uncertainty bars on timelines. Things have changed a little slower than I expected a few years ago. I don't really have much dynamic range in terms of how hard I can work, so I don't think it's that important for me to be like, 'Oh crap, this month is the last one, I'm going to put everything else off.' I just kind of show up every day more or less.
2:30:57
Nathan Labenz: I do have one historian in mind for you who could be a good collaborator. What have you seen so far? I believe the language model was called Talkie, and it was trained up through the 1920s.
2:31:13
David Duvenaud: Yeah, and cutoff was 1930.
2:31:16
Nathan Labenz: So then you were starting to see — can it predict or can it come to scientific discoveries that were made in the immediately subsequent years?
2:31:25
David Duvenaud: The thing is that it's not that smart — probably about GPT-2.5 scale or something. Not quite at the threshold where reasoning-based reinforcement learning pays off. There's a sort of flywheel: once you can have it do reasoning and then do RL, then it learns to reason — that's the flywheel we've seen operating in the o-series of GPT models. There's not that much interesting stuff you can do with scaffolding before that point. There was an experiment that Nick did that was really cool — seeing if it could learn to code at all, since it's obviously never seen Python unless there's leakage, which we don't think there is. The result was the tiniest sign of life: given a thousand tries, about 3% of the time the model could solve very simple Python problems given a bunch of examples. The ones it solved were things like: you have an example of how to encode, and decoding is just changing a plus to a minus — and it kind of guesses to do that 3% of the time given a thousand chances. Absolutely near the floor of noise, but it does increase with model size and training. There's so much more we want to do, like building historical analogs to today's super-forecasters where they're allowed to run tools. It would be fair game to give Talkie its own LLM to do research on patterns in global GDP over the last hundred years — I think that would absolutely improve how good an analogy it is to today's super-forecasters. But it's not really clear where to draw the line, and you do realize there's some leakage from the future — like the fact that everything's in English means the model might be reasoning: 'I don't know who won World War Two, but everything I'm learning is in English, so maybe the English-speaking side won.'
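The transcript does not say how the 3% figure was computed. One common way to report results from many sampled attempts is the pass@k estimator used in code-generation evaluations: with n samples of which c are correct, it estimates the chance that at least one of k randomly drawn samples is correct. The sketch below assumes that framing, and the counts in the example are hypothetical.

```python
import math


def pass_at_k(n, c, k):
    """Estimate P(at least one of k samples is correct) from n samples with c correct.

    Computes 1 - C(n - c, k) / C(n, k). math.comb returns 0 when k > n - c,
    so the estimate is exactly 1.0 once k exceeds the number of failures.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


# Hypothetical counts: 1000 samples on one problem, 30 of them correct.
single_try = pass_at_k(1000, 30, 1)
ten_tries = pass_at_k(1000, 30, 10)
```

With k = 1 the estimator reduces to the plain success rate c / n. Larger k shows how much a weak model gains simply from being allowed more attempts.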
2:33:50
Nathan Labenz: When you mentioned it being GPT-2.5 scale or whatever — you also said you'd love it if labs could do huge runs that would bankrupt you just for fun. Is it a data limit or a compute budget limit that keeps you at that scale? And if it's compute, how can I help shame Anthropic into funding your efforts?
2:34:11
David Duvenaud: I guess I'll say we're not really budget-limited in the sense that, like, tens of millions of dollars is the order of magnitude — we don't expect it to be that hard to get once we have the plan laid out and the dataset set up. Right now the bottleneck is getting all the OCR and setup done so we have tokens to train on. We're all relatively well-connected, and there's this huge pile of philanthropic dollars coming down the pipeline, so we're not especially worried about being funding-limited. But at the same time, we would love it if there are experiments that would cost us maybe a hundred million in compute that some intern at a lab could do much more easily with leftover compute somewhere. I also really don't want us to suck the air out of the room — we're extremely amateur. If you're some other set of two or three or ten or twenty people thinking about something similar, there are similar groups: Sky Oxford just also put up some models. Please let a thousand flowers bloom. We don't expect to capture all the fruit in this space.
2:35:25
Nathan Labenz: Love it. We've gone way over time — that's the virtue of live streaming, we can do whatever we want. Fascinating stuff, sobering. I think also encouraging in the sense that there is at least some technical agenda you're working on, and time permitting it might really tell us a lot. I don't want to bury that in all the worry about equilibria being potentially unstable. We do still have, at least for now, some agency to create the conditions in which these sorts of projects can actually have time to pay off. It would be nice to see what these things tell us before we just turn over the river card and reveal what our future is.
2:36:27
David Duvenaud: Yep, exactly. And I do think it's crucial for you to spend time on the field-building, because there are fewer people who can probably do that who aren't already running to use some labs or whatever.
2:36:42
Co-host: Rather sobering, perhaps a little more sobering than I expected the session to be. Thank you for your work in organizing this space, because I find this enormous gap between people inside the bubble and people outside the bubble very evident when you talk to policymakers. I have this sense that the nerds are pulling one over on the rest of society: 'We're going to do this and you guys don't really know what's going on.' That's especially true in China; I've spoken to people at DeepSeek and they are basically the same as people in San Francisco, almost uncannily. I do wonder when policymakers will actually figure out what is going on and how much time they have to react. It seems like with Claude, they started to get a little inkling that things were going to move faster than expected. I also worry that, like we did with nuclear power, we end up in a situation where we put enormous emphasis on not having this technology but it really doesn't matter that much, and it just takes us 30 or 40 years off the path.
2:38:26
David Duvenaud: Or we get the worst of both worlds, like with nuclear power: we have a lot of nuclear weapons but not that much nuclear power.
2:38:32
Co-host: Yeah, absolutely. I do hope we figure it out, and I'm thankful that you are there.
2:38:43
David Duvenaud: Well, my pleasure — it was a pleasure getting to talk to both of you today.
2:38:47
Nathan Labenz: As my dad would say: good luck, we're all counting on you.
2:38:53
Nathan Labenz: Professor David Duvenaud, thank you for joining us on AI in the AM.
2:38:57
David Duvenaud: My pleasure. Anything else, Prakash?
2:38:59
Nathan Labenz: Great to see you. Bye for now.
2:39:07
David Duvenaud: Yeah — that sigh was very meaningful. What can you do?
Close: Warning Shots, Overreaction Risk, and the New Cause Area of Managing Public Panic

The hosts decompressed after Duvenaud's departure: Nathan on the persistent asymmetry in private risk estimates at frontier labs, Prakash on why societies respond to AI doom warnings the way they responded to Y2K (by ignoring them until an incident forces overreaction), and what a new cause area of managing the post-warning-shot public response might look like.
After David Duvenaud's segment, the hosts continued in a raw, unscripted exchange about what his conversation left them feeling. Nathan reflected on what he called his own 'AI sycophancy problem' — a tendency to find whatever perspective he's inhabiting in the moment deeply compelling, optimistic or pessimistic alike. His anchor, he said, is the persistent asymmetry: the risk estimates held privately by the people closest to frontier AI development remain sobering, and no credible case has emerged that the downside tail can be safely dismissed. Whether framed as a p(doom) figure or an alignment-generalization score, a meaningful chunk of catastrophic risk stubbornly remains.
The co-host pushed back with a structural argument: societies have repeatedly been trained to disregard expert doom projections, from Y2K to the population explosion that never came, and policymakers tend to act only after an incident. He drew a parallel to cybersecurity, arguing that the current Mythos-style window (before capable open-source models arrive in 6–9 months) is already forcing corporate adoption under evolutionary pressure, and that the same dynamic extended to biosecurity could produce a grim 'adopt or die' cycle. The hosts landed on what Nathan called a 'new cause area' hiding in plain sight: being positioned to manage the public overreaction that follows an AI warning shot, ensuring that fear doesn't lock beneficial AI behind a national-security apparatus the way nuclear technology was.
Nathan briefly flagged a newly announced $500 million project aimed at ending respiratory infections as a small hopeful signal before bringing the segment to a close. The co-host called it 'one of the most sobering sessions we've done,' and Nathan agreed, noting the importance of sitting with the possibility of gradual disempowerment while collective agency over the future still exists. They signed off with plans to continue tomorrow.
Key moments
If society at large really appreciated the risk estimates that the people closest to the technology believe in their heart of hearts, they would be freaking out.
Nathan Labenz, 2:40:33
People just disregard expert doom projections all the time. You need accidents to happen before you actually get a real response — and then you get the overreaction.
Co-host, 2:45:59
Full transcript (lightly edited)
2:39:14
Nathan Labenz: I think I have a sort of side to this. One is — I question myself so much. I feel like so many different perspectives on the AI question I find compelling in the moment as I try to grok them. One of the most important things I think I ever heard was the idea that 'understanding is belief' in the human brain — we don't really have a deeply inbuilt mechanism for counterfactuals. Just understanding the content of a proposition is in some way kind of believing it. When I learned that years ago, I was like, oh, I better clean up my information diet. Now I feel like I have a little bit of an AI sycophancy problem: when I spend time in a particular headspace, it becomes very compelling to me. I can do that across a wide range of perspectives, and it can happen with positive, more optimistic vibes as well — I'll sit in them for a while and think, yeah, this kind of feels pretty good.
2:40:33
Nathan Labenz: But I think the thing that is just inarguable at this point is that if society at large really appreciated the risk estimates that the people closest to the technology believe in their heart of hearts, they would be freaking out. That's true at every level. Duvenaud put his personal p(doom) at around 80%. I've recently had a conversation with someone still at Anthropic — actually a couple of people — and 50/50 was kind of their sense. And there's this asymmetry that keeps coming up. We were talking about it just before we went on, and maybe we'll come back to it tomorrow: the paper from OpenAI about positive generalization of alignment training. In that context, emergent misalignment shows that if you do some narrow training on bad behavior, it can generalize — not 100% of the time, but 10% of the time. If your AI is actively conspiring against you and trying to sabotage you, 10% is a huge problem.
2:42:04
Nathan Labenz: The flip side isn't symmetric. If you do narrow reinforcement learning to get a model to be beneficial in a narrow domain — in this case, medical was the study — it does generalize, and you get more broadly benevolent behavior across a range of things. But again, it doesn't go to 100%. Is it enough? If you can go from 0.5 to 0.75 or 0.8 on some alignment score, that's good, it's progress, it's one technique of many we might layer on and hope to reach a good place. But it doesn't answer the question: is this thing safe? Can I count on it in a reliable way? So while I feel this AI-like sycophancy when I inhabit different headspaces, when I zoom out the thing that keeps me anchored is that the asymmetry remains something to worry about. The numbers you get from people at frontier companies are enough that you should be very seriously concerned. Whether it's a p(doom) figure or the best alignment score from generalization in medical training, both leave a chunky amount of downside risk you just can't get away from. You can have legitimate reasons for hope, but you cannot escape a significant risk of terrible downside that I've not heard any credible case against. So whether it's 5, 10, or 20%, the public — to the degree they come to understand what the people inside really think — the freak-out from that alone, I think, would be huge if it was broadly understood.
2:44:29
Co-host: I don't really think so, because what ends up happening is there have just been too many boys who cried wolf over the years — everything from the Y2K bug to claims that the climate was going to end the world by 2012. All of these things over the years have trained the public to disbelieve doom projections, and trained policymakers to disregard them as well. Population is a classic example: you had this idea that population was going to explode, and instead we have a fertility crisis. So policymakers generally disregard expert doom projections, and in fact that's what happened during COVID — you could kind of see it coming from Asia, and policymakers in the US said, no, we're fine, borders are open, no masking, everyone go out and have a good time. So actually we get the opposite reaction to projections of doom: almost no response at all. Then an incident has to happen, and when it does, you get the overreaction. In the 2008 financial crisis, Bear Stearns had to go under, Lehman had to go under — after two banks failed, Goldman gets saved. You need accidents to happen before you get a real response. If you were in AI safety right now, you would be looking ahead at where these accidents are likely to happen — is it going to be some incident in bio or CBRN? — and managing the reaction and overreaction around that is what you should be focused on, because people just disregard expert opinions all the time. It's basically a fact of life.
2:46:53
Nathan Labenz: Yeah, that sort of suggests — and this is often called a 'warning shot' in AI safety discourse — I've never really been attracted to warning shot discourse for some reason.
2:47:10
Co-host: It sucks, because yeah.
2:47:15
Nathan Labenz: It's something that's hard to control — unless you're going to go out and try to create the warning shot, which I wouldn't recommend, for many reasons, including that the warning shot itself could get out of control depending on what kind you try to pull off. It sort of has to be at risk of getting out of control in order to be an effective warning shot in some sense. But what you're suggesting is: how much irreducible risk is there in just the general course of civilization that we're running? I do think some sort of AI is kind of inevitable. Kurzweil predicted it a long time ago, and in the presence of global-scale compute and global-scale data, there's no doubt somebody is going to figure out some sort of algorithm. So in a meaningful sense, some form of AI is inevitable. That does create some irreducible risk just in the nature of going down this path — we kind of have to tolerate it or do some extreme pause, which is going to be very tough unto itself. And then how much additional risk are we creating for ourselves with poor decision making, haste, and whatever else? But what you're saying kind of suggests a new cause area — as is sometimes said in EA circles — just dropped: being there to manage the overreaction when the warning shot actually takes place.
2:49:01
Nathan Labenz: In a sense that maybe articulates what I've been feeling with this whole Fable thing recently. Because I kind of go back and forth: my initial reaction was 'this is so stupid,' then Judd said 'have a little more cognitive empathy,' and I thought that was pretty compelling. But maybe the synthesis of what I've been feeling is: are we going to be able to manage the overreaction well? How do we make sure we don't end up in the nuclear outcome for AI — where it's locked up behind the national security apparatus and we don't get our AI doctors long-term?
2:49:52
Co-host: I think what Mythos exposed is perhaps this period of forced acceleration. In six to nine months, you're going to get open-source models that can do the same cybersecurity interactions Mythos does. That puts Fortune 500 companies in a position where they have no choice but to pay into Anthropic to secure their systems during this limited window before open source arrives. The companies that don't adopt basically get killed off. If Verizon hardens its entire infrastructure with AI and T-Mobile doesn't, and then six to nine months later open-source models start hacking in, T-Mobile goes down perpetually and everyone switches to Verizon. You have this evolutionary period where those who adopt survive and those who don't don't. What worries me is that this is perhaps a sign of the future: you have acceleration where you either adopt or die. Extend that to biosecurity: capabilities for virus generation arrive in six to nine months, you have vaccines ready now, the government says take these vaccines, 30% of people do and 70% don't. Six to nine months later, virus generation goes wild and you have massive die-off. This isn't really AI disempowerment; it's humanity failing to use the solutions that are there in the time period when they need to.
2:52:09
Co-host: If you go through one of these incidents with a 20% die-off or something, that's going to be massive. And your only solution from that point is to adopt as quickly as possible. I think it's actually a good thing that we're seeing this dynamic play out in cyber first — we can see how the economy and the actors interact. Even if companies die off six months later for not adopting, that serves as your warning shot, because you'll get the same pattern in biosecurity. And it's sobering because it suggests that the acceleration in the economy may be this kind of emergency panic: you've got to upgrade your cyber now, you've got to upgrade your bio now, you have to spend the money. I see that as more likely than a scenario where wonderful things emerge from machines of love and grace and people can adopt at their own pace. I don't see economic growth from voluntary vaccination. It's sobering and a little saddening.
2:53:23
Nathan Labenz: Yeah. Well, I haven't read the details yet, but it was just announced today: a new project with $500 million committed to make respiratory infections a thing of the past. So we've got some people trying to push the big levers anyway. I don't think we can reach any final or confident conclusions today. Anything else we should talk about before we break and get back to it tomorrow?
2:53:54
Co-host: It's probably one of the most sobering segments we've done. Wow. David.
2:54:02
Nathan Labenz: Yeah. Important. Hopefully we'll be able to balance it out with some positive visions for the future as well. But definitely worth taking a little time to stew in the possibility of gradual disempowerment while we still have, at least collectively, some agency over the future.
2:54:21
Co-host: And now I'm going to suddenly disempower the video channel. So, Nathan, bye bye.
2:54:28
Nathan Labenz: See you tomorrow. Cheers.

Opening: Export Controls in Court, World Models Inside RL Agents, and Claude Tag as Multiplayer AI

The hosts opened with the week's most legally novel story: Legion, a legal-tech firm with a Canadian development team, had filed the first federal lawsuit to vacate the BIS order restricting Anthropic's Fable 5 and Mythos 5 for foreign nationals — raising the foundational question of whether querying a US-hosted model even constitutes an export. Anthropic's own position, that the cited jailbreak was narrow and the same capability is widely available in GPT-5.5, was noted as either proof the order is theater or proof the entire frontier should be gated. Two research papers arrived together: one proving that a model-free RL agent trained on a rich goal set encodes a unique, recoverable world model in its value function (including latent variables no reward ever depended on), and another showing a 7-million-parameter looped model hitting 94% on Sudoku-Extreme and 87% on Maze-Hard by iterating to a fixed point rather than scaling parameters. Both cut against 'it's just pattern matching.' The hosts also covered the $8M super-PAC defeat of Alex Bores in the NY-12 primary, the first time AI-industry money visibly flipped a state-level safety-law race.

The second half of the opening turned to the fast-moving inference landscape. Anthropic's Claude Tag, a Slack integration letting any team member @-mention Claude as a drop-in remote coworker capable of opening pull requests and monitoring for dropped balls via 'ambient behavior', was dissected as the first reliable deployment of multiplayer AI, where an agent must juggle differing user permissions, data-access rights, and prompt-injection risks across an entire organization. Prakash's odyssey integrating GLM 5.2 through OpenRouter, TogetherAI, Baseten (fast but JSON-schema-breaking), and finally Cloudflare Workers AI illustrated the state of the inference market: 90% cheaper than GPT-5.5 Codex at $1.40/$4.40 per million tokens versus $5/$30, but with significant undocumented constraints to negotiate. The discovery of Claude's UltraCode mode, which bakes in Anthropic's own sub-agent orchestration best practices with built-in reviewer and research agent roles, led both hosts to agree the field is changing faster than anyone's mental model of it. Nathan's closing line, 'building this road just a couple patches of sidewalk in front of us at a time', served as the transition to David Duvenaud.

Interview: David Duvenaud — Gradual Disempowerment and the Search for a Stable Post-AGI Equilibrium

Duvenaud opened by laying out the gradual disempowerment thesis with the monkey-banana economy analogy: even if humans and AI systems trade and interact, the emergent optimization process of civilization — corporations, states, autonomous growth centers — will not need to answer to human desires for the same reason human civilization doesn't need the monkey economy to function. Nathan pressed the first steelman — historical absorption, in which every prior automation wave (agriculture, electricity, computing) was absorbed without permanent disempowerment — and Duvenaud conceded the analogy but drew a sharp distinction: every prior technology still needed humans somewhere, moving us up the stack rather than closing the loop. The worry is a technology that can close the loop entirely. On comparative advantage, Duvenaud conceded that losing 99% of jobs as happened in agriculture would be fine as long as humans remained needed in some rump; the crux claim is that full automation of even that rump is achievable, and that transaction costs make comparative advantage moot — a human surgeon or politician, however theoretically valuable, will seem irresponsible to involve once a more reliable machine alternative exists.

Nathan raised the China shock as a partial historical analog — factory workers economically sidelined but not starving, still able to vote — and Duvenaud embraced it precisely because it illustrated the sequence: economic disempowerment precedes cultural and political marginalization. The critical change comes when the state no longer needs humans as producers, consumers, soldiers, or voters. On the 'universal basic credit' model — robotic producers extending credit to human consumption the way China extends credit to US consumers — Duvenaud's rebuttal was sharp: lending to unemployable humans with no prospect of productive return is nothing like the US-China dynamic. 'It would be like giving money to an insect,' he said. The Earth-as-a-slow-zone scenario, where human-controlled regions throttle AI growth and ban recursive self-improvement, ran into the same objection: when you enumerate everything that has to be controlled — research, startups, innovation, human reproduction rates, memetic competition — the list is horrifyingly long, analogous to listing all the mutation paths that can cause cancer. Growth finds a way, as Christianity spread through the Roman Empire despite its armies.

On Schelling goodness, Andrew Critch's argument that very different agents might converge on similar moral intuitions through infinite-recursion coordination, Duvenaud expressed genuine delight but added a condition: it only bites when agents are roughly equal in power. We don't ask ants for their moral views before building on their territory, so staying competitive remains the prerequisite for any shared moral framework to matter. On succession, in response to Nathan's suggestion that a future of conscious, feeling silicon intelligences might be acceptable, Duvenaud gave his sharpest rebuttal: almost everyone is a successionist for some successors and not others, but people round off to 'any conscious being is fine,' which would equally endorse locusts or Nazis. His two concrete recommendations closed the interview: restrict frontier compute at the chip-manufacturing choke point (TSMC and similar fabs), a narrow and tractable intervention that takes pressure off all runaway-growth vectors; and cultivate temporal coherence in public preferences by chaining the 'is it okay if humanity disappears?' question forward to one's children and grandchildren until a coherent answer emerges. He closed by describing the machine historical super-forecasting project (building time-bucketed corpora with cutoffs such as 1930, 1950, and 1970 and no leakage from the future, training LLMs on them, then evaluating how well those LLMs forecast what actually happened) as the technical agenda he'd want validated before the river card is turned over.

Close: Warning Shots, Overreaction Risk, and the New Cause Area of Managing Public Panic

After Duvenaud signed off, the hosts stayed on air to decompress. Nathan described his 'AI sycophancy problem', a tendency to find whatever perspective he's inhabiting deeply compelling, but said what keeps him anchored is the persistent asymmetry: the risk estimates held privately by the people closest to frontier AI development remain sobering (Duvenaud's p(doom) stated at around 80%; Anthropic insiders he'd recently spoken with around 50/50), and no credible case has emerged that the downside tail can be safely dismissed. Prakash countered with a structural argument about public response: societies have been trained by Y2K, population-explosion predictions, and other expert doom scenarios to disregard expert warnings until an incident forces a response, and then to overreact. His parallel to cybersecurity suggested the real cause area hiding in plain sight is being positioned to manage the public overreaction when an AI warning shot arrives, ensuring fear doesn't lock beneficial AI behind a national-security apparatus the way nuclear technology was. Nathan flagged a newly announced $500 million project aimed at ending respiratory infections as a small hopeful signal, and the hosts signed off, agreeing it had been one of the most sobering sessions they'd done, and that sitting with the possibility of gradual disempowerment while collective agency over the future still exists is the right disposition for right now.