Opening: Export Controls in Court, World Models Inside RL Agents, and Claude Tag as Multiplayer AI
The hosts opened with the week's most legally novel story: Legion, a legal-tech firm with a Canadian development team, had filed the first federal lawsuit seeking to vacate the BIS order that restricts foreign nationals' access to Anthropic's Fable 5 and Mythos 5, raising the foundational question of whether querying a US-hosted model even constitutes an export. Anthropic's own position, that the cited jailbreak was narrow and the same capability is widely available in GPT-5.5, was read as cutting both ways: either proof that the order is theater or proof that the entire frontier should be gated. Two research papers arrived together. One proves that a model-free RL agent trained on a rich goal set encodes a unique, recoverable world model in its value function, including latent variables that no reward ever depended on; the other shows a 7-million-parameter looped model hitting 94% on Sudoku-Extreme and 87% on Maze-Hard by iterating to a fixed point rather than scaling parameters. Both cut against the 'it's just pattern matching' dismissal. The hosts also covered Alex Bores's defeat in the NY-12 primary by an $8M super-PAC campaign, the first time AI-industry money visibly flipped a race fought over state-level AI safety legislation.
The second half of the opening turned to the fast-moving inference landscape. Anthropic's Claude Tag, a Slack integration that lets any team member @-mention Claude as a drop-in remote coworker able to open pull requests and watch for dropped balls via 'ambient behavior', was dissected as the first reliable deployment of multiplayer AI: an agent that must juggle differing user permissions, data-access rights, and prompt-injection risks across an entire organization. Prakash's odyssey integrating GLM 5.2 through OpenRouter, Together AI, Baseten (fast, but it broke JSON-schema output), and finally Cloudflare Workers AI illustrated the state of the inference market: far cheaper than GPT-5.5 Codex ($1.40/$4.40 versus $5/$30 per million input/output tokens, roughly 72% less on input and 85% less on output), but with significant undocumented constraints to negotiate. The discovery of Claude's UltraCode mode, which bakes in Anthropic's own sub-agent orchestration best practices with built-in reviewer and research-agent roles, led both hosts to agree that the field is changing faster than anyone's mental model of it. Nathan's closing line, 'building this road just a couple patches of sidewalk in front of us at a time', served as the transition to the interview with David Duvenaud.
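The pricing gap is easier to see with the arithmetic spelled out. Below is a minimal Python sketch that assumes the quoted pairs are USD per million input and output tokens; the workload mixes are hypothetical and chosen only to show how the saving depends on the input/output ratio.

```python
# Quoted list prices, ASSUMED to be (input, output) in USD per million tokens.
GLM_5_2 = (1.40, 4.40)
GPT_5_5_CODEX = (5.00, 30.00)


def cost_usd(prices, input_tokens, output_tokens):
    """Dollar cost of a workload at (input, output) prices per million tokens."""
    in_price, out_price = prices
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def savings(cheap, expensive, input_tokens, output_tokens):
    """Fraction saved by running the same workload on `cheap` instead of `expensive`."""
    return 1 - cost_usd(cheap, input_tokens, output_tokens) / cost_usd(
        expensive, input_tokens, output_tokens
    )


# Hypothetical workload mixes (input tokens, output tokens); not from the episode.
mixes = {
    "input only": (1_000_000, 0),
    "output only": (0, 1_000_000),
    "3:1 input:output": (3_000_000, 1_000_000),
}
for label, (tin, tout) in mixes.items():
    print(f"{label}: {savings(GLM_5_2, GPT_5_5_CODEX, tin, tout):.1%} cheaper")
# input only: 72.0% cheaper
# output only: 85.3% cheaper
# 3:1 input:output: 80.9% cheaper
```

Because the blended price ratio always falls between the input-only and output-only ratios, the saving at these prices stays between roughly 72% and 85% for any mix, drifting toward the upper bound as a workload gets more output-heavy.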
Interview: David Duvenaud — Gradual Disempowerment and the Search for a Stable Post-AGI Equilibrium
Duvenaud opened by laying out the gradual disempowerment thesis through the monkey-banana economy analogy: even if humans and AI systems trade and interact, the emergent optimization process of civilization (corporations, states, autonomous growth centers) will not need to answer to human desires, for the same reason human civilization doesn't need the monkey economy to function. Nathan pressed the first steelman, historical absorption: every prior automation wave (agriculture, electricity, computing) was absorbed without permanent disempowerment. Duvenaud granted the history but drew a sharp distinction: every prior technology still needed humans somewhere, moving us up the stack rather than closing the loop. The worry is a technology that can close the loop entirely. On comparative advantage, Duvenaud conceded that losing 99% of jobs, as happened in agriculture, would be fine as long as humans remained needed in some rump. His crux claim is that full automation of even that rump is achievable, and that transaction costs make comparative advantage moot: however theoretically valuable a human surgeon or politician might be, involving one will come to seem irresponsible once a more reliable machine alternative exists.
Nathan raised the China shock as a partial historical analog (factory workers economically sidelined but not starving, and still able to vote), and Duvenaud embraced it precisely because it illustrates the sequence: economic disempowerment precedes cultural and political marginalization. The critical change comes when the state no longer needs humans as producers, consumers, soldiers, or voters. On the 'universal basic credit' model, in which robotic producers extend credit to human consumers the way China extends credit to US consumers, Duvenaud's rebuttal was sharp: lending to unemployable humans with no prospect of productive return is nothing like the US-China dynamic. 'It would be like giving money to an insect,' he said. The Earth-as-a-slow-zone scenario, where human-controlled regions throttle AI growth and ban recursive self-improvement, fared no better: once you enumerate everything that has to be controlled (research, startups, innovation, human reproduction rates, memetic competition), the list is horrifyingly long, like listing every mutation path that can cause cancer. Growth finds a way, just as Christianity spread through the Roman Empire despite Rome's armies.
On Schelling goodness (Andrew Critch's argument that very different agents might converge on similar moral intuitions through infinite-recursion coordination), Duvenaud expressed genuine delight but attached a condition: it only bites when agents are roughly equal in power. We don't ask ants for their moral views before building on their territory, so staying competitive remains the prerequisite for any shared moral framework to matter. On succession (Nathan's suggestion that a future of conscious, feeling silicon intelligences might be acceptable), Duvenaud gave his sharpest rebuttal: almost everyone is a successionist for some successors and not others, but people round this off to 'any conscious being is fine,' which would equally endorse locusts or Nazis. He then offered two concrete recommendations. First, restrict frontier compute at the chip-manufacturing choke point (TSMC and similar fabs), a narrow and tractable intervention that takes pressure off every runaway-growth vector. Second, cultivate temporal coherence in public preferences by chaining the 'is it okay if humanity disappears?' question forward to one's children and grandchildren until a coherent answer emerges. He closed by describing the machine historical super-forecasting project as the technical agenda he'd want validated before the river card is turned over: build time-bucketed corpora with cutoffs at 1930, 1950, and 1970, train LLMs on each with no data leakage from later years, then evaluate how well the resulting models forecast what actually happened.
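The leakage-free setup can be illustrated with a small Python sketch. It assumes every document carries a trustworthy publication date (the hard part in practice) and treats "through 1930" as including all of 1930; the `Document` type, the `split_at` helper, and the toy documents are hypothetical, not the project's actual pipeline.

```python
from dataclasses import dataclass
from datetime import date

# Cutoff years mentioned in the interview; "through 1930" is ASSUMED to include all of 1930.
CUTOFF_YEARS = (1930, 1950, 1970)


@dataclass(frozen=True)
class Document:
    """Hypothetical corpus record; assumes the publication date is known and reliable."""
    text: str
    published: date


def split_at(docs, cutoff_year):
    """Split docs into a training corpus (published through cutoff_year) and a held-out
    'future' used only to score forecasts, so no post-cutoff text reaches training."""
    train = [d for d in docs if d.published.year <= cutoff_year]
    held_out = [d for d in docs if d.published.year > cutoff_year]
    return train, held_out


# Toy documents for illustration only.
docs = [
    Document("Stock market crash on Wall Street", date(1929, 10, 29)),
    Document("Social Security Act signed into law", date(1935, 8, 14)),
    Document("Apollo 11 lands on the Moon", date(1969, 7, 20)),
]

corpora = {year: split_at(docs, year) for year in CUTOFF_YEARS}
for year, (train, held_out) in corpora.items():
    print(year, "train:", len(train), "held out:", len(held_out))
# 1930 train: 1 held out: 2
# 1950 train: 2 held out: 1
# 1970 train: 3 held out: 0
```

Each model would see only its own bucket's training split, and its forecasts would then be scored against the held-out documents, which is what keeps the evaluation honest.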
Close: Warning Shots, Overreaction Risk, and the New Cause Area of Managing Public Panic
After Duvenaud signed off, the hosts stayed on air to decompress. Nathan described his 'AI sycophancy problem', a tendency to find whatever perspective he's inhabiting deeply compelling, but said what keeps him anchored is a persistent asymmetry: the risk estimates of the people closest to frontier AI development remain sobering (Duvenaud put his p(doom) at around 80%; Anthropic insiders Nathan had recently spoken with put theirs at around 50/50), and no credible case has emerged that the downside tail can be safely dismissed. Prakash countered with a structural argument about public response: Y2K, population-collapse predictions, and other expert doom scenarios have trained societies to disregard expert warnings until an incident forces a response, and then to overreact. Drawing a parallel to cybersecurity, he argued that the real cause area hiding in plain sight is being positioned to manage the public overreaction when an AI warning shot arrives, so that fear doesn't lock beneficial AI behind a national-security apparatus the way it did nuclear technology. Nathan flagged a newly announced $500 million project aimed at ending respiratory infections as a small hopeful signal. The hosts then signed off, agreeing that it had been one of the most sobering sessions they'd done, and that sitting with the possibility of gradual disempowerment, while collective agency over the future still exists, is the right disposition for now.