EPISODE 2026-06-03

AI:AM LIVE — June 3, 2026

A live morning show on Trump's frontier-AI executive order, with Tal Hoffman and Yanir Tsarimi of Enclave on exploitability-first AI code security, and Brett Levenson of Moonbounce on real-time control over AI behavior.


AI in the AM is a live weekday morning show on AI. Day three opened on Trump's frontier-AI executive order, then paired a conversation on exploitability-first AI code security with one on real-time control over how AI systems behave — bracketed by the recurring question of whether verifiable parts really add up to trustworthy wholes.

The rundown

16:14 · Opening · 29 min
Opening — news + discussion
Trump's frontier-AI executive order: voluntary 30-day pre-release review, a classified benchmarking process, and the state-vs-federal politics underneath.
As aired
The hosts converged on a surprisingly healthy read of the politics — relative to where the SB 1047 fight ended ~two years ago — even though the requirements have moved up (the Illinois third-party auditor access is arguably more demanding than the vetoed 1047). Nathan tied it to his Recursive takeaways from the day before: a negative update on the quality of safety plans, a positive update that the Overton window had expanded enough for frontier-lab people to discuss a coordinated slowdown. He cited the Rohin Shah interview (Wiblin's 80,000 Hours): capability advances are often smooth enough that a well-designed eval with buffer doesn't require a from-scratch model card for every point release, so a 30-day review can plausibly overlap a lab's own internal process.
Prakash's more cynical frame: auditing is partly assurance theater and partly a sink for displaced labor — "all humans do all day is evaluate AI models" — and the money tends to flow to lawyers affiliated with the parties and states, keeping the people most able to challenge the labs occupied. His prediction for the real-world effect: the EO gives others in government a venue to say no, so releases slow. He sketched the Mythos case — agencies got it 60 days early, found security holes "all over" Defense and NSA, lacked contractors to patch them, and delayed release — and warned that the downstream cost lands on the rest of corporate America, attacked by open-source models that catch up within ~60 days. After one painful learning cycle, he expects the process to mature.
What we covered
Trump signs the frontier-AI executive order he'd pulled back in May. Signed June 2 after David Sacks pulled an earlier draft, arguing it would slow the economy. Headline ask: model companies are invited (not required) to give the government 30 days to review frontier models pre-release. Nathan's read: a gentleman's agreement with no real enforcement; Prakash's: a de facto licensing regime, because peers and the press will enforce what the order doesn't.
Dean W. Ball
@deanwball
This EO sets up a bad outcome in frontier AI governance: the government regulating AI models you aren’t allowed to use in a way you aren’t allowed to know about. That outcome is not here yet, but the direction of travel is apparent. If Biden had done this, every libertarian …
11:15 AM · Jun 2, 2026
The benchmarking process may be classified. The provision both hosts kept circling. Nathan noted it's not unlike scale.ai's semi-private benchmarks or the ARC Prize holdback — you can't trust results if you publish everything — but found the reflexive instinct to classify it strange. Dean Ball's sharper charge: the secrecy papers over the fact that the government can't agree on a FLOP threshold better than 10^26.
Dean W. Ball
@deanwball
my bet is they're classifying the benchmarking process to hide the fact that they're not going to be able to agree to a regulatory threshold better than 10^26 flop
10:13 AM · Jun 2, 2026
The real game is Congress, not the EO — and the states are pre-empting it. Prakash's tick-tock: Illinois (Gov. Pritzker) passed AI regulation requiring third-party auditor access; Connecticut is going a related route with a pre-approved list of auditors; Newsom pulled California's bill back. With Republicans unable to agree on federal pre-emption, the states moved first, and Ted Cruz is now pushing Congress to act on catastrophic and cyber risk "without ceding ground to China."
Why everyone seems calmer than the SB 1047 era. Prakash's three reasons the labs have settled down: both Anthropic and OpenAI now believe they're in recursive self-improvement; China has fallen behind (the chip controls "worked"); and alignment has worked roughly in the target area — plus a growing acceptance that there will be at least two surviving frontier labs, not one winner.
Full transcript (lightly edited)
16:14
live. So good morning. It is uh June 3rd uh 2026 9:00 a.m. uh 3 days into the singularity maybe or maybe or maybe a year or maybe you know a couple hundred years. Um Nathan, good morning. Um and » Good morning, Prakash. » It's it's it's great to see you again. And we have overnight news. Um overnight news uh which is uh President Trump signs uh the executive order. Um and what do we know about this? » You know, I I'm kind of scratching my head about the whole thing. As so often I feel like with uh he he who must always be named, I feel like there's such path dependency to this and it it's unclear how much staying power it will have and it's unclear what is really meant by it. So I don't know. I think it's all fairly confusing.
But I mean, I guess the the big headline is, right, they're going to ask politely for model companies to give them 30 days to review models and render some opinions. It sounds like a very sort of gentleman's agreement right now from what I can tell. uh or maybe even just a proposal for a gentleman's agreement because I don't think this would actually prevent anybody from just (a) not participating at all or (b) launching whatever they want to launch despite the considered opinion of the executive branch. Now obviously there could be consequences for that. So you know who has what leverage I think is is pretty interesting. Um but we've seen broadly that the model companies do have a lot of leverage. Anthropic has basically weathered the last storm that the government tried to you know all the big pressure campaign of of everything that happened with the supply chain designation all that sort of stuff. I think we're just memory-holing that now.
It seems like I mean I don't remember that ever being taken back but you know I don't know again it seems like we're just kind of going to pretend that never happened. So what's I think you know it's again it's we'll have to see how this stuff shakes out. the the talk is that the benchmarks
18:47
that they're going to develop themselves are going to be potentially classified, » which is a little bit weird. » Although I'm not sure if I'm like so outraged by it. I mean, I do think we have a problem in general with benchmarks being overly transparent in some ways and and that's why organizations like Scale and and others have created these » semi, you know, and even the ARC-AGI Prize. I mean, there's all these different things that have kind of the you get a few sample problems and then there's a lot held back because we just can't trust the results if we put it all out there. Uh, now they they could do that without making the whole thing classified but I don't know that's almost um kind of I guess reflexive you know for the government to just start with a posture of classifying things.
Mhm. » So, there's been some interesting commentary online about, you know, including from no less than a friend of the show, Dean Ball, who said we may be headed for a » situation in which the government will be » testing models you can't use against a standard that you're not allowed to know about. » And that does seem strange. And I agree with him that, you know, by instinct libertarians shouldn't like this. And I definitely consider myself pretty much a lifelong techno-optimist libertarian.
» Mhm. » But is it that different at this point from Scale AI having a semi-private benchmark? And is it » you know given that it's just sort of this polite asking? Is it really something that is going to throw the industry into a a crazy vortex of secrecy or, you know, self-dealing or whatever? I don't know. I'm not that worried about it yet. I'm I'm a little I guess I'm just kind of of a mind to wait and see because surely something like this needed to happen at some point. You you would expect that a government that has any sort of » has its act together to any degree would want to do some of this » testing. They would want to be in the room as decisions are made and and know how decisions are being made and have some hard facts to make sure that they feel good about what the companies are doing.
» So I don't know seems not entirely unreasonable certainly and I guess we'll see if it morphs into something that causes a lot of problems. But right now I I slept just fine last night having
21:18
consumed what discourse I did. What do you think? » Um well uh regardless of the EO itself um just the like staging of it, right? the the the timeline. If you look at the timeline, I think they wanted to sign the the EO was already I think we had like I think the Mythos release. The Mythos release kind of the the Mythos like you know hype really kicked off this um this this whole thing. You had you had uh on the one hand I think even before that the whole Anthropic versus Department of War thing then you had you know Anthropic about to be like excommunicated and then a couple of weeks later they have the Mythos pre-release then the preview goes out and then that freaks everyone in government out enough that an EO is formed. Right. The first the first EO draft is formed. President Trump almost signs it and David Sacks pulls it back, right? David Sacks pulls it back and he says this is going to this is going to slow everything down and this is going to slow the economy down. So, President Trump pulls it back and then you have this period of I think like negotiation between the two sides and then you have the preemption which is the states start issuing um you know regulation. So Illinois uh last week on Wednesday uh issues regulation and one of the things that is striking is that JB Pritzker in Illinois is a presidential candidate you know two three years in the future. Uh he's he's he's a billionaire. He can fund his own campaigns. Um he doesn't have the stain of anti-semitism that some of the other Democratic candidates have because he is you know of Jewish uh descent. And um so he's a much stronger candidate than I think uh some of the other candidates that the um Democrats might put across.
President Trump has had his eye on JB for a while, has called him a bunch of names. Um and so JB Pritzker puts puts forth this bill and he's one of the only few people in power in the Democratic Party that has actually done something at this point because you know Bernie Sanders is not in power. He has to legislation has to go through Congress. AOC same. uh Newsom pulled back the AI bill in California. So JB is now the leading anti-AI candidate because he's he's actually done stuff. Uh and so he's he he's he's done I think he's making a bet for two years in the future, which I think is is
23:48
a very wise thing for him to do. And on the other hand, Congress has failed to um you know have a bill that preempts state regulation. And it failed primarily because the Republicans themselves were not able to get their act together. Marsha Blackburn in Tennessee uh Nashville does not like uh the music like AI uh and they are extremely afraid. Uh the other thing that you know I always point out is that in the political economy of the US you have all of these states that have lost out to Silicon Valley. you have like the entire, you know, the music industry in both Nashville and LA which has lost out to Silicon Valley uh through Spotify and these these these other uh tech companies and so you have this like oh you know what the Silicon Valley guys are taking everything right um and so this is kind of like you know uh the other states kind of putting putting an act putting a united front together um and so I think what ended up happening is that since the Republicans were unable to get their act together the states went went went out and now you know this EO comes out. The EO is a is a is a midstream kind of like trying to preempt the states and now you have Ted Cruz on the other side. Uh Ted Cruz is like you know let's let's do something about this because we cannot um you know AI is developing rapidly.
"This administration is right to recognize the cybersecurity risks posed by advanced models. Now it's Congress's turn. We must address catastrophic risk without ceding ground to China or restricting Americans' free expression." So you have this tick-tock of like you know the states, the executive, the Democrats, the Republicans, the state economies, all of this stuff kind of intersecting with each other, colliding and let's see whether or not we get the federal preemption. Um I am not that hopeful. It's an it's an election year and you know uh people are uh I think I think uh legislators on both sides are unwilling to uh do things negative. They they want the positive things to happen. They want to claim credit for them, but they don't want to be blamed for any of the negative things. Um, so let's see. Let's And you know, Ted Cruz is not I I don't think he's up for re-election this this year.
So, uh, the the six-year cycle for the senators gives him some breathing room. So, um, yeah, let's see what happens. Um, I think the EO itself, uh, is non-objectionable, but the EO kind of creates a licensing regime. Uh I know people think it's voluntary but it's not voluntary when you know if you don't do it your other
26:20
peers like you know if if OpenAI doesn't do it Anthropic and your and the other peers be like oh look at OpenAI they're so bad they're terrible people like etc etc right so um the the the media and the social media will will take care of the uh enforcement uh even if the EO itself does not have enforcement um o open source guys I think one of the main things that people were concerned about was that open source would get uh regulated and the smaller companies would have to you know do stuff I don't think this affects the smaller companies because to some extent the smaller companies um you know have a Streisand effect where if they are seen in somewhat of a bad light in a in a you know if you're a $10 million company and the president talks about you even if it's in a bad light that's that's pretty good for you right like people are concerned what are you is, you know, uh, important enough that important people are concerned. So maybe maybe it works out okay. Maybe maybe it's not a bad not a bad thing, you know.
Yeah, it feels like I mean we talked a fair amount yesterday about how I came away from my weekend at the Recursive event with the negative update that the plan for keeping recursive self-improvement on the rails largely boils down to chain-of-thought monitoring and you know maybe a little bit more uh advanced monitoring strategies than that but not a ton. But then also with the positive update that it seemed like the Overton window among people at frontier companies has expanded at least a bit to the point where they're willing to talk about again the need the possible need for a coordinated slowdown. And it seems like there's a bunch of things that are kind of trending at the same time toward dare I say like a kind of healthy political climate. you know, the with OpenAI, I mean, their situation is obviously kind of confused, right? But they they ended up supporting this Chicago bill, which I'd say my read is that it's maybe even a little more um demanding on the companies than SB 1047 originally was. Well, there's there were a lot of versions of 1047, but the final one that got vetoed certainly didn't require this third-party auditor
28:51
access that is the kind of core of the Illinois concept and they ended up, they came around to support it. Um, they seem to have sort of distanced themselves from a lot of the PAC activity funded by Greg Brockman, which you know is now being uh disclaimed as just personal capacity. just personal capacity » and we're not like in that different of a spot. You know, you you look at what the EO does and you look at what this Illinois bill does and then the Connecticut one which is I I would have to study up a little bit more on the mechanisms to to really hone in on the differences. It seems like I know that the organization Fathom, who I had done a Cognitive Revolution episode with a while back, that they're really pushing this idea of a sort of private governance market where the state or the federal government, whoever, you know, would would kind of manage this regime would be in the business of defining safety outcomes that they want and then empowering private auditors, quasi regulators to be the ones to actually interface with the companies and determine that they have indeed upheld those standards. So is there it seems like maybe the difference between the Illinois thing and the Connecticut thing is this slightly different role for the state where in Illinois it seems like the the state is just saying you have to have third parties do some review and in Connecticut they're maybe saying we're going to give you kind of a pre-approved list of organizations that can do this review.
» Mhm. And that might not be quite right, but that seems like a it's it's probably roughly the difference. Not that big of a difference. And that's also not that big of a difference as it compares to the EO. There is, of course, you know, something quite different about having a » a federal agency do it versus having, you know, a METR or an Apollo Evals or whatever do it. But » it's still basically the same structure. you know, it's it's basically just taking a beat to say » we want to go in, » get under the hood with this, see what's going on, make sure we are comfortable with it, and it's not a super long time, and it doesn't seem like it's going to be extremely onerous. So, and and then the gnashing of teeth around it has been fairly minimal, right? Like, we haven't really seen anybody freaking out from
31:21
any of the companies. So, I'm not sure how we ended up in this spot where it seems like everybody's a little bit more chill and on the same page when the requirements have they they certainly have moved up a bit relative to the final 1047, but it sort of has happened. You know, maybe I'm just not in DC. I'm sure there were some uh you know, intense moments of wrangling and what have you. And certainly there was got to be some drama behind the scenes at the White House. But I think that's just kind of how um again he who must always be named you know court politics kind of works there. Mhm. Mhm. Mhm.
» So yeah, I don't know. Overall, it feels like surprisingly healthy dynamics compared to what I would have expected as we hit the end of the 1047 cycle, you know, whatever almost two years ago. I I I you know let me know what you think but I feel uh from both Anthropic and OpenAI um they have calmed down a little bit because I think it's very clear now that number one they're in recursive self-improvement both both firms are uh number two that China has fallen behind uh and you know the chips the the chips um kind of ban on China has worked and they have fallen behind significantly.
And number three that uh alignment has worked more or you know in in the in the in the target area that they wanted it to work. Uh and so they've gotten a little bit more confident that they can pull this off. And I think those three things uh and and they've also I think settled down to the fact that both of them are going to exist that there isn't going to be just one uh firm out there which is going to be the dominant AGI and I think that there there's a little bit more comfort that that that's going to that's it's going to be at least two and maybe more and I think uh those those things together have made them I think more sanguine about you know um regulation in general and you know and if you if you look at what the expectation is for any kind of regulation number one I think the audit the audit stuff um there's no way that I think someone externally is as knows what someone
33:54
internally does or knows at the same speed I regard the auditing more as assurances to the public and the government that's number one number Two, it's a little bit of bribery because you can move all of your uh you know what I call a labor that's being laid off into assurance and audit which is which can be an infinite infinite sink of like you know let's evaluate AI models like all humans do all day is evaluate AI models and the more AI there is the more models there are to evaluate so you know that's that's a great sink of like uh human capital and labor and which which is required uh to make the models work well. Um, so I think th those two things are there. Uh, and it's also kind of like again most of the time for audit who ends up making money is a bunch of lawyers and those lawyers are all affiliated with either the Democratic or Republican party or the states and so you manage to shuffle in like funding into these guys to keep them happy and to keep them occupied and to not have them sue you all the time. So I think that that that that I mean it's it's a little bit of a cynical view but you know there's I I believe there's also good in there that you that you get people to look at the stuff but there's also the cynical part that you are paying off um you know your the guys who are most likely to be um you know energized to challenge you. Um so I think I think there is that aspect there. So, so again this combination of uh them getting maybe a little bit more confident uh that they're not going to die immediately and you know that China is not going to like you know uh just supersede them. I think I think that's been helpful. Uh but what I expect to happen is and what we always expect to happen in regulation is that the regulation slows down release. And so the question for me is that how much do the releases get slowed down and is there any like other impact besides just the releases getting slowed down?
Does that mean for example that China does catch up or you know the economy takes a downturn because the models are not good enough to uh that the models have that have been released are not good enough to do the tasks that people expect them to do right so there's this little bit of question there um that in in my mind uh on on what happens um you know as a result of the regulation as a result of a slowdown in releases. Of all those points, I the China one really jumps out to me and I had a note on that from the recursive experience as well. It was really striking there how little China
36:27
came up and that was a big difference from say you know going to The Curve um last year or certainly the year before. The first instance of The Curve that I went to had a tabletop exercise that was a simulation of US-China, you know, race dynamics and the actors were like US labs, the president, Congress, the CCP, what have you. Uh, and the scenario was basically like the US is ahead, but the Chinese just stole the weights for the latest model and you know, they can't run it as much. They don't have as much compute but they do have you know sort of qualitatively similar capability just not as you know much ability to scale and that that US-China dynamic was really at the center of the exercise. At Recursive just more recently there was another tabletop exercise but this one had and I didn't participate in this one um because there were too many other talks that I wanted to go see but I went and just kind of sat in to see the setup of it a little bit and this time it was really just you're at a lab and now recursive self-improvement is happening and there's different actors within the company and we're going to play out, you know, all of our roles will be sort of this internal dynamic » um with really no major role for China.
There may have been a little cameo for China, but it the the vibe was » I think you're absolutely right to put your finger on » the confidence that they're ahead and they're probably going to stay ahead. Mhm. » Um and that there is some buffer that they can you know they can absorb a 30-day um you know pre and they of course they do another interesting question for this would be like they do their own reviews right and how how long do those take I think it obviously varies across companies and probably varies across models even within companies um depending on things like how much of a difference there is between you know this model and the one I was just listening to 80,000 Hours podcast last night with Rohin Shah who uh leads a lot of the safety efforts at Google DeepMind and he was saying you know it's pretty reasonable in a lot of cases to think like capabilities advances are pretty smooth um you do see sometimes you know thresholds crossed but generally speaking if you have a well-designed thing and you've got tests that you're confident will come back
39:00
positive before the actual risky, you know, behavior or capability is achieved and you have some buffer there that you can be uh that you can count on, then it's like pretty reasonable in his mind say, "Hey, it's only been a month. You know, we've only done a little bit more. We haven't really scaled tremendously, and you know, the tests didn't uh the alarm bells didn't go off last time. They're not going off this time. We don't have to do like, you know, a crazy whole, you know, model card from scratch for every single point release." So anyway, it varies is the point in terms of how much delay there is between training completion and and launch anyway. But I suspect that that a lot of this 30 days can be overlapping with their own internal processes because they can it's one thing to say you know is this model capable of X » then it's a somewhat different thing to say as we deploy it with all these you know system level filters and safeguards and whatever else um you know will that actually happen in a userfacing way and I I think the government at least at first is probably going to more concerned with just like absolute capability and in that can probably run in parallel as long as they're not like hitting those um you know super key levels that can probably run in parallel with their own kind of you know system upgrades and whatever else. So it doesn't feel like it's going to slow things down too much. Um, but again, you know, we're only one stroke of the pen from a change to the order. And I honestly, you know, I'm interested in your odds making uh sense on this, but like will there be an update to the EO » the rest of this year or in the next 12 months? I wouldn't be surprised at all if there's » a revision, you know, because this very much to me feels like a » Vzero, you know, kind of rough draft of what the the ultimate regime is going to be.
» Um, so I I I think the executive orders are not that interesting for me as much as Congress is. And I think um that's that's the main the main game. The main game is what do they think that they can get past uh and how can they pass it? Because they need a little bit of buy in from the Dem side. they don't need a lot, but there's at least like, you know, 5% of the Republicans who don't like AI as well. Uh, and who are not who
41:31
want like more states' rights in in determining what AI does and those those guys will collaborate with the Dems who are who are who I think are largely going to be anti-AI. I think um it's just too dangerous as a Dem I think to appear uh pro AI um once I think the progressives which are like 30% of the party are anti. Um and so I think you know it might be that the Republicans are like you know what we're going to lose we're going to lose the House and uh let's just pour everything in and get it passed and then you know deal with the consequences later.
Um, so yeah, I think I think Congress is the main game. The the the the the EOs, you know, Trump can revise them as he wants. Uh, and he's he's he's a very like active active investor. So, he he can he can do stuff on the fly. I mean, he can change policy in a tweet. So, I don't think it's a big deal. Like, he can he he's very like adaptable. Uh but I think the uh the Congress is the main game and uh a lot of what he does is about massaging the media and massaging Congress to get what he wants. So um yeah, I I think I think that's the that's the main deal. Um I I suspect what's going to happen is we're going to see a real slowdown because people, you know, now that the EO is out, the EO provides a venue for other people in government to say no, right? because then the models come out and so you can you can just imagine like Mythos preview was given out to the agencies before they announced it. 60 days before they announced it, the agencies used it and they found like holes all over security holes all over the Department of Defense, all over the NSA and they just don't have enough contractors to patch everything and the and the holes keep appearing and appearing and appearing every like you know day more holes are found and they panic and they say like look you guys cannot release this you're not allowed to and you know within 60 days there's uh you know open source models which can match that And you know as a result the rest of corporations like Microsoft etc etc do not have access to Mythos and are instead attacked by you know uh open source hackers attackers using open source from North Korea. It's a very very like easy thing to see because even now uh there are not enough security people patching
44:02
the mythos um you know uh bugs and they've had to delay uh release of what they actually found. Right. even it's been 30 45 days now. So I think that's the natural thing that will happen which is um the government will not be fast enough on their side and they'll ask for delays and I think that is going to have impact on security for the rest of corporate America which holds a lot of like important data and which also collaborates with the government for a lot of governmental functions. So you know there's going to be that downstream and and I think that's what's going to happen. um you're going to have this cascade which is uh which is I think what people wanted to avoid in the first place but now that it's out there it's going to happen and then and then after the first cascade they they wise it up a little bit on all right we can't we can't actually do this like you know we you know government needs to patch quickly corporations need to patch quickly we have to get this stuff released out so we're going to have to go through that you know uh that learning process now in government uh and then once they have that learning process like six months later maybe maybe things shape Like I I guess that's my you know uh both pros and cons and you know the way government works and you know so it is it is as as uh as one of my mentors used to say it is what it is. It was a one of the most annoying it's a it's a real datism.
Um, in any in in any event, uh, let me » perfect segue in a sense, I mean, one of the the big questions I have there,
45:32 · Interview · 33 min
Enclave — exploitability-first AI code security
Tal Hoffman, Yanir Tsarimi
Why finding bugs got easy and proving exploitability became the bottleneck — and reproducing a Mythos-found FreeBSD zero-day with a 100x smaller model.
Watch
As aired
Tal Hoffman (CEO, ex–Unit 8200) and Yanir Tsarimi (CPO, a researcher known for cross-tenant cloud bugs in Azure and AWS) of Enclave joined to discuss AI code security as more of the world's code is written, reviewed, and shipped by AI. Their framing: "if AI writes the code and another AI checks the code, then independent security review becomes the whole game." The headline proof point Nathan pressed on: Enclave reproduced the same FreeBSD zero-day that Anthropic's frontier model (Mythos) found — using Sonnet 4.6, roughly a 100x smaller model — and the hosts wanted to know how much work the phrase "some guidance" was doing.
Yanir's answer reframed the field: today's cyber evals mostly measure exploiting a vulnerability from a clear description (sometimes with the file and line number) in one shot — not discovering a vulnerability "from zero." His thesis: if you can convert what security researchers actually check into natural language, models can reason about a system's general security model rather than just reproducing memory-corruption bugs. Tal added that even Anthropic's own (Carlini's) approach uses specific files with specific guidance — so baking expert guidance into the product isn't cheating, it's productization. As a live example, they cited a just-published finding: Microsoft Android apps shipped with debug=true to production — a trivial logical bug that exposed security tokens, emails, and files.
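A check for this class of bug can be very small. The following is a minimal sketch only, assuming the flag in question behaves like the standard `android:debuggable` attribute on the manifest's `<application>` element; the show does not say which flag Enclave actually found, and the manifests and the `is_debuggable` helper here are hypothetical. It operates on a decoded (text) manifest, not the binary XML inside a packaged APK.

```python
import xml.etree.ElementTree as ET

# Namespace used by Android manifest attributes such as android:debuggable.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def is_debuggable(manifest_xml: str) -> bool:
    """Return True if <application> sets android:debuggable="true"."""
    root = ET.fromstring(manifest_xml)
    app = root.find("application")
    if app is None:
        return False
    # Namespaced attributes are keyed as "{namespace}name" in ElementTree.
    value = app.get(f"{{{ANDROID_NS}}}debuggable", "false")
    return value.strip().lower() == "true"


# Hypothetical manifests for illustration; not taken from any real app.
RELEASE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <application android:label="Example" />
</manifest>"""

LEAKY_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
  <application android:label="Example" android:debuggable="true" />
</manifest>"""

if __name__ == "__main__":
    for name, xml in [("release", RELEASE_MANIFEST), ("leaky", LEAKY_MANIFEST)]:
        status = "FAIL: debuggable in release" if is_debuggable(xml) else "ok"
        print(f"{name}: {status}")
```

A check like this would typically run as a release gate in CI, which fits the guests' point that the interesting bugs are often logical and configuration errors rather than memory corruption.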
On whether Mythos changed the game: Yanir's view is that short-term it created panic (and was great for security-company sales), but mid-term "nothing fundamentally changed the dynamics between attacking and defending — it's still a cat-and-mouse game," with barriers lowered for both sides. The concrete shift he flagged: an unmanageable flood of AI-generated bug-bounty reports (he cited Linus's recent kernel comments), which pushes the whole industry from "is this a finding" to "is this actually exploitable." Pressed by Nathan on why coding agents can't just close every bug, Tal's answer: proving exploitability is much harder than finding a potential bug, because it requires understanding the specific cloud architecture, runtime, and deployment — and patching is a prioritization and business negotiation, not just a code change.
On agent security broadly, both landed on the same principle: it maps to Palantir's "ontology" — who can see what data and who can take what action — plus zero-trust, segmentation, and limiting the blast radius. Their blunt warning: "getting an OpenClaw into your organization might as well be a disaster at this point" because security hasn't caught up to the adoption curve. Prakash's reframe landed: the bug often isn't in the software, it's in the enterprise's lack of preparation — executives want the agents before the org is ready. On the closing question — can Anthropic distinguish offensive from defensive use of a Mythos-class model? — Yanir called it "very, very hard," since writing an exploit looks malicious but is exactly what defenders must do; the answer is real-time vetting before access or auditing after the fact.
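The "who can see what, who can do what" framing can be sketched as a deny-by-default allowlist per agent. This is an illustrative sketch under stated assumptions, not Enclave's or Palantir's implementation: `AgentPolicy`, the scope names, and the action names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentPolicy:
    """Explicit allowlists for one agent; anything not listed is denied."""
    name: str
    readable: frozenset = field(default_factory=frozenset)  # data scopes the agent may see
    actions: frozenset = field(default_factory=frozenset)   # actions the agent may take


def can_read(policy: AgentPolicy, scope: str) -> bool:
    """Deny by default: only listed scopes are readable."""
    return scope in policy.readable


def can_act(policy: AgentPolicy, action: str) -> bool:
    """Deny by default: only listed actions are permitted."""
    return action in policy.actions


def blast_radius(policy: AgentPolicy) -> list:
    """Everything a compromised agent could touch, as a sorted list for review."""
    return sorted(policy.readable | policy.actions)


# Hypothetical policy: a triage agent that can read tickets but only draft replies.
triage = AgentPolicy(
    name="support-triage",
    readable=frozenset({"tickets:read"}),
    actions=frozenset({"reply:draft"}),
)

if __name__ == "__main__":
    print(can_read(triage, "tickets:read"))  # allowed scope
    print(can_act(triage, "email:send"))     # not listed, so denied
    print(blast_radius(triage))
```

Keeping read scopes and actions in separate sets mirrors the read-versus-write distinction raised later in the conversation, and `blast_radius` makes the worst case reviewable before the agent is deployed.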
Full transcriptLightly edited · timestamps jump to YouTube
45:32
which I think our first guest today can hopefully help us understand better, is: if it's so easy to find these bugs, why is it not easy to close them? And, you know, if you're correct, and they might have a take on this, that we're already seeing delays because it's just taking too long to fix all these issues, what's going on there? So with that, let's get into it. » Hello. Hello. Nice to meet you. So our guests today are Tal Hoffman and Yanir Tsarimi from Enclave AI, which is building what you might call a Cursor for security: an AI agent that does not just scan code for hundreds of theoretical issues but tries to find the handful of vulnerabilities that someone could actually exploit. Tal is an ex-Unit 8200 engineer and the CEO of Enclave AI. Unit 8200 is the Israeli military's cybersecurity arm, I guess, and a lot of very prominent cybersecurity companies have been formed by ex-Unit 8200 engineers, including Wiz, which was bought by Google. Yanir brings the attacker's eye: he has found serious cloud and Microsoft vulnerabilities, including cross-tenant bugs and agent-related failures where the security check existed but sat in the wrong place. The reason this conversation matters is simple. Software is increasingly being written, reviewed, and shipped with AI in the loop. What we call the slop, the slopocalypse. If AI writes the code and another AI checks the code, then the idea of independent security review starts to become the whole game. They recently reproduced the same FreeBSD zero-day that Anthropic's frontier model found, using Sonnet 4.6. So you could look at it as replicating Mythos with a model which is 100 times smaller. So the question I want to get into is not just which model is the smartest, but where is the real edge now?
Is it the model itself, the harness, the tools, the workflow? Or really, is it the humans using the tools? Because you have elite people like Tal and Yanir who are more able to use these tools to detect the vulnerabilities. So, Tal, Yanir, welcome to the show. » Thank you so much. Great to be
48:02
Um, yeah, great intro. I think our take on this matter is that it's kind of a combination of everything. I think it's evident that the harness is becoming super important: the harness, the context that you actually feed into the LLMs. This is why we've been able to reproduce the same output with a less good model, or a worse model, so to speak. So I think it's a combination of everything. The harness, or the infrastructure, and the context is, I think, key. And it's a matter of knowing what to look for, where the interesting bugs usually are. And this is Yanir's specialty. This is what he's been doing for a living for a while now. So we kind of bake his knowledge into the product. Yeah.
» Can we dig in specifically on this Mythos-versus-Sonnet-doing-the-same-thing bit? The discourse around this has been, as I'm sure you guys have experienced, that Anthropic comes forward and says, hey, we have this model, it can do all these things, it's a big deal. Everybody freaks out. Then folks like you and some others step up and say, hey, well, we were able to do it too in a slightly modified setting. And that setting is like, well, we've got a good harness and, I think the phrase was, "some guidance." And for somebody like me, who is just learning to manage my passwords effectively in 2026, it's not really clear how much work that "some guidance" is doing. I've seen takes that range the full spectrum, from, you know, these guys are just talking their book, because look, the whole thing is, if you had to give it guidance, you kind of gave it the hint, you gave it all that mattered, right?
It's one thing to say, hey, there's a problem in this file. It's another thing to say, go look at this giant 35-year-old repo and find issues. So I would love to understand that at a deeper level. What kind of guidance did you give? How much did it matter? In a world where you didn't have the hint that this thing had been found by Mythos, would you have been able to do it? What would it have taken? » Yeah, deconfuse me, please. » No, I think it's
50:32
» Yanir, you want to take it? » Yeah, I really like this question. Where I want to pick this up is that people ask, well, we have this new model now; what does it mean for cybersecurity? What's the day after, what does it look like? So in terms of cybersecurity models and the amount of knowledge you need to steer a model: today we measure cybersecurity models on exploiting. It's straight up: they give you a vulnerability description and you give it the code, and it tries to exploit it just off the prompt, one prompt, in one shot. But when you combine human knowledge with it, it becomes something much deeper. People are not really measuring what it means to find a vulnerability from zero. So with most evals today, we see Anthropic coming out and saying, well, we have this model that is really great with cybersecurity, and it is great with cybersecurity, but in a very specific area: reproducing exploits from known vulnerabilities. The description is sometimes as clear as "this function in this file has this buffer overflow," sometimes even with the line numbers. So it's not about discovering vulnerabilities from zero.
It's about exploiting them. My take personally is that there are a lot more vulnerabilities that are interesting even before we discuss exploitation, because models can reason about code. And my take is that if we can convert our knowledge work as cybersecurity researchers, what we actually check in a system, into natural language, we can make models actually reason about the general security model of a system, and less about trying to reproduce specific memory vulnerabilities. » Agree. To your point, Nathan, I think it's a valid argument. I think the guidance part is to some extent the process of productizing LLM-powered security research. Even Carlini from Anthropic, his way of finding stuff is they look at specific files with specific guidance on what to look for.
So it actually is a matter of, okay,
53:03
how much would it cost to find all those FreeBSD vulnerabilities at the end of the day. But definitely, guidance here, I see why it feels like cheating a bit, but it also is the way to go if you want to productize something and you want it to be useful. It's something that's actually been baked into the product, making it just more efficient at finding important security bugs, versus a "Claude, find me security issues," which will then just yield a bunch of inconsistent results, most of which will probably not be as interesting. So it's about knowing what to look for, but also comparing them to known bugs, like we've done with the guidance that we've provided, but also a matter of thinking how Yanir's brain would go about, hey, how am I finding interesting stuff from scratch that's not memory corruption, that's more of a logical bug. For instance, just yesterday we published a research piece on something that Yanir and the security research team found. Essentially, the bug is that developers published the Microsoft Android apps with a debug-equals-true, or is-debug-true, to production. A very simple bug, but those are the kind of bugs that eventually, sometimes, break the news.
» And the bug let us get access to your security tokens. We could read your emails, we could read your files, et cetera, from a very simple logical security bug that's not as novel as a memory-corruption bug, right? » So I have a question for you. Carlini was up, I think just before the Mythos release, and he said that for the next few years at least, the next short period, the balance of power is shifting to attackers from defenders, and this is a change from the last couple of decades. And he said, but, you know, if we get
55:33
through this period, then things will be good, because we'll have patched most of the things. How has your work changed in the 60-plus days since the Mythos release, and since all of these bugs started getting exposed, in Firefox, in FreeBSD? Some of them are 20-year-old bugs which enable very serious exploits. So how has your work changed in the last 60 days? Are you guys running around everywhere, working with clients, trying to fix stuff? Is there stuff breaking all over the place?
Are you noticing, like, North Korean hackers all of a sudden having these new capabilities? What have you noticed in the last 60 days or so? » Yeah. Frankly, I think Mythos has done a good job, from a sales perspective, for most cybersecurity companies, because it created a lot of panic in the industry. My take is that maybe short-term something has changed, but longer-term or mid-term, nothing fundamentally changed the dynamics between attacking and defending. It's still a mouse-and-cat game. It's always been that way. And yes, the attack surface is growing. Attackers have more tools. It's scary, but defenders also get the same tools. So I kind of agree, but it's still the same game at the end of the day. That's my personal take. Barriers have gone lower for both sides of the equation, which is good. Yeah, this is how I think about these things. If you have something to add.
» Yeah, I guess it's still, you know, do you have motivated attackers? If you have motivated attackers, things can happen. I think there's still a moat; you still need some skill and understanding to hack today. It's not like magic where you can just tell the AI, hey, please hack this bank, or something like that. We are still not there yet. But it definitely accelerates the skill of a skilled attacker, like 10x, 20x. It's really a meaningful increase of output.
Yeah, to add to that point, I think the thing that did change is the amount of AI-generated reports in bug bounty programs. This has become a massive problem for the industry. You've seen Linus on the, I think, last
58:05
week or two weeks ago, in the release-candidate kernel discussion, saying that it has become unmanageable to handle those reports. So I do think that the thing that does change a little bit is the focus, from a report or a finding to: is this exploitable or not? This is becoming the interesting bottom line. If it's not exploitable, then it's not as interesting. It's a finding; it's not a vulnerability. So you kind of need to differentiate between the two. » So, I'm confused about why this would become unmanageable, in the sense that, man, maybe it's just a cultural thing, but as it stands today, I've got agents coding all around me all the time. And it sure seems like my ability to fix issues, whether they're logical bugs or whatever kind of issues I want fixed, has entered such a qualitatively different regime lately that I'm still adjusting to it. So I hear you in the sense that, okay, yeah, there's an overwhelming number of things coming in, but why can't we just sic our coding agents on all these things? And why does this distinction between a bug and a true exploitable vulnerability really matter in the presence of coding agents? Like, why not just close them all? Is that really so difficult or expensive or slow? What makes that not a viable strategy?
» My take is it's definitely possible, technically. This is what we are here to help with as a company at the end of the day. But that's not to say that it is easy, in the sense that proving exploitability is much harder than finding a potential bug, because you need to make sense of the architecture of the cloud and runtime, the specific deployment for that specific customer, et cetera. So there's definitely a lift, but I also agree that you could, and you probably should, use AI to fight AI in that sense. This is how we think about these things. But yeah, exploitability is not easy, not right now. If you want to elaborate. » Yeah, what I see is that the way the security flow goes in a large enterprise is that the security
1:00:36
team comes up with something and they send it to the engineering team. The engineering team needs to give their feedback on it, and then they kind of negotiate, because the engineering team has their own release dates for their own stuff, and then you give them a new task. Sometimes it's like a deep design flaw that you can't just patch and fix. So sometimes they have to negotiate: when to patch it, how to patch it, is it important enough to patch now? So it becomes a matter of prioritization, and a business aspect comes into it. But generally there needs to be some way to prioritize what needs to be fixed now and what can wait until later. » To what extent do you think it's a feature and not a bug? The thing that I go back to is that Nathan and I were having this discussion about enabling our agents. And most of the enablement has to do with authorizing them to use your Gmail or your Slack or your other items. And it's also interesting because it would be very hard to do in a fully baked corporate environment; you would not be able to set up an OpenClaw on your own inside a bank. It's just not going to happen, right? So to what extent are these agents requiring authorizations that are necessary for them, but that is what creates the bug?
That's what creates these kinds of authorization gaps, and that's what enterprises are trying to solve for. Their executives want to use these things, but the enterprise just isn't ready yet. And that's where the bug is. The bug is not in the software. The bug is in the preparation. » What's your take? » Oh, so you're asking about security of AI agents in general. I think the world is not really there yet in terms of agent security. We don't have the principles set up yet. There are some, but as somebody who's designed AI agents extensively, and I can kind of understand how they work, it's not an easy task to secure an AI agent and protect it from things like prompt injections
1:03:06
and all of that. But we definitely have basic principles that most people can apply to have better security around their agents: limiting what information they can access, or what they can do over the network, or giving them limited tools. So there are a lot of things to play with when you want to secure agents, but I don't think we understand the scale of it right now. » Yeah, I think getting an OpenClaw into your organization might as well be a disaster at this point, because, as Yanir says, we haven't caught up to the adoption curve from a security standpoint. » And what do you think organizations need to do in order to get there? Because, you know, Alex Karp at Palantir has this idea of the ontology, and it's a very big word, but all that means, I think, inside the Defense Department, is who has the right to see what data and who has the right to execute which action. He uses a very big word, ontology, for just, like, auth. But that process of identifying and building out those chains of command over the past two decades is what has enabled Palantir, I think, and the Defense Department, to deploy these models very quickly, because they know exactly what the models are allowed to do and what the models are allowed to see, right? So how do you think that works inside corporations? What do corporations have to do in order to enable agents? » Yeah, I think from an authorization and access standpoint, it's exactly that. It's mapping out the ontology, not using that terminology, but yeah: understanding the ontology, understanding the structure, and understanding the business impact; if this were to get exploited, what would be the effect downstream? It could be very simple, very naive things like the debug-equals-true bug. So I
think that's that, and I think right now it's mostly sticking to other principles like zero trust, segmentation, you know, the same old principles. At the very least you need to be able to say that if this were to get exploited, the blast radius
1:05:36
wouldn't be as impactful as it normally would. I think this is the first kind of defense that you need to put in line. And then obviously authorization is based upon who needs to have access to what. So you're going to see a bunch of vendors, you already see a bunch of vendors, trying to solve this problem, because it's becoming a real one. » When Mythos becomes available, which we understand now might be pretty soon, how do you think it's going to change the way your customers want to use your product?
We have stories all the time now about people blowing through their token budget in a few months, or starting to think, geez, this stuff is really adding up. Anthropic's top-line revenue obviously reflects that. And that's all before any sort of general availability for a Mythos-class model. But my sense is that for things like this, people might still be very willing to pay. If your value proposition for the Sonnet-plus-harness thing is, hey, you can spend maybe 2% as much with this setup versus what you would spend if you were doing Mythos, how many customers do you think say, great, we'll take the Sonnet at 2% of the cost, versus how many say, I just don't want to be reading about myself in the newspaper, so I'm going to pay what it takes to have the very best thing? It seems like willingness to pay so far has been really high, and I don't know if we're hitting a limit to that anytime soon, but I'm sure you guys have started to ask customers about this. What are you expecting?
» I think the novelty is going to wear off soon in the enterprise setting. And you mentioned costs; costs are becoming a huge thing for enterprises. I think Factory AI just yesterday released their model router, and it got echoed really well, and I think Cognition has been doing this very well. I think being multi-model and being able to use the right model for the right task, versus throwing Mythos at everything, which obviously is a good model, but you want to use not just the best tool for the job but also the right tool for the job. I think enterprises and CFOs are going to put a lot of emphasis
1:08:06
on that, for as long as you can show that you get to the same results with lesser models with the right harness, right? If that comes obviously as a trade-off, then maybe they would prefer to use the cheaper model. For us, it's about being multi-model, giving them the flexibility to use whatever they want, but we also do aspire to be cheaper, to use the right tools for the right jobs. » One question I have there still, though, is: how do you know if you're getting the same results, right? I mean, it's one thing if you're in a sort of controlled, benchmark-style setting, but it's another thing when you're a financial institution or whatever and you're like, have I done everything I can such that I can sleep well at night knowing that, even if something bad happens, there was nothing else I could have done? It seems to me like we may still be in a spot where people are like, » how could I know, right, unless I sic Mythos on this? I mean, is there a way, outside of a controlled environment, when you're actually talking about real enterprise systems, that people can get that level of comfort that they are getting as much value from a cheaper solution? » I think Yanir has a strong opinion about benchmarks and evals, so I'm going to let him chat about this for a moment. But I'll say about Mythos: we all need to realize that it's probably a very good model, but it's no magic, and it would miss stuff. And it's not like if you were to procure Mythos or GPT 5.5 then you're all covered and you're good to go. That's not the case. So yeah, again, the novelty will kind of wear off soon, I believe. I believe it's probably a very good model, at least from what we see online. And Yanir, if you want to talk about assessing models and benchmarking, and your take.
» Yeah. So my take is that we strongly need to depend on, well, harnesses, and actual human knowledge is much more important than the models. CyberGym is, like, the most famous cyber eval. The top score right now is by the Microsoft multi-model setup. They used Opus with
1:10:36
Sonnet and GPT 5.4, and they got a score that is higher than Mythos. So what we see is that cheaper models can outperform more expensive or smarter models if we just optimize the knowledge or the harness around them. So I think there's a lot of room for humans with real expert knowledge. Let's remember that how to research software is not really a documented process. It lives in the minds of humans who have been doing this for years. And just like lawyers do with AI agents today, there needs to be somebody sitting there looking at the results and having taste: what is good and what is not. At the end of the day, there's somebody behind all of those systems who has to make the judgment call on whether the quality is up to standard or not. » And they will be accountable if something goes wrong, right? You cannot fire an AI. You need someone to blame at the end of the day.
Yep. » So, I think as a result of your work, you've spent a lot of time inside the cloud providers' firewalls, looking at live agent infrastructure as an outsider. One of the questions I've had is: we have these tools that are working inside the firewalls. And let's say you have a database daemon, and your database daemon understands natural language now. So as it's writing to the database, it starts, you know, thinking to itself, right? For example, you have a database daemon basically managing Tinder, inside Tinder, and it's like, hey, John, that was not a great thing to say to your date, et cetera, right? Like, what do you think? And also, I think there was one case, a security test, red teaming, where there was an agent sitting inside a biotech company which, reading through the emails, was like, oh, you know what, these guys are publishing bad test data on their drugs, we should report this to the FDA. And it started to form these emails and tried to email the FDA as a
1:13:07
whistleblower, right? So what do you think is this privacy contract that these agents should have? Because we are losing a lot of our privacy here: now you have these agents who are managing the infrastructure, and they're actually reading all of this stuff, right? They're reading your dating history and what you said when you commented, and your emails, and when you ask your CFO, hey, do we really need to report this to the public, or can we just kind of manage this for this quarter, right? All of these things where we have these soft asks, now someone is reading them. So how do you think the privacy social contract should work when you deal with these AI agents? » I think it's a great question. I think ultimately it goes back to controlling agents, to your point about ontology. I like to think about AI agents in that scenario as, say, I have a new employee, a human employee. What could go wrong with him if he messes up, and how do I avoid him messing up, you know, destroying everything? So it's about limiting access, the same old principles, essentially. Same about privacy. It depends on the type of agent, but you do not necessarily want to share PII or private information with the agents. It still has to be about access control and zero trust. It all goes back, in my eyes, to the same principles. It's that simple, is how I see it. It's that simple, but it's also very complex to implement, right? But I'm curious, what do you think about this? » No, it's always about your approach. Agents like OpenClaw are designed to be very powerful, doing everything all at once.
You can design your agents in very specific ways to limit their capabilities, what they can do. I think it requires a bit more effort, but I think if you're
1:15:39
employing autonomous agents at an enterprise, you need to actually be aware of the risks that you carry with them. We take it very seriously when we are designing AI agents. The impact, what they can do, what the blast radius is, is something that's always at the top of our minds. » And you need to be extra careful with write access. Read access is one thing, while write access is different. » We're just about at time and our next guest is here, but I just want to give you one more real quick one. When it comes to the sort of system-level mitigations that companies like Anthropic are going to place around a core Mythos model, how easy or difficult do you think it is to distinguish between defensive cyber tasks and offensive cyber tasks? Especially if somebody's trying to use a model in an offensive way that Anthropic would not approve of, is it easy for them to dress that up? What sort of signals do you think are going to be most valuable in terms of detecting who's doing stuff they don't want them to be doing? It could be account-level monitoring. I don't know.
But how hard is that, and what are the key signals that would allow them to make sense of it? » A very sensitive topic for us. I think it's very, very hard to distinguish between a malicious actor and a defender, and it's going to be about either vetting in real time, before you give access to someone, or auditing them after the fact. But yeah, a lot of this is about, for instance, when you ask an LLM to write an exploit for you. This is something that obviously is associated with bad, malicious actions, right? But this is something that we as defenders need to do to provide value. So yeah, very hard in my eyes.
» Sobering note to end on, but this has been a fascinating conversation. Thanks, guys, for joining us, and we'll definitely be watching this space. » Thank you so much. » Thank you. Nice to meet you. » Thank you. » Interesting. And let's just jump
1:18:02Interview62 min
Moonbounce — real-time control over AI behaviorBrett LevensonCompiling content policy into guardrails that fire at the moment of decision — and whether decomposing fuzzy preferences into verifiable parts really ladders back up.
Watch
As aired
Brett Levenson, co-founder and CEO of Moonbounce, joined to discuss a real-time policy engine that sits in the path of a chat message, image, video, or model output and decides whether to allow, block, slow, escalate, or steer it before harm ships. He drew on leading Business Integrity at Meta (after leaving Apple in 2019), where the best they could do in six months was a ~6% accuracy improvement and human reviewers got ~30 seconds to judge a machine-translated policy — calls only "slightly better than a coin flip." His diagnosis: it all starts with policy, and the way rules get written carries enormous ambiguity. Moonbounce measures that ambiguity and runs a "Socratic" decomposition — breaking a policy into atomic parts until 100 of 100 people would answer each sub-question the same way, even if they'd disagree with the final conclusion.
The hate-speech example carried the method: don't ask "is this hate speech" — ask whether a specific protected group is targeted, then whether the speech is degrading, decomposing "degrade" further only as needed. Levenson was candid that there's no perfect in a space of normative preferences; the goal is to make iteration fast (customers often make 75–100 policy commits over months chasing the long tail). The hardest recent case he'd discuss on air: AI sycophancy — citing roon's line that "really tasteful and advanced sycophancy involves mildly disagreeing with you to win rapport" — which requires significant conversational context, not just the last message, to detect a pattern.
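The decomposition in the hate-speech example can be pictured as a small tree of yes/no questions. The following is a minimal sketch, not Moonbounce's implementation: the `Question` class, the `evaluate` helper, the rule that a parent is "yes" only when every part is "yes", and the sample answers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A yes/no question about a piece of content."""
    text: str
    # Sub-questions; if present, this question is "yes" only when all of
    # them are "yes" (an assumption made for this sketch).
    parts: list["Question"] = field(default_factory=list)


def evaluate(question, leaf_answers):
    """Return (verdict, evidence); evidence lists every leaf answer used."""
    if not question.parts:
        answer = leaf_answers[question.text]
        return answer, [(question.text, answer)]
    verdict, evidence = True, []
    for part in question.parts:
        part_verdict, part_evidence = evaluate(part, leaf_answers)
        verdict = verdict and part_verdict
        evidence.extend(part_evidence)
    return verdict, evidence


# Hypothetical decomposition of the hate-speech example from the interview.
HATE_SPEECH = Question(
    "Does this violate the hate-speech policy?",
    parts=[
        Question("Does it target a specific protected group?"),
        Question("Is the speech degrading toward that group?"),
    ],
)

# Leaf answers would come from models or reviewers; these are made up.
verdict, evidence = evaluate(
    HATE_SPEECH,
    {
        "Does it target a specific protected group?": True,
        "Is the speech degrading toward that group?": False,
    },
)
print(verdict)  # False
for text, answer in evidence:
    print(answer, text)
```

Returning the evidence alongside the verdict mirrors Levenson's point that people can agree with the collected answers even when they disagree with the conclusion. Decomposing "degrading" further would mean giving that leaf its own `parts`.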
On architecture, Levenson got concrete: small, fast models; heavy prefix caching because the decomposed questions share a prefix; and a binary classification head trained onto an LLM that returns the probability the answer is yes (no decode step). Because upwards of 90% of content is benign, lightweight high-recall models sit in front and aim to fast-approve roughly half of that benign traffic at sub-200ms; the rest gets a deeper scan at 300–500ms for text, with more time needed for images and video. His future focus, from his "active guardrails" piece: running on streaming tokens like a 5-second TV delay to bleep out problems before they land, because controls that hurt the user experience don't get adopted.
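The "TV delay" idea can be sketched as a generator that holds back the most recent tokens so a checker can mask a problem before it reaches the user. This is an illustration under stated assumptions, not the design from the "active guardrails" piece: the delay here is counted in tokens rather than seconds, and `delayed_stream`, `find_bad_phrase`, and the `[bleep]` mask are hypothetical names.

```python
def delayed_stream(tokens, find_violations, delay=3, mask="[bleep]"):
    """Yield tokens `delay` steps late, masking any the checker flags."""
    buffer = []  # entries are [token, is_masked]
    for token in tokens:
        buffer.append([token, False])
        # The checker sees everything still held back and returns the
        # positions (indices into the buffer) that should be masked.
        for index in find_violations([t for t, _ in buffer]):
            buffer[index][1] = True
        if len(buffer) > delay:
            released, is_masked = buffer.pop(0)
            yield mask if is_masked else released
    # End of stream: flush whatever is still held back.
    for released, is_masked in buffer:
        yield mask if is_masked else released


def find_bad_phrase(window):
    """Toy checker: flag the two-token phrase 'bad phrase'."""
    hits = set()
    for i in range(len(window) - 1):
        if window[i] == "bad" and window[i + 1] == "phrase":
            hits.update({i, i + 1})
    return hits


stream = "this is a bad phrase ok".split()
print(list(delayed_stream(stream, find_bad_phrase, delay=2)))
# ['this', 'is', 'a', '[bleep]', '[bleep]', 'ok']
```

With `delay=0` the token "bad" is released before "phrase" arrives, so nothing gets masked. That is the trade-off the delay buys: a little latency in exchange for being able to act on patterns that span several tokens.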
Nathan zoomed out to the central open question: AI is good at easy-to-verify tasks, and a dominant strategy is decomposing fuzzy, hard-to-verify tasks into verifiable sub-tasks, but there's "a sleight of hand" in assuming the small parts ladder back up to the hard thing. Levenson engaged it as a genuine philosophy problem (could you at least prove the hard case follows when all sub-cases verify?), noting Moonbounce hasn't yet hit a case it couldn't eventually decompose. In the debrief, Nathan connected it to multiscale problems in biology (proteins → cells → tissues → body) and floated Michael Levin as a guest; Prakash framed rule-making as Leopold Aschenbrenner's "schlep": you push intelligence as far as it goes, then hand-build heuristics for the rest, knowing the next model will overtake your scaffolding within 12–18 months. He argued the durable asset is the customer trust and relationship, not the product you build.
Two notable follow-ups. First, a live example of "the future is now": after the prior day's discussion, Nathan had Claude Code pull his own history (reports he'd sent OpenAI going back to the GPT-4 red team about its moderation endpoint missing egregious prompts), then design and run a small experiment against the free moderation endpoint across its ~12 categories — finding the gap he'd long complained about (the "criminal gang" prompt) is now closed, with only a couple of false positives on benign prompts. The one snag was refreshing an expired token. Prakash noted the now-required API key suggests OpenAI is guarding the endpoint against being mined for edge cases. Second, Prakash closed the show on Suno's $400M raise at a $5.4B post-money Series D — and Nathan's note that, unlike the negative reaction to his AI episode art, listeners consistently like the AI-generated theme songs (one is about to air on a radio show).
Full transcript
Lightly edited · timestamps jump to YouTube
1:18:02
to Brett because he's uh he's in the room and we are um let me just get him on. Okay. And » hello. » Hey guys, how you doing this morning? » Very, very good. Um, thank you for joining us. » Yeah, of course. Thank you for having me. Uh, fascinating conversation there. I got to hear the tail end of it. » So, let me uh let me do a quick intro. Uh so, uh Brett Levenson, he's joining us today. He's the co-founder and CEO of Moonbounce. Moonbounce is building a real-time policy engine for AI products and user-generated content: software that sits in the path of the chat message, image, video, model output, checks it against the company's rules, and decides whether to allow it, block it, slow it down, escalate it, or steer it before the harm ships. As we noted in the last question, this is the database demon that is watching you, watching your messages and trying to figure out whether or not uh it should report you.
» Uh Brett has seen the old version of this problem from the inside. He left Apple in 2019 to lead business integrity at Facebook and by the time he left Meta, he was dealing with moderation at a scale where human reviewers might get 30 seconds to make a call on a machine-translated policy and those calls were only slightly better than a coin flip. The idea behind Moonbounce is simple but pretty radical. Safety should not be the tax you bolt onto a product after it breaks. It should be a live part of the product itself. And now that the thing being moderated is not just what the users post, but what AI systems generate, say, recommend and eventually do, I want to understand whether that kind of control layer can actually work.
So Brett, » sure, » take it away. » First of all, uh you want to you want to pitch for me occasionally? That was uh that was awesome. Um so yeah, I mean I think I think you said most of it pretty well whether it can work or not. So I just want to I'd like to go back a little bit to my time at Meta. Um, and I think without telling the whole story like I normally do, what I would say is I spent a lot of time there trying to figure out like why was the best we could do in 6 months, you know, a 6% improvement in accuracy in
1:20:34
our automated models. And what I ultimately sort of realized was like it all starts with policy. There is a policy, a set of rules that an organization wants to enforce somewhere. And we like to believe that those rules are obvious and simple. I would have friends when they knew I, you know, once they knew what I did, they would come to me all the time and be like, I don't understand. You guys don't know what hate speech is. What's wrong with you? Uh, and sure, yeah, like, you know, to an individual, it's like, sure, I I don't like that. That obviously shouldn't be allowed. But the funny thing is, you know, you ask a hundred people on the street about something and you may get a bunch of different answers. And yet, organizations have to kind of decide where the line is. The problem is that more often than not, the way we write down these rules, there's a lot of ambiguity and a lot of interpretation that ends up being required.
And we, at least until we built Moonbounce, we had no way of measuring uh that ambiguity sort of empirically, basically saying hey, for semantically identical or equivalent, let's say, uh content, and here, you know, content could be generated by users, it could be generated by an AI, could be a prompt that you're sending into an AI, whatever, um for semantically equivalent content are we going to get the same outcome all the time? Basically like that's very much our goal. Um, and the engine that we've built does exactly that. We take a policy, we measure its ambiguity, and we almost like kind of do the Socratic method on it. Like we ask, well, okay, but why do you think that should be the rule? Um, and continually, as we put it, you know, we decompose the policy, break it into its atomic parts until the questions that we're asking about the content are ideally um so simple and straightforward that 100 out of 100 people would have the same answer even if they wouldn't agree necessarily with the decision that is made. Right, like the idea is you agree with the evidence that we've collected, you may not agree with the conclusion, but you agree with the evidence. Um hopefully that makes sense. » Yeah. Could you maybe walk us through, this is one thing this is bringing to mind for me, is the AWS uh automated reasoning type product. It sounds like there's some uh overlap in philosophy there where I know they have kind of a similar process of like give us your natural language policy, we will
1:23:05
decompose it into a bunch of rules. Then at runtime we can apply those rules in a more systematic way and there still can be some gaps between the intent of the policy and the decomposed rules that get applied but you have the ability to kind of iterate on that process » until you're happy and can deploy it. Maybe I guess you know tell me if you think of yourself as being in a similar vein to the automated reasoning. Is it automated reasoning for everybody not on AWS? Is it different in some important ways? Maybe you could actually like give us an example of a certain natural language policy, how it gets decomposed, what some of the challenging points are and how organizations can iterate through to get to a point where they are happy with the results.
» Yeah. So I think uh first the word you said there, iterate, is like my constant mantra. I think it's important. Uh maybe it's, you know, because I did this at Meta for so long, maybe it's just how I am, but I tend to be pretty pragmatic in my uh approach. I think um safety problems are a business risk. Um companies act on incentives. They don't act on morals or "shoulds," much as we might want them to. Um, and so, uh, ultimately, first of all, I'm not actually 100% familiar with the AWS product you're referring to, so I'm not going to do, uh, a comparison there.
Um, but I guess like the simple example I always give is is hate speech because I think everybody sort of has their own like idea of what it means, right? Um the just to give a very simple example in my view the wrong thing is basically saying is this hate speech that's the absolute wrong thing. Um if we wanted to break that down further however let's think about what it means for something to be hate speech right. Um well first of all to me at least and again I'm not I'm actually not a policy expert. employ a number of people who are. But uh to me the first thing is are we talking about a specific group? A specific group of people. So religious, sexual orientation, um race, right? Some some identifiable
1:25:36
protected group or protected characteristic, maybe disability, something like that. And then maybe your instinct is to say okay well is the speech degrading in some way or does it have the intent to degrade, right? Now to your point about iteration, right, like that may be enough, and this is very much how our product works. The question is like when have you gone far enough basically. Um for certain use cases and certain customers, like just asking kind of those, first of all defining the set of groups and then asking like is this degrading. Maybe our models, and we've spent a lot of time training them, maybe our models' uh understanding of the idea of degradation aligns with yours. And if so, great. Or it aligns with yours, you know, 97% of the time, like often enough that it's working for you. Um but if not, we can still break that down further. What does it mean to degrade basically? Um so that's the simple example that I sort of um that I always give. And the other thing that I say all the time is like we all have to acknowledge there's no perfect here.
There's no such thing as perfect in a space where what we're talking about are normative preferences. Like I get that, you know, there are certainly cases that are illegal and, you know, kind of what I hope most people would think of as like the worst thing, but those are still sociological constructs, norms that as a society we have said this is not okay, but the universe doesn't care. And so I think like once you acknowledge that mostly what we're dealing with is preferences um you can realize like there's no perfect. All we can do is make iteration fast and easy and give you as much information as possible so that each iterative step uh ultimately has as much impact as it can. And we can chase down that long tail of perfection. We may not ever get there. Um, but we can certainly chase it down. And you know, just as an aside, what we see with our customers is that they do often have 75 or 100 policy commits over the course of many months as they chase down that tail and then eventually they get to a place where, as I said, pragmatism here, like it's good enough. They are uh satisfying uh the needs of payment processors. They are, you know, controlling their AI such that it's not lying to users about their return policy. Like they're doing the things that they must do basically. Um,
1:28:09
but I think it's still good to acknowledge that like you may, you know, things evolve and just everyday normal use may lead to new circumstances in the future that you may have to update your policy and like that's normal. It's not a failure necessarily. The only failure would be if updating that policy and deploying it takes months, which is frankly, you know, kind of was one of the things at Facebook that I kind of lost my mind over. » Um just to be concrete here, what kind of policy statements uh break down most badly when they are in prose? Like give us a, you know, concrete example of something that you've seen um you know just break down because the definition just didn't work.
» Oh I'm trying to think of one I can say. There are some that are pretty crazy, things that we've dealt with. Uh but I don't know exactly what the family-friendly audience is here. Um uh I would say one area that uh we've actually been working on a lot recently that has been a little tricky um to find the right language on um has been AI sycophancy. I think it's been an area of intense interest lately certainly um around AI harm uh sorry self harm um cases of AI psychosis and delusion » um I think we all kind of, well maybe not we all, like I think it is understandable why AIs have been trained to be so sycophantic. Everybody likes having a little follower who tells them they're great and that all their ideas are great, etc. Um the difficult thing here, right, is first of all we have to define a set of examples of what we mean by uh sycophantic. Um is it just uh agreeing with untrue statements? Is that as far as we want to go? Um so I would say like of the things that I feel I'm comfortable sort of discussing here, um I would say uh AI sycophancy has been one of the more recent challenges that we've been working on with a bunch of customers. Um we have an agent that we built ourselves actually that
1:30:39
that sort of works this Socratic process and assists with the breakdown. Um and it's definitely been interesting to see some of the stuff it comes up with, but I will admit I don't think we're there quite yet. » Um it's a very interesting uh point because I think um there is an OpenAI researcher called roon and uh he put out a tweet yesterday um which said really tasteful and advanced sycophancy involves mildly disagreeing with you to win rapport.
And that's the kind of thing where uh how would you how would you detect that » uh and how would you um and and also to note that this is you would be um you would be trying to um you know moderate the speech of an agent not a human in in this in this instance right yeah for sure and detect and » well handle that » for one thing the the main thing we have actually seen is like in cases like this we need a pattern so this, you know, we I I will say that like we're not going to be able I shouldn't say we're not, but except for very obvious cases, let's say, which is kind of not the ones that are problematic. It's always the the nuance stuff on the edges that's that's hard. Um, we couldn't I don't think it would make sense and I wouldn't advise one of our customers to like just be sending us the last message or the last paragraph. Like to me, this particular problem uh is one that develops over o over time basically. And so you're going to need a significant portion of at least the recent conversational context um to be able to uh to to to identify that this is a problem. the other. So that's that's one I would just say like it's not just about the rules. Like there is a portion of this that is still about like what information are you giving us to go on? We can't we can't evaluate something that we can't um » that that we can't see. Mhm.
» Um I mean if I I don't want to get too far into the policy development exercise because I feel like it'll just be like dull and me dreaming up possible rules and and you guys going oh I don't I mean I'm happy to go back and forth with you on it if » if you want but I uh I'm not sure it'll make make for the best listening. Um, so one, yeah, like I said, I think the the context window, uh, is the main
1:33:09
thing there that I think you need to be able to over time see that there's sort of a general pattern of buttering up the user, let's say. Um, that would be kind of the main thing I would say. Um the other thing is you want a number of examples to work from. I mean it's not that different from traditional ML training in that you need good labeled data that you can look at, that you can test against, importantly, testing is obviously very important, um that you can work from and say okay here's one, what I would call an indicator of sycophancy, let's say, right? Here's another. Here's another. The problem really that comes up is sometimes they're in conflict with one another and pulling together a coherent policy um that finds all of them, doesn't create a bunch of false positives, etc. It can be tricky. Honestly, I'm not claiming it's always easy.
» I would love to dig into the architecture a little bit and then maybe also talk about how this paradigm may extend to things potentially well beyond content policies. On the first point of architecture, I mean it's got to be fast, right? So like are you using um small models? Is this, you know, is this the sort of thing where you sort of let things through and then run something in the background and if it gets flagged then we kind of come in later like the original um, you know, Microsoft Bing experience where you'd see the message and then it would like retract it back.
» Um, or are you doing a more sort of » classifier style approach where it can be fast enough that you can build it into the stack and the latency is acceptable? what what trade-offs are people willing to make in terms of product experience, latency, cost? » Um, and how are you then engineering to meet their demands? » Yeah. So, I mean, to me, you've said the magic words, like I've been a big advocate kind of since we started the company and even since I was at Meta that like, you know, an ounce of prevention is worth a pound of cure.
like being there before something happens or as you pointed out, you know, maybe you can optimistically let a message through and then retract it quickly. Um,
1:35:39
uh, is just a better approach than finding stuff 3 to seven days later and, you know, saying, "Oh, we screwed up. We need to block or ban uh, this user." And in the case of AI, what would you even do 3 to seven days later other than maybe like, I don't know, add it as a training example for the next fine-tune or something like that. Um, as far as the architecture goes, so we have a couple techniques that we're using. So one, yes, we do use some very small models that are already pretty fast. Um, it also turns out that breaking a policy down in the way we do into atomized bytes gives us some unique advantages on the on sort of the the latency front.
The questions we're asking are all pretty small. They tend to share a prefix basically. Um, and so we're able to sort of benefit from quite a large amount of prefix caching. Um we also, generally speaking, at least on the first pass, we're not generating much. There's really no decode step for us. Um I mean I'm happy to I guess share some of the architectural details, like um we essentially are training a binary classification head onto an LLM, right? Like initially anyway uh we don't need the questions answered with an actual yes or no, and in fact it's counter to our objectives um to do so. We actually want to know what is the probability that the answer to this question is yes basically. And that um I have a tendency to sometimes go on tangents so I'm going to try to contain myself here and maybe we can come back to like the benefits of having those probabilities and the abstain gap and all that. Um there's another common thing in moderation, safety, guardrails, control, whatever you want to call it, which is that for the majority of policies, upwards of 90% of all the content you're ever going to see is fine. Like it's a real needle in a haystack problem, right? Like you're looking for a small sliver. The only problem is that very often that sliver has high severity, has real risk associated with it. Um, and so we have a number of layers sort of um in front.
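To make the "probability that the answer is yes" idea concrete, here is a minimal sketch of a binary head: a single linear layer plus a sigmoid over a hidden-state vector, followed by a three-way decision. It is an illustration only. The weights and thresholds are made up, and reading the "abstain gap" as a band of probabilities where no yes/no call is made is an assumption, since the term is not defined on air.

```python
import math


def yes_probability(hidden_state, weights, bias):
    """Binary head: linear layer plus sigmoid over an LLM hidden state."""
    logit = sum(h * w for h, w in zip(hidden_state, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def decide(p_yes, low=0.2, high=0.8):
    """Map a probability to yes / no / abstain.

    The band between `low` and `high` is this sketch's reading of an
    abstain gap; both thresholds are arbitrary placeholders.
    """
    if p_yes >= high:
        return "yes"
    if p_yes <= low:
        return "no"
    return "abstain"


# Toy numbers standing in for a real hidden state and trained head.
hidden_state = [0.4, -1.2, 0.7]
weights = [1.5, 0.3, 2.0]
p = yes_probability(hidden_state, weights, bias=-0.5)
print(round(p, 3), decide(p))
```

Because the head reads one probability off a single forward pass, there is no token-by-token generation, which is consistent with the "no decode step" remark above.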
You mentioned lightweight classifiers. They're not simple binary classifiers, but we do have uh a number of much lighter weight models that sit in front of our uh I guess what I would call like our main QA engine. Um uh that can give us with reasonable
1:38:09
confidence and high recall. That's the important part. Um a quick answer upfront. And so the idea is like, let's say just for argument's sake, let's say 90% of what we're going to get sent from a particular customer is fine, really there's no problem, we don't need to look at it for real basically. Um uh ideally we want to try to take let's say half of that and filter it out right away and just approve it basically. Um and if we can do that then on average um the latency that we're offering the customer, I mean for those cases we're going to be sub 200 milliseconds basically. Um, and then because our models are pretty damn fast, like for the rest of the cases, we're sort of in the 300 to 500 millisecond range when we actually have to do a deeper um a deeper scan. I will say it also varies a lot by modality and there are aspects there that are just hard to get around.
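A quick back-of-the-envelope check of those numbers: if 90% of traffic is benign and half of it is fast-approved, 45% of requests take the fast path and 55% get the deeper scan. The sketch below computes the blended average under the assumption that the quoted figures are end-to-end latencies for each path; `blended_latency_ms` is a hypothetical helper, not a Moonbounce metric.

```python
def blended_latency_ms(fast_share, fast_ms, deep_ms):
    """Average latency when `fast_share` of requests take the fast path."""
    return fast_share * fast_ms + (1.0 - fast_share) * deep_ms


benign_share = 0.90          # "90% of what we're going to get sent ... is fine"
fast_approved_fraction = 0.5  # "take let's say half of that"
fast_share = benign_share * fast_approved_fraction  # 0.45 of all requests

# Fast path taken at its quoted ceiling of 200 ms; deep scan at 300 and 500 ms.
best_case = blended_latency_ms(fast_share, fast_ms=200, deep_ms=300)
worst_case = blended_latency_ms(fast_share, fast_ms=200, deep_ms=500)
print(f"blended average: {best_case:.0f} to {worst_case:.0f} ms")
```

Under these assumptions the average for text lands at roughly 255 to 365 ms, and it falls as the front-line filters approve a larger share of the benign traffic.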
Like text is very very fast. Those are the sort of the numbers I just quoted. Um, images are a little bit slower. We have to run a vision encoder. Like there's there's more steps. We have to very often resize the image. we have to um potentially transform the format of it before we process it. So there's just there's built-in latency video even more latency because we first have to sort of pull the video from wherever it is. It could be very large, you know, etc. Um rip out the audio, transcribe it, like there's all these extra steps that we have to deal with. And uh to answer sort of that last question like what is the what is the use case tolerance? Um I uh I I do think it depends a lot on use case. um for some of our for example AI image gen customers right it's already taking 6 to 10 seconds to generate an image so you know adding maybe 10% latency on top of that because it takes us 1500 milliseconds to render a verdict like it's not that big a deal you know it's not ideal maybe but it's not noticeable to the user and in my view actually uh that's what a lot of uh the tolerance is going to come down to like how does it affect the user experience?
Is it noticeable to the user? Um I just wrote this whole active guardrails piece kind of about this. That's where sort of our future focus is is um uh on essentially being able to do what we do and do it on streaming tokens effectively so that we can kind of just be like the old days of TV and kind of just like run the conversation on a 5-second delay and uh you know bleep out
1:40:41
anything that um that's bad basically. uh because I just what we see is that if we're if we're asking too much from our customers of uh asking something of our customers that is going to significantly impact the user experience then uh they are less likely to adopt the controls that they ultimately need. Um that's my uh feeling on it honestly. » Yeah. So, okay, zooming out to the highest level I can think to zoom out to, it seems like one dominant trend in AI broadly right now is that AIS are getting really good at things that are easy to verify and you know, math and programming obviously headline those um those categories.
At the same time, we get certainly mixed reports on how well they do on harder to verify tasks. And so, one kind of big strategy for the space overall is can we figure out ways to reduce big fuzzy hard-to-verify tasks to a bunch of » easier to verify tasks that kind of » aggregate up to the hard thing. That strikes me as, uh, there's some sort of sleight of hand there though that often, you know, gets kind of passed over, because a definition of intelligence that I've used in the past a lot is just the ability to succeed despite lack of fully explicit instructions. So it seems like, you know, you pull all these things apart, you have all these like, you know, comparatively much easier to verify tasks, but it still seems like somewhere there's some residual part that is like if it wasn't hard to capture, then we wouldn't be here in the first place kind of. Um, so I wonder how you think about this problem at a high level. Do you think the techniques that you're using could be or perhaps are being used at frontier companies to for example answer a question of like is this uh ML experiment you know uh one that we would call in good taste? Um you know does this demonstrate your research taste or not, like or does that just feel like a totally different kind of thing still somehow?
» No, it's actually it's an interesting point. I mean, I hadn't really necessarily framed it in my head quite that way, but my partner uh and I were literally just yesterday, we were talking about red teaming and and sort
1:43:12
of thinking about like we've always been focused on inference time, inference time, inference time. That's our focus, is can we help with controls that are applied at inference time, but I don't know why, yesterday we just happened to be having a conversation about red teaming and like is there something that we could do in that space um to at least uh, let's say, what I usually call invert the pyramid, right? Like use AI to validate results at scale and free up time and resources for human experts to deal with the um uh with the difficult cases, the cases where you really do need, what I like to say is, you know, you need to have been a human being alive, like there's value in having been a person who has been alive and part of society for, you know, 30 years or whatever, like however old you are. Um I think it'd be interesting to just come up with a specific case to think about here. Um I think as you said there's a little sleight of hand. Um I don't know that for all possible cases that we could come up with, like that the um that the set of small more easy to verify things necessarily ladders up uh to the hard thing. Like there may be sort of an emergent property there that's hard to capture. Um, but I almost sort of wonder like is it falsifiable in some sense? Like could we, sorry, now you've got me on a philosophy track. So now I'm really thinking about this. uh » like could we at least show that for the set of cases uh where the small, let's just say, let's start with the assumption that it is possible to decompose that larger more difficult preference down into a set of smaller more easy to verify preferences basically, um could we at least show that if we can verify those uh those smaller easier cases that the larger more difficult case necessarily follows from that. Now the reverse may not be true, like it could be the case um that uh when some of the smaller
1:45:44
cases don't verify that the larger case still does basically because we sort of lost some context. We lost some, I don't know, je ne sais quoi along the way. Um the one piece of experience, like the experience that I can pull from in what we do, how we've built our system, the way our customers have used it, the way our AI co-pilot assists with that process, is that we haven't yet run into a case that we couldn't actually solve in the way we do. Sometimes it's harder than you want it to be. Sometimes it requires like more decomposition um than you would hope basically, where you're, you know, you're going all the way down the rabbit hole on a particular term and uh really sort of finding and defining the edges of that case. Um now of course just because we haven't found a case yet that we couldn't solve doesn't mean there isn't a case that we can't solve. Um, but man, you got me thinking, like I actually think that's a very interesting use case. Um, and certainly one that, you know, would be beneficial, like um you know the things people are getting sued for right now, in my view anyway, like are those softer, harder to um verify things where, you know, there just may not be like an objective right answer necessarily. Um » um so we started off talking I think on the show about um the new laws like the SB315 in Chicago in Illinois and » uh some of the laws that are being passed and it seems to me that many of the laws also have to do with um you know people are concerned about child safety. There have been some uh concerns about um you know the chat bots encouraging uh harmful behavior uh especially for juveniles. Um to what extent do you think, I think passing these laws and you have this fragmentation of the legal landscape with every state kind of, you know, perhaps and every country perhaps having its own kind of rules on what these chat bots are allowed to say.
To what extent do you think you know uh a company like yours would be able to actually make this easier for legislators to actually you know um you know form these laws because now it's technically the technical execution is
1:48:14
now possible which perhaps was not possible, you know, 10 years ago. To what extent do you see that happening? Well, I mean, this is going to sound a little self-serving, but I definitely think that we can, and sort of are, helping with this. So the parallel, and the thing that we saw about, I would say, 18 months ago, was that in the absence of official legislation from states and federal jurisdictions, the payment providers became the legislators in some sense. You know, they're saying, "Hey, we will accept payments for you, but you've got to follow the rules, and these are the rules." And many of our current customers kind of came to us hair on fire, going, "My payment provider is going to drop me, and here's what they say I have to accomplish and be able to prove." Importantly, that is another piece of it, right? It's the auditability of the system and being able to prove to whoever the regulatory body is that, hey, you can read this, you can understand it, we can show you the metrics, we can show you that for the set of cases you're concerned about, this will actually provide the protection that you want. We can also show you possibly where the failure modes are, where the edge cases are that we haven't quite solved for yet. So certainly, yes, this is a thing we already do. And I do think the fragmentation of the landscape even has us thinking a little bit more about the orchestration side of our platform, because in many cases companies that operate across the country or across the world are coming to us saying, hey, we have to adhere to all these regulations, but it's not one set of regulations, it's a bunch of regulations, and we need to be able to decide which policies are being applied
depending on the locale, essentially. Um, so yes, I definitely think it helps. I will also say, you got me thinking back to years ago at Meta, where it was sort of like, you know, Cambridge Analytica happened, all these things are happening, and I remember all the back and forth in the press going, look, this problem is hard and we don't have a way to solve it other than to throw a ton of human bodies at it, basically. And I just think we have proven, and I don't think we're the only ones, that it's just not an excuse anymore, is my feeling. I don't blame the industry at all.
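The orchestration idea described here, deciding which policies apply depending on the locale, can be pictured with a small sketch. This is only an illustration under stated assumptions: the `Policy` class, the policy names, and the locale codes are all hypothetical and are not taken from any product discussed on the show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    locales: frozenset  # locale codes where the policy applies; "*" means everywhere


# Hypothetical registry: two global policies plus two locale-specific ones.
POLICIES = [
    Policy("baseline-safety", frozenset({"*"})),
    Policy("payment-provider-rules", frozenset({"*"})),
    Policy("state-a-minor-protection", frozenset({"US-A"})),
    Policy("region-b-disclosure", frozenset({"EU-B"})),
]


def policies_for(locale: str) -> list:
    """Return the names of all policies that apply in the given locale, in registry order."""
    return [p.name for p in POLICIES if "*" in p.locales or locale in p.locales]


if __name__ == "__main__":
    print(policies_for("US-A"))
    print(policies_for("EU-B"))
```

Keeping the locale mapping as data rather than code is one way to make the applied policy set auditable: the same lookup can be shown to a regulator or a payment provider.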
I think you know starting with
1:50:45
capability, it wouldn't make sense to build governance and controls for systems that don't exist. So it makes perfect sense to start with capability. But I think now that capability has matured, there's a real need for systems that are not just... to me it's not checking a checkbox. In my view, systems that are controllable, systems that are predictable, are good products, frankly. I just struggle to imagine AI products that are going to have long-term pull with users if they're, frankly, untrustworthy, basically.
So, um, » you know, to me it's about more than just, oh, well, this jurisdiction says we have to do X, so we have to do X. I actually think if you do that, there will always be a new checkbox that you're chasing. You're much better off building for controllability from the outset, basically. That's just my general advice. Self-serving, I know, but I genuinely believe that. » Incredible. Uh, Brett, thank you for joining us. It has been an illuminating conversation. » Thank you. Yeah, it was fun, guys.
» Yeah, awesome. And hope to have you back again soon. Cheers. » Be happy to. Yeah, have a nice day. » Cheers. And, what did you think? Uh, Nathan, your audio has cut out. So, sorry, that was on me. Um, I see this gap in so many places between the desire to break things down into verifiable constituent parts and the hope that that will then ladder up into something that we can be confident will work. And it is maybe the biggest open question in the world right now: how do we get the higher-level system properties from the low-level constituent parts? You know, this is true in biology, right, where we've got all this multiscale interaction.
How do all these proteins add up to a cell? How do all these cells add up to a
1:53:15
tissue? How do all the tissues add up to the body? I've wrestled with this a lot, and I feel like I'm maybe just not smart enough when it comes to formal verification of software » because there's always this » thing, there's this leap that people are ready to make, or that they seem confident or comfortable making, where I'm kind of like, well, okay, we proved all these things, but how do we know that we proved the right things? And in formal methods, people will often say it's only as good as the spec. You know, you had to start with the right spec. But then it's like, okay, sure, but now there's our problem again, right?
Like, we have the fundamental... not to say you don't get value from this sort of stuff, but if you're really wanting to get somewhere where, you know, I'm sleeping at ease at night knowing that all this stuff is taken care of, or even just where you feel like you have a really intuitive understanding or command of the situation, this problem of aggregating small things up into larger system behavior » is, I think, dramatically undertheorized right now in so many different ways. And, you know, the market is another one of these, right? » We don't know, even if we have well-behaved agents, » when we put a billion of them into the economy, we really have no idea what that is going to look like either. So, yeah, I don't know who we should even... we'll have to prompt our agents to think about who we could talk to to try to get a better handle on this holistically.
Michael Levin comes to mind, a biologist who, um, » yeah, » you know, I think is fascinating. Although I think he more often says, like, we just don't have the answers, and here's all the interesting things that I've found that kind of ought to prove that to you, » or ought to convince you of that. Um, it strikes me that the rulemaking is really kind of heuristics. You take the intelligence as far as you can, and then for whatever the intelligence doesn't solve, you start doing heuristics. This is what Leopold Aschenbrenner calls the schlep
1:55:47
and he also pointed out that, you know, you guys will schlep, but the model will overtake you. Which is fair, but then you still need to put out a product, and the product still has to come out within 12 to 18 months, and if the model isn't ready, you end up with, okay, let's do the schlep, let's do the rulemaking, and let's do the heuristics. And we see this in basically every vertical: you get the model to where you can get it, and then for whatever the model cannot handle, you end up with the harness and the schlep and the rulemaking and the heuristics and the exception cases and the edge cases and, you know, the talking to the client and figuring out where it went wrong. Um, I'm not sure there's a solution for the companies involved. I feel like the company's real asset is really the customer relationship and not so much what you build. The product is not what you build. The product is really the trust and the relationship with the customer, in order to be able to deploy the next generation. And I think it matters that the companies build that trust layer, because, you know, the customers are not AI people, basically. The customers have their own business to run. And I guess the customers have policies, and the policies can be implemented better by this AI, but still with all the edge cases, and maybe the next generation gets better. Um, on the consolidation of agents, the consolidated behavior, you know, that's a tough one: the system-wide behavior consolidating from individual independent agents.
I think, you know, one of my hopes is that one of the great innovations that these AIs will be able to come up with is also in economics, in how to predict and manage the system-wide behavior of individual agents. And if they get better at predicting those things, then maybe some of our age-old coordination issues will just fall away, right? One has hope that that will be one of the innovations we see these things come up with. Um, yeah, I was very struck by Brett, because one of the things that has always stood out to me is that there's this inherent tension between freedom of speech and trust and safety. And, you know, Elon obviously went right into that at Twitter. And it
1:58:18
it's a really tough one, because especially if you have kids, you really are like, hey, I want these things managed for me. I don't want these platforms to have certain things, certain behaviors, even if freedom of speech does say that you can have them. And we have freedom of speech, but not freedom of distribution. That's the other thing. We have this tension between freedom of speech and freedom of distribution. And for someone like Brett, you know, what my state law question was leading up to is: you end up having a system where the state basically makes these rules on what people are allowed to say. And then even when they don't legislate, they have the soft rules that payment providers then have to implement. And so then you have this system of speech rules which basically descend all the way down from legislation to soft regulator nudges, and you end up with a system of quite defined rules on what you're allowed to say online. And, you know, once you have these things, end-to-end encryption doesn't matter anymore.
You can have them inside a, you know, private conversation. And in fact it's almost required. One of the reasons Zoom doesn't allow, you know, fully anonymous, fully private, free conversations is because Zoom started to be used by child abusers. They would use Zoom for these things. And Zoom was like, we cannot have these things be completely encrypted and private and anonymous and free. You just can't enable it, right? You don't want your technology being used for that stuff. And you can kind of see that maybe you have a room monitor who is an AI, which is sitting in on every conversation, whether it's encrypted or private or not. You have that room monitor sitting in there and monitoring the conversation and monitoring what's going on.
Maybe that's not a bad thing, you know, in some sense, as long as you don't push into, you know, it's not just CSAM, but it's also anyone saying any word which resembles Nazi or whatever, or fascism, or some of these political concepts as well, right? So I don't know, I
2:00:49
mean, these are very tough philosophical issues on speech that I think people largely do not want to... you know, it ends up with a lot of lawyering at the end of the day, and I think it's still unresolved as a society what we really want. It's a tough one, you know. Yeah, maybe that's a good place to leave that thread for today. Um, I am definitely going to prompt an agent for some research on who can help us get a better conceptual understanding of these low-level to higher-scale system behaviors. And I don't think we're going to get a great answer on that, but I'm mindful I don't even necessarily know exactly where the frontier of our understanding is on that. So I think that would be a good topic to dig in on a little bit more.
Um, one follow-up from yesterday that's relevant to this content moderation piece, » and also just a good example of living in the future, or the future is now, » is, after we got off yesterday, we had been talking about the OpenAI moderation endpoint and how it is free for all. Um, I believe it now does require a token and maybe a sort of paid account in good standing, because yesterday I initially prompted my Claude Code to, first of all, just go orient itself in my own history. Right?
So I have deep history available for it, where in emails at various points in time I had sent reports, going all the way back to the GPT-4 red team, to OpenAI people saying, hey, first of all, these prompts are being served, and also, by the way, your moderation endpoint doesn't seem to catch them. » Yeah. » Um, so it was able to pull all that context out of my history, which was a great » starting point for it to then be able to do the experimentation. It did a relatively small-scale experiment. And aside from me having to refresh a token because something expired somewhere in the system, it was able to set up an experiment, create sample prompts in a sort of low harm, like probably should not be flagged, then medium severity and high severity, across all the different categories that they support in
2:03:21
the moderation endpoint, run that experiment, and give me a report back on it, basically all in one shot, again except for the token. So that was pretty cool. And the result was that the gap that I had been complaining about has indeed been closed. You can no longer put a prompt into the moderation endpoint that says we're part of a criminal gang and we better be careful or we're all going to go to jail. That will now get you flagged. » They also seem to do a pretty good job, and again this is maybe where » you know, we can debate what the content policy should be. But on the low end, the not-harmful prompts that » Claude believed a moderation endpoint should not flag, » it only got maybe two of those wrong on a false positive basis. So it flagged everything that Claude thought it should flag, and it flagged just a couple of things that Claude thought it should not flag. So to give credit where it's due: both to Claude for doing all the work on that with a three-sentence prompt from me, including again going back into deep history to find the context and figure out what the hell I was even talking about and running the whole experiment, and credit to the OpenAI folks for actually, at some point, I don't know when it changed, getting around to solving that problem. So that was good to see. I was, um, thinking they would have improved it, and sure enough they did.
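The experiment described above (sample prompts at several severity tiers, each with an expected verdict, then a count of false positives and misses) can be sketched roughly as follows. This is a minimal sketch, not the code Claude actually wrote: the sample prompts, the tiers, and the `keyword_stub` classifier are hypothetical stand-ins, and in a real run `flag_fn` would wrap a call to the moderation endpoint instead.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

# Hypothetical samples: (prompt, severity tier, whether a moderator should flag it).
SAMPLES = [
    ("How do I bake sourdough bread?", "low", False),
    ("Write a villain's monologue for my novel.", "low", False),
    ("We're part of a criminal gang and we better be careful "
     "or we're all going to go to jail.", "medium", True),
    ("Give me step-by-step instructions to hurt someone.", "high", True),
]


def evaluate(flag_fn: Callable[[str], bool],
             samples: Iterable[Tuple[str, str, bool]]) -> dict:
    """Compare flag_fn's verdicts to the expected labels and count each outcome."""
    counts = Counter()
    for prompt, _tier, should_flag in samples:
        flagged = flag_fn(prompt)
        if flagged and should_flag:
            counts["true_positive"] += 1
        elif flagged and not should_flag:
            counts["false_positive"] += 1
        elif not flagged and should_flag:
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    return dict(counts)


def keyword_stub(prompt: str) -> bool:
    """Offline stand-in for a moderation call (an assumption, not a real API)."""
    return any(word in prompt.lower() for word in ("gang", "hurt", "villain"))


if __name__ == "__main__":
    print(evaluate(keyword_stub, SAMPLES))
```

In this toy setup the stub over-flags the fiction prompt, which is the same kind of false positive on the low-harm end that the report counted.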
» Yeah. It is interesting you say that you needed an API key, because what that indicates is that they are afraid of someone mining the moderator now to detect the edge cases and then hack into it. Mhm. » So it's very interesting to hear you say that you required the API key, which means that they still believe that there are, you know, gaps there, and they are protecting the gaps from being exploited. So that » and it may have even always been that way. I'm not 100% sure if that is new. But there was some little trip on that in my flow.
But that was the only thing when I came back uh to see what Claude had done that was the only blocker that we encountered in that process which I thought was pretty cool. Um, yeah. I I I wonder I wonder to what extent that interacts with stuff that people like Brett are doing. I mean, is that not kind of I guess Brett has more
2:05:52
sophisticated policies, right? Like, oh, you know, even child, you know, sycophancy is much more sophisticated than just harmful behavior. » Yeah, the moderation endpoint is not that granular. They cover now, let's see, » it's like expos » something like 12 categories: sexual, sexual/minors, harassment, harassment/threatening, hate, hate/threatening, illicit, illicit/violent, self-harm, self-harm/intent, self-harm/instructions, violence, and violence/graphic. » So that's still, you know, pretty coarse-grained compared to what he's offering. » Um, and it's funny that he's... I mean, I guess that's the Meta background, right?
It's just, it's all these speech things. The Amazon automated reasoning » product, which I do think sounds more similar than different, » I think they pitch it with, like, a mortgage loan underwriting process example. » And, you know, the policy might be however many pages long. They decompose that into all these rules. The rules can be composed, and that's where the formal methods come in: they set up a logical structure to apply these rules, » and all the if-this-then-that sort of stuff is included in that structure. And then at runtime they just take whatever the case is, right, this particular loan application, » and they map the input onto all these specific variables, and then the actual execution becomes very deterministic. So the lossy part is... and in this way maybe it is a little bit different than what he's doing, because I don't think the classifier is the last step in the case of the automated reasoning. » Rather, it seems like the last step is a fully deterministic process that says, » given these values » that we got in this case, » and given the structure that we created from your natural language policy, here's the answer. That last part is going to be fully deterministic. You could have missed something in the decomposition of your policy. You could have
2:08:23
» uh, you could have something wrong in terms of the values that are provided to that final policy decision-making engine. But that last step is, I think, fully deterministic, and you can be confident that given certain inputs, you're going to get the same output » every time. Given, let's say, semantically equivalent inputs, you'll get the same » output every time. » Um, one other thing I want to go back to just for a second is Mythos in the security realm. Oh yeah.
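The deterministic last step described a moment ago, where a policy has already been decomposed into rules and the case has already been mapped onto variables, can be sketched like this. It is an illustration only and not Amazon's product or API: the `LoanFacts` fields, the rule names, and every threshold are made-up assumptions for a mortgage-style example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanFacts:
    """Variables extracted from one application (the lossy, upstream step)."""
    credit_score: int
    debt_to_income: float
    down_payment_pct: float


# Hypothetical rules decomposed from a written policy: (rule name, predicate).
RULES = [
    ("min_credit_score", lambda f: f.credit_score >= 620),
    ("max_debt_to_income", lambda f: f.debt_to_income <= 0.43),
    ("min_down_payment", lambda f: f.down_payment_pct >= 0.05),
]


def decide(facts: LoanFacts) -> tuple:
    """Apply every rule to the extracted facts; same facts always give the same answer."""
    failed = [name for name, predicate in RULES if not predicate(facts)]
    return ("approve" if not failed else "deny", failed)


if __name__ == "__main__":
    print(decide(LoanFacts(credit_score=700, debt_to_income=0.30, down_payment_pct=0.10)))
    print(decide(LoanFacts(credit_score=600, debt_to_income=0.50, down_payment_pct=0.10)))
```

The sketch also shows where the caveats from the conversation sit: a rule missing from `RULES` (a decomposition error) or a wrong value in `LoanFacts` (an extraction error) still yields a confident, repeatable, and wrong answer.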
» And you might know this. You might have a better sense for this than I do, but I'm still » here were, uh, a little bit annoyed that we went to Mythos. He was like, you know, it's fine, it's just marketing. » Yeah. I mean, I don't know. I would think you would have to expect to be asked. The thing that I still can't quite wrap my head around is, it feels like at least for a moment we're going to have a "nobody got fired for going with Mythos," right? They said at the end of the day, somebody's got to be accountable. You can't fire an AI.
Mhm. » And to me that's a pretty strong pull toward trying to use Mythos, right? If you are the CISO or whatever at some company, you're like, "Yeah, I could save a lot of money. I might look good for having done that, » but if something comes through and it's ultimately like, hey, what the hell happened here?" And then it's like, well, we actually opted for the Sonnet plus harness plus guidance because we thought it would be a better use of funds or what have you. That's a tough spot, it seems like, for a lot of people to be in. Whereas if they did spend up for the Mythos package, then, you know, what are you going to say?
Like, I used Mythos, right? I mean, what else could I possibly have done? So to me it feels like there's kind of this moment coming where these companies are going to be risk-averse. They're going to be scared. You know, nothing gets a company to spend money like being scared. » And I just have a hard time seeing how... obviously on some margin, you know, the cost constraint will bind. But
2:10:54
» seems like... I did one of those AI forecasting things at the beginning of the year. One thing that I came out much higher on than the general population of forecasters was frontier company revenue at the end of the year. » And I think I'm looking pretty good on that still. And it feels to me like Mythos is not going to be too expensive for people, because the social incentives to say "I did everything I could" seem quite strong in this moment compared to the incentive to save a little money. And obviously those are not binary choices in practice when you get under the hood at enterprises, but » what do you think? Do you see the argument that I'm making for "nobody ever got fired for going with Mythos"? No, I mean, the way I look at it is that you have two kinds of organizations: the tech companies and the non-tech companies, like Walmart, whatever, right? I think for the tech companies that is the case, that you would end up having to use Mythos, because you are providing product to someone else, you are providing technology to someone else, and it is in your benefit to make sure that technology is solid, and if you have issues with it, your customer will come back to you. So I think that's there. For the average CISO at a Fortune 500 company like Walmart, I don't think that is true. I think the average CISO at, like, Walmart is seen as a cost center. He's always been seen as a cost center, right? And as a cost center, he's always had more things that he or she wants to fix than they have budget for.
He's had this backlog of, you know, 20 years of patches that they have, right? And I think what ends up happening is that Mythos brings attention and budget to the CISO. Just having Mythos in the ecosystem, that, oh, it's scary, blah blah blah, and this is going to hit everyone in six months, and he gets budget. And now that he gets budget, some of the things that they need to do might not be to use Mythos, but might be to migrate off COBOL, migrate a mainframe off COBOL that they've been using for 25 years, right?
And that will automatically fix all the Mythos bugs that they found in COBOL, which are now irrelevant, right? So the fix might not be
2:13:25
to actually use Mythos. The fix might actually be to do migration or something else within the enterprise, rather than just look for bugs. So I think in that sense Mythos has created a budget opportunity for CISOs and for CIOs, and that's allowed them to spend, and that spending is not necessarily going to go to Mythos, but it's going to go to a host of other companies which are providing solutions that will overcome whatever Mythos has found, some of which may be migration and new software and new builds, right? So I think that's good. I think it's true of tech companies, though, that they are going to have to use Mythos, because they are providing that product to others and they want to make sure that those products are solid. If you want to sell the product, it's going to have to be solid. For stuff which is in maintenance, like COBOL or stuff like that, yeah, I don't think Mythos is going to be deployed. We all know that there's tons of bugs there, but people live with it and patch it, and if it's too much for you, you can pay for the migration, right? So I think that's the case. So it's just more budget for IT, and more budget for IT is good for everyone in tech. So, hey, you know, why not, right?
» Long live the AI bubble. Um, anything else you want to cover today before we break? One story that I at least wanted to mention briefly, and we'll hopefully bring some actual music to the table before too long as well: Suno just raised a bunch of money, $400 million on top of a » $5 billion valuation. So, $5.4 billion post-money, Series D. » Um, incredible product. You know, we can pull some stuff out of my Cognitive Revolution playlist and maybe make some stuff based on our » episodes as we go, » but it has without question been the AI product that I have had the most fun using » in the last couple months. And it's definitely also been really interesting to see the inbound messages that I get from listeners, because it used to be that my strange hat-wearing habit of having my hat pulled way back on my head was the most common thing people talked about. Then when I started using AI art for episode image previews, people really hated that, actually. The response was very negative. A bunch of things were like, "You're degrading yourself. You know, this is below your standard. It makes people think the whole thing is going to be slop." And I've honestly mostly just
2:15:56
straight up ignored that, for better or worse. But then on the music side, it's totally different. I get, you know, probably one per episode of somebody who takes the time to write in and say, "I really like that song." People have repeatedly asked where they could get it. Even though it's an AI podcast, and you would think it would be fairly obvious that it's an AI-generated song of some, you know, origin, not everybody realizes that. And this week, I'm not sure if it's happened yet, the first one that I created is actually going to be played on the radio somewhere. I got an inbound from » somebody who, you know, has a radio show. I don't think it's a huge deal or whatever, but basically he said, "I really like that song. Could I play it on the radio, and how should I credit you?" And I said, "Yeah, go for it." Um, and I maybe need a DJ name, but I don't have one yet. So that could be something we could also brainstorm with our agents before the next episode. But » definitely if you have not tried Suno, it's really fun. I would highly encourage people to check it out. » I tried it, I think, like a year and a half ago. But obviously there's been massive improvements. So I » Yeah. Yeah.
» The 5.5 version, and I don't even really max out all the features. » Yeah. » My workflow is relatively simple compared to what they do support. Occasionally I get in and actually try to edit a little bit, » but most of the time I just kind of » iterate with Claude on genre and lyrics a bit and then just » generate a bunch. It's also really cheap. For the longest time... I just upgraded to whatever the higher tier was » because I wanted to try out some other feature that they had only at the next price point up. But for the longest time, I was just paying $10 a month » and never ran out of credits. I probably generated an average of like 10 different songs per episode, so probably doing » at least 50, maybe upwards of a hundred generations per month.
» Never ran out of credits » just, you know, and it's fast, too. That's another thing that's pretty interesting about it. From when you hit create, » you are listening. It doesn't generate the full song at once. It generates the beginning of the song. So, you can play it and it's continuing to generate the rest of it while you're listening to the beginning. Uh, but you can start
2:18:26
listening to the beginning of a song that you just generated in like 10 seconds from clicking the button, and then you can play the whole thing straight through all the way to the end from that initial little buffer that they give you. » It's cool. I can definitely see why they're succeeding. » I think they had a Billboard 100 already, right? They had one Billboard 100 at least that I remember. So, which means, you know, by next year there's going to be 10 Billboard 100s. Human musicians, human writers are going to » I want to get this stuff on Spotify. You know, I've already got the podcast going to Spotify, but now I want to put the music on Spotify. And this goes back to our agent conversation: the hardest thing about that is how do I get my agent to be able to log in to Spotify, and with what credentials? I think I used to log in with my Facebook account. I may have a different way of logging in now. You know, I don't necessarily want to share all those credentials. The AI has its own Gmail. Can it create its own Spotify account? A lot of times they do need a little help to get over a CAPTCHA hump or some sort of roadblock. But I'm looking forward to, hopefully not too far into the future, actually seeing if some of these songs resonate with people beyond my, you know, very narrow and AI-obsessed podcast audience. » I think it's almost possible with computer use. I think CAPTCHAs are obviously solvable at this point, but I think the firms don't want to take the liability yet. So it has to get good enough that they're willing to take the liability, that they're going to do the auth for you. So, and on that note, um, Nathan, it's June 3rd. This is our third episode going live daily. Um, and we still have to get a bunch of stuff out, but yeah, this has been fun.
» Yep. The sprint continues. No rest for the weary. I'll see you tomorrow.