Discussion about this post

Tedd Hadley

See also https://www.dwarkeshpatel.com/p/leopold-aschenbrenner

> When he says “we just need to teach the model a sort of System II outer loop”, there’s an awful lot hiding in that “just”. The entire argument is full of handwaves like this.

The key impression I got from all this is that Leopold has access to the private conversations at the top tech companies, the off-the-record discussions, and most importantly the mood. My conclusion: they are not discouraged in any way. We might be discouraged about the lack of agents right now and the huge challenge of System II, but they are not.

And that can only be because they have two or three things working now, two or three things to try next to improve on that, and two or three things to try after that. That's the way I get optimistic about a problem: when the solution tree is boundless. I think Leopold's optimism reflects the optimism of the top engineers at the top AI companies.

What I suspect is going on is that OpenAI and others are already sitting on huge discoveries that they would prefer never see the light of day. Secrecy and closed AI are quickly becoming the new norm; Leopold was just a bit early in seeing (and pushing) the importance of that.

(It is of course very disturbing to me that so much important technological progress could happen in complete secrecy for the next decade or more, leaving us in the dark about the true capabilities of AI models.)

On System II unhobbling: was it just handwaving, or did he in fact give away a bit of why this might not be such a large problem? Consider:

From the website:

"But unlocking test-time compute might merely be a matter of relatively small “unhobbling” algorithmic wins. Perhaps a small amount of RL helps a model learn to error correct (“hm, that doesn’t look right, let me double check that”), make plans, search over possible solutions, and so on. In a sense, the model already has most of the raw capabilities, it just needs to learn a few extra skills on top to put it all together. "

From the interview:

"In the short timelines AI world, it’s not that hard. The reason it might not be that hard is that there are only a few extra tokens to learn. You need to learn things like error correction tokens where you’re like “ah, I made a mistake, let me think about that again.” You need to learn planning tokens where it’s like “I’m going to start by making a plan. Here’s my plan of attack. I’m going to write a draft and now I’m going to critique my draft and think about it.” These aren’t things that models can do now, but the question is how hard it is."

From the interview:

"What humans do is a kind of in-context learning. You read a book, think about it, until eventually it clicks. Then you somehow distill that back into the weights. In some sense, that's what RL is trying to do. RL is super finicky, but when it works it's kind of magical."

I submit that this confidence comes from secret conversations and direct experience: the AI companies are doing RL on planning tokens.

> For my part, both paths look perilous, so I hope to hell that ASI is a lot farther away than 2030.

Agreed. While taking Leopold's predictions seriously could be dangerous, ignoring them could be dangerous too. We need more time!

Thalia Toha

Steve- I never thought that situational awareness would be something we talk about in the context of technology, let alone life, so this is very informative. I appreciate you sharing. Hope you're well this week. Cheers, -Thalia

