Discussion about this post

Tedd Hadley

See also https://www.dwarkeshpatel.com/p/leopold-aschenbrenner

> When he says “we just need to teach the model a sort of System II outer loop”, there’s an awful lot hiding in that “just”. The entire argument is full of handwaves like this.

The key impression I got from all this is that Leopold has access to the private conversations at the top tech companies, the off-the-record discussions, and most importantly the mood. My conclusion: they are not discouraged in any way. We might be discouraged about the lack of agents right now and the huge challenge of System II, but they are not.

And that can only be because they have two or three things working now, two or three things to try next to improve on that, and two or three things to try after that. That's the way I get optimistic about a problem: when the solution tree is boundless. I think Leopold's optimism reflects the optimism of the top engineers at the top AI companies.

What I suspect is going on is that OpenAI and others are already sitting on huge discoveries that they would prefer never see the light of day. Secrecy and closed AI are quickly becoming the new norm; Leopold was just a bit early in seeing (and pushing) the importance of that.

(It is of course very disturbing to me that so much important technological progress could happen in complete secrecy for the next decade or more, leaving us in the dark about the true capabilities of AI models.)

On System II unhobbling: was it just handwaving, or did he in fact give away a bit of why this might not be such a large problem? Consider:

From the website:

"But unlocking test-time compute might merely be a matter of relatively small “unhobbling” algorithmic wins. Perhaps a small amount of RL helps a model learn to error correct (“hm, that doesn’t look right, let me double check that”), make plans, search over possible solutions, and so on. In a sense, the model already has most of the raw capabilities, it just needs to learn a few extra skills on top to put it all together. "

From the interview:

"In the short timelines AI world, it’s not that hard. The reason it might not be that hard is that there are only a few extra tokens to learn. You need to learn things like error correction tokens where you’re like “ah, I made a mistake, let me think about that again.” You need to learn planning tokens where it’s like “I’m going to start by making a plan. Here’s my plan of attack. I’m going to write a draft and now I’m going to critique my draft and think about it.” These aren’t things that models can do now, but the question is how hard it is."

From the interview:

"What humans do is a kind of in-context learning. You read a book, think about it, until eventually it clicks. Then you somehow distill that back into the weights. In some sense, that's what RL is trying to do. RL is super finicky, but when it works it's kind of magical."

I submit that this confidence comes from secret conversations and direct experience: the AI companies are doing RL on planning tokens.

> For my part, both paths look perilous, so I hope to hell that ASI is a lot farther away than 2030.

Agreed. While taking Leopold's predictions seriously could be dangerous, ignoring them could be dangerous too. We need more time!

Thalia Toha

Steve- I never thought that situational awareness would be something we talk about in the context of technology, let alone life, so this is very informative. I appreciate you sharing. Hope you're well this week. Cheers, -Thalia

