The Centaur Era

The question isn't what AI can do, it's what you can do with AI

Apr 17, 2026

We talk about “AI capabilities” as if there are things AI can do, and things it can’t. But there’s a huge messy middle, where outcomes depend on patience, skill, and luck. ChatGPT might generate a sound analysis if prompted one way, but reflect the user’s biases if prompted differently. Gemini might diagnose my medical condition, but not yours. Claude Code might be unable to build a complex app on its own, but succeed with a bit of guidance from an experienced developer.

The most powerful uses of AI fall in this intermediate zone, and yet it is poorly studied and publicized. This discourages people from learning to get full value from AI tools, and confuses the debate regarding AI progress. Benchmarks don’t explore the sorts of open-ended tasks that often arise in the real world; they also don’t measure what people can accomplish when working in conjunction with AI[1]. (Human-computer collaborations are sometimes called “centaurs”.)

Today’s post is a combo pack. I’m going to share my experience building an AI research agent, and the weaknesses it exposed in the reasoning abilities of current AI agents. I’ll explain how I managed to get good value from the AI’s research despite the flaws. I’ll wrap up by presenting implications for the pace of AI progress. And I’ll illustrate all of this through an analysis of Elon Musk’s proposal to put data centers in space.

Using AI as a Research Agent

I have a mental list that I keep adding to. You might call it Topics Where Someone Really Ought to Do The Math. These are debated questions about the real world that would benefit from a hard-nosed quantitative analysis. I’d like to do those analyses, but who has the time? AI, that’s who.

As a starter problem for AI investigation, I chose the idea of building data centers in orbit. There’s been an interesting debate about this recently. Elon Musk used it as justification for merging his companies SpaceX and xAI, aiming for an IPO at a stratospheric valuation of $2 trillion. Prominent GPU industry analyst Dylan Patel argues that it’s economically impractical.

Why would anyone take a perfectly good data center and blast it into space? The argument goes like this: in the race to build ever-larger clusters of AI chips, it’s getting hard to find locations with an adequate supply of electricity. In orbit, there’s 24-hour solar power, the inky blackness provides free cooling, and there are no environmental regulations or construction moratoriums. (Spoiler alert: at least two of these claims turn out to be exaggerated.) So there’s no need to keep scrounging for gigawatts on Earth.

Does this argument hold water? To see whether the question falls within the zone of current AI competence, I posed it to ChatGPT Pro – generally acknowledged to be the most capable analysis tool among mainstream AI offerings.

ChatGPT Pro Thinks for 42 Minutes and Comes Back With a Useless Muddle

I told it to start by reading three recent publications, including transcripts of interviews with Elon Musk and Dylan Patel:

Prepare a report on the potential economical viability of orbital data centers for AI inference over the next 15 years. Specifically address the relevant ideas and arguments presented in https://www.dwarkesh.com/p/dylan-patel, https://www.dwarkesh.com/p/elon-musk, and https://caseyhandmer.wordpress.com/2026/02/10/i-guess-were-doing-moon-factories-now/. Make sure to do your own research and analysis – don’t rely excessively on these three sources.

Its report is, like many AI outputs, a sort of Rorschach test. At first glance, it appears impressively definitive and erudite; on closer inspection, it’s a pile of worthless mush. Readers can easily come away confirming whatever preconceptions they’d held, whether they believe (a) orbital data centers are the next big thing, (b) they’re the next big scam, or (c) the whole exercise is irrelevant because any such AI report will be stochastic-pigeon slop.

The seven-page report[2] cites 25 sources. The conclusions align with all three of the expert sources I provided, which is no small feat given that they extensively contradict one another. It’s full of jargon – all used correctly. But there’s no logic to the analysis. As the expression goes, it’s “not even wrong”.

The report doesn’t present any answer to the central question: how much will orbital data centers cost, and how does that compare to the earthbound alternative? It throws around plenty of relevant figures – solar cell efficiency, launch costs, etc. – but at no point does it assemble them into a comprehensive total or tie them to any conclusion. Instead, it confidently asserts a narrative that incorporates elements from each of the three initial sources I’d instructed it to read. Like a parent unwilling to play favorites, it appears to have looked for a way to agree with all three sources, rather than performing critical analysis.

So, that’s what I got in return for a tiny investment in effort – a single prompt. Next, I’ll present a more in-depth project to generate an AI analysis of this topic.

Creating a Research Agent in My Own Image

I’m a systematic kind of guy. If I were investigating orbital data centers, I’d begin by framing a precise question:

In what year is the cost of launching and operating an orbital data center likely to become competitive with terrestrial data centers?

This immediately suggests additional questions. What are the major costs involved in each type of data center? How will those costs evolve over time? Each investigation may spawn further questions. For instance, Elon Musk argues that it will soon be impossible to generate enough electricity for earthbound data centers. That raises questions such as: What will the demand for new data centers be? Will there be enough chips to fill them? What in fact are the prospects for powering them – whether with solar power, gas turbines, unused capacity from the existing electrical grid, or otherwise?

My typical research process is something like this:

Frame the question in precise terms.
Do some initial research.
Informed by this research, break the question down into smaller questions.
Research each sub-question, and combine the findings into an overall report.
Cross-check against key sources (such as the Elon Musk and Dylan Patel interviews). If they disagree with our findings, and our existing research doesn’t provide evidence to refute them, then perform further research to decide who is correct.

In this process, I keep adding new questions, and new sources bearing on those questions, until I can:

Provide a clear answer to the original question.
Explain how the analysis relates to each key source. Hopefully, on a point-by-point basis, I can either explain how my analysis is consistent with the source, or cite evidence that the source is incorrect.
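
For readers who think in code, the process above can be sketched as a worklist loop. This is only an illustration under my own assumptions, not the actual workflow: the names `Finding`, `run_research`, `research`, and `cross_check` are hypothetical, and the two stub functions stand in for the real work of reading sources and comparing conclusions.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Finding:
    question: str
    answer: str
    follow_ups: list[str] = field(default_factory=list)


def run_research(
    root_question: str,
    research: Callable[[str], Finding],
    cross_check: Callable[[list[Finding]], list[str]],
    max_questions: int = 50,
) -> list[Finding]:
    """Research questions in order, queue follow-ups, then cross-check sources."""
    queue = deque([root_question])
    seen = {root_question}
    findings: list[Finding] = []

    def enqueue(questions: list[str]) -> None:
        # Only queue questions we have not already asked.
        for q in questions:
            if q not in seen:
                seen.add(q)
                queue.append(q)

    while len(findings) < max_questions:
        if not queue:
            # Every known question is answered: compare with the key sources.
            # Unresolved disagreements come back as new questions.
            enqueue(cross_check(findings))
            if not queue:
                break  # nothing left to reconcile
        finding = research(queue.popleft())
        findings.append(finding)
        enqueue(finding.follow_ups)
    return findings


# Hypothetical stubs so the sketch runs end to end.
def fake_research(question: str) -> Finding:
    tree = {"When does orbital beat terrestrial?": ["Launch cost?", "Power cost?"]}
    return Finding(question, f"notes on: {question}", tree.get(question, []))


def fake_cross_check(findings: list[Finding]) -> list[str]:
    return ["Does a key source disagree on launch cost?"]


report = run_research("When does orbital beat terrestrial?", fake_research, fake_cross_check)
print([f.question for f in report])
```

The `max_questions` cap is a design choice of the sketch: without some budget, a cross-check step that keeps raising new disagreements would never terminate.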

I had Claude Code create a “skill” (basically, a set of guidelines) for itself to follow precisely this process. Over a period of a week, Claude and I worked together to refine the skill. It now incorporates multiple “passes”, each of which can be run multiple times. A pass might check for arithmetic errors, verify that each conclusion is supported by evidence, or prepare a point-by-point comparison of our conclusions to those of a published paper. The results are... still not what I’d hoped for, but they show promise. They helped me assemble what I believe to be a fairly well-grounded picture, which I’ll present next.

Stop Trying to Make Orbital Data Centers a Thing

Here’s the punchline: under present conditions, it simply does not make sense to put data centers in space. The only potential advantage would be access to plentiful solar power, and that’s not enough to make up for the complexity and expense of orbital deployment. In the extensive research performed by Claude, as well as the ChatGPT Pro report, no other advantages were identified that strike me as significant. Most of the cost of a terrestrial data center goes to components such as GPUs, memory chips, and networking equipment – all of which would be equally needed in space. This leaves very little scope for orbital data centers to come out ahead.

It is indeed getting harder to find places to connect data centers to the grid, at least in the US. However, the data centers are still getting built, often relying on off-grid solutions such as gas-powered generators. Because electricity constitutes only 10-20% of the lifetime cost to build and operate a data center[3], even expensive workarounds don’t add all that much to the overall project expense. The limiting factor on data center construction continues to be chip supply, not power[4].

Placing data centers in orbit would introduce a long list of expenses and challenges, such as:

Launch costs. Each AI server will require a healthy allocation of solar panels, batteries[5], cooling equipment[6], and other hardware. Putting all of this in orbit costs hundreds of dollars per pound today[7], and even very optimistic projections are still tens of dollars per pound.
Financing costs. Data centers have very high upfront costs. Lenders or investors will charge a higher premium for financing higher-risk space deployments.
Commissioning delays. It takes months to go from electronic components delivered to a satellite manufacturer, to the satellite being in orbit and operational[8]. This would waste the most valuable period of a GPU’s operational life, and add to financing costs.
Lifetime reduction. Launch stress, radiation, and temperature variations may lead to premature GPU failure.
Inability to repair. AI data centers need maintenance to replace faulty components and fix flaky network connections. This is not currently practical in space.

I’ve only hit a few highlights; there are many additional complications, unproven technologies, and other risk factors to be considered, without even getting into unknown unknowns.

One dilemma: how large should individual AI satellites be? On Earth, the trend is to group ever-larger numbers of GPU chips into a single “scale-up domain”, with intricately designed, ultra-high-speed network connections between chips. Current scale-up domains fill an entire server rack, using around 120kW of power, and megawatt-scale scale-up domains are on the horizon. By comparison, the entire International Space Station typically uses less than 100kW[9].

The move to closely integrate large numbers of chips is driven by the fact that larger scale-up domains can operate AI models more efficiently. However, building individual satellites of this size poses serious challenges, from the likely need for in-orbit assembly, to the increased likelihood of component failures.

On the terrestrial side, it’s true that there is a supply crunch for electrical components such as transformers and gas turbines. However, there are many ways to work around this problem. Anyone who wants to argue that it would be difficult to increase production of these components will need to explain why that is harder than bootstrapping the entire new industries that would be required for orbital data centers.

It’s also true that anti-data-center sentiment is rising in some parts of the US, sometimes making it harder to get construction approvals. In Maine, a bill imposing an 18-month moratorium is awaiting the governor’s signature. However, so far as I can tell, the industry is still finding places to build; I’ve seen no reports of GPU racks piling up in warehouses. And finding places to host a massive number of rocket launches would pose its own challenges.

The inescapable conclusion is that under current circumstances, orbital data centers are something of a fool’s errand[10]. This seems unlikely to change until one of the following two things occurs:

We’ve built so many data centers that a significant portion of Earth is covered in solar panels to supply power.
We’ve established a substantial industrial economy in space for other reasons – such that there is an established population of either astronauts or robots to perform construction and repairs.

Neither of these seems plausible for at least the next 5 to 10 years. As a result, orbital data centers are a topic for long-range research, not active commercialization. After reviewing the report generated by my new automated research agent, I feel reasonably confident in making this claim. However, I’m basing that on my own analysis – the agent’s analysis is riddled with flaws.

My Research Agent Gave Me Good Input, but Bad Analysis

I had my new Claude Code research skill perform many, many passes over the report: checking its own work, identifying new questions, and locating new sources. I also had ChatGPT Pro repeatedly critique the work; it turns out to be much better at poking holes in another agent’s output than at doing a rigorous job itself.

You can read the resulting book-length report here. It’s extremely detailed: 119,481 words across 46 web pages, drawing on 403 sources. That’s far more thorough than anything I would have undertaken on my own. But there are myriad errors in logic, and they make for interesting reading – revealing an emphasis on narrative over logic. Here are a few examples:

It sometimes gets things backwards. The report presents “optimistic”, “central”, and “conservative” estimates. When presenting the optimistic scenario for viability of orbital data centers, it uses its lowest estimate for the cost of earth-based electricity. That’s optimistic for terrestrial power, but pessimistic for orbital data centers.
It ignores facts that clearly contradict a narrative. To estimate the rate at which GPUs fail and need replacement, Claude relies on a paper from Meta, which describes a training run which experienced “148 ‘Faulty GPU’ interruptions over 54 days on 16,384 GPUs”. Claude notes that only 3 of these 148 interruptions required manual intervention, implying that most of the interruptions did not involve failed hardware. However, it still uses 148, not 3, as the basis for its calculation of failure rates.
It fails to enforce consistency across different sections of the report. At one point, the report discusses GPUs being “economically obsolete in 2–3 years”, an idea that is in widespread circulation, on the basis that GPU designs keep improving. However, demand is so high that even rental prices for five-year-old chips are skyrocketing – a fact that Claude cites elsewhere in the report.
At one point, when discussing the cost of terrestrial power sources, the report presents prices for solar panels plus four hours of battery storage, a typical configuration for non-data-center, grid-connected applications. It acknowledges that four hours is insufficient (nights last longer than that!), but doesn’t attempt to compute the cost for a freestanding system capable of powering a data center 24/7. It does note that “longer-duration storage (8-12 hours)... is required”, but doesn’t compute how that would impact costs, and in any case it’s still wrong – in winter months and/or bad weather, a pure solar+battery setup would require more than 12 hours of battery power.
In earlier drafts, it repeatedly cited an FCC “5-year deorbit rule” as limiting the maximum lifetime of an orbital data center to 5 years. This is obviously silly, as many satellites are used for longer than 5 years. The rule actually imposes a requirement for cleaning up satellites after they are no longer in use. Claude ignored hints about this from me; I finally had to explicitly point out that it was misinterpreting the rule.
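
To see how much the choice between 148 and 3 matters in the GPU failure example above, here is a minimal sketch that annualizes both counts. It assumes a constant rate over the run and simple linear scaling to a 365-day year; this is my own back-of-the-envelope framing, not necessarily the calculation the report performed, and the function name is made up for illustration.

```python
DAYS_PER_YEAR = 365


def annualized_rate(events: int, gpus: int, days: float) -> float:
    """Events per GPU per year, assuming a constant rate over the run."""
    return events / gpus * (DAYS_PER_YEAR / days)


# Figures quoted from the Meta paper: 54 days on 16,384 GPUs.
all_interruptions = annualized_rate(148, 16_384, 54)  # every "Faulty GPU" interruption
manual_only = annualized_rate(3, 16_384, 54)          # only those needing manual intervention

print(f"{all_interruptions:.1%}")  # 6.1%
print(f"{manual_only:.2%}")        # 0.12%
```

The two bases differ by a factor of roughly 49, which is why picking the wrong one can swing any downstream estimate of replacement cost.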

The full report contains dozens, possibly hundreds, of such errors – despite many rounds of refinement by Claude and ChatGPT Pro[11].

While the report’s analysis is faulty, I do believe that the iterative critique-and-refinement process has done a good job of surfacing relevant questions and evidence. On my own, I would have been unlikely to think about financing costs. I might not have bothered to question Musk’s assertion that space provides uninterrupted solar power[12]. I would have had much less access to quantitative estimates of many relevant factors. I would have been worried that I was missing some major consideration. The report, flawed though it may be, was invaluable in helping me to generate my own analysis.

I started out by referencing the messy middle zone of semi-competent AI capabilities. What did this project teach me about that?

The Messy Middle is Where The Action Is

Since the dawn of LLMs (the large language models that power AI chatbots and agents), it has always been the case that better prompting techniques yield better outputs. Simple tricks like “answer like an expert” are no longer helpful (if indeed they ever were), but there is more room than ever for human skill to elicit better results. In this project, I incorporated techniques such as:

Building a multi-step workflow for Claude Code, rather than just asking it a single question.
Setting things up so that Claude could critique and revise its own work.
Finding ways to collaborate with the AI, using my own judgement and taste to suggest new questions for it to investigate.

This has implications for the pace of change. Human skill at using AI isn’t a static factor, but it’s often a neglected one. When we look at a chart of benchmark scores, we ignore the fact that people are finding increasingly sophisticated ways of using AI. This compounds the pace of real-world impact.

Keep all this in mind when someone says “AI can’t do X” (how hard did they try?) or “AI just did X for me” (how much work did it take to get that result?). This also applies to benchmark scores, including those behind the famous METR “the size of coding tasks AI can do is doubling every ~~seven~~ four[13] months” graph. Benchmarks exaggerate AI capabilities, because they feature artificially tidy tasks and narrow evaluation criteria; but they also fail to measure the potential for human-AI collaboration.

What you get out of AI tools has a lot to do with what you put in. If you’re only using AI when you’re confident it can give you an accurate answer to a quick question, you’re missing out. If you’re willing to push through an extended collaboration with a chatbot or AI agent, a much wider range of possibilities opens up – and you’ll develop new skills in the process. My cobbled-together research agent is, as the expression goes, “the worst it will ever be”. I already have plans for significant improvements, and the underlying AI models will continue to improve (Opus 4.7 was released as I wrote this concluding paragraph!). I have a long list of questions to research, and high hopes for plowing through them at an accelerating pace.

Thanks to Abi Olvera and Taren Stinebrickner-Kauffman for suggestions, feedback, and images.

Some studies do explore human-AI collaboration. For instance, a study by METR on software engineering productivity, or this 2023 paper by Erik Brynjolfsson et al. studying customer support agents.

You can review the original ChatGPT conversation here.

Commentary I’ve seen consistently describes electricity supply as a small fraction of AI data center TCO (total cost of ownership, meaning upfront investment plus ongoing operational costs). To put this on firmer ground, I asked ChatGPT Pro to create a report, and then insisted that it find specific citations for each fact asserted in its report. Here’s the result; this is ChatGPT’s research and analysis, reviewed and rewritten by me. I would not take this as definitive, in part because I can’t vouch for the quality of the sources, but it looks reasonable and is consistent with what I’ve read on the subject:

Building a data center costs around $35 million per MW. For instance, one source puts the “shell and core” (physical building, etc.) at $10.7M / MW and the fancy electronics (GPUs, memory, networking, etc.) at $25M / MW.

One megawatt of electronics accounts for roughly 1.2 megawatts [ChatGPT’s estimate, no source provided] of utility power, the balance going to cooling equipment, conversion losses, and other overhead. If the data center runs at full power 24x7 (an unrealistic assumption, therefore inflating the contribution of electricity cost to TCO), then over a 5-year lifetime, the data center uses 52,560 MWh of electricity (5 * hours-per-year * 1.2).

IEA figures place the 2025 U.S. industrial average electricity price at 8.62¢/kWh ($86.2/MWh). That’s the price for utility power, but the whole point of the argument is that it’s getting harder to find utility power for new data centers. One alternative is “behind-the-meter” generation (generating your own electricity on site), for which less-efficient-but-faster-to-build gas-powered “peaker” plants cost $149 – $251 / MWh. (Many other options are possible, incorporating a mix of grid power, gas, solar, batteries, etc.)

Here we have three estimates of electricity price – $86, $149, or $251 per megawatt-hour. Multiplying each by 52,560 MWh gives the 5-year electricity cost; dividing that by the sum of the electricity cost and the $35M construction cost puts electricity’s share of 5-year TCO at 11%, 18%, or 27%, respectively.
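
The arithmetic behind those three percentages can be reproduced with a few lines of Python. This is a sketch under one stated assumption: TCO here means construction cost plus electricity only, which is the reading that reproduces the quoted figures. The constant and function names are illustrative.

```python
HOURS_PER_YEAR = 8_760      # 24 * 365
CAPEX_PER_MW = 35_000_000   # dollars per MW of electronics (rounded figure from the text)
OVERHEAD = 1.2              # utility MW drawn per MW of electronics
LIFETIME_YEARS = 5


def electricity_share(price_per_mwh: float) -> float:
    """Electricity as a fraction of (construction + electricity) cost."""
    energy_mwh = LIFETIME_YEARS * HOURS_PER_YEAR * OVERHEAD  # about 52,560 MWh
    electricity_cost = price_per_mwh * energy_mwh
    return electricity_cost / (CAPEX_PER_MW + electricity_cost)


for price in (86.2, 149, 251):
    print(f"${price}/MWh -> {electricity_share(price):.0%}")
# $86.2/MWh -> 11%
# $149/MWh -> 18%
# $251/MWh -> 27%
```

Changing `OVERHEAD`, `LIFETIME_YEARS`, or `CAPEX_PER_MW` shows how sensitive the share is to each input.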

This is a highly simplified model. Factors which would cause electricity’s actual share of TCO to be higher: the $25M / MW figure was the top end of the source’s estimated range for cost of GPUs and other equipment; the cost figures for on-site generation probably assume power generation equipment is amortized over much more than 5 years (I didn’t check). Factors which would cause electricity’s share to be lower: financing costs (some aspects of electricity cost aren’t paid until the electricity is used, while equipment has to be purchased up front); data centers aren’t always consuming their theoretical maximum power draw; operating and maintenance costs; cost-per-MW of AI computing equipment has been increasing over time.

I asked ChatGPT to follow up on the 20% overhead estimate for power usage (1.2MW in per 1MW going to the electronics), and it found one source putting the overhead at 14% (figure 4.5 of this report), and another source giving similar figures. This would lower electricity’s TCO share slightly.

I would imagine that, over time, the cost-per-megawatt of GPUs will increase, again lowering electricity’s TCO share. Overall, the simple calculation of 11-27% is almost certainly on the high side, especially going forward.

Dylan Patel has suggested that power may soon become a limiting factor on AI data center expansion, but only temporarily. In the medium to long term, scaling the manufacturing of gas turbines, solar panels and batteries, or other power sources is a vastly simpler proposition than scaling chip fabrication.

Even the supposed #1 advantage of space – continuous solar power – turns out to be exaggerated. The best near-Earth orbits pass into shadow for up to 21 minutes per 90ish-minute orbit, meaning that satellites would either need to suffer periodic downtime, or carry batteries that add to launch mass. (An orbit around the dawn/dusk line has no shade, but there is no orbit which maintains this orientation over the course of a year.) Orbits farther from Earth have fewer issues with shade, but they’re also more exposed to radiation.
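
As a rough illustration of what that shadow time implies, the sketch below sizes the battery and solar array for a single 120kW rack (the rack-scale figure from the main text). It is a deliberately idealized calculation: it assumes a lossless battery that can be fully discharged, a constant load, and the worst-case 21-minute eclipse on every orbit. The function name and parameters are hypothetical.

```python
def eclipse_budget(load_kw: float, eclipse_min: float = 21.0, orbit_min: float = 90.0):
    """Return (battery kWh, average sunlit array kW) needed to run load_kw nonstop."""
    battery_kwh = load_kw * eclipse_min / 60        # energy to ride through the shadow
    sunlit_fraction = 1 - eclipse_min / orbit_min   # share of each orbit spent in sunlight
    array_kw = load_kw / sunlit_fraction            # must also recharge the battery while sunlit
    return battery_kwh, array_kw


battery_kwh, array_kw = eclipse_budget(120)
print(f"{battery_kwh:.0f} kWh of storage")   # 42 kWh of storage
print(f"{array_kw:.0f} kW of solar array")   # 157 kW of solar array
```

Real hardware would need more of both, since batteries lose energy on each cycle and are not usually run to empty.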

Contrary to popular belief, it is not easy to keep high-power electronics cool in space. Space may be cold, but vacuum doesn’t conduct heat, so cooling requires large radiators. Worse, AI computation requires lots of high-powered GPUs to be packed together in a small space, and removing all of that heat requires pumping liquid through lots of little pipes – a tricky business in space, where no repair technician can hear you scream.

This is the estimate for SpaceX's internal cost; commercial prices are much higher.

The server rack needs to be integrated into a satellite, thoroughly tested so that any bad components can be replaced before launch, shipped to the launch site, held until a launch slot opens up, and then raised into its permanent orbit.

The ISS solar panels can generate a bit over 200kW, but the station spends some of its time in the shade, so some power is needed to charge batteries.

In some cases, it may make sense to incorporate “edge compute” GPUs into remote sensing satellites, to analyze images without having to transmit the full raw camera data down to Earth. However, this would not substitute for construction of terrestrial data centers.

It's interesting to ponder why. My sense is that these models emphasize a compelling paragraph-by-paragraph narrative, and fail to notice or address logical contradictions between different parts of a report, even when asked to look for inconsistencies.

It doesn’t; see footnote 9.

This isn’t an editing glitch; earlier estimates were that the scale of coding tasks which could be handled by AI was doubling every 7 months, but recently the trend seems to have accelerated. Note that there are many caveats to this result, one of which is that the benchmark doesn’t include many tasks difficult enough to challenge the latest coding agents.

Apr 18

Interesting article, thanks for providing something like "raw data" and not just yet another polemic piece!

Two small errors: Footnote 6 apparently got overwritten with footnote 7, and footnote 12 refers to "previous footnote" but presumably should refer to footnote 9.

One disagreement: you talk about how "finding places to host a massive number of rocket launches would pose its own challenges", but I don't see how that is a new problem? SpaceX already has several launch sites that it can use to deliver Starlinks at a huge pace, why could it not use them for this purpose as well?

And one takeaway: I think your footnote 11 is the most interesting thing here. In my experience, when testing an AI for some task (e.g. transcribing handwriting), either I succeed in five minutes or nothing ever works, until perhaps a later model one-shots the task. These rounds of "refinement" or "self-improvement" or such never seem to work. I think it isn't an inability to *find* errors (if anything, an AI prompted to look for errors tends to give you a huge list that contains the relevant stuff and a lot of non-issues), but an inability to react to them effectively. It seems that AI lacks something like self-awareness or flexibility, and when it sees a problem, it doesn't have any way to change its approach.

It would be interesting to dig deeper into why this is the case, but I don't know how to even pose this question more rigorously. One test would be to try to simplify your approach and try to just produce 300 pages of *anything* with no corrections, then summarize it into a table of contents. My guess is it would be similarly (in)effective as your results, and the only difference between the two approaches you show is size of the output.

2 replies by Steve Newman and others

Carsten Bergenholtz

Apr 18 (edited)

This is a fantastic post, that could be teaching material in many educational settings. Benchmarks and model capability don't tell the full, real story. In real life, messy and open-ended tasks, GenAI can help you think but only if one is careful and always in cognitive control. A number of studies support the example and line of thinking presented here:

- this study https://dl.acm.org/doi/pdf/10.1145/3772318.3791796 shows that GenAI only contributes positively to solving an open-ended critical thinking challenge, if there is no/limited time pressure. When there is time-pressure, using LLMs led to worse results. Likely because of the same reasons outlined by Steve in this post: LLMs are too eager to present conflicting arguments.

- the following two articles both show that the impact of using GenAI can depend on one's expertise: GenAI helped lower performers, while high performers did not benefit. Mechanism: If you don't know much, then getting some info (e.g. on data centers in space) is better than nothing. Yet, if you in fact already do know something (on data centers) then getting pages and pages of info that on the surface looks smooth and plausible, but is in fact somewhat incoherent / not quite right - can lead to a performance not improving. Qualitative interviews showed that the challenge of monitoring, filtering and evaluating the plausible information disrupted the higher performers' thinking. See (co-authored by me) on a business school case here: https://journals.aom.org/doi/10.5465/amle.2025.0029 and an example involving legal reasoning here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6525800.

If one has time, and one puts in extensive effort - then they can really be helpful.

Second Thoughts
