Discussion about this post

MD

Interesting article, thanks for providing something like "raw data" and not just yet another polemic piece!

Two small errors: Footnote 6 apparently got overwritten with footnote 7, and footnote 12 refers to "previous footnote" but presumably should refer to footnote 9.

One disagreement: you talk about how "finding places to host a massive number of rocket launches would pose its own challenges", but I don't see how that is a new problem. SpaceX already has several launch sites that it can use to deliver Starlinks at a huge pace; why could it not use them for this purpose as well?

And one takeaway: I think your footnote 11 is the most interesting thing here. In my experience, when testing an AI on some task (e.g. transcribing handwriting), either I succeed in five minutes or nothing ever works, until perhaps a later model one-shots the task. These rounds of "refinement" or "self-improvement" never seem to work. I think the problem isn't an inability to *find* errors (if anything, an AI prompted to look for errors tends to give you a huge list that contains the relevant issues along with a lot of non-issues), but an inability to react to them effectively. It seems that AI lacks something like self-awareness or flexibility: when it sees a problem, it doesn't have any way to change its approach.

It would be interesting to dig deeper into why this is the case, but I don't know how to even pose this question more rigorously. One test would be to simplify your approach: just produce 300 pages of *anything* with no corrections, then summarize it into a table of contents. My guess is that this would be about as (in)effective as your results, and that the only difference between the two approaches you show is the size of the output.

Carsten Bergenholtz

This is a fantastic post that could serve as teaching material in many educational settings. Benchmarks and model capability don't tell the full, real story. In messy, open-ended, real-life tasks, GenAI can help you think, but only if you are careful and always in cognitive control. A number of studies support the example and line of thinking presented here:

- This study https://dl.acm.org/doi/pdf/10.1145/3772318.3791796 shows that GenAI contributes positively to solving an open-ended critical thinking challenge only if there is little or no time pressure. Under time pressure, using LLMs led to worse results, likely for the same reasons Steve outlines in this post: LLMs are too eager to present conflicting arguments.

- The following two articles both show that the impact of using GenAI can depend on one's expertise: GenAI helped lower performers, while high performers did not benefit. Mechanism: if you don't know much, then getting some info (e.g. on data centers in space) is better than nothing. Yet if you already do know something (about data centers), then getting pages and pages of info that looks smooth and plausible on the surface, but is in fact somewhat incoherent or not quite right, can lead to performance not improving. Qualitative interviews showed that the challenge of monitoring, filtering and evaluating the plausible information disrupted the higher performers' thinking. See a business school case (co-authored by me) here: https://journals.aom.org/doi/10.5465/amle.2025.0029 and an example involving legal reasoning here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6525800.

If one has time and puts in extensive effort, then these tools can really be helpful.
