Second Thoughts

It is great to have these examples laid out in one place -- thank you! All of them reveal an inability to identify and track the relevant state of the world, such as net profit, cost of inventory, and so on. As you point out, it also reveals yet again the absence of meta-cognition: in this case its own time management (as well as its own identity and task). Finally, we see that these systems may know many things (e.g., that discounts are a bad idea), but can't reliably apply that knowledge when taking action.

Expand full comment

Abhay Ghatpande

Hi Steve, I've been reading posts by Gary Marcus, Peter Voss, Srini P., and others, and their opinion AFAIK is that agents would need "world models" to operate. (Not the world models introduced recently as part of gaming/video generation, but an actual "model" of the world.) I've not been able to find more info on exactly what they mean here, because the supposed leader in this space, aigo.ai, has zero info on the site. If this were true, it leads me to believe that agents would need to be highly specialized and narrowly focused on a task because building a broad, general purpose model (of concepts and their relationships) is close to impossible.

I would love to see a hybrid agent that combines both LLMs and Cognitive AI. If you are aware of any such efforts, please point them out. Thank you for your (second) thoughtful posts and efforts to educate us.

Expand full comment

Reply (2)

It's not clear to me why building a world model would require specialization – after all, people (seem to) build broad models of the world. For many domains, it's not even clear to me that a narrow model is possible, because everything in the world around us connects to everything else.

World models do seem important. I believe there is debate as to the extent to which LLMs might be developing internal world models.

Expand full comment

Abhay Ghatpande

That's such an interesting POV. Thank you.

I was thinking that it's not about our world, but the agent's world -- what it's built for, what's its remit. And as you say, since everything is connected in the physical world, it would be almost impossible to capture all the relationships. So by limiting the "world" (space) that an agent needs to operate in, constraining the entities that it would interact with, it could be realistic to define dependencies between them. That's why I thought that specialized agents would be required for autonomy.

Expand full comment

It would depend on the task. For an agent that writes Python code, the "world" could perhaps be fairly narrow. For an agent that plans business strategy, the "world" includes customers, employees, competitors, etc. and it seems like you need a pretty broad view. Then the question becomes how many useful tasks fit into the narrow-world vs. broad-world scenarios.

Expand full comment

Seth

Gary Marcus is very committed to a literal, and explicit, and symbolic world model; but AFAIK there's no reason a world model can be "implicitly encoded" in the weights or dynamics of a neural network. It's just very, very hard to learn the "correct" world model from observational data.

Expand full comment

Forget it.

Sep 7

Fascinating run down. Also highlights the giant gap between ‘all the AI will replace you corporate hypsters’ and the reality of real world decision making. Do the CEO’s really think humans are this ineffective?

Expand full comment

Sep 7Edited

I'm thinking more and more about the progress made in the last year, and I believe we are expecting too much from general-purpose models. Most real-world tasks don’t need broad intelligence; they need focused competence. In most cases, it is like bringing the firehouse to water a houseplant: too much pressure, not enough control. General-purpose LLMs often feel like an over-engineered solution to most practical problems. We should build small, specialized models for specific domains and let a general model handle orchestration only when cross-domain reasoning is required.

- Use specialists for perception, parsing, and domain-specific decision-making with clear, structured state and verifiable constraints.

- Wrap them with simple verifiers and uncertainty checks to ensure reliability, and escalate to humans when needed.

- Reserve general models for coordination, open-ended dialogue, and genuinely multi-domain problems.

This systems approach—specialists for depth, a generalist for glue—delivers better performance, lower cost, and higher trust than forcing a single general model to do everything.

Expand full comment

Possibly! I wonder whether how many tasks are sufficiently "specific" for small, specialized models to work. It's certainly a worthwhile experiment that will get tried, so we'll find out.

Expand full comment

Sep 8Edited

When I talk about tasks, I’m referring to them at the domain level. For example, we already have enough coding models. We need models tailored to specific fields like physics, chemistry, biology, or even combinations of closely related domains, when collaboration between fields (e.g., chemistry and biology, or physics and mathematics) makes specialization more efficient than relying on a general-purpose model.

A good metaphor for the inefficiency of generalization is: "Bringing the whole library when you only need one book."

Rather than creating models that do everything, we should focus on specialized models designed for specific domains—or modular systems that allow closely related fields to work together seamlessly. For instance, a biochemistry model that combines chemistry and biology expertise, or a physics-math model for solving advanced physical problems.

While I don’t have concrete proof, I suspect that general-purpose models are more prone to "hallucinations," or providing incorrect answers by filling gaps with unrelated or inaccurate knowledge from other domains. They may also overanalyze or overgeneralize, making them less effective for domain-specific tasks.

By contrast, specialized models are likely to ensure greater efficiency, accuracy, and relevance, avoiding the pitfalls of overgeneralization.

I’ve observed a similar tendency, even among some brilliant people I work with at my day job. Sometimes, they overanalyze straightforward problems by trying to fill any gaps with their extra intelligence or tacit knowledge. While they intend to be thorough, this approach can lead to unnecessary complexity when a simple solution would suffice.

We need to match the tool to the task, which often means choosing just the right book(s) from the library, rather than hauling the entire collection.

Expand full comment

Perhaps. But much of human knowledge doesn't partition neatly. Politics relates to economics, which relates to transportation, which relates to innovation, which relates to other things. Perhaps a narrow biochemistry model would make sense, but many (most?) parts of many (most?) jobs don't fall into such tidy narrow buckets.

Even for a simple task like "summarize this email", a model with wide and deep world knowledge might do a better job.

Expand full comment

Most novel problems or tasks related to significant scientific discovery will likely require a general model, as will individuals working in several domains. However, most operational, day-to-day business activities don’t necessarily need such a general model. A general model better handles tasks like summarizing an email, and it makes sense for people who frequently write or respond to emails to have it. However, email or document summarization is a relatively small part of daily work for many jobs, especially with the widespread adoption of collaboration tools like Slack, Microsoft Teams, and others. The need for a general or specialized model will vary depending on the domain, organizational culture, and individual job roles.

There will always be specific tasks where a general model is ideal, but it may not be necessary for many activities. I understand the appeal of a general model—maintaining a single model instead of managing hundreds is far more efficient and manageable. However, it’s worth remembering that models like GPT-5 likely orchestrate multiple specialized models behind the scenes to address different needs. The same is probably true for other leading models, suggesting that we are already moving towards a level of specialization, even if it isn’t yet explicitly domain-specific (beyond coding models, for instance).

This trend aligns with the insights from the research paper “Survey of Specialized Large Language Models” (https://arxiv.org/pdf/2508.19667), which highlights the rise of specialized large language models (LLMs) tailored to specific domains such as healthcare, finance, law, and engineering. These models demonstrate significant performance improvements on domain-specific benchmarks compared to general-purpose LLMs. Examples include BioGPT for biomedical tasks, BloombergGPT for financial analysis, and Med-PaLM 2 for healthcare applications. Can they do every part of a professional job? The answer is most likely no. I hope these models will become smaller in the long run as model training and other techniques evolve to provide better cost-effectiveness and efficiency.

Expand full comment

https://substack.com/@microexcellence/note/c-157136596

Sep 18

Here is an article in The Economist: https://www.economist.com/business/2025/09/08/faith-in-god-like-large-language-models-is-waning

As David Cox, head of research on AI models at IBM, a tech company, puts it: “Your HR chatbot doesn’t need to know advanced physics.”

If you do not have access to it, here is the summary:

Expand full comment

Seth

Coming from a neuroscience background, it does seem like the most obvious major thing that brains have that AI agents do not is highly structured memory. AFAIK, LLM-based agents have basically "short-term" memory in their context window and "long-term" memory embedded in their trained weights, but the interactions between the two is pretty crude. Brains seem to have a small "context window"--from 1 to 6 "items", depending on who you ask--but several different types of long-term memory, and they spend a huge amount of effort deciding exactly what to move from long-term to short-term memory and vice versa.

I say this because this seems incredibly obvious, and there are much smarter neuroscientists than myself working in AI, yet it doesn't seem like there's been much progress on this front. I'm guessing "they" are working on this, but it's just very very hard. Maybe because it requires new architecture, and not just layering things on top of a transformer?

Expand full comment

I do not have a neuroscience background, but I share your understanding / view that current LLM architectures have an impoverished memory hierarchy, that it's not obvious how to get past that with current architectures, and something will have to give at some point.

It's probably relevant that people don't have the clear training vs. inference distinction that is fundamental to current LLM architectures.

Expand full comment

Mike Newhall