Discussion about this post

Tony Asdourian:

Hey Steve, a friend of mine sent me this link-- I think you and your subscribers might find it interesting/amusing: https://nicholas.carlini.com/writing/llm-forecast/question/Capital-of-Paris

Tony Asdourian:

I continue to enjoy and read all of your posts-- thanks for laying things out so clearly.

One thing I can't quite get out of my mind, having followed computer chess since the early 1980s, is how the response to chess programs as they improved seems strikingly similar to what we see with LLMs. At first, the programs made laughable moves and were seen as a novelty. Then they played about as well as an average club player, and everyone quickly (and correctly!) pointed out that the average club player isn't doing much more than calculating combinations, and computers are understandably better at that.

Then the programs started playing at the master level, and they would occasionally make moves that a master, seeing them without knowing a computer had made them, would call creative or clever. But that was quickly dismissed as basically luck: the program had to pick some move, and the supposedly creative move was picked only because the computer was trying to maximize its advantage. And that, too, was true.

As computers progressed to the grandmaster level, the commentary about their play started to change. The fact that they now played undeniably clever and creative moves on a regular basis was attributed to their ability to do millions of computations a second and to their unambiguous goals of material gain and checkmate. No question that that was still true! At the same time, a kind of cynicism about human grandmasters became popular: all but the top 100 or so weren't really very creative, they were just reusing known patterns from previous games in different ways. And since computers were often hand-coded to recognize many of these patterns, it wasn't surprising that computers, being faster in the obvious ways, did better.

Which brings us to what I see as the key analogy with today and LLMs. Because once computers got to strong grandmaster level, lots of chess people began saying that chess programs would probably beat all but the best players, but that the programs had not shown anything original. They weren't going to be capable of new ideas. The car could beat the human in a sprint, but nothing was being learned.

What happened, of course, is that as computers got faster and the programs got stronger, they simply invented techniques that humans hadn't previously seen. They were still just trying to maximize their advantage, but came up with "new" ideas as a by-product of seeing deeper. Especially in the realm of speculative attacks and robust defense, humans learned there were new possibilities in positions they had previously ruled out. It is true they often could not copy the computer's precision, and thus couldn't always utilize this new knowledge, but they knew it was true, and it changed expert humans' approach, especially to the opening and the endgame.
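
[Editor's note: to make "just maximizing advantage" concrete, here is a minimal, self-contained sketch of the kind of search that classical engines scale up, using the toy game Nim (take 1-3 counters; whoever takes the last counter wins) rather than chess. Everything here is illustrative: real engines add alpha-beta pruning, transposition tables, and elaborate evaluation functions, but the core mechanism is the same score-and-pick loop.]

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def negamax(counters: int) -> int:
    """Return +1 if the side to move wins with perfect play, else -1."""
    if counters == 0:
        return -1  # the previous player took the last counter and won
    # A position is exactly as good for us as the worst it leaves the opponent.
    return max(-negamax(counters - take)
               for take in (1, 2, 3) if take <= counters)

def best_move(counters: int) -> int:
    """Pick the take that maximizes our advantage -- nothing more."""
    return max((take for take in (1, 2, 3) if take <= counters),
               key=lambda take: -negamax(counters - take))

print(best_move(21))  # -> 1, leaving a multiple of 4 (the "clever" pattern)
```

Nothing in the code knows the multiples-of-4 idea; the move that embodies it falls out of exhaustive advantage-maximization, which is the sense in which deeper search produces moves that look creative.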

And once AlphaZero and other neural nets tackled chess, not programmed with human knowledge to build upon but learning by teaching themselves, they introduced other new ideas that, this time, human experts could emulate a bit more easily. In Go, even more so-- the human game has been revolutionized by the new ideas AlphaGo demonstrated.
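
[Editor's note: in the same toy setting, here is an equally minimal sketch of the self-play idea. No human patterns are coded in; the program plays Nim against itself and updates a value table from the outcomes. AlphaZero's actual recipe (a deep network guided by Monte Carlo tree search) is far richer, so treat this only as an illustration of the "teaching themselves" loop.]

```python
import random
from collections import defaultdict

value = defaultdict(float)  # learned value of each position for the side to move

def self_play_game(epsilon=0.2, alpha=0.1):
    counters, history = 21, []
    while counters > 0:
        moves = [t for t in (1, 2, 3) if t <= counters]
        if random.random() < epsilon:      # sometimes explore a random move
            take = random.choice(moves)
        else:                              # otherwise exploit current knowledge
            take = max(moves, key=lambda t: -value[counters - t])
        history.append(counters)
        counters -= take
    outcome = 1.0  # the player who just moved took the last counter and won
    for pos in reversed(history):          # propagate the result backward
        value[pos] += alpha * (outcome - value[pos])
        outcome = -outcome                 # alternate perspective each ply

for _ in range(20000):
    self_play_game()

# The table typically rediscovers the losing positions on its own:
print(sorted(pos for pos in value if value[pos] < 0))  # usually [4, 8, 12, 16, 20]
```

The pattern a human would call an "idea" (leave your opponent a multiple of 4) emerges from nothing but games against itself, which is the flavor of what AlphaZero did to expert understanding of chess and Go.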

So while I (think I) understand your point that the real world-- writing a novel, solving an original programming problem, writing a non-generic analytic essay with original insights-- doesn't have clear ways to evaluate quality and to learn and improve, and may thus be non-analogous to chess/Go/etc., I do wonder how much a sheer increase in scale may be the big difference after all, à la "The Bitter Lesson".

Put another way, sometimes I feel like we are looking at GPT-4 much as we might look at a chimpanzee: surprisingly smart, but limited in important ways. But how is our brain fundamentally different from theirs? Perhaps, to keep the analogy going, our brains have one or two advances, like transformers for LLMs, that allowed ours to develop abstraction and language far beyond the chimps'. Or did those advances just "appear" because of increased scale or development through more "training data"? I wish I knew the slightest thing about such topics.

OK, thanks again. I'm looking forward to reading your next entry on Memory!
