Some notes on computers, AI, and the mind

by Bill Benzon

AI – artificial intelligence – is all the rage these days. Most of the raging, I suspect, is a branding strategy. It is hype. Some of it isn’t, and that’s important. Alas, distinguishing between the hype and the true goods is not easy, even for experts – some of whom have their own dreams, aspirations, and illusions. Here’s my 2 cents on what we do and do not know.

And it’s only that: 2¢. Not a nickel or a dime more, much less a 50 cent piece.

What are computers: animal, vegetable, or mineral?

One of the problems we have in understanding the computer in its various forms is deciding what kind of thing it is, and therefore what sorts of tasks are suitable to it. The computer is ontologically ambiguous. Can it think, or only calculate? Is it a brain or only a machine?

The steam locomotive, the so-called iron horse, posed a similar problem in the nineteenth century. It is obviously a mechanism and it is inherently inanimate. Yet it is capable of autonomous motion, something heretofore only within the capacity of animals and humans. So, is it animate or not? Consider this passage from Henry David Thoreau’s “Sound” chapter in Walden (1854):

When I meet the engine with its train of cars moving off with planetary motion … with its steam cloud like a banner streaming behind in gold and silver wreaths … as if this traveling demigod, this cloud-compeller, would ere long take the sunset sky for the livery of his train; when I hear the iron horse make the hills echo with his snort like thunder, shaking the earth with his feet, and breathing fire and smoke from his nostrils, (what kind of winged horse or fiery dragon they will put into the new Mythology I don’t know), it seems as if the earth had got a race now worthy to inhabit it.

What was Thoreau doing when he wrote of an “iron horse” and a “fiery dragon”? He certainly recognized them as figures. He knew that the thing about which he was talking was some glorified mechanical contraption. He knew it was neither horse nor dragon, nor was it living.

Or was it? Did Thoreau really know that it wasn’t alive? Or did he think/fear that it might be a new kind of life? We live in a world where everyone is familiar with cars and trains and airplanes from an early age, not to mention all sorts of smaller self-propelling devices. We find nothing strange about such things, NOW, for we have long been used to them.

We are not yet so used to computers and, moreover, what they can do has been changing so rapidly that the computer is similarly ambiguous for us. It is clearly an inanimate machine. Yet we interact with it through language, a medium heretofore restricted to communication with other people. To be sure, computer languages are very restricted, but they are languages. They have words, punctuation marks, and syntactic rules. To learn to program computers we must extend our mechanisms for natural language.

As a consequence it is easy for many people to think of computers as people. Thus Joseph Weizenbaum, with considerable dis-ease and guilt, has told of discovering that his secretary “consults” Eliza—a simple program which mimics the responses of a psychotherapist—as though she were interacting with a real person. Beyond this, there are researchers who think it inevitable that computers will surpass human intelligence and some who think that, at some time, it will be possible for people to achieve a peculiar kind of immortality by “downloading” their minds to a computer. As far as we can tell such speculation has no ground in either current practice or theory. It is projective fantasy, projection made easy, perhaps inevitable, by the ambiguous ontological status of computers.
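For readers who haven’t seen it, the trick behind Eliza can be sketched in a few lines. The rules and replies below are invented for illustration – they are not Weizenbaum’s originals – but the mechanism, surface pattern matching with no representation of meaning, is the same in spirit:

```python
import re

# A toy ELIZA-style responder: match surface patterns in the input and
# echo fragments back as questions. No understanding is involved --
# which is exactly Weizenbaum's point. (Rules are invented examples;
# the real ELIZA also reflected pronouns, e.g. "my" -> "your".)
RULES = [
    (r"\bI need (.*)", "Why do you need {0}?"),
    (r"\bI am (.*)", "How long have you been {0}?"),
    (r"\bmy (\w+)", "Tell me more about your {0}."),
]

def eliza_reply(text: str) -> str:
    for pattern, template in RULES:
        m = re.search(pattern, text, re.IGNORECASE)
        if m:
            return template.format(*m.groups())
    return "Please go on."  # default when nothing matches

print(eliza_reply("I need a vacation"))  # -> "Why do you need a vacation?"
print(eliza_reply("hello"))              # -> "Please go on."
```

That a few such rules were enough to make people feel heard is what so unsettled Weizenbaum.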

What is intelligence?

I don’t know the detailed history of the concept, but its use in this context has certainly been strongly influenced by the idea of intelligence as something that we can be tested for through, you know, intelligence tests. What is intelligence? It’s what is identified through intelligence tests.

That’s not very helpful, is it? But let’s think about it a bit. Intelligence tests are instruments for measuring human performance. An IQ score is thus said to be a measure of something we call intelligence.

Lots of things are measured, automobile performance for example. Acceleration is a common measure of that performance. As measurements go, it is pretty straightforward. Note, though, that acceleration measures the whole system that is the automobile, not some one component of it.

If a team of engineers is charged with improving the acceleration of an auto, what are they going to do? There are lots of things they could do. We know, because force equals mass times acceleration – so acceleration equals force divided by mass – that anything in the car that has mass affects acceleration. So, reduce the mass. Not only that, there’s friction between the car and the ground and atmosphere, and that’s “outside” the equation between force, mass, and acceleration. So, reduce external friction. And, of course, increase the available force – more power in the engine, better transmission, whatever. But there isn’t some one thing in the car that is the acceleration system.
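To make the arithmetic concrete, here’s a minimal sketch with invented numbers (friction ignored), showing that the engineers’ two levers – less mass, more force – both move the same figure:

```python
def acceleration(force_n: float, mass_kg: float) -> float:
    """Newton's second law rearranged: a = F / m."""
    return force_n / mass_kg

# Illustrative numbers only: a 1,500 kg car whose drivetrain delivers
# 4,500 N of net force at the wheels.
base = acceleration(4500, 1500)      # 3.0 m/s^2
lighter = acceleration(4500, 1350)   # shave 10% of the mass
stronger = acceleration(4950, 1500)  # or add 10% more force

print(base, lighter, stronger)  # both changes raise acceleration above 3.0
```

Either lever works, which is the point: acceleration is a property of the whole system, not of any one part.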

Is there some one thing in the mind or brain that is intelligence, the thing measured by IQ tests? I doubt it. Those tests simply measure whatever abilities are required to solve the problems posed in a given test. If I had to hazard a guess, I’d say they’re mostly about abstract reasoning. But that is not a very well defined set of capacities and, in any event, is not the total of human mental capacities. I doubt, for example, that it is central to doing play-by-play commentary on football games, a task we’ll examine a bit later on.

If, then, we don’t even know what intelligence is, how can we design and construct devices that embody it? All we can do, and all we have been doing, is create devices that can solve specific problems and see how well they do. That is true even in the case of current technology based on some form of machine learning. The learning architecture may itself be open ended, but it can’t do anything until it has been fed mountains of data thereby allowing it to construct a specialized device to deal with whatever objects are represented in the data.

Every time you read or hear something about the coming Singularity, when computers will be super-intelligent – perhaps even smart enough to solve all our problems, or alternatively, condemn us to, you know, The Other Place – think about the problem of establishing colonies on Mars. What do we know about achieving that? We have sent people to the moon and brought them back. We’ve sent devices that landed on Mars and then made observations under our control. And we’ve had people live in low Earth orbit for as long as a year. Given those things that we’ve already done, we have no reason to believe that we can’t send people to Mars, though it will be very expensive. What we don’t know, however, is how well people can handle the psychological stress of such a voyage, nor do we really know how our bodies will cope. On the whole, we know a great deal about voyaging to Mars – not everything we’d like, but enough to make plans and work toward achieving them. We have a pretty good sense of what we do not know.

Our knowledge of “intelligence”, either in humans or machines, is not comparable to that. Belief in the coming of superintelligent computers is in the realm of “tech-bro monotheism”, a term coined by the physicist Sabine Hossenfelder. As far as I can tell, the plan for achieving the superintelligent computer is to build and build and build and one day the magic will happen all by itself. We haven’t got a clue about what we don’t know.

Speech to text, WOW!

Every once in a while I transcribe a bit of speech from a YouTube video. It is tedious work. I was doing this yesterday when, in poking around, I accidentally toggled closed captions. All of a sudden I noticed two lines of transcribed words crawling across the bottom of the video. And they were pretty accurate, accurate enough to make transcription a bit easier. Just stop the video and copy what you see on the screen. If it’s not completely accurate, what you see nonetheless is enough to cue you so you can type a more accurate version based on what you heard.

I don’t know just when this started happening. Google acquired YouTube back in 2006, so it would have been sometime since then; I’d guess it was in the last five years. Just when it happened doesn’t matter much, though – the underlying technology developed gradually over the last four decades.

Now to the point: To a first approximation this is the same kind of machine-learning technology that powers Google Translate. Google Translate is certainly not perfect and, for many purposes, it may not even be very good. But for casual use it is useful and certainly better than nothing. I’d guess that most of us – when we’re not caught up in being defensive about computing – find Google Translate pretty impressive. But speech-to-text? Not impressive, routine.

And yet the underlying technology is pretty much the same. In neither case does the machine understand language. But we know that translation requires understanding, and so we reflexively and mistakenly project such understanding onto the machine. Somehow transcription seems easier. You’re just writing down what you hear, and that doesn’t require real understanding, does it?

In these cases our untutored intuitions betray us. As I noted above, because so many of our interactions with computers involve language, we project abilities onto computers that are not in fact there. Reasonably accurate speech-to-text is no more and no less impressive than crude translation. However, to the extent that the ordinary sense of “translate” implies understanding, Google Translate doesn’t do translation. What would we call what it does? Transmute? My dictionary glosses “transmute” as “change in form”. That seems more accurate: Google Transmute.
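To see how little “transmutation” need involve, here’s a deliberately crude sketch. The word-for-word lookup table below is invented, and it is emphatically not how Google Translate works – modern systems learn statistical mappings from huge parallel corpora – but it makes the point vivid: the form of the sentence changes while meaning is nowhere represented:

```python
# A toy "transmuter": word-for-word substitution from a tiny, invented
# lookup table. It changes the form of a sentence without representing
# its meaning -- transmutation, not translation.
EN_TO_FR = {
    "the": "le", "cat": "chat", "is": "est",
    "on": "sur", "field": "terrain",
}

def transmute(sentence: str) -> str:
    # Unknown words pass through unchanged: the system has no notion
    # of what any word means, only what it maps to.
    return " ".join(EN_TO_FR.get(w, w) for w in sentence.lower().split())

print(transmute("The cat is on the field"))  # -> "le chat est sur le terrain"
```

Real systems are vastly more sophisticated at finding such mappings, but the philosophical situation is the same: form in, form out, meaning nowhere.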

But simply having a term such as “machine transmutation” out there won’t do much unless there is a coherent concept behind it. What’s the concept? Well transmutation, of course. But what does one need to know in order to grasp it in that context?

At this point, if all were well with the world, I would go on to explain more or less how transmutation differs from translation, and I would do so in terms of the difference between Old School work in machine translation – say between 1970 and 1980, after it had regrouped from financial collapse in the late 1960s – and the current work. The older work, at its most sophisticated, aspired to something that we could call translation. Why do I say that? Because it attempted to deal with language meaning. The newer techniques, which power Google Translate, for example, do not even attempt to deal with meaning, nor is it possible to see how they could, even in principle, arrive there.

Alas, all is not well with the world, and so I cannot do that. The problem lies not so much with my personal understanding, though there are certainly problems there [1], but with the general state of our understanding in these matters. I suspect that we are within reach of such understanding – or at least those with sufficient mathematical sophistication are – but either the work is not, for some reason, being done, or, if it has in fact been done, it has not been communicated to the general educated audience. I can, however, say this much: So called “common sense” understanding is one of the problems that plagued Old School MT (and AI in general), and it afflicts current systems too. That is something I can talk about.

Common sense reasoning: Cat on the field

Last week a cat walked onto the field during a football game. Its walk was caught on video, which went viral:

What would be required for a computer to deliver commentary like that? For one thing, it has to make sense of the visual scene in real time. It has to be able to focus on the cat while noticing the actions of the people as well. It must also know, of course, that the people are responding to and acting in relationship to what the cat is doing. While this perceptual and cognitive activity is going on, the system must improvise appropriate commentary and then utter it, with appropriate intonation.

As someone with no more than a casual interest in football, I can tell you that I simply do not see what’s happening on the field with anything like the detail and comprehension of play-by-play announcers. To be sure, they have the advantage of being there in the stands watching while I’m only watching on the TV. But still, they’ve watched many more football games than I have, and have analyzed those games, and so know how to follow the action. I suspect, in passing, that this ability, whatever it is, is not directly relevant to standard IQ tests.

Let us, however, for the sake of argument, set all that aside and imagine that we’ve got a computer system specialized for delivering football play-by-play commentary. In this case Harlan, the announcer, isn’t calling a football play. Would a game-calling computer program even know about the existence of cats? If not then it wouldn’t be able to describe what’s going on. Would such a system know about state troopers, who and what they are, why they’d be at a game, and what they’re doing on the field? Routine play-by-play doesn’t call for such knowledge. Nor does it call for knowledge of the tunnel leading through the stands to and from the field itself. No, even if we had a very good play-by-play system, it is not likely equipped to handle cat-on-the-field, not unless it also has knowledge of things that are, for the most part, peripheral to the game itself.

Umm, err, maybe it would be able to improvise something about an unidentified prowling animate object?

You think so, eh? What would THAT require? How’d it come up with that weird generalization, “unidentified prowling animate object”?

Imagine that you’re the developer of a football play-by-play system. You realize that it may well be called on to provide chatter about non-football things, like cat-on-the-field, or maybe just about the weather. What non-football information and knowledge are you going to equip your system with? Non-football knowledge: that’s a vast and unbounded category.
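One way to feel the force of that point is a deliberately crude sketch. Everything below – the entity table, the canned actions, the fallback line – is invented for illustration; the point is only that whatever isn’t in the system’s knowledge base gets, at best, a contentless fallback:

```python
# A hypothetical play-by-play generator (all names and phrases invented).
# Its knowledge base covers football entities only, so anything else
# falls through to an empty fallback -- the missing ingredient is
# commonsense coverage, not cleverness.
KNOWN_ENTITIES = {
    "quarterback": "drops back to pass",
    "running back": "takes the handoff",
    "kicker": "lines up for the attempt",
}

def call_play(entity: str) -> str:
    action = KNOWN_ENTITIES.get(entity)
    if action is None:
        # No model of cats, state troopers, or tunnels to draw on.
        return "Something is happening on the field."
    return f"The {entity} {action}."

print(call_play("quarterback"))  # -> "The quarterback drops back to pass."
print(call_play("cat"))          # -> "Something is happening on the field."
```

You can always add “cat” to the table, of course. But then comes the raccoon, the streaker, the power outage – the table never ends, and that endlessness is the commonsense problem.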

Much of that vast and unbounded expanse of knowledge is commonsense knowledge, all those little bits of routine knowledge we have about the world. It’s one of the problems that contributed to the implosion of symbolic AI back in the mid-1980s. Marvelous though current state-of-the-art AI systems can be, they’ve got problems with commonsense knowledge as well, and at least some commentators recognize it [2].

Where are we?

We don’t know. The original research regime of machine translation fell apart in the mid-1960s. It wasn’t working. The original research regime of artificial intelligence fell apart in the mid-1980s. Current research in both areas is based on fundamentally different techniques and has produced much useful, and sometimes frightening, technology – not only speech-to-text, language “translation”, and text generation capable of fooling human readers, but also facial recognition, various kinds of driving technology (some of it available now, though the goal of fully autonomous vehicles remains in the future), and spectacular performance in playing games like chess and Go. But the problem of common sense knowledge remains, and along with it the various problems of building complex specialized knowledge on that base. And we’re a long way from constructing a robot with a six-year-old’s capacities for getting about in the world.

This leaves us where we began: What kind of a thing is this, this computer? We don’t know. Not yet. And when, finally, we do know, we will have also learned a great deal about ourselves, things we currently do not know and cannot imagine.

[1] I have a number of posts at my blog, New Savanna, that discuss that difference. If you search on the name “Martin Kay” they’ll be at the top of the list. Martin Kay is a researcher in computational linguistics who came up under the Old School and is familiar with the newer work. Here’s the search URL:

[2] I’ve got a number of posts on common sense reasoning. This URL will summon them: