You Really Don’t Think in a Natural Language!

by David J. Lobina

Painting on Cardboard, by Helen C. Boodman

You are waiting for Helen, and as she comes out of the underground, she blurts out, before saying hello or embracing you, as is her custom, the following sentence:

I decided to write to him on a boat

Yes, good idea, you say, followed by wait, what? what do you mean? and are you sure?! You are not being flippant, and you know full well who Helen is intending to write to, though you may be uncertain about what she might write; it is just that you find the sentence unclear. Or as a linguist would put it, the sentence is ambiguous, for it can be understood in two different ways.

Helen may have taken the decision to write to the mysterious man while she was on a boat (perhaps she has just been on a boat trip) or she may have decided that she will do the writing while she’s on a boat (perhaps she’s planning a boat trip).

The ambiguity is syntactic in nature, as it is a matter of whether the phrase on a boat modifies the verb to decide or the verb to write – that is, it depends on the way in which these words and phrases relate to each other within the sentence. Linguists often use so-called syntactic trees to explicitly outline the structure of sentences, but we can make do with brackets here, as shown below (I’ll let you work out which intended meaning is which; I may or may not have changed the order in which I have described the possible meanings in order to confuse, though).

I decided to [write to him [on a boat]]

I [decided to write to him [on a boat]]

This phenomenon is rarely if ever noticed in conversation – no-one but a linguist would ever say to an interlocutor such a thing as I’m sorry, but that sentence you have just uttered is syntactically ambiguous, what do you mean exactly? This is because the overall context in which conversations take place usually helps hearers (and indeed readers) work out what the speaker/writer actually meant.

Intonation can help out too. The following sentence usually trips up participants in experiments in the psychology of language:

The horse raced past the barn fell

You got confused when you reached the word fell and had to go back, didn’t you? So do most people, but had I presented the sentence aurally, and had I pronounced it normally – that is to say, with the right intonation – the situation would have been rather different (this sentence is usually presented to participants on a screen and their performance is analysed in terms of reading times or eye movements).[i] Intonation is a rich source of information to hearers, and intonational phrases, similar to syntactic phrases but applied to speech, and with which they tend to correlate, provide a clue regarding the structure of the sentence that is being uttered – and, thus, of the intended meaning (in this case, the intonation would have indicated that one major phrase ended after barn but the sentence hadn’t finished).[ii]

It is unlikely that intonation would have helped in the case of Helen’s sentence, however (unless it had been really marked, which would have been slightly unnatural). Nevertheless, and more to the point I want to make, Helen was not confused as to what she was thinking, for the thought itself was not ambiguous, and she knew full well what she meant with that sentence. Nor is it the case that Helen had chosen an inadequate sentence to express the thought she wanted to communicate. According to the take I offered last time, Helen entertained the relevant thought in a conceptual system (the language of thought), and then her language faculty generated the relevant sentence, in stages.

To wit: first, the appropriate words (or lexical items) are selected; such words are then combined into a structure, giving rise to what I called in part 1 of this series on language and thought a SEM representation; and lastly, the hierarchical structure a SEM encompasses is then flattened and linearised to accommodate the fact that the physical vehicles we use to communicate to others (speech, hand signs, writing, etc.) require that each element is produced one by one, this flat and linear representation usually called a PHONetic representation, effectively a chain of morphophonemic elements. This is what happens within the language faculty, within Helen’s mind, but the very last thing to happen, of course, is that the sentence is uttered (produced or externalised) by Helen’s motor systems (lungs, vocal folds, mouth, and tongue) on the basis of the PHON representation.[iii]

So framed, I appear to be defending the idea that you think in the language of thought but communicate in a natural language. This is not new, as last month I provided some reasons for believing that we think in something other than a natural language; but now I seem to be suggesting that the actual representations the language faculty generates are not apt for the purposes of thinking. PHONs are flat and linear and thus are out of contention, since our thoughts are dependent upon the right hierarchical relationship between predicates and arguments (again, see post 1 of this series), while SEMs manipulate lexical items (in fact, only the syntactic features of lexical items) rather than concepts, and so they would not be the right vehicles for our thoughts, either.[iv]

The state of affairs is actually more nuanced than the impression I am giving and I certainly do not want to suggest that natural language plays no role whatsoever in our thinking. One particularly interesting case is the role that talking to ourselves, either aloud or inwards (outer and inner speech, or overt and covert speech), can have on how we solve a problem or come to specific conclusions or decisions. This is a topic that has engaged many philosophers and psychologists in the past, and has also proved rather popular in literary works as a way to describe a character’s inner life.

We talk to ourselves for all sorts of reasons, after all: as a memory aid, for self-motivation, to rehearse a talk or focus attention, to imagine conversations and dialogues, and more. And some philosophers have certainly thought that inner speech is very central to thinking, from those who believe that speaking to ourselves is a way to make our mostly unconscious thoughts more explicit, bringing them to awareness, such as Peter Carruthers, to those who regard inner speech as a central component of what psychologists call System 2 reasoning (the slow, reflective reasoning that contrasts with the fast, rule-of-thumb type of reasoning of System 1), as is the case with Keith Frankish, developing some ideas of Carruthers himself.

I think there is a lot of over-intellectualising of what goes on in inner speech in these takes, and it is little wonder that philosophers might think this way – their inner speech probably does involve rehearsing thoughts, dialogues, and conclusions to an argument. Or maybe not all of them. Jerry Fodor was obviously half-joking when he reported what his inner speech experience felt like – ‘I can’t solve this; it’s too hard. I’m not smart enough to solve this. If Kant couldn’t solve this, how can they possibly expect me to’ – but he was also alluding to the fact that different people report different types of inner speech, which might give rise to different theories about it, while some people claim not to inner speak at all (and so, there would be no theory in this case?).[v]

From the perspective of psychology, there have also been many studies on the role of outer and inner speech in our thinking processes. An interesting case has to do with the effect language production seems to have on a task called spatial reorientation.

Imagine our Helen in a funny sort of room, where a specific feature stands out (e.g., a short white wall, the others blue and of the same height) and an object is placed within its proximity (usually hidden nearby, but always near a corner). We show Helen around the room, place her right in the middle of it, and then put a blindfold over her eyes (we only have good intentions here). We proceed to disorient her by turning her around and around, and then at the end we unmask her (she will be relieved) and ask her to go straight to the location of the object that had been placed by the short white wall. How will she do? Will she search randomly in different corners or go straight to the right one?

The idea of the experiment is that, after disorientation, participants will only be able to locate the hidden object if they combine the geometrical and non-geometrical information available to them – this combination is exemplified by a sentence such as the ball is to the left of the short white wall, where “left of” would constitute the geometrical information and “the short white wall” the non-geometrical kind. In fact, the underlying idea of the experiments is that you need to make use of linguistic representations to pass the task – that is, to work out where the object is – and so children under the age of 5, who supposedly haven’t mastered the right words and sentences, would fail the task.[vi]

A pretty easy task at first sight, and one Helen should excel at. But adult participants fail to adjoin geometrical and non-geometrical pieces of information if during the experiment they are also asked to carry out a secondary linguistic task such as speech shadowing – that is, the shadowing of sentences that are presented to them over headphones as fast and closely as they can. They don’t seem to have much trouble if the concurrent task involves rhythm-clapping shadowing instead (that is, clapping to a rhythm), and this seems to lend support to the hypothesis that it is language – language production, in fact – that is doing the trick.[vii]

In the past (see footnote vi), I have argued that the adult data actually point to possible processing effects due to using language in what is effectively a verbally-mediated task, which ought to be explained, I believe, in terms of what mental elements language and thought trigger in such tasks, and how these interact in real-time processing.

There are two sorts of integration to carry out in the speech shadowing version of the experiment: on the one hand, the integration of geometrical and non-geometrical information; and on the other, the processing and production of the linguistic material being shadowed. To be more precise, speech shadowing is only a type of linguistic production, as it doesn’t incur quite the same sort of processing costs that one would expect in normal production. At the same time, speech shadowing also involves whatever cost comprehending the sentences to be repeated imposes, as shadowing is not an automatic parroting of the material one hears. Speech shadowing may not be a full-blown case of either language comprehension or production, but properties of both processes are operative. In short, speech shadowing involves a type of production task and a type of comprehension task.

We should then expect a working-memory overload during the spatial reorientation experiment stemming from the two cognitive processes at play: the integration of geometrical and non-geometrical information, on the one hand, and language comprehension and production, on the other. It is plausible to claim that both integrations take place in the language of thought, as all this information is conceptual in nature, and a fortiori that such an accumulation of factors must cause a significant memory load, which is not the case for the rhythm-clapping version of the experiment – and this would explain the differing performances. Since, therefore, all we seem to have is a difference in terms of processing loads, what we do not have is a difference in terms of whether linguistic representations are available or not.

This is not all. I have in the past also argued, though no-one has really paid much attention, that philosophers and psychologists have often ignored a central property of linguistic behaviour when discussing data such as those from spatial reorientation experiments, a property that in fact discounts the supposition that inner speech partakes in causal mental processes, for linguistic behaviour just can’t work that way.

I am referring to the most important point Noam Chomsky made against behaviourist theories of language learning some 60 years ago. Namely, the fact that linguistic behaviour is effectively stimulus independent. Paraphrasing Chomsky somewhat, this point amounts to the claim that the circumstances that surround us at any given time don’t compel us to say anything in particular; all we can say about the matter is that we can be incited to say something, but there really is no telling what a person might say when so prompted.

Imagine we take Helen out of that weird room and place her into the nicer environment of a museum, and in front of a painting by Alma-Tadema. She might say a number of things in such circumstances: about the painting itself, its surroundings, something else entirely, or indeed nothing at all. There is no way we can predict what Helen will say in this case, and given that inner speech is a type of linguistic behaviour, I would suggest that the general point I am making is even more pronounced when it comes to covert speech. Who can control their inner speech, after all? And how can we be assured that participants would come up with the right sort of sentence to pass the spatial reorientation task?

This feature of linguistic behaviour doesn’t seem to apply to thought, or at least not in the same manner. While there is a great deal of voluntary action involved in the tokening of an utterance, that’s not quite the case in the tokening of a concept. The Alma-Tadema painting may not elicit any utterance from Helen, but that doesn’t mean that its observation would proceed in a thoughtless mental vacuum; some conceptualization surely takes place; some concepts would be tokened by Helen – say, CLASSICAL WORLD PICTURE (concepts in upper case, as mentioned last month). And likewise regarding many of the reasoning tasks psychologists have run over the years; given the right set-up, psychologists can, to a reasonable extent, predict what sort of reasonings (i.e., thoughts) participants will manifest, but it is unlikely that participants will be able to verbally externalise the thoughts they indeed had.

This is commonplace in daily life, in fact. I am myself certain that during my waking life I am constantly having thoughts, and that most of these thoughts are entirely unconscious in that they do not turn up in my speech. The apparently simple act of crossing the street, for instance, may well involve various inferences – e.g., if I cross the street now I will be run over by the oncoming car and there is enough space between this car and the next, therefore… – and none of this is usually put into words, aloud or to myself. Or at least it doesn’t need to be for me to be able to carry out such reasoning and not get run over.

If you see a car you can’t help but entertain the idea of a car, though you need not say the word car at all, to yourself or to others. If you see a person attempting to cross the street between two moving cars and you see that there is not enough space to do so, you can’t help but recognise this as a dangerous situation. You may say so to someone or to yourself, but by the time you say it you have already thought the thought. More likely, you would shout watch out, and in this case the thought that you are witnessing a dangerous situation would go unsaid. What’s more, in this and many other cases, there is a lot of thinking going on prior to what you end up saying, most of which goes entirely unsaid. You did “see” that there wasn’t enough space between the two cars to cross the street safely, and this required a judgement that is also an act of thinking.

I am not suggesting, let it be clear, that given a specific stimulus, we can be entirely certain of the precise set of concepts and thoughts that would surface, let alone the specific type of response; all I am saying is that some concepts would be tokened, whereas no linguistic tokens need be. And this is another pointer to the centrality of the language of thought in our thinking lives. Some of us can’t but entertain the concept MISS when we see Helen, but we need not say anything at all when we do see Helen (though we of course do).

This is not to deny that, in the case of Helen and her future letter-writing-while-on-a-boat, she might do a fair bit of inner speaking prior to the actual writing as she thinks of what to say, how to say it, when to say it, as well as the consequences of saying anything. This is a feature of inner speech that writers exploit very often in order to explain a character’s motivations, epiphanies, or what have you, and it is something I will discuss next month, giving Helen some respite (though we will miss her).[viii]


[i] If an eye tracker is employed to monitor participants’ eye fixations, what is usually observed is that after reaching the word fell, eye fixations go back to raced past, a sign that participants are trying to work out what went wrong.

[ii] Or so one hopes. The “raced past” sentence is an example of so-called garden-path sentences – sentences that begin in such a way that a reader is led to what turns out to be the wrong interpretation (the name is a reference to the saying to be led down/up the garden path). Namely, when you reach the verbal phrase raced past, you assume this is the sentence’s main verb, which the following phrase, the barn, the verb’s main argument, only seems to confirm. In reality the sentence contains what linguists call a reduced relative clause, the unreduced version being the horse that was raced past the barn fell, which naturally has a clear and very different meaning. This is a rather long-winded way of saying that, from the perspective of a hearer, there is an awful lot of working out to do when processing language. It is all very fast and unconscious, but your “language comprehension system”, as psycholinguists call it, computes any linguistic input it receives word by word, generating an interpretation after each word as it does so. So, technically speaking, there is always some ambiguity to resolve at every stage of listening to or reading a sentence – to go back to Helen’s sentence, the comprehension system tries to work out or predict what comes after I, then after decided, then after to, then after write, etc. It is worth adding that ambiguous sentences are not recognised as such even when there is no context to help hearers. It is not usually the case that the language comprehension system computes all the potential interpretations of an ambiguous sentence, and one is certainly never consciously aware of this, either (another fun exercise for you: the sentence I almost had my wallet stolen is supposed to be three-way ambiguous, try and work them all out!). Instead, the system computes one meaning, perhaps the default reading of a given sentence, and stops at that.
For what it is worth, my own intuition on the opening sentence is that Helen intended to do the writing while she was on a boat, and the other interpretation wouldn’t have been a very natural interpretation for me to obtain (and I don’t know if Helen likes boat trips).

[iii] As stated in post 1, and despite its name, a SEMantic representation is a syntactic object. The language faculty does help produce semantic interpretations for sentences; the meaning of a sentence is typically read off the syntactic structure, but I shall put this issue to one side here (or for another post, depending on the feedback!). The language faculty also generates a phonological representation accounting for intonation and the like, thus establishing how the sentence is to be pronounced, but I won’t discuss this type here, either. SEMs and PHONs are the (generative) linguist’s main representations, the phonological and semantic arising at the interface with the other systems of the mind that the language faculty is said to interact with (namely, the sound and conceptual systems, respectively; see post 1 again for a reminder).

[iv] And this, in turn, brings us back to the old ideas of William Ockham, who was mentioned in last month’s post, and who in the 14th century already argued that the thoughts of our mental language could not be ambiguous or our concepts polysemous (among other things).

[v] To give some background on other philosophers, Plato may have been alluding to inner speech when he described thinking as ‘a talk which the soul has with itself’ in the Theaetetus, whilst in the Philebus he talks of how we can use an interior dialogue to form opinions by formulating questions and answers to ourselves. Outside Platonic dialogues, St Augustine supposed that inner speech was the voice of imagination and St Thomas Aquinas regarded it as a way to practise outer speech. All rather “learned”, then. This article, on the other hand, and which I referenced in last month’s post, chronicles the phenomenon of people who claim not to experience inner speech at all.

[vi] I discuss these experiments much more thoroughly in a chapter of this book, and in fact I have an idea of what is going on here, which I would love to put to the (experimental) test. If anyone fancies funding this work, do get in touch 😉

[vii] In the original experiments, moreover, 5-year-olds also failed the test (there was naturally no secondary task for them), prima facie providing further support. However, 18-month-olds actually also demonstrate mastery of the task when the size of the experimental setting is big enough, and this, at the very least, would suggest that natural language is not a necessary prerequisite for combining the two sources of information.

[viii] I know what you are thinking: What became of Helen’s letter? Was it well received? What happened next? I bet the letter was welcomed indeed, though I fear it may have given the recipient the wrong impression and it all probably ended in tears. The realist will think this is for the better; the romantic will still believe in grand gestures and redemption.