by David J. Lobina
By now I feel like the linguist-in-residence at 3 Quarks Daily; and that would be quite right, as every outlet should have an in-house linguist (a generative linguist, of course), considering how often – and how badly! – language matters are discussed in the popular press. Perhaps I should start a series of posts calling out all the bad examples of the way language is discussed in the media, correcting the mistakes as a sort of avenging angel as I go (God knows I would spend most of my time talking about machine learning and large language models, and I have been there already); but nay, in this note I am more interested in bringing attention to one of the most important books in linguistics since its publication in 1965, and certainly the most influential: Noam Chomsky’s Aspects of the Theory of Syntax.
But why? For ulterior motives, of course (it will come in handy in future posts, that is); in this particular post, though, my aim is to highlight the importance of Chomsky’s monograph to other fields of the study of cognition. Having done this (today), the application of some of the ideas from Aspects, as most linguists call it, will be more than apparent from next month onwards.[i]
In particular, I would like to discuss how the distinction introduced in Aspects between competence and performance plays out in practice, and to that end I shall focus on a particular way to study cognitive matters: one that accounts for a given mental phenomenon at the different levels of analysis that the neuroscientist David Marr and colleagues identified some thirty years ago. If only modern research in AI would follow this framework! [ii]
What follows will include my own edits and additions to such a perspective, and I hope the result isn’t too tendentious; at the very least the discussion should give a comprehensive idea of the sort of interdisciplinary work Aspects spurred – in this specific case, in the psychology of language.[iii]
Four different levels of analysis were introduced by Marr and Poggio (these were reduced to three in Marr’s book, Vision), usually presented in the following hierarchical order:
- the hardware or implementational level
- the level of mechanisms
- the level of the algorithm
- the level of the theory of the computation
According to Marr and Nishihara, the computational level is the most important, as ‘the nature of the computations that underlie’ a specific cognitive domain depends ‘more upon the computational problems that have to be solved than upon the particular hardware in which their solutions are implemented’.
The computational level, to be more precise, outlines and analyses the mapping from one type of information into another, including the properties that derive therefrom. The point of such a level, ultimately, is that in order to study a given cognitive domain, one must understand what such a domain actually does – what problem(s) it solves, what it aims at.
This is a stance Chomsky’s Aspects introduced to linguistics in the form of the competence and performance dichotomy – a distinction, roughly, between the system of knowledge underlying our language abilities and actual linguistic behaviour – two concepts that can be linked to Marr’s levels of analysis. In fact, Marr himself pointed out at the time that Chomsky’s competence was rather similar to his theory of the computation.[iv]
Let us call the overall collection of mental systems that fully describes “language” (knowledge base plus language use) Language: an assemblage that will contain not only what linguists call the language faculty (itself an assemblage of lexical items, or words, a computational system that combines these words into sentences, and the two interfaces it interacts with: the conceptual/intentional and the sensorimotor), but also a parser for language comprehension, the relevant memory capacity involved in language processing, and whatever other systems are relevant (attention, and not least the language production system). The different levels of explanation of Marr, I submit, engage and explain different aspects of an otherwise unified capacity for language – or Language.
The algorithmic level, the second topmost, aims to work out how the mapping function studied at the higher level is in fact effected in real time, a level that is partly determined by, on the one hand, the nature of the problem to be solved, and on the other, by the available mechanisms. Thus, the algorithmic level would not pertain to the study of the language faculty proper, but to the study of language comprehension, with the proviso that whatever systems effect linguistic comprehension will have to access or interact with the faculty of language somehow – after all, the operations of the linguistic processor must be constrained in some principled manner.
Naturally, a parser will be a rather central, if not the absolutely central, element of the algorithmic level, whilst such orbiting components as memory capacity, attentional resources, and processing strategies will properly pertain to the level of mechanisms, the third level of analysis or explanation.
These so-called mechanisms, according to Marr and Poggio, will be ‘strongly determined by hardware’ (in this case, the brain substrate), as the physical implementation, the last level of explanation, determines the actual nature of memory capability, attentional resources, etc., even if the level of the mechanisms retains its theoretical independence nonetheless – indeed, according to Marr et al., each level is more or less independent of the others.
Such an approach suggests a theoretical progression, the common thread underlying Marr’s and Chomsky’s ‘division of explanatory labour’ frameworks, in fact; namely, that theories of cognition ought to proceed in an orderly fashion. Indeed, Chomsky’s whole oeuvre can be seen as an attempt to follow the rather sensible view that it is a prerequisite of cognitive science to understand the nature of a given domain before analysing the behaviour it produces.
In this sense, then, there is a progression to follow in the study of cognition: from an analysis of what Chomsky calls ‘attained states’ of the mind (say, the language faculty) to an attempt to determine how such states cause behaviour (including how they develop in ontogeny). In a similar vein, Marr emphasises the importance of the computational-level analysis, given that the nature of the algorithm effecting real-time processes is discerned ‘more readily by understanding the nature of the problem being solved than by examining the mechanism (and the hardware) in which it is embodied’ – so, the other way around completely from how contemporary AI proceeds.
The development of psycholinguistics after Aspects was published is a case in point. The 1960s and 1970s saw a proliferation of studies that attempted to experimentally probe the constructs that generative linguistics had put forward,[v] a research programme that eventually developed into the so-called derivational theory of complexity.[vi] This theory, roughly speaking, posited a close correspondence between processing complexity during language comprehension and the number of linguistic rules employed to derive a sound–meaning pair (say, a sentence); in simpler words, the more rules a sentence requires in order to be generated (according to the theory of grammar), the more difficult it should be to process in real-time comprehension. Experimental evidence quickly dispensed with the attempt to provide such a direct connection between the grammar of a language and the parser of language comprehension, and the derivational theory of complexity was eventually abandoned.
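The derivational theory’s central prediction can be made concrete in a few lines of code. A minimal sketch, assuming a toy grammar: the sentences and rule counts below are invented for illustration and come from no actual transformational analysis; the only point is that the theory predicted processing difficulty to track derivation length.

```python
# Toy illustration of the derivational theory of complexity (DTC).
# Assumption: the sentences and rule counts below are invented;
# they stand in for the number of transformational rules a grammar
# would apply to derive each sentence from its underlying structure.
derivation_length = {
    "The boy hit the ball.": 0,            # kernel sentence, no transformations
    "The ball was hit by the boy.": 1,     # + passive
    "Wasn't the ball hit by the boy?": 3,  # + passive, negation, question
}

def predicted_difficulty(sentence: str) -> int:
    """DTC prediction: processing cost grows with derivation length."""
    return derivation_length[sentence]

# Rank sentences from easiest to hardest, as the theory would predict.
ranked = sorted(derivation_length, key=predicted_difficulty)
for s in ranked:
    print(predicted_difficulty(s), s)
```

It was exactly this sort of monotonic mapping from rule count to processing time that the experimental record failed to support.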
Perhaps unsurprisingly, the actual point of contact between the theories linguists and psycholinguists construct has proved to be a contentious matter ever since. Still, I believe that the position defended in Jerry Fodor, Thomas Bever, and Merrill Garrett’s 1974 book, The Psychology of Language (itself a landmark work), is implicit in most studies.
In this now classic publication, the authors argued that language processing involves the construction of the right structural description of the incoming signal, that is to say the right internal (and therefore mental) representation. Consequently, a theory of these mental representations becomes a necessity, a point that gestalt psychologists, the example Fodor et al. discuss, understood well.
Gestalt psychologists realised that what is important to behaviour is not only the proximal stimuli – the actual energy pattern received by the organism – but how the distal stimuli are represented (a distal representation that is obtained from the proximal representation; in the case of vision, the image reflected in the retina); and hence, their attempt to subsume the perceived stimuli under so-called “schemata”.
These psychologists certainly had a theory of the proximal stimuli – this was provided by the physical sciences – but the principles and properties governing their schemata were of an altogether different nature and they had no viable theoretical account of the requisite mental representations. Fodor et al.’s proposal addressed precisely this fault, arguing that the grammar constructed by the linguist constitutes just such a theory, since it specifies the set of structural descriptions the parser must encode and decode during an actual linguistic exchange. The crucial question is how all these factors relate to one another in bringing about the linguistic behaviour we observe in normal daily life.
So how do competence and performance relate, then? This is not a trivial problem, but there is a sense in which the dichotomy should not stress the psycholinguist too much. If Fodor et al. are indeed right that what the linguist’s theory of grammar provides for psychologists is a description of the structural properties the parser decodes and encodes, then the state of affairs is one according to which the operations involved in performance are constrained by grammatical properties. What needs to be worked out, therefore, is how the requisite properties are actually reflected in processing.
But how does it all fit together? There have been plenty of proposals in the literature, and here I have myself accepted the point that grammatical principles must be operative in parsing somehow. It seems to me that not enough attention is paid in the field to the fact that language processing is, in part, a perceptual phenomenon; the parser, after all, results from the connection between the grammar and the perceptual systems that capture the linguistic input.
One of the tasks of the psycholinguist, thus, is to work out how the perceptual systems manipulate linguistic structures – that is, what strategies and operations they effect for this task. The crux of the matter, then, is to figure out what the right balance between grammatical principles and parsing operations is, and some consideration of the quintessential tasks the parser must conduct can be useful in this respect.
Some of these tasks include segmenting the string of sounds one receives from (the usual case of) an interlocutor into units (words, clauses, etc.); assigning syntactic roles to those units (a verbal phrase, etc.; and also subject, object, etc.); establishing relations and dependencies between elements; setting up a correspondence between syntactic and thematic roles (agent, patient, etc.); and interacting with semantic/pragmatic components.
I would like to focus on the segmentation of the input into linguistic units here, as this will highlight the perceptual aspect of linguistic processing even more (and it is the easiest thing to describe too!).
Perception starts when the sensory organs are stimulated by certain physical properties (light, sound, or something else), resulting in a process – transduction – in which the input energy is transformed into neural activity. It is the job of specific mental capacities to then a) recognise that some of the representations the transducers produce fall under their domain, and b) perform inference-like transformations (i.e., computations) upon these representations in order to interpret the input correctly.
Thus, the first task of language processing is to recognise that some external auditory information is linguistic in nature. After all, during the comprehension of language words have to be extracted from a signal that is continuous (there are no spaces between consonants and vowels, or between words), unsegmented, and highly articulated (that is, there is parallel transmission, for typically a signal carries information about different segments, such as phonological units, simultaneously).
Speech perception relies on the phonemic inventory of the language that has been internalised, and as a result suprasegmental information such as the duration, pitch, and amplitude of segments in one’s language can be employed to identify external noise as linguistic material, including where word boundaries are, paving the way for, and in combination with, lexical retrieval.
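The difficulty of the segmentation problem can be illustrated with a deliberately crude sketch. Assumption: real speech perception exploits prosodic and phonemic cues over a continuous acoustic signal, not dictionary lookup over letters; the greedy lexicon search below (with a made-up five-word lexicon) merely shows why recovering word boundaries from an unsegmented input is non-trivial at all.

```python
# A toy sketch of the segmentation problem: recovering word boundaries
# from an unsegmented "signal" (here, a string with no spaces).
# Assumption: the lexicon and the longest-match strategy are
# illustrative stand-ins, not a model of actual speech perception.
from typing import Optional

LEXICON = {"the", "boy", "hit", "ball", "a"}  # internalised word store

def segment(signal: str) -> Optional[list]:
    """Greedily match the longest known word at each position,
    backtracking (via recursion) if the parse dead-ends."""
    if not signal:
        return []
    for end in range(len(signal), 0, -1):
        word = signal[:end]
        if word in LEXICON:
            rest = segment(signal[end:])
            if rest is not None:
                return [word] + rest
    return None  # no segmentation of this signal exists

print(segment("theboyhittheball"))  # ['the', 'boy', 'hit', 'the', 'ball']
```

Even in this caricature, segmentation requires bringing stored linguistic knowledge to bear on the raw input – the perceptual and the linguistic are entangled from the very first step.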
This particular phenomenon constitutes an early stage of the language comprehension process, but I think that a careful analysis of how this stage relates to subsequent steps of linguistic processing, and the theoretical solution needed to account for this particular juncture, can illustrate what the problem of language comprehension overall involves. Work by David Poeppel and colleagues on this issue will help me outline what I have in mind.[vii]
Poeppel et al. adopt an analysis-by-synthesis (AxS) approach, a model for speech perception that was initially proposed by Morris Halle and Ken Stevens (I cite from the latter in this and the next paragraph).[viii] The AxS is based on the observation that in linguistic communication the receiver must recover the intended message from a signal for which they know the “coding function” – that is, the receiver is capable of generating the signal.
A good strategy for the receiver, then, would be to ‘guess at the argument [that is, the structure of the signal – DJL]…and then compare [this guess] with the signal under analysis’. In general terms, the model generates patterns internally, and these are matched to the input signal by the employment of a number of rules until the final analysis ‘is achieved through active internal synthesis of comparison signals’.
One way to output patterns would be to provide the system with a large repository of structures, but this would not accommodate the open-endedness of language. Rather, an AxS system must be endowed with a generative grammar – the aforementioned coding function. This is not to say that all the possible outputs of the generative grammar would be applied ab initio, as the computations would take a very long time to converge to the right interpretation of the signal, and this is clearly not what happens in language comprehension.
What must be the case instead is that the perceptual systems carry out a preliminary analysis in order to eliminate a large number of comparisons, a step that would then make available just a small subset of possible representations. The subsequent comparisons among the internally generated representations would have to be ordered somehow.
The AxS model can be viewed as encompassing two main stages, the first of which involves the generation of a candidate representation of the input (the preliminary analysis), followed by a comparison of this candidate representation as it is being synthesised. Note that it is only once the initial information has been processed that phrasal segmentation can proceed at all. Note, also, that this model is not one in which linguistic operations are literally employed in speech recognition. Rather, some linguistic information is used to derive specific hypotheses about the input signal, but the strategy in use (viz., the analysis) is perceptual in nature.
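The two-stage structure just described can be sketched in miniature. Everything concrete here is an assumption for illustration: the candidate list, the letter-set “features”, and the final-segment cue are stand-ins, since the model only requires that (1) a cheap preliminary perceptual analysis prunes the candidate space and (2) internally synthesised forms are then compared against the input signal.

```python
# A minimal sketch of the two-stage analysis-by-synthesis (AxS) loop.
# Assumption: the candidates, features, and matching metric below are
# invented stand-ins for the receiver's generative "coding function".

# The receiver can synthesise candidate forms (the shared coding function).
CANDIDATES = ["bat", "pat", "bad", "pad", "mat", "cat"]

def synthesise(word: str) -> set:
    """Stand-in for internally generating the expected signal for a
    candidate: here, just its set of letters as pseudo-acoustic features."""
    return set(word)

def preliminary_analysis(signal: str, candidates: list) -> list:
    """Stage 1: a cheap perceptual cue (here, the final segment)
    eliminates most candidates before any synthesis is attempted."""
    return [w for w in candidates if w[-1] == signal[-1]]

def analyse_by_synthesis(signal: str) -> str:
    """Stage 2: synthesise each surviving candidate and keep the one
    whose synthesised form best matches the input signal."""
    pruned = preliminary_analysis(signal, CANDIDATES)
    features = set(signal)
    return max(pruned, key=lambda w: len(synthesise(w) & features))

print(analyse_by_synthesis("bat"))  # → 'bat'
```

The design choice worth noting is that the expensive generative machinery is only ever run over the small set of survivors of the perceptual first pass – which is precisely the division of labour Halle and Stevens envisaged.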
Halle and Stevens anticipated that the AxS model could be employed for the study of parsing as well, given that intensive pattern recognition, which is what an AxS model effects, can expose different types of skeletons, be they of words, phrases, or indeed structures of all kinds.
There seems to be much in favour of this general take on things, as it accommodates the fact that language comprehension is partly a perceptual phenomenon with the observation that the representations the parser builds are linguistically structured, and hence generatively constructed. The AxS model captures these desiderata, given that the first component operates on the input using an intrinsically perceptual strategy – indeed, the imposition of a template – which is then analysed by staged applications of a generative grammar that is stored in the second component. Naturally, it is a matter of research to find out what operations exactly participate in this model (including their timing and memory requirements), but much information is by now known about some of these factors.[ix]
I am conceptualising the connection between competence and performance, and in turn that between the grammar and the parser, in terms of mental architecture, continuing with the theme of this post. To wit: the level of competence is composed of a computational operation, lexical items, and two interfaces, a conglomerate we have termed the language faculty, its adult state a particular (mature) grammar, whilst the level of performance brings together, at the very least, the language faculty (the grammar), a parser, a formulator (for language production), memory and attentional capacities, and the perceptual systems.
So, how does this relate to the potentialities of behaviour announced in the title? Come back next month!
[i] A more extensive treatment of Aspects is to be found in MIT Working Papers in Linguistics number 77, released as it was on the occasion of the 50th anniversary of the publication of Chomsky’s book (though I think the celebration was a bit of a missed opportunity; the publication is a collection of papers by the usual suspects, and to be frank this sort of cliquiness doesn’t always produce the best results).
[ii] Marr, born in Woodford, London (my first hunting grounds, of sorts) passed away at the age of 35 in 1980, and what a loss to cognitive science it was. Some of the influential works he left behind, and the ones I draw from in this post, include the following: David Marr, “Artificial Intelligence – A Personal View”, Artificial Intelligence 9 (1977): 37-48; David Marr, Vision: A computational investigation into the human representation and processing of visual information (San Francisco, CA: W.H. Freeman and Company, 1982); David Marr and Tomaso Poggio, “From understanding computation to understanding neural circuitry”, MIT AI Lab, memo 357 (1976): 1-22; and David Marr and H. Keith Nishihara, “Visual information processing: artificial intelligence and the sensorium of sight”, Technology Review 81 (1978): 28-49. His 1982 book, in particular, remains highly cited to this day and it was quite important to me in my early career; the subtitle of my 2012 PhD thesis references the subtitle of Marr’s Vision book, in fact.
[iii] The influence of Aspects on psychology is the most obvious case for the purposes of this post, but the conceptual underpinnings of the theory received much attention from philosophy as well, yielding numerous debates. I am personally more interested in these debates, but this issue will have to await another post.
[iv] This being the case, it is important to note that Marr’s computational level was meant for studying a particular type of system, one that is prima facie rather different in kind from language. Marr was interested in input–output systems of perception (early vision, in fact), but the language faculty, qua theory of competence, is not an input–output system (or not quite). Indeed, the mapping function underlying Marr’s theory of the computation was meant to capture the set of elements that are mapped in the process of constructing an internal three-dimensional representation from the two-dimensional representation reflected in the retina. Chomsky, on the other hand, has always emphasised that what the linguist aims to study is the internal computational procedure that generates structured expressions from stored lexical items. In any case, nothing stops us from adapting Marr’s framework to the study of language.
[v] For example, George Miller and Stephen Isard, “Some perceptual consequences of linguistic rules”, Journal of Verbal Learning and Verbal Behavior 2 (1963): 217-228; and George Miller and Stephen Isard “Free recall of self-embedded English sentences”, Information and Control 7 (1964): 292-303.
[vi] George Miller and Noam Chomsky, “Finitary Models of Language Users”, in Handbook of Mathematical Psychology, vol. 2, ed. Duncan Luce, Robert Bush, and Eugene Galanter (New York, NY: Wiley, 1963): 419-491.
[vii] David Poeppel, William Idsardi, and Virginie van Wassenhove, “Speech perception at the interface of neurobiology and linguistics”, Philosophical Transactions of the Royal Society B 363 (2008): 1071-1086.
[viii] Morris Halle and Ken Stevens, “Analysis by synthesis”, Proceedings of Seminar on Speech, Compression and Processing (1959): 1-4; and Morris Halle and Ken Stevens, “Speech Recognition: A Model and a Program for Research”, IRE Transactions on Information Theory IT-8 (1962): 155-159.
[ix] My own application of an AxS framework can be found in David J. Lobina, Recursion (Oxford, England: Oxford University Press, 2017) and David J. Lobina, Josep Demestre and José E. García-Albea, “Disentangling perceptual and psycholinguistic factors in syntactic processing: Tone monitoring via ERPs”, Behavior Research Methods 50 (2018): 1125-1140, where we focus on a phenomenon that suggests an additive effect of perceptual and linguistic processes upon memory capacity.