by David J. Lobina
Well, first post of the year, and a New Year's resolution unkept. Unsurprising, really.
In my last entry of 2023, I drew attention to the various series of posts I have written at 3 Quarks Daily since 2021, many of which did not proceed in order, creating a bit of confusion along the way. Indeed, the last post of the year was supposed to be the final section of a two-parter, but instead I wrote about how I intended to organise my writing better in 2024. What’s more, I promised I would start 2024 with the anticipated second part of my take on why Machine Learning (ML) does not model or exhibit human intelligence, and yet in my first post of 2024 I published a piece on the psychological study of inner speech (or the interior monologue, as it is known in literature) – a fascinating topic in its own right, though I am not sure it got much traction, and one that really has little to do with ML.
In my last substantial post of 2023, then, I set up an approach to discuss the supposedly human-like abilities of contemporary Artificial Intelligence (AI [sic]™, as I like to put it), and I now intend to follow up on it and complete the series. It is an approach I have employed when discussing AI before, and to good effect: first I provide a proper characterisation of a specific property of human cognition – in the past, I concentrated on natural language, given all the buzz around large language models; this time around it was the turn of thought and thinking abilities – and then I show how AI – actually, ML models – don’t learn or exhibit mastery of such a property of cognition, in any shape or form.
It’s an attractive approach, I think, and whilst it can be illustrative in many respects, it is a bit by-the-by and also somewhat in conflict with some of the stuff I have written about ML at 3QD Tower (what’s more, it feels like little more than a gotcha argument). As I stressed in some of my previous posts, there is little reason to believe that ML algorithms model any aspect of human cognition – that is, they are not theories of cognition – and therefore the whole issue is moot to begin with.
I would like to follow a slightly different course on this occasion, one which will nevertheless allow me to rerun some of my previous arguments.
The plan – the old plan – was to describe in some detail one of the properties of human cognition I had listed and briefly described here, and then show that ML can’t really exhibit such an ability. To do so, I intended to focus on systematicity – the ability to process and represent structurally related objects, which follows from the claim that thoughts (or propositions) are structured – by working through a hierarchy of abilities, obtained by analogy to linguistic systematicity but argued to apply to thought tout court. The following list is from the well-known work on this issue by one Robert Hadley, who published two foundational papers on it in 1994 (here’s a revisit from 2004); a toy illustration of how such abilities might be probed follows the list.
a) Weak systematicity is demonstrated if an agent can process ‘test’ sentences containing novel combinations of words.
b) Quasi systematicity: weak systematicity extended to embedded sentences – a sentence inside another sentence, as in the rat [the cat bit] ran away.
c) Strong systematicity (SS): weak systematicity plus the ability to process known words in novel positions (i.e., positions not present in the training set) – for example, this would be the ability to transform an active sentence, composed of known words, into a passive sentence, constituting a new pattern.
d) Strong semantic systematicity (SSS): strong systematicity plus the ability to assign appropriate meanings to all words in any novel test sentence – for example, the ability to understand verbs encountered in the indicative but now used in the imperative.
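To make the hierarchy a little more concrete, here is a toy sketch of the kind of train/test split one might construct to probe the first three levels; the sentences and names below are my own invented examples, not Hadley’s actual test materials.

```python
# A hypothetical illustration of train/test splits probing levels (a)-(c)
# above; the sentences are invented for this sketch, not Hadley's materials.

train = [
    "the dog chased the cat",
    "the cat bit the rat",      # "rat" only ever appears as an object here
]

# (a) Weak systematicity: a novel combination of familiar words, with every
# word in a grammatical position it already occupied during training.
weak_test = ["the dog bit the cat"]

# (b) Quasi systematicity: the same vocabulary inside an embedded clause.
quasi_test = ["the rat the cat bit ran away"]

# (c) Strong systematicity: a familiar word in a position never seen in
# training -- "rat" now appears as a subject.
strong_test = ["the rat chased the dog"]

# (d) Strong semantic systematicity would further require assigning the
# right meanings to all the words in such novel test sentences.
```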
Crucially, human systematicity revolves around two important principles:
1) Structure Sensitivity: complex representations have an internal structure, as they’re composed of constituents that are literally there.
2) Compositionality: the property that ‘the content of thoughts is determined…by the content of the…concepts that are their constituents’ (quote from here).
As Hadley showed in 2004, the language models of the time (so-called connectionist networks) did not fare very well, demonstrating only limited mastery of systematicity, and only with rather simple sentences. The best-performing networks were hybrid models, combinations of connectionist networks and symbolic representations, the latter a component that ML radicals such as Geoffrey Hinton summarily dismiss – then, as now.
Now, anyone reading this might think that modern language models – large language models equipped with technology transforming transformers – simply ace these tests, easily demonstrating all these abilities. I have conversed with these models, you might say, and surely they do all this already.
Fair is fair, but casually interacting with the software is a different proposition to putting any of these abilities to the (experimental) test.[i] A recent paper that doesn’t mention systematicity per se, but describes an ability close to it, finds that language models suffer from what the authors call ‘the reversal curse’ – that is, the inability to generalise to B is A when the model has been trained on A is B. Or, less cryptically, language models are not able to automatically answer questions such as
‘Who’s the ninth Chancellor of Germany?’ after being trained on
‘Olaf Scholz is the ninth Chancellor of Germany’.
Or in the case of my favourite example, used with GPT-4:
Who is Tom Cruise’s mother? The system answers correctly (Mary Lee Pfeiffer) 79% of the time (this information is naturally part of the training set, given that some LLM companies use all of the internet).
Who is Mary Lee Pfeiffer’s son? The system answers correctly only 33% of the time (answer: Tom Cruise, of course!).
The model, to be more accurate, is trained on sentences of the form <name> is <description> and is being asked to predict text in the opposite direction, <description> is <name>, which sounds like a kind of logical deduction (and thus close to the spirit of systematicity), though language models are of course not meant to do logic (or to track soundness, truth, or validity); they are instead trained to predict new linguistic output from the training set. This takes us back to one of my earlier arguments; namely, that language models have been trained on text and what they can predict is text.[ii]
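To see why a system trained to predict the next token struggles in the reverse direction, consider a deliberately crude sketch: a toy bigram ‘model’ – nothing like a transformer, and certainly not how the cited paper ran its experiments – trained on a single <name> is <description>-style sentence. The forward query mirrors the training direction and comes out right; the reversed query finds no supporting statistics and goes wrong.

```python
from collections import defaultdict, Counter

# A toy bigram "language model": nothing like a real transformer, but it
# makes the directional nature of next-token prediction visible. The
# training "corpus" is a single <name> is <description>-style sentence.
corpus = ["Tom Cruise 's mother is Mary Lee Pfeiffer"]

# Count which word follows which during training.
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1

def complete(prompt, max_words=5):
    """Greedily extend the prompt, one most-likely next word at a time."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = following.get(words[-1])
        if not candidates:
            break  # no statistics for this continuation
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

# Forward query, mirroring the training direction, comes out right:
print(complete("Tom Cruise 's mother is"))
# -> Tom Cruise 's mother is Mary Lee Pfeiffer

# Reversed query: the counts only run one way, so the model happily
# completes "... is Mary Lee Pfeiffer" again -- not "Tom Cruise".
print(complete("Mary Lee Pfeiffer 's son is"))
# -> Mary Lee Pfeiffer 's son is Mary Lee Pfeiffer
```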
So far, so old. But perhaps there’s another side to the discussion, a more fruitful one.
In a recent paper, Eunice Yiu and colleagues have argued that language models, and perhaps generative ML models in general, should be seen as imitation engines rather than “agents” – that is, as software that can be used to probe what sort of cognitive capacities are enabled by the particular statistical analyses that ML models carry out (I would downplay this point a bit, though; Yiu et al point out that infants are sensitive to some statistical distributions too when learning a language, but these are only operative in combination with expectations as to what language is like; see here).
Thus, ML models are not agents (let alone intelligent agents), but cultural technology – imitators rather than innovators. In this vein, Yiu et al draw a distinction between imitation and innovation with respect to infant cognition, as children go beyond imitation during ontogeny, given that most of infant learning is driven by ‘truth-seeking processes such as understanding the goals and intentions of others’, to quote Yiu et al – cognising as a cognitive agent does, in other words.
There is much to agree with here, and perhaps this perspective could save us from anthropomorphising AI [sic], as is done on a daily basis – what I would give to read a journalist write something like ‘No, an AI did not do this, it was a human being specifically coding a neural network to solve a well-defined problem by leveraging the capabilities of Machine and Deep Learning’. To hope is to live (or is it the other way around?).
Useful and impressive tools though its models are, ML has little to do with intelligence or human cognition, and this includes human language too. Language models have been trained on text and have learned properties of already-produced language; that is, they haven’t actually learned a natural language, but the statistical distributions of text, with as many replications as non-replications in the outputs they so produce – a stochastic system that can be a bit hit-and-miss, in fact. To be extra clear: a language model is an imitating text generator, and the underlying neural network, as in all other Machine Learning models, is an output generator shaped by the inputs it has been fed during training, typically not going beyond the properties of the distribution it has analysed.
Machine Learning is mostly about pattern recognition; ML models are function approximators: given a dataset of inputs and outputs, it is assumed that there is a function that maps inputs to outputs in a specific domain, and which resulted in the patterns observable in the data. Neural networks are specifically designed as algorithms that aim to “approximate” the function represented by the data. As with other statistical methods such as regression, this is largely achieved by calculating the error between predicted and expected outputs and minimising that error during training; or, to quote an influential book on Deep Learning:
It is best to think of…networks as function approximation machines that are designed to achieve statistical generalization
Crucially, neural networks can be employed to approximate any function, which is one reason ML is applied to all sorts of problems and settings, many of which have absolutely nothing to do with human cognition. In other words, neural networks are universal approximators; or, to quote from the Deep Learning book again:
the universal approximation theorem states that a feedforward network with a linear output layer and at least one hidden layer…can approximate any…function from one…space to another with any desired non-zero amount of error, provided that the network is given enough hidden units [my emphasis]
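As a purely schematic illustration of what ‘function approximation by error minimisation’ amounts to, here is a minimal sketch of a feedforward network with one hidden layer and a linear output layer, fitted to noisy samples of sin(x) by gradient descent on the squared error; the architecture, data, and hyperparameters are arbitrary choices of mine, not anything taken from the book quoted above.

```python
import numpy as np

# A minimal sketch of "function approximation by error minimisation":
# a network with one tanh hidden layer and a linear output layer is fitted
# to noisy samples of sin(x) by gradient descent on the squared error.
# All choices here are illustrative, not from the Deep Learning book.

rng = np.random.default_rng(0)

# Training data: inputs x and targets y = sin(x) plus a little noise.
x = rng.uniform(-np.pi, np.pi, size=(200, 1))
y = np.sin(x) + 0.05 * rng.normal(size=x.shape)

hidden = 32
W1 = rng.normal(scale=0.5, size=(1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.5, size=(hidden, 1))
b2 = np.zeros(1)

lr = 0.05
for step in range(5000):
    # Forward pass: hidden layer with tanh, then a linear output layer.
    h = np.tanh(x @ W1 + b1)
    pred = h @ W2 + b2

    # Mean squared error between predicted and expected outputs.
    err = pred - y
    loss = np.mean(err ** 2)

    # Backward pass: gradients of the loss with respect to each parameter.
    grad_pred = 2 * err / len(x)
    grad_W2 = h.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_h = grad_pred @ W2.T * (1 - h ** 2)   # tanh derivative
    grad_W1 = x.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)

    # Gradient descent step: nudge parameters to reduce the error.
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

    if step % 1000 == 0:
        print(f"step {step}: mean squared error {loss:.4f}")
```

The fitted network interpolates the samples it was shown, and that is all it does: it has minimised the error over a particular distribution of inputs and outputs, which is a far cry from knowing anything about sines, let alone about cognition.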
To see language models, or ML in general, as informing the actual study of cognition, or of linguistics in particular, is very close to committing a category mistake, though spelling this out is for another time (Lobina, forthcoming!) – and so is the issue of artificial general intelligence, if the concept is intelligible at all (I will write about this before the year is out).
But to end by referring to my own sensibilities: who came up with the expression “generative AI”? A certain kind of linguist would like the word back, thank you very much.
[i] This point applies to this post of mine too, which aimed to evaluate whether language models had acquired linguistic competence. This study carried out a more systematic analysis, and the results clearly demonstrate that the systems do not in fact show any mastery of language. That being the case, the much more fundamental point that language models are not in fact models of human cognition is perhaps even more important. If anything, though, the just-cited study of how language models do with classic linguistics tests should showcase the confusion of Steven Pinker in this interview, as he really should know better (on politics, however, he knows as much as expected).
[ii] But the debate continues; this study claims that meta-learning neural networks are better at systematicity, with this rejoinder casting doubt on the results.