by David J. Lobina
A specter is haunting Artificial Intelligence (AI) – the specter of the environmental costs of Machine/Deep Learning. As Neural Networks have by now become ubiquitous in modern AI applications, the gains the industry has seen in applying Deep Neural Networks (DNNs) to solve ever more complex problems come at a high price. Indeed, the quantities of computational power and data needed to train networks have increased greatly in the last five or so years. At the current pace of development, this may well translate into unsustainable amounts of power consumption and carbon emissions in the long run, though the actual cost is hard to gauge, with all the secrecy and the rest of it. According to one estimate, nevertheless, improving cutting-edge DNNs to reduce error rates without expert input for correction could cost US$100 billion and generate as much carbon as New York City does in a month – an impossibly expensive and clearly unethical proposition.
Once upon a time, though, advances in so-called neuromorphic computing, including the commercialization of neuromorphic chips by Intel, with its Loihi 2 boards and related software, hinted at a future of ultra-low power but high-performance AI applications. These models were of some interest to me not long ago too, given that they seem to model the human brain more closely than contemporary DNNs such as Large Language Models (LLMs), the paradigm that so completely dominates academia and industry today.
What are these models again?
In effect, LLMs are neural networks designed to track the various correlations between inputs and outputs within the dataset they are fed during training, the aim being to build a model that can then generate specific strings of words given other strings of words as a prompt. Underlain by the “pattern finding” algorithms at the heart of most modern AI approaches, deep neural networks have proven to be very useful in all sorts of tasks – language generation, image classification, video generation, etc.
It is important to note, however, that even in those cases in which intelligent behavior seems to be displayed by these networks, all these models are doing is computing a correlation between an input and an output – that is, the model finds patterns in the data – with no understanding of the underlying cause of an event, and often with no transparency as to how the correlations were calculated (the so-called lack of explainability in AI). LLMs, in particular, can generate strings of elements, such as words, based on word combinations from the training dataset, and they do so by calculating embeddings and binding relations (I have discussed these issues ad nauseam here, here, and here).[i]
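To make the “pattern finding” point concrete, here is a deliberately crude sketch: a toy bigram model that generates text purely from word co-occurrence counts in a made-up corpus. The corpus, the sampling choices, and the function names are mine, and real LLMs work with learned embeddings and attention layers rather than raw counts (see the footnote below), but the underlying logic – continue a prompt with whatever tends to follow it in the training data – is the same in spirit.

```python
from collections import Counter, defaultdict
import random

# Toy bigram "language model": a caricature of the idea that generation
# amounts to reproducing co-occurrence patterns found in training text.
# Real LLMs use learned embeddings and attention, not raw counts.

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the cat ."
).split()

# Count which word follows which in the training data
following = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    following[w1][w2] += 1

def generate(prompt: str, length: int = 8, seed: int = 0) -> str:
    """Extend the prompt word by word, sampling in proportion to the counts."""
    random.seed(seed)
    words = prompt.split()
    for _ in range(length):
        counts = following.get(words[-1])
        if not counts:
            break
        candidates, weights = zip(*counts.items())
        words.append(random.choices(candidates, weights=weights)[0])
    return " ".join(words)

# Continues the prompt with statistically plausible words from the toy corpus
print(generate("the cat"))
```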
The use of LLMs raises not only environmental concerns but social worries as well. LLMs will spell doom for certain professions, it seems, including university lecturers; or at least they will bring an end to the dubious practice of certain companies that provide essay writing services to students for a fee, though in the process this might create a new industry: that of AI tools designed to identify LLM-generated essays in the first place. Perhaps more importantly, there is the question of the sexist and racist biases ChatGPT can manifest, which are of course part and parcel of the patterns that LLMs find in the human-generated datasets they are trained on. As we are all aware, LLMs can spread misinformation, give catastrophic advice, and create falsehoods out of, seemingly, nowhere.
Take Human Resources and Recruitment as another sector where the wide use of LLMs can create more problems than it may be worth. This sector now uses LLMs to automate many of its processes, including writing job advertisements, coming up with interview questions by type of role, carrying out employee surveys, and the like. This often results in increased efficiency, a different experience for candidates, perhaps even a better one, and the automation of various tasks, including the ones just mentioned (though this comes at a not insignificant cost: someone has to create a bespoke integration in each case). But the old problems surface here too, and the assessment of a candidate against a job description is perhaps the most relevant case. What if the decision to hire, or not hire, a person is based on unacceptable biases? And what happens to the personal data of a candidate whose CV is being inputted into, say, ChatGPT for assessment – does it then become part of the database that OpenAI makes use of for further training?
So far, so commonplace. Are there any alternatives out there, at least to mitigate the environmental costs? What are these neuromorphic models you speak of?
Neuromorphic computing involves implementing neural-inspired computations in a circuit. Taking the human brain as a model to mimic, rather than merely as inspiration in the way that DNNs have typically been conceptualized, neuromorphic calculations are performed by large numbers of small units (“neurons”) that communicate with each other through bursts of activity called spikes. In the brain, these spikes are brief changes in voltage, which in neuromorphic computing are simulated by using integers. In this overall scheme, neurons work in parallel and send spikes to other networks of neurons, so that information processing takes place throughout the entire network. The result is Spiking Neural Networks (SNNs), the AI version of the brain’s neurons and synapses.
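As a rough illustration of the spiking idea – and only that; this is a software toy, not the programming model of any actual chip – here is a minimal leaky integrate-and-fire simulation in Python. Every parameter value (population size, threshold, leak, connection density) is an arbitrary choice for the sketch: each unit accumulates input, fires a binary spike when it crosses a threshold, and resets, with information carried by the pattern of spikes across the whole population.

```python
import numpy as np

# Minimal leaky integrate-and-fire (LIF) simulation: a toy version of the
# spiking scheme described above. All parameter values are arbitrary.

rng = np.random.default_rng(0)

n_neurons = 100   # size of the toy population
threshold = 1.0   # membrane potential at which a neuron spikes
leak = 0.9        # fraction of potential retained each time step
steps = 50        # number of discrete time steps to simulate

# Sparse random connectivity: weights[i, j] is the effect of neuron j's spike on neuron i
weights = rng.normal(0.0, 0.3, size=(n_neurons, n_neurons))
weights *= rng.random((n_neurons, n_neurons)) < 0.1

potential = np.zeros(n_neurons)   # membrane potentials
spikes = np.zeros(n_neurons)      # spikes emitted on the previous step (0 or 1)

for t in range(steps):
    external = rng.random(n_neurons) < 0.05           # occasional external input
    potential = leak * potential + weights @ spikes + 0.5 * external
    spikes = (potential >= threshold).astype(float)   # neurons at threshold fire...
    potential[spikes == 1] = 0.0                      # ...and reset
    print(f"step {t:2d}: {int(spikes.sum())} neurons spiked")
```

Note that computation here is entirely event-driven in spirit: a unit only influences its neighbours when it actually spikes, which is where the claimed power savings of neuromorphic hardware come from.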
This kind of computing can be implemented on conventional computer architectures, as has been done extensively in research. The more recent novelty has been the possibility of manufacturing viable neuromorphic chips – microprocessors that use electronic circuits to mimic the brain’s architecture – which provide a new computing architecture, one that can process vast data sets more efficiently than a conventional chip. Typically this takes the form of a System-on-Chip design, that is, an integrated circuit combining all components in one piece of silicon, with no strict separation between the processing unit and memory storage – effectively a so-called Network-on-Chip wherein each neuron carries a small cache of information. The model represents a rethinking of computer architectures at the transistor level, as shown in the graphic heading this post: neuromorphic chips use artificial neurons made from silicon within an integrated circuit that contains millions of transistors on a single chip.
The goal of manufacturing such chips is to provide a computer architecture that is better suited for the kind of information processing that the brain supports so effortlessly. After all, the human brain processes, responds to, and learns from real-world data with energy consumption in microwatts and response times in milliseconds, a kind of performance that is beyond modern DNNs.
More to the point here, neuromorphic processing was developed as a way to mitigate environmental costs. One such neuromorphic chip was produced by IBM in 2014; TrueNorth, a Network-on-Chip with 5.4 billion transistors giving rise to 1 million neurons and 256 million “synapses”, is an example of the general neuromorphic blueprint: both memory and computation are handled within each core, with the aim of being more power efficient. Intel’s aforementioned Loihi 2, for its part, with its 1.2 million neurons and 120 million synapses per chip, has been shown to outperform traditional processors by a factor of 100 in energy efficiency in some applications.
Still, neuromorphic computing itself remains more a matter for research than a well-established commercial technology; it is largely the domain of academics, and as such neuromorphic chips are not a readily available commercial product. IBM’s TrueNorth was developed in large part thanks to a research grant from the US Defense Advanced Research Projects Agency (DARPA), after all. At the same time, the European Union is currently funding the Human Brain Project, which aims to simulate an entire human brain and has greatly advanced research in SNNs, to the tune of hundreds of millions of euros.
One just hopes that this publicly funded research does not turn into privately held profits, as has been the case with large swathes of technology, resulting in the now-familiar billionaire tech bros and the ever greater techno-feudalism on display – expropriate them all, I say!
Where neuromorphic computing does seem to be making an entrance into commercial AI applications is in so-called edge computing – that is, computing that takes place closer to the source of the data, in “edge” devices such as sensors, microphones, and the like, which come with chips that carry out some of the computing in situ. The potential of brain-inspired chips for edge computing is not entirely surprising. Neuromorphic chips – as mentioned, System-on-Chips in which memory storage and processing units are not separate components of the architecture – are ideally suited for the edge, as the technology provides ultra-low power consumption and low-latency responses, precisely what is required for edge devices. What this does, in effect, is bring some AI-like processes to sensory applications such as computer vision.
The aim, in such cases, is to place computing chips as close to sensors as possible. This is a natural development in itself, and considering the ever-increasing data and processing requirements of Convolutional Neural Network (CNN) algorithms – the standard computing model for analysing visual information, for instance – neuromorphic chips are likely to prove an attractive proposition. CNNs combine large computational complexity with massive parameter counts, which results in inference that is fairly slow and power hungry, and thus only feasible on powerful processing units (a back-of-the-envelope illustration follows below). The point of neuromorphic chips is to bring CNN-based computer vision to the edge, where it is usually most useful anyway.
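To give a sense of why CNN inference is power hungry, here is a small back-of-the-envelope calculation for a made-up four-layer convolutional stack; the layer shapes are illustrative only and do not correspond to any named architecture. The point is that even a modest number of weights translates into billions of multiply-accumulate operations for a single image, which is exactly the workload edge devices struggle to supply power for.

```python
# Parameter and multiply-accumulate (MAC) counts for a small, made-up CNN,
# to make the "slow and power hungry" point concrete. Layer shapes are
# illustrative only.

def conv_cost(in_ch, out_ch, k, out_h, out_w):
    """Parameters and MACs for one k x k convolution layer (bias ignored)."""
    params = in_ch * out_ch * k * k
    macs = params * out_h * out_w   # each filter is applied at every output position
    return params, macs

layers = [
    # (in_ch, out_ch, kernel, out_h, out_w) for a 224x224 input, halving resolution
    (3,   64,  3, 224, 224),
    (64,  128, 3, 112, 112),
    (128, 256, 3, 56,  56),
    (256, 512, 3, 28,  28),
]

total_params = total_macs = 0
for layer in layers:
    p, m = conv_cost(*layer)
    total_params += p
    total_macs += m

print(f"parameters: {total_params:,}")     # roughly 1.5 million weights for this toy stack
print(f"MACs per image: {total_macs:,}")   # close to three billion operations per single image
```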
Part of the allure of brain-inspired chips is the belief that they may usher in a new age of AI systems – systems that are closer in nature to what the human brain does, thus implementing a life-like AI. This belief can be realized in two different but related ways, however. On the one hand, there are neuromorphic chips themselves, which aim to mimic the architecture of the nervous system by blurring the distinction between memory and processing. This is achieved by the use of artificial silicon neurons working in parallel, and in this sense neuromorphic chips constitute a development at the level of the transistor. According to this design, networks of neurons communicate with other networks of neurons, and thus information processing takes place throughout the overall collection of neurons.
On the other hand, there is the actual computational model executed on neuromorphic chips. Most of the brain-like chips now available are capable of implementing neural network algorithms, including CNNs, in line with the most established AI technique, but the more congruous computational model for neuromorphic chips is what may be termed neuromorphic processing. According to this model, based on the aforementioned Spiking Neural Networks rather than on Deep Neural Networks, networks of neurons communicate with other networks through bursts of activity (spikes), and thus each neuron carries a small cache of information within it.
Be all that as it may, for someone like me, a cognitive scientist with a healthy interest in AI tout court and an unhealthy interest in Machine/Deep Learning, neuromorphic computing is most interesting as a method to tackle those processes that to my mind are more amenable to AI treatment: what Jerry Fodor once upon a time called the mind’s peripheral processes of language comprehension, visual cognition, etc. (perception, in short), as opposed to the central processes involved in thinking and reasoning (belief fixation, in simple terms) – thus, AI not as a model of thinking but of low-level processing; AI not as an approach to intelligence, general or otherwise (I doubt whether the term ‘artificial general intelligence’ is even coherent, but that discussion will have to wait for another time).
[i] I have often described LLMs as very sophisticated auto-completion tools, but it is more accurate to state that LLMs are large networks of matrix/tensor products, with no model of semantics or facts about the world included in the system, and thus, with no marker of what is true or false – LLMs find patterns of word/letter/sound co-occurrences, and the result is the interactions that a dialogue management system such as ChatGPT affords. The technological novelty of the approach is that the neural networks underlying LLMs include “attention mechanisms”, layers of artificial neurons that learn which parts of the data should be focused on during training in order to adjust the weights between words (or, more accurately, between word embeddings). Networks with such layers are called “transformers”, and that is why GPT stands for Generative Pretrained Transformer.
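For readers who want to see what such an attention layer actually computes, here is a minimal sketch of scaled dot-product attention in plain numpy. The sequence length, embedding size, and random projection matrices are arbitrary stand-ins of my own choosing; real transformers learn those projections during training, use multiple attention heads, and stack many such layers.

```python
import numpy as np

# Minimal scaled dot-product attention over a toy "sentence" of 4 token
# embeddings. Dimensions and values are arbitrary; real transformers use
# learned projections, multiple heads, and many stacked layers.

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8

X = rng.normal(size=(seq_len, d_model))   # one embedding per token

# Projection matrices (learned in a real model, random here for the sketch)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v

scores = Q @ K.T / np.sqrt(d_model)       # how much each token "attends" to each other token
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over each row

output = weights @ V                      # weighted mix of value vectors

print(weights.round(2))   # each row sums to 1: the attention pattern for one token
print(output.shape)       # (4, 8): one updated representation per token
```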