Sweetening the Bitter Lesson: An Argument for a More Natural AI

by Ali Minai

As AI insinuates itself into our world and our lives with unprecedented speed, it’s important to ask: What sort of thing are we creating, and is this the best way to create it? What we are trying to create, for the first time in human history, is nothing less than a new entity that is a peer – and, some fear, a replacement – for our species. But that is still in the future. What we have today are computational systems that can do many things that were, until very recently, the sole prerogative of the human mind. As these systems acquire more agency and begin to play a much more active role in our lives, it will be critically important that there is mutual comprehension and trust between humans and AI. I have argued elsewhere that the dominant AI paradigm is likely to create powerful intelligences that are quite alien, and therefore potentially hazardous. Here, I critique an influential principle implicit in this paradigm, and make the case that incorporating insights from biological intelligence could lead to a more natural AI that is safer and better adjusted to the human world.

General Critique:

In 2019, Richard Sutton, one of the pioneers of computational reinforcement learning and a winner of the Turing Award, wrote an extremely influential essay titled “The Bitter Lesson”. His main argument was that trying to reach artificial intelligence by emulating biology, neuroscience, and psychology was futile, and that the focus should be on general-purpose meta-methods that can scale with computation – notably search and learning. The following quote from the essay captures the essential insight:

We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.

This is a truly profound point that deserves some unpacking. The implication in Sutton’s argument is that the “intelligence” that AI seeks to implement is something similar to the intelligence exhibited by humans, or possibly by some other animals. In other words, biological, natural intelligence. This is reasonable, since all existing examples of general intelligence are biological, and it’s hard to imagine some radically different form of intelligence. Given this goal of building AI similar to natural intelligence, it seems reasonable to assume that insights from biology would be very useful. Sutton rejects this notion, and though the essay does not lay this out in detail, the implied argument rests on three assertions:

  • The functionality of intelligence – however it is defined – can be separated from the substrate it is instantiated in: For example, human intelligence happens to exist in an animal with a particular biological form, but can, in principle, be abstracted away from it and instantiated more efficiently in other systems with a very different form.
  • Biology is more a source of confusion than of insight: Humans and other animals, with their profound complexity, deep evolutionary history, biological constraints, developmental processes, and particular needs, are too hard to understand and too laden with suboptimal contingent features to be a source of ideas for building successful artificial intelligence.
  • A set of sufficiently general meta-methods can achieve intelligence efficiently: The best approach to AI, therefore, is to stop wasting time on importing suboptimal, poorly understood ideas from biology, and focus on building more principled, efficient, and scalable systems from scratch using meta-methods such as learning and search.

This is a clear, elegant argument, and the explosive success of today’s AI systems can be seen as validation for it. But are there limits to its validity? This article argues that there are. I have discussed some of the points made here in two published papers (here and here) and in other articles. Broadly, they fall into an emerging discipline called NeuroAI, which seeks a tighter integration between the brain sciences and AI, and my thinking has benefited immensely from discussions with leading thinkers in the area. However, I believe that the scope of integration between AI and biology should be expanded to include developmental and evolutionary biology as well.

A profound insight about high-functioning complex systems is that they need to be “nongeneric, specialized, structured configurations”, i.e., have very specific heterogeneous structure that is rare and difficult to discover. Generic, blob-like systems may scale well, but lack the combination of complex behavior and robustness that is essential for complex systems to succeed. Indeed, this is the fundamental principle behind all engineering, hence the need for careful design, testing, and validation. It is also true for living organisms, which function well because of their precise structure at multiple scales, and lose functionality when that structure is disrupted. The same will surely be true of AI agents with intelligence comparable to humans or other complex animals. In engineered systems, the requisite specific structure is the product of careful design. In animals (and other complex organisms), it is generated by evolution and development. Where should the structure come from in AI agents?

The original idea in the field of AI was to follow the classic engineering paradigm. This “engineering is all you need” approach asserted that intelligence could be defined generically as a set of capabilities, and that clever engineers – in this case, mostly professors and their graduate students – could design algorithms to instantiate all these capabilities from scratch. The result was symbolic AI – known derisively today as “good old-fashioned AI” (GOFAI) – which wasted decades on toy models of vision, language, logical inference, etc., without getting anywhere close to useful AI. Its fatal flaw was the assumption that function could be abstracted away from form, and implemented more efficiently through clever engineering from first principles.

Enter neural networks – brain-inspired computational models capable of learning from data with minimal prior design. After a long period of gestation, the emergence of very fast computers and the availability of vast amounts of digital data gave birth to the field of deep learning, i.e., extremely large neural networks with many processing layers, which can accomplish truly application-scale tasks such as object recognition and, lately, language and image generation. This approach, which may be termed “learning is all you need”, very much embodies the ethos of the Bitter Lesson: starting with a large neural network with a generic architecture and a suitable objective function, a lot of learning with a lot of data can cause an appropriate heterogeneous structure to emerge within the system – and, amazingly, it does! More recently, it was realized that adding search to this repertoire was also useful, leading to astoundingly successful game-playing agents such as AlphaGo and today’s large reasoning models. Thus, the apotheosis of the Bitter Lesson is complete in the current AI paradigm: “Learning and search are all you need.” But is that really so? This article argues that it is not.

The key issue in adopting the Bitter Lesson framework too literally is that intelligence is clearly not something that can be abstracted away from its instantiation the way a program can be made independent of its processor. This is because, unlike a program, which is a designed process with well-defined and circumscribed functionality targeted towards specific goals, general intelligence is an open-ended, emergent capacity with no well-defined set of explicit goals, and an infinite range of phenomenology and applicability. While such intelligence in general may emerge from an infinite range of systems, each particular intelligence emerges uniquely from the agent that instantiates it and in the context of the environment in which that agent is embedded. Most importantly, it cannot be encapsulated in a finite formal description that can then be used to instantiate it in other substrates.

Ultimately, the sole overarching purpose of intelligence is to make an agent’s behavior more conducive to attaining its own objectives within its environment, e.g., to survive, mate, be comfortable, prove theorems, or whatever. Sensing, perception, cognition, decision-making, reasoning, motor control, etc., are just means to this end. Behavior and perception both depend fundamentally on the affordances of the agent which, in turn, depend on the form – the deep embodiment – of the agent. For example, picking up a cup by its handle and drinking from it is not an affordance for a dog because it does not have hands that can grip, and therefore it’s not a target for the dog’s intelligence. Directly perceiving sonar images of the environment is similarly out of scope for human intelligence, but not for that of a bat or a dolphin. This argument need not be limited only to physically embodied agents: Even agents with virtual embodiment have particular forms in terms of what they sense, their functional architecture, and their behavioral repertoire, which defines the scope of their intelligence. In other words, there are as many kinds of intelligence as there are possible forms for agents, and the two are linked inextricably. In highly intelligent agents, some higher-order, more abstract functions such as logical inference, grammar, mathematics, etc., may be defined more abstractly, but, ultimately, they too are grounded in the form of the agent, i.e., the substrate of that intelligence, as shown by the use of spatiotemporal terminology in describing abstract concepts or procedures. Much of the current indifference to the form-dependence of intelligence arises from AI researchers’ focus on higher-level, abstract functions such as language and reasoning.

To summarize, the Bitter Lesson approach falls into a trap similar to that of symbolic GOFAI, but whereas GOFAI put misplaced trust in the ingenuity of engineers, today’s AI paradigm, which embodies the Bitter Lesson, places inordinate trust in learning and search. The rest of this article argues that the new AI paradigm, though far better than GOFAI, is ultimately also limiting and simplistic because it ignores some fundamental aspects of intelligence – notably, the need for an intelligent agent to be an inherently integrated entity with a stable identity and psychological continuity, grounded in direct experience of its environment through its specific affordances. Without this, all we have are simulacra with alien minds rooted in the fragmented “reality” of their training data. The main conclusion from the argument presented is that, in the absence of any better source for the heterogeneous structures and processes needed in highly intelligent systems, it’s short-sighted to ignore those configured by evolution – an infinitely creative master engineer that has been at it for more than 3 billion years!

All this does not negate the fact that the Bitter Lesson represents a deep meta-level truth: Successful learning – whether from data or from a study of living organisms – requires careful separation of the essential from the incidental. This is a very difficult task, which Sutton’s Bitter Lesson tries to address by rejecting the need for it altogether. A better approach, I argue, is to persevere in learning from “Nature’s imagination”. After all, that is how we were able to put GOFAI to rest and get to deep learning in the first place.

Specific Critique of Current AI Models:

The general critique of the Bitter Lesson ethos in the previous section applies to several advanced AI paradigms today, but most directly to the dominant one: Large language models (LLMs) and their extensions such as vision-language models (VLMs) and vision-language-action models (VLAs). Most of the discussion in this section is, therefore, directed at this class of models. Some other approaches – such as the joint embedding predictive architecture (JEPA) and free energy models – are still quite abstract and high-level, but incorporate principles that are much more consistent with natural intelligence, including the use of world models, predictive learning, and heterogeneous architectures as inductive biases.

Most LLMs generate their responses by predicting the next “token” in the context of recent ones. For text generation, tokens are short sequences of characters, e.g., “ang”, “le”, plus punctuation, spaces, line breaks, etc., that can be chained sequentially to form long texts. It seems surprising that such a simple sequential process should be able to generate complex textual output and support very humanlike discourse on virtually any topic. But language is inherently sequential, and there is considerable evidence that the human brain does process it through sequential prediction. The deeper question is whether this simple sequentiality is also a characteristic of the thought that comprehends and generates text, or simply a feature of the input-output mechanism. The question of whether thought is fundamentally linguistic has been much debated, but recent groundbreaking work by Evelina Fedorenko and colleagues (among others) suggests that language is mainly a mechanism for communication, that cognitive functions such as thought and reasoning engage a much more complex set of representations in the brain, and that the observed similarity between representations in brains and LLMs is mainly at the level of words rather than deeper structure. As Tuckute, Kanwisher, and Fedorenko say in a 2024 review on the topic, “…a correlation between a model’s performance on some task and its similarity to the brain need not imply that the model and the brain are performing the same task.” The question, then, is: How – and what – do LLMs think?
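
To make the next-token mechanism concrete, here is a deliberately toy sketch in plain Python of the autoregressive loop described above. A simple bigram counter stands in for the neural network (an illustrative assumption, not how any real LLM computes its predictions); the point is only the loop itself: predict a next token from the current context, append it, and repeat.

```python
# Toy illustration (not an actual LLM): autoregressive next-token generation.
# A real LLM replaces the bigram counts below with a deep network that scores
# every token in its vocabulary, conditioned on the whole preceding context.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Train": count how often each token follows each token (context length 1).
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def next_token(context):
    """Greedy prediction: the most frequent continuation of the last token."""
    candidates = follow_counts.get(context[-1])
    return candidates.most_common(1)[0][0] if candidates else "."

# Generate: feed each predicted token back in as context (the autoregressive loop).
tokens = ["the"]
for _ in range(8):
    tokens.append(next_token(tokens))
print(" ".join(tokens))
```

Even this trivial version produces fluent-looking local continuations; the astonishing thing about LLMs is how far the same loop goes when the predictor is a very large trained network.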

It is not correct to assert – as some do – that LLMs have no model of the world, or that they cannot think. With their huge capacity, LLMs clearly infer very complex relationships – even causal relationships – that are implicit in their training data, organize these into a heterogeneous cognitive architecture, and generate their responses through a process that is not essentially different from what would be considered “thinking” from a non-dualist perspective. The issue is what kind of model is being inferred, what sort of thinking it is producing, and whether these processes are appropriate as the basis of extremely capable AI models. Those models have yet to be built, and may be years away, but it is past time to start planning for their possibility.

As LLMs and their successors become more integrated into everything, it’s not just what they say or do that matters, but also how and why they do so. If there is to be understanding and trust between humans and AI, it must be based on a common – or at least mutually comprehensible – understanding of the world. Without this, all attempts at aligning the values and objectives of AI with those of humans are doomed to fail. The fear that, by focusing only on the superficial linguistic aspect of intelligence in training AI, we may be creating dangerous alien intelligences has been expressed by many, including Geoffrey Hinton and Stuart Russell, but one does not have to buy into predictions of doom to acknowledge that the problem of mutual incomprehensibility between humans and AI is real. There are two divergent responses to this concern. The first is the so-called Platonic representation hypothesis, which posits that, to achieve high functionality, all intelligences must converge to the representations that uniquely reflect the true joint statistics of the world. The instrumental convergence hypothesis, which posits that all intelligences must ultimately develop similar objectives to succeed, has a similar logic. At best, these ideas are speculations based on thought experiments and limited studies. More fundamentally, they are based on a reductionist world view that looks at reality as objective and sees intelligence as an optimization process. In fact, every agent lives in its own reality defined by its affordances, and no agent needs to be right about everything to be highly intelligent, which creates a lot of room for variation. The other response to the idea of alien intelligence is to welcome it as the next stage in the evolution of life – a successor species to humans. It’s doubtful that many of us would welcome that!

The argument presented in this article is that:

  1. It is a good idea to try to build AI that is natural and humanlike by design rather than hope for it.
  2. A plausible way to do this is to incorporate fundamental insights from natural intelligence into the design of AI agents.

This is not a call to replicate brains or bodies down to the last atom, but to use biology again as a source of important ideas – as was done in the transition from GOFAI to neural networks. The primary lesson in this argument is that, having gone from being all-in on design (GOFAI) to being all-in on learning (deep learning), we must move in a thoughtful and intelligent way to combine design and learning at multiple levels, and a smart way to do that is to look at natural intelligence.

It’s worth noting that insights from biology may not simply inform what we include in AI agents, but also what we should exclude. For example, a deeper understanding of the basis of competitiveness, dominance, and violence in humans and other animals may suggest ways to make AI agents less dangerous. The biases that lead animals to such behaviors have roots in their evolutionary nature and the centrality of competitive reproduction to evolutionary success – factors that don’t automatically apply to AI. Other negative behaviors in animals are triggered by the fear of injury and death, which could also be mitigated in AI with careful design.

LLMs vs Natural Intelligence

Large language models (LLMs) and their variants such as VLMs and VLAs diverge from the principles of natural intelligence – particularly human intelligence – in several ways, including the following:

  • LLMs are trained to optimize explicit, specific, externally specified objective functions through iterated gradient-based and reinforcement learning to make them useful for users, whereas brains use mainly real-time unsupervised learning and reinforcement learning on implicit, complex and tangled objective functions defined by evolutionary goals (survival, avoiding injury, reproduction, etc.) serving the agent itself rather than an external user. Natural intelligence, which is present in all living organisms to some degree, is not about performing tasks for others; it’s about allowing the intelligent agent to thrive in a complex world. The ability to be useful to others is a side-effect.
  • LLMs begin with a generic, regular architecture (many repeated layers of the same components with random initial weights), whereas the brain has a specific, heterogeneous architecture configured by evolution and development, which serves as a strong inductive bias well-adapted to efficient real-world learning (a minimal sketch of such a generic, repeated stack appears after this list). The systems are then trained with huge amounts of data and compute so that a functional architecture emerges which performs well on the given objective (next-token prediction) and also exhibits an emergent ability to perform many higher-level tasks (such as generating long, complex, grammatically correct, and informative texts). However, there is no way to ensure that the representations and internal inference processes that emerge in the network are similar to those of humans, or even interpretable by humans in principle.
  • In most cases, LLMs (and VLMs and VLAs) learn entirely from data that provides indirect and fragmented information about the world – text, images, code, video snippets, audio snippets, etc. – whereas animals learn from direct, embodied, continuous, real-time, real-world experience. This precludes the AI models from fully inferring the physical and causal structure of the world, i.e., grounded world models, which are essential for robust intelligence that can make plausible inferences in truly novel and unexpected situations.
  • The lack of an integrated self and a world model grounded in experience precludes LLM-like systems from having key enabling aspects of intelligence such as character, judgment, and deliberation, which are features of agents with an identity. Thus, LLMs can hallucinate but cannot tell a deliberate lie, which requires the ability to distinguish truth from falsehood. Even when deliberation is incorporated – as in the case of reasoning models – it is superficial and ungrounded rather than rooted in a deep understanding of the agent’s world.
  • To prevent them from generating harmful or toxic output, LLMs are typically trained first on lower quality data as “wild”, unaligned base models, and are then fine-tuned and aligned to human preferences almost as an afterthought. Such alignment is both superficial and fragile, and can be undone in numerous ways by deliberate effort or by accident. Humans, in contrast, acquire values and preferences gradually through a developmental process in concert with learning everything else, resulting in values becoming deeply integrated into all mental representations and processes, leading to robust inherent alignment.
  • LLMs replicate only higher cognitive functions such as natural language discourse, reasoning, and semantic transformations, without any of the “infrastructure” of internal drives, motivations, affective states, episodic memory, social imperatives, etc., that ground human/animal cognition and behavior. Their discourse, therefore, is entirely superficial – an emulation by an ersatz simulacrum rather than the expression of an agent with an identity based on temporal persistence and cognitive continuity. The fact that this simulacrum extracts a convincing linguistic expression of human attitudes from lexical statistics is remarkable, but does not imply that the system has any humanlike internal affective states or a stable sense of self. Indeed, it can be argued that these things emerge in living systems as a result of their evolutionary origins and the need for continual self-organization driven by metabolism, neither of which exists in AI agents. Among other things, this sense of self may be necessary for grounding ethics, judgment, empathy, personal responsibility, and other complex aspects of humanlike behavior.
  • Paradoxically, even as LLMs replicate higher-order cognitive functions, they do so in an unnaturally non-deliberative “ballistic” manner. The internal signal flow is largely feed-forward, with an outer loop generating streams of tokens autoregressively. Deliberation and reasoning are typically applied only at the end (latent-space deliberation is an exception). Moreover, while the generative component of reasoning models – typically an LLM – is a neural network, the reasoning/deliberation component is often instantiated as a symbolic algorithm such as tree search or graph search, which is reminiscent of GOFAI. The brain, in contrast, does everything through modular, heterogeneous neural networks with pervasive recurrent connectivity between modules. Most mental processes in perception, cognition, and motor control involve extensive bidirectional interaction, prediction, resolution, competition, modulation, and dynamics at multiple scales in this network-of-networks complex system.
  • The primary means of communication between humans and today’s AI agents is language. However, as discussed above, there is considerable evidence that thinking does not depend primarily on language, which implies that humans and other intelligent species have non-verbal languages of thought. A similar dissociation has also been noted in LLMs. This raises the possibility that humans and AI agents may not understand each other well even when they seem to agree linguistically, making it difficult for them to have good theories of mind for each other. Such mutual theories of mind are critical for human-AI collaboration.
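
To illustrate the architectural contrast drawn in the second bullet above, here is a minimal, hypothetical PyTorch sketch of a generic, repeated stack of identical blocks with random initial weights. It is not any particular production model (real LLMs are decoder-only, use causal masking, and add many refinements); the point is only that nothing task-specific or heterogeneous is built in – all functional structure has to emerge through training.

```python
# Minimal sketch (PyTorch) of a generic, regular LLM-style architecture:
# the same randomly initialized block repeated N times, followed by a
# next-token prediction head. Causal masking and other details are omitted.
import torch
import torch.nn as nn

vocab_size, d_model, n_heads, n_layers = 50_000, 512, 8, 12

block = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),                   # token IDs -> vectors
    nn.TransformerEncoder(block, num_layers=n_layers),   # N identical, randomly initialized blocks
    nn.Linear(d_model, vocab_size),                      # scores over the vocabulary
)

tokens = torch.randint(0, vocab_size, (1, 16))           # a dummy 16-token context
next_token_logits = model(tokens)[:, -1, :]              # prediction for the next position
print(next_token_logits.shape)                           # torch.Size([1, 50000])
```

The brain, by contrast, starts from nothing like this uniform stack: its modules differ in structure, connectivity, and learning rules, and those differences are themselves part of the inductive bias.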

These and other differences mean that, even as they acquire human-level capabilities and beyond in some areas, the AI systems of today are profoundly different from human minds in the way they accomplish their functions. Their performance on higher cognitive functions matches human expectations very well, but it emerges from mechanisms and representations that may be extremely alien and incomprehensible to humans working with them. An even greater concern is that the strange high-level mind generating the output may implicitly define an even stranger “subcortical” mind with incomprehensible and potentially dangerous motivations and preferences that remain latent until they are deployed unexpectedly in unusual situations or triggered by injected tokens. As humans and AI agents work more in collaboration, it is essential that they achieve greater mutual comprehensibility and better mutual theories of mind than is currently the case.

A related concern is that, with the current trajectory of AI towards building systems with unconstrained learning, we may end up with a bewildering diversity of AI agents with very different kinds of minds and bodies. Integrating this new ecosystem gracefully and usefully into the existing one, i.e., human society, will be a major challenge in the relatively near future.

A Natural Solution

A potential path to addressing the issues described above is to think of AI systems as living organisms and apply the lessons of evolutionary biology, neuroscience, developmental psychology, and cognitive science to build AI in a more natural way – to incorporate into them the prior inductive biases discovered by “Nature’s imagination”, which make humans robustly intelligent, ethical, prosocial, trustworthy, and – importantly – comprehensible to other humans. Active collaboration and even peaceful coexistence between humans and AI will require this.

Neuroscience and cognitive science have identified mechanisms underlying mental processes at many levels, but our understanding is far from complete. It’s even more difficult to separate the essential features from the incidental ones imposed by biology and evolutionary contingency. Addressing these issues should be an important component of AI research. Some specific high-level issues for research include the following:

  1. Thinking of AI agents the way we think of living agents, i.e., as autonomous, self-motivated, continuously learning and creative entities, setting expectations accordingly, and using similar methods for structuring, training, alignment, and interaction.
  2. Using a developmental learning paradigm to train AI systems in stages of increasing complexity, incorporating age-appropriate values and ethics into each stage of learning along with other training data, instead of first training wild models and then civilizing them with RLHF, DPO, etc. (a schematic sketch of such staged training follows this list).
  3. Moving away from the current paradigm based on transformer/diffusion-based architectures and gradient learning towards more biologically-inspired heterogeneous architectures to act as efficient inductive biases, and to have multiple types of learning – unsupervised, reinforcement, self-supervised, and supervised – deployed at different stages.
  4. Extending current approaches to building causal world models to ones that learn from data as close as possible to actual experience – embodied or virtual depending on the model – including active hypothesis testing, trial-and-error, imitation, and other modalities of human exploratory learning.
  5. Identifying and building into AI agents features that bias them towards aligning their internal representations, motivational frameworks, and the logic of their mental processes with those of humans, allowing the agents and humans to understand each other better through mutual theories of mind, and not depend on language alone as the basis of understanding.
  6. Recognizing that the internal experiences of agents are grounded fundamentally in their perceptual and behavioral affordances (available possibilities), which puts some inherent limits on how well AI agents of various kinds can align with human preferences, values, ethics, and cognitive models. The focus must be on developing strategies to reduce this disparity rather than: a) Ignoring it and assuming that extremely different minds would converge instrumentally or Platonically to the same internal models and values; or b) Trying to force perfect convergence through necessarily superficial methods.
  7. Building and training into AI agents immutable features that predispose them to an acceptable degree of voluntary alignment, possibly by making misaligned behavior psychologically stressful and revealing internal misalignment through the agents’ overt behavior as early as possible. An interesting possibility is to make AI agents inherently less competitive than animals because they are not subject to evolution, but we still have to guard against the emergence of competitiveness.
  8. Identifying a limited number of relatively well-understood, biologically-inspired canonical architectures for AI brains, and building new systems as variations of these, rather than creating a vast ecosystem of AI agents with unlimited diversity, generating infinite scope for emergent misalignments.
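
As a concrete (and deliberately schematic) illustration of item 2, the sketch below interleaves value/norm data with ordinary curriculum data at every developmental stage, instead of bolting alignment on at the end. Every name in it is a hypothetical placeholder, not an existing API; it shows the shape of the training loop, not a real system.

```python
# Hypothetical sketch of developmental (staged) training in which
# age-appropriate values data is interleaved with ordinary training data at
# each stage, rather than applied as a post-hoc alignment pass (RLHF, DPO).

STAGES = [
    # (stage name, ordinary curriculum data, age-appropriate norms data)
    ("infant", ["simple sensory/linguistic data"],   ["basic do/don't feedback"]),
    ("child",  ["everyday language, simple tasks"],  ["fairness and honesty examples"]),
    ("adult",  ["technical corpora, hard tasks"],    ["complex ethical dilemmas"]),
]

def train_step(model, batch):
    """Placeholder for one update with whatever learner is used at this stage."""
    model["seen"].append(batch)
    return model

def developmental_training(model):
    for stage, curriculum, norms in STAGES:
        # Interleave task data and value/norm data within the same stage,
        # so that values shape the same representations as everything else.
        for batch in (x for pair in zip(curriculum, norms) for x in pair):
            model = train_step(model, batch)
        print(f"finished stage: {stage}, examples seen: {len(model['seen'])}")
    return model

model = developmental_training({"seen": []})
```

The design choice being illustrated is the ordering: values are never a separate, final phase that can be peeled off, but part of every stage of learning.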

None of this will make AI agents completely humanlike, but it will reduce the mismatch between AI and humans, allowing the two species to become comfortable with each other with less effort. It may also make AI inherently safer.

Research Questions:

Finally, a few research questions to address the issues raised above:

  • Will an AI paradigm incorporating biologically-inspired modules, network structures, and mechanisms lead to AI agents that are more natural, and thus learn more efficiently, have more robust real-world performance, and are more interpretable in terms of their behavior and motivations? What biologically-inspired features, e.g., internal recurrence, predictive learning, spiking neurons, neural fields, etc., are relevant for this?
  • What kinds of prior structural biases and learning processes might force AI agents to infer more reliably interpretable internal representations and inference processes without reducing their performance and their ability to discover novel features and methods to solve hard problems?
  • Training on video works better than training on text alone for learning world models, but is it sufficient? Or should more of large-scale AI learning use physics simulators (such as Gazebo) and eventually transition to full virtual reality platforms if they become available? Will even these capture all the necessary aspects of real-world experience? More broadly, what strategies can be developed to let embodied AI agents learn from the full spectrum of experiences without posing a risk?
  • What strategies inspired by biology or psychology can be used to make AI agents predisposed to an acceptable degree of voluntary alignment? For example, making agents inherently non-competitive, making misaligned behavior psychologically stressful, or building in mechanisms that induce the agents to self-reveal misaligned goals or plans.
  • How can developmental learning be used to embed ethics and values into AI agents gradually and deeply through a sense of self, rather than being imposed on “wild” systems as a veneer through RLHF or similar methods?
  • Is it possible or useful to embed some innate ethics and values into AI agents, rather than relying on overt alignment training or system prompts to constrain objectives and behavior?
  • How can we enable AI agents and humans to have viable theories of mind for each other, and is an LLM-like paradigm compatible with this?
  • How can we build AI agents with an integrated identity, biographical memory, temporal persistence, and psychological continuity, so that they are capable of lifelong learning, have deeply rooted values, and can build trust through their behavior?
  • How should alignment be defined and analyzed in agents whose perceptual and behavioral affordances and internal representations differ radically from those of humans?
  • Is it possible to define a finite (if growing) set of well-understood canonical models for AI agents – like species or genera in animals – rather than an uncontrolled diversity of distinct models?
  • What characteristics – e.g., hyper-competitiveness – must be explicitly avoided when building AI agents, and how?

It will be interesting to see how many of these questions become part of the research agenda in AI in the near future.