Moretti and the Stanford Literary Lab: Computational criticism in two senses and the prospect of a new approach to literary studies

by Bill Benzon

Franco Moretti and his colleagues at the Stanford Literary Lab have collected a number of their pamphlets into a book:

Canon/Archive: Studies in Quantitative Formalism by Franco Moretti (Author, Editor), Mark Algee-Hewitt, Sarah Allison, Marissa Gemma, Ryan Heuser, Matthew Jockers, Holst Katsma, Long Le-Khac, Dominique Pestre, Erik Steiner, Amir Tevel, Hannah Walser, Michael Witmore, Irena Yamboliev, published by n+1.

That book is the occasion of this essay, which thus resembles a review in some, but only some, respects.

If it’s only a review that interests you, then perhaps I can save you the trouble of a rather long read.

Canon/Archive is an important book of literary criticism, likely as important as any published this year. It is also rather technical in places. But you can skate over those spots if you’re determined to read the whole book. Look at the charts and diagrams; they’re the heart of the book.

That in itself is important to note; the book is full of charts and diagrams. That is unheard of in standard literary criticism. It’s a sign of the fact that Canon/Archive embodies a new mode of thought, perhaps the first since the advent of the so-called New Criticism before World War II (though mostly after the war).

A new mode of thought! Heavens to Betsy!

As Moretti notes in his preface, “Images come first … because – by visualizing empirical findings – they constitute the specific object of study of computational criticism” (xi). Think of it, visually, of course. You run the programs, visualize the results, and then write the text to support, explicate, and reflect on the implications of those visualizations.

Toto, I've a feeling we're not in Kansas anymore.

For those who decide to brave the whole essay, here's a piece of advice: If you find something boring or a bit picky, do what I always do, skip over it. You can always come back.

The Collaboratory

But why, you may ask, why all those authors for this one book, fourteen of them? Here’s what the Stanford Literary Lab has to say about collaboration:

At the Lab, all research is collaborative, even when the outcome ends up having a single author. We hold frequent group meetings to evaluate the progress of the experiments, the status of existing hypotheses, and the promise (and problems…) of future developments. Most of our meetings are limited to those directly engaged in the research; however, four or five per quarter are open to whomever is interested in our work.

So, this is a collective project. As Moretti explains in his preface, “Literature, Measured” – which originally appeared as Literary Lab Pamphlet 12, April 2016:

I would say that almost every project goes through two very different stages. In the initial phase, the group functions like a single organism, where every individual attends to a specific task. The first of such tasks is clearly that of programming: something Matthew Jockers laid the foundations for even before the Lab was officially opened, and Ryan Heuser sustained over the years with his unique imaginative talent, and whose mathematical implications have eventually been made clear to us all by Mark Algee-Hewitt. On the basis of programming, much more becomes possible: from the refinement of the corpus to the analysis of initial results; from the review of the critical literature to the design of follow-up experiments. This functional division of labor, whose results no individual scholar could ever achieve in isolation, is clearly indispensable to modern research. (pp. x-xi)

Think about that for a moment, for it is quite unlike standard-issue literary criticism, which is undertaken by individual scholars (a few of whom have achieved almost rock-star-like status within the discipline).

Collaboration? Among literary critics! Who’d’ve thought?

Now the good stuff:

Now, the team sits together around a table—the lab table, as essential a tool as the really expensive ones—and discusses how to make sense of the results. Here, the efficient integration of the first stage gives way to a swirl of disparate associations: C reflects on the language of a specific excerpt, and A on the historical categories that could explain it; F recalls something D had said a few months earlier (and then forgotten); E recognizes a grammatical pattern, for which B suggests an evolutionary explanation … All researchers bring to this phase their interests, and even fixations. At times, there is a lot of noise. But in a few magic moments, the group becomes truly more than the sum of its parts; it “sees” things that no single pair of eyes could have. If, in the pamphlets that follow, there are some genuine discoveries, that’s where they have always begun. (p. xi)

Imagine that, the group becomes “truly more than the sum of its parts” and sees things that no one individual could have seen.

It’s a process I know well from my years in graduate school at the State University of New York at Buffalo during the mid-1970s. I was getting my degree in the English Department, but my deepest education came in the computational linguistics research group that David Hays, in the Linguistics Department, convened at his home on the shore of Lake Erie. Hays had been a first generation researcher in machine translation and as such was one of the founders of computational linguistics. By the time I met him, in the spring of 1974, his interest had shifted to semantics, which is what most interested me.

The give and take that Moretti described, the meeting of minds, it was there around the table in Hays’s dining room. Depending on just when the seminar convened, we’d share a meal at that same table, a meal we’d prepared together. Later, after I’d graduated, Hays and I continued to collaborate. At times it seemed we were of one mind distributed across two brains, each with its own competences.

But I digress.

Here’s where I’m going: Computational criticism, as Moretti and his colleagues have come to call it, is not merely new, it is deeply new, for reasons I will explain in the next section. The natural complement, or supplement, to this computational criticism is not the regime of close reading, New Historicism, and critique that constitutes traditional literary criticism, but something else. I’ve been calling it naturalist criticism, but for the purposes of this essay I’m interested in that aspect of naturalist criticism that might be called computational criticism, but in a different sense from how Moretti and his colleagues use the term.

Consider, for example, computational physics. What is that? Here’s how the Wikipedia entry begins:

Computational physics is the study and implementation of numerical analysis to solve problems in physics for which a quantitative theory already exists. Historically, computational physics was the first application of modern computers in science, and is now a subset of computational science.

Computational physicists simulate physical processes.

Computational literary critics, on the other hand, do not simulate literary processes. What Moretti and his Stanford colleagues have done, and continue to do, what other computational critics do, is very important. As I argued in one of my early pieces at 3 Quarks Daily, it’s the only game in town [1].

The question presents itself, however: could there be a computational criticism that actually simulates literary processes, that in effect simulates the reading of literary texts? That’s what computational linguists were doing with language back in the 1960s, 70s and into the 80s – not literary texts, but texts. That’s one of the projects that came up around the table in Hays’s research group. And it was while sitting around that table that I imagined such a computational literary criticism. I all but made it the subject of my dissertation, and one of my first publications was a contribution to such a criticism in which I proposed a computational analysis of a Shakespeare sonnet [2].

In the rest of this essay I’ll speculate on how such a criticism is a necessary complement to the criticism in Canon/Archive. I begin with the basics, the text. Then I consider two chapters – alas, only two out of nine, plus Moretti’s preface and his conclusion:

  • Chapter 3: On Paragraphs – Originally Pamphlet No. 10
  • Chapter 6: From Keywords to Cohorts – Originally Pamphlet No. 4

I’ll conclude with some general remarks about the prospects for this new discipline. Alas, I do not think it can be well accommodated within the current regime of academic literary criticism, at least in the United States. The problem begins with the conception of the text that is active in the discipline.

The text, which text?

The concept of the text is one of the most enigmatic in literary criticism [3]. This passage is from the introduction Rita Copeland and Frances Ferguson prepared for five essays from the 2012 English Institute devoted to the text [4]:

Yet with the conceptual breadth that has come to characterize notions of text and textuality, literary criticism has found itself at a confluence of disciplines, including linguistics, anthropology, history, politics, and law. Thus, for example, notions of cultural text and social text have placed literary study in productive dialogue with fields in the social sciences. Moreover, text has come to stand for different and often contradictory things: linguistic data for philology; the unfolding “real time” of interaction for sociolinguistics; the problems of copy-text and markup in editorial theory; the objectified written work (“verbal icon”) for New Criticism; in some versions of poststructuralism the horizons of language that overcome the closure of the work; in theater studies the other of performance, ambiguously artifact and event. “Text” has been the subject of venerable traditions of scholarship centered on the establishment and critique of scriptural authority as well as the classical heritage. In the modern world it figures anew in the regulation of intellectual property. Has text become, or was it always, an ideal, immaterial object, a conceptual site for the investigation of knowledge, ownership and propriety, or authority? If so, what then is, or ever was, a “material” text? What institutions, linguistic procedures, commentary forms, and interpretive protocols stabilize text as an object of study? (p. 417)

What? “Linguistic data” and “copy-text” sound like the physical text itself; the rest of them, not so much.

For my purposes, we can think of three concepts of the text:

  • The archival text: this is the physical object that is preserved and from which various editions are prepared, both for general and for scholarly use. Much of the so-called digital humanities has been devoted to the archival text, but this work is quite different in character from computational formalism and so can be set aside for this discussion.
  • The hermeneutic or interpretable text: This is the conception operative in standard literary criticism, “an ideal, immaterial object, a conceptual site for the investigation of knowledge, ownership and propriety, or authority”.
  • The semiotic or linguistic text: The object analyzed, described, and theorized by linguistics and the less delirious semioticians. Like the archival text, it is a physical thing, albeit one of a subtle and enigmatic kind.

The hermeneutic or interpretable text is a vague object. This text is apprehended through spatial metaphors one absorbs in the course of learning how to do literary criticism. But it is never the topic of explicit discussions where we say: We conceptualize the text as a spatial object, and so forth and so on.

One learns to do a “close” reading. How close, a yard, a foot, 2.78 inches? The question is silly. It doesn’t mean that at all. What it means is that, when interpreting a text, you include (often extensive) quotations from the text in your interpretation; those quotations are evidence. This is easily done with poems, where the practice started, but not so easily with novels. But why conceptualize this practice with a metaphor of distance, where the object one studies is uncharacterized by anything more specific than a location in the ethereal space of interpretation?

Whatever these interpretable texts are, they have come to have places where meaning can be “hidden”. The object of criticism, then, is to discover these places and reveal the meanings hidden within. More recently, however, these texts have acquired surfaces, surfaces which one can read and describe. Such “surface” reading is conceived of as an alternative or supplement to “close” reading.

And then there is “distant” reading, a term Moretti coined well over a decade ago, before he’d become involved with computation. Some computational critics, however, use the term to characterize their work. But that work, this computational criticism, doesn’t use the hermeneutic text as its object of study.

It studies the semiotic or linguistic text. In Saussure’s terminology, language consists of signs, and signs consist of a signifier and a signified. The signifier is a physical thing, sonic vibrations, marks on a surface, or gestures (as in signing). The signified is a mental object; it is what a signifier is said to mean. Moreover the relation between signifier and signified is said to be arbitrary. The word apple in no way resembles an apple, nor does the word Jupiter resemble the planet, the word tornado the storm, the word love any of various feelings and attitudes, nor does the word singularity resemble any of the various mathematical, physical, metaphysical, and dream objects for which it is used.

Literary critics know this, of course, and it is an important weapon in the standard critical armamentarium; this bit of knowledge is central to deconstruction and reader-response criticism. But it is this knowledge that is important. Not the signifiers themselves. In fact this knowledge is used, more or less, to justify setting the signifiers aside so that criticism can proceed to “the text”.

Computational criticism is built on those very signifiers. They constitute the data in the “big data” of computational criticism, though not all computational criticism is about big data. Much of it is about rather modest data.

The computer knows nothing about meaning, about the signifieds that are the “substance” of textual meaning. Computational criticism is thus forced to be the only form of literary criticism that takes the semiotic or linguistic text as its object of investigation. Yes, students of poetry may remark on meter, rhyme, and such, and students of prose fiction may nod in the direction of style every now and then, but these are peripheral activities, distant satellites to the sun of interpretation, an activity that can be close to or distant from its vague object, a mysterious being all decked out with impalpable surfaces and invisible nooks and crannies where meaning lurks, waiting to ambush any critic who happens along.

It is in this respect that computational criticism is quite different from standard literary criticism. Yes, we more or less know this. It is a standard line among computational critics – though one not much paraded through Canon/Archive – that their work is not in competition with standard criticism. Standard interpretations of Middlemarch or Moby Dick, for example, are not going to be either felled or supported by a topic analysis of 19th century Anglophone fiction. These are different activities, and they take place in different intellectual worlds. Topic analysis takes place in the world of the linguistic text, a world where mere signifiers are central, while interpretation takes place in the world of the hermeneutic text, the world of meanings.

The relationship between these two intellectual worlds is not at all clear. At best, they are merely different. But they might, with their different foundational commitments, be incompatible. Sorting that out is going to be a major task for future criticism.

A popular current gambit in that conversation focuses on scale – it floats in and out of the pamphlets collected in Canon/Archive. Computational criticism works at the macro scale, a large number of texts (100s or 1000s) over a long period of time (decades or even centuries), whereas interpretive criticism works at the micro scale, with individual texts, perhaps one at a time, perhaps more typically in groups of a half dozen or a dozen, perhaps more for a long monograph. This discussion has its value; the remarks in Canon/Archive are often fascinating. But ultimately it is secondary to a discussion that focuses on the differences between the interpretable text – that thing of mystery and critical desire – and the linguistic text, a string (for that is what it is, a string, one word form after another) of signifiers.

This other discussion, a deeper one, has hardly begun. And it wouldn’t even be worth entertaining if it weren’t for that older computational linguistics, the one I learned around the table in Hays’s research group. The focus of many of those discussions was semantics, meaning, and how word forms are computationally linked to semantic structures. That enterprise collapsed in the 1980s because it became too unwieldy. But if you look, you can see signs of its return in the wake of neural networks and machine learning.

Topics and paragraphs

Just about every document more than 100 words long will be organized into paragraphs. My dictionary says that a paragraph is “a distinct section of a piece of writing, usually dealing with a single theme”. OK. But I’m thinking back to, say, middle school, when I was learning how to write and had to figure out how to divide my piece into paragraphs. That kind of definition wasn’t much help.

Chapter 3, “On Paragraphs: Scale, Themes, and Narrative Form” (by Mark Algee-Hewitt, Ryan Heuser, and Franco Moretti), is an investigation into the nature of paragraphs that uses a technique known as topic analysis, where topic is a term of art that corresponds roughly with the common-sense notion of a topic. It is, if you will, an operationalization (a notion Moretti discusses in Chapter 4) of the common-sense notion.

Let’s step back for a moment and think about words in texts. Consider this word:

1) race

What’s it mean?

Well, it could mean several things, depending…


Absent any context it is just a word form that could be bound to (associated with) any of a variety of meanings. Here are some pairings:

2) race car
3) horse race
4) human race
5) missile race

In each case our impulse is to bind race to a meaning that is consistent with some meaning of the other term. The meaning is similar among 2, 3, and 5, where it designates some kind of competition, which is quite different from 4. However, in both 2 and 3 we’re dealing with a contest where competitors attempt to best one another in traveling over some course. Competition in 5 is of a different character.

Depending on further context – more words – these two-word phrases might be from four different topics: auto racing (2), horse racing (3), biology or politics (4), and Cold War competition between the United States and the USSR (5). Or maybe we have just two topics, competition (2, 3, and 5) and biology (4). Or perhaps 2 and 3 belong to one topic and 5 to another.

What happens in topic analysis is that different documents are compared with one another to see which words hang together. Words that hang together are assigned to the same topic. Just how this is done, well that is a technical matter. Algee-Hewitt, Heuser, and Moretti assume that you either know how the technique works, or you don’t care and are willing to assume that they know what they’re doing. So they don’t explain it, which is fine. For the most part, I’m not going to do so either [5].

But there are two details. First of all, the technique treats each document as a so-called “bag of words”. Imagine that some document, any document — a poem by Denise Levertov, a play by Beaumarchais, a technical article by Richard Feynman, a novel by George Eliot, whatever — is printed out on single sides of paper sheets. Slice the sheets into thin strips each containing a single line of print; cut those strips into individual words like so many pieces of confetti; and gather all the individual snippets together and place them into a bag. THAT’s a bag of words. All the structure in the document is gone. All we’re interested in is whether or not words occur together.

Second, when we’re dealing with long texts, such as novels, the texts are generally divided into 1000-word segments, where each segment is treated as a single document. A moment’s reflection should reveal why we do this. Remember, we’re doing a massive cross comparison between all the ‘documents’ in our corpus. Do race, horse, sprint, and car, for example, consistently occur with one another? Well, if our texts are 100,000 words long, chances are they will; with texts that long, just about any given word is highly likely to occur with any other word. And that would make the analysis useless. So we divide the document into 1000-word segments for analytic purposes. It could just as easily be 1033 or 875 or even 492, but 1000 is a nice round number.
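For readers who like to see such things spelled out, here is a minimal Python sketch of those two details: slicing a text into fixed-size segments and reducing each segment to a bag of words. The function names (tokenize, segment, bag_of_words) and the crude tokenizer are my own illustrative assumptions, not anything taken from the Lab's code, and real topic modeling software does a good deal more than this.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and pull out runs of letters (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())

def segment(tokens, size=1000):
    """Cut a token list into consecutive slices of `size` words.

    The last slice may be shorter than `size`.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def bag_of_words(tokens):
    """Throw away word order; keep only how often each word form occurs."""
    return Counter(tokens)

# A tiny stand-in for a novel, cut into 5-word segments instead of 1000.
text = "The horse won the race. The car lost the race."
tokens = tokenize(text)
bags = [bag_of_words(seg) for seg in segment(tokens, size=5)]

for bag in bags:
    print(sorted(bag.items()))
```

Each bag records only which word forms turned up in a segment and how often. Whether race keeps company with horse or with car is then a matter of comparing bags across the whole corpus.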

And that brings us to paragraphs. What happens if we treat each paragraph as a separate document for the purposes of topic analysis? After all, that’s how people create documents, in paragraphs. Do paragraphs have an internal integrity that can be revealed through topic analysis?

That’s what this chapter is about. They created a way to measure the “thematic focus” of a text segment. Is a segment (‘natural’ paragraph or arbitrary 1000-word slice) devoted to a single topic (in the sense of topic analysis) or to two, three, four, or more topics? If indeed paragraphs are natural units of thematic organization, then paragraphs should exhibit greater thematic focus than arbitrary 1000-word segments. It turns out, not surprisingly, that this is the case.

Let us be clear about the meaning of these findings. First of all, we did not “discover” that paragraphs were thematic units; scholars who had studied the paragraph had long established this “fact,” which we had all learned in elementary school, and had “known” ever since. But we proved that this “well-known fact” was actually true, and could be “recognized” by a topic modeling program, thus proving its reliability; two instances of corroboration which, though hardly exciting in themselves, have their modest role to play in the process of research. More significantly, our results suggest that—if one wants to use topic modeling to analyze literature—then paragraphs are a better unit than “mechanical” segments, and should replace them in future research. And the same for thematics: if, as we have seen, no one really knows “where” to look for themes in a text, our findings suggest that paragraphs are probably the best starting point: by concentrating thematic material within their limited space, they act as the textual habitat of themes. What this concretely means, is the object of the next two sections. (74)
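To make the idea of thematic focus a bit more concrete, here is a toy sketch in Python. It is not the measure Algee-Hewitt, Heuser, and Moretti used, which I have not reproduced here. It assumes, purely for illustration, that we already have a table assigning words to topics (WORD_TOPIC is hypothetical and hand-made, where a real topic model would infer such assignments), and it scores a segment by the share of its topic-bearing words that fall under its single most common topic.

```python
from collections import Counter

# Hypothetical word-to-topic assignments, made up by hand for this example.
# A real topic model would infer something like this from the corpus.
WORD_TOPIC = {
    "horse": "racing", "race": "racing", "jockey": "racing",
    "house": "home", "room": "home", "staircase": "home",
}

def thematic_focus(tokens, word_topic=WORD_TOPIC):
    """Share of topic-bearing words that belong to the segment's top topic.

    Returns a value between 0.0 and 1.0; 1.0 means every topic-bearing
    word in the segment belongs to one and the same topic.
    """
    topics = Counter(word_topic[t] for t in tokens if t in word_topic)
    total = sum(topics.values())
    if total == 0:
        return 0.0  # no topic-bearing words at all
    return max(topics.values()) / total

# A 'paragraph' that stays on one topic, and a 'slice' that straddles two.
paragraph = ["the", "horse", "and", "jockey", "won", "the", "race"]
mechanical_slice = ["the", "horse", "left", "the", "house", "by", "the", "staircase"]

print(thematic_focus(paragraph))
print(thematic_focus(mechanical_slice))
```

On this toy measure the paragraph scores higher than the slice that cuts across two subjects, which is the general shape of the comparison the chapter reports, though with nothing like its scale or rigor.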

They then went on to examine mono-topical paragraphs and poly-topical paragraphs (three topics was most common), concluding:

If the initial comparison between paragraphs and “mechanical” segments had established the greater thematic focus of paragraphs, then, these later findings specify that focus as a kind of thematic combination: neither the large notions routinely associated with the idea of the “theme” (War; Nature; Travel), nor those “indivisible units” often labeled as “motifs” (“a bomb explodes”; “falling leaves”; “the train leaves the station”), but the interaction of a few topics within the frame of an everyday event. Event; because, let’s not forget it, these are paragraphs in a story; paragraphs that make the story. David’s state of mind changes after returning to Mr. Wickfield’s house; Guy’s mother accepts the fact that it’s best for him to go away. An action has occurred; the initial situation has been transformed (“I had not expected it of her”) into a different one. And it has been transformed, by the encounter of distinct topics: David’s feelings don’t simply change straightforwardly from “fears” to “hopefulness”: they do so, by taking a detour through “house” and “room” and “staircase” (“there was such influence in Mr. Wickfield’s old house”). The paragraph is not a pawn that makes its orderly one-way move towards the end of the story; it’s a knight that advances by combining two axes in a single move. For now, it’s just a metaphor for how paragraphs contribute to the plot. (88)

And now things begin to get interesting.

By defining it as “a sentence writ large,” or, symmetrically, as “a short discourse,” previous research was implicitly asserting the irrelevance of scale: sentence, paragraph, and discourse were all equally involved in the “development of one topic.” We have found the exact opposite: scale is directly correlated to the differentiation of textual functions. By this, we don’t simply mean that the scale of sentences or paragraphs allows us to “see” style or themes more clearly. This is true, but secondary. Paragraphs allows us to “see” themes, because themes fully “exist” only at the scale of the paragraph. Ours is not just an epistemological claim, but an ontological one: if style and themes and episodes exist in the form they do, it’s because writers work at different scales—and do different things according to the level at which they are operating. (98)

* * * * *

Let us now set that aside and go back to the computational linguistics of my youth, the time I spent with David Hays and his group in the mid-1970s. At that time there was considerable interest in the conceptual structure of common sense information. A lot of attention was given to knowledge structures that would likely yield some/many of the topics discovered by topic analysis. Moretti and others just find these topics in texts, but they have no idea how they got there beyond the apparent fact that that seems to be how the human mind is organized. Back in those old days we were trying to figure out, in detail, how this organization worked. How much detail? Enough so that when you had a computer ‘read’ a text, it would be able to answer questions about it. Or, if you asked the machine a natural-language question about some topic (a database of naval vessels figured prominently in one major research project), it would be able to determine the answer. In some cases researchers were interested in using such structures to generate stories. Thus Marvin Minsky talked of frames while Roger Schank talked of plans and scripts.

What’s a script? What happens when you go to a restaurant? You enter, take a seat, read the menu, give your order to a waiter, who delivers the food to your table, after which you eat it, accept the check, and pay the bill. That’s a script. Obviously it will have variations, along with a zillion details. What you do at McDonald’s is going to be very different from what you do at Le Bernardin. What’s a frame? A frame, for example, might have information about the typical automobile – body, seats, engine, transmission, wheels, gasoline, top speed, etc. – or the typical fish – head, body, tail, fins, gills, swims, etc.

That sort of thing. There’s a million of them, at least. It’s easy to see that, yes, such things in the mind would result in the word groupings that topic analysis finds in texts.
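Here is a toy rendering of a frame and a script as Python data structures, just to show how such structures would give rise to word groupings. The class names, slot names, and the vocabulary helper are my own illustrative assumptions; the representations Minsky and Schank actually proposed were far richer, with defaults, variations, and links among structures.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A bundle of typical properties for a kind of thing."""
    name: str
    slots: dict = field(default_factory=dict)

@dataclass
class Script:
    """A stereotyped sequence of events, in order."""
    name: str
    steps: list = field(default_factory=list)

# The automobile frame and restaurant script described above, much reduced.
automobile = Frame("automobile", {
    "parts": ["body", "seats", "engine", "transmission", "wheels"],
    "fuel": ["gasoline"],
})
restaurant = Script("restaurant", [
    "enter", "take a seat", "read the menu", "give order to waiter",
    "waiter delivers food", "eat", "accept the check", "pay the bill",
])

def vocabulary(frame):
    """Words a text drawing on this frame is likely to use together."""
    return {word for values in frame.slots.values() for word in values}

print(sorted(vocabulary(automobile)))
print(" -> ".join(restaurant.steps))
```

A text written with the automobile frame in mind will tend to draw on that vocabulary, and that is just the kind of clustering a topic model picks up without knowing anything about the frame behind it.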

But that’s not all. In 1973 Hays had proposed that abstract concepts got their meaning from stories [6]. Charity was the standard example. What is charity? Charity is when someone does something nice for someone else without thought of reward. At the time I joined his research group Hays had just completed a paper in which he used this concept to analyze various concepts of alienation used in the social sciences [7], including those by Karl Marx (“a condition of certain societies at certain periods”), Melvin Seeman (“the content of certain beliefs”), and Walter Gerson (“a mode of operation of the human personality”). One graduate student, Mary White, was finishing a dissertation where she investigated the belief system of a contemporary millenarian community and used the model to analyze and describe those beliefs. Another student, Brian Phillips, conducted a computational investigation of stories about drownings with a program written in SNOBOL and running on a CDC 6400 mainframe, a physically huge computer having considerably less power than your smartphone – CDC (Control Data Corporation) no longer exists.

Phillips was interested in the difference between stories that were merely narratives and stories that had thematic content. He chose tragedy as his theme. What is tragedy? Tragedy is when “someone does a good act that results in his death” [8]. Phillips tested the system by analyzing a number of stories. Most had been elicited from students taking linguistics and English classes, but he found at least one in The New York Times. The system had no parser, only semantics; input was thus expressed in a cognitive formalism rather than using raw text.

Consider the following two examples:

(Story 4) A body was found early yesterday at the foot of the Mango River, near Clubsport. The body is believed to be that of Jose Gepasto. It seems as if Mr. Gepasto’s car made a wrong turn on the highway and plunged into the water.

(Story 22) DF, 43 years old, of Queens, drowned today in MB reservoir after rescuing his son D, who had fallen into the water while on a fishing trip at TF, near here, the police said.

Story 22 exhibits the pattern of tragedy: a man died subsequent to rescuing his son. Story 4 does not.

These stories that define abstractions are a bit different from scripts, plans, and frames. The abstract stories are closed in a way that the others aren’t. Certain things must be present if the abstraction is to be instantiated. The others aren’t like that, they’re looser.
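What “closed” means can be sketched in a few lines of Python. This is emphatically not Phillips’s SNOBOL program, whose internals I am not reproducing; the Event class, the hand-coded good flag, and the causal links are all hypothetical stand-ins for his cognitive formalism. The point is only that the pattern requires certain elements (a good act, a death, the same agent, a link between the two), and a story either supplies them or it doesn’t.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    agent: str
    act: str
    good: bool = False  # hand-coded judgment that the act is a good one

def is_tragedy(causal_links):
    """True if some agent's good act is linked to that same agent's death.

    `causal_links` is a set of (earlier_event, later_event) pairs supplied
    by hand. Only direct links are checked in this toy version.
    """
    return any(
        cause.good and effect.act == "die" and cause.agent == effect.agent
        for cause, effect in causal_links
    )

# Story 22: DF rescues his son, then drowns.
rescue = Event("DF", "rescue son", good=True)
df_dies = Event("DF", "die")
story_22 = {(rescue, df_dies)}

# Story 4: a wrong turn, a plunge into the river, a body found.
wrong_turn = Event("Gepasto", "take wrong turn")
gepasto_dies = Event("Gepasto", "die")
story_4 = {(wrong_turn, gepasto_dies)}

print(is_tragedy(story_22))  # True
print(is_tragedy(story_4))   # False
```

Remove any one of the required elements and the match fails. A restaurant script, by contrast, survives a missing menu or a skipped dessert.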

Where do we find these abstractions in the text: a) paragraphs, b) sentences, c) groups of paragraphs, d) whole texts, or e) all of the above? I suspect all of the above. Such abstractions can be defined in a sentence, but they can encompass a text. I used the concept to analyze Shakespeare’s Sonnet 129, “The Expence of Spirit” [2].

Can topic analysis distinguish between plans, scripts, and frames on the one hand, and abstraction-defining stories on the other? That’s not at all obvious to me. Is there any computational way of identifying such stories? Does it matter? Well, does the difference between reading about a meal at Le Bernardin, or Applebee’s, and reading a tragedy matter? And what if some specific meal exemplifies some abstraction, such as the “social contract”? Yes, these questions are important.

We need not outline investigative approaches to them at the moment. But they are on the table for investigation, investigation by a computational criticism that we can only imagine now [9].

The direction of literary history

Let us turn to Chapter 6, “From Keywords to Cohorts: Tracing Language Change in the Novel, 1785–1900”, by Ryan Heuser and Long Le-Khac. They examined a corpus of 2,958 British novels spanning the period from 1785 to 1900. What they discovered, roughly speaking, is a shift from abstract terms to concrete, which they characterize as a shift from telling (abstract terminology) to showing (concrete terms). They read this shift through Raymond Williams (The Country and the City) as reflecting a population shift from small, closely knit rural communities to large urban communities where people are constantly amid strangers.

Here is how Heuser and Le-Khac characterize the texts toward the beginning of the period:

Thinking in terms of the abstract values, the tight social spaces in the novels at the left of the spectrum are communities where values of conduct and social norms are central. Values like those encompassed by the abstract values fields organize the social structure, influence social position, and set the standards by which individuals are known and their behavior judged. Small, constrained social spaces can be thought of as what Raymond Williams calls “knowable communities,” a model of social organization typified in representations of country and village life, which offer readers “people and their relationships in essentially knowable and communicable ways.” The knowable community is a sphere of face-to-face contacts “within which we can find and value the real substance of personal relationships.” What’s important in this social space is the legibility of people, their relationships, and their positions within the community. (179)

Toward the end of the period writers wrote and readers read texts that Heuser and Le-Khac characterize like this (p. 36):

If this is how the abstract values fields are linked to a specific kind of social space, then we can make sense of their decline over the century and across the spectrum. The observed movement to wider, less constrained social spaces means opening out to more variability of values and norms. A wider social space, a rapidly growing city for instance, encompasses more competing systems of value. This, combined with the sheer density of people, contributes to the feeling of the city’s unordered diversity and randomness. This multiplicity creates a messier, more ambiguous, and more complex landscape of social values, in effect, a less knowable community… The sense of a shared set of values and standards giving cohesion and legibility to this collective dissipates. So we can understand the decline of the abstract values fields—these clear systems of social values organized into neat polarizations—as a reflection of their inadequacy and obsolescence in the face of the radically new kind of society that novels were attempting to represent. (180-181)

The upshot (181): “Alienation, disconnection, dissolution—all are common reactions to the new experience of the city.”

I have no problems with this, as far as it goes. But, in light of Hays’ mechanism of abstraction, where abstract terms are defined over terms, I want to suggest that something else might be going on. Perhaps all those concrete terms in the later novels are components of abstract patterns, patterns defining terms which may not even be named in the text (or elsewhere).

That is to say, abstract terms do not contain their definitional base somehow wrapped up “inside” them. The signified is not enclosed within the signifier. It lies elsewhere. When constructing discourse intended to circulate within a known world one can rely on others to possess, internally, the defining pattern of terms. But when sending a text to circulate among strangers, a message in a bottle, one cannot rely on them to have already internalized the definitional patterns. One must also supply the patterns themselves. And once the patterns are there, perhaps the terms they define become irrelevant. Perhaps, in fact, this situation is an opportunity to gather new patterns – which others may or may not name and rationalize.

The fact that these later novels do not use abstract terms so liberally thus does not necessarily mean that those texts do not imply abstraction. Perhaps they do employ abstraction but are using it in a different mode; they are supplying the patterns, the defining stories, themselves. Heuser and Le-Khac seem to be implying as much later on in their discussion:

The growing inadequacy of explicitly evaluative language, the change in characterization to an indirect mode of presenting concrete detail, the inversely related trends of the abstract values and “hard seed” fields—all of these point to an overarching shift in the novel’s narration and style: a shift from telling to showing. Given the range of concrete description words in the “hard seed” cohort, many of which are not necessarily anthropocentric, we can see a broad change in the general mode of perceiving and representing people, objects, spaces, and actions in the novel. This change from abstract, evaluative language to concrete, non-evaluative language doesn’t necessarily indicate the disappearance of evaluation. Given the patterns we’ve seen, it would be more accurate to say that the modes of evaluation and characterization changed, moving from explicit to implicit narration, from conspicuous commentary to the dramatization of abstractions, qualities, and values through physical detail. (192)

So, now we’ve got two hypotheses on the table. One would account for the shift from abstract to concrete terminology as a consequence of population shift from rural to urban environments. The other hypothesizes the emergence of new abstract concepts, concepts of a type that are defined over stories. The stories are realized in the proliferation of concrete details that are patterned to the requirements of those abstractions. Those patterns aren’t going to show up in vocabulary lists, but they are there in structures in the text, structures in the signifiers.

These hypotheses are not, of course, necessarily contradictory. Both could be happening. How do we find out?

I’ve not read Williams’ The Country and the City, so I don’t know what kind of evidence he adduces for his hypothesis about the shift from “knowable” rural communities to the “messier, more ambiguous, and more complex landscape of social values” in the city. But I am willing to accept it as Heuser and Le-Khac present it. How do we look for the emergence of new abstractions? Perhaps in dictionaries and word lists? Perhaps in other non-fictional texts of various kinds, diaries, news stories, etc.?

* * * * *

I would like to say more, give more examples. And there’s a new Cultural Evolution Society that needs work from computational critics. But this has gone on long enough.

My point is simply that there is an approach to linguistic computation that is older than that employed in computational criticism, one that aspired to understand the human mind. That work can complement this newer work, and even suggest ways to approach the detailed analysis and description of individual texts. Taken together these computational criticisms complement and supplement one another. Taken together they open a new world of intellectual exploration and, yes, adventure.

But not in the existing academy.

What are the institutional possibilities of a new criticism, a deeply computational one?

Computational criticism is deeply controversial within the profession. I have reluctantly concluded that it cannot thrive in existing departments of literature. It will have to find a home elsewhere.

Consider the first paragraph of the Preface where Moretti informs us:

A well-known scholarly journal had been asking for an article on new critical approaches, and that’s where we sent the piece once it was finished. But it came back with so many requests for corrections that it felt like a straightforward rejection. It was dismaying; a few years ago, computational criticism was still shunned by the academic world, and we couldn’t help thinking that what was being turned down was not just an article, but a whole critical perspective.

Been there, done that. In 1980 I submitted an article about “Kubla Khan” to MLN (Modern Language Notes), which had published my computational analysis of Shakespeare’s Sonnet 129 in 1976 [2]. It was rejected outright, though it was subsequently published in Language and Style. Last year I sent NLH (New Literary History) a piece entitled “Sharing Experience: Computation, Form, and Meaning in the Work of Literature” [10]. It’s pretty clear in both cases that “what was being turned down was not just an article, but a whole critical perspective.” Moreover, it is computation that was being rejected in those cases as well as in Moretti/LitLab’s. I had framed the MLN submission as being structuralist, which it was in a way, but it was computational at its heart. The NLH submission, as you can tell from the title, was explicitly computational. Silly me, I thought the critical world was changing.

I’ve told the tale of those two rejections, with quotations from reviewers’ reports, in a long working paper where I also place them in the context of the profession’s history [11]. The problem is that academic literary criticism seems deeply and profoundly committed to interpretive activity founded on the notion of that amorphous and protean interpretable text I discussed earlier, the one that has hidden meanings, but surfaces too, and which one can get close to or view from a great distance. That commitment makes the profession blind to any serious consideration of the semiotic or linguistic text, to the examination and description of structure in strings of signifiers and certainly to any thought of computing as a linguistic and therefore a literary process [12].

Thus, it came as no surprise when Ted Underwood, a well-known computational critic who was trained as a Romanticist, announced that he was shifting half his faculty line to the School of Information Sciences at the University of Illinois, Urbana-Champaign, where he also has an appointment in the English Department. Information sciences in this sense encompasses library science. Libraries, of course, are among the oldest humanistic institutions in the world. While library science dates back at least to the nineteenth century, its conjunction with information science is a product of the emergence of the computer after World War II. It is thus a relatively new discipline.

Moreover, one of the techniques central to corpus linguistics was invented by a scholar interested in the automatic indexing and retrieval of texts in document collections (that is, libraries). The scholar is Gerard Salton and the technique is the vector space model of the linguistic text, which is foundational to topic analysis and other corpus techniques [13]. There is thus a certain historical justice in Underwood’s (partial) move to information science.
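For readers who have never seen it, the core idea of the vector space model fits in a short Python sketch. Treat this as a minimal illustration under my own assumptions rather than anyone's production system: the helper names `term_vector` and `cosine_similarity` and the two miniature "documents" are hypothetical, and I use raw word counts where real retrieval systems usually weight terms. Each text becomes a vector of word counts, and two texts are compared by the cosine of the angle between their vectors.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Represent a text as a bag of words: a mapping from term to count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)

# Two invented "documents" in a toy collection.
docs = {
    "village": "the village knew every family in the village",
    "city": "the city was full of strangers and the city never slept",
}

# Rank the documents against a query, as a retrieval system would.
query = term_vector("strangers in the city")
ranked = sorted(
    docs,
    key=lambda name: cosine_similarity(query, term_vector(docs[name])),
    reverse=True,
)
print(ranked)
```

Word order plays no role here: the text is a bag of words, which is what makes the model tractable over large collections and also what it gives up.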

Why did he move? Because he found information science more congenial to his work than English. Alas, not all computational critics have such an option.

Let me offer one last data point. Several months ago I entered into a Twitter conversation with James Ryan, who is finishing up his doctorate as a member of the Expressive Intelligence Studio at the University of California, Santa Cruz. It turns out that he was organizing a workshop on the history of expressive computation (HEX01) and asked me to present a paper on the work I did with Hays back in the 1970s [14]. That’s the first time anyone expressed any interest in my technical work since, well, I don’t know how long. It makes sense that that interest should come from someone who programs games. Game designers and programmers need to know about stories, but they need that knowledge in computational terms, terms that are all but taboo in literary criticism.

Is there a home for computational criticism in the gaming world? That is by no means obvious. But there clearly is little chance of a home in literature departments in the near future. The conceptual styles are too different.

Could the gaming industry support an Institute for the Study of Expressive Computation? It seems like a long shot, but who knows, the world changes, and life thrives in the strangest places.

Where does this leave scholars who do work such as that in Canon/Archive? Moretti himself has retired from Stanford and set up shop at the École Polytechnique Fédérale de Lausanne, a technical institute in Switzerland. The students who’ve worked in the Lab and signed chapters in this book, they’ll get faculty or staff jobs somewhere. But a thriving intellectual community organized around computational criticism? Oh it exists, online, in the Twitterverse and the blogosphere, in fugitive meetings here and there. People have jobs, some good, some not so good. But institutionally the best prospects are probably in media studies, communication, and information science, not literary studies. As far as I can tell literary studies will remain committed to pre-computational intellectual formations for the foreseeable future and will do so from a position of quasi-aristocratic superiority over crass calculation, of which it will remain fitfully ignorant.


[1] The Only Game in Town: Digital Criticism Comes of Age, 3 Quarks Daily, May 5, 2014.

[2] Cognitive Networks and Literary Semantics, MLN 91: 1976, 952-982.

[3] I’ve written a number of posts at New Savanna about the concept of the text. They’re at this link:

[4] Rita Copeland and Frances Ferguson, “Introduction”, ELH, Volume 81, Number 2, Summer 2014, pp. 417-422.

[5] For a discussion of topic analysis, see my working paper, Corpus Linguistics for the Humanist: Notes of an Old Hand on Encountering New Tech, Working Paper, July 2013, 21 pp.

[6] David G. Hays, The Meaning of a Term is a Function of the Theory in Which It Occurs. SIGLASH Newsletter 6, 8-11 (1973).

[7] David G. Hays, On “Alienation”: An Essay in the Psycholinguistics of Science. In: Geyer, R.F., Schweitzer, D.R. (eds.): Theories of Alienation. pp. 169-187. Martinus Nijhoff, Leiden (1976).

[8] Brian Phillips, A Model for Knowledge and Its Application to Discourse Analysis, American Journal of Computational Linguistics, Microfiche 82, (1979).

[9] See the discussion of the Prospero project in William Benzon and David Hays, Computational Linguistics and the Humanist, Computers and the Humanities 10, 265–274 (1976).

[10] I’ve posted the final draft online, Sharing Experience: Computation, Form, and Meaning in the Work of Literature, Working Paper, September 2016, 21 pp.

[11] Rejected! @ New Literary History, with observations about the discipline, Working Paper, 2017, 60 pp.

[12] I’ve devoted an inordinate amount of time to examining the discipline. You can find my most recent work in Prospects: The Limits of Discursive Thinking and the Future of Literary Criticism, Working Paper, 2015.

[13] David Dubin, The Most Influential Paper Gerard Salton Never Wrote, Library Trends 52(4) Spring 2004: 748-764.

[14] Abstract Patterns in Stories: From the intellectual legacy of David G. Hays, Workshop on the History of Expressive Systems (HEX01), November 14, 2017.