Out of Bounds: Language limits, language planning, and the definition of distance in the new spaces of linguistic capitalism

Sack, Warren

Article Information

Author(s): Warren Sack
Affiliation(s): University of California, Santa Cruz
Publication Date: 28th November 2017
Issue: 6
Citation: Warren Sack. “Out of Bounds: Language limits, language planning, and the definition of distance in the new spaces of linguistic capitalism.” Computational Culture 6 (28th November 2017). http://computationalculture.net/out-of-bounds-language-limits-language-planning-and-the-definition-of-distance-in-the-new-spaces-of-linguistic-capitalism/.

Abstract

Software challenges us to re-inscribe what we comprehend as inscription. And, most importantly, software challenges us to understand new forms of technological politics and new practices of political invention, legibility and intervention that we are only just beginning to comprehend as political at all … These orderings – written down as software – are becoming one of the chief ways of animating space. Nigel Thrift and Shaun French (2002)¹

The Order of Things and the Computational Episteme
Philosopher and historian Michel Foucault published a book in 1966, The Order of Things, in which he argued that there were vast differences, ruptures, between the way language was understood in three time periods: from the early modern period to the end of the sixteenth century (the Renaissance); from the seventeenth century to the nineteenth century (the Classical period); and from the nineteenth century to now (the modern period). In each of these periods, Foucault argues, one can find a distinctive set of ideas and approaches to language.

He is especially concerned with how meaning is differently ascribed to language in these different periods:

…in the sixteenth century, one asked oneself how it was possible to know that a sign did in fact designate what it signified; from the seventeenth century, one began to ask how a sign could be linked to what it signified. A question to which the Classical period was to reply by the analysis of representation; and to which modern thought was to reply by the analysis of meaning and signification. But given the fact itself, language was never to be anything more than a particular case of representation (for the Classics) or of signification (for us). The profound kinship of language with the world was thus dissolved. The primacy of the written word went into abeyance. And that uniform layer, in which the seen and the read, the visible and the expressible, were endlessly interwoven, vanished too. Things and words were to be separated from one another.²

“Epistemes” are what Foucault called this series of relations posited between language and meaning.³ He showed how, even though one can see some continuities between the episteme of one period and another, the epistemes are fundamentally different. In this article, I will argue that Foucault was too hasty in his coupling of “us” (today, now) with the nineteenth century moderns.

Historians have repeatedly critiqued Foucault for this periodization because it seems to only be descriptive of a small group of male, European intellectuals. It does not take into account, for instance, the criteria of feminist epistemologists.⁴ When Foucault uses a phrase like “…signification (for us)” we are left wondering who is included, who is excluded from this “us”?

I agree with this critique but only because Foucault’s use of “episteme” is too grand for what he actually does in that book. In The Order of Things what is under scrutiny is not the general knowledge of a grand public about language during certain periods of time. No, what Foucault actually does offer is a close reading of a select number of scholarly texts about language (including those from rhetoric, logic, grammar, philology, literature, and linguistics). Instead of the term “episteme” it would have been more accurate to use a phrase (admittedly awkward) like “scholarly analysis of language.”

In this article, I intend to pursue another line of critique: even if Foucault’s “us” does indeed include all of the male, European intellectuals of the twentieth century (which I doubt); even then, I will argue, his “Renaissance,” “Classical,” and “modern” epistemes are inadequate and require the addition of a fourth episteme that covers the time period from the beginning of the twentieth century until now. The fourth episteme is distinct from the modern episteme because the fourth episteme is not primarily concerned with signification and meaning.

One might call this fourth episteme the “computational episteme” and emphasize that its primary approach to language pushes meaning to the margins. In its contemporary form, meaning is pushed to the margins by software. Essentially, within a “computational episteme,” theories of language are devised to analyze language as if language is meaningless. While signification and meaning are central to the preceding epistemes named by Foucault, in many twentieth century works of linguistics, literature, logic, and information, the idea that language forms can be studied in isolation from meaning is taken to be both possible and desirable.

However, for reasons described above, I think it best to replace Foucault’s term, “episteme,” with something else. In its place, I will employ the phrase “language limits” to highlight the notion that scholarly and technical theories of language are founded on and circumscribed by very particular sets of presuppositions about what language can and cannot do.⁵

Language limits are not universal; they are specific to particular social, cultural, political, or economic contexts. For example, members of a culture where there is a belief in the power of prayer are not stunned if they are shown evidence that prayer can attain concrete results. Members of a discipline, like psychoanalysis, that acknowledges the existence of the unconscious, are not blind to the idea that what someone says is only partially under conscious control. Each culture, each discipline, each social and political formation has particular language limits.

The second reason I prefer the phrase “language limits” over the term “episteme” is that the phrase can be read linguistically and spatially, especially geographically. I propose an investigation into language limits that explores their implications for geography. Specifically, what follows concerns an ongoing discussion within cultural geography on the issues of “language spaces” and the “spaces of writing.”⁶

Language Limits After Meaning
As narrated by Foucault, previous theories of language from the early-modern period through the end of the nineteenth century were concerned with the meaning of language. To theorize language as if meaning was marginal or immaterial is to broach a very different discussion about language but a discussion that has had many threads throughout the twentieth century and on into today’s technical literatures.

For example, Claude Shannon and Warren Weaver, in their book The Mathematical Theory of Communication, maintain that one of the tenets of their theory is that meaning plays no role in it:

The word information, in this theory, is used in a special sense that must not be confused with its ordinary usage. In particular, information must not be confused with meaning. In fact, two messages, one of which is heavily loaded with meaning and the other of which is pure nonsense, can be exactly equivalent, from the present viewpoint, as regards information.⁷
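Weaver’s point can be made concrete with a small sketch. It assumes a deliberately simplified measure: the per-character entropy of a single message, computed from that message’s own character frequencies (Shannon’s measure is properly defined over a message source, not one message). The function name charEntropy and the two example messages are hypothetical. Because the second message is only a reshuffling of the letters of the first, the measure cannot tell them apart.

```javascript
// Per-character entropy, in bits, computed from the message's own
// character frequencies. ASSUMPTION: a simplification for illustration;
// Shannon's measure is defined over a source, not a single message.
function charEntropy(message) {
  const counts = new Map();
  for (const ch of message) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  const total = [...message].length;
  // Sort the counts so that two messages with the same frequencies are
  // summed in the same order and yield exactly the same number.
  const sorted = [...counts.values()].sort((a, b) => a - b);
  let bits = 0;
  for (const n of sorted) {
    const p = n / total;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Hypothetical messages: the second reverses each word of the first.
const meaningful = "the cat sat on the mat";
const nonsense = "tac eht tas no tam eht";

console.log(charEntropy(meaningful) === charEntropy(nonsense)); // true
```

Nothing in the computation inspects what either message says; only the distribution of characters matters.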

A genealogy of the language limits of computation can be sketched with special attention paid to the mathematician David Hilbert and the works he inspired, such as Alan Turing’s formulation of computing, Noam Chomsky’s linguistics, and contemporary forms of big data analysis and machine learning.

There is only space, in this article, to mention the existence of this genealogy. Elsewhere, in my forthcoming book⁸, I provide this genealogy and show how it connects many contemporary theories of language that paradoxically presuppose that language can be analyzed as if it is meaningless.

To work with and conceptualize language as if its form could be separated from meaning can be illustrated with the theories and methods of many twentieth-century mathematicians, logicians, and linguists. Shannon and Weaver were hardly alone in imagining that the form of a message could be divorced from its meaning. As expressed by the linguistics historians Geoffrey Huck and John Goldsmith, (Noam) Chomskyan linguistics referred to the posited independence of the form and the meaning of language – the separation of the concerns of syntax from those of semantics – as the “autonomy hypothesis.”

The idea that syntactic analysis could (in fact, must) be done without recourse to semantic terms, which has come to be known as the “autonomy hypothesis,” has played a central role in all of Chomsky’s work since his undergraduate days. It was also … a crucial component of the structuralist program of his [Noam Chomsky’s] teacher Zellig Harris.⁹

If the “autonomy hypothesis” is accepted, then analysis must proceed without reference to meaning. In Chomsky’s words of 1977:

[T]he study of meaning and reference and of the use of language should be excluded from the field of linguistics…¹⁰

Chomsky’s words may sound radical, but this sentiment, this affection for a (David) Hilbertian formalism, had already been expressed by Chomsky’s nemesis Leonard Bloomfield and Chomsky’s dissertation advisor Zellig Harris. In 1931, linguist Leonard Bloomfield pronounced that “[m]eanings cannot be defined in terms of our science and cannot enter into our definitions.”¹¹ According to Huck and Goldsmith,

Harris’s pronouncement in 1940 that “the structures of language can be described only in terms of the formal, not the semantic differences of its units and their relations” (1940: 701) presaged the development of a distributional approach in which even difference of meanings has no place.¹²

Long before Chomsky, long before even Bloomfield, Hilbert’s critics recognized and ridiculed formalism’s decoupling of meaning from mathematics. Mathematician Henri Poincaré (1854 – 1912) sarcastically wrote:

Thus it will be readily understood that in order to demonstrate a theorem, it is not necessary or even useful to know what it means. … we might imagine a machine where we should put in axioms at one end and take out theorems at the other, like that legendary machine in Chicago where pigs go in alive and come out transformed into hams and sausages. It is no more necessary for the mathematician than it is for these machines to know what he is doing.¹³

Poincaré’s comments sounded absurd in an era when no machines to prove theorems existed. Clearly, any machines used at the time in Chicago’s slaughterhouses could not have proved a theorem. But, computers can now be programmed to prove theorems and Poincaré’s remarks no longer elicit laughter.

An extreme formalism of this kind¹⁴ has taken hold in natural language processing, information retrieval, machine learning, and other software-based approaches now commonly applied to indexing, searching, and – most importantly – ordering the language spaces of the Internet.

Language Space
This genealogy of theories of language, wherein meaning is at the margins, has been closely studied by cultural geographers. Geographers Mike Crang and Nigel Thrift elaborate on an understanding of language spaces and – even though they do not mention Chomsky’s “autonomy hypothesis” – they make a similar observation about how language is cut free from reference in many twentieth-century theories of language:

…the models of language that have become so prominent are actually founded upon rather particular models of space and time within language. In the end not only is space seen as linguistic but language is seen as spatial. … In the systemic realm, signifiers relate to each other in chains of mutual absence. The static analytic space both allows these mutually referring chains to complete circles and the space of language to cut itself free from reference [my emphasis]… Language becomes a series of synchronous spatial relationships that work to defer meaning not in time but in space…¹⁵

In that article, Thrift and Crang were reflecting on pre-computational theories of language, theories of the early- and mid-twentieth century frequently labeled as “structuralist” or “post-structuralist.” But, once a language model is computational, is constructed on or with software – as we see already in the Chomskyan linguistics of the late-1950s and contemporary search algorithms alike – these theories, which slice the fleshy meaning of a text away from its skeletal syntax, can become industrialized and scaled to global distribution. At global size, the politics and economics of formal language models loom in full view.

In the most direct terms, one could say that the language spaces defined by software are not ordered according to the meaning of words but, rather, ordered by a set of computations on the form of the words — regardless of their meanings. For example, given a list of words like “elephant,” “cat,” “dog,” they can be alphabetized into an order without reference to their meaning: “cat,” “dog,” “elephant.”
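The alphabetizing example can be sketched in a few lines. The function name alphabetize is hypothetical, and the ordering used is plain character-by-character comparison, one of several possible collation rules. The second call uses invented, meaningless tokens to show that the procedure is indifferent to whether its inputs are words at all.

```javascript
// Order strings purely by comparing their characters (UTF-16 code units).
// Nothing here consults what the strings mean.
function alphabetize(words) {
  return [...words].sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
}

console.log(alphabetize(["elephant", "cat", "dog"]));
// [ 'cat', 'dog', 'elephant' ]

// Invented, meaningless tokens are ordered just as readily.
console.log(alphabetize(["zzq", "tere", "xk7"]));
// [ 'tere', 'xk7', 'zzq' ]
```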

Google
Consider the case of Google – which was recently reorganized and put under the umbrella of a parent company called, appropriately, Alphabet. I say “appropriately” because Google makes its money by imposing a software-sorted ordering on the language of the Internet.

In 2014, digital humanist Frédéric Kaplan published a succinctly insightful paper in the journal Representations entitled “Linguistic Capitalism and Algorithmic Mediation.”¹⁶ His paper begins like this:

Google made 50 billion dollars in revenue in 2012, an impressive financial result for a company created less than fifteen years ago. That figure represents about 140 million dollars per day, 5 million dollars per hour. … What does Google actually sell to get such astonishing results? Words. Millions of words. The success of Google’s highly original business model is the story of two algorithms. The first – pioneering a new way of associating web pages to queries based on keywords – has made Google popular. The second – assigning a commercial value to those keywords – has made Google rich.¹⁷

The first of Google’s algorithms is the one that, given a query like “tree,” returns an ordered list of links to a set of webpages.¹⁸ Initially, this was called the PageRank algorithm, named after Google co-founder Larry Page. Published in 1998, the original academic reference to the algorithm is easy to find.¹⁹ Obviously, in the subsequent almost-twenty years, Google has updated its web search algorithm. An overview of one later state of the algorithm can be found on YouTube.²⁰

In addition to returning an ordered list of webpage links associated with the query, Google also returns, at the top of the page, a set of sponsored links. These sponsored links may overlap with the unsponsored links but not necessarily. Proximity to words, like “tree,” is sold at auction to individuals and institutions that want to have their webpage associated with the word.²¹ Google’s second algorithm takes, ranks, and ultimately accepts bids for word proximity. Hal Varian, Google’s chief economist, who developed these AdWords algorithms to conduct these auctions, provides a clear description on YouTube.²² It is this second algorithm that prompts Kaplan’s pronouncement to the effect that Google makes its money by selling words, also known as search “query terms.”

Language Space as Commodity
I will argue a point not opposite but rather apposite to Kaplan’s. My argument is that, already by the nineteenth century, in the age of “print capitalism” (as articulated by the political scientist Benedict Anderson and to be described later in this paper), words were being bought and sold (in the form of books and newspapers); and, the real difference between print capitalism and linguistic capitalism is that, in the latter, the basic commodity is not the word but is, instead, the software-ordered space between words.

In print capitalism, the space between words is already bought and sold in advertising. To place an ad on the front page costs (or cost) a lot more than placing one inside a newspaper.²³ However, that is almost an ancillary business compared to the role that advertising placement plays in today’s economies of linguistic capitalism, where the words – the “content” – are frequently free and advertisements placed next to or between the words are the major source of revenue.²⁴ Moreover, the “space between words” is now ordered by software and is quite unlike the graphic design conventions of kerning, whitespace, line separation, paragraphs, and pages; it is also significantly different from the orderings of reference works like dictionaries and thesauruses.

In his article, Kaplan goes on to explain that the keystone of Google’s business success is a very modest third algorithm that connects the other two algorithms of PageRank and AdWords: an algorithm of “auto-completion.” Auto-completion is the algorithm that corrects our spelling or automatically extends a phrase when we type in a search term or make a spelling error. It is one of the ways in which the space between words is redefined by Google’s software. For example, when I type “onec u” into the Google search box it suggests an extended phrase: “once upon a time.”

Kaplan points out that Google has sold proximity to certain words to advertisers and, while we imagine auto-completion to be a service to us, the users of Google, it is also an economic engine for Google driving us to choose words centered within space already sold to advertisers.

Note that there is nothing new about spelling correction algorithms. For example, one such algorithm was published in 1974.²⁵ However, to implement spelling correction forty years ago, one would have used a standard reference work, like the Webster’s Dictionary, to serve as the territory onto which the misspelled words would be mapped. Google’s ingenuity is one of business smarts: instead of using something like Webster’s, Google uses a list of words that have been sold at auction. According to Kaplan, “[s]uch auctions happen every time a user enters a search query—about three billion times per day in 2012—millions of times per minute.”²⁶

Kaplan does not see this as just a money-making trick by Google, but as a completely new form of capitalism, a form he calls “linguistic capitalism.” Kaplan claims that linguistic capitalism is not based on an economy of attention, but an economy of expression. That is to say, businesses like Google practice it when they build cloud-based platforms of everyday expression – network-based word processing programs, collaborative editing tools, slide presentation packages, and so forth – and intervene in our personal and collaborative acts of expression – like the composition of texts – in order to sell or advertise commodities at the very moment we write and edit.²⁷

Google’s auto-completion algorithm will attempt to “correct” a word towards a term they have already sold at auction. This is clear when one starts to type in a prefix that could be autocompleted to a high-value trademark. For instance, if I want to search for information about the biochemistry term “peptide” (two or more amino acids linked in a chain), I will start by typing “pep” and, before I get to the “t,” the Google algorithms will likely offer that the query be completed as “Pepsi,” the carbonated beverage. Or, let us assume I have a friend whose name is “Mac Donald” and I want to find him on the web. Even if I type “Mac Donald” in as the search term, it will be “corrected” to “McDonald’s,” the name of the fast food restaurant. In Kaplan’s terms, “Auto-completion services can transform linguistic material without value (not much bidding on misspelled words) into a potentially profitable economic resource.”²⁸
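The “pep” example can be imitated with a toy sketch. It is not a description of Google’s system, whose workings are not public: the function completeByBid, the table hypotheticalBids, and every bid value in it are invented for illustration. The sketch only shows how a completion could be ranked by commercial value rather than by anything to do with meaning.

```javascript
// HYPOTHETICAL data: these terms and bid values are invented for
// illustration and do not reflect any real auction.
const hypotheticalBids = new Map([
  ["pepsi", 5.0],
  ["peptide", 0.1],
  ["pepper", 0.4],
]);

// Return the terms that start with the typed prefix, highest bid first.
function completeByBid(prefix, bids) {
  const p = prefix.toLowerCase();
  return [...bids.entries()]
    .filter(([term]) => term.startsWith(p))
    .sort((a, b) => b[1] - a[1])
    .map(([term]) => term);
}

console.log(completeByBid("pep", hypotheticalBids));
// [ 'pepsi', 'pepper', 'peptide' ]
console.log(completeByBid("pept", hypotheticalBids));
// [ 'peptide' ]
```

Under these invented values, the biochemistry term only surfaces first once the typed prefix rules out the higher-valued alternatives.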

A second point is implicit in Kaplan’s observation but not clearly spelled out: Google, by connecting “peptide” to “Pepsi” or by making “Mac Donald” a synonym for “McDonald’s,” has redefined the measure of distance in our language space.

In some sense, this redefinition of distance in advertising was common before the advent of software: advertisers want their products next to complementary texts and images regardless of whether their advertisements appear in television, film, radio, newspapers, magazines, or on the web. Advertising is just like real estate: in both it is all about location, proximity, distance.

The big difference between the dictionaries of print capitalism and those of linguistic capitalism is how they are articulated to other instruments or reference works for navigating a language space. So, for example, in print capitalism, dictionaries and thesauruses are articulated together because they have overlapping entries. To find the meaning of “tree,” I look up its entry in a dictionary; to find other words similar to “tree,” I look up its entry in a thesaurus. In this way, print dictionaries and thesauruses cross-link words into a web of words linked by meaning.

Consider how, using the reference works of print, I could translate the word “tree” into French and Italian. Likely, I would look up the term in an English-French dictionary (to find “arbre”) and then in another dictionary, a bilingual English-Italian (to find “albero”). The distance between this triad of tree-arbre-albero is measured according to meaning: the English, French, and Italian terms are all thought to mean the same thing and their connection is found with the help of two dictionaries.

In stark contrast, the dictionaries employed in Google’s algorithms do not define the meaning of the words. Language meaning is pushed aside as it is for so many other productions of the computational episteme.²⁹

Another critical difference between the dictionaries of print and those of computation is that print dictionaries do not include misspelled words. There are no printed English dictionary or thesaurus entries for, for instance, the token “tere.” It simply does not exist – it is not a word – and therefore, despite its orthographic overlaps with “tree,” it is represented neither as close to nor far from the word “tree.”

Instead of using meaning as a measure of distance, Google’s auto-completion and spell-correction facilities rely on orthographic overlaps as measures of distance in the new language spaces of the web. How far, in Google, is the French word for tree (“arbre”) from the Italian word for tree (“albero”)? I would venture that they are at a distance of 3, for reasons I will explain presently.

Consideration of these new forms of language space and distance necessitates a slight amendment to Frédéric Kaplan’s declaration that Google is making its money by selling words. Google is not actually selling words, even if buyers of their AdWords product think that is what they are getting. Google’s advertisers are not getting words, they are purchasing proximity to words. Google, by archiving and indexing a larger portion of the web, has produced a new language space and new measures of distance in that space. For purchasers, Google will warp the distance metric so that the advertiser’s website is “close to” the advertised word.

Algorithms of Language Difference
Let us examine these software-sorted distances more closely. What is a measure of distance? Why do I say 3, for example, when asked to guess the new form of distance introduced between “arbre” and “albero”? Let us look at the two words each as just a series of characters. Their first letters match: they are both “a.” But, then “r” and “l” do not match: this is one difference between the two words; “b” and “b” do match; “r” and “e” do not match. But, here, let us pass over the “e” in “albero,” count that move as another difference, but then move on to match the second “r” in “arbre” with the “r” in “albero.” Finally, “arbre” ends in “e” and “albero” ends in “o,” so that is a third difference. The distance is 3.

This measure of distance is usually called an “edit distance” or, more specifically, the Levenshtein distance, after Vladimir I. Levenshtein, who introduced the measure in a 1965 publication.³⁰ The Levenshtein distance is 0 if the two strings are the same. It is always at least the difference of the lengths of the two strings, and it is at most the length of the longer string.

It should be clear too that the Levenshtein distance is a measure that holds between any two strings, regardless of whether or not they have meaning. So, computing the distance between “[}+=&yHHj34” and “07734” is just as feasible as computing the distance between “arbre” and “albero.”

The Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to change one string into the other. It is not the only measure of edit distance. For example, the Hamming distance is an upper bound on the Levenshtein distance, but the Hamming distance only works on strings of the same length. The Levenshtein distance also differs from the Damerau–Levenshtein distance, which allows insertion, deletion, substitution, and the transposition of two adjacent characters. The Levenshtein distance does not count the transposition of two adjacent characters as one edit operation; instead it models it as two edit operations, two substitutions. The Levenshtein distance is less sophisticated than the Damerau–Levenshtein distance (because it does not allow transpositions), but it is more sophisticated than the longest common subsequence metric, which allows only insertion and deletion and does not allow substitution.
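The relations among these measures can be checked with a short sketch. The names editDistance and hamming, and the option names substitute and transpose, are hypothetical. The transposition variant implemented here is the restricted form usually called “optimal string alignment,” which is an assumption of this sketch rather than the full Damerau–Levenshtein distance.

```javascript
// One dynamic-programming routine for three of the measures named above.
// substitute: false  -> insert/delete only (longest-common-subsequence metric)
// transpose:  true   -> also allow swapping two adjacent characters
//                       (restricted "optimal string alignment" variant)
function editDistance(a, b, { substitute = true, transpose = false } = {}) {
  const d = [];
  for (let i = 0; i <= a.length; i++) d[i] = [i]; // delete i characters of a
  for (let j = 0; j <= b.length; j++) d[0][j] = j; // insert j characters of b
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      let best = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1);
      if (a[i - 1] === b[j - 1]) {
        best = Math.min(best, d[i - 1][j - 1]); // match: no new edit
      } else if (substitute) {
        best = Math.min(best, d[i - 1][j - 1] + 1);
      }
      if (transpose && i > 1 && j > 1 &&
          a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        best = Math.min(best, d[i - 2][j - 2] + 1);
      }
      d[i][j] = best;
    }
  }
  return d[a.length][b.length];
}

// Hamming distance: count mismatched positions; equal lengths required.
function hamming(a, b) {
  if (a.length !== b.length) {
    throw new RangeError("Hamming distance needs strings of equal length");
  }
  let n = 0;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) n++;
  return n;
}

console.log(editDistance("ab", "ba"));                        // 2 (Levenshtein)
console.log(editDistance("ab", "ba", { transpose: true }));   // 1 (one swap)
console.log(editDistance("cat", "cut"));                      // 1 (one substitution)
console.log(editDistance("cat", "cut", { substitute: false })); // 2 (delete, insert)
console.log(hamming("cat", "cut"));                           // 1
```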

Because there are other measures of edit distance and because I do not know which one Google employs,³¹ I can only guess that the distance between “arbre” and “albero” is 3, which it is under the Levenshtein distance. The “unit” of this distance is therefore the edit operation: a distance of 3 means 3 insertions, deletions, and/or substitutions.

The Levenshtein distance can be defined as a brief, recursive, JavaScript function named ld:

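function ld(x, y) {
    // Base cases: reaching or leaving the empty string costs one
    // edit per remaining character.
    if (!y) return x.length;
    if (!x) return y.length;
    return Math.min(
        ld(x.slice(1), y) + 1,   // drop the first character of x
        ld(x, y.slice(1)) + 1,   // drop the first character of y
        // substitute the first characters; a mismatch (true) counts as 1
        ld(x.slice(1), y.slice(1)) + (x[0] != y[0]));
}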

A few notes should make this easier to understand for those readers who do not program in JavaScript. Note that, in JavaScript, the empty string (denoted “”) can be coerced into a Boolean value of false, and the exclamation point is the Boolean NOT operator, thus !x returns true when x is the empty string. Note also, in JavaScript, that, when x is a string, the expression x.slice(1) returns all of x except the first character. So, if x is “arbre,” x.slice(1) returns “rbre.” One should also know that characters in a string can be accessed using what is usually an array notation; thus, if x is “arbre,” x[0] returns the first character in x: it returns “a.”
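Each of these notes can be tried directly; the following lines are a minimal sketch using only built-in JavaScript behavior. The last two lines illustrate one further point not spelled out above: a Boolean added to a number is coerced to 1 or 0, which is how the final line of ld charges one edit for a mismatch and none for a match.

```javascript
// The empty string is "falsy", so negating it yields true.
console.log(!"");              // true
console.log(!"arbre");         // false

// slice(1) returns everything except the first character.
console.log("arbre".slice(1)); // rbre

// Bracket notation reads a single character by position.
console.log("arbre"[0]);       // a

// A Boolean added to a number is coerced to 1 (true) or 0 (false).
console.log(0 + ("a" != "l")); // 1
console.log(0 + ("a" != "a")); // 0
```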

One might ask, then, is this short snippet of code what gives Google its “secret sauce,” what makes Google five million dollars an hour? Conceptually, one might argue that indeed it is the “secret sauce.” But, materially, of course, this is not at all correct.

Consider all of the engineering that is completely overlooked even if one is looking at this on a code level. One might begin with the fact that this implementation of the Levenshtein distance measure is very inefficient: defined as a recursive function, intermediate results need to be computed again and again. So, instead, right from the very beginning, one would not use the above definition, but rather something like the following that employs a form of dynamic programming to progressively save and thereby avoid recomputing the intermediate results:

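function ld(x, y) {
  if (!y) return x.length;
  if (!x) return y.length;

  // m[i][j] holds the distance between the first i characters of y
  // and the first j characters of x.
  var i, j;
  var m = [];
  m[0] = [];
  for (j = 0; j <= x.length; j++) {
    m[0][j] = j; // first row: against the empty prefix of y
  }

  for (i = 1; i <= y.length; i++) {
    m[i] = [i]; // first column: against the empty prefix of x
    for (j = 1; j <= x.length; j++) {
      if (y[i-1] == x[j-1]) {
        m[i][j] = m[i-1][j-1]; // characters match: no new edit
      } else {
        m[i][j] = Math.min(m[i-1][j-1] + 1,  // substitute one character
                           m[i][j-1] + 1,    // skip a character of x
                           m[i-1][j] + 1);   // skip a character of y
      }
    }
  }
  return m[y.length][x.length];
}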
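As a check on the figures quoted earlier, the sketch below computes a few distances and tests the stated bounds. To keep the block self-contained it uses its own function, levenshtein, a hypothetical two-row variant of the dynamic-programming approach that keeps only the previous and current rows of the table; this variant is an assumption of the sketch, not the listing above.

```javascript
// Two-row variant (an assumption of this sketch): only the previous and
// current rows of the table are kept in memory.
function levenshtein(x, y) {
  let prev = [];
  for (let j = 0; j <= x.length; j++) prev[j] = j;
  for (let i = 1; i <= y.length; i++) {
    const curr = [i];
    for (let j = 1; j <= x.length; j++) {
      const cost = y[i - 1] === x[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j - 1] + cost, // match or substitute
                         curr[j - 1] + 1,    // skip a character of x
                         prev[j] + 1);       // skip a character of y
    }
    prev = curr;
  }
  return prev[x.length];
}

console.log(levenshtein("arbre", "albero"));      // 3
console.log(levenshtein("tere", "tree"));         // 2
console.log(levenshtein("arbre", "arbre"));       // 0
console.log(levenshtein("[}+=&yHHj34", "07734")); // 9

// The bounds: at least the difference in lengths, at most the longer length.
const a = "[}+=&yHHj34", b = "07734";
const dist = levenshtein(a, b);
console.log(dist >= Math.abs(a.length - b.length));  // true
console.log(dist <= Math.max(a.length, b.length));   // true
```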

But, tunneling immediately down into the code like this, we are completely avoiding a number of other engineering issues. In what kind of a data structure should we store the dictionary so that we do not need to compute the distance between the query string and every other string in the entire dictionary? Maybe a trie (pronounced “tree,” like the middle syllable of “retrieval”) so that we only look at the words in the dictionary that share a prefix with the query term?³² How much of the dictionary are we willing to scan through for any given query term? All words within an edit distance of 1 or 2, or are we willing to go up to 3 edits from the query term? If the dictionary comprises millions of words are we going to try to load the whole dictionary into the user’s browser before doing any auto-correction? If so, how long will it take to download? If not, will network latency be a huge issue as we consult the dictionary over the network repeatedly as the user types each letter of a query? What happens if our server holding the dictionary goes down? What if we have millions of people (as Google does) accessing the dictionary at the same moment? Should, perhaps, copies of or parts of the dictionary be distributed across several different servers and some kind of load balancing be implemented? And, beyond these difficult engineering issues, the question of what will be included in the dictionary and what will not be included in the dictionary still needs to be resolved.
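The trie mentioned above can be sketched minimally as follows. The class name Trie, its methods insert and withPrefix, and the four-word dictionary are all hypothetical; a production system would also have to answer the questions about scale, distribution, and latency raised in the preceding paragraph, none of which this sketch addresses.

```javascript
// Minimal trie: each node maps a character to a child node.
class Trie {
  constructor() {
    this.root = { children: new Map(), isWord: false };
  }

  insert(word) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), isWord: false });
      }
      node = node.children.get(ch);
    }
    node.isWord = true;
  }

  // Return every stored word that begins with the given prefix.
  withPrefix(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      node = node.children.get(ch);
      if (!node) return []; // no stored word shares this prefix
    }
    const found = [];
    const walk = (n, soFar) => {
      if (n.isWord) found.push(soFar);
      for (const [ch, child] of n.children) walk(child, soFar + ch);
    };
    walk(node, prefix);
    return found;
  }
}

// Hypothetical four-word dictionary.
const trie = new Trie();
for (const w of ["tree", "trie", "trim", "cat"]) trie.insert(w);

console.log(trie.withPrefix("tr")); // [ 'tree', 'trie', 'trim' ]
console.log(trie.withPrefix("x"));  // []
```

Only the words under the shared prefix are visited, so an edit-distance comparison could then be limited to that subset rather than run against the whole dictionary.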

Algorithms and Systems
Algorithms are step-by-step procedures designed to solve well-defined problems. Textually, algorithms are short. Rarely does the description of an algorithm run longer than a page or two of (pseudo) code. In comparison, software systems are large and typically incorporate many algorithms implemented in code. For example, just the kernel of the Linux operating system currently includes over 15 million lines of code. Imagine printing it out as a book. If you print fifty lines per page, you will have a book of 300,000 pages; or, 300 books of a thousand pages each. Word processors, web browsers, and video editors are all examples of systems. So too are large Internet services like Google’s search engine, the Facebook social network, and Twitter.

In computer science, algorithms are distinguished from systems. In computer science, algorithms and systems are considered two different subfields and are usually taught in university programs as different courses — even different sequences of courses — by different sets of professors. Why? Because they name two different sets of problems.

Algorithms concern the set of difficulties one faces in implementing a particular process or operation of calculation irrespective of the computing environment (e.g., irrespective of the computer hardware, the specifics of network latency and capacity; irrespective of the other software built as “layers” below the software defining the algorithm; irrespective of the kinds of people who use the software).³³

Systems, in contrast, address issues of interface, interaction, scale, and infrastructure. System problems normally come after the algorithm questions have been answered. After that, any problem that prevents a piece of software from performing as it should perform might be a systems problem.

I would not mention this basic distinction except that algorithms and systems are frequently confused in the contemporary social science literature about “algorithms.” In a short article on the keyword “algorithm” as used by social scientists, sociologist Tarleton Gillespie defends how sociologists and journalists employ the word “algorithm” as a rhetorical synecdoche to refer to many different parts of software and hardware systems including “…model, target goal, data, training data, application, hardware.”³⁴ Gillespie et al.’s approach is problematic because if we approach the eclectic concerns of systems design as though they could all be addressed by concentrating on algorithms we will be repeatedly lost in the wrong details. It is like that aphorism about how if the only tool you have is a hammer, you will treat everything as if it were a nail. Not everything that is software is an algorithm!

So, the two JavaScript functions above that define two ways to compute the Levenshtein distance are descriptions of algorithms. But, when we ask how a spelling corrector can be run simultaneously for a million people distributed throughout the world, we are asking a systems question and not strictly an algorithms question.

Assemblages and the Materialities of Language Spaces
Another way to distinguish algorithms from systems is to understand that work on algorithms is frequently accompanied by an attempt (inevitably unsuccessful) to avoid thinking about the materialities of computation. For example, one might specify an algorithm for rendering detailed 3D graphics without worrying about power consumption or heat. But, as soon as one integrates such an algorithm into the core of a gaming system and tries to run it on a smart phone, there are worries about the battery life of the phone and whether or not it gets too hot to touch because the phone does not have a fan to cool it down the way most laptops and desktop computers do. Material concerns like power consumption and heat production are seen to be primarily system concerns and not issues of algorithms. But, they are coupled together. For instance, note that if a more efficient 3D graphics algorithm is invented, when it is incorporated into a gaming system, it will consume less power and produce less heat. In other words, the problems and solutions of algorithms can sometimes be translated into system problems or solutions; and, vice versa.

In a recent article geographer Louise Amoore distinguishes two concerns of geographical studies of cloud-based software, concerns that she labels “Cloud I” and “Cloud II”:

“Cloud I” or a geography of cloud forms, is concerned with the identification and spatial location of data centres where the cloud is thought to materialize. Here the cloud is understood within a particular history of observation, one where the apparently abstract and obscure world can be brought into vision and rendered intelligible. In the second variant, “Cloud II” or the geography of a cloud analytic, the cloud is a bundle of experimental algorithmic techniques acting upon the threshold of perception itself.³⁵

Here again we see this dichotomy between the materialities of systems (Cloud I) versus “algorithmic techniques” (Cloud II). What is left unsorted in a taxonomy like this is a dialectical concern: how do algorithms influence systems and systems algorithms?

In the current geography literature of software studies there are at least two approaches to studying this gap between algorithms and systems: one might be labeled “transduction” after its key concept developed by the philosophers Bernard Stiegler and Gilbert Simondon³⁶; the other is called “assemblage” after an approach that combines insights from the philosophers Gilles Deleuze and Bruno Latour.³⁷ The latter will be used in what follows.

Print Capitalism and Language Planning
The linguistic capitalism of Google is comparable to the print capitalism that engendered the nation-states of the nineteenth century. By comparing the two it should be possible to see why contemporary software sorted spaces and distances are of both economic and geographic concern.

According to the political scientist Benedict Anderson, “print capitalism” becomes possible when words become commoditized: when words become things that can be bought and sold, in volume, and widely distributed. So, print capitalism is an economic possibility dependent on the invention of the printed book, the creation of productions of frequent publication – like the daily newspaper — the development of the high-speed printing press, and means of rapid distribution, like swift ships and rapid rail.³⁸ The print capitalist has a business interest in the homogenization and standardization of languages so that one edition can be sold everywhere to everyone.

However, historically the standardization of languages was a political power not a power exercised directly by businesses that might profit from it. The practitioners of language planning – the discipline of standardizing and homogenizing languages — were empires and nation-states aiming to produce and enforce a standardized, written language. Latin, Chinese, and Arabic were languages of empire.

Key to the production of a standardized language are reference works, like grammars and dictionaries. It was for this reason that Antonio de Nebrija presented his grammar of Castilian to Queen Isabella in 1492 to be used as a tool of empire, to tame the “outlandish tongues” of those her empire had conquered.³⁹

While language planning of this sort is a very old strategy of empire, the nineteenth-century language planners, caricatured by Lewis Carroll’s Humpty Dumpty, were developing versions tailored for nation-states, rather than empires. One might say that the invention of the modern dictionary coincided with the invention of the political formation we now know as the nation-state.

From the middle of the eighteenth century through the middle of the nineteenth – the time of Noah Webster – it was within the realm of the possible for one person to write a dictionary. It took Webster decades and his critics thought him mad⁴⁰, but he accomplished the task primarily with the aim of unifying the spoken and written language of the new Republic. It was a political project. And, in Webster’s case, it was done against the King, or more precisely against the King’s English from which he was trying to differentiate American English. “A national language is a national tie, and what country wants it more than America?”⁴¹

Before, after and during the nineteenth century, Webster was far from singular in his ambitions.⁴² The creation of standard reference works – dictionaries, thesauri, grammars – was seen as an essential tool of nation building. Regularizing the lexicon and the grammar of a language became a means to connect the population of a nation-state together – to keep them “on the same page” – through newspapers and other mass distribution publications.

Today, “language planning” largely sits somewhere between an academic discipline (associated with applied linguistics) and a quasi-governmental power. The great expert of Scandinavian languages, Einar Haugen, wrote an article in 1961 entitled “Language Planning in Modern Norway.” The topic of the article is the Norwegian state’s ongoing efforts to differentiate and standardize two written forms of Norwegian (now known as “bokmål” and “nynorsk”). He begins by pointing out some language limits of linguistics as a discipline: “Linguists tend to look askance on normative linguistics, because it brings in an element which is not purely scientific. Some of them even have emotional reactions to it like that suggested by the title of Robert E. Hall Jr.’s Leave Your Language Alone! In Bloomfield’s Language (e.g., 496 ff.) one will find expressed a distaste for the ‘authoritarianism’ of the usual school norm, particularly when it is based on erroneous observation of good usage. Linguistics as such is obviously not equipped to deal with these problems, which belong in the realm of social and political values.”⁴³

In general, linguistics is supposed to be descriptive rather than prescriptive in orientation. Modern linguists study language as it is and do not necessarily make recommendations about language as it ought to be. This puts them at some distance from early-modern grammarians, like Antonio de Nebrija, whose grammar of Castilian was written with normative political goals with implications for both empire and education.

Consequently, academic work on language planning over the past few decades has not taken place within the discipline of linguistics, but rather, within the field of applied linguistics, a field largely concerned with the teaching of language in schools. Within applied linguistics, there exists the more specific field of language policy and language planning (LPLP).⁴⁴ As a field of research, LPLP is especially concerned with status, corpus, and acquisition planning. Status planning is the legal process of making a language official. Corpus planning is the specification of orthography, grammar, and pronunciation. Acquisition planning concerns educational policy and management to ensure that the language is taught in school.⁴⁵ As an academic field, LPLP is largely pursued within applied linguistics, but as a normative practice it is accomplished in the institutions of law, policy and management. LPLP is executed in government agencies (e.g., ministries of cultural, military, and foreign affairs); education agencies (at national, state, and local levels); and, quasi or non-governmental agencies (such as the civil service, the court system, and language agencies such as the French Alliance Française, and the German Goethe Institute).⁴⁶ Nevertheless, in her recent book on LPLP, Sue Wright argues that although language planning takes place in formal institutions, it has also always included a set of informal activities (e.g., when a parent corrects a child at the dinner table).⁴⁷

While, as a field, LPLP is able to capture the institutional forces of language planning, the technological context is largely overlooked. It is interesting, however, to see how Wright, in her discussion of media, integrates Benedict Anderson’s work on print capitalism and, thereby, considerations of the homogenizing forces of media technology.⁴⁸

Anderson, following Marshall McLuhan,⁴⁹ points out that the book was the first mass-produced, industrial commodity. And, “…the newspaper is merely an ‘extreme form’ of the book, a book sold on a colossal scale, but of ephemeral popularity. Might we say: one-day best-sellers?”⁵⁰ Thus, book and newspaper publishers supported and reinforced the standardization of language: the greater the number of people who could read the language, the larger the publishers’ potential market. Thus, commercial concerns benefited from the regularization and standardization of language. One could even argue that an interest in increasing market size was a force in the creation of new genres of writing.⁵¹ In this manner, standardized language opens a larger potential market that can only be served by new technologies of distribution; and, in a mutually recursive manner, books and newspapers circulating across a large readership, demonstrate and thereby induce (e.g., motivate readers to learn) the standardized language.

Language Differentiation and Language Distance
From the literature of LPLP and from Anderson’s work on the nineteenth century production of nation-states, we know that the standardization and homogenization of “national” languages was only one operation in the production of these new language spaces. There was also an opposing operation of language differentiation at work. Thus, for example, Wright points out that there was not any pressing need to distinguish dialects at what would become the French/Italian border. That is, there was not until France began to define itself as a nation-state with a national language that, necessarily, had to be distinguished from the national languages of others, like the Italians.⁵² One can see how, on the one hand, standardized spelling, grammar, and pronunciation could be a force for the homogenization of language. But how, on the other hand, could nationally sanctioned dictionaries and grammars be employed as a force for language differentiation distinguishing, for example, the French word for tree, “arbre,” and the Italian word for the same “albero”? At the French/Italian border, the pronunciation might be almost identical, but standardized orthography opens a huge distance — as big as a nation-state – between “arbre” and “albero.”⁵³
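The orthographic gap between the two spellings can be made concrete with an edit-distance calculation. What follows is a minimal sketch in JavaScript, assuming the standard dynamic-programming formulation of the Levenshtein distance with a cost of one for each insertion, deletion, or substitution. The function name `editDistance` is an illustrative choice and is not one of the functions presented earlier in this article.

```javascript
// Levenshtein distance with unit costs, computed one row at a time.
function editDistance(a, b) {
  // prev[j] is the distance between the first i-1 characters of a
  // and the first j characters of b.
  let prev = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);

  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i characters into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // delete a character from a
        curr[j - 1] + 1,    // insert a character from b
        prev[j - 1] + cost  // substitute, or keep a matching character
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(editDistance("arbre", "albero")); // 3
console.log(editDistance("arbre", "arbre"));  // 0
```

On this measure the two standardized spellings sit three single-character edits apart (for instance, arbre, albre, albere, albero), however close the pronunciations may be at the border.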

Publishers hoping to have a market as large as possible would, of course, prefer to smooth over the differences between national languages, like French and Italian. Builders of empires and nation-states would also like a standardized, uniform language space, but one that ends at the geographical borders. Beyond those borders, nationalists are happy to have their language written and spoken, as long as they — the nationalists — are in control of the standards. If not, then the nationalist wants to distinguish their language from other languages. In this way, language spaces and geographical spaces are co-produced.

When language difference is measured using reference works, like dictionaries, controlled by nation-states, there is a homology between different languages and different nation-states (e.g., French is to France as Italian is to Italy). However, when dictionary production becomes purely a matter of commercial calculation, unhinged from the words’ meanings as defined in the print dictionaries; and, when language difference is measured using algorithms (e.g., the Levenshtein distance) not geographically localized accents, then commercial enterprises, like Google, become the concern not just of economists, but also of political geographers studying the composition of international borders.

The Assemblages of Print Capitalism
Face-to-face conversation, oral language, requires us to be able to hear one another. The spoken word fails when, for example, a room becomes too noisy or if we lose our voice or our hearing. The written word depends on different media; it fails if we run out of ink and paper. The printed word is contingent on a much larger and more complicated chain of media technologies. If the journalist’s typewriter gets dropped, if the monkey’s boxes of type fall apart⁵⁴, if the workers at the pulp mill go on strike, if the printers break, if the delivery truck has engine troubles, or if the paperboy’s bicycle gets a flat, we might not get our morning newspaper; or, we might get a newspaper different from the one we would have had if all of the mechanics had been in working order.

Of course, the people are important in this chain of connections, but the chain for print capitalism includes complicated machines. In comparison to the chain of connections essential to written language, print language incorporates many complicated machines; and, incorporates many, many, many more than the chain intrinsic to oral language. Let us call these chains “machinic assemblages” because they include humans, machines, and an eclectic mix of articulations.⁵⁵

Language limits can be thrown into relief when the machinic assemblage of another form is referenced metaphorically. For instance, the poet Ralph Waldo Emerson wrote the following about the opening of the American Revolutionary War against the British: “Here once the embattled farmers stood, And fired the shot heard round the world.”⁵⁶ In a world founded on the machinic assemblage of print language, it is commonplace for words to have a global distribution: they are “heard” in newspapers. Emerson’s metaphor reveals two common beliefs held about print language limits. First is the idea that reading is analogous to listening. Second is the corollary that aural events can be heard round the world if they are recorded in writing.

One might observe that the language limits of print are assumed to be the language limits of the spoken word. Analogously, it is commonly assumed that the language limits of networked texts are the same as those limits that constrained print language of yesterday. We assume the machinic assemblage of yesterday when we try to imagine the language limits of today. This is why Frédéric Kaplan’s explanation of linguistic capitalism shocks me even if it should not. The machinic assemblage we forget in this case seems unimaginable from the perspective of even just a few years ago: each time we type a letter in Google’s search box, our keystroke is recorded, analyzed by Google, and then, in our browser, we see some of the analysis results as a list of suggestions for completing the word being typed. The speed and global reach of this assemblage is what is so hard to imagine. How can so much be done so fast, so far away from us, but in conjunction with what we are typing now, this very instant?
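The keystroke-by-keystroke loop just described can be illustrated at toy scale. The following JavaScript sketch is purely hypothetical: the query log `pastQueries`, the function `suggest`, and the ranking by raw frequency are all assumptions made for illustration and say nothing about how Google’s service is actually implemented. It also runs entirely on one machine, leaving out the network round trip and the global scale that make the real assemblage so hard to imagine.

```javascript
// Hypothetical log of previously typed queries and how often each was seen.
const pastQueries = new Map([
  ["tree", 120],
  ["treaty", 45],
  ["trend", 80],
  ["arbre", 12],
  ["albero", 9],
]);

// Return up to `limit` logged queries that begin with `prefix`,
// most frequent first.
function suggest(prefix, log, limit = 3) {
  return [...log.entries()]
    .filter(([query]) => query.startsWith(prefix))
    .sort((x, y) => y[1] - x[1])
    .slice(0, limit)
    .map(([query]) => query);
}

// Simulate typing "trea" one keystroke at a time: after every keystroke
// the text typed so far is analyzed and a list of completions comes back.
let typed = "";
for (const key of "trea") {
  typed += key;
  console.log(typed, "->", suggest(typed, pastQueries));
}
```

Even in this reduced form, every keystroke is an event that is recorded and answered; the systems question is how to do the same for millions of people at once.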

If we compare the respective machinic assemblages that distinguish print capitalism from what Frédéric Kaplan calls “linguistic capitalism” we find that the assemblage of print capitalism included high-speed printing presses, railways, steamboats, and the definition of uniform grammars and dictionaries created collaboratively between the institutions of journalism, book publishing, and national governments. Anderson argues that this assemblage — in conjunction with the sorts of language planning we associate with the standardization of spelling and pronunciation – facilitates the creation of what he calls “imagined communities.” Specifically, he is concerned with the imagined communities we now know as nation-states.

In contrast, the machinic assemblages of linguistic capitalism include Google and the portions of the Internet that it has copied and saved to its various server farms. But, what are our “imagined communities” of this new machinic assemblage?⁵⁷ We have moved into an entirely different rhythm of reading and writing. No longer are we in an age of journalism – where the aim was the co-authorship of a daily production.

In the words of the BBC journalist and Uzbek poet, Hamid Ismailov, “With the blossom of social media in shape of Facebook and Twitter we are now talking not just about ‘journalism’ but rather ‘houralism’, ‘minutealism’ and ‘secondalism’.”⁵⁸ Ismailov’s neologisms are droll but are yet another example of forgetting about or refusing to acknowledge the machinic assemblage behind the networked word. The events we need to wrangle with occur not in minutes or seconds, but in times far faster than the blink of an eye. The computers and networks of today have event-times of billionths of a second. We need to be asking what happens to our words in one billionth of a second.

One way to approach this question would be to pursue a study of speed, what philosopher Paul Virilio calls “dromology.”⁵⁹ Following Virilio, we might note that any radical increase in speed will introduce new forms of accident or disaster. For example, a few years ago, when the clock speed of computers increased dramatically from megahertz (i.e., millions of cycles per second) to gigahertz (i.e., billions of cycles per second) new forms of financial transaction, like high-frequency trading, became possible. Arguably high-frequency trading facilitates new forms of financial disaster like the May 6, 2010 and August 24, 2015 “flash crashes” of Wall Street.

Semiotics and Machines
What if, instead of dromology, we employ a form of semiology to interrogate the limits of contemporary language and language planning? Taking this approach, the key question is this: How is language connected to the world? We say, for example, due to the Internet, we now live in a connected world. We say this because we can now move bits – words, images, data, and video – around the globe at the speed of light. So, speed is important. But, it is important semiologically because the speed with which information moves gives us new connections between people and between things, between language and the world, and between one word and another. Thus, today, any answer to the question “How is language connected to the world?” needs to take into account the vast reach of computer networks because, through those networks, words are connected to words, and words are connected to the world in new ways.

Let us consider these contemporary relations between language and the world from a historical perspective and revisit the words of Michel Foucault cited at the start of this article. In his book The Order of Things, Michel Foucault wrote “… in the sixteenth century, one asked oneself how it was possible to know that a sign did in fact designate what it signified; from the seventeenth century, one began to ask how a sign could be linked to what it signified.”⁶⁰

So, how might one answer this seventeenth century question? How is a sign linked to what it signifies? Using the technical vocabulary of the Port-Royal grammarians, as Foucault did, we might say that, increasingly, the connection between a signifier and a signified is a computer or network of computers. The links between words and things, words and other words, words and people are – increasingly – machines, specifically, the machinery of computer hardware and software. But, here we see the rupture between the so-called Classical episteme of representation and, even, the nineteenth and twentieth century concern with signification. A piece of text, a piece of software, written here can trigger or respond to an event initiated there, far away on the other side of the world. Software is not so much a means of representation as it is a means of instrumentation and manipulation from afar.

Let me underline my point by referring to semiotician Charles Sanders Peirce’s trichotomy of signs: the much discussed icon, index, and symbol.⁶¹ An icon is said to be a sign that physically resembles what it stands for – like the icons we see on our computers. An index is said to correlate – usually via some connection we can explain with chemistry or physics – with what it stands for; thus, smoke could be said to be an index for fire. In contrast to both icons and indices, symbols are signs linked to what they stand for according to psychological or cultural conditions. Thus, the English word “tree” is connected to the roots, bark, and leaves of a tree, not by the word’s resemblance to the plant, not by chemistry or physics, but by cultural conventions that we are familiar with if we know something of the English language and/or have access to a dictionary. My point is that this threesome – of icon, index, and symbol – needs to be extended into a foursome. To articulate that, increasingly, the connection between a signifier and a signified is a computer network, we need reference to a fourth kind of sign that I will name a “machine” or, more specifically, a machine of computation, a computer.

Like a symbol, a machine is a sign that links a signifier and a signified together via a cultural production of language planning. In the case of the symbol, this artifact of language planning is the dictionary. In the case of the machine, this artifact is a complicated machinic assemblage composed of hardware and software, computers and networks. This assemblage includes algorithms, but also many other material details. It is a system, not simply an algorithm. So, as a sign, the machine is more akin to a Peircean symbol than it is to an icon or an index because both machines and symbols are subject to forms of language planning.

But, unlike a symbol, a computer has autonomous performative powers. That is why founders of computer science, like John von Neumann, developed the theory of automata in order to develop a means to reason about a form of language that is tantamount to a machine.⁶²

And, in a complementary move, linguists like Noam Chomsky developed theories of language that are computational machines. For Chomsky and others, a theory of language is no more and no less than a computer program, a piece of software. In an article published in 1964, “On the Notion ‘Rule of Grammar’,” Chomsky describes in a precise manner what he means when he equates a grammar to a machine, a device: “By a grammar of the language L I will mean a device of some sort (that is, a set of rules) that provides, at least, a complete specification of an infinite set of grammatical sentences of L and their structural descriptions.”⁶³ So, a grammar is a device comprised of a set of rules. But, then, epistemologically, what is a grammar? It is, according to Chomsky, a theory: “A grammar, in the sense described above, is essentially a theory of the sentences of a language.”⁶⁴
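What it means for a set of rules to act as a device can be suggested with a toy example. The JavaScript sketch below is an assumption-laden illustration, not Chomsky’s formalism: the five rewrite rules, the tiny vocabulary, and the names `rules`, `derive`, and `sentenceOf` are all invented here. Because the rules for NP and VP refer to one another, the set of sentences they specify is unbounded; the `depth` parameter is an added limit so that a finite portion of that set can be listed, each sentence paired with a bracketed structural description.

```javascript
// A toy grammar: each category maps to its alternative expansions.
// Any symbol that is not a key of `rules` is treated as a word.
const rules = new Map([
  ["S", [["NP", "VP"]]],
  ["NP", [["the", "N"], ["the", "N", "that", "VP"]]],
  ["VP", [["V"], ["V", "NP"]]],
  ["N", [["dog"], ["tree"]]],
  ["V", [["sees"], ["grows"]]],
]);

// List every bracketed structure derivable from `symbol`
// using at most `depth` nested rule applications.
function derive(symbol, depth) {
  if (!rules.has(symbol)) return [symbol]; // a word derives itself
  if (depth === 0) return [];              // no rule applications left
  const results = [];
  for (const expansion of rules.get(symbol)) {
    // Combine the derivations of each child, left to right.
    let partial = [[]];
    for (const child of expansion) {
      const childTrees = derive(child, depth - 1);
      partial = partial.flatMap((p) => childTrees.map((t) => [...p, t]));
    }
    for (const children of partial) {
      results.push(`[${symbol} ${children.join(" ")}]`);
    }
  }
  return results;
}

// Strip labels and brackets to recover the sentence itself.
function sentenceOf(tree) {
  return tree.replace(/\[\w+ /g, "").replace(/\]/g, "");
}

for (const tree of derive("S", 3)) {
  console.log(sentenceOf(tree), "   ", tree);
}
console.log(derive("S", 4).length, "structures at depth 4");
```

Raising the depth limit yields ever more sentences from the same five rules, which is the sense in which a finite device can specify an infinite set.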

This strange equivalence posited between language and computational machines was already apparent to Alan Turing by the beginning of the 1950s.⁶⁵ And, after that, this kind of formalism pullulated into academic disciplines like linguistics, computer science, logic and mathematics. Yet, outside of academia, in the everyday worlds of common sense, to imagine that language is more-or-less meaningless and can be a kind of computational machine, and vice versa, is out of bounds: it exceeds the language limits of the everyday. The notion exceeds our education; our sense of time and timing; our understanding of referential semantics – that which connects a word to the world; and, everyday experience with performative pragmatics – the work a word does in the world. But, today, in the guise of Google’s various services and elsewhere on the Internet, software and hardware systems redefine the ordering and distances between words and thus re-enact, on a global and industrial scale, what would have been, previously, just academic, formalist theories of language.

Conclusion
A decade ago, in 2007, at the Association of American Geographers annual conference, geographers Martin Dodge, Rob Kitchin, and Matthew Zook convened a double panel session on the topic of software studies. Later, in 2009, session papers were published in the journal Environment and Planning A. In their guest editorial for the published papers, Dodge, Kitchin and Zook wrote:

Software matters today, and it will matter more so in the decade to come, as various aspects of pervasive computing play out. How do we begin to make sense of what this might mean? One way is to analyse the way in which software can, quite literally, make space. Code beckons into being sociospatial relations that are dependent on the effective operation of software; what Dodge and Kitchin have called “code/space.” As such, geographers can potentially contribute valuable new perspectives to the emerging field of software studies.⁶⁶

Writing now, ten years later, after Nick Lally and Ryan Burns convened a group of us for a multi-session panel on software studies at the 2016 Association of American Geographers conference, it seems clear to me that Dodge’s and Kitchin’s predictions came true. What is perhaps more remarkable is how their list of open questions still rings true and resonates with contemporary conditions.

Among their open questions was this one, one that animates this paper: “How do concepts of near and distant, codified and tacit, evolve in concert with software?”⁶⁷ Clearly this is a much larger question than the one that is addressed here. In this article, my more modest endeavor has been to trace out some of the ways in which software and systems, by redefining established language limits, disrupt and reorder the interweaving of geographical, political, economic, and language spaces.

Acknowledgements
I would like to thank Nick Lally and Ryan Burns for including me in a panel at the spring 2016 Association of American Geographers conference in San Francisco. I would also like to thank them for prompting me to revisit this topic and for writing a very constructive critique of my first draft. Thanks too to the three anonymous Computational Culture reviewers. I hope I have addressed some of your excellent suggestions! I want to thank Bernard Stiegler who initiated my thinking on this topic with an invitation to publicly dialog with Frédéric Kaplan at the Centre Pompidou in December of 2014. A record of that dialog is accessible online here: https://digital-studies.org/wp/frederic-kaplan-and-warren-sack-02122014/ I drafted this article later, in the fall of 2016, during my time as a fellow at the Paris Institute for Advanced Studies. This article benefitted from a fellowship at the Paris Institute for Advanced Studies (France), with the financial support of the French State managed by the Agence Nationale de la Recherche, programme “Investissements d’avenir”, (ANR-11-LABX-0027-01 Labex RFIEA+).

Notes

Nigel Thrift and Shaun French, “The Automatic Production of Space,” Transactions of the Institute of British Geographers, NS 27, 2002: 309–335; p. 331. ↩
Foucault, Michel (2005-08-18). The Order of Things (Routledge Classics) (Kindle Locations 1277-1284). Taylor and Francis. Kindle Edition. ↩
What Foucault called “epistemes” is similar to what historian Thomas Kuhn called “paradigms.” In Kuhn’s vocabulary paradigms are “incommensurable.” Kuhn, Thomas S. The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 1970. ↩
Cf., Sandra Harding, Whose Science? Whose Knowledge?: Thinking from Women’s Lives (Open University Press, 1991). Feminist epistemology interrogates not only what can be known, but what is institutionally sanctioned to be known by whom; and, it stresses that from a privileged, frequently male, perspective many issues and ideas are invisible, even unthinkable, even while, seen from another, less privileged perspective, the same issues and ideas are obvious. Thus, at the very least, any given Foucaultian episteme could be descriptive of men’s knowledge but completely inapplicable to women’s knowledge. ↩
Comparable to “language limits” are what linguistic anthropologists call either “language ideologies” or “linguistic ideologies.” Anthropologist Michael Silverstein defined “language ideologies” in 1979 as “sets of beliefs about language articulated by users as a rationalization or justification of perceived language structure and use” Silverstein, Michael (1979) “Language Structure and Linguistic Ideology.” In P. Clyne, W. Hanks, and C. Hofbauer (eds.), The Elements: A Parasession on Linguistic Units and Levels (Chicago: Chicago Linguistic Society), pp. 193–248. ↩
Cf., Mike Crang and Nigel Thrift, Thinking Space (Routledge, 2000). ↩
Shannon, Claude Elwood, and Warren Weaver. The Mathematical Theory of Communication. Urbana: University of Illinois Press, 1949, p. 8. ↩
Warren Sack, “Grammar,” The Software Arts (MIT Press, forthcoming). ↩
Huck, Geoffrey J., and John A. Goldsmith. Ideology and Linguistic Theory: Noam Chomsky and the Deep Structure Debates. London; New York: Routledge, 1995, p. 13. ↩
Chomsky, N. (1977). Language and responsibility. New York: The New Press: 139. ↩
Leonard Bloomfield, “Review of Was ist ein Satz?, by Johan Ries,” Language 7 (1931). 204-209; as cited by Huck and Goldsmith, p. 8. ↩
Zellig Harris. 1940. “Review of Louis H. Gray, Foundations of Language.” Language 16, 216-223; as cited in Huck and Goldsmith, 1995, p. 10. ↩
Henri Poincaré, Science and Method. New York: Dover, 1952 as cited by Davis, Martin. The Universal Computer: The Road from Leibniz to Turing. New York: Norton, 2000, p. 93. ↩
Marcus Tomalin, Linguistics and the Formal Sciences: The Origins of Generative Grammar, Cambridge Studies in Linguistics (Cambridge University Press, 2006). ↩
Crang and Thrift, p. 4. ↩
Kaplan, Frédéric. “Linguistic Capitalism and Algorithmic Mediation.” Representations 127 Summer 2014: 57-63. ↩
Kaplan, 2014, p. 57. ↩
I have chosen the example of “tree” because it is so frequently discussed in reference to linguist Ferdinand de Saussure’s theory of the sign; e.g., see “Whether we are seeking the meaning of the Latin word arbor or the word by which Latin designates the concept ‘tree’,…” Saussure, Ferdinand de; Roy Harris. Course in General Linguistics (Open Court Classics) (Kindle Locations 1605-1606). ↩
Brin, S.; Page, L. (1998). “The anatomy of a large-scale hypertextual Web search engine.” Computer Networks and ISDN Systems. 30: 107–117. ↩
For example, in “How Google makes improvements to its search algorithm” (https://www.youtube.com/watch?v=J5RZOU6vK4Q) Google engineers state that almost two changes a day are made to Google’s search algorithm. Note also Bernhard Rieder’s excellent history of the PageRank algorithm: Bernhard Rieder, “What is in PageRank? A Historical and Conceptual Investigation of a Recursive Status Index,” Computational Culture: A Journal of Software Studies, Issue 3, 28 September 2012. http://computationalculture.net/article/what_is_in_pagerank. ↩
See Google AdWords Keyword Planner: https://adwords.google.com. See also Pip Thornton’s project “{poem.py} : a critique of linguistic capitalism”: https://linguisticgeographies.com/2016/06/12/poem-py-acritique-of-linguistic-capitalism/ ↩
Cf., “Search Advertising With Google: Part 1 – Introduction,” https://www.youtube.com/watch?v=umL3CTEbmdw&list=PL28D81F8088CD3D88 ↩
Richard Terdiman, Discourse/Counter-Discourse: The Theory and Practice of Symbolic Resistance in Nineteenth-Century France. Ithaca: Cornell University Press, 1985. ↩
Cf., Richard Rogers (ed.) Preferred Placement: Knowledge Politics on the Web (Jan Van Eyck Editions, the Netherlands, 2000). ↩
Robert A. Wagner and Michael J. Fischer. “The string-to-string correction problem.” J. ACM, 21(1):168–173, 1974. ↩
Kaplan, 2014, p. 59. ↩
Kaplan is not alone in envisioning Google’s three algorithms as far more than a business model. Comparable to Kaplan’s notion of “linguistic capitalism” is, for instance, the “semantic capitalism” articulated by Christophe Bruno and elaborated by Martin Feuz, Matthew Fuller, and Felix Stalder in a First Monday article: Christophe Bruno, 2006. “Interview we–make–money–not–art,” cited in http://distributedcreativity.typepad.com/idc/2006/03/the_power_of_wo.html, accessed 7 December 2010; Martin Feuz, Matthew Fuller, and Felix Stalder. “Personal web searching in the age of semantic capitalism: Diagnosing the mechanisms of personalization,” First Monday, Volume 16, Number 2 – 7 February 2011. ↩
Kaplan, 2014, p. 60. ↩
The set of techniques employed by Google for its translation service (https://translate.google.com/), including deep neural networks, can be studied here: https://research.googleblog.com/2016/09/a-neural-network-for-machine.html For my more detailed argument on why I do not think these techniques have anything to do with meaning see my forthcoming book: Warren Sack, “Grammar,” The Software Arts (MIT Press, forthcoming). ↩
V.I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals,” in Soviet Physics – Doklady, Vol. 10, No. 8, pp. 707-710, February 1966; translated from Doklady Akademii Nauk SSSR, Vol. 163, No. 4, pp. 845-848, August 1965. ↩
Peter Norvig, Google’s director of research, discusses some related issues in a short tutorial he wrote on spelling correction. His toy implementation is in the Python programming language. See Peter Norvig, “How to Write a Spelling Corrector,” (February 2007 – August 2016); online here: http://norvig.com/spell-correct.html In his post, Norvig also refers to a number of more detailed and technical references that address the problem; e.g., Casey Whitelaw and Ben Hutchinson and Grace Y Chung and Gerard Ellis, “Using the Web for Language Independent Spellchecking and Autocorrection,” Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 890–899, Singapore, 6-7 August 2009; online here: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/36180.pdf ↩
Edward Fredkin (1960). “Trie Memory.” Communications of the ACM. 3 (9): 490–499. ↩
For an extended discussion on the definition of algorithms, see Warren Sack, “Algorithm,” The Software Arts (MIT Press, forthcoming) ↩
Tarleton Gillespie, “Algorithm,” in Digital Keywords: A Vocabulary of Information Society and Culture (Benjamin Peters, editor) Princeton University Press, 2016. ↩
Louise Amoore, “Cloud geographies: Computing, data, sovereignty,” Progress in Human Geography (2016) 1–21. ↩
Citing software studies scholar Adrian Mackenzie, geographers Rob Kitchin and Martin Dodge maintain “transduction is a kind of operation, … transduction involves a domain taking-on-form, sometimes repeatedly.” (in Mackenzie, Adrian. Transductions: Bodies and Machines at Speed (New York: Continuum, 2002), p. 10 as cited in Rob Kitchin and Martin Dodge, Code/Space: Software and Everyday Life (MIT Press, 2011), p. 72). Mackenzie, in turn, for his description of transduction, draws from the philosopher Gilbert Simondon (e.g., Gilbert Simondon, “The Genesis of the Individual,” in Jonathan Crary & Sanford Kwinter (eds.), Incorporations (New York: Zone Books, 1992): 297–319). Using transduction as a conceptual tool, Kitchin and Dodge develop an approach to what they call “code/space,” a means to analyze how code and physical space mutually shape one another.
To understand transduction, one must also understand the key terms of “technics” and “technicity” as developed by the philosopher Bernard Stiegler. Samuel Kinsley, in an article titled “The matter of ‘virtual’ geographies,” provides a succinct introduction to these concepts for geographers interested in software: “A useful way of thinking about what technology is and does in relation to ‘the human’ is the concept of ‘technics’, as the problematic and constitutive relation between what we call ‘human’ and ‘technology’, and accordingly ‘technicity’ as the (emergent) qualities of that relation as it is performed.” (Samuel Kinsley, “The matter of ‘virtual’ geographies,” Progress in Human Geography 2014, Vol. 38(3) 364–384; p. 371) ↩
This approach is summarized by several articles in the literature of geography, including Graham Pickren, “The global assemblage of digital flow: Critical data studies and the infrastructures of computing,” Progress in Human Geography 2016, 1-19. ↩
Anderson, Benedict R. O’G. Imagined Communities: Reflections on the Origin and Spread of Nationalism. London; New York: Verso, 2006. ↩
“…recall the time when I presented you with a draft of this book earlier this year in Salamanca. At this time, you asked me what end such a grammar could possibly serve. Upon this, the Bishop of Avila interrupted to answer in my stead. What he said was this: ‘Soon Your Majesty will have placed her yoke upon many barbarians who speak outlandish tongues. By this, your victory, these people shall stand in a new need; the need for the laws the victor owes to the vanquished, and the need for the language we shall bring with us. My grammar shall serve to impart to them the Castilian tongue, as we have used grammar to teach Latin to our young’.” As cited in Illich, Ivan, and Barry Sanders. ABC: The Alphabetization of the Popular Mind. San Francisco: North Point Press, 1988. ↩
Lepore, Jill (2008). “Introduction.” In Schulman, Arthur. Websterisms: A Collection of Words and Definitions Set Forth by the Founding Father of American English. Free Press. ↩
Noah Webster, “To Timothy Pickering” (May 25, 1786). Letters, 52. ↩
Samuel Johnson’s extremely influential A Dictionary of the English Language was published in 1755. Johnson’s was the definitive dictionary until the advent of the Oxford English Dictionary (OED) published 150 years later. The first edition of the OED was seventy years in the making and, interestingly, one of its most important contributors was a criminally insane American Civil War veteran. An entertaining history of it is told by Simon Winchester in his book The Professor and the Madman: A Tale of Murder, Insanity, and the Making of the Oxford English Dictionary (2009). ↩
Einar Haugen, “Language Planning in Modern Norway,” Scandinavian Studies Vol. 33, No. 2 (May, 1961), pp. 68-81. ↩
As the renowned linguist and inventor of systemic functional grammar M. A. K. Halliday put it, “Language planning is a highly complex set of activities involving the intersection of two very different and potentially conflicting themes: one that of ‘meaning’ common to all our activities with language, and other semiotics as well; the other theme that of ‘design’. If we start from the broad distinction between designed systems and evolved systems, then language planning means introducing design processes and design features into a system (namely language) which is naturally evolving.” Halliday, M. “New ways of meaning: the challenge to applied linguistics” in A. Fill and P. Mühlhäusler (eds), The Ecolinguistics Reader: Language, Ecology and Environment. London: Continuum. 2001 (p. 177) as cited in Wright, Sue (2016). Language Policy and Language Planning: From Nationalism to Globalisation, Palgrave Macmillan UK. ↩
Wright, Sue (2016-04-08). Language Policy and Language Planning: From Nationalism to Globalisation (Kindle Locations 61-64). Palgrave Macmillan UK. Kindle Edition. ↩
Robert B. Kaplan, Richard B. Baldauf, Language Planning from Practice to Theory (Bristol, PA: Multilingual Matters Ltd, 1997): p. 6. ↩
“My argument is that, although formal language policy making and language planning is a relatively recent development in terms of human history, as an informal activity it is as old as language itself, plays a crucial role in the distribution of power and resources in all societies, is integral to much political and economic activity and deserves to be studied explicitly from these perspectives.” Wright, Sue (2016-04-08). Language Policy and Language Planning: From Nationalism to Globalisation (Kindle Locations 65-67). Palgrave Macmillan UK. Kindle Edition. ↩
Wright, chapter 2, 2016. ↩
McLuhan, Marshall. The Gutenberg Galaxy: The Making of Typographic Man. (Toronto): University of Toronto Press, 1962, p. 125 as cited by Anderson, Benedict (2006). Imagined Communities: Reflections on the Origin and Spread of Nationalism (p. 35). Verso Books. Kindle Edition. ↩
Anderson, Benedict (2006-11-17). Imagined Communities: Reflections on the Origin and Spread of Nationalism (p. 35). Verso Books. Kindle Edition. ↩
For example, Richard Terdiman argues that early nineteenth century newspapers were essential party organs that were each written from a specific political perspective. Consequently, their distribution was small and subscribers, who were party members, paid a hefty price. Later in the century, in an effort to find a larger public and thereby bring down the price per copy, “objective” journalism was invented to appeal to readers across party lines. Richard Terdiman, Discourse/Counter-Discourse: The Theory and Practice of Symbolic Resistance in Nineteenth-Century France. Ithaca: Cornell University Press, 1985. ↩
“The French concern with the flow of Italian lexis into French in the sixteenth century coincides with the first fixing of borders and the early stirrings of French national identity. Several contemporary works set out to persuade French speakers to avoid these foreign loan words.” Wright, Sue (2016-04-08). Language Policy and Language Planning: From Nationalism to Globalisation (Kindle Locations 1254-1256). Palgrave Macmillan UK. Kindle Edition. ↩
“Up until the time of the first vernacular grammars – in other words, up until the late fifteenth century – lingua or tongue or habla was less like one drawer in a bureau than one color in a spectrum. The comprehensibility of speech was comparable to the intensity of a color…(j)ust as one color may appear with greater or lesser intensity, may bleed into its neighbor, just as landscapes merge into one another, …” Illich, Ivan, and Barry Sanders. ABC: The Alphabetization of the Popular Mind. San Francisco: North Point Press, 1988, pp. 62-63. ↩
“This Séchard was a former journeyman pressman, in typesetters’ jargon, a Bear. Their back-and-forth movement, the way they took themselves from the ink-block to the press and the press to the ink-block, is reminiscent of a bear in a cage, and certainly earned them the nickname. In retaliation, the Bears named the typesetters Monkeys, because of the incessant gymnastics required to snatch type out of the hundred fifty-two little boxes that hold it.” Balzac, Honoré de. Lost Illusions. New York: Modern Library, 1985, p. 124. ↩
“Machinic assemblage” is a phrase due to the philosophers Gilles Deleuze and Félix Guattari. “No chain is homogeneous; all of them resemble, rather, a succession of characters from different alphabets in which an ideogram, a pictogram, a tiny image of an elephant passing by, or a rising sun may suddenly make its appearance. In a chain that mixes together phonemes, morphemes, etc. without combining them, papa’s mustache, mama’s upraised arm, a ribbon, a little girl, a cup, a shoe suddenly turn up.” Gilles Deleuze and Félix Guattari, Anti-Oedipus: Capitalism and Schizophrenia, Robert Hurley, Mark Seem and Helen R. Lane (translators) (Minneapolis: University of Minnesota Press, 1983) p. 39. Other humanists traversing similar territory have coined similar phrases intended to invoke Deleuzean assemblages; e.g., John Johnston’s “computational assemblage” in Johnston, John. The Allure of Machinic Life: Cybernetics, Artificial Life, and the New AI (Cambridge, MA: MIT Press, 2008); and Manuel DeLanda’s “computer assemblage” in DeLanda, Manuel. War in the Age of Intelligent Machines. New York, NY: Zone Books, 2003. ↩
Ralph Waldo Emerson, “Concord Hymn” (1837). ↩
In a recent encyclopedia entry, Jen Jack Gieseking compares the “geographical imagination” to Anderson’s notion of “imagined community”: “The geographical imagination is prominently used in regards to nationalist discourses, and profoundly helpful in critiques of colonialism and imperialism. Edward Said’s (2000) imaginative geographies demonstrates how the geographical imagination of citizens’ minds can be manipulated and exploited to portray a fashioned social political history of the state. Benedict Anderson’s (1983) imagined communities describes how the media of a nation can create a shared social identity.” in Gieseking, J. “Geographical Imagination,” In the International Encyclopedia of Geography (eds. D. Richardson, N. Castree, M. Goodchild, A. Jaffrey, W. Liu, A. Kobayashi, and R. Marston). New York: Wiley-Blackwell and the Association of American Geographers. 2017. ↩
Hamid Ismailov, “Media and Academia: do they speak a common language?” Online here: http://www.bbc.co.uk/blogs/legacy/worldservice/writerinresidence/2011/09/media_and_academia_do_they_spe.html ↩
Paul Virilio, Speed and Politics: An Essay on Dromology. New York: Semiotext(e), 1977 (1986): p. 47. ↩
Foucault, Michel (2005-08-18). The Order of Things (Routledge Classics) (Kindle Locations 1277-1284). Taylor and Francis. Kindle Edition. ↩
Charles Sanders Peirce, The Writings of Charles S. Peirce: A Chronological Edition. Volume 2. Eds. Peirce Edition Project. Bloomington, IN: Indiana University Press; pp. 53-54; see also Atkin, A., 2005. “Peirce On The Index and Indexical Reference”. Transactions of The Charles S. Peirce Society. 41 (1), 161–188. ↩
Cf., Mahoney, Michael S., and Thomas Haigh. Histories of Computing. Cambridge, Mass.: Harvard University Press, 2011, p. 133. ↩
Chomsky, “On the Notion ‘Rule of Grammar’,” in Fodor, Jerry A., et al. The Structure of Language: Readings in the Philosophy of Language. Englewood Cliffs, N.J.: Prentice-Hall, 1964, pp. 119-120. ↩
Chomsky, 1964, p. 120. ↩
Lassègue, Jean and Giuseppe Longo. “What Is Turing’s Comparison between Mechanism and Writing Worth?” CIE 2012, LNCS 7318. Ed. Cooper, S.B., A. Dawar, and B. Löwe. Berlin and Heidelberg: Springer-Verlag, 2012. 451–62. ↩
Martin Dodge, Rob Kitchin, and Matthew Zook, 2009, “Guest Editorial: How Does Software Make Space? Exploring Some Geographical Dimensions of Pervasive Computing and Software Studies,” Environment and Planning A, Vol. 41, No. 6, pages 1283-93. ↩
Dodge and Kitchin, p. 1291. ↩