Article information

    Geert Lovink

    Institute of Network Cultures, Hogeschool van Amsterdam

    Publication Date
    9th November 2014

Reflections on the MP3 Format: Interview with Jonathan Sterne

The MP3 audio standard is used by hundreds of millions of people on a daily basis, and there is finally a comprehensive study of it. Sound theorist Jonathan Sterne not only describes the political economic background of how this technology came into being in the early 1990s but also provides the reader with an interesting history of sound and hearing in the 20th century in which telephones and radios play surprising roles. MP3 was born out of the challenges of how to ‘push’ live audio through the existing copper infrastructure. This is a story of monopolies, compression, and perceptual capital, “the accumulated value generated by a surplus definition.” In his book Sterne develops the notion of MP3 as the product of perceptual technics, through which a company can economize a channel or storage medium in relation to perception. The MP3 saga boils down to the question of how to make a profit from the insufficiency of the human ear or the distracted state of most listeners.

In terms of discipline and methods Sterne has come up with an interesting mix of cultural studies, science and technology studies (STS), and what he calls ‘format theory’. I can’t wait to read similar studies in the same genre on Skype, Android, on the Moving Picture Experts Group (MPEG) itself, HTML5, .zip, the Ruby on Rails framework and internet protocols such as SMTP, SSH, IPv6 and IRC. Let a thousand software (case) studies bloom! Jonathan Sterne is author of The Audible Past (Duke, 2003) and editor of The Sound Studies Reader (Routledge, 2012). He teaches in the Department of Art History and Communication Studies of McGill University in Montreal, Canada and recently edited an anthology on the politics of academic labor in communication studies.

GL: Jonathan, can you describe for us, in detail, what happens when we create an MP3 file? Whatever computer we use there is still a delay, there is some digitization happening, some compression, but what exactly is going on?

JS: First, thank you for asking all these great and difficult questions. And to readers, thanks for plowing through what’s about to be a lot of prose. Brevity in print is not one of my strong points.

My simplification of the official version goes something like this. You start with a full-size digital audio file in .wav or .aiff format. It could be on a compact disc or already in your computer. First, you tell the mp3 encoder how big you want the final file to be. MP3s are measured in kilobits per second, which is essentially how much space they take up in a digital line or on your hard drive. With that information, the encoder goes to work. It starts by removing all the redundant data and reorganizing things. This is called Huffman coding, and it’s basically the same thing that happens with a .zip file. That process yields a file about half the size of what you’d find on a CD. So far so good. No changes to the sound, just to how the computer handles the data. Today, FLAC and Apple Lossless files are made with a technique like this.
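To make the lossless step concrete, here is a minimal sketch of Huffman coding in Python. It is an illustration of the general technique only, not the code tables an actual mp3, FLAC or .zip encoder uses; the function names (`huffman_code`, `encode`, `decode`) and the sample bytes are assumptions made up for this example. The idea is that values which occur often get short bit strings and rare values get long ones, so the data shrinks without any information being lost.

```python
import heapq
from collections import Counter


def huffman_code(data):
    """Build a prefix code: frequent byte values get shorter bit strings."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tiebreaker, {symbol: code so far}).
    # The unique tiebreaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(data, code):
    """Concatenate the bit string of every byte (kept as text for clarity)."""
    return "".join(code[b] for b in data)


def decode(bits, code):
    """Walk the bit string, emitting a byte whenever a full code word is read."""
    inverse = {c: s for s, c in code.items()}
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)


if __name__ == "__main__":
    sample = b"aaaaaaaabbbc"            # made-up data with skewed frequencies
    code = huffman_code(sample)
    bits = encode(sample, code)
    print(len(sample) * 8, "bits before,", len(bits), "bits after")
    print(decode(bits, code) == sample)  # lossless: the round trip is exact
```

How much is saved depends entirely on how uneven the value frequencies are, which is why the "about half" figure above is a rule of thumb rather than a guarantee.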

Then it gets interesting. Because this wasn’t small enough for real-time transmission over digital lines in the 1980s, they had to invent other ways to get rid of data. The most important technique is called “perceptual coding.” It’s built around a mathematical model of the gaps and absences within the audible spectrum of human hearing. Basically, the audio is cut into thousands of frames lasting a tiny fraction of a second, and in each frame the encoder compares the frequency content of the audio to what it “knows” about human hearing and removes what it thinks its user won’t hear. It can be rougher or more gentle, depending on what choice you made on the front end: how big you want your file to be. Finally, the mp3 encoder also does some things to the stereo image, based on assumptions about where people do and don’t need to hear stereo, and it cuts off some of the very highest frequencies, assuming (correctly) that most adults don’t hear well above 16 kHz.
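The frame-by-frame idea can be sketched in a few lines of Python. This is a deliberately crude toy, not how an mp3 encoder is actually built: a real encoder uses a filterbank and a psychoacoustic masking model, whereas this sketch assumes a single hypothetical rule (`keep_ratio`) that drops any frequency component much quieter than the loudest one in the same frame. The function names and the test signal are likewise assumptions for illustration.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform: samples in, frequency bins out."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform: frequency bins back to (real-valued) samples."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def prune_frame(frame, keep_ratio=0.1):
    """Zero every bin whose magnitude is below keep_ratio times the loudest bin.

    The fixed ratio is a stand-in (an assumption) for a real hearing model.
    Returns the pruned spectrum and the audio rebuilt from it.
    """
    spectrum = dft(frame)
    peak = max(abs(c) for c in spectrum)
    threshold = keep_ratio * peak
    kept = [c if abs(c) >= threshold else 0j for c in spectrum]
    return kept, idft(kept)


if __name__ == "__main__":
    n = 64
    # One frame: a loud tone plus a much quieter tone at another frequency.
    frame = [
        math.sin(2 * math.pi * 4 * t / n) + 0.01 * math.sin(2 * math.pi * 20 * t / n)
        for t in range(n)
    ]
    kept, rebuilt = prune_frame(frame)
    surviving = sum(1 for c in kept if c != 0)
    print(surviving, "of", n, "frequency bins kept")
```

Raising `keep_ratio` discards more of each frame and lowering it discards less, which loosely mirrors the "rougher or more gentle" trade-off tied to the target file size.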

Now, this is all an abstraction and an idealization. Today, I would wager that most mp3s aren’t made through users putting CDs in their computers and ripping them. Rather, they’re made either industrially or from scratch. Apple now has a whole set of instructions for preparing a finished record for direct conversion to iTunes (they use Advanced Audio Coding, a descendant of the mp3 format). I’m sure there are standard protocols for Amazon, for Spotify, and other companies that basically retail compressed audio. Moreover, lots of recording devices now record directly to mp3; the encoding happens before the file ever comes to rest in some other form in their static storage. And if that’s not enough, you’ll see that all of these encoding schemes assume mp3 is the “last” encoding of the audio and then it will be consumed and no longer edited or messed with. But of course, lots of musicians and sound artists use mp3s as the raw material for their work, so you get multiple encodings, and all sorts of other interesting phenomena.

GL: In the book you give one possible reason for the success of MP3, being the rise of privatization and the unwillingness to make public investments in (cyber)infrastructure. This resulted in the drive to push more and more content through existing (copper) lines; hence the emphasis on compression. Can we say that from a technological perspective there is no need for compression to start with? Your book reads as if there is an almost perfect historical coincidence in the struggle of standards, around 1993, after the end of the Cold War, and the breakthrough of a neo-liberal market economy on a global level, the rise of the internet and the mobile phone, and then there is the MP3, which falls out of the sky. No conspiracy, right? In this context you introduce the concept of perceptual capital, which generates surplus value from surplus definition (of existing technologies and capacities).

JS: When we are talking about technical media, to use Kittler’s phrase, there are always engineering problems. Data are material at their very core; there is always a negotiation happening between the “stuff” of any representation and the assemblages that “store” or “transmit” it. I am using scare-quotes here because all these things exist in tension with one another; they are governed by a relational causality. Without something like a code, for instance, a telegraph infrastructure is effectively worthless. So it is not capitalism that gives us limits (life gives us limits); it is rather the case that capitalism has its particular ways of negotiating limits by finding new ways to monetize things.

That’s where perceptual technics comes in, along with perceptual capital. The basic idea is that engineers tuned the telephone to deliver the minimum amount of signal necessary for speech intelligibility. This allowed them, at least theoretically, to cram more phone calls into a single line. It’s a classic case of relative surplus value; they wanted to make more money out of their infrastructure.

To do this, they had to develop a different account of speech and hearing, and more specifically the limits of human hearing, its gaps and absences. It would have been more accurate to call perceptual capital IMperceptual capital, because AT&T was trying to make money from people not hearing. I spent a lot of time wondering if this was (yet another) case of free labor in the media. But the autonomist argument that new media profits are built around extracting free or cheap labor (for instance, through user forums, “likes” or volunteer beta-testing) doesn’t quite work in the AT&T case. It’s really much more about building human capacities into infrastructures as a kind of operating principle. And very quickly, because AT&T effectively rebuilds hearing science in its own image, we have a situation where everything we think we know about hearing in the state of nature is actually governed by relations between ears and media. Or to put it more economistically, the organizational imperatives of the phone system start to dominate the epistemologies of hearing research.

As to “no conspiracy,” that’s a hard call. Certainly, one can make a structural-causal argument. Where AT&T is harvesting imperceptual capital from its users, the owners of the rights to mp3 (Fraunhofer, AT&T, Thomson, others) are making money off royalties, while compressed media make the internet itself more valuable to people who build and maintain infrastructure or who sell bandwidth. As every mobile data subscriber knows, bandwidth is still the most expensive thing in a network.

GL: For others, your study might be called a part of musicology or science studies. You call it format theory. For me your study is part of the growing tendency of techno-materialism, also called software studies, that emphasizes the importance of often invisible and unknown standards and protocols on the lives of literally billions of people who use this format on a daily basis. In one way it is amazing that you are the first to come up with a comprehensive study of the MP3, twenty years after its release. Do you have an explanation for this? Are there other priorities in academia? Is the study of new media still in its infancy? Or, to put it differently, is there something like a collective techno-unconscious that we are yet unaware of and can only register in retrospect?

JS: In my social world, anyway, nobody can agree on what it means to be materialist, but everybody is sure that materialism is the way ahead. That includes me, so I would certainly accept your reading of the book. Similarly, although I didn’t wind up citing a ton of software studies in the actual book, it’s very much meant to be in dialogue with that work. This book was also heavily influenced by my encounter with science and technology studies because I found myself moving in those circles. As for Format Studies, I do not advocate it. I just liked the name “format theory” for a nose-tweak on the various traditions of media theory. We are living in an age where the edges of things called media are eroding, and scholars will increasingly need to move to very different registers, from the subindividual to the truly massive, in order to understand them. Media are a mess of formats, standards, platforms, infrastructures, protocols, interfaces, signal processing, data processing, consumer electronics, user practices, content legacies, and on and on. And that mess is constantly changing.

So then there’s the question of why it took till 2012 to get a book on mp3s. Part of that is my fault. Your Owl of Minerva argument is probably right, but there’s also a time lag with academic publishing. My mp3 article was written in 2003-4 mostly (it came out in 2006), and then it just took me eight more years to get the book done for all the reasons that it takes mid-career academics a long time to finish books. Michael Bull has a book on iPods, which made sense given that he was talking to users, and users are going to think a lot more about their iPods than their mp3s. My work on the telephone is hugely influenced by Mara Mills’ scholarship in the area, which will be a book soon. A couple of business researchers in the UK published something on mp3s; and John Shiga had a note about perceptual coding as well in an early article (these people are all in my bibliography if you’re curious).

But it’s also a bit about the sociology of academic knowledge. A lot of the more cultural-theoretical work on new media comes out of fields that have for decades now operated on the assumption that the most important sense is vision, and that even if we are talking about audiovisual media, the visual part is the important part. That’s an artifact of the organization of the disciplines, and the institutional sectioning off of music from the other humanities and arts in many English-speaking universities (and acoustics largely being left to the sciences). Of course vision is hugely important, but here we have an historical case where the audio world was far ahead of the visual world. Scholars were busy digging in visual media for antecedents because that’s what they’d been trained to do. Cinema was understood to be aesthetic; telephony was understood to be anaesthetic. So the accounts of new media as descendants of cinema are legion; television lacks the high culture patina so it gets less attention; radio, sound recording and telephony sort of fell through the cracks (with some notable exceptions like Frances Dyson’s Sounding New Media).

Sound history actually tells us some things about new media that visual history can’t. Signal processing and information theory are both hyperextensions of sonic problems in the phone system, and they are both central to how new media work today. The value of sound studies is thus not that you get to study sonic phenomena (though of course that’s nice), nor is it in some kind of moral claim about hearing over seeing (which is mostly a ridiculous proposition), but rather that it simply orients us to different histories and objects, some of which offer support for much more robust analyses of our current media situation. As Douglas Kahn says, sound provides a different point of entry.

GL: You seem to shy away from the reductionism of someone like Friedrich Kittler. Instead, you prefer a broader approach that includes ideas that come from Cultural Studies and STS (Science, Technology and Society). Who is afraid of German techno-determinism?

JS: That’s right—I would describe my approach to technology as a mix of cultural studies and science and technology studies, though Kittler and his followers are in there somewhere as well (where there are disagreements, they are more about politics than about techno-determinism as such). I am interested in promoting a humanism of technics, of subjecting technical operations and routines to humanistic interpretation. This means bringing objects traditionally reserved for engineering, science, and social science into the purview of humanistic interpretation and asking after them as cultural artifacts. Even a few years ago you could hear new media scholars lamenting how “boring” questions of infrastructures and standards were, but I think the tide has changed a bit and people now understand that these are rich technocultural forms like any other. On the flip side, you could say this is a very old project, because generations of scholars have understood—and argued for—ways of thinking that do not separate culture from technology.

Combining culture and technology means dissolving both terms—and the assumed gaps between them—into some more robust account of object and contexts. By that I mean the scholar’s job is to formulate original questions. That means not accepting prefabricated objects of study and assuming certain methods belong with certain objects—what Pierre Bourdieu calls “the preconstructed.” He’s talking about sociologists lifting their research problems from newspaper columns, but it works just as well for thinking about the way new media scholars sometimes take their problems, concepts or approaches from advertising, industry, or shallow press coverage. It also requires a “radical contextualism”—to use Larry Grossberg’s term from cultural studies—which means that we can’t assume there is a given set of things to look for as inside or determinate of context. Rather, the goal of analysis is to reconstruct context. If you take the two positions together, it basically means you have to start your analysis by tossing out your assumptions regarding how different institutions, ideas, practices, and theories necessarily fit together, because they don’t necessarily fit together, and the things that did fit together had to be made to fit together, “articulated” to one another, in Stuart Hall’s terms. It’s less a theory than an approach to the intellectual craft.

For the MP3 book, that meant several things. When I started, there were legions of articles coming out about iPods and personal stereos and earbuds, as well as a lot of writing about file sharing, and some of that work was great, but it seemed like it was at the wrong scale because it continued the assumption from a previous generation of media scholars that the unit of analysis could be centered around individuals and consumer electronics (a term with a long history but that has more recently come into general usage to denote end use points of media). There were these tiny software routines that enabled the portable devices to do what they do, and there was this massive technocultural complex—a mess of media infrastructures, international standards, musical practices, and a particular construct of sound and hearing—that made the whole thing possible. When I started doing the reading, I realized quickly the documentation for the format assumed all this tacit knowledge I didn’t have. So suddenly I had to acquire the skills of an oral historian and start interviewing the people involved. When I did, I discovered that they were operating on notions totally different from what humanists have been saying about new media. So suddenly I’m reconstructing this history of 20th century communication in order to describe this otherwise very basic new media phenomenon. But that makes sense—if we want to understand the new in new ways, it may challenge our cherished assumptions about “the old.”

While I hope the book helps others who want to study formats, I also hope it doesn’t lead people to harden its approach into a single position or an argument for “format studies.” As I said in the “Format Theory” chapter, in this case the format mattered, alongside standards, infrastructures, economic systems, and a quest to make use of infra-psychic phenomena. I wouldn’t start another new media study assuming the same issues mattered or that the format level was the right one. To the contrary, I would recommend starting over and assuming you don’t know the proper scale of analysis, and then find out where the materials or actors take you.

GL: You also seem bored by the endless repetition of the same old arguments of the lay-experts about the flatness of the MP3 sound, the supremacy of the turntable sound and the better quality of other compression standards. What does this tiredness indicate? Simon Reynolds refers to you in his Retromania book, which deals with “pop culture’s addiction to its own past.” Reynolds is a classic British music journalist, a soft cultural studies guy, not a hardcore techno-materialist. Yet, he often refers to the MP3 and the digitization of music as the reason why the music industry is stuck in its own past. Memory has proven to be a trap. What was once seen as a rich, ever-growing collection of styles and influences one could build upon is now reduced to a random collection, downloadable within minutes.

JS: Sound quality discussions were another one of those preconstructed arguments about the mp3 that I wanted to rethink. Reynolds repeats a standard industry line about sound quality affecting music sales. I think here the industry is believing its own bullshit about compact discs (to be fair, lots of people don’t believe this) and we are being taken along for the ride. A number of authors (like Kembrew McLeod and Aram Sinnreich) have shown that CDs, despite the marketing on sound quality terms, only took off when record distributors stopped accepting returns on vinyl, a lesson that was learned for the transition from video tape to DVD. “Better sound” was important for marketing, but didn’t automatically lead to commercial success. In fact, I don’t even know of any valid experimental studies that show for the average listener that sonic definition is correlated to musical meaning or pleasure. On the contrary, as John Mowitt argued in 1987, once they got rid of tape hiss and other “obstructions” to clean recordings, musicians immediately sought out new ways to distort their sounds. Meanwhile, the listening test people keep citing work from the 1950s that showed that people tend to prefer the distortions present on the sound reproduction systems they grew up with. Boomers like the compression of 2” tape. People who were university students in 2002 may well prefer the pre-echo of a poorly encoded mp3.

We live in a great historical moment if you love to listen to music, and I think lots of new styles and approaches are constantly popping up. But Reynolds is absolutely right that there is a lack of imagination in the mainstream music industry and a pervasive resistance to novelty tied to a fear of risks. Here we find music as part of a broader media phenomenon, conglomeration. As parts of conglomerates, music companies may or may not be run by people who know much about music, and often they are subject to imperatives elsewhere in the corporation. Lots of media industries are trying to find ways to tell old stories again, to use old properties repeatedly. On top of that, as a result of financialization and debt leveraging, a lot of media companies are cutting down on “the talent” hoping to find replacements in free or cheap labor. They are cutting labor costs, but not costs related to acquisitions or technologies. The results have been damaging to music, but also to journalism (which was mortally wounded by conglomerates before the web ever got to it), Hollywood cinema, network television, and on and on.

There is a political problem around culture in the new media environment. Where before the cultural industries were seen as generating an important part of the value in media, today they are largely imagined as “content.” Anna McCarthy and Aurora Wallace are doing some amazing work on this, but you can even see it in Gina Neff’s narrative of Silicon Alley in New York during the dot com crash. Content is a “downstream” problem from infrastructure, signal processing and consumer electronics. The “creatives” who make it (to use Andrew Ross’ term) are considered an add-on, rather than the basis of the media industry. The world of Silicon Valley is very much invested in showing how things like coding and engineering are meaningful cultural production—and I agree!—but not at the expense of other kinds of cultural production. If culture is nothing but content so people can sell the really valuable bandwidth and hardware, we’ve got a big problem. Again, the conservatism Reynolds sees in labels strikes me as a part of this bigger phenomenon.

GL: Your work on the difference between noise and sound somehow resonates with the way others are writing about the lack of attention. Think of Nicholas Carr in his book The Shallows. You state that we can no longer fully concentrate on music. Tracks are playing in the background while we do other work, travel etc. The MP3 is the format of multi-tasking. Do you think it is possible to regain the capacity to listen to music as an isolated activity? Even in the 1960s/70s it was still possible to sit down or lie down and decipher all the layers of meaning in the multi-track sound. These days this intensity can often only be achieved during (EDM) festivals, where the audience reaches a certain stage of trance, also induced by drugs. What do you make of current solutions such as mindfulness (see Howard Rheingold’s latest work) to restore the attention economy?

JS: I agree we are living through a concerted corporate attention-grab right now. You’ll notice that the solutions posed, like mindfulness, or email de-tox, or whatever, are all personal responses to social problems. I’m all for being focused and in the moment as part of a rich life. But here we need to parse our terms. The distraction of a student sitting in my class facebooking is not the distraction of me listening to a new record while I cook dinner, or the distraction of someone listening to a soap opera in the next room while doing housework, or the overburdened office worker constantly switching from task to task. Those are all slightly different scenarios. These are all different touch-points for institutions, practices and technologies to interact. If we want to fight back against the corporate attention-grab, we need to attack the commercialization of everyday life and the identification of personal needs with consumer needs. It’s an old battle.

That said, I wouldn’t say that “we” can no longer concentrate on music. Rather, the mp3 in both design and use acknowledges a state of distraction that already existed for some time. (But of course lots of people still get lost in their music, and format doesn’t make much of a difference to this phenomenon as far as I can tell.) We assume that people used to pay more rapt attention to stuff than they do now, but we don’t actually know a whole lot of the history. In her study of music in everyday life, Tia DeNora found older people were more likely to report having sat down and listened to music intently at home, but it may well be the case that they said so because they believe that was what they were supposed to say, or that was an activity that they valued. You know how it goes with survey and interview research. Radio historians like David Goodman and Alex Russo have shown that distraction was actually understood as an important part of radio culture from the 1930s on, and probably earlier. In the mid-1970s, social psychologists of music started studying music listening in states of distraction rather than attention. So when we hear jeremiads for attention, we ought to begin by asking how scholars themselves have attended to the history of attention.

There’s one other thing. That living room, where people used to sit and listen to music: it’s changed. The consumer electronics people wanted to sell more speakers and so now you’ve got a home theater built around an “audiovisual receiver” and six speakers. The spectacular listening that happens there is more likely to be in the context of audiovisual spectacles—TV, games, movies. Most 5.1 systems aren’t that great for listening to music, no matter how fancy your format is, because most music is still made for stereo playback. Alongside the portable stereo and the computer and against the declining living room, the car has become an increasingly important site for music listening, at least for those who drive and ride.

GL: Simon Reynolds writes that the iPod is fundamentally asocial. There is an appetite-loss induced by excessive downloading. At the same time he witnesses a resurgence of live music. The ‘sharity’ attitude may seem social but in fact empties out the desire to collect stuff. The French philosopher Bernard Stiegler writes in a similar way about these developments in society (such as malaise). Stiegler is proposing a ‘pharmacological’ approach in which we think poison and medicine together (‘plus/and’). Can we look at music standards in the same way?

JS: One of the things I like about Stiegler is that he understands that technicity is a fundamental dimension of human existence. To put it another way, architects don’t critique buildings by saying we should get rid of all buildings; they argue about how to make better ones. That’s one of Stiegler’s great contributions. So if we are to mix some quotations and paraphrase from his pharmacology book, the current state of music is not a “result of a technological second nature”; to say that the iPod or any listening technology has inherent effects is to confuse a specific deployment with an “automatic becoming” to use his terms. There is no music without technology, so the question is what kinds of musical technology should we make and support? As I try to show in my book, I don’t believe mp3 is the cause alone of massive changes in musical and sonic practice. It is a part of a relational set of causes, techniques, institutions, practices, and decisions by groups in power and massive patterns of emergent use and appropriation. While some music standards are better than others (though I would value openness and interoperability above mystical claims for fidelity), a standard is neither the poison nor the cure. Rather, if there is a “malaise” regarding music, it is the result of a lack of serious attention to music as a cultural problem, what Stiegler would call a “carelessness of thought” (53) regarding music. Something like music activism or music policy, especially a transnational policy, stands a much better chance of producing real remedies for musicians and audiences, and the cultural spheres nurtured by musical practice, than new standards or sets of technologies on their own.

GL: The smaller and lighter the universal music library becomes, the heavier it seems to pull us down. Is there a way out or do we just have to wait until the General Boredom flips? Politically speaking we might already be there (think of Occupy and the protests in Brazil, Turkey and Egypt). It just seems that the ‘musical’ component of this movement is lacking. What’s the current position of music in society?

JS: That’s another book, and one I’ve fantasized about writing. Certainly the musical component of the movements you cite isn’t anything like that of the social movements of the 1960s, but to stop there would be to miss a vital dimension of what’s happening. Music and sound are still very much at the center of mass politics. So that’s the first thing. We can’t use music as a stand-in for sound the way, for instance, that Jacques Attali once did. The scales have “flipped,” to use your term, and now music exists in a field of sonic practices and actions.

Sonic contests are very much at the center of contemporary politics, but it’s perhaps the intellectuals who are catching up. The #casseroles that overtook the Quebec student movement turned it into a massive collective sonic meditation on democracy. And there are people thinking about sonic politics. Lilian Radovac has written about Occupy’s human microphone as an artifact of New York City’s noise laws, long tied to the city’s history of racial and class conflict. This past summer, I heard Martin Stokes give a really great talk on the soundscape of the Turkish protests, again refracting them as sonic contests. And to mention reactionary uses of music rather than assuming its politics are always going to be ones we agree with, Suzanne Cusick’s writing on music and torture absolutely floors me every time I read it.

GL: Jaron Lanier is not only a VR-guru, he is also a musician. What do you make of his latest move as a technology critic? It seems to be very difficult for him to acknowledge the capitalist/corporate reality of Silicon Valley.

JS: I agree. In You Are Not a Gadget Lanier accepts way too many of the standard Silicon Valley pieties about technology and culture despite his intent to be critical. But his reading of MIDI, another one of those hugely important standards that is under-studied, is spot on.

GL: You must have stopped working on the MP3 topic some years ago. Are you happy that this research is over?

JS: I loved doing the work and I’m very happy the research is over. Present evidence suggests that I’ll be talking about it for a while yet.

Bio: Geert Lovink is a media theorist, internet critic and author of Zero Comments (2007) and Networks Without a Cause (2012). Since 2004 he has been a researcher in the School for Communication and Media Design at the Amsterdam University of Applied Sciences (HvA), where he is the founding director of the Institute of Network Cultures. His institute recently organized conferences and research networks around topics such as the politics and aesthetics of online video, urban screens, Wikipedia research, critique of the creative industries, the culture of search, internet revenue models, digital publishing strategies and alternatives in social media. He is a media theory professor at the European Graduate School (Saas-Fee) and an associate member of the Centre for Digital Cultures at the Leuphana University (Lueneburg/D).

