With the rapid ascent of Big Data – always the capitals – into the vapour clouds of overheated, media-saturated publicity, and its condensation into the steady drip, drip of the recent Snowden leaks, questions about the generation, storage and processing of data have become a matter of concern for many. The thoughts and anxieties of system administrators fretting about security and failover strategies for backing up data in their organisations have become widely distributed, in more ways than one. Like no other of the recent innovations in the field of digital technology, Big Data points up the indissociability of the social, cultural, technical and political qualities of the infrastructures of computational culture, and does so in fairly stark terms. The capacity to generate vast quantities of data as a by-product of basic interactions within networked technologies raises significant questions about the affordances of such infrastructures and the specific technical “phyla” on which these generative mechanisms depend.
If it is true, as some have implied, that Big Data is to be defined merely negatively – as that which exceeds the storage and processing capacities of relational database management systems – any understanding of the former must, of necessity, proceed by way of an understanding of the latter. This is what two of the papers presented in this issue of Computational Culture – those by Castelle and Ruppert – attempt to do. Both papers relate, one directly, one indirectly, to a workshop run under the auspices of Computational Culture at the Wellcome Trust in London in June 2012 that sought to initiate a properly sociotechnical interrogation of the database, one framed partly in the context of public healthcare provision. It seemed to us then – and it does so all the more now – that the database is one of the crucial social technologies of our time. As such, it is in urgent need of technically informed critical analysis, the kind of analysis that has tended to be demoted in favour, for instance, of the more semiotically amenable exploration of the more obviously cultural widgets and interfaces of the front end of Web 2.0 technologies. Yet the history of the development of the database casts an interesting light on otherwise frequently algorithm-centric eulogies to the development of computing.
It is argued by some today that the relational database model, encapsulated in E.F. Codd’s early work, leading as it did to the mid-to-late 1970s ANSI/SPARC definition of the relational model, has been superseded by NoSQL database technologies of the kind that form the basis for Big Data processing. However, it seems to us that the historical implementation of the relational model in software architectures of growing sophistication, scale and efficiency, the incorporation of a complex set-theoretical algebra within the technico-material formats of computer software, along with the technocratic invisibility of the database (in the hierarchy of prestige of computer programmers, the DBA is not at the top) are all contributing factors to the expansion of the reach of databases across the digital forms of contemporary society. In this respect, ongoing interrogation of the ways in which the database is entangled in the fabric of the present requires the well-grounded consideration of its history, an issue that Michael Castelle explores in detail.
The database is, as Castelle argues in his paper here, a significant relay in the development of bureaucratic rationality. Databases deal with ‘populations’ in all sorts of ways, processing data concerning them and facilitating the recursive work of producing such populations in ever new and varied mathematical combinations. This gives databases a critical role in relaying and transforming the “political arithmetic” of populations that has historically been associated with the development of statistics. Indeed, the development of the notion of population, a notion which still underpins much database practice, goes hand in hand with record keeping and tabulation practices, as scholars such as Alain Desrosières, Lorraine Daston, Ian Hacking and Theodore Porter have shown. In this regard, the database is perhaps only the latest materialisation of the population form, and the kinds of processing possibilities that it introduces imply a set of connections in the prehistory of computing that is somewhat less scientifically flattering than the geek heroics of Bletchley Park and the cracking of the Enigma code. The history of statistics, and the employment of its political arithmetic in the mechanisation of the governance practices of the State, points towards processes that structure things as potential data, processes without which the social could never be “algorithmised”, with all the consequences this can have (and has had, in the past – the easily overlooked shadows in the history of the IBM corporation, for example).
The entity-relationship model of the database form can be considered as a point of intersection between the formal-technical operations of databases and social realities. But making this claim perhaps raises more questions than it answers. So an ongoing issue for us concerns the ways in which we can explore and understand these kinds of intersection. Taking the production of data models as a process of real abstraction, for example – through the atomisation of entities into indefinite sets of relations and the combinatorial effects this produces – might then lead to considering data models as generative of external effects, by means of “resonance”, perhaps. How can we tease apart the technical requirements of well-defined data elements from the social realities the “merely technical” supports? This might lead into a consideration of the ways in which the practices that cohere around databases operate, how they frame the work of the database, the way in which an ideal of knowledge – the data model as a representation of a set of entities and the relations between them – and the practices that take root around such an ideal (the role of the knowledge manager, for example) perhaps do a bit more than just operate on data as a knowledge practice. The process involved in designing a database for some or other end is not simply that of producing a neutral representation of working practices; it tacitly frames or reframes those practices.
Here, Evelyn Ruppert’s paper offers an exploration of the complexity of these relations. The algebraic logic of the relational database, working through a material implementation of set theory, excels in the generation of multiple views or versions of entities. Subjects and subject positions multiply in the “dataverse”, sometimes converging in relatively unified figures but just as frequently diverging, effecting a disseminated ontology of populations. “Persons” here become something other than what they may have been in the slower moving world of analogue media forms, although something like a principle of bureaucratic schizophrenisation exists even in the paper world of “classic” bureaucracy. Drawing on Annemarie Mol’s study of the performative “enaction” of atherosclerosis in a hospital, Ruppert argues for a close consideration of the ways in which the database technology of a management information system similarly “enacts” young offenders. Somewhat paradoxically, the development of database technologies as surveillance devices within the welfare practices of the contemporary State, used, as Ruppert puts it, to “identify, track, monitor, evaluate, govern and intervene in the life chances and trajectories of people”, generates a de-centred ontology of young offenders who are classified and categorised in any one of a number of institutional situations, according to a range of social relations, as composite beings. Here the use of the data models that shape an understanding of young offenders in relation to attributes associated with possible contexts of encounters and intervention, eventuating in the production of a management information system, facilitates the data multiplication of subjects.
Far from providing a global view of populations, the subjects – the young offenders – generated through this technology are multiplicities produced on an ongoing, recursive basis: recursive populations, rather than the publics examined by Kelty in his work on open source software.
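The multiplication of “versions” of an entity described above can be given a very rough technical gloss. What follows is only a minimal sketch under our own assumptions, not a rendering of the management information system Ruppert studies: the table names, view names and the single invented record are all hypothetical, and SQLite stands in for whatever RDBMS might actually be used. It shows one row in a `person` table surfacing as two differently composed subjects, depending on which institutional view is queried.

```python
import sqlite3


def build_views():
    """Return two institution-specific 'versions' of one hypothetical person.

    All table names, view names and values are invented for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE offence (person_id INTEGER, category TEXT);
        CREATE TABLE school  (person_id INTEGER, attendance REAL);

        INSERT INTO person  VALUES (1, 'A');
        INSERT INTO offence VALUES (1, 'theft');
        INSERT INTO school  VALUES (1, 0.62);

        -- Each view joins the same person to a different set of attributes.
        CREATE VIEW justice_view AS
            SELECT p.id, o.category
            FROM person p JOIN offence o ON o.person_id = p.id;

        CREATE VIEW education_view AS
            SELECT p.id, s.attendance
            FROM person p JOIN school s ON s.person_id = p.id;
        """
    )
    justice = conn.execute("SELECT * FROM justice_view").fetchall()
    education = conn.execute("SELECT * FROM education_view").fetchall()
    conn.close()
    return justice, education
```

Neither view is more “complete” than the other; each is a relation derived from the same base tables, which is one modest sense in which the database can be said to produce composite rather than unified subjects.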
Both Ruppert’s and Castelle’s papers open up a topic of interrogation that is far from being exhausted, and we have just briefly tried to flag here the interest of their work in relation to particularly current issues regarding the broader politics of computation. We think that database technologies, and the very peculiar being we call “data” associated with them, represent a critical point of engagement for the interests characteristic of this journal in the future. There are more general questions to be asked here about the sociotechnical logics of the database, how to get a grasp of the dynamics of data-driven sociality and the rhythms it generates. Database programming itself employs a quasi-formal, quasi-set theoretical vocabulary that has an extraordinary concision – the notion of the relation, the notion of the set, part, inclusion, belonging, and all the derived entities that can then be articulated – which might be employed beyond its context of origin to address these questions. Another way – starting from the hierarchy of computational prestige – might entail trying to situate RDBMS in computing more generally – narrating where they fit in relation to other data structures, programming languages, the shifting platforms, infrastructures and networks over the short but condensed history of the development of database technologies proper. Even just a diagnostic of the contemporary technical situation with database technologies – ranging from SQL, NoSQL, MS Access, Excel, BigTable, etc. – would point towards developmental tendencies that would situate dominant forms such as the relational model in a conflictual space that points towards the pressure for other possible relationalities, some mundane, some esoteric.
The other papers presented in this issue of Computational Culture might be read, in part, in the context of this broader issue of the articulation of database technologies with other elements crucial to the production of the cataracts of data associated with the sociality of the networked cultures of the present. Incontestable in its centrality to that production is the humble hyperlink, itself emerging out of a history that had envisaged associative connections between electronic data files as one way of tackling the data overload of the day. Whilst today one might conceivably yearn wistfully for the guarded optimism of Vannevar Bush, addressing the possibility of man (sic) better reviewing his (sic) “shady past” and analyzing “more completely and objectively his (sic) present problems” through the kinds of associativity made possible in and by his proposed Memex, the hyperlink has facilitated something like a generalisation of Bush’s notion of the associative trail well beyond the limits originally envisaged for it, becoming a crucial vector, we might say, in the becoming of the internet as something like a digital commons. Indeed, for all the elegant simplicity of its mutating affordances, the ramifying array of functions, both social and technical, that the hyperlink produces as much as it serves includes precisely the kind of high velocity click-through that data scientists think so revealing of human behaviour.
It is to the shifting nature of these kinds of affordances that Anne Helmond’s paper is devoted. Tracking what she sees as the three key “stages” in the development of the hyperlink to its present day incarnation, Helmond argues for an analysis that is able to follow the industrialisation of the hyperlink by search engines, its automation by blog software, and finally, its algorithmisation by social media platforms. Of course, this is a history that is not over, and Helmond’s paper raises a number of interesting questions about the socio-technical politics of the hyperlink. The last stage that she explores, that of the algorithmisation of the hyperlink, coincides with the emergence of Big Data as both a matter of concern and a matter for broad speculative investment. This is because in its short history the link has gone from being a navigational device, hand-coded by webmasters and Web 1.0 enthusiasts, to being a device for the execution of calls into databases associated with social media and online shopping platforms.
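What it means for a link to become “a device for the execution of calls into databases” can be sketched in a few lines. This is a deliberately naive illustration under our own assumptions, not a description of how any actual platform works: the URL, the `item` table and the function names are hypothetical. The point is simply that the link’s query string supplies the parameter of a database lookup rather than pointing at a hand-written page.

```python
import sqlite3
from urllib.parse import urlparse, parse_qs


def resolve_link(url, conn):
    """Treat a link's query string as the parameter of a database lookup."""
    query = parse_qs(urlparse(url).query)
    item_id = int(query["id"][0])
    # Parameterised query: the value carried by the link selects the row.
    row = conn.execute(
        "SELECT title FROM item WHERE id = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None


def demo():
    """Build a throwaway in-memory table and follow one hypothetical link."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO item VALUES (42, 'example product')")
    result = resolve_link("https://shop.example/item?id=42", conn)
    conn.close()
    return result
```

Calling `demo()` follows the invented link and returns the title stored under that identifier; the “page” at the other end of the link exists only as the result of the query.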
In this respect, social media have perhaps been responsible for generating a quantum shift in the use of the hyperlink across digital media technologies, bringing organisations that specialise in short links, such as bit.ly, into the frame as crucial new mediators, “obligatory passage points” in the increasingly corporate, balkanized world of the Internet. The recent IPO of Twitter has been the excuse for considerable quantities of speculation of both the financial and mediatic kind in this regard. Twitter, of course, has also provided ready material for budding social media algorithmists to try their hand at Big Data-type processing, Minority Report style “pre-cog” obsessed academic researchers and chancers included. Figuring out how to develop the right kind of pre-emptive analyses of luridly tagged and nebulously defined “hate” speech using topic modelling or some other variant of natural language processing techniques is no longer a remote Hollywood fantasy. But whatever not-too-distant control regime is augured by social media platforms and the technologies they have at their disposal, such shifts are not foregone conclusions. They remain likely, but by no means inevitable, futures. This is one reason why Taina Bucher’s paper here on the Twitter API is so interesting. Drawing on interviews with developers, she offers a persuasive account of the intensive investment of programmers in the social media giant’s Application Programming Interface, and discloses key elements of the politics of this widely used digital “tool”. As a protocological object, Bucher argues (borrowing from Galloway’s work on protocols), the API shapes practices by the way in which it enrolls developers into particular kinds of business model, norms of what counts as a proper use of the API, and so on.
As a protocological object, or, as Bucher puts it (borrowing from Serres), a “quasi-object”, the API forms something like a condition of possibility for the sharing of content and data online, a social-material form for the management of social relations. In respect of their reliance on third party developers to facilitate the entanglement of Twitter within the broader fabric of networked cultures, the API becomes something like the object of a calculus on the part of Twitter. As Bucher points out, their enormous success in recruiting third party developers to their technology has been matched by an abrupt betrayal as the protocols associated with the API shift with the business priorities of the organization. But developers work with an API with a sense of the riskiness of what they are getting involved in, and it is the instability of the articulation of practices around the API that Bucher wants to flag up with her view of it as a quasi-object. It stabilizes some relations, whilst it destabilizes others, creating uncertainties that may or may not be productive (a function, perhaps, of the kinds of calculations that different practices are willing to make in respect of it).
Time will tell what happens to the ecology of developer practices that have emerged round the technological devices of a social media corporation like Twitter. The API, the hyperlink and the relational database management system are all significant ingredients in the myriad practices of the data-driven present. We would like to thank all the authors who have written papers for this issue of Computational Culture, as well as the reviewers, a discussion of whose excellent work would have taken this discussion much further afield than would have been desirable for an editorial. Their contributions are very much appreciated, as is their patience in the face of the rather long time it took to get this issue of the journal off the ground.
As can readily be seen from the calls for papers for future issues, however, there is a momentum gathering in software studies and related fields that seems likely to ensure that this publishing gap has been something at least of a chance to take a breath before yet another, if more modestly proportioned, deluge of data.