Infrastructure of Vision: Envisioning the Future through Market Devices

Article Information

  • Author(s): Théo Lepage-Richer
  • Affiliation(s): Department of Modern Culture and Media, Brown University
  • Publication Date: 21st October 2019
  • Issue: 7
  • Citation: Théo Lepage-Richer. “Infrastructure of Vision: Envisioning the Future through Market Devices.” Computational Culture 7 (21st October 2019). http://computationalculture.net/infrastructure-of-vision-envisioning-the-future-through-market-devices/.


Abstract

With applications ranging from object recognition to spatial navigation, computer vision has become a key feature of most contemporary infrastructures. Compared to facial recognition and other techniques of surveillance, privacy-oriented messaging app Snapchat might seem like a fairly banal implementation of this technology; yet, such commodified services have become a key framework through which information is not only captured, classified, and monetized, but also mobilized around a specific vision for the future of infrastructure. The logic governing how Snapchat’s modes of vision prioritize, capture, and make visible the objects and subjects with which it engages might be quite opaque, but three key devices – or technologies of the future as they are framed in this text – offer a privileged perspective on the infrastructure behind the app: Snapchat’s business plans, patents, and functions. While many media theorists situate computer vision within a larger techno-cultural obsession with identifying and classifying people, this article will attempt to complement this body of work by using these devices to highlight the promissory logic embedded in computer vision. By illuminating how visibility, visuality, and vision are conceived and operationalized by the infrastructure supporting apps like Snapchat, these devices shed light on how infrastructure is conceived and constructed as both a real and imaginary space where different futures can be disclosed and enacted. The infrastructure of vision that results, this text argues, necessarily functions as an infrastructure for a vision by making certain futures more visible than others, thus reframing vision as the site where economic interests, media habits, and practices of data capture are both amplified and distorted.


Introduction

Since Frank Rosenblatt’s Perceptron – a neural network processing ‘[visual] inputs impinging directly from the physical environment’1 developed between 1957 and 1961 – computer vision has been a defining function of machine learning. While the Perceptron never reached the level of accuracy at image classification tasks that its military funders were hoping for, the capacity of machine learning systems to recognize objects in images has remained a benchmark in measuring the progress of machine learning as a whole. Between 2010 and 2017, for instance, the ImageNet Large Scale Visual Recognition Challenge – a yearly image indexing competition ‘evaluat[ing] algorithms for object detection and image classification’2 – was recognized as one of the main events in the field of machine learning. For years, it provided both private organizations and academic institutions with an industry-wide standard to compare a wide range of competing architectures and learning models; yet, with most teams achieving near-perfect results in 2017, the event’s organizers questioned whether further editions would foster more than marginal improvements in image classification and decided to end the competition altogether.
The obsession of today’s techno-cultural context with ‘vision’ and visual knowledge has been discussed by many scholars. While Wendy Chun highlights how ‘visual knowledge is being transformed and perpetuated’3 by digital practices that are profoundly non-visual in nature, Orit Halpern links the reformulation of vision by computer systems to ‘the production of a range of new tactics, and imaginaries, for the management and orchestration of life.’4 With applications ranging from emotion recognition to spatial navigation, computer vision readily captures the larger operationalization of the visible Chun and Halpern discuss by shaping how the private and public institutions relying on such applications envision the contexts in which they operate. Yet, very little has been written on the persistence of visual knowledge in and around the field of machine learning. In films, business reports, and many fields across the humanities, computer vision’s capacity to analyze and produce visual information is overwhelmingly portrayed as a faculty somehow equivalent to the human experience of vision regardless of the way this technology processes images by transforming them into matrices of numerical pixel values. There then seems to be a strong visual bias underpinning how computer vision, machine learning, and the systems relying on these technologies are conceived – a bias which, this text will argue, can be linked back to the way computer vision has become a key infrastructural component in many industries including the app economy.
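The claim about numerical representation can be made concrete with a minimal sketch (using NumPy; the pixel values below are arbitrary and purely illustrative): what a computer vision system operates on is not a scene but a matrix of intensity values.

```python
import numpy as np

# A toy 3x3 grayscale "image": each entry is a pixel intensity (0-255).
# To the system, the image is nothing but this array of numbers -- the
# visual scene it depicts plays no role in the computation itself.
image = np.array([
    [ 12,  80, 200],
    [ 45, 130, 220],
    [  0,  60, 255],
], dtype=np.uint8)

print(image.shape)  # (3, 3): height x width, no trace of 'vision' as such
print(image.sum())  # 1002: the matrix can be summed, averaged, convolved...
```

Every operation of computer vision, from edge detection to face classification, is ultimately an arithmetic transformation of such matrices.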

One company in particular seems to have fuelled the reformulation of computer vision from a back-end, specialized technology to a highly commoditized service: Snap Inc. When Snap’s flagship product Snapchat – a multimedia messaging app – came out in 2011, it was the first service to allow users to alter visual content in real-time. Back then, platforms like Facebook had already implemented computer vision techniques to classify images and identify the users appearing in them, but Snapchat quickly differentiated itself by promoting computer vision itself as its service. On Snapchat, users can modify pictures of themselves by juxtaposing stickers, virtual masks, and other augmented reality features onto their faces and share the resulting images within their network. That way, Snapchat provides its partner brands and organizations with a privileged framework to use the human face as an interface to not only engage with users and distribute sponsored content, but also extract value from the app economy. Years later, computer vision is now a core component of virtually all popular mobile and social platforms (e.g., TikTok, Facebook Messenger, Instagram, etc.).
In that context, I am interested in untangling the visual bias underpinning machine learning by exploring how the commodification of vision-as-a-service shapes the development of the computer vision technologies and infrastructures on which apps like Snapchat rely. To do so, I will discuss and put in dialogue three ‘devices’ – which I use here in a sense akin to Michel Foucault’s dispositifs, i.e., as objects and discourses with a prescriptive effect on the organization of the contexts in which they circulate5 – that actively shape the services and modes of vision provided by Snap: its business plan, its patents, and the main functions of its app. By respectively situating vision at the levels of discourse and markets, technology and speculation, and finally users and media practices, these three devices expose the multiplicity of technical and non-technical components that are embedded in the infrastructure of vision behind Snapchat’s services. In this regard, these devices all function as technologies of the future that aim to mobilize the necessary conditions for Snap’s modes of vision to come into being, reframing vision as a highly contested site where competing futures are envisioned and enacted. From there, I intend to highlight how these devices conceive and construct infrastructure as both a real and imaginary space making certain futures more visible than others.

Vision as an Infrastructure: Beyond Computer Vision

Before attending to Snap’s business plan, patents, and functions, it is first necessary to provide some background information on the app itself and untangle concepts such as computer vision, vision, and visibility, which will be discussed throughout this text. When first released, Snapchat was posited as a privacy-friendly alternative to image hosting services such as Flickr and Facebook. Rather than hosting images indefinitely, Snapchat makes users’ content available for a limited time and became especially popular among millennials as a platform to share comic or self-deprecating images of themselves with virtual masks.6 In retrospect, Snapchat has made it remarkably banal to stare into a smartphone and have one’s face scanned and analyzed. In many ways, Snapchat’s main innovation was to develop a concept that not only ensured a steady flow of new data on which its computer vision system could be trained, but also playfully integrated users’ facial features and expressions into the broader infrastructure of vision supporting the production and distribution of mobile apps. When processed, the millions of faces Snapchat captures everyday are not so much analyzed for their unique qualities as classified based on features that have been correlated with more complex emotions, relations, and contexts that can be monetized.
What is invoked by Snapchat’s mode of vision is therefore not so much the visual perception of things as they appear in the phenomenal world as the prioritization of what features should be perceived based on their assigned value. The often-assumed equivalence between the ‘vision’ of computer vision and ‘vision’ as visual perception then seems to obscure the field’s defining functions: computer vision might have first been conceived as the automation of visual tasks that humans can easily perform,7 but most computer scientists now emphasize its irreducibility to the abilities of the biological visual system. For AI researcher Ian Goodfellow and his colleagues, computer vision broadly refers to ‘a wide variety of ways of processing images’ that enable applications ‘rang[ing] from reproducing human visual abilities […] to creating entirely new categories of visual abilities.’8 Computer vision therefore encompasses functions involving modes of perception that humans would normally conceive in visual terms, but also whole new competencies such as the perception of sound waves in video inputs.9 When computer vision enables such new competencies, the notion of vision itself gets reshuffled and redefined, which in return creates new conditions of visibility and orients the future trajectories of computer vision. From the perspective of computer vision, vision itself then goes beyond the perception of the light reflected by objects and includes a wide range of metrics and measurements – lidar, depth measurement, electromagnetism, etc. – which expand vision beyond its subjective, embodied experience.
At the same time, vision cannot be simply reduced to the latest breakthroughs in computer vision. When mandated by Apple to develop filters – or lenses, as they are called in the Snapchat vernacular – exclusive to iPhone X users, Snapchat used the phone’s embedded infrared sensors to apply visual alterations with a sense of depth that could not be otherwise perceived by, and achieved through, digital cameras alone.10 In addition to illustrating the role of non-visual metrics in computer vision, this example highlights the type of economic incentives which define and redefine vision by prioritizing what has to be perceived and operationalized by digital systems. As a set of computer vision techniques, vision then challenges conventional understandings of visual perception; as a monetizable service, it defines what has to be perceived and made visible based on technological and economic considerations. Vision can thus be here understood as the product of the interplay between, on the one hand, the techniques and functions of computer vision and, on the other hand, the economic incentives prioritizing what has to be perceived, processed, and operationalized by computer vision.

Vision then emerges as a complex site where both technological breakthroughs and economic incentives establish what is made visible by and to computer vision. By visibility, I here refer to the way certain objects in an image are attributed a greater value in determining how that image will be operationalized. A human face, for instance, could be qualified as ‘more visible’ to Snapchat’s mode of vision than a dog’s muzzle since its presence in an image will have a greater influence on how it will be processed and altered. The more accurate techniques of computer vision become, however, the more opaque the logic governing how things are made visible becomes. In A Prehistory of the Cloud, Tung-Hui Hu claims that, ‘as each infrastructure becomes naturalized, we tend to refer to it with increasing amounts of abstraction.’11 Accordingly, the implementation of computer vision models in most digital infrastructures has led these same infrastructures to become increasingly invisible by obscuring the links between their technical, commercial, and institutional components and the conditions of visibility they impose on subjects and objects. By investigating the conditions through which things are made visible, however, it seems possible to reconstruct some of the technical and non-technical aspects that constitute the infrastructure of vision behind them.
This approach is inspired by Philip Agre’s article “Surveillance and Capture” in which he investigates the conditions of visibility produced by two distinct infrastructural frameworks. The first one, surveillance, involves an almost nostalgic relationship to visibility as it entails modes of perception – cameras ‘watching’ people, listening devices ‘eavesdropping,’ etc. – that remain relatable by those monitored. Comparatively, the second framework, capture, refers to both the decentralized collection of data and their usage to model, predict, and delimit subjects’ actions.12 Capture’s visual qualities might be less explicit than surveillance’s, but this framework nevertheless posits that individuals can be more accurately ‘perceived’ by imposing upon them a grammar of action governing how they can interact with other subjects. In that sense, while surveillance can only modulate the limits of what it itself perceives, capture dictates the very limits of what the subjects it interacts with can themselves perceive. ‘To govern,’ Nikolas Rose writes, ‘it is necessary to render visible the space over which government is to be exercised,’13 thus reframing visibility as the infrastructurally-delimited space where power and control are played out and enforced.
The infrastructure of vision at stake here thus seems to be both constitutive of, and constituted by, computer vision, discourses around vision, and the economic incentives prioritizing what is made visible. In that sense, it indeed encompasses various technological components, but also a wide range of non-technical devices that inform the development and implementation of these components. To support this argument, I now turn to three devices through which Snap’s infrastructure of vision is constituted, that is, Snap’s business plan, patents, and functions. In “Mediating Instruments,” Peter Miller and Ted O’Leary undertake a similar endeavour and highlight how speculative devices such as Moore’s law and other technology roadmaps mediate the human, technological, and institutional components of the networks in which they operate.14 Such devices, they write, ‘link science and the economy [… and in doing so] contribute to the process of making markets.’15 In the same vein, by studying Snapchat’s functions alongside Snap’s business model and portfolio of patents, I intend to not only highlight these devices’ key role in guiding the development of the infrastructure they inhabit, but also demonstrate that the interactions among them directly inform how vision and infrastructure come together into a vision for the future.

On Envisioning: The Business Plan as a Literary Technology

Snap might have lost $293 million and $459 million in 2015 and 2016 respectively, but its 2017 pre-Initial Public Offering (IPO) business plan nonetheless predicted that the app’s first profitable year was just around the corner – an assurance that has been enough for Snap to secure more than $2 billion in investment since the plan’s release. The business plan figures among entrepreneurs’ most precious tools: since the early days of the software industry, it has been a privileged tool for start-ups and entrepreneurs to secure the support of investors and other key actors. In “Business Plans,” Martin Giraudeau defines this type of document as an investment proposal in a new company16 and links its emergence to the development of a strong entrepreneurial economy in the second half of the 20th century.17 Compared to more resource-heavy industries with slower growth cycles, new software ventures require relatively limited resources and can reach staggering valuations in a short amount of time. With venture capitalists prioritizing companies that have the potential to bring about lucrative exits, the business plan seems especially well-adapted to the specificities of the software industry, as it focuses on communicating a vision for the future that could foster such high growth.
While most of the traditional literature on management and entrepreneurship focuses on what business plans are (e.g., descriptions of companies’ value proposition,18 revenue models,19 etc.), I am rather interested in exploring how these devices intervene upon the markets in which the ventures they describe are set to operate. For entrepreneurship scholars Frédéric Delmar and Scott Shane, business plans should first and foremost be understood as legitimizing tools: they provide a space for entrepreneurs to demonstrate their knowledge of a given market and communicate how their venture will transform it.20 From this perspective, business plans appear to function as both maps and roadmaps. They not only describe the current conditions of a venture’s industry (e.g., available infrastructure and capital, competitors, potential consumers, etc.), but also account for how that venture could use these resources to capitalize on some untapped opportunity. In “What Do Business Models Do?,” Liliana Doganova and Marie Eyquem-Renault describe these devices in similar terms: by reframing the business plan ‘as both a calculative and a narrative device,’21 they argue that business plans envision a certain profitable future in order to mobilize the necessary network of people and resources to catalyze the vision they communicate.22
Building upon a framework similar to Doganova and Eyquem-Renault’s – one recognizing the agency of business plans in envisioning and bringing about a certain future – I am interested in exploring how Snap’s business plan gives primacy to specific modes of vision by transposing them into a hypothetical future in which they play a central role. By presenting ways to monetize Snap’s computer vision technology, Snap’s 2017 business plan not only illustrates the influence of economic incentives on reframing vision as a service, but also highlights the influence of this commodification of vision on what computer vision systems are designed to perceive. As a performative instrument, Snap’s business plan thus emerges as a privileged means through which computer vision and the markets in which it operates are brought together around modes of vision that are intimately linked, as I will advance, to the vision for the future presented in the plan.

Snap’s 2017 business plan is most certainly its most circulated one. Officially registered as Snap’s Form S-1 Registration Statement, it was submitted to the United States Securities and Exchange Commission (SEC) on February 2nd, 2017 in preparation for the company’s IPO.23 As is the case with all SEC filings, Snap’s S-1 Statement includes a description of the shares to be offered and information about the management of the company, but also a statement about Snap’s value proposition. Snap had already sought and secured funding from venture capital funds at earlier points in its history; yet, this version of the company’s business plan was designed specifically for its stock market launch and aimed at making the company’s vision legible to a broader audience of investors.
Tellingly, Snap’s value proposition makes very few references – if any – to the technology behind its app. In the section entitled “Overview,” Snap defines itself as a ‘camera company’ whose products ‘empower people to express themselves, live in the moment, learn about the world, and have fun together.’24 By describing its app as both a piece of hardware – i.e., a camera – and a social enabler, Snap re-establishes some sort of nostalgic relation to image capture: it presents itself as a discrete, self-contained medium as well as a privileged audience within which images can be shared. Many scholars have documented the primacy given to innovation narratives over technical details in business plans: for Caroline Bartel and Raghu Garud, this emphasis on innovation narratives aims to facilitate these texts’ circulation by making them accessible to as many actors as possible.25 Yet, in the context of Snap’s business plan, this omission also plays another role: by promoting such a nostalgic relationship to photography, the plan conflates the app’s computer vision techniques with the visuo-centrism of the camera, thus projecting visual qualities onto technologies that challenge conventional definitions of vision and visuality.
After all, Snap is first and foremost a software and network technology company: images and videos are captured by third-party smartphones, and Snapchat’s intervention rather consists in altering and distributing this visual content. By presenting its product as a piece of hardware instead of a service, Snap obscures the underlying infrastructure mediating the supposedly discrete medium users manipulate. The complex network of cameras, interface components, algorithms, mobile technologies, and servers – to only name a few – on which the app relies is displaced by a more reassuring innovation narrative about a revamped, unitary relationship between a user, a medium, and a visual output. The nostalgic quality of this narrative thus manifests itself by the way Snap’s plan alleviates anxieties about data leaking and real-time surveillance by claiming to restore the type of unitary relationships photography once provided in the context of connected devices. The value proposition presented in this document then uproots vision from any larger infrastructure and reframes Snapchat as a medium which allows users to decide how they are perceived by both the app and their network. That way, this section of the plan reinstitutes some illusory equivalence between the mode of vision enabled by the app’s computer vision technology and users’ previous experiences with cameras and photography.

The plan’s analysis of Snap’s financial conditions however draws a quite different picture than its value proposition. Because of its sustained innovations in network technology, on-the-go editing, and geolocation-enabled functionalities, Snapchat ‘requires a lot of bandwidth and intensive processing,’ the plan states, and therefore relies on ‘high-end mobile devices, high-speed cellular internet,’26 and ‘third-party infrastructure partners, [such as] Google Cloud.’27 In that sense, Snap’s business plan also acknowledges some of the core components of the infrastructure on which the app relies, highlighting how the image alteration and distribution services it provides ramify across infrastructural layers. In the section entitled “Quarterly Revenue by Geography,” Snap’s business plan closely links the app’s growth potential to ‘the size of the global population with access to strong infrastructure,’28 emphasizing the broader geo-economic conditions that come into play in informing which audiences become the objects of vision. While the company’s value proposition situates vision within a somehow nostalgic innovation narrative, the plan as a whole rather reframes vision as the product of the interplay between computer vision, infrastructure, and the economic incentives underpinning both.
Snap’s revenue model enumerates some of the economic incentives which inform how it develops its infrastructure. By driving its revenues ‘primarily through advertising,’ Snap focuses on markets in which ‘global advertising spend tends to be concentrated.’29 Yet, some of the largest advertising markets mentioned in the plan are also the ones where Snap’s infrastructure is the most underdeveloped: ‘Google, which currently powers [Snap’s] infrastructure, is restricted in China,’30 the plan mentions, and tapping into that market would require further investments to ‘establish[] an operating presence in China.’31 That way, Snap might indeed rely on pre-existing infrastructures but also accelerates the implementation of new infrastructures by identifying audiences who could be monetized through further infrastructure investments. By concluding that the app’s ‘ability to succeed in any given country is largely dependent on its mobile infrastructure and its advertising market,’32 Snap’s business plan then links the technical components that enable its functions to the economic incentives that drive its value, blurring the distinction between infrastructure and market, visibility and value, technology and advertising. The performative nature of Snap’s business plan is therefore expressed here by the way it conflates Snapchat’s vision technologies with a more abstract mode of vision encompassing the infrastructural and economic conditions (e.g., audience analytics, advertising markets, wireless internet coverage, infrastructure partners, etc.) that shape what and whom computer vision systems are designed to perceive.
Snap’s business plan can then be understood as a device akin to what Steven Shapin calls ‘literary technologies.’ For Shapin, this notion encompasses all technologies ‘of trust and assurance that the things [… described can be] done in the way claimed.’33 In a similar fashion, Snap’s business plan provides its readers with the assurance that the vision it communicates can be accomplished in the foreseeable future. It might describe diverging modes of vision – e.g., vision as photographic, as image-centric, as co-constituted by the development of the markets and infrastructures in which it is embedded, etc. – but all these modes come together to form the larger vision for the future the plan brings forward. That way, by describing an imaginary yet coherent infrastructure of vision where economic incentives, geo-economic conditions, and computer vision techniques all function as a coherent whole, the plan projects an illusory closure onto technologies and markets that are still quickly evolving. In that sense, this document offers an imaginary space where Snap’s vision for the future is disclosed in order to mobilize the actors whose support is necessary to bring that vision about. The modes of vision that are described are thus intimately intertwined with the vision for the future in which the plan invites its readers to partake: as they make certain futures more visible than others, these modes of vision function hand-in-hand with the way business plans more generally invite their readers to envision a certain future of which they are themselves key actors.

Visualizing Infrastructure: Patents and the Objects of Vision

Snap’s legal battle against Facebook’s now defunct app Slingshot over the latter’s infringement of two of its most prized patents – “Single mode visual media capture” and “Apparatus and method for single action control”34 – is only one example of the strategic role that patents play in the software industry. Copying competing platforms’ features might be commonplace in the app economy, but Slingshot’s reproduction of the design and execution of some of Snapchat’s defining features ultimately led to the app’s removal from all app stores. This legal battle might not have been the only factor leading to Slingshot’s discontinuation, but this episode nevertheless illustrates how Snap’s filing of patents anticipating the commodification of computer vision both secured and consolidated its position as a key player in the app industry. Patents, this example highlights, are profoundly speculative devices: in the case of Snap, they not only anticipated the rise of vision-as-a-service, but also secured the app’s rights over some of the media practices that would propel that rise. The software industry’s relationship with intellectual property remains however quite complex. In “Carving up the Commons,” Annette Vee highlights how long-lasting debates regarding the patentability of software reflect legal institutions’ ongoing struggle ‘to define what software is.’35 While software components are subject to the same conditions of eligibility as any other invention – patents can be granted to ‘any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof’ (35 U.S.C. § 101)36 – Vee emphasizes that software’s close proximity with other textual forms that are not covered by intellectual property laws, such as mathematics, complicates the terms in which it can be patented.37
In many ways, it is to alleviate this ambiguity that Snap’s patents frame their inventions as computing processes linked to processors, servers, and other hardware components. By describing patented software as ‘a computer implemented method,’38 ‘an electronic device includ[ing] digital image sensors,’39 or ‘a machine includ[ing] a processor and a memory connected to the processor,’40 Snap’s patents indirectly account for its software components through the hardware devices on which they run – hardware which, in practice, is almost exclusively owned by third-party providers. When the actual processes are subsequently introduced, their description tends to emphasize the functional aspect of software. Since patent applications require written and visual descriptions as opposed to technical accounts, illustrations are given primacy over code and algorithms as the means to claim ownership over software components. Because of the requirements of intellectual property filings, software companies such as Snap can then not only claim ownership over functions for which they do not yet fully master the technology, but also secure key functions that they foresee as becoming fundamental – or infrastructural rather – to the industry as a whole.
If the last section was about the way business plans turn functions into markets, this one will focus on how patents bring together new infrastructures. By highlighting ways to link disparate infrastructural components at the levels of hardware, software, and functions, patents operate like speculative texts which both anticipate and claim ownership over the future forms of infrastructure. Patent applications therefore illuminate their seekers’ anticipations toward their industry as well as how they understand, visualize, and schematize the broader infrastructure in which their inventions are embedded. Snap’s patents thus appear as privileged texts to not only identify the different components that the company considers critical in building tomorrow’s infrastructure of vision, but also highlight how the modes of vision this infrastructure will enable perceive and capture the objects and subjects with which the app engages.

To pursue this line of inquiry, I reviewed the patents either filed or acquired by Snap since its creation in 2011.41 In order to account for the infrastructure of vision portrayed by Snap’s patent application strategies rather than the logic behind the United States’ patent attribution system, I compiled awarded patents as well as patent application publications for which no rights have yet been granted. Patents can be more accurately characterized as strategic and speculative devices than transparent statements of intent: only a fraction of the inventions described in Snap’s patents have been implemented, and the execution of those that were often differed greatly from how they were initially portrayed. For that reason, I have analyzed Snap’s patents regardless of whether they have been implemented, focusing rather on how Snap’s patent filings foreground the company’s changing anticipations toward the future of the app industry.
I compiled a total of 85 documents42 and classified them in two ways. First, I assigned each patent to one of the 7 categories I defined based on the main functions and components covered by Snap’s patents: visual media capture (6), which refers to the way the app makes use of the photographic hardware available; interpretation of visual content (16), which encompasses object recognition, emotion analysis, and other functions related to how images are interpreted; interface and interaction (18), which points to how users access images and interact with the app; network technology (23), which includes all the communication technologies that enable the app’s social functionalities; geolocation (8), which comprises all location-specific functions; hardware (10), which covers all aspects related to Snap’s proprietary eyewear camera; and privacy and security (4), which refers to how the app protects images from unauthorized uses. In order to evaluate the prevalence of computer vision-enabled functions in these different categories, I then divided all the patents into two groups: those that involve or are enabled by some sort of image analysis or content awareness (36), and those that do not (49). Finally, I coded the patents based on the year (2013-2018) and order (a, b, c, d, etc.) in which they were granted or published.43
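The two-way coding scheme described above can be sketched as a simple tally. The category names and counts below are taken from the text; the data structures themselves are purely illustrative, not the method actually used in the study.

```python
from collections import Counter

# Category counts as reported in the text: 85 patents across 7 categories.
categories = Counter({
    "visual media capture": 6,
    "interpretation of visual content": 16,
    "interface and interaction": 18,
    "network technology": 23,
    "geolocation": 8,
    "hardware": 10,
    "privacy and security": 4,
})

# Second, orthogonal grouping: patents that involve some form of
# image analysis or content awareness vs. those that do not.
content_aware, not_content_aware = 36, 49

total = sum(categories.values())
assert total == content_aware + not_content_aware == 85

# Share of the corpus involving content awareness.
print(f"{content_aware / total:.0%} of patents involve content awareness")
```

Run as-is, this confirms the internal consistency of the reported counts: the seven category totals and the binary content-awareness split both sum to the same 85-document corpus.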
By looking at the content of the patents filed through time, we can reach some general conclusions regarding the evolution of Snap's assumptions vis-à-vis the nature of vision. Despite Snap's growing emphasis on its signature eyewear camera from 2017 onward (Snap 2017v), most of its hardware-related patents do not involve any form of content awareness (e.g., Snap 2017w, Snap 2017cc, Snap 2017vv). Computer vision, in these patents, emerges as a hardware agnostic set of techniques more closely related to how images are shared and displayed than captured. Comparatively, patents filed at around the same time in categories such as privacy and security and network technology refer to different ways to use image classification and analysis to optimize the functions they describe. Snap 2017kk, for instance, describes the rerouting and storing of images on different servers based on the relevance of their content for specific networks, while Snap 2016l and Snap 2017nn expose methods through which facial recognition is used to identify unauthorized usages of images shared on the platform. This transition toward greater content awareness is key, since it highlights evolving understandings of what constitutes vision. Patents in visual media capture filed before 2016 cover different methods related to the optimization and storage of images, but those filed afterwards reframe vision as itself enabling functions in other categories, from the real-time alteration of a camera's video feed to reflect the activity represented (Snap 2017rr) to the automatic generation of 3D models of objects to improve the accuracy of future alterations (Snap 2017uu). Vision, as illustrated by these two last patents, is then conceived not so much as a function but rather as a key infrastructural component enabling other functions.

Vision and infrastructure might be folded together here, but the conditions of visibility produced by the resulting infrastructure of vision remain more ambiguously defined. While many patents describe a mostly face-centric user experience, others portray a rather different paradigm of vision in which brands, not faces, dictate the conditions of visibility through which objects and subjects are made visible. Patents in interpretation of visual content are, for instance, generally aligned with the defining functions of many social apps: they describe the juxtaposition of filters over users' faces (Snap 2015i), the aggregation of images into galleries based on their content (Snap 2016j), and the analysis of certain facial expressions in terms of the emotions they convey (Snap 2017e). In all these cases, users' faces are framed as the main objects of vision; they are analyzed for the intentions and emotions they convey, and playfully altered to reflect the internal states or intentions that have been inferred. Snap 2015b however provides the first example of a shift toward another paradigm: in this patent, a method for identifying brands and enhancing their presence in an image is described. Later, Snap 2016k relates a process through which a specific object is put into auction among different brands whose products are related to it; a photo filter featuring the highest bidder's brand is then juxtaposed on the object in question. Finally, Snap 2017l describes different techniques through which recognizable brand images trigger 3D animations that users can watch in augmented reality on their mobile device.
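The auction logic described in Snap 2016k can be rendered as a minimal sketch. Everything here – the function, the object class, and the brand names – is a hypothetical illustration of the patent's described process, not Snap's implementation.

```python
def select_filter(detected_object, bids):
    """Schematic reading of the auction described in Snap 2016k: brands
    bid on a recognized object class, and the highest bidder's branded
    filter is juxtaposed on the object. All names are hypothetical."""
    if not bids:
        return None  # no sponsor: the object is not visually emphasized
    return max(bids, key=bids.get)

# e.g., a recognized coffee cup auctioned among fictional sponsors
winner = select_filter("coffee cup", {"BrandA": 0.40, "BrandB": 0.55})
print(winner)  # prints "BrandB"
```

The sketch makes the patent's distribution of visibility explicit: an object with no bidder attached to it simply receives no visual emphasis at all.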
Objects, in these patents, are made visible insofar as they are given a certain value by the brands that partner with Snap. This does not mean that users’ intentions are no longer considered, but rather that they are analyzed for how they predict users’ receptivity to specific brands. Snap 2017ff describes how contextual cues such as geographical location and surrounding objects can be used as proxies to infer users’ mindset and needs, allowing the platform to deliver contextually relevant branded content and emphasize certain brands via visual cues. Visibility, this last patent implies, is then closely related to the way objects that are first captured through non-visual means – contextual cues, pre-assigned values, correlations, etc. – are subsequently emphasized visually in order to direct users’ attention toward them. The different computing techniques, economic incentives, and infrastructural components which enable Snapchat therefore mediate what is made visible not only to the app’s infrastructure, but also to its users via visual cues emphasizing the objects that are prioritized by the app’s mode of vision.
Visibility is therefore closely linked to the economic incentives that prioritize certain objects over others as the privileged objects of vision for both the app’s infrastructure and users. The face indeed remains a privileged object of vision under that paradigm yet is also conceived as the surface upon which visual cues can be projected to reflect the app’s own mode of vision. Snap 2017mm for instance describes the process through which 3D models of users’ faces are automatically generated in order to alter their facial expressions in real-time. In this patent, visibility is mediated not so much by the juxtaposition of visual cues, as previously discussed, but rather by the very erasure of mediation’s visual traces. Snap’s patents then seem to offer two different accounts of visibility where faces are presented as both the source of expressivity and a malleable surface upon which expressivity can be projected. To illustrate the latter case, many filters now require users to express a certain emotion to trigger an animation (e.g., joy to provoke a rain of Kraft macaroni and cheese, surprise to conjure Beats headphones, etc.) while others simply alter users’ facial features in real-time to match the intention implied by the filter. These two models of visibility – the face as the source of expressivity and the face as a malleable surface – currently co-exist within Snap’s infrastructure and are alternately put into operation according to the contexts in which the app is used or the needs of the company’s partners. The distribution of the visible enacted by Snap then appears to be directly linked to how both users and advertisers conceive vision as a service provided by Snapchat, thus linking the infrastructure of vision on which the app relies to the changing needs and expectations of those contributing to, and benefitting from, the app economy.

Configuring Visuality: Snapchat’s Functions and the Making of Ideal Users

Patents might secure their holders’ rights over specific software components, but the implementation of these components can differ greatly from their original, patented design. Just like a computer does not run an algorithm but an implementation in code of an algorithm,44 the devices on which Snapchat is installed do not so much run Snap’s patented software as implementations of it which reflect the evolving context in which the app operates. Snap’s functions are in many ways applied forms of the larger techniques that enable them: neural networks, Viola–Jones object detection algorithms, and backpropagation are only a few examples of the machine learning models that compose Snap’s toolbox, yet these techniques cannot be reduced to their implementations in Snap’s infrastructure of vision. Snapchat’s functions in fact reflect, I argue, the larger expectations, media habits, and value propositions that inform how these technologies are implemented. Like patents and business plans, functions provide users, partners, and other actors with a specific vision for what these vision technologies are and what they can do. In this regard, I am thus interested in reframing Snapchat’s functions as implementations of larger technological frameworks in order to highlight the expectations that get embedded in the app’s infrastructure of vision.
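To make concrete the distinction drawn above between a technique and its implementations, consider the Viola–Jones framework named in the passage. Its core is a simple precomputation, the integral image, which lets the sum of any rectangular pixel region be read off in four lookups; this is what makes Haar-like features cheap enough to evaluate at every position and scale. The sketch below shows only that core, independent of any particular product's implementation.

```python
def integral_image(img):
    """Summed-area table at the core of Viola-Jones detection:
    ii[y][x] holds the sum of all pixels above and left of (y, x)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] in four lookups - the
    operation that makes Haar-like features cheap to evaluate."""
    return (ii[bottom][right] - ii[top][right]
            - ii[bottom][left] + ii[top][left])

img = [[1, 2], [3, 4]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))  # prints 10
```

A full Viola–Jones detector cascades thousands of such feature evaluations; the point here is simply that the same underlying technique admits many divergent implementations, which is precisely the gap between patent and product that the passage describes.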
Snapchat allows its users to modify images and videos captured with their phone’s camera through text, virtual stickers, and augmented reality alterations applied to either their face or surroundings. From there, users can either send these altered outputs to specific contacts as ‘snaps’ or share them among their friends and/or friends’ friends as ‘stories.’ Stories are primarily accessible through their creators’ profile for 24 hours yet can also be modified by users in the same network: with the proper settings, users in a given network can add new images or videos to stories posted by their friends. In many ways, the type of immediacy that Snapchat establishes not only among its users, but also between users and the events, brands, and companies with which the app partners is precisely created by the visible traces of mediation and curating. Users can, for example, access exclusive filters based on their location, receive free perks by sharing pictures of themselves with branded stickers, and enhance a live-stream of their night out with visuals associated with the event they are attending.
Such visible traces resonate with the tension at the heart of Snapchat’s reformulation of vision: while profoundly non-visual in nature, the modes of vision that result from the interplay between computer vision, economic incentives, and infrastructure have become the object of a blossoming visual culture transforming how visual knowledge is produced and disseminated. Visuality, as instantiated here by Snapchat’s interface, functions as a powerful device designed to produce certain affects orienting how users engage with their network. Other apps also attempt to mediate such interactions yet do so by using quite different visual cues. Instagram, for instance, offers a more linear scrolling experience and predominantly features minimally altered images. Calling on some sort of nostalgic relationship to photography, it offers a promise of, as Jay David Bolter and Richard Grusin would call it, ‘transparent immediacy’ which manifests itself by the erasure of ‘the presence of the medium.’45 Snapchat, in comparison, showcases images that are openly and thoroughly colonized by interpretation. Snapchat’s stickers and augmented reality alterations all reflect pre-established correlations and conditions of visibility informed by not only the app’s unique vernacular and defining functions, but also the economic incentives provided by its partners. Not all filters and alterations straightforwardly serve financial purposes, but those that do (e.g., sponsored filters and stories) rely on the same visual cues and tropes as non-sponsored ones, blurring the distinction between monetizable and non-monetizable media practices. 
The type of visual cues characterizing the app’s functions and interface components hence offer a privileged perspective on the vast network of technical and non-technical components that not only constitute the infrastructure upon which Snapchat relies, but also mediate the relationship between the app’s users and the stakeholders benefitting from the app economy.

The way Snapchat captures and alters visual content differs quite tellingly from how images are then classified, distributed, and spatialized in the app’s interface. Despite the remarkable accuracy with which its computer vision techniques recognize a wide range of objects and contexts, Snapchat distributes images as if they were empty information. While Facebook and Instagram classify images according to what they feature, Snapchat rather displays snaps and stories according to a fairly straightforward network logic: users can only access stories and snaps shared by people with whom they have either strong ties – i.e., friends and friends of friends – or weak ties – i.e., celebrity users and corporate accounts. The way snaps and stories are visually configured in the app thus appears to be more directly informed by the way Snap’s infrastructure maps relationships among users than any other mode of information consumption based on the content of what is shared.
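The access rule described above – strong ties (friends and friends of friends) plus weak ties (celebrity and corporate accounts) – can be sketched as a shallow graph traversal. The function, graph, and account names below are hypothetical illustrations of the rule as the text states it, not Snap's actual data model.

```python
def visible_accounts(user, friends, broadcast_accounts):
    """Sketch of the access rule described in the text: a user sees
    content from strong ties (friends and friends of friends) plus
    weak ties (broadcast accounts). `friends` maps each user to their
    friend set; all data here is hypothetical."""
    direct = friends.get(user, set())
    # Friends of friends: one extra hop, and no further.
    second_degree = set().union(*(friends.get(f, set()) for f in direct))
    return (direct | second_degree | set(broadcast_accounts)) - {user}

graph = {"ana": {"ben"}, "ben": {"ana", "cal"}, "cal": {"ben"}}
print(sorted(visible_accounts("ana", graph, ["brand_tv"])))
# prints "['ben', 'brand_tv', 'cal']"
```

Note that nothing in this rule inspects the content of a snap or story: visibility is computed entirely from the shape of the network, which is the "empty information" logic the paragraph identifies.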
Visuality, in the context of Snapchat's interface, then emerges as a way to configure visual content in accordance with the way infrastructure structures relationships. Tung-Hui Hu's account of infrastructure as constituted by the superposition of abstraction layers resonates with the way visuality is spatialized in Snapchat. For Hu, infrastructure is composed of layers of techniques and functions stacked up one over the other, with each layer 'depend[ing] on the more material layers "below" it to work, but […] not need[ing] to know the exact implementation of those layers.'46 Accordingly, visuality is configured on Snapchat through layers of increasingly abstract components which demonstrate more and more acute levels of content awareness that are not necessarily displayed by the layers below. Users' snaps and stories, for instance, might be distributed according to a fairly literal network logic, but snaps themselves can be manipulated on the go thanks to the app's advanced computer vision capabilities. Snap's layered infrastructure – which spans from data centers and network technologies to cloud computing, mobile connectivity, and handheld devices – is then mirrored by the app's interface and functions: while Snapchat's classification and spatialisation of snaps and stories disregard the actual content of these units of information, its defining functions nevertheless exhibit extensive content awareness. The way Snapchat alters and then visually arranges images thus reflects the different layers through which its infrastructure not only perceives, captures and processes information, but also maps actual and potential relationships among users.
The point here is not to argue that Snapchat's visual configuration invites users to identify with the perspective of infrastructure or even reproduce its logic, but rather to say that the restrictive conditions of usage imposed upon users are clearly articulated and delimited by the app's functions and visual configuration. Users, for instance, are encouraged to consume content produced by those with whom they are clustered as opposed to seeking specific types of content. They might indeed be able to add new images to their friends' stories but can only do so according to a limited set of parameters (e.g., proximity between two users in a network, shared location, etc.). In that sense, the forms of visual remixing and information seeking allowed by the app remain very limited compared to other platforms which encourage practices such as sharing, reposting, and re-tweeting, and thus constitute quite distinct 'ideal users' via media practices that are closely aligned with the app's unique infrastructure and business model. In "Making Up People," Ian Hacking writes that such combinations of infrastructural components, institutional influences, and discourses provide 'a space of possibilities for personhood'47 which opens up new ways of constituting individuals or, in this case, users. As the product of Snap's infrastructure, discourses about vision, and economic incentives, visuality then emerges in the context of Snapchat as the device through which the app creates ideal users – users who are constituted through specific ways of producing and consuming visual information independently of that information's content.

A few studies have indeed attempted to quantify and model the behaviour of users on Snapchat,48 but, in this section, I am rather interested in exploring the type of ideal users and media habits implied by the interplay between visuality and infrastructure at the heart of Snapchat. By defining which positions users can occupy through the reconfiguration of vision and visuality, Snapchat's functions appear to be endowed with clear prescriptive properties. 'Prescriptive' is here used in the sense proposed by Madeleine Akrich and Bruno Latour, i.e., as the way devices enforce certain behaviours upon the actors – both human and nonhuman ones – with which they engage.49 As the app rearranges how visual information is presented, it creates new ways to consume visual content and invites users to embrace media habits which reflect how it itself perceives that content. In Our Biometric Future, Kelly Gates links computer vision's current applications to neoliberal societies' pressure on subjects to reinvent themselves 'in order to acclimate to their changing social, economic, and technological environment.'50 As these technologies become increasingly sophisticated, she explains, they become recognized as privileged tools through which subjects can acquire new knowledge about themselves and their environment – yet, it is precisely this assumption that such systems can function 'as solutions to problems of self-knowledge' that needs to be challenged, she adds.51
Interestingly, Snapchat’s value proposition seems to imply a similar promise: by limiting the reach and lifetime of the images produced through the app, Snapchat offers the necessary framework for users to showcase a supposedly ‘truer’ or less curated self. Since most of the metrics generally associated with social platforms – e.g., ‘likes,’ number of friends, etc. – are absent from Snapchat, the ‘transparent’ or self-deprecatory presentation of the self is instituted as the main way to become more visible within one’s network. Self-knowledge is thus displaced by self-presentation as the core problem the app aims to tackle. While self-knowledge tends to rely on more complex metrics, self-presentation appears here to be quite straightforwardly linked to the way the app organizes visual content: the more transparent users are, the stronger the bonds with their friends will be and the more visible they will be on the platform. As users begin to assimilate and reproduce that logic, new tools for self-presentation are made available to them: they are invited to share ‘live stories’ and gain access to new, exclusive filters. Self-presentation, like the images shared on Snapchat, is thus emptied out of its content and reduced to how visibility is distributed according to the logic governing how the app processes information. What is requested from users is then not so much adaptability, as Gates argues,52 but rather transparency – for the less curated and more plentiful their content is, the more they become presented by the platform as ideal, transparent subjects instead of simple users.
By defining the different positions users can inhabit on the platform, visuality thus emerges as mediating the way users become recognized as subjects by the app's infrastructure of vision. These conditions of usage might be somewhat restrictive, but it is precisely by learning to master them and recognizing the modes of self-presentation embedded in the platform that users can increase their visibility on Snapchat. Visuality and visibility then both play a key role in mobilizing subjects within Snap's infrastructure of vision: while visuality establishes the grammar of perception through which users are invited to produce and consume visual content in a way akin to how the app's infrastructure processes information, visibility dictates the grammar of action that users must embrace in order to themselves become visible. For advertisers and technologists alike, Snapchat is then a privileged channel to interact with users: not only does the app invite them to display a supposedly truer self, but it also gives advertisers and technologists the opportunity to intervene in networks constituted of stronger ties. The modes of vision associated with Snap's infrastructure are thus intimately linked to the type of visibility and visuality the app produces – the more visible one becomes, the more easily one can be targeted by the economic incentives and interests underpinning Snap's infrastructure of vision.

Promissory Infrastructure: Enabling a Vision

As illuminated by Snap’s functions, patents, and business plans, vision can be described as encompassing the image processing techniques associated with computer vision, the conditions of visibility through which objects and subjects are classified, and the modes of visuality that govern the spatialisation of visual content. There is however another definition of vision that must also be considered: vision as a representation of what the future could be. To be actualized, a vision for the future must be not only compelling and communicable to others, but also able to rally the necessary partakers and resources for that future to come into being. Business plans, patents, and functions are only a few devices that companies use to articulate, communicate, and bring about their visions: while business plans function as performative texts that address the actors who could catalyze a certain future, patents speculate on and claim ownership over the technical components that would power that future; functions, for their part, create media practices which, if adopted by users and advertisers alike, align the industry with that specific future. In all these cases, these tools are deployed with the objective of creating the present conditions that could allow a certain future to emerge.
Functions, patents, and business plans can then be understood as different technologies of the future that are employed to close the gap between a present state and a future, envisioned one by shaping the development of specific markets and infrastructures. In Biocapital, Kaushik Sunder Rajan offers a definition of the 'promissory' that captures quite readily this dual engagement with the present and the future by bringing together the performative, speculative, and prescriptive qualities of business plans, patents, and functions. For Rajan, the promissory consists in the transposition of key aspects related to the grammar of life 'into a calculable market unit […] which structures the strategic terrain' in which organizations operate.53 In this regard, the promissory refers to the process through which the heterogenous elements that are necessary for the actualization of a vision are mapped and transposed onto a common operational plane. In the context of Snap, this promissory quality manifests itself first and foremost at the level of infrastructure: by mapping and linking all the necessary elements for Snap's vision to become a reality, the app's infrastructure provides the shared plane onto which new components can be added, crucial technologies embedded, and key actors mobilized. Infrastructure thus functions here as a zone of exchange where different visions can meet and compete, and where dominant ones can be rearticulated as time passes by and new components are added. The articulation and actualization of a vision like Snap's then appears to be intimately linked to how infrastructure is conceived as both a real and imaginary space – for it is there that technical components, social and economic actors, and cultural discourses can be endlessly rearranged and reconfigured.
Infrastructure might be shaped by the literary, discursive, and technical devices that operate within it, but it is at the level of the relationship among these devices that its promissory nature makes itself palpable: business plans get written and re-written as technology changes, patent owners expand and reorient their portfolios of intellectual properties based on changing media habits, functions establish media habits that later become foundational to new business plans, etc. All these elements evolve in relation to one another in a way that produces a shared, promissory future instead of a coherent present. By facilitating the proliferation of computer vision-enabled services, infrastructure’s promissory quality thus appears to reframe any infrastructure of vision as an infrastructure for a vision. Even when modes of vision become obsolete and are displaced by new ones, technologies of the future like business plans and patents sustain this promissory quality by mobilizing the infrastructure they inhabit around the ever-changing futures it might give rise to. By establishing the conditions through which vision and futurity – or technologies of vision and visions for the future to be more specific – inform one another, these devices reframe infrastructure as the operational plane where certain visions for the future are enacted via the concrete modes of vision which make these futures visible.

To conclude – by replacing conventional definitions of infrastructure with a more abstract one encompassing hardware, computer vision, economic incentives, and the devices which shape how these elements come together, this text has attempted to argue that any infrastructure of vision can also be conceived as an infrastructure for a vision. Snap's business plan, patents, and functions together establish the app's defining modes of vision, conditions of visibility, and forms of visuality, and thus play a key role in presenting Snapchat's commodification of vision as the future of the app industry as a whole. Furthermore, by creating a vast amount of visual information (e.g., graphs, tables, diagrams, etc.) around modes of vision that are themselves non-visual in nature, these devices not only mediate how these visions get represented, circulated, and embraced by the app's partners, but also reinforce the visual bias which underpins how computer vision and machine learning are conceived. Vision, rather than being reduced to a set of computer vision techniques, is thus framed by these devices as the site where economic interests, media habits, and practices of data capture are combined together into a functional infrastructure.
Snap’s mobilization of vision within a promissory framework might indeed inform how its modes of vision constitute, and are constituted by, the app’s functions, infrastructural components, and services; yet, this close proximity between the functions enabled by computer vision and the vision for the future these functions catalyze seems to go back to the reformulation of vision as a faculty that could be mastered by computer systems. Orit Halpern, for instance, argues that cybernetics’ conceptualization of vision ‘as an autonomous process [that] could be technologically replicated’ continues ‘to underpin much computer science work on vision.’54 By separating vision from the experience of a subject, she adds, vision was deprived of its ‘ontological stability’ and became a concept that could be redefined according to the functions it was given.55 As products of this genealogy, computer vision’s current applications in the app economy perpetuate a similar detachment of vision from any specific subject, system, or context, leading the functions enabled by computer vision to become a privileged framework through which vision itself, as well as the larger infrastructure supporting it, is grasped and envisioned. Visions – which I am here referring to as both a disembodied faculty and a sense of futurity – can then be understood as key frameworks through which the infrastructures and technologies that enable them are themselves defined and developed.
Snapchat might seem like a fairly banal application of computer vision compared to the automatic estimation of demographic attributes from facial images56 or the use of facial recognition for policing purposes.57 Yet, Snapchat inhabits the somewhat mundane space of everyday usage and app-based services, and therefore plays a key role in framing computer vision as a commodified service and machine learning as a visuo-centric technology. Snapchat’s modes of vision anticipate and shed light on the vision for the future that Snap and its partners strive to bring about, while being in return shaped by the way this vision evolves through time. In that sense, Snap, its app, and the infrastructure of vision on which they rely provide a privileged perspective on how visions are not only produced, but also propagated and themselves made visible.

Bibliography

Agre, Philip E. “Surveillance and Capture: Two Models of Privacy.” The Information Society 10, no. 2 (1994): 101-103.
Akrich, Madeleine and Bruno Latour. “A Summary of a Convenient Vocabulary for the Semiotics of Human and Nonhuman Assemblies” in Shaping Technology/Building Society: Studies in Sociotechnical Change, edited by Wiebe E. Bijker and John Law. Cambridge, MA: The MIT Press, 1992.
Ballard, Dana H., Geoffrey E. Hinton, and Terrence J. Sejnowski. “Parallel Visual Computation.” Nature 306 (1983): 21–26.
Bartel, Caroline A. and Raghu Garud. “The Role of Narratives in Sustaining Organizational Innovation.” Organization Science 20, no.1 (2009): 107-117.
Bolter, Jay David and Richard Grusin. Remediation: Understanding New Media. Cambridge, MA: The MIT Press, 2000.
Chun, Wendy Hui Kyong. “On Software, or the Persistence of Visual Knowledge.” Grey Room 18 (2004): 26-51.
Davis, Abe, Michael Rubinstein, Neal Wadhwa, Gautham Mysore, Frédo Durand and William T. Freeman. “The Visual Microphone: Passive Recovery of Sound from Video.” ACM Transactions on Graphics 33, no. 4 (2014): 79:1–79:10.
Delmar, Frédéric and Scott Shane. “Legitimating First: Organizing Activities and the Survival of New Ventures.” Journal of Business Venturing 19, no. 3 (2004): 385-410.
Doganova, Liliana and Marie Eyquem-Renault. “What Do Business Models Do? Innovation Devices in Technology Entrepreneurship.” Research Policy 38 (2009): 1559-1570.
Dubosson-Torbay, Magali, Alexander Osterwalder and Yves Pigneur. “eBusiness Model Design, Classification and Measurements.” Thunderbird International Business Review 44, no. 1 (2002): 5-23.
Foucault, Michel. Naissance de la biopolitique : Cours au Collège de France, 1978-1979. Paris, France: Seuil/Gallimard, 2004.
Gallagher, Billy. “A Tale of Two Patents: Why Facebook Can’t Clone Snapchat.” TechCrunch, June 22, 2014. https://techcrunch.com/2014/06/22/facebook-slingshot-snapchat-patents.
Garvie, Clare, Alvaro Bedoya, and Jonathan Frankle. “The Perpetual Line-Up: Unregulated Police Face Recognition in America.” Center on Privacy & Technology at Georgetown University. Accessed on August 4, 2019. https://www.perpetuallineup.org.
Gates, Kelly A. Our Biometric Future: Facial Recognition Technology and the Culture of Surveillance. New York, NY: NYU Press, 2011.
Giraudeau, Martin. “Business Plans (États-Unis, second XXe siècle)” in Les Projets : Une histoire politique (XVIe – XXIe siècles), edited by Frédéric Graber and Martin Giraudeau. Paris, France: Presses des Mines, 2018.
Goodfellow, Ian, Yoshua Bengio and Aaron Courville. Deep Learning. Cambridge, MA: The MIT Press, 2015.
Grieve, Rachel. “Unpacking the Characteristics of Snapchat Users: A Preliminary Investigation and an Agenda for Future Research.” Computers in Human Behavior 74 (2017): 130-138.
Hacking, Ian. “Making Up People” in The Science Studies Reader, edited by Mario Biagioli. New York, NY: Routledge, 1986.
Halpern, Orit. Beautiful Data: A History of Vision and Reason since 1945. Durham, NC: Duke University Press, 2014.
Han, Hu and Anil Jain. “Age, Gender and Race Estimation from Unconstrained Face Images.” MSU Technical Report (2014): 1-9.
Hu, Tung-Hui. A Prehistory of the Cloud. Cambridge, MA: The MIT Press, 2016.
ILSVRC. “ImageNet Large Scale Visual Recognition Challenge.” Accessed on September 9, 2018. http://www.image-net.org/challenges/LSVRC.
Knuth, Donald. The Art of Computer Programming, Vol. 1: Fundamental Algorithms. Reading, MA: Addison-Wesley, 1997.
Mayo, Benjamin. “Snapchat launches exclusive face filters for iPhone X that take advantage of the TrueDepth camera.” 9to5Mac, April 6, 2018. https://9to5mac.com/2018/04/06/snapchat-iphone-x-face-filters.
Miller, Peter and Ted O’Leary. “Mediating Instruments and Making Markets: Capital Budgeting, Science and the Economy.” Accounting, Organizations and Society 32, no. 7-8 (2007): 701-734.
Piwek, Lukasz and Adam Joinson. “‘What do they snapchat about?’ Patterns of use in time-limited instant messaging service.” Computers in Human Behavior 54 (2016): 358-367.
Rajan, Kaushik Sunder. Biocapital: The Constitution of Postgenomic Life. Durham, NC: Duke University Press, 2006.
Roesner, Franziska, Brian T. Gill, and Tadayoshi Kohno. “Sex, Lies, or Kittens? Investigating the Use of Snapchat’s Self-Destructing Messages” in Financial Cryptography and Data Security, edited by Nicolas Christin and Reihaneh Safavi-Naini, 64-76. Berlin, Germany: Springer, 2014.
Rose, Nikolas. Powers of Freedom: Reframing Political Thought. Cambridge, UK: Cambridge University Press, 1999.
Rosenblatt, Frank. The Perceptron: A Perceiving and Recognizing Automaton. Buffalo, NY: Cornell Aeronautical Laboratory, 1957.
Shapin, Steven. “Pump and Circumstance: Robert Boyle’s Literary Technology.” Social Studies of Science 14, no. 4 (1984): 481-520.
Snap Inc., July 8, 2014. Apparatus and Method for Single Action Control of Social Network Profile Access. United States patent US 8,775,972 B2.
Snap Inc., June 23, 2016. Gallery of Video Set to an Audio Time Line. United States patent application publication US 2016/0182875 A1.
Snap Inc. “Form S-1 Registration Statement under the Securities Act of 1933.” Securities and Exchange Commission, February 2, 2017. https://www.sec.gov/Archives/edgar/data/1564408/000119312517029199/d270216ds1.htm.
Snap Inc., April 23, 2013. Single Mode Visual Media Capture. United States patent US 8,428,453 B1.
Vee, Annette. “Carving up the Commons: How Software Patents Are Impacting Our Digital Composition Environments.” Computers and Composition 27, no. 3 (2010): 179-192.
Weill, Peter and Michael Vitale. Place to Space: Migrating to eBusiness Models. Cambridge, MA: Harvard Business School Press, 2001.

Author Biography

Théo Lepage-Richer is a PhD student and SSHRC/FRQSC Fellow in the Department of Modern Culture and Media at Brown University. His research is broadly concerned with the history and epistemology of machine learning, with a specific focus on the adversarial epistemology underpinning the transformation of neural networks from a functional model of the mind to an operational framework for pattern extraction.

Notes

  1. Frank Rosenblatt, The Perceptron: A Perceiving and Recognizing Automaton (Buffalo, NY: Cornell Aeronautical Laboratory, 1957), 1.
  2. “ImageNet Large Scale Visual Recognition Challenge,” Competition, ILSVRC, accessed on September 9, 2018, http://www.image-net.org/challenges/LSVRC.
  3. Wendy Hui Kyong Chun, “On Software, or the Persistence of Visual Knowledge,” Grey Room 18 (2004): 47.
  4. Orit Halpern, Beautiful Data: A History of Vision and Reason since 1945 (Durham, NC: Duke University Press, 2014), 17.
  5. Michel Foucault, Naissance de la biopolitique : Cours au Collège de France, 1978-1979 (Paris, France: Seuil/Gallimard, 2004), 22.
  6. Franziska Roesner, Brian T. Gill, and Tadayoshi Kohno, “Sex, Lies, or Kittens? Investigating the Use of Snapchat’s Self-Destructing Messages,” in Financial Cryptography and Data Security, eds. Nicolas Christin and Reihaneh Safavi-Naini (Berlin, Germany: Springer, 2014): 64-76.
  7. Dana H. Ballard, Geoffrey E. Hinton and Terrence J. Sejnowski, “Parallel Visual Computation,” Nature 306 (1983): 21–26.
  8. Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning (Cambridge, MA: The MIT Press, 2016), 452.
  9. Abe Davis, Michael Rubinstein, Neal Wadhwa, Gautham Mysore, Frédo Durand and William T. Freeman, “The Visual Microphone: Passive Recovery of Sound from Video,” ACM Transactions on Graphics 33, no. 4 (2014): 79:1–79:10.
  10. Benjamin Mayo, “Snapchat launches exclusive face filters for iPhone X that take advantage of the TrueDepth camera,” 9to5Mac, April 6, 2018, https://9to5mac.com/2018/04/06/snapchat-iphone-x-face-filters.
  11. Tung-Hui Hu, A Prehistory of the Cloud (Cambridge, MA: The MIT Press, 2016), xxvii.
  12. Philip E. Agre, “Surveillance and Capture: Two Models of Privacy,” The Information Society 10, no. 2 (1994): 101-103.
  13. Nikolas Rose, Powers of Freedom: Reframing Political Thought (Cambridge, UK: Cambridge University Press, 1999), 36.
  14. Peter Miller and Ted O’Leary, “Mediating Instruments and Making Markets: Capital Budgeting, Science and the Economy,” Accounting, Organizations and Society 32, no. 7-8 (2007): 703.
  15. Miller and O’Leary, “Mediating Instruments and Making Markets,” 729.
  16. Martin Giraudeau, “Business Plans (États-Unis, second XXe siècle),” in Les Projets : Une histoire politique (XVIe – XXIe siècles), eds. Frédéric Graber and Martin Giraudeau (Paris: Presses des Mines, 2018), 207.
  17. Giraudeau, “Business Plans,” 209.
  18. Magali Dubosson-Torbay, Alexander Osterwalder and Yves Pigneur, “eBusiness Model Design, Classification and Measurements,” Thunderbird International Business Review 44, no. 1 (2002): 5-23.
  19. Peter Weill and Michael Vitale, Place to Space: Migrating to eBusiness Models (Cambridge, MA: Harvard Business School Press, 2001).
  20. Frédéric Delmar and Scott Shane, “Legitimating First: Organizing Activities and the Survival of New Ventures,” Journal of Business Venturing 19, no. 3 (2004): 385-410.
  21. Liliana Doganova and Marie Eyquem-Renault, “What Do Business Models Do? Innovation Devices in Technology Entrepreneurship,” Research Policy 38 (2009): 1560.
  22. Doganova and Eyquem-Renault, “What Do Business Models Do?,” 1561.
  23. Snap Inc., “Form S-1 Registration Statement under the Securities Act of 1933,” Securities and Exchange Commission, February 2, 2017, https://www.sec.gov/Archives/edgar/data/1564408/000119312517029199/d270216ds1.htm.
  24. Snap Inc., “Form S-1 Registration Statement,” 60.
  25. Caroline A. Bartel and Raghu Garud, “The Role of Narratives in Sustaining Organizational Innovation,” Organization Science 20, no. 1 (2009): 107-117.
  26. Snap Inc., “Form S-1 Registration Statement,” 2.
  27. Snap Inc., “Form S-1 Registration Statement,” 68.
  28. Snap Inc., “Form S-1 Registration Statement,” 65.
  29. Snap Inc., “Form S-1 Registration Statement,” 65.
  30. Snap Inc., “Form S-1 Registration Statement,” 27.
  31. Snap Inc., “Form S-1 Registration Statement,” 66.
  32. Snap Inc., “Form S-1 Registration Statement,” 115.
  33. Steven Shapin, “Pump and Circumstance: Robert Boyle’s Literary Technology,” Social Studies of Science 14, no. 4 (1984): 491.
  34. Billy Gallagher, “A Tale of Two Patents: Why Facebook Can’t Clone Snapchat,” TechCrunch, June 22, 2014, https://techcrunch.com/2014/06/22/facebook-slingshot-snapchat-patents.
  35. Annette Vee, “Carving up the Commons: How Software Patents Are Impacting Our Digital Composition Environments,” Computers and Composition 27, no. 3 (2010): 185.
  36. Inventions Patentable, 35 U.S.C. §101 (1952).
  37. Vee, “Carving up the Commons,” 185-186.
  38. Snap Inc., July 8, 2014. Apparatus and Method for Single Action Control of Social Network Profile Access. United States patent US 8,775,972 B2.
  39. Snap Inc., April 23, 2013. Single Mode Visual Media Capture. United States patent US 8,428,453 B1.
  40. Snap Inc., June 23, 2016. Gallery of Video Set to an Audio Time Line. United States patent application publication US 2016/0182875 A1.
  41. I compiled the patents and patent applications with which I worked in July 2018; I therefore did not consider patent applications published after that date.
  42. All the patents I reviewed for this article, as well as the Excel document recording the category under which each of them was classified, have been archived on this GitHub page: https://github.com/tlricher/infrastructure.
  43. The Excel document stored on the GitHub page mentioned above specifies the code (e.g., Snap 2017l) that was attributed to each patent.
  44. Donald Knuth, The Art of Computer Programming, Vol. 1: Fundamental Algorithms (Reading, MA: Addison-Wesley, 1997), 3-4.
  45. Jay David Bolter and Richard Grusin, Remediation: Understanding New Media (Cambridge, MA: The MIT Press, 2000), 272-273.
  46. Hu, A Prehistory of the Cloud, xxvi.
  47. Ian Hacking, “Making Up People,” in The Science Studies Reader, ed. Mario Biagioli (New York, NY: Routledge, 1986), 165.
  48. See Lukasz Piwek and Adam Joinson, “‘What do they snapchat about?’ Patterns of use in time-limited instant messaging service,” Computers in Human Behavior 54 (2016): 358-367 and Rachel Grieve, “Unpacking the Characteristics of Snapchat Users: A Preliminary Investigation and an Agenda for Future Research,” Computers in Human Behavior 74 (2017): 130-138.
  49. Madeleine Akrich and Bruno Latour, “A Summary of a Convenient Vocabulary for the Semiotics of Human and Nonhuman Assemblies,” in Shaping Technology/Building Society: Studies in Sociotechnical Change, eds. Wiebe E. Bijker and John Law (Cambridge, MA: The MIT Press, 1992), 261.
  50. Kelly A. Gates, Our Biometric Future: Facial Recognition Technology and the Culture of Surveillance (New York, NY: NYU Press, 2011), 187.
  51. Gates, Our Biometric Future, 187-188.
  52. Gates, Our Biometric Future, 187-188.
  53. Kaushik Sunder Rajan, Biocapital: The Constitution of Postgenomic Life (Durham, NC: Duke University Press, 2006), 34.
  54. Halpern, Beautiful Data, 206.
  55. Halpern, Beautiful Data, 206.
  56. Hu Han and Anil Jain, “Age, Gender and Race Estimation from Unconstrained Face Images,” MSU Technical Report (2014): 1-9.
  57. Clare Garvie, Alvaro Bedoya, and Jonathan Frankle, “The Perpetual Line-Up: Unregulated Police Face Recognition in America,” Center on Privacy & Technology at Georgetown University, accessed on August 4, 2019, https://www.perpetuallineup.org.