What are Metadata?
In the twenty-first century, the term “metadata” has appeared in the popular news press with increasing frequency. For instance, recent arguments made by proponents of mass surveillance have focused on the fact that the content of messages may be irrelevant for spy agencies; such organizations are instead more interested in knowing who spoke to whom, a type of metadata, rather than the full description of what was said. What is clearly of greater import to them is that which can be inferred by the blank spaces of content when one knows the circumstances of transmission and reception. It may be that these structures are “just metadata,” as Senator Feinstein defensively claimed in 2014, but it is also clear that metadata alone provide sufficient clues to profile individuals, often with chilling consequences.
The term “metadata” first arose in the 1960s but came to prominence in the 1970s context of Database Management Systems (DBMSs) (Vellucci). In the usual definitions, the term refers to “structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource”. It is meta- (beyond) data (from datum; a given thing). Furthermore, “metadata is often called data about data or information about information” and can be subdivided into structural and descriptive metadata (NISO 1). This difference between descriptive and structural metadata might be schematised as the difference between a row in a table that describes another object (descriptive) and a definition of what each column in this descriptive table means (structural). Descriptive metadata pertain to specific objects: the copyright that applies to a book, the ISBN of the object in question, and so on. Structural metadata refer to the formats of data containers: the layout on the page of a copyright declaration, the numerical format structure of an ISBN.
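The row-and-column schematisation can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the field names, the STRUCTURAL table, the validate helper and the placeholder title are hypothetical and are not drawn from NISO or from any cataloguing standard.

```python
import re

# Structural metadata: what each column of the descriptive table means
# and what format its values must take (hypothetical field names).
STRUCTURAL = {
    "title": {"meaning": "name of the work", "pattern": r".+"},
    "copyright_year": {"meaning": "year of the copyright claim", "pattern": r"\d{4}"},
    "isbn": {"meaning": "hyphenated ISBN-13", "pattern": r"97[89](-\d+){3}-\d"},
}

# Descriptive metadata: one row describing one specific object
# (placeholder values, not a real catalogue record).
DESCRIPTIVE = {
    "title": "An Example Book",
    "copyright_year": "2016",
    "isbn": "978-1-107-09789-6",
}


def validate(row, schema):
    """Return the names of fields whose values break the schema's format."""
    return [
        name
        for name, rule in schema.items()
        if not re.fullmatch(rule["pattern"], row.get(name, ""))
    ]


print(validate(DESCRIPTIVE, STRUCTURAL))
```

The descriptive row says something about a book; the structural table says something about the row. Nothing prevents a third table from describing the structural one, which is where the regress discussed in the following paragraph begins.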
Yet this only takes us so far. In order to fully conceptualise metadata, we must also have an understanding of what is meant by “data”. For metadata are themselves “just” data. Data are sometimes opposed to information, as its unstructured counterpart. So data + metadata = information. The format of descriptive metadata, however, can be described by structural metadata. It is also possible to conceive of a further level of description for structural metadata. And so on to an infinite regress. As Martin Mueller and John Unsworth put it with respect to data and metadata: “what counts as second-order, depends on the boundaries of the first order” (Mueller and Unsworth, n. pag.). This leads to the paradox that defining metadata as “information”, rather than as “data”, requires an infinite number of metadata elements, each defining the next. Data, on the other hand, can refer to a variety of things. In fact, in many conversations the word “data” can safely be replaced by the term “stuff” and still retain the same degree of specificity. Data can range from a few lines in a spreadsheet up to petabytes of quantitative material.
Information about information: in order to fully conceptualise metadata, we must also have an understanding of what is meant by “data”
[Image by gabitogol under a CC BY-SA licence]
Perhaps more relevantly for the context here, though, data can point, in some ways, to cultural artefacts. Those like Matt Jockers and Franco Moretti – conducting computational analyses of literature through methods ranging from stylometrics to visualisations and mapping techniques – process literary texts through an intermediate quantitative phase in the same way as other scientists. This results in analyses that can encompass corpora larger than it is possible for people to read while also noting trends in fiction that often go unremarked upon, such as the use of punctuation. Librarians cataloguing works of fiction can also conceive of novels as data; objects to be filed under the correct location according to the description in their metadata. Those responsible for the production of books – which, even in print, is a process that has long been defined and conditioned by the digital, even while thereby repressing new digital possibilities, as N. Katherine Hayles tells us – must think of paragraphs as a type of data to be inserted into structured, semantic XML markup (Hayles 6). Even our acts of normal (or critical) reading are a type of processing that involves contextualising the raw material of fiction, treating it as a kind of sensory input data that is then put through a series of interpretative moves. Of course, the term “data” when applied to fiction can provoke strong reactions. Those opposed to the quantification of artforms question whether novels really are “data” or whether works of art possess elements that elude analysis. Yet, at the same time, we can see, at the broadest level, how the term can apply.
Indeed, aesthetic forms themselves are surrounded by and represent metadata-like phenomena in ways with which any reader – either academic or lay – is implicitly familiar and from which much can be inferred. In this piece I want to argue for metadata as a form of inter-data reference that can be read/interpreted and that carries with it a type of “involuntary viral semantics” that can be said, in certain circumstances, to be political. Indeed, I will briefly here show what a literary approach can bring to the study of metadata and also what thinking about metadata can bring to literary studies. In fact, I would also argue that even when not analysing the explicit types of metadata that surround literature, there are many metaphorical applications of “metadata” that can profitably intersect with the methods of literary criticism.
What lies beneath: form and content
To begin this explanation and before reading any further, take a look at Figure 1. Most readers will instantly recognise this document, although it has no visible language. It is, of course, the copyright page from the front matter of a book with the textual elements obliterated. Yet before we even move beyond this, most viewers will have accurately recognised that the blocks separated by commas and obliterated by larger chunks are text. Without seeing any of the content or individual characters, the outline shape, flow and kerning here imply words, even without access to the actual language. The direction of the commas that I have left visible, along with the placement of the © sign, may also allow a reader to infer that this is a European alphabet with a left-to-right reading pattern.
Figure 1. Erasing copyright
Some of the characters can be inferred to be numbers, rather than text, however. This is because, as readers, we recognise certain structural and contextual features of metadata and infer the valid character sets and data types that must lurk below. For instance, take the lines that contain the © symbol. We know that the standard syntax for a copyright declaration in the English language is “© Firstname Lastname date” where “date” is a four-character numeric string representing the year A.D. of the claimed copyright. Given that both of these lines end with a roughly uniform block size that appears to be able to accommodate a Gregorian calendar date signifier, it is logical to assume that this block is a date and that the text’s contents are numerical.
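This inference from shape to data type can be made concrete. The sketch below is a hypothetical illustration rather than a description of how Figure 1 was produced: the regular expression encodes my assumption about the “© Firstname Lastname date” convention, the sample line uses placeholder names with an arbitrary year, and the mask helper is invented for the purpose.

```python
import re

# Assumed shape of an English-language copyright line: the © sign, a
# name made of non-digit characters, then a four-digit year.
COPYRIGHT_LINE = re.compile(r"©\s+(?P<holder>\D+?)\s+(?P<year>\d{4})")


def mask(text):
    """Replace every letter or digit with a block, keeping spaces and symbols."""
    return "".join("█" if ch.isalnum() else ch for ch in text)


line = "© Firstname Lastname 2016"  # placeholder line with an arbitrary year
match = COPYRIGHT_LINE.fullmatch(line)

print(match.group("holder"), "|", match.group("year"))
print(mask(line))
```

After masking, the letters and digits are indistinguishable as characters, yet the position and width of the final block still mark it out as the slot where the pattern expects a year.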
We can also do some character-type deduction on the ISBN lines of this page (these are the lines two thirds of the way down the page that contain a block, then a space, then five blocks separated by hyphens, then a space, then a block). I recognise this as an ISBN line because I know that any book publication is likely to have one and, in this case, there are two lines that contain ISBN-13-style formatting (e.g. 978-1-107-09789-6) with a block before that looks like it might say “ISBN” (four-character length) and a block after that might specify a publication type (“Hardback”/“Paperback” etc.). We can also infer some of the digits in this ISBN. For instance, it is highly probable that the first three digits are 978, since the vast majority of ISBN-13 identifiers start with this EAN prefix, although some now use 979. The next group appears to be a single digit, probably either a 1 or 0 (most likely a 1 given the spacing here), indicating that the book comes from an English-speaking area and validating my earlier assumption about the textual language/alphabet. The final check digit after the last hyphen appears to be a moderate-width character, wider than a 1, so perhaps a 6 or an 8.
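The final digit is in fact even more constrained than its width suggests, because an ISBN-13 check digit is computed from the twelve digits that precede it. The following is a minimal sketch of the standard calculation (weights alternating 1 and 3 from the left); the function names are my own, and the number tested is simply the example ISBN format quoted above.

```python
def isbn13_check_digit(first_twelve):
    """Compute the ISBN-13 check digit from the first twelve digits.

    Hyphens and spaces are ignored, so "978-1-107-09789" is accepted.
    """
    digits = [int(ch) for ch in first_twelve if ch.isdigit()]
    if len(digits) != 12:
        raise ValueError("expected exactly twelve digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn):
    """Check that a hyphenated or plain ISBN-13 carries the right check digit."""
    digits = [ch for ch in isbn if ch.isdigit()]
    if len(digits) != 13:
        return False
    return isbn13_check_digit("".join(digits[:12])) == int(digits[12])


print(isbn13_check_digit("978-1-107-09789"))  # prints 6
print(is_valid_isbn13("978-1-107-09789-6"))   # prints True
```

A reader who could recover the first twelve digits would therefore not need to guess between a 6 and an 8: the structure of the identifier determines its last character.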
As the opponents of mass surveillance argue, then, and as the above trivial example shows, we can deduce a great deal about the actual content of a message from its metadata. In the instance I have just given, I took a set of information that was itself metadata (the copyright page of a book is an item of metadata, describing the volume’s contents, its cataloguing information, its legal status etc.) and treated this as a message in its own right. When I erased the bulk of the content here so that just the structure was visible it was still possible to recover many parts of the object, or at least to identify the constituent parts. It was possible to infer the form and some of the contents because of the social conventions of language, structure, and genre within which every instance of communication must operate. We are all used, as readers, to navigating these conventions and to making inferences based upon them; they form a crucial part of interpretation. Indeed, as Barbara Herrnstein Smith has put it, “no judgement is or could be objective in the classic sense of justified on totally context-transcendent and subject-independent grounds” (Herrnstein Smith 6). No piece of fiction, for example, that depicts a future world can be read independently of the generic field of comparator works that we call “science” or “speculative” fiction; there is a metadata element called “genre” that conditions reception. These conventions are anticipated by authors who wish to create specific literary effects. Indeed, even a synchronic understanding of a literary text is one in which external social conventions provide contexts for every passage, even while other portions of the same work likewise inflect the meanings of preceding and following passages.
In some senses, due to the unending and metaphorical chain of signification implied in language, every sentence, every word, of a literary text is an item of metadata for another grapheme, structuring our understanding and provisionally conditioning reception.
Primarily, though, what I want to draw out here is that metadata, like a paratext and like other types of formal structuration (such as a codex’s materiality, which itself can be encoded as a form of metadata), provide semantic contexts for reading works. Some works of art also play with these forms (especially forms of “found poetry” but also those novels that deploy redaction), subverting their usual formal purpose. In the case of found/erasure poetry, it is clear that meaning comes from the obliteration of the original context, although the reader will usually want to reconstruct the primary work so that the poem can stand apart by its difference. For another example, in literary works where text has been redacted (seen frequently in the nineteenth-century novel with respect to place and person names, supposedly for reasons of libel/decorum) it is often possible for readers to nonetheless decode what lies beneath the redaction – and in its coyness the text proffers a challenge to undertake such sleuthing.
Interdata References and Involuntary Viral Semantics: the case of XUL
Metadata, though, even when not being used by political entities, are political and I want to unpack the ways in which metadata, even in their formal and non-literary variants, can be seen as semantic and intertextual (or interdata), in functional, cultural and political ways. The example that I will use in this article comes from the software-development world: Mozilla’s XML User Interface Language (XUL). By way of background, “XUL (pronounced ‘zool’) is Mozilla’s XML-based user interface language that lets [one] build feature rich cross-platform applications that can run connected to or disconnected from the Internet” (Mozilla, ‘The Joy of XUL’).
Among the core design principles of XUL, as with other forms of markup language, is the desire to separate the process of creating a user interface from that of programming; the “ideology of strict division between content and presentation – the very religion, as it were, of text encoding and databases”, as Alan Liu puts it (Liu 62). This form/content dichotomy is controversial in many ways for those who study aesthetics. For one, there has been much debate about whether, in works of art, this type of split is even valid. For another, it contributes, in part, to increasing specialisation and separation of labour, perhaps furthering Fordist principles of factory-line production in spaces that are otherwise creative.
XML documents, though, usually have unique namespaces that identify the type of content that falls under their jurisdiction (a namespace is, naturally, a kind of metadata attribute of an XML node). In the case of XUL the unique namespace URL can be found at: https://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul (the identifier itself is written with the http scheme). The most basic example of a XUL interface that implements a blank window would, therefore, look like this:
<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>
<window xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">
<!-- Other elements go here -->
</window> (Mozilla, ‘Creating a Window’)
We can clearly see the XML namespace (“xmlns”) attribute is here set to the XUL URL. This component of metadata tells both interpretative software and also me as a human reader that this is a XUL document and that, therefore, it is likely to describe a user interface.
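How “interpretative software” reads this component of metadata can be sketched with Python’s standard XML parser. This is an illustration under stated assumptions, not Mozilla’s own code: the document string is a pared-down blank window, the namespace identifier is written with the http scheme, and the namespace_and_name helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# The XUL namespace identifier, treated here as an opaque string.
XUL_NS = "http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"

# A pared-down blank window (assumed minimal form, for illustration only).
DOCUMENT = f"""<?xml-stylesheet href="chrome://global/skin/" type="text/css"?>
<window xmlns="{XUL_NS}">
  <!-- Other elements go here -->
</window>"""


def namespace_and_name(element):
    """Split ElementTree's '{namespace}local' tag into its two parts."""
    if element.tag.startswith("{"):
        namespace, _, local = element.tag[1:].partition("}")
        return namespace, local
    return "", element.tag


root = ET.fromstring(DOCUMENT)
namespace, local = namespace_and_name(root)

print(local)                # the element's local name
print(namespace == XUL_NS)  # whether the document declares itself as XUL
```

The parser never fetches the address; it only compares strings. Whatever the namespace says, it is carried verbatim into the name of every element that falls under it.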
However, canny readers and connoisseurs of 1980s pop-culture will have already spotted the joke/reference in the above. For, in the 1984 film, Ghostbusters, there are characters called “the keymaster” and “the gatekeeper”. There is also a character called Dana and a paranormal villain called Zuul (pronounced like “XUL”) who possesses Dana. In the case of the XUL namespace, this pun is completed. In the film, following her possession by the malevolent spirit, Dana speaks the line in a deep demonic voice: “there is no Dana. There is only Zuul”. For the user interface description language called XUL, which aims to thoroughly decouple application logic (which is perhaps analogous to some limited extent to data/content) from the description of the interface (metadata): there is no data. There is only XUL (Mozilla, ‘Mozilla XML Namespace’).
Geek humour: XUL’s developers have literalised the content/form separation with reference to the popular cult film Ghostbusters
[Image by Shannon Hayward under a CC BY-SA licence]
This reading makes functional sense and it is a good example of geek humour. The joke works by literalising the description of XUL’s content/form separation and then contextualising it within the framework of a cult film. The namespace URL provides a piece of actual information (that this is a XUL document) as well as additional semantic contexts: the developers are probably good natured people who enjoy the humour of Ghostbusters and are also working in a corporate environment that is flexible enough to allow them to release publicly facing documents that make such jokes.
There is, however, a potentially darker reading here. Ghostbusters was, indeed, recently ranked as number 10 in the National Review’s list of the “Best Conservative Movies of the Last 25 Years”. As the National Review puts it:
you have to like a movie in which the bad guy (William Atherton at his loathsome best) is a regulation-happy buffoon from the EPA, and the solution to a public menace comes from the private sector. This last fact is the other reason to love Ghostbusters: When Dr. Peter Venkman (Bill Murray) gets kicked out of the university lab and ponders pursuing entrepreneurial opportunities, a nervous Dr. Raymond Stantz (Dan Aykroyd) replies: “I don’t know about that. I’ve worked in the private sector. They expect results!” (Hayward, n. pag.).
Indeed, others have explicitly read Ghostbusters as a film that harbours a strong conservative narrative. John DeVore points out that “[a]cademia is an indisputable pillar of the liberal establishment, and they [Columbia University] fire our heroes [the Ghostbusters]. This proves that the Ghostbusters are not elites or swells. It also strengthens the conservative notion that academia is an intellectual racket that punishes free-thinkers and innovators” (DeVore, n. pag.). As Amanda Ann Klein puts it, Ghostbusters can be read as a “neoliberal AltAc [alternative academic career] fantasy” (Klein, n. pag.). Ghostbusters is, indeed, the tale of the university postgraduate dropouts who go on to make it through private enterprise, thwarting the meddling regulators of the Environmental Protection Agency in the process.
But this is now one of the contexts that surrounds the XUL framework and therefore applies to any work created with this framework. It is undoubtedly the case that this intertextual resonance and interpretation was not meant. Someone at Mozilla just liked Ghostbusters. However, as Roland Barthes noted, we cannot control how intertexts reflexively modify our own writings and computational cultures are not exempt from these contexts. Every intertextual and inter-data reference is a gamble on the future’s verdict on the present’s cultural norms.
Yet, by embedding cultural references, which must come with commensurate norms of interpretation, within a metadata format, the reach is much wider than elsewhere, since metadata enclose and describe other data. The reflexive modifications to meaning that we know to be a feature of intertextuality virally spread to other entities that interact with the metadata. Every interface created in XUL contains a hard-coded reference to Ghostbusters. This may sound far-fetched in terms of having any kind of inflection on real perceptions of such code but a reductio ad absurdum hypothetical historical analogy can bring this sobering truth to light. A librarian who might have decided, in 1915, to embed a reference to D.W. Griffith’s The Birth of a Nation in an emergent metadata standard, despite the controversy among right-thinking people over the film’s racism at even that time, would by now have caused outrage among any users: xmlns="http://www.an-organisation.nodomain/defense.of.their.Aryan.birthright".
As metadata purport to sit above and beyond data, I argue that the contamination of inter-data references and contexts creates a kind of involuntary viral semantics for data and texts that sit under a metadata format’s jurisdiction. As the standardisation of a metadata format becomes embedded, works cannot opt out from their affiliation. References in metadata, like those in XUL, to dynamically interpretable artefacts have knock-on effects upon the objects they describe. In other words, it is one thing to make a reference in one’s work. It is quite another to place a reference within every work of a specific type. It is this contagion of metadata that gives them their power; it is their attachment and super-positioning that spreads their politics to the works over which they hold jurisdiction. To embed reference within metadata is to impose constraint and contexts on all works that such metadata might describe.
CITATION: Martin Paul Eve, “On the Political Aesthetics of Metadata”, Alluvium, Vol. 5, No. 1 (2016): n. pag. Web. 30 March 2016. http://dx.doi.org/10.7766/alluvium.v5.1.04
Martin Paul Eve is Professor of Literature, Technology and Publishing at Birkbeck, University of London. He is a Project Director of the Open Library of Humanities, a member of the steering committee for JISC’s OAPEN-UK project, Chief Editor of Orbit, the open-access peer-reviewed e-journal on the writings of Thomas Pynchon, and is Senior Online Editor of Alluvium.
DeVore, John. ‘Why The Ghostbusters Are Conservative Icons’, Medium (2013) [accessed 12 December 2015]: <https://medium.com/@johndevore/why-the-ghostbusters-are-conservative-icons-9d6d8e159485#.2tyls579y>
Hayles, N. Katherine. How We Think: Digital Media and Contemporary Technogenesis (Chicago: University of Chicago Press, 2012).
Hayward, Steven F. ‘The Best Conservative Movies of the Last 25 Years’, National Review Online (2009) [accessed 12 December 2015]: <http://www.nationalreview.com/corner/177234/10-best-conservative-movies-last-25-years-steven-f-hayward>
Herrnstein Smith, Barbara. Belief and Resistance: Dynamics of Contemporary Intellectual Controversy (Cambridge, MA: Harvard University Press, 1997).
Klein, Amanda Ann. ‘GHOSTBUSTERS as Neoliberal AltAc Fantasy’, Judgmental Observer (2013) [accessed 12 December 2015]: <http://judgmentalobserver.com/2013/10/16/ghostbusters-as-neoliberal-altac-fantasy/>
Liu, Alan. ‘Transcendental Data: Toward a Cultural History and Aesthetics of the New Encoded Discourse’, Critical Inquiry, 31.1 (2004): 49–84. <http://dx.doi.org/10.1086/427302>.
Mozilla Foundation. ‘Creating a Window’, Mozilla Developer Network (2015) [accessed 12 December 2015]: <https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XUL/Tutorial/Creating_a_Window>
Mozilla Foundation. ‘Mozilla XML Namespace: XML User Interface Language (XUL)’ (2015) [accessed 12 December 2015]: <https://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul>
Mozilla Foundation. ‘The Joy of XUL.’ Mozilla Developer Network (2015) [accessed 12 December 2015]: <https://developer.mozilla.org/en-US/docs/The_Joy_of_XUL>
Mueller, Martin and John Unsworth. ‘Notes towards a User Manual of Monk.’ MONK (2007) [accessed 2 January 2016]: <https://apps.lis.illinois.edu/wiki/display/MONK/Notes+towards+a+user+manual+of+Monk>
NISO. National Information Standards Organization. Understanding Metadata (Bethesda, MD: NISO Press, 2004).
Vellucci, Sherry L. ‘Metadata’, Annual Review of Information Science and Technology, 33 (1998): 187–222.