Big Data & Uncertainty Conference

Following THATCamp Kansas, on Saturday, September 22, 2012, IDRH will be hosting a one-day conference on Big Data & Uncertainty in the Humanities.

This conference seeks to address the opportunities and challenges humanistic scholars face with the ubiquity and exponential growth of new web-based data sources (e.g. electronic texts, social media, and audiovisual materials) and digital methods (e.g. information visualization, text markup, crowdsourcing metadata).

“Big data” is any dataset that is too large to be analyzable with traditional means (whether e.g. manual close readings or database queries). Developments in cloud computing, data management, and analytics mean that humanists and allied scholars can analyze and visualize larger patterns in big data sets. With these opportunities come the challenges of scale and interpretation; we have moved from the uncertainty resulting from having too little data to the uncertainty implicit in large amounts of data.

What does this mean for how humanists structure, query, analyze and visualize data? How does this change the questions we ask and the interpretations we assign? How do we combine the best of a macro (larger-pattern) and a micro (close reading) approach? And how is interpretative and other uncertainty modeled?

The conference is free and open to the public.

Preliminary List of Speakers and Topics

The Humanities in a Digital Age

Gregory Crane
Editor-in-Chief, Perseus Digital Library
Note: This presentation will take place on Thursday, September 20, 4:30 PM, Watson Library 3 West


We now live in a pervasively digital world and Humanists have an opportunity to rethink our goals. On the one hand, we can now develop research projects that are broader and deeper in scope than was feasible in print culture. First, we can trace ideas across dozens of languages and thousands of years. Second, the explosion of high-resolution digital representations of source texts, objects, and archaeological data sets has, in some quarters, transformed the traditional (and out of fashion) task of editing. At the same time, the shift to a digital world does not simply allow professors to produce more specialist publications. Rather, the explosion in source materials available to a global net public requires advanced researchers and library professionals to draw upon student researchers and citizen scholars as essential collaborators. One possible outcome is a new, decentralized and cosmopolitan republic of letters supporting a global dialogue of civilizations. No particular outcomes are guaranteed, and our actions and decisions as Humanists in the present can have far-reaching consequences.

Gregory Crane is Professor of Classics, Adjunct Professor of Computer Science, and Winnick Family Chair of Technology and Entrepreneurship at Tufts University. He is also Editor-in-Chief of the Perseus Project. He has been elected a Humboldt Professor in Digital Humanities at the University of Leipzig and hopes to establish the first transatlantic laboratory in the Digital Humanities.

False Positives: Opportunities and Dangers in Big Text Analysis

Geoffrey Rockwell
Professor of Philosophy and Humanities Computing at the University of Alberta, Canada
Note: This presentation will take place on Friday, September 21, 4:30 PM, Watson Library 3 West

Phylogenetic Futures: Big Data and Design Fiction

Kari Kraus
Assistant Professor, College of Information Studies and the Department of English at the University of Maryland


This talk seeks to position phylogenetics within the broader frameworks of both big data and the design disciplines. Originating in “big data” applications of evolutionary biology, phylogenetic methods are increasingly used to reconstruct the hereditary relationships of cultural data sets in the social sciences and humanities, including textual criticism, historical linguistics, and anthropology; I will provide examples of each. However, I frame this talk within the context of a larger project that seeks to invert the temporal orientation of phylogenetics so that its key insights can be used to imagine the future as well as reconstruct the past; and to refigure phylogenetics as a design discipline whose underlying commitments and techniques accommodate broad swathes of material culture, such as images, hardware, games, and other objects.

By “design” I have in mind not large-scale visualizations of cultural data sets, but, paradoxically, small-scale prototypes of design fictions. Coined by Bruce Sterling and further elaborated by Julian Bleecker at Nokia Design, the term “design fiction” is used to denote the mocking up of artifacts that embody our ideas about the future—what Stuart Candy has called “object-oriented futuring.”[1] Design fiction (or “tangible futures”) can be thought of as science fiction re-imagined for DIY and Maker Culture. Examples include Wired Magazine’s long-running series “Artifacts from the Future,” which approaches the design fiction space in a playful spirit (e.g., wifi-enabled, location-aware contact lenses or a mood ring that controls rather than reflects one’s emotional state);[2] and Branko Lukic and Barry Katz’s NONOBJECT, a slick coffee table book and iPad app filled with marvelous counterfactual objects of “as yet undiscovered materials, imagined manufacturing processes, and invented rules.”[3] This class of design artifacts moves the needle of humanities research away from the cultural record of the known and toward the stranger and more speculative realms of the unknown. A larger theme of this talk is therefore to argue for an Experimental Humanities and a DIY Humanities as necessary adjuncts to the Big Humanities.

[1] Stuart Candy, “The Sceptical Futuryst: Object-oriented Futuring,” The Sceptical, 2 Nov. 2008
[2] “Found: Artifacts from the Future,” Wired
[3] Nonobject (MIT P, 2010)

Museum Collecting in the Age of ‘Big Data’: Opportunities for Collaboration

Peter Welsh
Professor and Director of Museum Studies, University of Kansas


Museums, particularly museums of cultural history, face a constant challenge of deciding which objects to add to the collection, knowing that acquiring any object brings obligations to provide long-term stable environments, appropriate documentation, and ongoing access. Established practice is for each museum to evaluate potential acquisitions in accordance with its own written collections policy. However, independent action by institutions has led to significant duplication of objects, straining the resources of each museum. Some museums are exploring collaborative approaches to this “big data” problem by sharing information on collections, reallocating objects among museums, and making collections available to one another for exhibitions.

A World in a Grain of Sand: Uncertainty and Poetry Corpora Visualization

Katharine Coles
Professor, Department of English, University of Utah
Julie Lein
PhD in Literature and Creative Writing, University of Utah


Under a grant funded by the National Endowment for the Humanities in the US and the Arts and Humanities Research Council, Economic and Social Research Council, and JISC in the UK, we recently embarked on a poetry visualization project with a group of computer scientists at Oxford University. Together we are working to see whether and how new software that treats poems as large data sets might help literary scholars and poets make observations, interpretations, and poems that might not otherwise be possible. While our initial visualizations will help scholars to perceive and analyze sonic devices in individual poems, eventually we hope to be able to use these original visualization tools to analyze large poetry corpora, such as those available online through organizations like the Poetry Foundation and the Academy of American Poets.

The project fits into a growing tradition of collaboration between computer scientists specializing in visualization and simulation, and researchers in scientific fields as diverse as neurology, economics, and combustion. Often richly productive, these partnerships also require flexibility, openness, and intellectual generosity from all members in order to navigate differences in approach, understanding, and expectation. If boundaries between scientific disciplines may sometimes seem enormous, the boundaries between computer scientists and literary scholars may appear insurmountable. Computer scientists by training are likely to be more conversant with scientific disciplines than literary disciplines. Conversely, though literary scholars routinely use computers as tools, we know relatively little about their capacity to deal with complex systems like literary texts and the uncertainties they embody. Likewise, though we are used to considering texts as complex systems embodying uncertainty, we are not accustomed to thinking of them in terms of data.

During our presentation, we will share preliminary attempts at visualization, discuss the directions they suggest for creating visualizations that will be useful to literary scholars, and talk about the collaborative process that has brought us to this point.

Poems are small on the outside but large on the inside—not only in how they use tiny elements of meter, rhyme, and literary figure to access theme, but also in the complexity with which their elements interact. In this, they are unique in how they help us to understand scale and scope. Blake was able “To see a world in a grain of sand.” We see poems as living things, as complex in their movements as a brain or a heart. We will talk about how our team has struggled together to see how data inherent not only in vast digital libraries, but also in a single poem, is “big.” In this sense, we are confident that exploring poems in terms of “big data” might actually enhance, rather than compete with, close readings.

Reading Genres: Exploring Massive Digital Collections From the Top Down

Ben Schmidt
PhD candidate in History, Princeton University, and Graduate Fellow, Cultural Observatory at Harvard


At what scale can digital analysis address live questions in the humanities? On the one hand, humanists have long cultivated expertise in elucidating meaning from a single text or author; on the other, increasing numbers of scientists are drawn to massive digital corpuses by the appeal of describing ‘culture’ writ large. While digital reading promises only modest improvements to traditional techniques, the scientific approach rightfully causes many humanists discomfort for simplifying the variegated worlds of historical experience out of existence. This paper proposes that the most fruitful applications of ‘big data’ will come from a scale only slightly smaller: the analysis of categories of authorship that can encompass hundreds of thousands of texts. The most important of these categories (academic disciplines and geographic regions, ethnicities and genders) have themselves long been central objects of humanistic research. But fully realizing the benefits offered to humanists by digitization requires developing strategies, infrastructure, and vocabularies for reading digital libraries from the top down.

This paper will address the technical and intellectual challenges this sort of reading presents. As a historian at the Harvard Cultural Observatory, I have helped design and build some of the largest collections of text-as-data designed for historical research. Previously the CO collaborated with Google to build the Google Ngram Viewer; since my arrival we have cultivated several terabytes of new textual data at a much more granular level, supported using cloud infrastructure and storage. These collections make trends in massive digital corpora with millions of texts and metadata (newspapers from the Library of Congress, books from the Internet Archive, journal articles from JSTOR) available for both quick visualization (through a public website, Bookworm) and more intensive statistical research.

Drawing on these collections, my paper will demonstrate how digital reading opens two specific massive cultural fields for new sorts of analysis. The first is geography. Millions of historical newspaper pages have been digitized and placed in the public domain; properly structured, this data can show subtle geographic variations that neither keyword search nor close reading could unearth. By tracking the impact of a simple federally mandated practice (spelling) across the late 19th and early 20th centuries, I will show how aggregate behaviors can map onto historiographical questions of the center and periphery. The second is academic discipline. Metaphors of the operation of mind let us explore questions of intellectual history, and recenter the subject from the individual to the discipline. While the individual-centered approaches of intellectual history privilege psychologists or philosophers, genre-based analysis suggests that fields like pedagogy are, perhaps, more influential.

The difficulties and uncertainties in establishing claims like these are not statistical. Rather, they involve questions about the coherence of metadata categories, the provenance of records, and the subtle biases of separately-collected sources. These difficulties are enormous. But they are also traditional questions of source interpretation, ones that need to be adapted but not abandoned to empower new techniques of reading to be effective.

What are You Going to Do with That Data?: Results of Needs Assessment of Humanities Scholars for Digital Collections

Harriett Green
English and Digital Humanities Librarian, University of Illinois


Library collections are an important source of data for digital humanists: Libraries digitize, transcribe and mark up their repositories of texts, images, and manuscripts to produce digital collections of primary source materials for humanities scholars to use in textual analysis, data mining, visualizations, and many other types of research methodologies. But are libraries producing digital materials that are optimized for digital humanities research? And are humanists getting all of the types of data that they need for their research to reach its fullest potential?

This paper will present the results of a recent needs assessment study of humanities scholars on the incorporation of digital materials into their research and their future needs when using digital collections. From October 2011 through May 2012, the author conducted a mass survey of English and History faculty and a series of interviews with faculty from art, design, and theater departments from twelve U.S. research institutions (primarily from the Committee on Institutional Cooperation consortium). The author will analyze the data drawn from the survey and interviews to explore three facets of humanities scholars’ use of digital collections as research data: access to the data in the collections, the types of digital content, and the structure of the data.

Fuzzy Categories: Bridging Categorial and Probabilistic Approaches to Computational Text Analysis

Patrick Flor
English, Computer Science, and Institute for Digital Research in the Humanities, University of Kansas
