Datafication in the Historical Humanities: Reconsidering Traditional Understandings of Sources and Data

Jun 02, 2022 - Jun 04, 2022

International Conference and Workshop at GHI Washington | Conveners: German Historical Institute Washington in collaboration with Luxembourg Centre for Contemporary and Digital History (C2DH), Chair of Digital History at Humboldt Universität zu Berlin, Consortium Initiative NFDI4Memory, Roy Rosenzweig Center for History and New Media, and Stanford University, Department of History

Given the uncertainties around Covid-19 and to open the conference to a wider audience, we changed to a hybrid format, combining online and in-person attendance. Check out our Program and read our Event Registration to find out which events we will be hosting in hybrid or virtual form, such as our Virtual Poster Session. We are looking forward to three exciting days of discussion!

Conference Website

The Fifth Annual GHI Conference on Digital Humanities and Digital History will revolve around the concept of “datafication,” that is, the production of and the shift toward digital representations of historical sources as a prerequisite for storage, access, and analysis, not to mention their transmission and publication online.

Historians outside the field of quantitative social history rarely consider their objects of study as “data,” even when they look at documents or paintings in digitized versions on their screens. These witnesses of human lives call for emotional, imaginative, and empathetic engagement and thus cannot be reduced to mere commodities to fuel a new kind of computational research, despite what the slogan “data is the new oil” might suggest. Sources, not data, we might thus insist, are at the heart of historical research. On the other hand, we readily observe that gathering, organizing, sorting, excluding, and searching for selected information from (digital) sources are routine processes of historical investigation. Data-centered research, seen from this angle, seems more a continuation with updated tools and technologies than a radical break from traditional methods of inquiry. Johanna Drucker has forcefully pointed out that we should reconceive all data as “capta,” taken and not simply given as the designation might imply. Data is therefore not a natural representation of something pre-existing, but created as part of a knowledge-production process open to investigation and critique. Adopting Christof Schöch’s working definition, data in the humanities can therefore be considered a digital, selectively constructed, machine-actionable abstraction representing some aspects of a given object of humanistic inquiry.

While we have seen a convergence in data modeling in text-oriented humanities (TEI), library science (FRBR), and for cultural heritage information (CIDOC CRM), no conceptual framework for modeling, curating, and managing data in historical research has gained wide adoption. The one possible exception comes from Wikidata, a project that has been conceptualized and populated with very little input from within our field. Ruth Mostern and Marieka Arksey argue that there are still no standards to emulate due to the small number of historical datasets currently available, and their heterogeneous nature. However, historical data repositories are “unlikely to realize their promise until the social life of data becomes part of the profession.” The current push by funders for National Research Data Infrastructures, such as NFDI in Germany, both adopts this idea of making data sharing a part of professional practice and calls for interdisciplinary research. Such activities are premised on the idea of the “social life of data,” the concept that research data and models designed and collected for very specific questions might become useful for a broader audience. The support for the re-use of both technical infrastructure and the models used for data collection will jumpstart their wider adoption.

The obstacles to such an undertaking are simultaneously conceptual, structural, and practical: modeling the entire range of historical investigation is a call to model the entire world, from the very beginning until now. This raises the question of whether these models are not in principle culture-bound, which excludes a global approach per se and leads to the question of the extent to which it is possible to find a generic conceptualization within a subgroup alone. However, especially in the context of datafication processes, the question of data modeling is a crucial one, since it lays the groundwork for historical research for future generations. It is a time-consuming and cost-intensive process that needs to be well conceived and thought through. There is a great risk of creating path dependencies that later limit our ability to work with this data.

Historical research often takes a nonlinear or even meandering path through many phases of uncertainty and redefinition. Just like traditional source-based studies, a data-driven investigation will not usually start with a predefined set of sources and questions, but will extend and refine the scope, the structure, and the rules for data entry continuously as new questions arise and additional material is encountered. In addition, we notice a lack of tradition in collaborating in larger teams that include programmers, archivists, librarians and other information professionals. Therefore, humanist data often has quite irregular shapes and does not meet the expectations of a building block that can easily be incorporated into larger structures outside the context of its original research. 

For the conference, we would like to focus on the still mostly manual, therefore labor-intensive, and intellectually challenging task of transforming sources and collections into comparatively small but highly rigorous “handcrafted” datasets. How are the archives for such projects defined, developed, and managed? How do we select primary sources, deal with collections, and create data models for their digital representations? With whom do we collaborate in this process? What logic and constraints shape the normalization of information when it is entered for comparison and analysis, and, just as important, what is discarded and how is absent or ambivalent data handled? What standards guide our datafication processes, which tools support us, and what is the right scale to use? At the same time, which explicit and implicit limitations do such decisions impose on us? How does datafication create new archives, as Vincent Brown argues, defined by the tools used to explore them and the design decisions made during their creation? What could be the general design principles we follow in the process of datafication of historical sciences?

Open Sessions


Given the continuing uncertainties around Covid-19, and to open the conference to a wider audience, we have decided to present the Fifth GHI Conference on Digital Humanities and Digital History, “Datafication in the Historical Humanities: Reconsidering Traditional Understandings of Sources and Data,” as a hybrid event taking place from June 2 to 4, 2022.

Remote participants will be able to listen to our two Keynotes on current perspectives in digital history, join our Workshops to discover new websites and software, and explore the newest research by digital historians in our Virtual Poster Session! Given time differences, and to enable a more manageable exchange, the conference’s panel discussions will only be open to invited participants.
 

Keynote I: “Table for One: Anecdotes on the Cultures and Challenges of Data(fication) for Historians” 

June 2, 9:30am – 11:00am (ET)
Zoe LeBlanc (University of Illinois, Urbana-Champaign)
 

Keynote II: “What’s in a Footnote? Datafication and the Consequences for Quality Control in Historical Scholarship” 

June 3, 10:00am – 11:00am (ET)
Pim Huijnen (Utrecht University)
 

Virtual Poster Session

June 4, 9:00am – 10:30am (ET)
 

Conference Workshops

June 2, 11:30am – 1:00pm (ET) & June 4, 11:00am – 12:30pm (ET)

Call for Papers


At this conference, we will discuss the practical aspects of datafication in conjunction with theoretical, methodological, ethical, and legal reflections on the role of data within the field of digital history in a transatlantic context. We seek contributions from implementers and stewards of systems and standards for historical data collecting and modeling, schemas, ontologies, and knowledge graphs, from laborers and practitioners of data production, and from researchers building datasets or integrating pre-existing ones into their research. We welcome critical reflections on the process of datafication, its epistemological prerequisites, its consequences, and all the different decisions it involves, as well as on the “social life of data,” questions of ownership, peer review, sustainable storage, the publication and sharing of data, its responsible use, and the pitfalls and costs involved in creating, storing, and accessing historical datasets.

The conference is expected to begin with a day of workshops followed by two days of paper presentations. Please submit proposals by April 8, 2021 (extended from April 1, 2021) for either or both of the following options:

  1. 20-minute presentations at the conference
  2. workshops on particular digital tools or standards of one to two hours. Please include a suggested schedule and intended participant learning outcomes.


We are currently planning this event as an onsite workshop and conference. However, given the uncertainties around Covid-19, we might change to a hybrid format, combining online and in-person attendance, or an entirely online event depending on the health situation and possible travel restrictions. The dates will remain the same.

Possible conference topics include (but are by no means limited to):

  • relations between (historical) sources and research data
  • long-term implications of datafication
  • long-term consequences of decisions in the concrete process of datafication
  • chances and limits of shared modeling and conceptualization, its sustainability and acceptance
  • the differences and relationship between research-driven and curation-driven approaches in data generation
  • our expectations on historical research data
  • data management systems for historians
  • (reusable) knowledge representation for historical data
  • data publishing: quality, accessibility and representativeness
  • legal and ethical challenges to collecting and sharing historical data
  • the history of datafication in historical method and practice


Please upload a short CV and paper abstract of no more than 500 words by April 8, 2021 (extended from April 1, 2021) at the GHI platform. Selected participants (one presenter per talk or workshop) will receive an individually calculated lump sum to support travel and accommodation costs. For further information regarding the event’s format and conceptualization, please contact Jana Keck (keck@ghi-dc.org) or Atiba Pertilla (pertilla@ghi-dc.org).