Archives, Access and AI: Working with Born-Digital and Digitised Archival Collections

“As all historians know, the past is a great darkness, and filled with echoes. Voices may reach us from it; but what they say to us is imbued with the obscurity of the matrix out of which they come; and, try as we may, we cannot always decipher them precisely in the clearer light of our own day.”
Margaret Atwood – The Handmaid’s Tale
Digital archives are transforming the Humanities and Social Sciences. Not so long ago, historians, literary scholars and sociologists of culture would read letters and other papers preserved in Special Collections Libraries. Of course, this analogue world has not disappeared, but the digital revolution has profoundly changed the way we encounter archives. Digitised collections of newspapers and books have pushed scholars to develop new, data-rich methods. Born-digital archives are now better preserved and managed thanks to the development of open-access and commercial software. Digital Humanities have moved from the fringe to the centre of academia.
Yet, the path from the appraisal of records to their analysis is far from smooth. There are three main challenges:
First, the volume of digital archives makes it extremely difficult for archivists to assess record sets. Automation is no longer a choice; it is a necessity, particularly in the case of unstructured records. Machine learning is becoming an integral part of archival processes, complementing rather than replacing human skills. To manage the sheer bulk and potential sensitivity of records, archivists may also rely more and more on records creators to help them make appraisal and selection decisions at the point of deposit.
Second, born-digital collections are too often inaccessible due to technical and data protection issues. These “dark” archives contain vast amounts of data essential to Humanities scholars – including email correspondence, drafts of manuscripts, digital photos and videos. We urgently need to unlock these data to fully make sense of our cultural heritage.
Third, data science and AI are becoming essential tools in the Humanities, but few scholars have been trained to master these research methods. This skills gap affects the training we offer to students, which continues to centre on qualitative methodologies. As Ted Underwood points out, “to prepare students for a world where information is filtered by computers, we will need a stronger alliance between the humanities and math.” This requires combining traditional humanistic methods with data-rich approaches in order to analyse records at scale.
Automation, Access and AI are becoming keywords to decipher our history. We do not suffer from a lack of records, but from too many records – often locked away in dark archives. To paraphrase Margaret Atwood, voices of the past may emerge from this darkness, but they cannot always be deciphered in the clearer light of the present. Access to dark archives is central, but needs to be complemented with data-rich methodologies.
How can we shed light on born-digital and digitised archives? What is the role of automation and AI? How can we give greater access to archives currently closed to the public? What are the best ways to involve donors and creators of born-digital archives, and to work with them actively and collaboratively rather than treating them as a passive part of the deposit process?
This three-day conference in London (15-17 Jan. 2020) will bring together archivists, humanities scholars, computer scientists and policy makers to discuss the way we work with archives now. At a time of rapid change, it is essential to harness the data revolution to bring our cultural heritage from obscurity to light.
Please send a 300-word proposal and a one-page CV by 10 June 2019 to Lise Jaillant and Victoria Stobo.

This conference is partly funded by an AHRC Leadership Fellowship awarded to Dr Jaillant for “Survival of the Weakest: Preserving and Analysing Born-Digital Records to Understand How Small Poetry Publishers Survive in the Global Marketplace.”