Cambridge University Library


A partnership involving Cambridge University Library and led by the National Library of Scotland has secured £230,958 funding from the Wellcome Trust to archive and explore online resources about health information and the Covid-19 pandemic.

Titled ‘The Archive of Tomorrow: Health Information and Misinformation in the UK Web Archive’, the project will examine how we archive websites and other online information about health.

Joseph Marshall, Associate Director of Collections Management at the NLS, said: “The Covid-19 pandemic has contributed to a global crisis of information vs misinformation which has played out mostly online. Government and medical websites have changed on a daily basis as new information emerges, and there has been a massive proliferation of opining on social media and other online publications about Coronavirus.

“Health advice, data and scientific evidence have been contested, revised, used and misused with dramatic and sometimes tragic consequences, and yet the digital record of this is fragile and difficult to access. How easy will it be in a few years’ time to source the tweets, blogs and news stories from the past 18 months and will we be able to make sense of it all? These are the questions we’ll be asking.”

Alongside the National Library of Scotland and Cambridge University Library, the project partners include Edinburgh University Library and Bodleian Libraries, Oxford, with key roles based at each institution, forming a network of expertise and investigation. The British Library will play a key supporting role in the project.

Caylin Smith, Head of Digital Preservation at Cambridge University Library, said: "The current period is a time where thinking critically about the information we consume, particularly online, is more important than ever. Web archives form an invaluable resource for capturing, as well as enabling users to interrogate, the present moment as it's experienced, as well as the recent past.

"But the web is a vast and volatile resource where information can change, disappear, be misleading, be factually incorrect, or a combination of these things. Online information relating to health, and especially the Covid-19 pandemic, has a wide reach and potentially global consequences. This project will build upon the UL's efforts to collect and archive the pandemic, as well as its role as a UK Legal Deposit Library and shared responsibility for the UK Web Archive."

In 2020, Cambridge University Library launched a separate appeal asking residents of the city and beyond to help build a collaborative history of the Coronavirus outbreak; the appeal has received hundreds of submissions from the public.

The UK Web Archive is a partnership of UK Legal Deposit Libraries. Legal deposit libraries are entitled by law to collect anything published in the UK. The UK Web Archive collects and preserves UK-related web content, including large-scale automated capture, curated collections, and webpages nominated by a range of partners and stakeholders. The partnership has to date archived billions of webpages.

The ‘Archive of Tomorrow’ project will preserve 10,000 sites relating to health, both official and unofficial, and use this collection to make web archives more accessible for researchers and members of the public. Even where a contested website or webpage has since been deleted, it may still be possible to archive it through this project so that it can be included in research into the proliferation of misinformation.

Marshall added: “Libraries and archives have always strived to collect the stories of our times, and this is more important than ever when information is literally a matter of life and death. We will ensure a wide representation of diverse and otherwise un-collected sources. And we will tackle some thorny questions including how we can ethically capture and describe misinformation and fake news for posterity. It’s our hope that a project like this will help us make sense of events of the past 18 months, and ultimately improve our ability to interrogate factual information and misinformation in the future.”

The 14-month ‘Archive of Tomorrow’ pilot project, which will involve a dedicated project team, will start in December 2021. Its specific aims are to curate a new collection of websites within the UK Web Archive under the theme of ‘Health and Misinformation’; use the collection to explore options for metadata, computational analysis, ethics and rights issues; build a research network across a range of disciplines; and make recommendations to make web archives more representative, inclusive, and open for health research.