AI for Cultural Heritage Hub (ArCH)
Cambridge’s GLAM institutions (galleries, libraries, archives, garden and museums) house millions of objects from across the globe, representing an unparalleled repository of cultural and natural history. However, challenges such as analogue formats, handwritten records, fragmented objects, multilingual sources and complex surfaces make much of this data difficult to access.
To address these challenges, the AI for Cultural Heritage Hub (ArCH) will draw on the convening power of Cambridge’s distributed network of collections to create a secure workspace and a Community of Practice that enable non-technical users (practitioners and academics) to analyse cultural heritage data with AI tools.
By encouraging collaboration between curators, researchers, IT professionals, and AI experts, the new hub will prototype adaptive AI solutions to enhance understanding of collections and identify a selection of AI tools to address these challenges.
This project is funded by ai@cam and the Accelerate Programme for Scientific Discovery, made possible by a donation from Schmidt Sciences.
It is led by the Cambridge University Library Research Institute, in collaboration with the Department of Applied Mathematics and Theoretical Physics and the Collections, Connections, Communities Strategic Research Initiative at the University of Cambridge.
Addressing cultural heritage challenges
ArCH’s six case studies will test the ability of AI methodologies to address three cultural heritage challenges.
Solving these challenges will serve researchers and wider society by benefitting cultural heritage practitioners, expert users and all those engaging with cultural heritage.
Challenge 1: Unlocking inaccessible data
Three of the case studies will address the challenge of unlocking inaccessible data by applying AI transcription and computer vision (CV) tools to digitised documents.
Case Study 1: AI tools will be used to convert analogue Cambridge University Library catalogue cards into online records. This has the potential to make thousands of rare books and maps discoverable, a project that would otherwise take years.
Case Studies 2 and 3: Historical handwritten biodiversity records from the University Museum of Zoology registers and specimen labels from the University Herbarium will be turned into machine-readable datasets. As well as deepening our understanding of these collections, this has enormous potential for biological research and the nature-human interface.
Left: Handwritten register from the Museum of Zoology (UMZC 1867-1902 register). Right: Specimen from Cambridge University Herbarium (CGE00081874).
Challenge 2: Reconstructing fragmentary or dispersed cultural objects
Two further case studies will investigate how AI can assist with the reconstruction of fragmentary or dispersed cultural objects, to transform our understanding of them and their context.
Case Study 4: This case study will test the ability of AI tools to reconstruct the position of unplaced papyrus fragments from the Book of the Dead of Ramose, an ancient document held at the Fitzwilliam Museum, by analysing fibre patterns.
Case Study 5: This case study investigates the potential of machine learning (ML) and computer vision tools to fill in missing text and analyse Mesoamerican symbols found in a sixteenth-century Nahuatl-Latin lectionary held in the Bible Society Collection at Cambridge University Library.
Microscopic study of the white material covering text in the Nahuatl-Latin lectionary (CUL BFBS Ms 375, f. 156v). Photographed by Flavia Fiorillo, CUL Centre for Cultural Heritage, January 2025.
Challenge 3: Integrating expert cultural knowledge into AI algorithms
Case Study 6 will investigate the use of LVM tools trained on small, bespoke datasets of specific types of cultural heritage artefacts, integrating expert, practitioner and community knowledge.
Shady Sharify from Better Images of AI
Engage with ArCH
Read the ai@cam blog post introducing the project and its aims with Project Lead Amelie Roper.
Explore how and why the ArCH workspace was developed in this blog post by Lead Software Developer Jennie Fletcher.
Sign up to the ArCH mailing list to keep up to date with the project’s progress and for opportunities to engage with its work.
Watch the ArCH project team's presentation at the RLUK Digital Shift Forum (November 2025).
Project team
The text in this work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

