‘Discovering history in the Cairo Genizah’ (2012–16)

This was a major project funded by the Andrew W. Mellon Foundation, and run by Cambridge University Library’s Genizah Research Unit in cooperation with the Digital Content Unit and Digital Services departments of the Library. It was designed to facilitate discovery of the medieval manuscripts of the Taylor-Schechter Cairo Genizah Collection by using techniques from the fields of text mining, information retrieval and natural language processing to automate the process of producing descriptive catalogue data.

There are more than 100 years of published scholarship on Cambridge’s Genizah Collection – an immense collection of medieval Jewish manuscripts recovered from an Egyptian synagogue. By OCRing as many of those sources as possible, automating the process of extracting keyword data from them and associating those tags with the images of digitised manuscripts, the project has greatly increased the amount of descriptive metadata for more than five thousand of the most important historical documents from the Genizah. These documents relate principally to the history of the Jews under Islam in the tenth to mid-thirteenth centuries CE, for which the Genizah is an unparalleled source. The keyword metadata provides a new way of searching and browsing the Collection across broad subject areas or around distinctive items of vocabulary or key concepts. Furthermore, the interface is able to suggest similar items that might be of interest to the user, based on the similarity of the accrued data.
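One simple way to picture the "similar items" suggestion is as a comparison of keyword sets. The sketch below is illustrative only: the project's actual similarity measure is not described here, so Jaccard overlap is an assumed stand-in, and the shelfmarks, keywords and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Proportion of keywords shared by two items (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_similar(
    target: str, tags: Dict[str, Set[str]], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank other items by keyword overlap with the target item."""
    scores = [
        (other, jaccard(tags[target], keywords))
        for other, keywords in tags.items()
        if other != target
    ]
    # Drop items with no keywords in common, then sort best-first.
    scores = [s for s in scores if s[1] > 0]
    return sorted(scores, key=lambda s: (-s[1], s[0]))[:top_n]


# Hypothetical items and automatically derived keywords (assumed data).
TAGS = {
    "MS-A": {"merchant", "letter", "fustat", "trade"},
    "MS-B": {"merchant", "letter", "alexandria", "trade"},
    "MS-C": {"marriage", "contract", "fustat"},
    "MS-D": {"poetry", "liturgy"},
}

if __name__ == "__main__":
    for shelfmark, score in suggest_similar("MS-A", TAGS):
        print(f"{shelfmark}: {score:.2f}")
```

Here "MS-B" ranks first because it shares three of five distinct keywords with "MS-A", while "MS-D" shares none and is not suggested.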



Since the descriptive tags have been derived automatically, new features have been added to the interface to allow users to reject those tags or add their own in various ways. This feedback is collected, weighted according to an authority model, and incorporated into the metadata, allowing the search and browse data to be refined over time and use.
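The idea of weighting feedback by authority can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the roles, the weights and the acceptance threshold are all hypothetical, since the authority model itself is not specified here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical authority weights (assumed, not taken from the project).
AUTHORITY = {"automatic": 1.0, "registered_user": 1.0, "specialist": 3.0}

# A vote is (tag, role of the source, +1 to accept or -1 to reject).
Vote = Tuple[str, str, int]


def score_tags(votes: Iterable[Vote]) -> Dict[str, float]:
    """Sum accept/reject votes per tag, each scaled by its source's weight."""
    totals: Dict[str, float] = defaultdict(float)
    for tag, role, direction in votes:
        totals[tag] += AUTHORITY[role] * direction
    return dict(totals)


def accepted_tags(votes: Iterable[Vote], threshold: float = 0.0) -> List[str]:
    """Keep only tags whose weighted score is above the threshold."""
    return sorted(t for t, s in score_tags(votes).items() if s > threshold)


# Hypothetical feedback on one manuscript (assumed data).
VOTES = [
    ("trade", "automatic", 1),         # tag derived by text mining
    ("poetry", "automatic", 1),        # tag derived by text mining
    ("poetry", "specialist", -1),      # rejected by a specialist
    ("ketubba", "registered_user", 1), # added by a user
]

if __name__ == "__main__":
    print(accepted_tags(VOTES))
```

In this sketch a single specialist rejection outweighs the automatically derived "poetry" tag, so only "ketubba" and "trade" remain in the metadata.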

An article describing the text-mining techniques employed was published in Manuscript Cultures 7, pp. 29–34, and is available online.



‘Editioning an archive’ – the Board of Longitude Project (2011–14)

The Library was successful in obtaining funding from the Jisc in its 2011 Content Programme to create a digital archive of the Board of Longitude Papers, the records of an important eighteenth-century scientific body that sought reliable methods of determining longitude at sea and funded a considerable amount of technical invention, astronomical observation and exploration throughout the eighteenth century and into the early nineteenth century.

Comprising 63,000 high-resolution images from 242 volumes of material, the resulting archive is one of the most significant open digital resources on the history of eighteenth-century science available on the web. It covers the period 1715–1828, referencing 1,337 individuals, 777 places and the journeys of more than 300 ships. Its research potential is immense and diverse: in addition to charting the development of astronomy and instrument-making, it includes important historical records of early European contact with indigenous peoples in the Pacific and weather observations that are used in constructing historical climate models. It includes correspondence with hundreds of individuals from throughout the UK, so can also provide information of interest to genealogists, local historians and linguists.

The digital archive project coincided with a five-year AHRC project to research and write the history of the Board, a collaboration between Cambridge’s History and Philosophy of Science Department and the National Maritime Museum at Greenwich. The two projects were complementary: the formal outputs of the AHRC project provided a means of orientation and interpretation for the archive, and the digitisation project opened up the primary sources to support the research and enable future work to continue beyond completion of the project itself. But the collaboration went much deeper, with members of the AHRC project writing accessible introductions to units of the archive and short essays on major themes or people associated with it, and working with the Jisc project’s archivists on the formal archival description of the papers. While working on the digital archive project, and later by searching across the archive once it was available online, the researchers made several discoveries that might otherwise have been overlooked. In addition to extensive metadata and the contextual essays, a full TEI transcription of the formal minutes of the Board was commissioned.

In effect the digitisation project used the research project to create an edition of the archive, with rich description, commentary and links within the content. The collaboration also extended to the National Maritime Museum. In addition to the Library’s major collection, the Board of Longitude Papers includes important archives and printed books from the Museum’s collection and contains links to hundreds of illustrations and objects held by the Museum. The Museum also took on an important role in facilitating public engagement with the archive – developing learning resources for schools and highlighting the archive in its major exhibition, held 2014–15 and then touring internationally 2016–17.

The Guardian produced an overview of the archive at its launch in 2013.