Batch upload services

DSpace@Cambridge offers a batch ingest service for large volumes of computer-generated data. This can be collections of images, text files or any other sets of data, including publication lists. To use the batch upload service, the material needs to conform to the DSpace simple archive format described below. The DSpace@Cambridge team is happy to help convert your material to this format; please contact us for assistance.

Batch importing

The basic concept behind the DSpace simple archive format is to create an archive, which is a directory of items with one subdirectory per item. Each item directory contains a file with the item's descriptive metadata (dublin_core.xml), the files that make up the item, and a "contents" text file listing those files.

archive_directory/
    item_000/
        dublin_core.xml
        contents
        file_1
        file_2
    item_001/
        dublin_core.xml
        contents
        file_1
    etc...
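As a rough illustration, the layout above could be produced with a short Python script. This is a minimal sketch, not an official DSpace tool: the function name build_archive and its input format (a list of pairs of ready-made dublin_core.xml text and source file paths) are assumptions made for this example.

```python
import shutil
from pathlib import Path


def build_archive(archive_dir, items):
    """Create a DSpace simple archive directory (hypothetical helper).

    items: a list of (dublin_core_xml, source_files) pairs, where
    dublin_core_xml is the finished XML text for one item and
    source_files is a list of paths to that item's files.
    """
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    for index, (dc_xml, source_files) in enumerate(items):
        # One subdirectory per item: item_000, item_001, ...
        item_dir = archive / f"item_{index:03d}"
        item_dir.mkdir()  # fails if the item directory already exists
        (item_dir / "dublin_core.xml").write_text(dc_xml, encoding="utf-8")
        names = []
        for source in map(Path, source_files):
            shutil.copy2(source, item_dir / source.name)
            names.append(source.name)
        # The contents file lists the item's files, one name per line.
        (item_dir / "contents").write_text(
            "".join(name + "\n" for name in names), encoding="utf-8"
        )
    return archive
```

Because each item directory is created without exist_ok, rerunning the script into an existing archive stops with an error instead of silently overwriting items.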

The dublin_core.xml file has the following format, where each Dublin Core element has its own <dcvalue> entry. A full list of the Dublin Core elements available in DSpace can be found here (pdf). There are currently three attributes available on the <dcvalue> tag:

  • element - the Dublin Core element
  • qualifier - the element's qualifier ("none" if the element is unqualified)
  • language - (optional) ISO language code for the element's value

Example:

<dublin_core>
    <dcvalue qualifier="none" element="title">Stereoscopic Photographs of the Franklin Relics, No. 2</dcvalue>
    <dcvalue qualifier="author" element="contributor">Cheyne, John Powles</dcvalue>
    <dcvalue qualifier="created" element="date">2008</dcvalue>
    <dcvalue qualifier="issued" element="date">1861</dcvalue>
    <dcvalue qualifier="provenance" element="description">Lefoy, Jessie, bequest, 1941</dcvalue>
    <dcvalue qualifier="medium" element="format">print, cardboard</dcvalue>
    <dcvalue qualifier="none" element="subject">sailing ships</dcvalue>
    <dcvalue qualifier="none" element="subject">equipment</dcvalue>
</dublin_core>
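If your metadata is already held in another system, a file in this format can be generated rather than typed by hand. The sketch below uses Python's standard xml.etree.ElementTree module; the function name make_dublin_core and its tuple-based input are hypothetical choices for this example.

```python
import xml.etree.ElementTree as ET


def make_dublin_core(values):
    """Return dublin_core.xml text for one item (hypothetical helper).

    values: (element, qualifier, value) or (element, qualifier, value, language)
    tuples. A qualifier of None is written as "none".
    """
    root = ET.Element("dublin_core")
    for entry in values:
        element, qualifier, value = entry[:3]
        attrs = {"element": element, "qualifier": qualifier or "none"}
        if len(entry) > 3 and entry[3]:
            attrs["language"] = entry[3]  # optional ISO language code
        ET.SubElement(root, "dcvalue", attrs).text = value
    # ElementTree escapes characters such as & and < in the values.
    return ET.tostring(root, encoding="unicode")


xml_text = make_dublin_core([
    ("title", None, "Stereoscopic Photographs of the Franklin Relics, No. 2"),
    ("contributor", "author", "Cheyne, John Powles"),
    ("date", "issued", "1861"),
])
print(xml_text)
```

Building the file with an XML library rather than by joining strings means characters such as & and < in titles are escaped correctly.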

The contents file lists the bitstream (file) names, one per line. If you need mixed access, with some bitstreams public and others private, contact us so we can set up an importer schema tailored to your needs.
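Before uploading, it can be worth checking that every item has its metadata file and that every name in contents matches a file in the same directory. The following Python sketch does this; check_archive is a hypothetical helper, not part of DSpace, and it assumes each line of contents holds a bare filename as described above.

```python
from pathlib import Path


def check_archive(archive_dir):
    """Return a list of human-readable problems found in a simple archive."""
    problems = []
    item_dirs = sorted(p for p in Path(archive_dir).iterdir() if p.is_dir())
    for item_dir in item_dirs:
        if not (item_dir / "dublin_core.xml").is_file():
            problems.append(f"{item_dir.name}: missing dublin_core.xml")
        contents = item_dir / "contents"
        if not contents.is_file():
            problems.append(f"{item_dir.name}: missing contents file")
            continue
        # Each non-blank line of contents should name a file in this item directory.
        for line in contents.read_text(encoding="utf-8").splitlines():
            name = line.strip()
            if name and not (item_dir / name).is_file():
                problems.append(f"{item_dir.name}: listed file {name!r} not found")
    return problems
```

An empty list means the basic structure is in place; the check does not validate the XML inside dublin_core.xml.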

The network batch transfer system

Batch data can also be transferred to the repository over the network. This secure service uses SFTP (the SSH File Transfer Protocol). Linux and Mac OS X include a command-line SFTP client, and many free clients, such as WinSCP, are available for Windows.

If you would like to use this service, please contact us to set up an account and an importer schedule.
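For repeated transfers, the upload can be scripted. The sketch below drives the OpenSSH sftp command-line client from Python, feeding it batch commands on standard input (sftp -b -). The user name and host are placeholders for the account details you receive from the team, and the upload target is assumed to be the account's default directory on the server.

```python
import subprocess


def sftp_upload_command(archive_dir, user, host):
    """Build the sftp command line and batch script for uploading archive_dir."""
    # Batch commands: recursively upload the archive directory, then quit.
    batch = f'put -r "{archive_dir}"\nbye\n'
    # "-b -" makes sftp read its batch commands from standard input.
    command = ["sftp", "-b", "-", f"{user}@{host}"]
    return command, batch


def upload_archive(archive_dir, user, host):
    """Run the upload; raises CalledProcessError if sftp reports a failure."""
    command, batch = sftp_upload_command(archive_dir, user, host)
    subprocess.run(command, input=batch, text=True, check=True)


# Hypothetical example with placeholder account details:
# upload_archive("archive_directory", "your-username", "sftp.example.org")
```

Batch mode cannot prompt for a password, so this assumes key-based (non-interactive) authentication has been set up for the account, and an OpenSSH version that supports recursive put.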

The spreadsheet metadata importer tool

The spreadsheet importer is a tool that helps convert existing descriptive information (metadata) into the Dublin Core metadata standard used in DSpace@Cambridge. If you are interested in using this tool, please get in touch.

Spreadsheet structure

[Image: spreadsheet]

The spreadsheet must follow a well-defined structure. It can be created in any spreadsheet package that can export to CSV, such as Apple Numbers, Microsoft Excel or OpenOffice Calc. Each item occupies one row, and its metadata and filenames go in the columns.

The first three rows are header rows that define the columns. The first row contains the metadata schema to use (usually "DC" for Dublin Core). The second and third rows contain, respectively, the Dublin Core element and qualifier that each column holds.
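Reading the header rows of the CSV export could look like this in Python. This is a minimal sketch, assuming the spreadsheet has been exported as UTF-8 CSV; the function names are hypothetical, and treating an empty qualifier cell as "none" (no qualifier) is an assumption, not a documented rule of the importer.

```python
import csv


def parse_header(rows):
    """Read the three header rows and return the metadata column definitions.

    Returns a list of (schema, element, qualifier) tuples, one per metadata
    column. Filenames start at the first column whose three header cells
    are all empty.
    """
    header = [[cell.strip() for cell in row] for row in rows[:3]]
    width = max(len(row) for row in header)
    # Pad the rows to a common width in case they have different lengths.
    header = [row + [""] * (width - len(row)) for row in header]
    columns = []
    for schema, element, qualifier in zip(*header):
        if not (schema or element or qualifier):
            break  # first bitstream column reached
        # Assumption: an empty qualifier cell means "no qualifier".
        columns.append((schema, element, qualifier or "none"))
    return columns


def read_spreadsheet(path):
    """Return (columns, item_rows) from a CSV export of the spreadsheet."""
    # utf-8-sig also accepts a leading byte-order mark, if the export has one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        rows = list(csv.reader(f))
    return parse_header(rows), rows[3:]
```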

Each subsequent row defines one item, with its Dublin Core metadata in the appropriate columns (cells may be left empty). Any files (bitstreams) associated with the item go in the columns immediately after the final metadata column (i.e. the columns whose top three cells are empty). There is no limit to the number of bitstreams that can be assigned to an item. Filenames must be unique across the whole spreadsheet, because all files have to be in a single directory. Put only one filename in each cell (i.e. one file per column).
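The rules for item rows translate into a short loop. In this sketch, which is hypothetical and not the importer's actual code, rows_to_items takes the item rows together with the (schema, element, qualifier) definitions read from the three header rows, keeps the non-empty metadata cells, treats every later non-empty cell as a filename, and rejects duplicate filenames.

```python
def rows_to_items(data_rows, columns):
    """Convert item rows into (metadata, files) pairs.

    columns lists one (schema, element, qualifier) tuple per metadata
    column, in spreadsheet order; every column after them holds a filename.
    Raises ValueError if a filename is used more than once, because all
    files are expected to sit in a single directory.
    """
    n_meta = len(columns)
    items, seen = [], set()
    for row in data_rows:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # ignore completely blank rows
        metadata = [
            (*columns[i], cells[i])
            for i in range(min(n_meta, len(cells)))
            if cells[i]  # empty metadata cells are allowed and skipped
        ]
        files = [name for name in cells[n_meta:] if name]  # one filename per cell
        for name in files:
            if name in seen:
                raise ValueError(f"filename used more than once: {name}")
            seen.add(name)
        items.append((metadata, files))
    return items
```

The resulting metadata and file lists could then be written out as items in the simple archive format described above.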
