Technical details
This section aims to give basic technical information on the DSpace@Cambridge system and its development. For more in-depth information on any technical issue, or for a progress report on developments, please contact us.
Platform
The DSpace@Cambridge live system runs on several Linux machines (see Hardware below). The database runs on PostgreSQL and the web application on Tomcat. Several Solaris-based servers provide storage for DSpace bitstreams, with a total capacity of about 100TB. Backups are stored on disk on off-site servers.
The DSpace Technology Platform is a community-developed open source platform. The Cambridge University Repository is based on the DSpace 1.5 source code with several local modifications.
Hardware
The hardware setup consists of two clusters of machines: one for the live system, secondary services and development, and another for backups (hosted off-site). The two sites are linked by redundant 10 Gbit/s links.
The various application servers (all similar Dell hardware) run Ubuntu Linux, and have a range of local disk configurations as necessary for their tasks.
Storage consists largely of SATA disks and is hosted on Solaris. The live system uses a 100TB EMC SAN on a Brocade Fibre Channel fabric, exported over NFS by Sun hardware. To protect the stored data, it is mirrored off-site.

The backup systems use Sun X4500 and X4540 servers, each of which holds 48 disks in a single 4U chassis. These run Solaris, which allows the use of ZFS (the Zettabyte File System). They use rsync over NFS to synchronise data from the live system. Backups are incremental, and data is kept for as long as disk space allows. In all, DSpace@Cambridge uses about 650 hard drives spread across its servers and sites.

To avoid network bottlenecks, the network interfaces on storage servers are bonded together to provide higher throughput where possible; as a result, the limiting factor is now disk I/O rather than networking. The use of NFS also allows us to reuse the same assetstore for secondary (read-only) services, such as the SOAP interface (see below).
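The retention rule above ("data is kept as long as disk space allows") can be illustrated with a short sketch that picks the oldest incremental snapshots to delete once the backup pool runs low on space. This is not the actual DSpace@Cambridge backup tooling: the `Snapshot` type, the 10% free-space threshold and the assumption that deleting a snapshot frees a known, independent number of bytes are all hypothetical. On ZFS the space reclaimed by destroying one snapshot can depend on blocks it shares with neighbouring snapshots, so a real script would re-check free space after each deletion.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    name: str         # hypothetical snapshot label, e.g. "daily-1"
    created: int      # creation time (e.g. a Unix timestamp)
    size_bytes: int   # assumed space reclaimed if this snapshot is deleted


def snapshots_to_prune(snapshots, used_bytes, capacity_bytes, min_free_fraction=0.1):
    """Pick the oldest snapshots to delete until free space reaches
    min_free_fraction of capacity. The newest snapshot is always kept."""
    target_used = capacity_bytes * (1 - min_free_fraction)
    ordered = sorted(snapshots, key=lambda s: s.created)  # oldest first
    prune = []
    for snap in ordered[:-1]:  # never consider the most recent snapshot
        if used_bytes <= target_used:
            break  # enough free space: keep all remaining snapshots
        prune.append(snap)
        used_bytes -= snap.size_bytes
    return prune


snaps = [
    Snapshot("daily-1", 1, 30),
    Snapshot("daily-2", 2, 20),
    Snapshot("daily-3", 3, 10),
]
print([s.name for s in snapshots_to_prune(snaps, used_bytes=95, capacity_bytes=100)])
# prints ['daily-1']
```

Pruning strictly oldest-first keeps the most recent history intact, which matches the idea that older incremental backups are the ones given up when space runs out.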
Set-up
Because we can rely on high-speed networking, different server tasks are split across separate systems. The live DSpace system consists of separate servers for the web application, the database, SOAP, and mail and related infrastructure tasks.
The development servers mirror this set-up for accurate testing. All systems are monitored, either graphically with Cricket to detect trends, by Nagios to detect technical problems, or both.
SOAP
SOAP (Simple Object Access Protocol) is a messaging protocol for web services. Our SOAP interface allows users to build websites on top of content and metadata stored in DSpace@Cambridge. For example, a departmental web page can keep the look and feel of its department while displaying items (images, bibliographic details, or other material) stored in DSpace@Cambridge. This is comparable to a website that uses a database backend, except that the backend is DSpace. The service was developed specifically to help the substantial number of webmasters around the University who use PHP, with an initial focus on delivering images (for example, resizing images on the fly). We provide users with documentation, a library of standard functions and a sample PHP website to get started. If you would like more information on this service, please get in touch.
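As an illustration of what "resizing images on the fly" involves, the helper below computes thumbnail dimensions that fit a bounding box while preserving the aspect ratio. It is a hypothetical sketch, not a function from the DSpace@Cambridge PHP library (whose function names are not listed here), and it only computes the target size; the actual pixel resampling would be done by an imaging library.

```python
def fit_within(width, height, max_width, max_height):
    """Return (new_width, new_height) that fits inside max_width x max_height,
    preserving the aspect ratio and never enlarging the original."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # The smaller of the two scale factors satisfies both limits;
    # capping at 1.0 avoids upscaling small images.
    scale = min(max_width / width, max_height / height, 1.0)
    # Round to whole pixels, keeping at least 1 pixel per side.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000, 800, 800))  # prints (800, 600)
```

Never enlarging small originals is a common design choice for web thumbnails, since upscaling adds bytes without adding detail.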
Dark items
By default, DSpace releases a lot of information about "dark items" (items that are not meant to be publicly visible and have anonymous read access disabled). DSpace@Cambridge decided that this "leakage" was unacceptable with regard to our obligations under the Data Protection Act, so we set out to make dark items truly restricted. This included masking item metadata in the various browse views, initially by replacing the text with "Dark item" and later by filtering those results out completely unless the user was explicitly allowed to see them. The final step was to filter them from the "recent submissions" list on the front page, the RSS feed and the OAI interface used by harvesters.
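The filtering approach described above amounts to one visibility check applied to every list of results before it is shown (browse views, recent submissions, RSS, OAI). The sketch below illustrates the idea only: the real change was made inside DSpace's own Java code, and the `Item` type with its `anonymous_read` and `allowed_users` fields is a hypothetical simplification of DSpace's resource-policy authorization model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    handle: str
    title: str
    anonymous_read: bool                     # False means a "dark item"
    allowed_users: frozenset = frozenset()   # users explicitly granted read access


def is_visible(item, user):
    """Public items are visible to everyone; dark items only to users
    who have been explicitly granted read access."""
    if item.anonymous_read:
        return True
    return user is not None and user in item.allowed_users


def filter_dark_items(items, user=None):
    """Remove dark items the user may not see, rather than masking them."""
    return [item for item in items if is_visible(item, user)]


results = [
    Item("item-1", "Public thesis", anonymous_read=True),
    Item("item-2", "Restricted record", anonymous_read=False,
         allowed_users=frozenset({"alice"})),
]
print([i.handle for i in filter_dark_items(results)])           # ['item-1']
print([i.handle for i in filter_dark_items(results, "alice")])  # ['item-1', 'item-2']
```

Removing dark items entirely, instead of showing a "Dark item" placeholder, also hides the fact that a restricted item exists at all, which is the stronger guarantee the section describes.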

