CUL Retrospective Chinese Cataloguing Project

Background

1. An online catalogue of the Chinese collection was completed in 1994 and mounted as part of the CUL OPAC. The catalogue contains short-title records for all Chinese books in the Library, though analytical entries for most collectanea are excluded. The records are in Wade-Giles romanisation only, and were input from shelflists of the Library's holdings. The online romanised catalogue is continuously augmented, updated and improved.

2. As part of the RSLP UK Database of Chinese Research Materials Project (1 November 1999 - 31 July 2002), the existing romanised records were converted to Pinyin romanisation and added to the UK union catalogue. In addition, these records were upgraded by matching them with full vernacular-script MARC records.

3. Records were acquired from the China National Bibliography Retrospective Database (CNBRD), produced by the National Library of China. The following table shows the parts of the CNBRD that had been acquired as at 11 May 2005:

Pre-1911 imprints:                                             165,212
Titles published 1911-1949:                                    139,190
Titles published 1949-2000:                                    973,877
Titles published 2001:                                         121,091
Titles published 2002:                                          93,588
Titles published 2003:                                          88,798
Titles published 2004:                                         123,890
Hong Kong and Taiwan imprints:                                  92,581
Serials (1911-2000):                                            46,320
Newspapers (1911-2000):                                          7,119

TOTAL number of CNBRD records:                               1,851,666

Records containing characters not in GB 2312 levels 1 and 2:     9,177 (0.49%)

4. These records have been downloaded into the CUL's local system, and matching software developed by the Library's Automation Department (see below) has been run against them. The results as at 31 July 2002 were as follows:

Collection                           Unmatched       Matched         Total
Cambridge University Library         28,309 (54.76%) 23,384 (45.23%) 51,693
Needham Research Institute (EAHoSL)   3,234 (30.31%)  7,435 (69.68%) 10,669
Faculty of Oriental Studies Library     613 (62.74%)    364 (37.25%)    977
CU Union List of Chinese Serials*     2,619             -             2,619

TOTALS                               34,775 (52.72%) 31,183 (47.27%) 65,958

* No matches had been attempted as at 31 July 2002.
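
The percentages in the matching table appear to be truncated to two decimal places rather than rounded: 23,384 / 51,693 is 45.236%, shown as 45.23%, and the same convention gives the 0.49% figure for records with characters outside GB 2312. The Python sketch below assumes that convention and recomputes each row from the raw counts, so that the figures can be re-checked.

```python
def truncated_pct(part: int, whole: int) -> str:
    """Percentage to two decimal places, truncated (not rounded).
    Integer arithmetic avoids floating-point surprises at the cut-off."""
    hundredths = part * 10000 // whole
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


# (unmatched, matched) counts as at 31 July 2002. The serials union list is
# not given its own row because no matches had been attempted for it, but its
# 2,619 unmatched records are included in the TOTALS row.
rows = {
    "Cambridge University Library": (28309, 23384),
    "Needham Research Institute (EAHoSL)": (3234, 7435),
    "Faculty of Oriental Studies Library": (613, 364),
    "TOTALS": (34775, 31183),
}

for name, (unmatched, matched) in rows.items():
    total = unmatched + matched
    print(f"{name:37}{truncated_pct(unmatched, total):>8}"
          f"{truncated_pct(matched, total):>8}{total:>8,}")
```

Working in hundredths of a per cent with integer floor division makes the truncation explicit; switching to rounding would only require adding half the divisor before dividing.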

5. The CNBRD is growing rapidly. New additions will be acquired as they become available, so that the matching process can eventually be completed. Other databases, such as that of the National Central Library in Taiwan, may also be obtained and used in the same way.

Methodology

6. If an ISBN is present, matching is straightforward. However, since only books published in China since 1988 carry ISBNs, the following matching technique was developed at CUL. The program takes the Wade-Giles title from the existing CUL short-title record and converts it to Pinyin. The conversion is made syllable by syllable from a simple conversion table, ignoring hyphens (which occur only in polysyllabic personal and place names). The program then creates an acronym from the initial letters of the Pinyin syllables of the title. This acronym is used to find matches among acronyms that the program has likewise created from the initial letters of the syllables in the Pinyin parallel title (200 $9) of each full China MARC record. The resulting set of candidate records is then filtered by the date of publication in the CUL record. If exactly one record remains, this is considered a successful match, and a cursory visual check is sufficient to confirm that it is correct.
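
The matching steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Automation Department's program: the conversion table is a hypothetical excerpt covering only the syllables of the sample title in paragraph 10, hyphens are assumed to act as syllable breaks, the year is assumed to be the leading part of 210 $d, the second candidate record is a hypothetical decoy, and candidates are scanned linearly where a production system would index the acronyms.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical excerpt of a Wade-Giles -> Pinyin syllable table. A real
# table would cover every Wade-Giles syllable; this one covers the sample only.
WG_TO_PINYIN = {
    "pei": "bei", "ching": "jing", "ch'uan": "chuan", "t'ung": "tong",
    "wen": "wen", "hua": "hua", "pien": "bian", "lan": "lan",
}


@dataclass
class ChinaMarcRecord:
    record_id: str
    pinyin_title: str  # 200 $9, e.g. "BEI JING CHUAN TONG ..."
    pub_date: str      # 210 $d, e.g. "1992.9"


def wade_giles_to_pinyin(title: str) -> list:
    """Convert syllable by syllable; hyphens are assumed to be syllable breaks.
    Syllables missing from the table are passed through unchanged."""
    syllables = re.split(r"[\s-]+", title.strip().lower())
    return [WG_TO_PINYIN.get(s, s) for s in syllables if s]


def acronym(syllables) -> str:
    """Initial letters of the syllables, upper-cased."""
    return "".join(s[0].upper() for s in syllables)


def match(cul_title: str, cul_year: str, candidates) -> Optional[ChinaMarcRecord]:
    key = acronym(wade_giles_to_pinyin(cul_title))
    # Step 1: candidates whose 200 $9 acronym equals the CUL title acronym.
    hits = [r for r in candidates if acronym(r.pinyin_title.split()) == key]
    # Step 2: keep only those published in the CUL record's year.
    hits = [r for r in hits if r.pub_date.startswith(cul_year)]
    # Step 3: accept only a unique survivor (still subject to a visual check).
    return hits[0] if len(hits) == 1 else None


records = [
    ChinaMarcRecord("4005692990", "BEI JING CHUAN TONG WEN HUA BIAN LAN", "1992.9"),
    ChinaMarcRecord("decoy-0001", "BEI JING CHUAN TONG WEN HUA BIAN LAN", "1987"),  # hypothetical
]
m = match("Pei-ching ch'uan t'ung wen hua pien lan", "1992", records)
print(m.record_id if m else "no unique match")  # prints 4005692990
```

Both candidates share the acronym BJCTWHBL, so the acronym alone is ambiguous; the date filter leaves a single record, which is then offered for the visual check.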

7. The record number of the full China MARC record is then added to the existing romanised record, so that when the romanised record is displayed, the matching vernacular-script record is available for instant, simultaneous display at the user's discretion. The advantages of this method include:

(a) romanised-only and full vernacular-script records continue to co-exist in the same catalogue (this will be necessary until all the romanised records have been matched);

(b) users who do not need or are not equipped to display Chinese script have the option of dispensing with it.
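
A minimal Python sketch of this arrangement follows, using the record numbers from the sample in paragraph 10. The class, field and function names are hypothetical rather than CUL's actual schema; the point is only that the romanised record carries an optional pointer to the China MARC record, so matched and unmatched records co-exist and vernacular display stays optional.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RomanisedRecord:
    record_id: str
    author: str
    title: str
    imprint: str
    # Hypothetical field: number of the matched China MARC record, if any.
    linked_marc_id: Optional[str] = None


def display(rec: RomanisedRecord, marc_display: Dict[str, str], show_vernacular: bool) -> str:
    """Romanised display, plus the linked vernacular display on request."""
    lines = [f"Author: {rec.author}", f"Title:  {rec.title}", f"        {rec.imprint}"]
    # Unmatched records (linked_marc_id is None) simply show the romanised form.
    if show_vernacular and rec.linked_marc_id in marc_display:
        lines.append(marc_display[rec.linked_marc_id])
    return "\n".join(lines)


rec = RomanisedRecord("7750326677", "Ch'en, Wen-liang",
                      "Pei-ching ch'uan t'ung wen hua pien lan", "Peking, 1992",
                      linked_marc_id="4005692990")
marc_display = {"4005692990": "Title:  北京传统文化便览 陈文良主编"}
print(display(rec, marc_display, show_vernacular=True))
```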

8. The CNBRD, which itself constitutes an invaluable reference source, may also be searched separately. Eight search options are provided:

(1) Title keyword: (any characters in any order, with optional qualifiers)

(2) Browsable indexes: (a) authors
                       (b) titles (including contained titles)
                       (c) series titles
                       (d) subject headings

(3) Concise mode: (a) title acronym (initial letters of pinyin title syllables)
                  (b) author acronym (initial letters of pinyin author name syllables)
                  (c) ISBN
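
The three concise-mode searches can be pictured as exact lookups in acronym and ISBN indexes. The Python sketch below is an assumption about how such indexes might be built, not a description of the CUL implementation: the record layout and function names are hypothetical, the author acronym is taken from 701 $9, and only exact, whole-acronym matches are supported.

```python
from collections import defaultdict


def initials(pinyin: str) -> str:
    """Acronym from the initial letters of space-separated Pinyin syllables."""
    return "".join(s[0] for s in pinyin.upper().split())


def build_concise_index(records):
    """Index records by title acronym (200 $9), author acronym (701 $9) and ISBN
    (010 $a). Records are dicts with hypothetical keys: id, title_py, author_py, isbn."""
    index = {"title": defaultdict(set), "author": defaultdict(set), "isbn": defaultdict(set)}
    for r in records:
        index["title"][initials(r["title_py"])].add(r["id"])
        if r.get("author_py"):
            index["author"][initials(r["author_py"])].add(r["id"])
        if r.get("isbn"):
            index["isbn"][r["isbn"].replace("-", "").upper()].add(r["id"])
    return index


def concise_search(index, mode: str, query: str) -> list:
    """Exact-key lookup; mode is 'title', 'author' or 'isbn'."""
    key = query.replace("-", "").upper() if mode == "isbn" else query.strip().upper()
    return sorted(index[mode].get(key, set()))


idx = build_concise_index([{
    "id": "4005692990",
    "title_py": "BEI JING CHUAN TONG WEN HUA BIAN LAN",
    "author_py": "CHEN WEN LIANG",
    "isbn": "7-5402-0235-1",
}])
print(concise_search(idx, "title", "bjctwhbl"))   # ['4005692990']
print(concise_search(idx, "author", "cwl"))       # ['4005692990']
print(concise_search(idx, "isbn", "7540202351"))  # ['4005692990']
```

Storing sets of record numbers under each key reflects the fact that an acronym is rarely unique; the searcher then chooses among the records returned.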

9. The following China MARC fields are displayed to users, with labels that conform to the house style of the CUL OPAC. All other fields are suppressed in the OPAC display.

Title/Author:  200-215
Series title:  225
Uniform title: 500-501
Notes:         300-326, 328-345
Contents:      327
Subjects:      600-610
Other entries: 700-702, 711-712
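
The display rule amounts to a lookup from tag to label, with everything else suppressed. The Python sketch below encodes the ranges listed above (the data structure and function name are hypothetical) and applies them to the tags of the sample China MARC record in paragraph 10; the fields it selects agree with the sample display, where 690, 692 and the coded fields do not appear.

```python
from typing import Optional

# Display labels and the China MARC tag ranges they cover (inclusive).
DISPLAY_LABELS = [
    ("Title/Author", [(200, 215)]),
    ("Series title", [(225, 225)]),
    ("Uniform title", [(500, 501)]),
    ("Notes", [(300, 326), (328, 345)]),
    ("Contents", [(327, 327)]),
    ("Subjects", [(600, 610)]),
    ("Other entries", [(700, 702), (711, 712)]),
]


def label_for(tag: str) -> Optional[str]:
    """Return the OPAC label for a field tag, or None if it is suppressed."""
    n = int(tag)
    for label, ranges in DISPLAY_LABELS:
        if any(low <= n <= high for low, high in ranges):
            return label
    return None


# Tags present in the sample China MARC record.
sample_tags = ["001", "005", "010", "100", "101", "102", "105", "106", "200",
               "210", "215", "330", "606", "690", "692", "701", "801", "905"]
print([(t, label_for(t)) for t in sample_tags if label_for(t)])
# -> 200, 210, 215 (Title/Author), 330 (Notes), 606 (Subjects), 701 (Other entries)
```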

10. A sample short-title record and its matched China MARC record are shown below, in "display" and "MARC" formats.

(a) CUL Chinese short-title record

 Author:         Ch'en, Wen-liang
 Title:          Pei-ching ch'uan t'ung wen hua pien lan
                 Peking, 1992
  
 Location:       [Univ. Lib.] FB.221.197
                 Aoi Pavilion, Ground Floor
                 Not on loan

 1 RECORD=  7750326677
 2 001 00 $a7540202351
 3 035 00 $aFB.221.197
 4 100 10 $aCh'en$hWen-liang
 5 245 10 $aPei-ching ch'uan t'ung wen hua pien lan
 6 260 00 $aPeking$c1992

(b) China MARC record

 Title:          北京传统文化便览 陈文良主编
                 北京 北京燕山出版社 1992.9
                 84,1379页 20cm
 Notes:          本书分为历史沿革、燕山蓟水、典章制度、民族宗教、园林景观、老
                 字商号、科学技术等24个方面, 阐述了北京历史与文化。
 Subjects:       文化史 北京
 Other entries:  陈文良
  
 1 RECORD=  4005692990
 2 001 01 93029107
 3 005 19 980000000000.0
 4 010 00 $a7-5402-0235-1$b精装$d?47
 5 100 00 $a19930714d1992....em.y0chiy0121....ea
 6 101 00 $achi
 7 102 00 $aCN$b110000
 8 105 00 $ay...z...000yy
 9 106 00 $ar
10 200 10 $a北京传统文化便览$9BEI JING CHUAN TONG WEN HUA BIAN LAN$f陈文良主编
11 210 00 $a北京$c北京燕山出版社$d1992.9
12 215 00 $a84,1379页$d20cm
13 330 00 $a本书分为历史沿革、燕山蓟水、典章制度、民族宗教、园林景观、老字商
14          号、科学技术等24个方面,阐述了北京历史与文化。
15 606 00 $a文化史$y北京
16 690 00 $aK291$v三版
17 692 00 $a22.51
18 701 00 $a陈文良$9CHEN WEN LIANG$4主编
19 801 00 $aCN$bNLC$c19930714
20 905 00 $dK291$eC583
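
In the sample pair, the CUL record's 001 $a appears to carry the same ISBN as 010 $a of the China MARC record, minus hyphens, so this 1992 title could also have been matched on ISBN. The Python sketch below (helper names hypothetical) splits field data as printed in the MARC listings into subfields, compares the two ISBNs, applies the standard ISBN-10 check-digit test, and extracts the 200 $9 Pinyin title used for acronym matching.

```python
import re


def subfields(field_data: str) -> dict:
    """Split data such as '$aXXX$bYYY' into {'a': 'XXX', 'b': 'YYY'}.
    Only the first occurrence of each subfield code is kept."""
    out = {}
    for code, value in re.findall(r"\$([a-z0-9])([^$]*)", field_data):
        out.setdefault(code, value)
    return out


def isbn10_valid(isbn: str) -> bool:
    """Standard ISBN-10 check: weights 10..1, weighted sum divisible by 11."""
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10 or not chars[:9].isdigit() or chars[9] not in "0123456789X":
        return False
    values = [int(c) for c in chars[:9]] + [10 if chars[9] == "X" else int(chars[9])]
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


cul_001 = subfields("$a7540202351")             # CUL record, 001
cnmarc_010 = subfields("$a7-5402-0235-1$b精装")  # China MARC record, 010
cnmarc_200 = subfields("$a北京传统文化便览$9BEI JING CHUAN TONG WEN HUA BIAN LAN$f陈文良主编")

same_isbn = cul_001["a"] == cnmarc_010["a"].replace("-", "")
print(same_isbn, isbn10_valid(cnmarc_010["a"]))  # True True
print(cnmarc_200["9"])  # BEI JING CHUAN TONG WEN HUA BIAN LAN
```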