This module is meant to run for at least 1.5 hours. It is aimed mainly at pre-fieldwork PhD students but lends itself to participation by larger audiences (students who are writing up, post-doctoral researchers...). It is designed as an interactive workshop, in which participants are encouraged to intervene and contribute their knowledge, and it presupposes basic data management skills. This advanced module builds on the basic 'data management for social anthropology' module, which might be run separately or attached to the advanced course. If this module is not run as a follow-up to the basic module, it might be helpful to distribute a survey in advance of the session to find out about students' skills, requirements and interests in relation to data management. This will allow for better participation, and students could be asked to volunteer their practical skills to the rest of the class. The duration might therefore change depending on the expected level of participation.
This module deals in more detail with issues of:
- metadata, ontology, and other data organisation, with examples from repositories;
- ethics, freedom of information, intellectual property and data protection; and
- sharing and dissemination tools, techniques, and experiences.
[Note: italic text between square brackets indicates notes for the instructors.]
Links to module materials
- Documenting your data
- Ethical and legal issues
We deal with issues of documentation, i.e. the description of how data are created or digitised, what they mean, what their content and structure is, and whether/how they might have been modified.
This is important because of issues of long-term analysis and re-use, sharing, collaboration, and archiving.
Documentation and metadata are more easily applied if planned from the outset, although of course some interpretive categories may become obsolete as your way of thinking changes. Metadata might thus be constantly changing.
Data documentation applies to both analogue and digital data. We begin by exploring some 'classical' ways of documenting data and archiving in anthropology, to then move to contemporary tools and techniques that apply to digital data.
Many among the great anthropologists of the 20th century have archived (at least part of) their data or have had their data archived by others, for public use. Some illustrious examples of anthropological paper archives (and their digital indexes):
- Jack Goody's paper archive at St John's College, Cambridge, 1950-2000 (circa) made up of c. 35 files, 50 boxes, containing notes, field notebooks, correspondence and offprints relating to Goody's researches on the Bagre, Gonja, the Lodagaa, Ghana, and Gujarat.
- Bronislaw Malinowski's archive at the LSE. The Malinowski archive comprises a wide variety of material relating to Malinowski's life and work, including field notes; papers regarding his teaching at LSE, Yale and other universities; material relating to Malinowski's published and unpublished works; correspondence; printed material; and personal papers such as diaries and family correspondence. The archive also contains 1,000 photographs taken during Malinowski's fieldwork on the Trobriand Islands.
Famously, his diaries were published posthumously and caused great controversy for the vehement expression of his frustrations with the people and the place he was studying. He had probably never meant for the diaries to be published - so beware!
More generally, making data public presents several challenges in ethical and methodological terms, which we will come back to.
[Participants discuss some issues relating to the publication of Malinowski’s diaries, and the readings assigned. Possible questions for discussion: what kind of data is a private diary? Does it actually count as data, and how? Should it be archived and published? Does a diary like this undermine the rest of the research?]
- Annette Weiner also did a re-study on Trobriand Island exchange practices which of course relied on Malinowski's own work (including the diaries, correspondence, and pictures). Weiner's archive (1970-1997) is held at NYU, and contains administrative memoranda, correspondence, reports, field notebooks, publications, unpublished writings (reviews, course materials and dissertations), newspaper clippings and papers produced since Weiner's graduate years.
- SOAS-led research project on Indian village life, a 'restudy' of three villages in the Indian states of Gujarat, Madhya Pradesh and Orissa.
- The London School of Economics Archive has an extensive anthropology section, including materials from Raymond Firth, Ernest Gellner, Phyllis Kaberry, Siegfried Nadel, Margaret Read, Audrey Richards, Charles Seligman and Isaac Schapera.
In these examples, indexes have been made in digital format or digitized, and presumably a number of people (and not necessarily the archives' authors themselves) contributed a lot of labour over many years.
What is an index and how does it work?
[The floor is open for a brief discussion of what an index is.]
For personal, analogue archives, a common method of indexing is paper slips/index cards.
- Sir James Frazer's index cards, as explained by Alan MacFarlane on video [show clip], are also an example of 'meta-digitisation'.
MacFarlane's own methodologies:
- based on the 'one fact, one card' recording of data, which allows data to be stored according to a number of different criteria (chronological, by location, or any other chosen criterion). Reshuffling is seen as an integral part of the analytical process.
Similarly, Lévi-Strauss's method for the structural analysis of myth consisted of analysing each myth individually, breaking down its story into the shortest possible sentences, and writing each sentence on an index card bearing a number corresponding to the unfolding of the story. Each such card "will thus show that a certain function is, at a given time, linked to a given subject", that is, "each gross constituent unit will consist of a relation" (211). However, this definition remains unsatisfactory because the linguistic units of a lower order are also "made up of relations", and "we still find ourselves in the realm of a non-reversible time, since the numbers of the cards correspond to the unfolding of the narrative", while "mythological time" is both synchronic and diachronic.
'The true constituent units of a myth are not the isolated relations but bundles of such relations, and it is only as bundles that these relations can be put to use and combined so as to produce a meaning. Relations pertaining to the same bundle may appear diachronically at remote intervals, but when we have succeeded in grouping them together we have reorganised our myth according to a time referent of a new nature… a two-dimensional time referent which is simultaneously diachronic and synchronic' (212).
The "synchronic-diachronic structure of the myth permits us to organise it into diachronic sequences (the rows in our tables) which should be read synchronically (the columns)" (229). This "'slated' structure" (229), as he terms it, "comes to the surface, so to speak, through the process of repetition," (229) as illustrated in the analysis of the Oedipus myth.
Here you also see how collection/capture is already a form of archiving, or at least that the two happen almost simultaneously.
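The two readings of the same set of cards can be sketched in a few lines of Python. This is only an illustration: the sentences and bundle labels are abbreviated from the Oedipus analysis, the card numbering is simplified, and the function names `tell` and `bundles` are made up for this example.

```python
from collections import defaultdict

# Each card: (position in the narrative, sentence, bundle label).
# Wording and numbering are simplified for illustration.
cards = [
    (1, "Cadmos seeks his sister Europa", "overrating of blood relations"),
    (2, "Cadmos kills the dragon", "slaying of monsters"),
    (3, "Oedipus kills his father Laios", "underrating of blood relations"),
    (4, "Oedipus kills the Sphinx", "slaying of monsters"),
    (5, "Oedipus marries his mother Jocasta", "overrating of blood relations"),
    (6, "Eteocles kills his brother Polynices", "underrating of blood relations"),
    (7, "Antigone buries her brother Polynices", "overrating of blood relations"),
]


def tell(cards):
    """Diachronic reading: the sentences in narrative order."""
    return [sentence for _, sentence, _ in sorted(cards)]


def bundles(cards):
    """Synchronic reading: group the cards into columns by shared relation."""
    columns = defaultdict(list)
    for position, sentence, bundle in sorted(cards):
        columns[bundle].append((position, sentence))
    return dict(columns)


for label, column in bundles(cards).items():
    print(label, [position for position, _ in column])
```

Reading `tell(cards)` gives the story as it unfolds; reading `bundles(cards)` column by column brings together relations that occur far apart in the narrative.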
- a large index of slips on numerous topics arranged into a hierarchical index system, which MacFarlane started assembling in his undergraduate years, and continued with the help of his wife. The hierarchical divisions were not worked out all at once, but gradually evolved, the main headings in the first five years, and then lower level ones over the next fifteen years. In this 'Topics' database, the top level of the hierarchy had about twenty major headings such as:
Agriculture, Economy, Marriage, Politics, Capitalism.
This slip index method was not just used for the 'Topics' database, but was later adapted for use in the study of particular communities, historical and anthropological. The top level then became:
Person, Place, Date, Source, Subject.
Under each heading would be the next level. For instance, under Capitalism would be:
'Causes of Capitalism', 'Consequences of Capitalism', 'Feudalism and Capitalism' and so on.
At the third level of the index, would be further sub-divisions, for instance under 'Feudalism and Capitalism' would be headings such as:
'Was England feudal?', 'Is European feudalism unique?', 'When did feudalism end?'
Roughly there were some two thousand slips per unit of the top level, each of these divided into ten or more sub-headings, and each of these divided into ten or more with an average of twenty slips. In practice, MacFarlane explains that there is enormous variance in this four-level hierarchy.
The information was written on small slips. For reasons of economy of both space and expense, MacFarlane used reams of foolscap paper cut into blocks of rectangular slips. One ream of 500 sheets produced six thousand slips. There would be a heading, a short quotation or other material in the centre, and the source from which the information came at the bottom.
For example, a general topic or heading might be 'Feudalism and Contract'. Then the abstracted text or quotation or idea was written in the middle of the slip, for instance:
"The master who taught us that 'the movement of the progressive societies has hitherto been a movement from Status to Contract' was quick to add that feudal society was governed by the law of contract. There is no paradox here..." (referring to Henry Maine).
There would then be the reference to the source of the idea or quotation:
title: English Law, vol.1
The small size of the slips (2½"x3", or 60x80mm) demanded brevity. The quotation above, with about forty words, is about the average; almost always the text would be between ten and fifty words long. The maximum would be about 100 words.
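As a rough sketch, the slip layout and the hierarchy of headings described above could be modelled as follows. The class and function names (`Slip`, `fits_on_slip`, `file_slip`) are invented for this example and are not part of MacFarlane's system; the figures at the end simply multiply out the averages quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class Slip:
    heading: str  # topic written at the top of the slip
    text: str     # short quotation or abstracted idea in the centre
    source: str   # reference written at the bottom


MAX_WORDS = 100  # the stated upper limit for the text on one slip


def fits_on_slip(slip):
    """Check that the text respects the brevity the small format demands."""
    return len(slip.text.split()) <= MAX_WORDS


# One path through the hierarchy, using headings quoted in the text.
index = {
    "Capitalism": {
        "Feudalism and Capitalism": {
            "Was England feudal?": [],
        },
    },
}


def file_slip(index, path, slip):
    """Walk down the headings in `path` and file the slip under the last one."""
    node = index
    for heading in path[:-1]:
        node = node.setdefault(heading, {})
    node.setdefault(path[-1], []).append(slip)


# Rough size of the whole index, from the stated averages.
slips_per_top_heading = 10 * 10 * 20      # sub-headings x sub-sub-headings x slips
total_slips = 20 * slips_per_top_heading  # about twenty top-level headings
slips_per_sheet = 6000 // 500             # one ream of 500 sheets gave 6,000 slips
```

The averages multiply out to about 2,000 slips per top-level heading and about 40,000 slips overall, with twelve slips cut from each foolscap sheet.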
However, there are issues with these formats and techniques. Alan MacFarlane writes on 'The reasons why paper slip indexing systems collapse':
- Size (takes more time to file, the same card can go under many different headings)
- Difficult to remember the entire structure in hierarchical classification
- Shifting classification along with ideas
- Paper archives aren't portable and cannot be accessed remotely
- These issues make it more difficult to find cards in the dataset
- Rigidity of hierarchical classification
- 'By definition, all 'slips' contain a relation between at least two things, and probably many more. Hence each slip needs to go under at least two headings. In fact, it should probably go into three or four places in the filing system.' (Ibid. p. 4). This again makes it difficult to retrieve information.
Reflecting on his own card index, MacFarlane thus concludes:
The result of all these un-analysed pressures was that after assembling about 40,000 or so slips, I began to lose heart. It was usually quicker to remember an author and go to the book - to rely on human memory. I might then use the more usual method of indexing a book at the back. As long as I could remember the author, I could find the material. If I did not have the book, I abstracted it and did the same. (Ibid. p. 5)
Some of the issues with paper/physical archives are solved by digital ones, especially those of size, portability, remote access, retrieval and rigidity. Indeed, these days most public paper archives (or at least their indexes) are digitized (as in some of the examples cited before). It is true that the larger a database, the longer the computer takes to index and store a new item, but computers' ever-growing processing power easily keeps pace with, and indeed exceeds, the needs of most social scientists. MacFarlane is again an example: his 'Topics' database has been converted into a web database containing 59,326 records, based on his library (which is also indexed online).
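One reason digital indexes are less rigid is that a record can be stored once and listed under any number of headings, which addresses the problem of a slip needing to go 'into three or four places in the filing system'. The following is a minimal sketch of that idea, not a description of how MacFarlane's web database works; the class name `SlipIndex` and its methods are hypothetical.

```python
from collections import defaultdict


class SlipIndex:
    """Store each slip once and list it under as many headings as needed."""

    def __init__(self):
        self.slips = []                     # slip texts, each stored once
        self.by_heading = defaultdict(set)  # heading -> positions in self.slips

    def add(self, text, headings):
        slip_id = len(self.slips)
        self.slips.append(text)
        for heading in headings:
            self.by_heading[heading.lower()].add(slip_id)
        return slip_id

    def find(self, *headings):
        """Return the slips filed under all of the given headings."""
        if not headings:
            return []
        ids = set.intersection(
            *(self.by_heading.get(h.lower(), set()) for h in headings)
        )
        return [self.slips[i] for i in sorted(ids)]


idx = SlipIndex()
idx.add("Feudal society was governed by the law of contract", ["Feudalism", "Contract"])
idx.add("Was England feudal?", ["Feudalism"])
print(idx.find("feudalism", "contract"))
```

Adding a new heading later does not require re-filing anything: existing slips simply gain another entry in `by_heading`.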
Other examples of digital anthropological archives in the public domain:
- Pat Caplan's Concepts of Healthy Eating Food Research: Phases I and II, 1992-1996 (The Nation's Diet) [Instructors should show a few datasets and databases from the web.]
- P. Kaberry's Study of the Abelam of Papua New Guinea and the Nso of Cameroon, 1939-1963
- Porter, G. and K. Hampshire Children, Transport and Mobility in Sub-Saharan Africa, 2007-2008
- LSE anthropology photographs
- Digital Himalaya: a project to develop digital collection, storage and distribution strategies for multimedia anthropological information from the Himalayan region
- World Oral Literature project: a global initiative to document and make accessible endangered oral literatures before they disappear without record
The first three items are from the Economic and Social Data Service, a national data archiving and dissemination service which came into operation in January 2003. The service is a jointly-funded initiative sponsored by the ESRC and JISC, and hosts, among others, the data produced by ESRC-funded research projects. Using the keywords 'social anthropology,' 'cultural anthropology,' 'ethnography,' their catalogue returns 64 records. The last two are hosted by DSpace@Cambridge, the institutional repository of the University of Cambridge, currently hosting the scholarly works of 17 authors and the PhD theses of 7 students in social anthropology; 83 occasional papers in sociology and anthropology; and several anthropology lectures. [Should the institution at which the workshop is led have an equivalent repository or similar service provisions, these should be flagged. For a comprehensive list of research data repositories, see the DataCite website.]
We will come back to the issue of sharing data later in this module.
[Participants are asked to briefly introduce their project and reflect on possible ways in which their data might be documented. Issues to bear in mind and discuss include data types and formats, different modes of documentation, software which might be helpful in cataloguing and retrieving data of different kinds, and categories to be employed. The exercise can be done collectively or in groups, depending on size, and should last at least 20 minutes. More detailed discussions and presentations to follow.]
Some general points:
- Techniques for documenting data can be:
- format-specific (depending on whether data is digital or not, you might be able to link and back-link, search, have hierarchical or horizontal classification)
- generic: things like colour-coding, categories, dates, place names...
- Reports to supervisors and/or funding bodies can act as data documentation or help in the process of documenting data.
- Keeping as much of your data and documentation in the same place helps too - so you might want to think about creating a sort of personal database.
- Specific software may prove useful - some examples given below, also in the handout.
A quick run-through of a PhD student's data archive and its evolution:
[can be replaced with instructor's own experience]
I started with a single Word document: a journal divided into tabs, one for each month, with daily entries (and lots of gaps) whose headings were an attempt to index (mostly through personal, organisations' or place names). In the journal, I also listed any notes I might have taken with pen and paper, referring to the specific notebook and the themes/events; and any interviews, mentioning file name and location. I kept a paper folder with some fliers and other paper material, which I indexed only loosely. I also tried to index handwritten notes on the go, again not very systematically (by keeping a rough list of entries/themes on the first page of each notebook).
I periodically revisited the journal entries - approximately every three/four months, i.e. each time I wrote a report - which helped refresh my memory and bring issues into focus. Reports also helped to summarise the research activities and the emerging themes.
Once back, I wrote some general reports for funding bodies which again helped summarise ideas and activities. I went through a few of the journal entries (especially the final few months) and began thinking about how to index not only fieldnotes but also bibliographic references and notes on literature. This is when I started experimenting with wikis as non-linear writing tools, and learnt more thoroughly about reference managers.
As a writing and storage/indexing tool, I finally landed on Scrivener, where I exported all the digital fieldnotes and literature. Whilst Scrivener gave me the possibility to tag, link and backlink, and to browse by category, I soon stopped worrying about those and relied on memory and search tools (in Scrivener and in OSX more generally) to retrieve information and descriptions whenever needed. The hierarchical classification of literature and notes also helped me navigate all this data.
[Run through PhD project in Scrivener - this can be replaced by instructor's own archive run-through]
As for reference managers, I started with EndNote and ended with Bookends, although I didn't update my library systematically until the very end, when I compiled the final draft. Both tools connect with Scrivener. However, I would not recommend Bookends or EndNote, as they are not as reliable as they look (though they do the job) - Mendeley seems like a better option. Zotero is also a good one.
[Run through one or more reference management tools, done by instructor or participants]
At the moment, I think having everything on Scrivener helps me think about future projects, especially how to turn my PhD into a book. As time goes on, however, my memory is fading somewhat and it might be more difficult to retrieve information (especially from handwritten notes).
Some general points on cataloguing:
- Cataloguing is time-consuming, so evaluate the costs and benefits (as mentioned previously, many academics rely on assistants to do it if they can afford it).
- Creating metadata aids the analytical process - but you will need to keep the process going, to go back to the archive after you decide your collection phase is over.
- Categories are generally project- and focus-specific. Whilst there are systematic ontologies for the structuring of data, at most levels of research (at least in a subject like anthropology) it is easiest to devise whatever suits the researcher and the project rather than sticking to a predefined formalised scheme. An example is here, or take any of the online datasets cited in section ii. above. For the Malinowski collection: 41 main headings, e.g. 'Trobriand Islands fieldwork photographs', then split into 35 sub-headings, e.g. 'sex', which contains 16 items (mainly pictures of women...), each with a description (e.g. 'Physical types [group of children]') and an individual reference number (e.g. MALINOWSKI/3/20/4).
- More importantly, categories tend to be volatile and processual: they shift as ideas and perspectives change. For your own purposes it may therefore be best to stick to generic descriptors (e.g. places and personal names, which are more easily remembered; dates; ...).
- If you decide to catalogue your data, you will need to document and update your index: make a list of the categories you use, and systematically update as you add new ones.
- Spotlight or other search tools may be enough in some cases (long- vs. short-term perspective)
- Generally, specific ontologies are developed by data repositories and other archives. If you decide to deposit your data in a repository, they will catalogue it according to a specific system.
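If your notes are tagged digitally, keeping the list of categories up to date can be partly automated. The sketch below assumes a simple setup that is not described in this module: a documented list of categories and a dictionary mapping each journal entry to its tags. All names and sample values are hypothetical.

```python
def undocumented_categories(documented, notes):
    """Return categories used in the notes but missing from the documented list."""
    used = {tag for tags in notes.values() for tag in tags}
    return sorted(used - set(documented))


# Hypothetical sample data: the documented list and two tagged journal entries.
documented = ["market", "kinship"]
notes = {
    "2012-03-05": ["market", "migration"],
    "2012-03-06": ["kinship"],
}

print(undocumented_categories(documented, notes))
```

Running a check like this each time you write a report is one way to notice when new categories have crept in without being described.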
[Participants are asked to brainstorm about how they might develop an index/ontology.]
Whilst data documentation is useful for personal access to data, it is even more relevant if data is to be shared or made public - so that other users can understand and re-use it correctly.
You might want to share fieldwork data with:
- your supervisor;
- other academics;
- research participants;
- wider audiences.
Of course, there are issues of privacy and intellectual property, which we will deal with in more detail later. Depending on who you are sharing with, different techniques may apply and different issues might arise.
As a general rule, if you know you are going to share large numbers of files, and especially if more than one person is likely to work on them, it is good to agree on certain rules and conventions as to the modality of sharing and the naming of files, and to find the most appropriate, effective, and supple way of sharing.
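A file-naming convention is easier to follow if names are generated and checked rather than typed by hand. The convention used here (date, place, type of material, two-digit version) is only an assumed example; agree on your own with your collaborators. The function names are likewise made up for illustration.

```python
import re
from datetime import date


def slug(text):
    """Lower-case the text and replace anything but letters and digits with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_name(day, place, kind, version, ext):
    """Build a name such as 2012-03-05_market-square_interview_v02.wav."""
    return f"{day.isoformat()}_{slug(place)}_{slug(kind)}_v{version:02d}.{ext}"


NAME_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}_[a-z0-9-]+_[a-z0-9-]+_v\d{2}\.[a-z0-9]+$"
)


def follows_convention(filename):
    """Check whether a file name matches the agreed pattern."""
    return NAME_PATTERN.match(filename) is not None


print(make_name(date(2012, 3, 5), "Market Square", "interview", 2, "wav"))
```

Putting the date first in year-month-day order means that files sort chronologically in any file browser.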
Tools useful for sharing and working collaboratively:
- Institutional networked storage (e.g. PWF): public workstation facility, giving users (limited) storage space, can be accessed remotely by multiple users. [PWF is institution-specific - replace with appropriate, specific content, where available, and give practical examples/tours]
- Virtual learning or research environments (e.g. CamTools): online teaching and resource storage facility, managed by an administrator, with various levels of permission for different users. Has announcement, wiki and chat functions, among others. [CamTools is institution-specific - replace with appropriate content, where available, and give practical examples/tours.]
- Dropbox: up to two gigabytes of online file space free (and up to 300 gigabytes for a fee), increased periodically; ability to share folders with other users; ability to synchronise versions of your files between different devices (e.g. your laptop, your desktop, the online space); automatic backup.
- GoogleDocs: especially helpful to edit documents collaboratively, to avoid using email attachments.
- Academic web networks: see especially the Open Anthropology Cooperative (OAC) and Academia.edu.
- Blogs: also good for dissemination. Some famous anthropology blogs include Savage Minds and Anthropologists for Justice and Peace.
- Wikis (good for dissemination too).
- Digital repositories: they maintain your digital files and ensure that they remain usable over time. Repositories also provide online access to papers and data for the research community. They can serve as a method of publishing files and data, making them more easily citable as well as accessible. Some repositories are organised by subject (e.g. archaeological data, historical analyses, chemical data, etc.) while others are organised by institution (e.g. materials from members of a university, usually focused on publications and theses rather than data). Repositories also provide support for documenting and annotating ('metadata') and many provide additional services such as advice and assistance with data management, formats, security, and intellectual property rights concerns. Funding bodies' policies increasingly require the depositing of publications (and sometimes of data itself) in repositories such as the ESDS. Hence, if you think you are going to deposit all or part of your data, you might have to negotiate its sharing with your research subjects (seeking consent after the fact is usually very difficult).
Higher Education institutions might provide digital repositories as part of their services. However, few institutional repositories actually host research data.
The University of Cambridge's institutional repository is DSpace@Cambridge. It yields 1009 hits for a search with keyword 'anthropology' (some of them 'dark' items, which are not publicly visible). You can deposit your thesis on DSpace, and the data supporting it (usually upon completion of your PhD). [Replace this with institution-specific information.]
However, while print theses deposited with the University Library are legally seen as unpublished manuscripts, when we put them on the web they are in fact legally published. This might pose problems in terms of negotiating commercial publishing, or the publication of sensitive/copyrighted data. (N.B. It may be possible to restrict access to theses for a period of time - as is the case with DSpace.)
Other examples of digital repositories: [Show as many repositories as you think fit.]
- ESDS (see above)
- UK Data Archive
- EThOS, an online database of theses, which provides a searchable gateway to new non-embargoed e-theses from contributing universities, a list of which is available on their website. They also scan existing paper thesis manuscripts on request.
Posting things on CDs/DVDs might be a good idea for infrequent sharing of large amounts of data. Beware of security issues, which can be sidestepped by encryption (more later); and of decay/damage.
Whilst not ideal (attachments clog inboxes and may cause problems when downloading them), email is convenient. If you do use it, make sure one person is responsible for keeping track of the latest version of a file; beware of security issues; delete messages from your mailbox.
The dissemination of data comes with thorny ethical issues, and potentially relates to legal ones as well. A lot of the data produced through fieldwork is in one way or the other 'sensitive,' if only because it contains personal information concerning research subjects and the researcher her/himself. Therefore, when creating and storing data one needs to evaluate possible risks and consider taking appropriate measures. Ask yourself who might have access to your data (on your computer, USB stick, online cloud, notebooks...).
- Online storage: clouds and emails are handy, but potentially unsafe. Security for established online backup and sharing services is decent, but not guaranteed, and some of the intellectual property rights agreements for the sites are a bit vague; you should encrypt your files if they contain particularly sensitive data (more later). There is always the possibility that your online service will go out of business, leaving you without your important files. Finally, beware of the Data Protection Act (see below).
- We also mentioned the issue of sharing and consent: if you plan to share data, you should inform research participants, ideally as early as possible, and get their permission.
- Crossing borders: under some legislation such as that of the US (of the 'sovereign exception' kind) electronic equipment can be randomly seized and searched at international borders, in view of 'terrorist threats' (for more see Wikipedia on border search exceptions). More widely applicable, international agreements that purportedly seek to combat 'counterfeiting' are being negotiated, which would have similar consequences (for more see the Wikipedia entry on border searches). Depending on which borders you cross, you might need to take this into consideration.
Different ethical issues push for sharing or withdrawing data. Here is what the Association of Social Anthropologists of the UK and Commonwealth (ASA) ethical guidelines say on the issue:
(3) Sharing research materials: Anthropologists should give consideration to ways in which research data and findings can be shared with colleagues and with research participants:
(a) Research findings, publications and, where feasible, data should be made available in the country where the research took place. If necessary, it should be translated into the national or local language. Researchers should be alert, though, to the harm to research participants, collaborators and local colleagues that might arise from total or even partial disclosure of raw or processed data or from revelations of their involvement in the research project;
(b) Where the sharing with colleagues of raw, or even processed, data or their (voluntary or obligatory) deposition in data archives or libraries is envisaged, care should be taken not to breach privacy and guarantees of confidentiality and anonymity, and appropriate safeguards should be devised.
(4) Collaborative and team research: In some cases anthropologists will need to collaborate with researchers in other disciplines, as well as with research and field assistants, clerical staff, students etc. In such cases they should make clear their own ethical and professional obligations and similarly take account of the ethical principles of their collaborators. Care should be taken to clarify roles, rights and obligations of team members in relation to matters such as the division of labour, responsibilities, access to and rights in data and fieldnotes, publication, co-authorship, professional liability, etc.
So, to share or not to share? Ultimately, this is a context-specific choice which might have to be re-negotiated in time.
- Restrict access to important documents and to your hard drives by setting up passwords and access permissions (e.g. no access, read only, read and write, administrator-only permission).
- Make sure you log out of websites (e.g. social networks, online banking, email...) and of your hard-drives if you are not using them.
- Use firewalls and anti-virus software (especially for Windows)
- Always keep multiple copies, in different locations and formats
- Destroy data when needed - both hard- and e-copies (Note: data deleted from a hard-drive might still be retrievable, as it is from emails etc. - which may be good or bad news.)
- Encryption: this is a more secure way of both storing and transferring data. Software is available to perform this, both proprietary (e.g. PGP) and open source (e.g. GnuPG). In transmission, a key and passphrase are used to digitally sign each encrypted file and thus allow the recipient to validate the sender's identity.
- Tiered consent: A good way to navigate the ethical complexities involved in ethnographic fieldwork is to negotiate a 'tiered' type of consent to participating in research, where possible. This involves asking subjects to consent to:
- participating in the study;
- having notes taken of their speech;
- having speech recorded;
- having their names published;
- having information concerning them anonymously published;
- having information concerning them published with their names.
- Anonymisation: another solution to ethical issues, especially where consent may be difficult to negotiate (as when 'data' is used from experience, from serendipitous/chance encounters, which were not formally envisaged as fieldwork in advance, for example - encounters on the bus, in bars, shops... you name it) or when participants specifically ask for it.
A few tips:
- Replace names with pseudonyms.
- Remove or alter other direct or indirect identifiers (e.g. date and place of birth and residence; age; occupation). You can often convey the sense of someone's life story, or of a situation, even if some details are altered (e.g. someone selling okra in the market might be presented as selling cooked food; a migrant who moved to Germany might be said to be working in the Netherlands; age can be kept vague, e.g. a woman in her 20s). It might be difficult to conceal some locations if the events taking place or the people involved are unique in some way.
- Be consistent in the use of pseudonyms, and keep track of them (in a place other than the data file itself).
- When possible, negotiate in advance with subjects what needs to be concealed/altered (such that perhaps names or other details might be omitted in interviews themselves, for example). However, you might want to consider long-term re-use of data: some contextual information might be useful in the future and is best collected and kept, if in restricted access.
- If no other strategy works, consider restricting access to the data/thesis.
You may not need as many alterations as you think!
For further advice, see the UKDA advice on anonymisation.
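For digital notes, consistent pseudonyms can be applied with a small script. The following is a minimal sketch under stated assumptions: the names and replacements are invented, and the function `pseudonymise` is hypothetical, not part of any tool mentioned in this module. It only handles exact, whole-word matches, so nicknames, misspellings and indirect identifiers still need to be checked by hand.

```python
import re


def pseudonymise(text, pseudonyms):
    """Replace each real name in `pseudonyms` with its stand-in, whole words only."""
    if not pseudonyms:
        return text
    # Longest names first, so a full name is matched before a shorter part of it.
    names = sorted(pseudonyms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    # A single pass over the text, so a replacement is never replaced again.
    return pattern.sub(lambda match: pseudonyms[match.group(1)], text)


# Hypothetical key: keep it in a separate, protected file, not with the data.
key = {"Kofi": "Yaw", "Accra": "a coastal city"}

print(pseudonymise("Kofi met Kofi's aunt in Accra.", key))
```

Because the same key is applied every time, each person keeps the same pseudonym across all your files, and the key itself is the only place where real names and pseudonyms appear together.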
The Data Protection Act 1998 was formulated in response to concerns about the amount of personal information - and the accuracy of such information - being stored, processed and passed on by organisations. It provides the legal basis for how organisations handle information relating to living people (personal data).
A distinction is made between 'Personal Data' and 'Sensitive Personal Data' of living individuals:
- Personal Data is data relating to living individuals that identifies them: name, age, sex, address, etc.
- Sensitive Personal Data is personal data about a person's racial or ethnic origin, political opinions, religious beliefs, physical/mental health, sexual orientation, or criminal proceedings or convictions.
Once a person is dead, the DPA does not affect their personal data.
A different level of security applies when dealing with Sensitive Personal Data: this could potentially apply to the contents of email and to research data in some fields. Students and researchers in medicine, the social sciences and allied subjects are particularly urged to be aware of this requirement.
The Act establishes that:
- Data may only be used for the specific purposes for which it was collected.
- Data must not be disclosed to other parties without the consent of the individual whom it is about, unless there is legislation or another overriding legitimate reason to share the information (for example, the prevention or detection of crime). It is an offence for other parties to obtain this personal data without authorisation. Consent forms should be prepared, then completed and signed by participants, for these types of personal data to be included in project archives.
- Individuals have a right of access to the information held about them, subject to certain exceptions (for example, information held for the prevention or detection of crime).
- Personal information may be kept for no longer than is necessary and must be kept up-to-date.
- Personal information may not be sent outside the European Economic Area (the EU member states together with Norway, Iceland and Liechtenstein) unless the individual whom it is about has consented, adequate protection is in place (for example, a prescribed form of contract governing the transmission of the data), or the destination country or territory ensures an adequate level of protection for the rights and freedoms of data subjects in relation to the processing of personal data.
- Subject to some exceptions for organisations that only do very simple processing, and for domestic use, all entities that process personal information must register with the Information Commissioner's Office.
- The departments of a company that are holding personal information are required to have adequate security measures in place. Those include technical measures (such as firewalls) and organisational measures (such as staff training).
- Subjects have the right to have factually incorrect information corrected (note: this does not extend to matters of opinion).
Anonymised or aggregated data is not regulated by the Act, providing the anonymisation or aggregation has not been done in a reversible way. The Act applies only to data which is held, or intended to be held, on computers ('equipment operating automatically in response to instructions given for that purpose'), or held in a 'relevant filing system'. The DPA may apply to the contents of an electronic address book, or email messages that have been backed up to a 'cloud' solution.
For further guidance, see the Information Commissioner's Office guide. Institutions should also be able to offer support in relation to this and other legal requirements.
The Freedom of Information Act 2000 was established to increase transparency in the public sector. It gives people the right to request access to recorded information held by public sector organisations or be informed whether information is held. Research data can be requested under the Freedom of Information Act, but copyright to such data stays with the original researcher.
Some information is exempt from the Act, such as:
- personal data
- information that is accessible by other means e.g. via a website
- information intended for future publication
- information that is subject to a confidentiality agreement, such as in a signed consent form or sensitive data held under restricted access by a data archive
Any person can request any data held by public authorities - including universities. The data does not have to have been produced by the university: it is the fact they hold the data that is important. A request must specify what data are sought.
This is a potential issue for collaborative projects where multiple copies of data are held in different institutions and countries.
"Intellectual property rights... are rights granted to creators and owners of works that are the result of human intellectual creativity".
From the JISC-legal website
Different forms of intellectual property are regulated by law:
- Copyright: this refers to creative works fixed in material form. Under the Copyright, Designs and Patents Act 1988, copyright applies to (among other things):
  - original literary, dramatic, musical or artistic works
  - sound recordings, films, broadcasts or cable programmes
  - the typographical arrangement (layout) of publications
  - teaching materials and blogs
Most research outputs such as spreadsheets, publications, reports and computer programs fall under literary work and are therefore protected by copyright. Facts, however, cannot be copyrighted. Data are not covered by copyright, but the arrangement of data in a spreadsheet or database is. More generally, copyright protects the expression of an idea, not the idea itself. Copyright does not require registration. In the act of creating a piece of work, writing something down, or recording an interview or song, the creator(s) of the work by default hold the right to copy the work in the future.
The ownership of copyright is not the same for all creators of work; it depends on their academic status (e.g. students or lecturers) and employment position. Students are not employees, so they hold copyright in their own work, and some universities allow their academics and researchers the rights to their works. Of course things can become messier when students are employed on projects and there are external funders or partners involved. Different institutions have different copyright clauses in their employment contracts.
A copyright owner has the right to control the copying, adaptation, publishing, performance and broadcast of the work, and under what conditions this may be done. These conditions may involve payment of a royalty or licence fee. The owner may also give or sell some or all of the rights to others. In addition, the author of a copyright work has certain "moral rights" that always remain with the author. These are the right to be identified as the author of the work, the right to object to derogatory treatment of their work and the right to object to false attribution of a work. However, these rights do not exist where copyright in a work was originally owned by the author's employer. The onus lies with the user of a work to get permission, even if the rights holder is unknown or cannot be traced.
The right to use copyright material is typically obtained:
- by seeking and obtaining permission directly from the copyright owner.
- by means of an assignment (assignation in Scotland) of copyright in writing from the copyright owner.
A licence gives someone permission to do the acts which the copyright owner is entitled to authorise or prohibit without infringing copyright. This is how a great deal of material is lawfully used in the education context. In addition there are certain very specific situations where it may be permissible to make use of someone else's copyright protected works without seeking permission from the owner. For example, it is not necessary to get permission in order to use an insubstantial part of a copyright protected work.
Even if material is available on the Internet, permission will still be required in order to reuse it (such as copying it, adapting it or disseminating it by a different means or in different formats). Some websites give information about the permissions (licence) granted to users, which will clarify what can and cannot be done with the material.
There are a number of exceptions in copyright law which allow limited use of copyright works without the permission of the copyright owner. In the education context relevant exceptions include:
- fair dealing for non-commercial research and private study, criticism and review
- non-exact copies of works for teaching purposes in educational establishments (such as copying material by hand)
Use of a copyright protected work without its owner's permission may be a civil infringement and/or a criminal offence depending on the circumstances. Copyright is infringed if a person does (or authorises another to do) any of the exclusive acts restricted by copyright without the permission of the owner, in relation to the whole or a substantial part of a copyright work. What amounts to a substantial part is not defined in law but it is quite likely that even a small portion of the whole work will still be a substantial part.
Copyright is essentially a private right, so it is generally for the rights holder to decide what to do when his or her copyright is infringed. The infringer could be taken to court and runs the risk of having to pay compensation to the copyright owner. They could also face:
- having an injunction (interdict in Scotland) taken out against them to stop use of the material
- being ordered to surrender the copyright material to the copyright owner
- an order requiring that infringing goods be destroyed or delivered up to the copyright owner, and that any resulting profits from the infringement are paid to the copyright owner.
Where deliberate infringement of copyright is undertaken as part of a trade or business, it may be a criminal offence, punishable by an unlimited fine and up to ten years' imprisonment.
The duration of copyright may depend on whether a work is published or unpublished, whether the creator is known or unknown, and whether transitional arrangements from previous copyright legislation apply. However, in general terms, copyright lasts:
- 70 years from the end of the year of the death of the creator(s) for literary, dramatic, artistic and musical works, films and video recordings
- 50 years from the end of the year of making or release for sound recordings and broadcasts (since 2013, 70 years from release for published sound recordings)
- 25 years from the end of the year of first publication for the typographical arrangement of published editions.
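As a worked illustration of the 70-year term for literary, dramatic, artistic and musical works, the arithmetic can be sketched as follows. This is a simplified sketch, not legal advice: it assumes a single known creator and ignores unpublished works, unknown creators and transitional arrangements. The function names are invented for the example.

```python
from datetime import date

TERM_YEARS = 70  # term for literary, dramatic, artistic and musical works


def last_year_of_copyright(death_year, term_years=TERM_YEARS):
    """Return the calendar year on whose 31 December the copyright term ends.

    The term is counted from the end of the year in which the creator died,
    so the year of death itself does not count towards the term.
    """
    return death_year + term_years


def in_copyright(death_year, on_date, term_years=TERM_YEARS):
    """Return True if the work is still in copyright on the given date."""
    return on_date.year <= last_year_of_copyright(death_year, term_years)


# A creator who died at any point in 1950: the term runs to 31 December 2020.
print(last_year_of_copyright(1950))
print(in_copyright(1950, date(2020, 12, 31)))
print(in_copyright(1950, date(2021, 1, 1)))
```

Because the term runs from the end of the year of death, the month and day of death make no difference to the result.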
Other forms of intellectual property include:
- Designs: Appearance and shape of product
- Patents: Inventions - things that make things work
- Trade marks: Signs that distinguish goods and services
- Moral rights: Right to be attributed for your work or right to object to derogatory treatment of your work.
Intellectual Property Rights can be bought, sold, rented, gifted and bequeathed. Copyright law differs from country to country.
[In light of what has been discussed, participants are asked to go back to issues of sharing and publishing in ethical, political and analytical terms - cf. also the reading list, especially DeNicola 2011a,c.]