Study Skills

Research skills

Online course for Cambridge researchers about publishing, managing data, finding and disseminating research.

Research Data Management

Welcome to this module, where we will cover all the main aspects of looking after your research data, including:

  • how to store and back up data
  • how to organise data
  • what to do with protected data (personal or commercially sensitive)
  • why sharing data is important and how to do it
  • writing Data Management Plans

Data can take many forms: not only spreadsheets, but also images, interview recordings and transcripts, old texts, survey results, protocols... the list goes on. Watch this video by the Engineering Library for an overview of what you should be thinking about at the start of any research project.  

 

Where did it all go wrong?

Lack of planning at the start of a project can cause problems (and much more work!) later on. Think of data management as a time investment to make sure that the data you collect is used effectively and remains usable over time.  

Watch this video by the NYU Health Sciences Library as an example of poor data management and take some brief notes on any mistakes you spot. When you’re done, check the answers at the bottom of this page.  

 

Keeping data safe

When you’re doing research, you have a responsibility to look after your data – your funders, colleagues, other researchers who will reuse the data, and even your future self will appreciate the effort you put into managing data properly.

First of all, make sure you avoid disasters like these: 

Headlines of data disasters. King's college University fire. A PhD student lost her thesis after thieves took laptop and harddrive. Manchester cancer hospital fire may have destroyed vital research.

Image by www.presentationmagazine.com

Plan an effective and realistic back-up strategy. We recommend that you keep at least two backups at two separate locations.  
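
As an illustration of the two-backup recommendation, here is a minimal Python sketch that copies a data folder to two destinations and checks each copy against the original using a checksum. The function names and the example paths in the usage note are hypothetical, and the sketch assumes the two destinations really are in separate physical locations (for example, a departmental server and an external drive kept elsewhere). It is not a substitute for a managed backup service.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy every file under `source` to each destination and verify it.

    Returns the copies whose checksum does not match the original,
    so an empty list means every copy was verified.
    """
    if len(destinations) < 2:
        raise ValueError("keep at least two backups in separate locations")
    mismatches = []
    for original in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = original.relative_to(source)
        for destination in destinations:
            copy = destination / relative
            copy.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(original, copy)  # copies content and timestamps
            if sha256_of(copy) != sha256_of(original):
                mismatches.append(copy)
    return mismatches
```

You might call it as `back_up(Path("project_data"), [Path("/mnt/dept_server/project_data"), Path("/media/external/project_data")])`, where both paths are placeholders for your own storage. Running it on a schedule, and occasionally test-restoring a file, matters more than the exact tool you use.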

Slides by Claire Sewell, Office of Scholarly Communication

You should consider where the primary storage and backups are held, and this will depend on how sensitive the data is. The University has prepared a slide that helps you classify your data and store it safely.

Cloud storage offers many benefits, especially the recovery options if data is lost. The University of Cambridge offers cloud storage solutions that are appropriate for data up to level 2 (including personal data, but not sensitive data). Be careful if using private cloud storage accounts, though, as they often have terms and conditions that may not be suitable for research data.

Organising data

Once you are sure that your data is safe from accidental loss, you should be thinking about how to organise it. Are your computer files ‘an amorphous plethora of objects’? In this video by the University of Edinburgh Data Library, Professor Jeff Haywood talks about his experiences of organising data.  

 

If you want to read more about organising your data, including folder structures and file naming, there is a detailed guide on the Cambridge data website.  

If you are at the start of a project, spend some time now preparing an organisational structure for your data. Create all the folders you are likely to need and a few named placeholders for files you will create. If you would like some feedback on it, email me the link (if it’s in the cloud) or come to one of the virtual drop-ins.
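
If you prefer to script this step, the following Python sketch creates a set of folders with a few empty placeholder files. The folder and file names in `LAYOUT` are purely illustrative assumptions, not a recommended Cambridge standard; replace them with a structure that suits your project.

```python
from pathlib import Path

# Hypothetical layout: folder name -> placeholder files to create inside it.
LAYOUT = {
    "01_raw_data": ["README.txt"],
    "02_processed_data": ["README.txt"],
    "03_analysis": [],
    "04_outputs": [],
    "05_documentation": ["data_management_plan.txt", "file_naming_convention.txt"],
}


def create_structure(root: Path, layout: dict[str, list[str]] = LAYOUT) -> list[Path]:
    """Create the folders and empty placeholder files under `root`.

    Existing folders and files are left untouched, so the function is
    safe to re-run. Returns the placeholder files that were newly created.
    """
    created = []
    for folder, placeholders in layout.items():
        directory = root / folder
        directory.mkdir(parents=True, exist_ok=True)
        for name in placeholders:
            placeholder = directory / name
            if not placeholder.exists():
                placeholder.touch()
                created.append(placeholder)
    return created
```

Calling `create_structure(Path("my_project"))` (a placeholder name) sets up the skeleton once; numbered prefixes are just one way to keep folders in a predictable order.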

Protected data 

If your research data is of a personal or sensitive nature, you must make sure you understand and respect the additional requirements associated with managing it. If possible, get in touch with your department’s ethics committee, or your industrial sponsor to check what they expect of you. Additional help can be sought from the Research Data team, the Research Integrity team, and the Information Compliance Office.  

The following answers to common questions are modified from the Research Data FAQs.

Personal data is data relating to a living individual, which allows the individual to be identified from the information itself or from the information plus any other information held by the 'data controller' (or from information available in the public domain). The University of Cambridge as a whole is the data controller. Sensitive data is personal data about: racial or ethnic origin, political opinions, religious beliefs, Trade Union membership, physical and mental health, sexual life, or criminal offences and court proceedings about these.

The EU General Data Protection Regulation (GDPR), coupled with the UK Data Protection Act 2018 (DPA 2018), gives individuals certain rights and imposes obligations on those who record and use personal information to be open about how information is used and to follow eight data protection principles. Personal data must be: processed fairly, lawfully and transparently; obtained for specified, explicit and lawful purposes; adequate, relevant and not excessive; accurate and, where necessary, kept up-to-date; not kept for longer than necessary; processed in accordance with the subject's rights; kept secure; and not transferred abroad without adequate protection.

You should limit physical access to sensitive data or encrypt it (speak with your local IT/Computing Officer or the University Information Services Help Desk for help in doing this). To avoid accidentally compromising the data at some future date, you should always store information about the data's sensitivity and any available information on participants' consent or use agreements from your data provider with the data itself (i.e. put information about lawful and ethical data use in your data documentation or metadata description).
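
One simple way to keep this information with the data is a small "sidecar" metadata file saved next to each data file. The Python sketch below is only an illustration: the field names and the `.metadata.json` suffix are assumptions rather than a University standard, and a plain-text readme would serve the same purpose.

```python
import json
from pathlib import Path


def write_sidecar(data_file: Path, sensitivity: str, consent: str, contact: str) -> Path:
    """Write '<data file name>.metadata.json' in the same folder as the data file.

    The field names below are illustrative; adapt them to whatever your
    ethics committee or data provider expects you to record.
    """
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    record = {
        "data_file": data_file.name,
        "sensitivity": sensitivity,           # e.g. "personal", "sensitive", "none"
        "consent_or_use_agreement": consent,  # what participants or providers agreed to
        "contact": contact,                   # who to ask before reusing the data
    }
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Because the sidecar travels with the data file, anyone who later copies the folder also receives the conditions attached to its use.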

There can be a potential conflict between abiding by data protection legislation and ethical guidelines, whilst at the same time fulfilling funders' and individuals' requirements to make research results available. Consult your ethics committee before deciding to share participants’ data. Your plans for research data processing, storage and sharing should be considered at the start of each project and reflected in both your data management plan and consent form. For example, you can inform your participants that anonymised data will be shared via the University of Cambridge data repository. There is good guidance on consent forms at the UK Data Archive (www.ukdataservice.ac.uk). The UK Data Archive also provides a sample consent form. Your Department’s Ethics Committee may also provide sample consent forms.

If you would like to learn more about personal and sensitive data and do some practical exercises on identifying these data types, the University of Cambridge offers short (30-minute) online courses on personal and sensitive data.

You should also consider whether your data is commercially sensitive: do you or a sponsor plan to profit from the research in the future? There should be a collaboration agreement in place from the start to clarify the terms of any commercial collaboration. The Research Operations Office can help with this. If you are working with both public funders and commercial partners, clarify early what data can be shared and what can’t, so you can make this clear to all parties.  

Sharing data on repositories 

So what does it mean in practice to share your data? All you have to do is upload your dataset, and information about it, to a repository: either a subject-specific one, an institutional one like Apollo, or a general one. The repository then lets people find and download the data. Find out more in the video below.

 

Data Management Plans 

Throughout this module we have seen how important it is to plan the way you will manage your data right at the start of a project. A Data Management Plan (DMP) is a document that captures that process.  

 

The Digital Curation Centre has excellent resources for writing a data management plan, including examples and funder-specific templates that show you exactly what you should include.

It’s time to practise everything you have learnt in this module by evaluating a DMP. Read the fictional DMP below and make notes on ways it could be improved. Consider both the data management practices described, and the level of information provided in the document: are they appropriate and sufficient? When you’re done, compare your thoughts with the comments from a reviewer in the Research Data team at Cambridge in the answers section at the bottom of this page.

Answers 

Where did it all go wrong?  

Having watched the video, did you identify the following poor data management practices? 

  • The researcher did not consider that his data may be useful to others in different fields and therefore did not plan for sharing 
  • The researcher was not aware of his funder’s and the publisher’s policies on sharing data 
  • The researcher did not use a repository, but relied on people contacting him directly for it 
  • The researcher only kept one copy of the data and he forgot where he had stored it.  
  • The memory stick could have been lost or damaged in the house move, or in the post. 
  • The researcher used a data format that became obsolete, rather than an open format (such as csv, txt, etc) 
  • The researcher did not provide sufficient metadata to explain what the column headings meant. 
  • The researcher did not agree with his colleague on who would be responsible for keeping the data and metadata.
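
Two of the problems above, the obsolete format and the unexplained column headings, can be avoided by saving data in an open format together with a short data dictionary. The Python sketch below shows one possible way to do this with the standard `csv` module; the function name, the `_data_dictionary.csv` suffix and the example columns in the usage note are hypothetical.

```python
import csv
from pathlib import Path


def write_csv_with_dictionary(folder: Path, stem: str,
                              columns: dict[str, str],
                              rows: list[dict]) -> tuple[Path, Path]:
    """Save `rows` as '<stem>.csv' plus '<stem>_data_dictionary.csv'.

    `columns` maps each column heading to a plain-language description
    (including units), so the headings can still be understood later.
    """
    folder.mkdir(parents=True, exist_ok=True)
    data_path = folder / f"{stem}.csv"
    dictionary_path = folder / f"{stem}_data_dictionary.csv"

    # The data itself, in an open, software-independent format.
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)

    # The explanation of what each column heading means.
    with dictionary_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["column", "description"])
        writer.writerows(columns.items())

    return data_path, dictionary_path
```

For example, `columns={"sample_id": "Unique sample identifier", "temp_c": "Temperature in degrees Celsius"}` documents two made-up headings alongside the data they describe.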

Data management plans 

Having examined the fictional DMP above, compare your notes on ways it could be improved with the reviewer’s comments below.

  • There is no indication of whether any existing data sets will be used. This is important to include, even if the answer is that no pre-existing datasets will be used. 
  • The list of what data types will be collected is helpful. 
  • Authors need to estimate the number of interviews and experiments to give an idea of sample size.  
  • At the end of the first section, it is not clear whether the interview data will contain any sensitive information. If so, then using the Cloud to store this data is not appropriate.
  • It is good that they state the MP3 file format; however, if using an iPhone to record the interviews then there needs to be a statement regarding: (1) how the data is to be kept secure; (2) how the data is to be transferred off the iPhone (iCloud is not appropriate if the data are sensitive); (3) when data will be deleted from the iPhone; (4) how the data on the ‘machine’ (laptop, desktop?) will be kept secure (i.e. the machine should be password protected and the data encrypted if it contains sensitive information).
  • It is good that the frequency of backups is mentioned – this is important information to include.  
  • Backing up to the cloud will be inappropriate if the data are sensitive from a personal data or intellectual property perspective. 
  • It is good that authors have opted to use a document naming system, but they could provide more detail on how this will work.  
  • There are three different single backup strategies being employed here for different types of data: iCloud, departmental server and OneDrive. It would be better to consolidate these so that all data from a single project are kept in the same place, and are backed up in two separate locations (one of which can be the Cloud if there is no sensitive data therein). Using the departmental server is a good choice as one backup option. A password-protected (and encrypted, if necessary) external hard drive stored in a secure location is also acceptable as an additional backup for data that are sensitive.
  • It is not clear how the quality of the transcription will be assured. Some funders (e.g. ESRC) require grant applicants to include a section on ‘quality assurance’ in their DMP.
  • The statement about transcription being the sole responsibility of the transcriber is also rather vague, and could be elaborated here. Is access to the transcriptions being restricted due to sensitive information therein? If this is the case, then the transcriptions should not be stored in the Cloud. 
  • Excel is a proprietary format. It would be better to share the spreadsheet data via an open format such as .csv. 
  • It is good that the authors plan to include a suitable readme file.  
  • Information about obtaining informed participants’ consent for sharing must be addressed. Will participants be provided with information about the project’s data sharing intentions, and what options will they be given regarding consent to share their data? The wording on consent forms regarding data sharing at the start of a project is very closely linked to data sharing at the end of the project.   
  • It is not clear how the data will be destroyed; this should be specified.
  • It is good that a Creative Commons licence has been considered, but more detail is needed. Specifically, which CC licence will be used? Also, is limited reuse necessary? Although the research involves companies, the DMP gives no information about conflicts of interest (e.g. is commercial sponsorship a part of the proposed research?), intellectual property rights (which may affect how the data can be shared), or whether the data could contain commercially as well as personally sensitive information.