
Study Skills

Research skills

Online course for Cambridge researchers about publishing, managing data, finding and disseminating research.

Research Data Management

Welcome to this module, where we will cover all the main aspects of looking after your research data, including:

  • how to store and back up data
  • how to organise data
  • what to do with protected data (personal or commercially sensitive)
  • why sharing data is important and how to do it
  • writing Data Management Plans

Data can take many forms: not only spreadsheets, but also images, interview recordings and transcripts, old texts, survey results, protocols... the list goes on. Watch this video by the Engineering Library for an overview of what you should be thinking about at the start of any research project.  

 

Where did it all go wrong?

Lack of planning at the start of a project can cause problems (and much more work!) later on. Think of data management as a time investment to make sure that the data you collect is used effectively and remains usable over time.  

Watch this video by the NYU Health Sciences Library as an example of poor data management and take some brief notes on any mistakes you spot. When you’re done, check the answers at the bottom of this page.  

 

Keeping data safe

When you’re doing research, you have a responsibility to look after your data: your funders, colleagues, other researchers who will reuse the data, and even your future self will appreciate the effort you put into managing data properly.

First of all, make sure you avoid disasters like these: 

Headlines of data disasters: King's College University fire; a PhD student lost her thesis after thieves took her laptop and hard drive; Manchester cancer hospital fire may have destroyed vital research.

Image by www.presentationmagazine.com

Plan an effective and realistic backup strategy. We recommend that you keep at least two backups at two separate locations.
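As a rough illustration of the two-backup recommendation, here is a minimal Python sketch that copies a file to two locations and checks each copy against the original with a SHA-256 checksum. The function names (`sha256_of`, `back_up`) and the folder and file names are hypothetical, and the temporary folders in the demo only stand in for genuinely separate locations such as an institutional network drive and approved cloud storage.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dirs: List[Path]) -> List[Path]:
    """Copy `source` into each backup directory and verify every copy."""
    if len(backup_dirs) < 2:
        raise ValueError("keep at least two backups in separate locations")
    expected = sha256_of(source)
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, directory / source.name))
        # A mismatch means the copy is corrupt or incomplete.
        if sha256_of(copy) != expected:
            raise IOError(f"backup at {copy} does not match the original")
        copies.append(copy)
    return copies


if __name__ == "__main__":
    # Temporary folders are placeholders for two real, separate locations.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "survey_results.csv"
        data.write_text("id,score\n1,42\n")
        for copy in back_up(data, [root / "backup_a", root / "backup_b"]):
            print("verified copy:", copy.parent.name, copy.name)
```

Verifying the checksum after copying is a design choice rather than a requirement: it catches silent corruption that a simple copy would not report.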

Slides by Claire Sewell, Office of Scholarly Communication

You should consider where the primary storage and backups are held, and this will depend on how sensitive the data is. The University has prepared a slide that helps you classify your data and store it safely.

Cloud storage offers many benefits, especially the recovery options if data is lost. The University of Cambridge offers cloud storage solutions that are appropriate for data up to level 2 (including personal data, but not sensitive data). Be careful when using private cloud storage accounts, though, as their terms and conditions are often unsuitable for research data.

Organising data

Once you are sure that your data is safe from accidental loss, you should be thinking about how to organise it. Are your computer files ‘an amorphous plethora of objects’? In this video by the University of Edinburgh Data Library, Professor Jeff Haywood talks about his experiences of organising data.  

 

If you want to read more about organising your data, including folder structures and file naming, there is a detailed guide on the Cambridge data website.  

If you are at the start of a project, spend some time now preparing an organisational structure for your data. Create all the folders you are likely to need and a few named placeholders for files you will create. If you would like some feedback on it, email me the link (if it’s in the cloud) or come to one of the virtual drop-ins.
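If you prefer to script this step, the following Python sketch creates a set of folders and empty placeholder files in one go. The layout shown (`data/raw`, `documentation` and so on) and the function name `create_structure` are only assumptions for illustration; adapt them to your own project and to any conventions your group already follows.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: folder -> placeholder files to create inside it.
PROJECT_LAYOUT = {
    "data/raw": ["README.txt"],
    "data/processed": [],
    "documentation": ["data_management_plan.txt"],
    "analysis": [],
    "outputs": [],
}


def create_structure(root: Path, layout: dict) -> list:
    """Create every folder and placeholder in `layout`; return their paths."""
    paths = []
    for folder, placeholders in layout.items():
        directory = root / folder
        directory.mkdir(parents=True, exist_ok=True)
        paths.append(directory)
        for name in placeholders:
            placeholder = directory / name
            # touch() creates an empty file and leaves existing content alone.
            placeholder.touch(exist_ok=True)
            paths.append(placeholder)
    return paths


if __name__ == "__main__":
    # A temporary folder keeps the demo from touching real project files.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "my_project"
        for path in create_structure(root, PROJECT_LAYOUT):
            print(path.relative_to(root).as_posix())
```

Because existing folders and files are left untouched, the script can be rerun safely when you add new entries to the layout.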

Protected data 

If your research data is of a personal or sensitive nature, you must make sure you understand and respect the additional requirements associated with managing it. If possible, get in touch with your department’s ethics committee, or your industrial sponsor to check what they expect of you. Additional help can be sought from the Research Data team, the Research Integrity team, and the Information Compliance Office.  

The answers to some common questions are given below (modified from the Research Data FAQs).

Personal data is data relating to a living individual, which allows the individual to be identified from the information itself or from the information plus any other information held by the 'data controller' (or from information available in the public domain). The University of Cambridge as a whole is the data controller. Sensitive data is personal data about: racial or ethnic origin, political opinions, religious beliefs, trade union membership, physical and mental health, sexual life, or criminal offences and court proceedings about these.

The EU General Data Protection Regulation (GDPR), coupled with the UK Data Protection Act 2018 (DPA 2018), gives individuals certain rights and imposes obligations on those who record and use personal information to be open about how information is used and to follow eight data protection principles. Personal data must be: processed fairly, lawfully and transparently; obtained for specified, explicit and lawful purposes; adequate, relevant and not excessive; accurate and, where necessary, kept up to date; not kept for longer than necessary; processed in accordance with the subject's rights; kept secure; not transferred abroad without adequate protection.

You should limit physical access to sensitive data or encrypt it (speak with your local IT/Computing Officer or the University Information Services Help Desk for help in doing this). To avoid accidentally compromising the data at some future date, you should always store information about the data's sensitivity and any available information on participants' consent or use agreements from your data provider with the data itself (i.e. put information about lawful and ethical data use in your data documentation or metadata description).
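One way to keep this information with the data is a small "sidecar" file saved next to the data file. The Python sketch below writes such a record as JSON. The field names (`sensitivity`, `consent_summary`, `permitted_uses`), the function name and the example values are assumptions made for illustration, not a University template, and the sketch does not encrypt anything: encryption should still be arranged with your IT or Computing Officer.

```python
import json
import tempfile
from pathlib import Path


def write_sensitivity_record(data_file: Path, sensitivity: str,
                             consent_summary: str, permitted_uses: list) -> Path:
    """Write a JSON sidecar next to `data_file` describing how it may be used."""
    record = {
        "data_file": data_file.name,
        "sensitivity": sensitivity,
        "consent_summary": consent_summary,
        "permitted_uses": permitted_uses,
    }
    # e.g. interviews.csv -> interviews.csv.metadata.json in the same folder
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "interviews.csv"
        data_file.write_text("participant_id,answer\nP01,yes\n")
        sidecar = write_sensitivity_record(
            data_file,
            sensitivity="personal data",  # hypothetical classification label
            consent_summary="Participants agreed to sharing of anonymised data.",
            permitted_uses=["analysis by the project team"],
        )
        print(sidecar.read_text(encoding="utf-8"))
```

Keeping the record in the same folder as the data means that anyone who later copies or inherits the folder also receives the conditions attached to it.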

There can be a conflict between abiding by data protection legislation and ethical guidelines, whilst at the same time fulfilling funders' and individuals' requirements to make research results available. Consult your ethics committee before deciding to share participants’ data. Your plans for research data processing, storage and sharing should be considered at the start of each project and reflected in both your data management plan and consent form. For example, you can inform your participants that anonymised data will be shared via the University of Cambridge data repository. There is good guidance on consent forms at the UK Data Archive (www.ukdataservice.ac.uk). The UK Data Archive also provides a sample consent form. Your Department’s Ethics Committee may also provide sample consent forms.

If you would like to learn more about personal and sensitive data and do some practical exercises on identifying these data types, the University of Cambridge offers short (30-minute) online courses on personal and sensitive data.


You should also consider whether your data is commercially sensitive: do you or a sponsor plan to profit from the research in the future? There should be a collaboration agreement in place from the start to clarify the terms of any commercial collaboration. The Research Operations Office can help with this. If you are working with both public funders and commercial partners, clarify early what data can be shared and what can’t, so you can make this clear to all parties.  

Sharing data on repositories 

So what does it mean in practice to share your data? All you have to do is upload your dataset and information about it on a repository, either a subject-specific one, an institutional one like Apollo, or a general one. The repository then lets people find and download the data. Find out more in the video below. 

 

Data Management Plans 

Throughout this module we have seen how important it is to plan the way you will manage your data right at the start of a project. A Data Management Plan (DMP) is a document that captures that process.  

 

The Digital Curation Centre has excellent resources for writing a data management plan, including examples and funder-specific templates that show you exactly what you should include.

 

Answers 

Where did it all go wrong?  

Having watched the video, did you identify the following poor data management practices? 

  • The researcher did not consider that his data may be useful to others in different fields and therefore did not plan for sharing 
  • The researcher was not aware of his funder’s and the publisher’s policies on sharing data 
  • The researcher did not use a repository, but relied on people contacting him directly for it 
  • The researcher only kept one copy of the data and he forgot where he had stored it.  
  • The memory stick could have been lost or damaged in the house move, or in the post. 
  • The researcher used a data format that became obsolete, rather than an open format (such as CSV or TXT).
  • The researcher did not provide sufficient metadata to explain what the column headings meant. 
  • The researcher did not agree with his colleague on who would be responsible for keeping the data and metadata.
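
Two of the mistakes listed above, the obsolete format and the unexplained column headings, can be avoided together by saving data in an open format alongside a short data dictionary. The Python sketch below is one possible approach, not a prescribed method: the function name `save_with_data_dictionary`, the file names and the example columns are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def save_with_data_dictionary(rows: list, columns: dict, folder: Path, stem: str):
    """Save `rows` as CSV plus a second CSV that explains each column.

    `columns` maps each column heading to a plain-language description.
    """
    folder.mkdir(parents=True, exist_ok=True)
    data_path = folder / f"{stem}.csv"
    dictionary_path = folder / f"{stem}_data_dictionary.csv"

    with data_path.open("w", newline="", encoding="utf-8") as handle:
        # DictWriter raises ValueError if a row has an undocumented column.
        writer = csv.DictWriter(handle, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)

    with dictionary_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["column", "description"])
        writer.writerows(columns.items())

    return data_path, dictionary_path


if __name__ == "__main__":
    columns = {
        "participant_id": "Anonymised participant code",
        "temp_c": "Body temperature in degrees Celsius",
    }
    rows = [{"participant_id": "P01", "temp_c": 36.8}]
    with tempfile.TemporaryDirectory() as tmp:
        data_path, dictionary_path = save_with_data_dictionary(
            rows, columns, Path(tmp), "trial_measurements"
        )
        print(dictionary_path.read_text(encoding="utf-8"))
```

Requiring a description for every column at the moment of saving is a deliberate choice here: it stops undocumented headings from reaching the shared copy of the data.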