Research Data Management
Welcome to this module, where we will cover all the main aspects of looking after your research data, including:
Data can take many forms: not only spreadsheets, but also images, interview recordings and transcripts, old texts, survey results, protocols... the list goes on. Watch this video by the Engineering Library for an overview of what you should be thinking about at the start of any research project.
Where did it all go wrong?
Lack of planning at the start of a project can cause problems (and much more work!) later on. Think of data management as a time investment to make sure that the data you collect is used effectively and remains usable over time.
Watch this video by the NYU Health Sciences Library as an example of poor data management and take some brief notes on any mistakes you spot. When you’re done, check the answers at the bottom of this page.
Keeping data safe
When you’re doing research, you have a responsibility to look after your data – your funders, colleagues, other researchers who will reuse the data, and even your future self will appreciate the effort you put into managing data properly.
First of all, make sure you avoid disasters like these:
Image by www.presentationmagazine.com
Plan an effective and realistic back-up strategy. We recommend that you keep at least two backups at two separate locations.
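As an illustration of the two-backups recommendation above, here is a minimal Python sketch that copies a data folder to two backup locations and checks each copy against a SHA-256 checksum. The function names (sha256_of, back_up) and any paths you pass in are hypothetical and chosen for this example only; a real strategy should also follow your department's IT guidance and keep the two locations physically separate.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_roots: list[Path]) -> list[Path]:
    """Copy every file under `source` into each backup root and verify each copy.

    Assumption: the backup roots are outside `source`, ideally on
    different devices or services.
    """
    copies = []
    for root in backup_roots:
        files = sorted(p for p in source.rglob("*") if p.is_file())
        for original in files:
            target = root / source.name / original.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(original, target)  # copies content and timestamps
            if sha256_of(original) != sha256_of(target):
                raise IOError(f"Checksum mismatch for {target}")
            copies.append(target)
    return copies
```

For example, back_up(Path("project_data"), [Path("/mnt/external_drive"), Path("/mnt/network_share")]) would create a verified copy of project_data in each location; both paths here are placeholders.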
Slides by Claire Sewell, Office of Scholarly Communication
You should consider where the primary storage and backups are held, and this will depend on how sensitive the data is. The University has prepared a slide that helps you classify your data and store it safely.
Cloud storage offers many benefits, especially the recovery options if data is lost. The University of Cambridge offers cloud storage solutions that are appropriate for data up to level 2 (including personal data, but not sensitive data). Be careful if using private cloud storage accounts, though, as they often have terms and conditions that may not be suitable for research data.
Once you are sure that your data is safe from accidental loss, you should be thinking about how to organise it. Are your computer files ‘an amorphous plethora of objects’? In this video by the University of Edinburgh Data Library, Professor Jeff Haywood talks about his experiences of organising data.
If you want to read more about organising your data, including folder structures and file naming, there is a detailed guide on the Cambridge data website.
If you are at the start of a project, spend some time now preparing an organisational structure for your data. Create all the folders you are likely to need and a few named placeholders for files you will create. If you would like some feedback on it, email me the link (if it’s in the cloud) or come to one of the virtual drop-ins.
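If you prefer to set up the structure with a script, the following Python sketch creates a set of folders and empty placeholder files. The layout shown (folder names, numbering and placeholder names) is a hypothetical example rather than a recommended standard, so adapt it to your own project.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical layout: folder name -> placeholder files to create inside it.
LAYOUT = {
    "01_raw_data": ["README.txt"],
    "02_processed_data": ["README.txt"],
    "03_analysis": ["analysis_notes.txt"],
    "04_outputs": [],
    "05_documentation": ["data_management_plan.txt"],
}


def scaffold(project_root: Path, layout: dict[str, list[str]] = LAYOUT) -> list[Path]:
    """Create the folders and placeholder files, returning everything touched.

    Existing folders and files are left in place, so the function is safe
    to run again on a project that has already been started.
    """
    created = []
    for folder, placeholders in layout.items():
        folder_path = project_root / folder
        folder_path.mkdir(parents=True, exist_ok=True)
        created.append(folder_path)
        for name in placeholders:
            placeholder = folder_path / name
            placeholder.touch(exist_ok=True)  # does not erase existing content
            created.append(placeholder)
    return created
```

Calling scaffold(Path("my_project")) would build the layout under a folder called my_project (a placeholder name) in the current directory.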
If your research data is of a personal or sensitive nature, you must make sure you understand and respect the additional requirements associated with managing it. If possible, get in touch with your department’s ethics committee or your industrial sponsor to check what they expect of you. Additional help can be sought from the Research Data team, the Research Integrity team, and the Information Compliance Office.
Answers to some common questions about personal and sensitive data (modified from the Research Data FAQs):
Personal data is data relating to a living individual, which allows the individual to be identified from the information itself or from the information plus any other information held by the 'data controller' (or from information available in the public domain). The University of Cambridge as a whole is the data controller. Sensitive data is personal data about: racial or ethnic origin, political opinions, religious beliefs, Trade Union membership, physical and mental health, sexual life, or criminal offences and court proceedings about these.
The EU General Data Protection Regulation (GDPR), coupled with the UK Data Protection Act 2018 (DPA 2018), gives individuals certain rights and imposes obligations on those who record and use personal information to be open about how information is used and to follow eight data protection principles. Personal data must be: processed fairly, lawfully and transparently; obtained for specified, explicit and lawful purposes; adequate, relevant and not excessive; accurate and, where necessary, kept up-to-date; not kept for longer than necessary; processed in accordance with the subject's rights; kept secure; not transferred abroad without adequate protection.
You should limit physical access to sensitive data or encrypt it (speak with your local IT/Computing Officer or the University Information Services Help Desk for help in doing this). To avoid accidentally compromising the data at some future date, you should always store information about the data's sensitivity and any available information on participants' consent or use agreements from your data provider with the data itself (i.e. put information about lawful and ethical data use in your data documentation or metadata description).
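One way to keep sensitivity and consent information with the data itself is a small metadata file saved next to each dataset. The Python sketch below writes such a file in JSON; the function name and the field names (sensitivity, consent_summary, permitted_uses) are assumptions made for this example and are not a University or repository standard. It records information about the data only and does not encrypt anything.

```python
from __future__ import annotations

import json
from pathlib import Path


def write_sensitivity_record(
    dataset: Path,
    sensitivity: str,
    consent_summary: str,
    permitted_uses: list[str],
) -> Path:
    """Write a JSON metadata file alongside `dataset` and return its path.

    The field names are illustrative only; use whatever your data
    documentation or metadata scheme requires.
    """
    record = {
        "dataset": dataset.name,
        "sensitivity": sensitivity,
        "consent_summary": consent_summary,
        "permitted_uses": permitted_uses,
    }
    # e.g. interviews.csv -> interviews.csv.metadata.json in the same folder
    sidecar = dataset.with_name(dataset.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Because the record sits in the same folder as the data file, it travels with the data when the folder is copied or backed up.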
There can be a conflict between abiding by data protection legislation and ethical guidelines and, at the same time, fulfilling funders' and individuals' requirements to make research results available. Consult your ethics committee before deciding to share participants’ data. Your plans for research data processing, storage and sharing should be considered at the start of each project and reflected in both your data management plan and consent form. For example, you can inform your participants that anonymised data will be shared via the University of Cambridge data repository. There is good guidance on consent forms at the UK Data Archive (www.ukdataservice.ac.uk). The UK Data Archive also provides a sample consent form. Your Department’s Ethics Committee may also provide sample consent forms.
If you would like to learn more about personal and sensitive data and do some practical exercises on identifying these data types, the University of Cambridge offers short 30-minute online courses on personal and sensitive data.
You should also consider whether your data is commercially sensitive: do you or a sponsor plan to profit from the research in the future? There should be a collaboration agreement in place from the start to clarify the terms of any commercial collaboration. The Research Operations Office can help with this. If you are working with both public funders and commercial partners, clarify early what data can be shared and what can’t, so you can make this clear to all parties.
Sharing data on repositories
So what does it mean in practice to share your data? All you have to do is upload your dataset, and information about it, to a repository: either a subject-specific one, an institutional one like Apollo, or a general one. The repository then lets people find and download the data. Find out more in the video below.
Resources I mentioned in the video:
Data Management Plans
Throughout this module we have seen how important it is to plan the way you will manage your data right at the start of a project. A Data Management Plan (DMP) is a document that captures that process.
It’s time to practise everything you have learnt in this module by evaluating a DMP. Read the fictional DMP below and make notes on ways it could be improved. Consider both the data management practices described and the level of information provided in the document: are they appropriate and sufficient? When you’re done, compare your thoughts with the comments from a reviewer in the Research Data team at Cambridge in the answers section at the bottom of this page.
Where did it all go wrong?
Having watched the video, did you identify the following poor data management practices?
Data management plans
Having examined the fictional DMP above, compare your notes on ways it could be improved with the reviewer’s comments.