
Wolfson College Academic Skills: Research Data Management

Help with finding, managing and using information from the Wolfson Library Team.

All research, regardless of academic discipline, produces data through collection, observation or creation. Analysing that data produces and validates your original research results.

Research Data Management (RDM) is the organisation, storage, sharing and archiving of all data produced during the research process and beyond. It is important to manage data because:
This page introduces some of the key themes and issues; for detailed information, please visit the University's Research Data Management Guide.

Credits include: Office of Scholarly Communication and the Research Data Management Teams.

Top tips

  • Everyone has data to manage. It can take the form of archival material, audio files, code, documents, experiment results, field notes, images, interview transcripts, lab books, materials, physical samples, protocols, social media outputs, statistics, surveys, videos and more.
  • You need to create a Data Management Plan (DMP) before you start your research so that you can look after your data at every stage of the research process.
  • Know your responsibilities with regard to sensitive data and data protection, and only collect such data if you have to.
  • Organise your files so that you can quickly find things again, and back up your data.
  • Think about making your data open; your funder may require it.

The sections below explain how to manage your data at all stages of the research process.

How to

This is a quick 3-minute overview of the topic. Below is a 50-minute presentation delivered by librarians at the Moore Library. Please see the links at the foot of the page for further resources.

Anyone carrying out research should create a DMP before they start their project. 

The plan should:

  • Outline data to be collected during a project
  • Detail how it will be managed
  • Cover the lifetime of the project and beyond 
  • Include specifics on data type, storage, backup, metadata, sensitive information and sharing data.

Think about best practices for dealing with data at every stage of the lifecycle to develop a robust strategy for data management.

There is a whole section on writing a DMP on the University of Cambridge's Research Data Management site, but to get you thinking about the structure your plan might take, have a look at this checklist. The Digital Curation Centre has also produced an interactive online tool, DMPOnline, to help researchers create data management plans.

[The Research Cycle image adapted by University of Denver from the Research 360 Project]

Personal data is defined by the 1998 Data Protection Act as anything which can ‘reveal the identity of an actual living person’.

You therefore need to consider whether the data you collect falls into this category and how you will handle it. The simplest approach is, of course, not to collect it in the first place. If you do collect it, make sure you obtain informed, written consent; anonymise the data by removing identifiers; aggregate results where there might be low-frequency counts; remove outliers; and store the data in a managed-access repository.
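Two of these steps, removing direct identifiers and aggregating results so that small counts are not disclosed, can be sketched in Python. This is a minimal illustration only. The field names (`name`, `email`, `postcode`, `age_band`) and the minimum count of 5 are hypothetical assumptions, not University rules, and removing obvious identifiers does not on its own guarantee that data is anonymous.

```python
from collections import Counter

# Hypothetical direct identifiers to strip; adjust to your own dataset.
DIRECT_IDENTIFIERS = {"name", "email", "postcode"}
# Hypothetical minimum cell size: counts below this are suppressed.
MIN_COUNT = 5


def remove_identifiers(records):
    """Return copies of the records without the direct-identifier fields."""
    return [
        {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def aggregate_with_suppression(records, field, min_count=MIN_COUNT):
    """Count records per value of `field`, hiding counts that are too small.

    Each value maps to its count, or to the marker "<min_count" when the
    count falls below `min_count`.
    """
    counts = Counter(record[field] for record in records)
    return {
        value: (count if count >= min_count else f"<{min_count}")
        for value, count in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical participants: six in one age band, one in another.
    participants = [
        {"name": f"P{i}", "email": f"p{i}@example.org", "postcode": "CB3", "age_band": "25-34"}
        for i in range(6)
    ] + [{"name": "Q1", "email": "q1@example.org", "postcode": "CB1", "age_band": "65+"}]
    cleaned = remove_identifiers(participants)
    print(cleaned[0])  # {'age_band': '25-34'}
    print(aggregate_with_suppression(cleaned, "age_band"))  # {'25-34': 6, '65+': '<5'}
```

Suppressing small counts matters because a category containing only one or two people can make those individuals identifiable even after their names have been removed.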

General Data Protection Regulation (GDPR)

This protects all EU citizens, wherever they are in the world. The definition of personal data has been expanded to include mental, genetic, cultural, economic and social information. Your consent forms will also need to explain exactly what you are collecting, why you are collecting it, and how long the data will be kept. Importantly, you must inform the Data Protection Officer in your college or department of a data breach within 72 hours of it taking place, so you also need to monitor your data closely enough to detect and report any such breach. Participants in your research can exercise the 'right to be forgotten' and ask to be removed from your data. There are, however, exceptions to some of these requirements for research purposes.
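Because the reporting window is measured in hours, it helps to calculate the exact deadline as soon as a breach is discovered. The Python sketch below assumes, as described above, that the 72 hours run from the time the breach took place. The function names are hypothetical, and you should confirm the precise reporting procedure with your Data Protection Officer.

```python
from datetime import datetime, timedelta, timezone

# Reporting window described above.
REPORTING_WINDOW = timedelta(hours=72)


def reporting_deadline(breach_time: datetime) -> datetime:
    """Return the latest time by which the breach should be reported."""
    return breach_time + REPORTING_WINDOW


def hours_remaining(breach_time: datetime, now: datetime) -> float:
    """Return the hours left before the deadline (negative once it has passed)."""
    return (reporting_deadline(breach_time) - now).total_seconds() / 3600


if __name__ == "__main__":
    breach = datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)
    print(reporting_deadline(breach).isoformat())  # 2024-03-04T09:30:00+00:00
    now = datetime(2024, 3, 2, 9, 30, tzinfo=timezone.utc)
    print(hours_remaining(breach, now))  # 48.0
```

Working in UTC, as the example does, means a clock change inside the 72-hour window cannot shift the deadline by an hour.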

Once you create data and files, they can quickly become disorganised. You need to establish a system that allows you to access them quickly and avoid duplication. A good place to start is to develop a logical folder structure. 

It is important to name your files so that they are: 

  • consistent
  • meaningful to you and others (if you work on a shared drive)
  • findable

For physical samples, you could create maps of your storage system, reference samples in notebooks, and add notes to the samples themselves.

For digital files, consider using the following pattern: prefix_document title_version_date, where the prefix indicates the document type (e.g. report, notes, essay) and the date is written as yyyymmdd.
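A small helper makes it easier to apply the naming pattern above consistently. This Python sketch is one possible interpretation, and several of its choices are assumptions rather than a University standard: the `v` prefix with a two-digit version number, lower-casing, and replacing spaces and underscores inside each part with hyphens so that underscores only ever separate the four parts.

```python
import re
from datetime import date


def make_filename(prefix: str, title: str, version: int, on: date, extension: str) -> str:
    """Build a name following prefix_title_version_yyyymmdd.extension.

    Any run of characters other than letters, digits and hyphens (including
    spaces and underscores) becomes a single hyphen.
    """
    def tidy(text: str) -> str:
        return re.sub(r"[^A-Za-z0-9-]+", "-", text.strip()).strip("-").lower()

    return f"{tidy(prefix)}_{tidy(title)}_v{version:02d}_{on:%Y%m%d}.{extension}"


if __name__ == "__main__":
    print(make_filename("report", "Field survey results", 3, date(2024, 5, 17), "docx"))
    # report_field-survey-results_v03_20240517.docx
```

Writing the date as yyyymmdd means dates sort correctly as plain text: for example, 20240517 comes before 20241101.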

Structure folders in the same way, using meaningful titles plus dates, where practicable, to divide up your work. Nest folders to keep the number of items at each level of storage to a minimum: having 50 folders on your desktop is just as confusing as having 50 documents.

Which of these two structures is better and why? Which most closely represents your file management system?
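You can also create a logical, nested folder structure in one go, which helps keep it consistent across projects. The Python sketch below uses a hypothetical template of four top-level folders, each with two subfolders. The folder names are illustrative assumptions, so adapt them to your own research.

```python
import tempfile
from pathlib import Path

# Hypothetical template: a few numbered top-level folders, each with a small
# number of subfolders, rather than dozens of folders at a single level.
TEMPLATE = {
    "01_admin": ["ethics", "funding"],
    "02_data": ["raw", "processed"],
    "03_analysis": ["code", "outputs"],
    "04_writing": ["drafts", "submitted"],
}


def create_structure(root: Path, template: dict[str, list[str]] = TEMPLATE) -> list[Path]:
    """Create the nested folders under `root` and return the subfolder paths.

    Folders that already exist are left as they are, so rerunning is safe.
    """
    paths = []
    for top, children in template.items():
        for child in children:
            path = root / top / child
            path.mkdir(parents=True, exist_ok=True)
            paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_structure(Path(tmp) / "my-project"):
            print(path.relative_to(tmp))
```

Numbering the top-level folders keeps them in a fixed order in most file browsers. Rerunning the function does not overwrite anything.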

Consider what would happen if your data disappeared because you lost a USB stick, spilled coffee on your laptop or had it stolen, accidentally deleted files, or there was an incident in your department. This happens all the time, to individuals and large organisations alike.

Read the news story about the 2017 fire at the Cancer Research UK Manchester Institute, at the Christie cancer hospital, which led to the loss of equipment, data and samples.

Options for backup include:

  • Digitising any analogue data
  • Departmental back-up 
  • External drives 
  • Online back-up 
  • Automated services

Your strategy should be guided by three questions: What are you willing to lose? What is crucial to your research? How often does it change? The more often your data changes, the more often you need to back it up. If you are working on it regularly, back it up at least weekly.
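To put the weekly minimum into practice, you can check how long it has been since your last backup and which files have changed since then. This Python sketch assumes you record the time of each backup yourself as a Unix timestamp. The function names and the seven-day threshold, taken from the weekly minimum above, are illustrative.

```python
import tempfile
import time
from pathlib import Path
from typing import List, Optional

# Weekly minimum from the guidance above, expressed in seconds.
ONE_WEEK = 7 * 24 * 60 * 60


def changed_since(folder: Path, last_backup: float) -> List[Path]:
    """Return files under `folder` modified after `last_backup` (a Unix timestamp)."""
    return sorted(
        path
        for path in folder.rglob("*")
        if path.is_file() and path.stat().st_mtime > last_backup
    )


def backup_overdue(last_backup: float, now: Optional[float] = None, max_age: float = ONE_WEEK) -> bool:
    """Return True if more than `max_age` seconds have passed since `last_backup`."""
    if now is None:
        now = time.time()
    return now - last_backup > max_age


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        last_backup = time.time() - 60  # pretend the last backup was a minute ago
        (folder / "field-notes.txt").write_text("new observations")
        print(changed_since(folder, last_backup))  # lists the newly written file
        print(backup_overdue(last_backup))  # False
```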

ALWAYS keep your data backed up in two different locations. There is no point saving data to a USB stick if you store it with your laptop; they are equally vulnerable.
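A backup only helps if the copy is complete, so it is worth checking each copy against the original. This Python sketch copies a file into at least two destination folders and compares SHA-256 checksums. The function names are hypothetical. The code cannot tell whether the folders are physically separate, so choose destinations that really are in different places, for example a departmental drive and an online backup service.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into every destination folder and verify each copy.

    Returns a mapping from each copy to True if its checksum matches the original.
    """
    if len(destinations) < 2:
        raise ValueError("Back up to at least two different locations.")
    original = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps and other metadata where possible.
        copy = Path(shutil.copy2(source, folder / source.name))
        results[copy] = sha256_of(copy) == original
    return results


if __name__ == "__main__":
    # Both demo folders sit in one temporary directory purely for illustration.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "survey.csv"
        source.write_text("id,response\n1,yes\n")
        targets = [Path(tmp) / "department-drive", Path(tmp) / "external-drive"]
        for copy, ok in back_up(source, targets).items():
            print(copy.parent.name, "verified" if ok else "MISMATCH")
```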

Watch this video for a fun reminder about how important it is to back up your data.

Publications without supporting data and code are just claims. By sharing your data you may benefit from increased citations while helping to move knowledge forward, because others can reuse your data and mine it for new findings. Sharing also supports the integrity of your data by allowing others to verify it. In addition, as with open access publications, many funders now require your data to be made publicly available.

You should aim to store your data in a suitable repository for at least ten years and link it to your publication(s). Data can be uploaded directly through Symplectic.

When preserving data, ensure that it is FAIR:

Findable – includes metadata and a unique identifier

Accessible – protocols are open, free, and universally implementable

Interoperable – uses open source file formats

Re-usable – has a clear usage licence
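Before depositing data, you could run a quick self-check against the four criteria above. The Python sketch below is a hypothetical checklist, not a repository's metadata schema. The field names, the mapping of fields to criteria, and the short list of open formats are all assumptions for illustration. Repositories define their own required metadata, so check their guidance as well.

```python
# Hypothetical checklist: which metadata fields support each FAIR criterion.
REQUIRED_FIELDS = {
    "Findable": ["title", "identifier", "keywords"],
    "Accessible": ["access_url"],
    "Interoperable": ["file_format"],
    "Re-usable": ["licence"],
}

# A few widely used open formats; an illustrative assumption, not a complete list.
OPEN_FORMATS = {"csv", "txt", "json", "xml"}


def fair_gaps(metadata: dict) -> dict[str, list[str]]:
    """Return, for each FAIR criterion, the problems found in `metadata`."""
    gaps: dict[str, list[str]] = {}
    for criterion, fields in REQUIRED_FIELDS.items():
        missing = [field for field in fields if not metadata.get(field)]
        if missing:
            gaps[criterion] = missing
    file_format = str(metadata.get("file_format", "")).lower()
    if file_format and file_format not in OPEN_FORMATS:
        gaps.setdefault("Interoperable", []).append(
            f"file_format '{file_format}' may not be an open format"
        )
    return gaps


if __name__ == "__main__":
    record = {  # hypothetical example record
        "title": "Field survey responses",
        "identifier": "doi:10.1234/example",
        "keywords": ["survey"],
        "access_url": "",
        "file_format": "sav",
        "licence": "CC-BY-4.0",
    }
    print(fair_gaps(record))
    # {'Accessible': ['access_url'], 'Interoperable': ["file_format 'sav' may not be an open format"]}
```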

You may need to share your data with colleagues, peers or supervisors. However, you should consider the security of different tools:

- Email: remember that attachments can easily be forwarded

- Collaborative tools

  • Moodle - your department may be willing to set up a new course for you to share documents with peers in Cambridge
  • Open Science Framework - useful if you are working with collaborators outwith the university; researchers can easily plug in their resources stored on Dropbox, Google Drive, GitHub and other storage platforms.

- (Secure) File Transfer Protocol - set up by your IT support

- Cloud solutions: see the UIS cloud matrix for options. Be cautious and check the small print; for example, Google's Terms of Service state that:

"When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide licence to use, host, store, reproduce, modify, create derivative works[...], communicate, publish, publicly perform, publicly display and distribute such content."

Try it out

The University offers a self-paced online course that takes about 90 minutes.

In this course, you'll learn:

  • how to store and back up data
  • how to organise data
  • what to do with protected data (personal or commercially sensitive)
  • why sharing data is important and how to do it
  • how to write Data Management Plans

Other tools and advice:

  • Look at Nikola Vukovic's advice on creating a folder structure for research and download a template.
  • If your research is based mainly on primary or secondary readings, consider using Zotero to manage your data. More information is available on our Reference Management tab.
  • If you have a lot of images or photos to work with, explore Tropy. This open-source software helps you organise your photos in meaningful ways, describe them in context, and annotate them.

More information

  • Cambridge UIS, Research and data storage facilities - lists services offered by the University including mass storage options.
  • Digital Curation Centre - an internationally-recognised centre of expertise with a focus on building capability and skills for research data management.
  • DMPOnline - helps you to create, review and share data management plans that meet institutional and funder requirements. Some funders and institutions mandate the use of the service, and templates are available for download.
  • MANTRA - a free online course for those who manage digital data as part of their research project.
  • Research Data Management, University of Cambridge - guides researchers through various aspects of research data planning and management and will provide assistance in meeting funders’ expectations.
  • UK Data Service - funded by the ESRC, it provides access to datasets as well as guidance in the collection, use, preservation and sharing of data.

Unless otherwise stated, this work is licensed under a CC BY-NC-SA 4.0 licence by Wolfson College Cambridge.

Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

© Cambridge University Libraries