Wolfson College Academic Skills: Research data management

Help with finding, managing and using information from the Wolfson Library Team.

What are data?

All research, regardless of academic discipline, produces data through collection, observation or creation. Analysis of data produces and validates your original research results. Data can take the form of archival material, audio files, code, documents, experiment results, field notes, images, interview transcripts, lab books, materials, physical samples, protocols, social media outputs, statistics, surveys, videos and more.

Research Data Management (RDM) is the organisation, storage, sharing and archiving of all data produced during the research process and beyond. It is important to manage data because:
This page introduces some of the key themes and issues; for detailed information, please visit the University's Research Data Management Guide.

Credits include: Office of Scholarly Communication and the Research Data Management Teams.

Research in 3 minutes: Data management

Watch this video for a quick overview of RDM.

Data Management Plans (DMPs)

Anyone carrying out research should create a DMP before they start their project. 

The plan should:

  • Outline data to be collected during a project
  • Detail how it will be managed
  • Cover the lifetime of the project and beyond 
  • Include specifics on data type, storage, backup, metadata, sensitive information and sharing data.

Think about best practices for dealing with data at every stage of the lifecycle to develop a robust strategy for data management.

There is a whole section on writing a DMP on Cambridge's Data Management site, but to get you thinking about the structure a plan might take, have a look at this checklist.

[The Research Cycle image adapted by University of Denver from the Research 360 Project]

Managing sensitive data

Personal data is defined by the 1998 Data Protection Act as anything which can ‘reveal the identity of an actual living person’.

You therefore need to consider whether the data you collect falls into this category and how you will deal with it. The easiest approach is, of course, not to collect personal data in the first place. If you do collect it, obtain informed, written consent; anonymise the data and remove identifiers; aggregate results where there might be low-frequency counts; remove outliers; and store the data in a managed-access repository.

- 30-minute online course provided by the University

- Sample consent forms are available from the UK Data Service.
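Two of the steps described above, removing identifiers and aggregating results with low-frequency counts, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (name, email, department), the function names and the threshold of 5 are hypothetical, and the right threshold and list of identifying fields depend on your own data and ethical approval.

```python
from collections import Counter

# Hypothetical list of fields that directly identify a participant.
IDENTIFIER_FIELDS = {"name", "email"}


def remove_identifiers(records, identifier_fields=IDENTIFIER_FIELDS):
    """Return copies of the records without the directly identifying fields."""
    return [
        {key: value for key, value in record.items() if key not in identifier_fields}
        for record in records
    ]


def aggregate_counts(records, field, threshold=5):
    """Count records per value of `field`, masking counts below the threshold.

    The threshold of 5 is an assumption for illustration only.
    """
    counts = Counter(record[field] for record in records)
    return {
        value: (count if count >= threshold else f"<{threshold}")
        for value, count in counts.items()
    }


# Made-up example data: six respondents in one department, two in another.
responses = [
    {"name": f"Person {i}", "email": f"p{i}@example.org", "department": "History"}
    for i in range(6)
] + [
    {"name": f"Person {i}", "email": f"p{i}@example.org", "department": "Physics"}
    for i in range(6, 8)
]

cleaned = remove_identifiers(responses)
summary = aggregate_counts(cleaned, "department")
print(summary)  # {'History': 6, 'Physics': '<5'}
```

Removing obvious identifiers does not by itself guarantee anonymity, since combinations of the remaining fields can still point to an individual, so treat this as a starting point rather than a complete procedure.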

 

General Data Protection Regulation (GDPR)

This protects all EU citizens, wherever they are in the world. The definition of personal data has been expanded to include mental, genetic, cultural, economic, or social information. Your consent forms will also need to explain exactly what you are collecting, why you are collecting it and how long the data will be kept. Importantly, you must inform the Data Protection Officer in your college or department of a data breach within 72 hours of it taking place, so you also need to monitor your data to be able to report any such breach. Participants in your research can exercise the 'right to be forgotten' and ask to be removed from your data. There are, however, exceptions to some elements for the purposes of research.

Organising your data

Once you create data and files, they can quickly become disorganised. You need to establish a system that allows you to access them quickly and avoid duplication. A good place to start is to develop a logical folder structure. 

It is important to name your files so that they are: 

  • consistent
  • meaningful to you and others (if you work on a shared drive)
  • findable

For physical samples, you could create maps of your storage system, reference samples in notebooks, and add notes to the samples themselves.

With digital files, think about using the following pattern: prefix_document title_version_yyyymmdd, where the prefix indicates the document type (e.g. report, notes, essay) and the date is written as yyyymmdd.
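If you create many files, a small helper can apply the naming pattern above consistently. The following is a minimal Python sketch rather than a prescribed tool: the function name build_filename, the lower-casing, the hyphens used inside each part and the two-digit "v" version label are all assumptions made for illustration.

```python
import re
from datetime import date


def build_filename(prefix: str, title: str, version: int, day: date, extension: str) -> str:
    """Return a name in the form prefix_title_vNN_yyyymmdd.ext."""

    def clean(text: str) -> str:
        # Lower-case the text and replace runs of other characters with a hyphen,
        # so that underscores only ever separate the parts of the name.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{clean(prefix)}_{clean(title)}_v{version:02d}_{day:%Y%m%d}"
        f".{extension.lstrip('.')}"
    )


example = build_filename("notes", "Interview Themes", 3, date(2024, 5, 17), "txt")
print(example)  # notes_interview-themes_v03_20240517.txt
```

Writing the date as yyyymmdd means that files sort chronologically when listed alphabetically, which helps keep versions findable.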

Keep folders structured similarly, using meaningful titles plus dates where practicable to divide up work. Nest folders to keep the number of items at each level of storage to a minimum. Having 50 folders on your desktop is just as confusing as having 50 documents.

Which of these two structures is better?

Metadata

Where possible include a further file containing metadata (data about data) in the same folder as your data. This will allow you to add context to your data so that you and others can understand it in the short, medium, and long-term.

  • Write it as a plain text file
  • Call it: README
  • Include general information - title, authors, date of collection
  • Provide an overview - short description of the data each file contains and date it was created
  • Be explicit about how the data can be shared; is there a licence or are restrictions placed on the data?
  • Describe methodological information - how was data collected, generated and processed?
  • Include data-specific information - variable list (including definitions) for tabular data, units of measurement, definitions for codes or symbols used to record missing data
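One way to make sure every data folder gets a README with the sections listed above is to generate it from a template. The sketch below is an assumption-laden illustration in Python: the section headings, the function name write_readme, the ".txt" extension and all of the example values are hypothetical and should be adapted to your own project.

```python
from pathlib import Path

# Template mirroring the checklist above; the headings are illustrative.
README_TEMPLATE = """\
GENERAL INFORMATION
Title: {title}
Authors: {authors}
Date of collection: {collected}

OVERVIEW
{overview}

SHARING AND ACCESS
Licence or restrictions: {licence}

METHODOLOGICAL INFORMATION
{methods}

DATA-SPECIFIC INFORMATION
Variables and units: {variables}
Codes for missing data: {missing_codes}
"""


def write_readme(folder: Path, **fields: str) -> Path:
    """Fill in the template and save it as a plain text file in the data folder."""
    readme_path = folder / "README.txt"
    readme_path.write_text(README_TEMPLATE.format(**fields), encoding="utf-8")
    return readme_path


# Made-up values for illustration; replace them with details of your own data.
example_fields = {
    "title": "Example interview study",
    "authors": "A. Researcher",
    "collected": "2024-05-17",
    "overview": "transcripts.csv: one row per interview, created 2024-05-20",
    "licence": "Restricted; contact the author",
    "methods": "Semi-structured interviews, transcribed by hand",
    "variables": "duration (minutes), age_group (years, banded)",
    "missing_codes": "-99 = not answered",
}

# Preview the text; call write_readme(Path("your/data/folder"), **example_fields) to save it.
print(README_TEMPLATE.format(**example_fields))
```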

Backing up your data

Consider what would happen if your data disappeared because you lost a USB stick, spilled coffee into your laptop, had your laptop stolen, accidentally deleted files, or there was an incident at your department. This happens all the time, to individuals and to large organisations.

Read the news story relating to the loss of data in 2017 at the Cancer Research UK Manchester Institute at the Christie cancer hospital, which went up in flames leading to the loss of equipment, data and samples.

 

Options for backup include:

  • Digitising any analogue data
  • Departmental back-up 
  • External drives 
  • Online back-up 
  • Automated services

Your strategy should be guided by three questions: what are you willing to lose, what is crucial to your research, and how often does it change? The more often your data changes, the more often you need to back it up. If you are working on it regularly, back it up at least weekly.

ALWAYS have data backed up in two different locations. There is no point saving data to a USB stick if you store it with your laptop, as both are then equally vulnerable.
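The two-location rule above can be automated. The following Python sketch copies a data folder to at least two destinations and compares checksums to confirm that each copy matches the original. It is an illustration only: the function names and folder names are hypothetical, and in practice the destinations would be genuinely separate places (for example a departmental drive and an external disk) rather than the temporary folders used here for demonstration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> None:
    """Copy the source folder into each destination and verify every copied file."""
    if len(destinations) < 2:
        raise ValueError("Provide at least two separate backup locations")
    for destination in destinations:
        target = destination / source.name
        shutil.copytree(source, target, dirs_exist_ok=True)
        for original in source.rglob("*"):
            if original.is_file():
                copy = target / original.relative_to(source)
                if sha256_of(original) != sha256_of(copy):
                    raise IOError(f"Checksum mismatch for {copy}")


# Demonstration using temporary folders as stand-ins for real backup locations.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    project = root / "project_data"
    project.mkdir()
    (project / "results.csv").write_text("id,score\n1,0.8\n", encoding="utf-8")
    back_up(project, [root / "backup_a", root / "backup_b"])
    print(sorted(p.name for p in root.iterdir()))
```

Running a script like this on a schedule is one way to meet the weekly minimum; note that it only adds and overwrites files, so it does not keep older versions or remove files you have deleted from the source.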

Open data and sharing

Publications without supporting data and code are just claims. By sharing your data you will potentially benefit from increased citations while helping move knowledge forward, enabling others to reuse your data and mine it for new findings. In addition, it will ensure the integrity of your data through processes of verification and, as with open access publications, many funders now mandate that your data should be publicly available.

 

 

You should aim to store your data for at least ten years in a suitable repository and link it to your publication(s). Data can be uploaded directly through Symplectic.

When preserving data ensure that it is:

Findable – includes metadata and a unique identifier

Accessible – protocols are open, free, and universally implementable

Interoperable – uses open source file formats

Re-usable – clarity on usage licences

Collaborating

You may need to share your data with colleagues, peers or supervisors. However, you should consider the security of different tools:

- Email: remember that attachments can easily be forwarded

- Collaborative tools

  • Moodle - your department may be willing to set up a new course for you to share documents with peers in Cambridge
  • Open Science Framework - useful if you are working with collaborators outwith the university; researchers can easily plug in their resources stored on Dropbox, Google Drive, GitHub and other storage platforms.

- (Secure) File Transfer Protocol - set up by your IT support

- Cloud solutions: see UIS cloud matrix for options. Be cautious and check the small print; for example, Google's Terms of Service state that: 

"When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide licence to use, host, store, reproduce, modify, create derivative works[...], communicate, publish, publicly perform, publicly display and distribute such content."

Download our Academic Skills Research Data Management Guide

Further information

- Cambridge UIS, Research and data storage facilities - lists services offered by the University including mass storage options.

- Digital Curation Centre - an internationally recognised centre of expertise with a focus on building capability and skills for research data management.

- DMPOnline - helps you to create, review and share data management plans that meet institutional and funder requirements. Some mandate the use of the service and templates are available for download.

- MANTRA - a free online course for those who manage digital data as part of their research project.

- Research Data Management, University of Cambridge - guides researchers through various aspects of research data planning and management and will provide assistance in meeting funders' expectations.

- UK Data Service - funded by the ESRC, it provides access to datasets as well as guidance in the collection, use, preservation and sharing of data.