All research, regardless of academic discipline, produces data through collection, observation or creation. Analysing that data produces and validates your original research results.
Credits include: Office of Scholarly Communication and the Research Data Management Teams.
Use the tabs below to find out more about how to manage your data at all stages of the research process.
This is a quick 3-minute overview of the topic. Below is a 50-minute presentation delivered by librarians at the Moore Library. Please see the links at the foot of the page for further resources.
Anyone carrying out research should create a data management plan (DMP) before they start their project.
The plan should
Think about best practices for dealing with data at every stage of the lifecycle to develop a robust strategy for data management.
There is a whole section on writing a DMP on Cambridge's Data Management site, but to get you thinking about the structure it might take, have a look at this checklist. The Digital Curation Centre has produced an interactive online tool to help researchers create data management plans: DMPOnline.
[The Research Cycle image adapted by University of Denver from the Research 360 Project]
Personal data is defined by the 1998 Data Protection Act as anything which can ‘reveal the identity of an actual living person’.
You therefore need to consider whether the data you collect falls into this category and how you will deal with it. The easiest thing is, of course, not to collect it in the first place. But if you do, ensure you get informed and written consent, anonymise it, remove identifiers, aggregate results where there might be low frequency counts, remove outliers and use managed access repositories to store the data.
The General Data Protection Regulation (GDPR) protects all EU citizens, wherever they are in the world. The definition of personal data has been expanded to include mental, genetic, cultural, economic, or social information. Your consent forms will also need to explain exactly what you are collecting, why, and how long the data will be kept. Importantly, you must inform the Data Protection Officer in your college or department of a data breach within 72 hours of it taking place. You therefore also need to monitor your data to be able to report any such breach. Participants in your research can invoke the 'right to be forgotten' and ask to be removed from your data. There are, however, exceptions to some elements for the purposes of research.
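The advice above on aggregating results with low frequency counts can be sketched in a few lines of Python. This is only an illustration, not an approved anonymisation procedure: the function name `aggregate_low_counts`, the threshold of 5 and the label "Other" are all assumptions made for the example, so check the threshold your department or funder actually requires.

```python
from collections import Counter


def aggregate_low_counts(responses, threshold=5, other_label="Other"):
    """Count categories, pooling any category seen fewer than `threshold` times.

    The threshold of 5 is an assumed value for illustration only.
    """
    counts = Counter(responses)
    result = {}
    pooled = 0
    for category, n in counts.items():
        if n >= threshold:
            result[category] = n
        else:
            # Rare categories could identify individuals, so pool them.
            pooled += n
    if pooled:
        # If the pooled group is itself still small, suppress the number (None).
        result[other_label] = pooled if pooled >= threshold else None
    return result


# Hypothetical survey answers: three towns appear fewer than five times.
towns = (["Cambridge"] * 12 + ["Ely"] * 7 + ["Royston"] * 3
         + ["Saffron Walden"] * 2 + ["March"] * 1)
print(aggregate_low_counts(towns))
# {'Cambridge': 12, 'Ely': 7, 'Other': 6}
```

Pooling alone does not guarantee anonymity. If totals are published elsewhere, a suppressed value may still be recoverable, so treat this as one step alongside consent, removal of identifiers and managed access.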
Once you create data and files, they can quickly become disorganised. You need to establish a system that allows you to access them quickly and avoid duplication. A good place to start is to develop a logical folder structure.
It is important to name your files so that they are:
For physical samples, you could create maps of your storage system, reference samples in notebooks, and add notes to the samples themselves.
With digital files, think about using the following pattern: prefix (for document type e.g. report, notes, essay)_document title_version_dateyyyymmdd
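As a rough illustration of the naming pattern above, the following Python sketch builds a file name from its parts. The function name `build_filename`, the two-digit `v03` version style and the use of hyphens inside each field are assumptions made for the example, not part of the guidance itself.

```python
import re
from datetime import date


def build_filename(prefix, title, version, when, ext=""):
    """Return a name of the form prefix_title_vNN_yyyymmdd(.ext)."""

    def slug(text):
        # Lower-case the text and turn runs of other characters into hyphens,
        # so the underscore stays reserved as the separator between fields.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    name = f"{slug(prefix)}_{slug(title)}_v{version:02d}_{when:%Y%m%d}"
    return f"{name}.{ext.lstrip('.')}" if ext else name


print(build_filename("report", "Field Survey Results", 3, date(2024, 5, 17), "docx"))
# report_field-survey-results_v03_20240517.docx
```

Writing the date as yyyymmdd and padding the version number means that files sort in a sensible order when listed alphabetically.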
Keep folders structured similarly, using meaningful titles plus dates where practicable to divide up work. Nest folders to keep the number of items at each level of storage to a minimum. Having 50 folders on your desktop is just as confusing as having 50 documents.
Consider what would happen if your data disappeared because you lost a USB stick, spilled coffee into your laptop, had it stolen, accidentally deleted files, or there was an incident at your department. This happens all the time to individuals and large organisations.
Options for back up include
Your strategy should be guided by what you are willing to lose, what is crucial to your research and how often it changes. The more often your data changes, the more often you need to back it up. Back it up at least weekly if you are working on it regularly.
ALWAYS have data backed up in two different locations; there is no point saving data to a USB stick if you store it with your laptop, because they are then equally vulnerable.
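The two-location rule can be partly automated. The sketch below, which assumes nothing beyond the Python standard library, copies a file to each backup folder and compares checksums to confirm the copies are intact. The function names `sha256` and `back_up` are illustrative, and the code can only check that the destination paths differ, not that they sit on physically separate devices or sites.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy `source` into every destination folder and verify each copy."""
    source = Path(source)
    folders = [Path(d) for d in destinations]
    if len({f.resolve() for f in folders}) < 2:
        raise ValueError("back up to at least two different locations")
    expected = sha256(source)
    copies = []
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        # A mismatch means the copy is corrupt or incomplete.
        if sha256(copy) != expected:
            raise OSError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies
```

A call might look like `back_up("results.csv", ["/Volumes/ExternalDrive/backup", "/Users/me/CloudSync/backup"])`, where both paths are hypothetical and should be replaced with storage that is not kept alongside your laptop.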
Watch this video for a fun reminder about how important it is to back up our data.
Publications without supporting data and code are just claims. By sharing your data you will potentially benefit from increased citations while helping move knowledge forward, enabling others to reuse your data and mine it for new findings. In addition, sharing supports the integrity of your data through processes of verification. As with open access publications, many funders now mandate that your data should be publicly available.
You should aim to store it for at least ten years in a suitable repository and link it to your publication(s). Data can be uploaded directly through Symplectic.
When preserving data ensure that it is:
Findable – includes metadata and a unique identifier
Accessible – protocols are open, free, and universally implementable
Interoperable – uses open source file formats
Re-usable – clarity on usage licences
You may need to share your data with colleagues, peers or supervisors. However, you should consider the security of different tools:
- Email: remember that attachments can easily be forwarded
- Collaborative tools
- (Secure) File Transfer Protocol - set up by your IT support
- Cloud solutions: see UIS cloud matrix for options. Be cautious and check the small print; for example, Google's Terms of Service state that:
"When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide licence to use, host, store, reproduce, modify, create derivative works[...], communicate, publish, publicly perform, publicly display and distribute such content."
The University offers a self-taught online course that lasts about 90 minutes.
In this course, you'll learn:
Unless otherwise stated, this work is licensed under a CC-BY-NC-SA 4.0 licence by Wolfson College Cambridge.