All research, regardless of academic discipline, produces data through collection, observation or creation. Analysis of data produces and validates your original research results. Data can take the form of archival material, audio files, code, documents, experiment results, field notes, images, interview transcripts, lab books, materials, physical samples, protocols, social media outputs, statistics, surveys, videos and more.
Watch this video for a quick overview of research data management (RDM).
Anyone carrying out research should create a data management plan (DMP) before they start their project.
The plan should
Think about best practices for dealing with data at every stage of the lifecycle to develop a robust strategy for data management.
There is a whole section on writing a DMP on Cambridge's Data Management site, but to get you thinking about the structure it might take, have a look at this checklist.
[The Research Cycle image adapted by University of Denver from the Research 360 Project]
Personal data is defined by the 1998 Data Protection Act as anything which can ‘reveal the identity of an actual living person’.
You therefore need to consider whether the data you collect falls into this category and how you will deal with it. The easiest thing is, of course, not to collect it in the first place. But if you do, ensure you get informed and written consent, anonymise it, remove identifiers, aggregate results where there might be low frequency counts, remove outliers and use managed access repositories to store the data.
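One of the steps above, aggregating results where counts are low, can be sketched in a few lines of Python. This is a minimal illustration and not an approved disclosure-control method: the threshold of 5, the function name and the sample responses are all assumptions made for the example, so check what your ethics committee or repository requires.

```python
from collections import Counter

# Hypothetical threshold: groups smaller than this are masked in the output.
MIN_CELL_COUNT = 5


def aggregate_with_suppression(values, min_count=MIN_CELL_COUNT):
    """Count each category and replace low-frequency counts with a marker."""
    counts = Counter(values)
    return {
        category: (count if count >= min_count else f"<{min_count}")
        for category, count in counts.items()
    }


# Made-up responses for illustration only (no real participants).
responses = ["urban"] * 12 + ["suburban"] * 7 + ["rural"] * 2
table = aggregate_with_suppression(responses)
print(table)  # {'urban': 12, 'suburban': 7, 'rural': '<5'}
```

Masking the small "rural" group means a reader of the published table cannot single out the two people in it.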
The General Data Protection Regulation (GDPR) protects all EU citizens, wherever they are in the world. The definition of personal data has been expanded to include mental, genetic, cultural, economic, or social information. Your consent forms will also need to explain exactly what you are collecting, why, and how long the data will be kept. Importantly, you must inform the Data Protection Officer in your college or department of a data breach within 72 hours of it taking place. You therefore also need to monitor your data to be able to report any such breach. Participants in your research can invoke the 'right to be forgotten' and ask to be removed from your data. There are, however, exceptions to some elements for the purposes of research.
Once you create data and files, they can quickly become disorganised. You need to establish a system that allows you to access them quickly and avoid duplication. A good place to start is to develop a logical folder structure.
It is important to name your files so that they are:
For physical samples, you could create maps of your storage system, reference samples in notebooks, and add notes to the samples themselves.
With digital files, think about using the following pattern: prefix (for document type e.g. report, notes, essay)_document title_version_dateyyyymmdd
Keep folders structured similarly, using meaningful titles plus dates where practicable to divide up work. Nest folders to keep the number of items at each level of storage to a minimum. Having 50 folders on your desktop is just as confusing as having 50 documents.
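The file naming pattern described above (prefix, document title, version, date as yyyymmdd) could be generated by a small helper such as the one below. This is only a sketch: the function name, the use of hyphens inside the title, the lower-casing and the zero-padded `v03` version format are assumptions chosen for the example, not part of the guidance.

```python
import re
from datetime import date


def build_filename(prefix, title, version, day, extension):
    """Assemble prefix_title_version_yyyymmdd.extension."""
    # Assumption: hyphens replace spaces and punctuation inside the title,
    # so underscores only ever separate the four fields.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").lower()
    return f"{prefix}_{slug}_v{version:02d}_{day:%Y%m%d}.{extension}"


name = build_filename("report", "Soil sample analysis", 3, date(2024, 5, 17), "docx")
print(name)  # report_soil-sample-analysis_v03_20240517.docx
```

Writing the date as yyyymmdd means that files sharing the same prefix, title and version sort chronologically when listed by name.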
Where possible, include a further file containing metadata (data about data) in the same folder as your data. This adds context so that you and others can understand the data in the short, medium and long term.
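As an illustration, a metadata file can be as simple as a small JSON document saved alongside the data. The sketch below writes one with Python's standard library. The file name `metadata.json` and every field shown are hypothetical; use whatever fields your discipline, funder or repository expects.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(folder, metadata, filename="metadata.json"):
    """Save a metadata dictionary as JSON in the same folder as the data."""
    path = Path(folder) / filename
    path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return path


# Hypothetical fields and values, for illustration only.
example = {
    "title": "Interview transcripts, pilot study",
    "creator": "A. Researcher",
    "date_collected": "2024-05-17",
    "description": "Anonymised transcripts of ten pilot interviews.",
    "file_naming": "prefix_title_version_yyyymmdd",
}

# A temporary folder stands in for your real data folder.
with tempfile.TemporaryDirectory() as folder:
    path = write_metadata(folder, example)
    loaded = json.loads(path.read_text(encoding="utf-8"))
    print(loaded["title"])  # Interview transcripts, pilot study
```

A plain-text README serves the same purpose. JSON has the advantage that it can be read by software as well as by people.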
Consider what would happen if your data disappeared because you lost a USB stick, spilled coffee into your laptop, had it stolen, accidentally deleted files, or there was an incident at your department. This happens all the time to individuals and large organisations.
Options for back up include
Your strategy should be guided by three questions: what are you willing to lose, what is crucial to your research, and how often does it change? The more it changes, the more often you need to back it up. If you are working on it regularly, back it up at least once a week.
ALWAYS have data backed up in two different locations; there is no point saving data to a USB if you store it with your laptop; they are equally vulnerable.
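The two-location rule above can be sketched as a short Python script that copies a file to each backup folder and checks every copy against the original using a SHA-256 checksum. The function names and folder names are assumptions for the example, and temporary folders stand in for what would really be separate devices or services.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        if sha256(copy) != expected:
            raise OSError(f"backup at {copy} does not match the original")
        copies.append(copy)
    return copies


# Temporary folders stand in for two physically separate locations.
with tempfile.TemporaryDirectory() as root:
    root = Path(root)
    original = root / "working" / "notes_interview-01_v01_20240517.txt"
    original.parent.mkdir()
    original.write_text("placeholder data", encoding="utf-8")
    copies = back_up(original, [root / "external_drive", root / "network_share"])
    verified = [sha256(copy) == sha256(original) for copy in copies]
    print(verified)  # [True, True]
```

The checksum step matters because a copy that was silently truncated or corrupted is no better than no backup at all.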
Publications without supporting data and code are just claims. By sharing your data you will potentially benefit from increased citations while helping move knowledge forward, enabling others to reuse your data and mine it for new findings. In addition, it will ensure the integrity of your data through processes of verification and, as with open access publications, many funders now mandate that your data should be publicly available.
You should aim to store it for at least ten years in a suitable repository and link it to your publication(s). Data can be uploaded directly through Symplectic.
When preserving data ensure that it is:
Findable – includes metadata and a unique identifier
Accessible – protocols are open, free, and universally implementable
Interoperable – uses open source file formats
Re-usable – clarity on usage licences
You may need to share your data with colleagues, peers or supervisors. However, you should consider the security of different tools:
- Email: remember that attachments can easily be forwarded
- Collaborative tools
- (Secure) File Transfer Protocol - set up by your IT support
Check the terms of service of any third-party tool before you use it. Google's terms, for example, have stated:
"When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide licence to use, host, store, reproduce, modify, create derivative works[...], communicate, publish, publicly perform, publicly display and distribute such content."
- Cambridge UIS, Research and data storage facilities - lists services offered by the University including mass storage options.
- Digital Curation Centre - an internationally-recognised centre of expertise with a focus on building capability and skills for research data management.
- DMPOnline - helps you to create, review and share data management plans that meet institutional and funder requirements. Some institutions and funders mandate use of the service, and templates are available for download.
- MANTRA - a free online course for those who manage digital data as part of their research project.
- Research Data Management, University of Cambridge - guides researchers through various aspects of research data planning and management and will provide assistance in meeting funders’ expectations.
- UK Data Service - funded by the ESRC, it provides access to datasets as well as guidance in the collection, use, preservation and sharing of data.