Where did it all go wrong?
Lack of planning at the start of a project can cause problems (and much more work!) later on. Think of data management as a time investment to make sure that the data you collect is used effectively and remains usable over time.
Watch this video by the NYU Health Sciences Library as an example of poor data management and take some brief notes on any mistakes you spot. When you’re done, compare your notes with our answers underneath.
Check your answers: what did this researcher do wrong?
Here are all the mistakes we spotted:
- he did not consider how others may want to reuse his data
- he did not share the data in a repository
- he was not aware of his funder's and publisher's requirements
- he did not keep multiple backups
- he did not keep the data in a safe place
- he stored the data on a USB stick, which is easy to lose
- he did not use a safe way to share the data (sending it by post meant it could have been lost)
- he did not save the data in a common format
- he did not save instructions on how to open the data
- he did not plan for long-term preservation
- he did not give variables intuitive names
- he did not save metadata explaining what the variable names mean
- he relied on knowledge held only in one person's head, rather than writing it down as metadata
Keeping your data safe and up to date
Ensuring your data are safe is crucial to any research project. A good storage and backup strategy will help prevent data loss. Explore this scenario to see whether your choices align with good research practice. Click on the link below to begin.
Note: the scenario opens in a new window. Please view it in full-screen mode, then return to this window to continue with the module or to restart the scenario.
Data storage and backup - why bother?
Once you are sure that your data is safe from accidental loss, you should think about how to organise it. Are your computer files ‘an amorphous plethora of objects’? In this video by the University of Edinburgh Data Library, Professor Jeff Haywood talks about his experiences of organising data.
If you want to read more about organising your data, including folder structures and file naming, there is a detailed guide on the Cambridge data website.
If you are at the start of a project, spend some time now preparing an organisational structure for your data. Create all the folders you are likely to need and a few named placeholders for files you will create. If you would like some feedback on it, email me the link (if it’s in the cloud) or come to one of the virtual drop-ins.
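If you prefer to script this step, here is a minimal Python sketch that creates a set of folders and empty placeholder files in one go. The folder names and placeholder file names are hypothetical examples, not a recommended standard; replace them with whatever suits your own project.

```python
from pathlib import Path

# Hypothetical project layout: folder name -> placeholder files to create.
# Adapt these names to your own project.
PROJECT_LAYOUT = {
    "01_admin": ["data_management_plan.txt"],
    "02_raw_data": ["README.txt"],
    "03_processed_data": [],
    "04_analysis_scripts": [],
    "05_outputs": [],
}


def create_project_structure(root, layout=PROJECT_LAYOUT):
    """Create each folder under root and add empty placeholder files.

    Existing folders and files are left untouched, so the function is
    safe to run more than once. Returns the placeholder files it created.
    """
    root = Path(root)
    created = []
    for folder, placeholders in layout.items():
        folder_path = root / folder
        # parents=True creates root too; exist_ok=True skips existing folders
        folder_path.mkdir(parents=True, exist_ok=True)
        for name in placeholders:
            file_path = folder_path / name
            if not file_path.exists():
                file_path.touch()  # create an empty placeholder file
                created.append(file_path)
    return created
```

For example, calling create_project_structure("my_project") would create the five folders inside a my_project folder and return the paths of the two new placeholder files; running it a second time creates nothing new.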
Let's imagine that a few years ago, you were involved in a project with colleagues from the Archaeology department. You want to use some of the data from that project in a grant application you are preparing. In particular, you need:
Can you find the information in this folder? https://drive.google.com/drive/folders/1AYheIh_t8jYQvhQPle8dJ4SMCxtz_ziX?usp=sharing
How did you find the process?
If you struggled, you'll be happy to hear that a colleague organised her files better!
Try finding it in this folder: https://drive.google.com/drive/folders/1YQg2JfeP44tzGQzoszVwvleuXd2oCbsA?usp=sharing
What was better this time? How could the second system be improved even further?
Take a look at this video of Cambridge researchers talking about their experience of sharing data.
So what does it mean in practice to share your data? All you have to do is upload your dataset, together with information about it, to a repository, which can be subject-specific, institutional (such as Apollo) or general. The repository then lets people find and download the data. Find out more in the video below.
Useful resources related to the video:
If your research data is of a personal or sensitive nature, you must make sure you understand and respect the additional requirements associated with managing it. If possible, get in touch with your department’s ethics committee or your industrial sponsor to check what they expect of you. Additional help is available from the Research Data team, the Research Integrity team and the Information Compliance Office.
Below are the answers to some common questions (adapted from the Research Data FAQs).
What are personal and sensitive data?
Personal data is data relating to a living individual which allows that individual to be identified, either from the information itself or from the information combined with any other information held by the 'data controller' (or available in the public domain). The University of Cambridge as a whole is the data controller. Sensitive data (called 'special category data' in data protection law) is personal data about: racial or ethnic origin; political opinions; religious or philosophical beliefs; trade union membership; genetic and biometric data; physical and mental health; sex life or sexual orientation. Data about criminal offences, including any related court proceedings, also needs similar care.
What are the legal requirements for data protection?
The EU General Data Protection Regulation (GDPR), together with the UK Data Protection Act 2018 (DPA 2018), gives individuals certain rights and imposes obligations on those who record and use personal information to be open about how that information is used and to follow seven data protection principles. Personal data must be: processed lawfully, fairly and transparently; collected for specified, explicit and legitimate purposes; adequate, relevant and limited to what is necessary; accurate and, where necessary, kept up to date; kept for no longer than necessary; and kept secure. The seventh principle, accountability, means that the data controller must be able to demonstrate compliance with the other six. In addition, personal data must be processed in line with individuals' rights and must not be transferred abroad without adequate protection.
How should I store my sensitive or confidential data?
You should limit physical access to sensitive data or encrypt it (speak with your local IT/Computing Officer or the University Information Services Help Desk for help in doing this). To avoid accidentally compromising the data at some future date, you should always store information about the data's sensitivity and any available information on participants' consent or use agreements from your data provider with the data itself (i.e. put information about lawful and ethical data use in your data documentation or metadata description).
Data supporting my research is personal or sensitive. How do I share these data?
There can be a conflict between abiding by data protection legislation and ethical guidelines on the one hand, and fulfilling funders' and individuals' requirements to make research results available on the other. Consult your ethics committee before deciding to share participants’ data. Your plans for processing, storing and sharing research data should be considered at the start of each project and reflected in both your data management plan and your consent form. For example, you can inform your participants that anonymised data will be shared via the University of Cambridge data repository. There is good guidance on consent forms at the UK Data Archive (www.ukdataservice.ac.uk), which also provides a sample consent form. Your department’s ethics committee may also provide sample consent forms.
If you would like to learn more about personal and sensitive data and do some practical exercises on identifying these data types, the University of Cambridge offers short (30-minute) online courses on personal and sensitive data.
You should also consider whether your data is commercially sensitive: do you or a sponsor plan to profit from the research in the future? There should be a collaboration agreement in place from the start to clarify the terms of any commercial collaboration. The Research Operations Office can help with this. If you are working with both public funders and commercial partners, clarify early what data can be shared and what can’t, so you can make this clear to all parties.
Data Management Plans
Throughout this module we have seen how important it is to plan the way you will manage your data right at the start of a project. A Data Management Plan (DMP) is a document that captures that process.
To end this module and pull together everything you have learnt, we recommend you write your own DMP for a project you are about to start or have recently started. Use these instructions as a guide.