
Data management describes the workflows and platforms for storing and maintaining data that is produced during the course of research. Good data management planning ensures no time will be lost to misplaced files, collaborators are all kept up to date, and funder/publisher requirements for data sharing are met. Well-managed data also ensures good ethical practices and the reproducibility of your research. 

Before your research

Create a data management plan

A data management plan is a written document outlining how a researcher plans to manage data during and after a research project, including how it will be organized, maintained, and shared. More and more funding agencies now require researchers to submit a formal data management plan (DMP) when applying for grants. Below is a list of resources with more information on funder requirements.

During your research

Workflows

The type of research you are doing, the format and size of your data, and the number and type of your collaborators will all shape how your data is gathered, analyzed, and shared. These decisions are also closely linked to the tools and platforms you are using. Data scientists may run their analyses in Jupyter Notebooks or RStudio, while scholars in the environmental sciences may work mostly with GIS platforms. A good data management plan will take into consideration how these platforms function and take advantage of their built-in capacity to manage issues such as version control and permissions.

The data-gathering stage, while you are probably still refining your workflow, is a good time to check back in with your data management plan and note any changes you have made. You don't have to aim for "perfect" documentation; some is always better than none!
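
If you are wondering what lightweight documentation can look like in practice, even a short script that inventories your files helps collaborators (and your future self) know what is there. Below is a minimal Python sketch; the data folder name and the DATA_README.txt filename are hypothetical placeholders, not a required structure.

    # Generate a simple inventory of a data folder as lightweight documentation.
    # The folder name "data" and the output file "DATA_README.txt" are placeholders.
    from datetime import date
    from pathlib import Path

    data_dir = Path("data")
    readme = data_dir / "DATA_README.txt"

    lines = [f"Data inventory generated {date.today()}", ""]
    for f in sorted(data_dir.rglob("*")):
        if f.is_file() and f.name != readme.name:
            size_kb = f.stat().st_size / 1024
            lines.append(f"{f.relative_to(data_dir)}  ({size_kb:.1f} KB)")

    readme.write_text("\n".join(lines))
    print(f"Wrote {readme} with {len(lines) - 2} entries")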

Not sure how to get started? Here are two widely used platforms for collaborative research work, with plenty of support and training material available online.

Open Science Framework
An open-source tool for sharing research data, code, and documentation from the Center for Open Science. Users can create a free account that can be linked to existing accounts on sites like Google Drive, GitHub, and Box.

Jupyter Notebooks
A popular platform that allows you to combine live code with narrative descriptions and notes. Notebooks are useful for communicating experimental protocols that involve computer code.
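
To give a sense of what a single notebook cell might contain, here is a minimal Python sketch. In an actual notebook the explanation would usually sit in Markdown cells rather than comments, and the measurements.csv file and its "value" column are hypothetical examples.

    # One notebook-style cell: load a (hypothetical) CSV and summarize it,
    # keeping the result next to the code that produced it.
    import csv
    import statistics

    with open("measurements.csv", newline="") as f:
        values = [float(row["value"]) for row in csv.DictReader(f)]

    print(f"n = {len(values)}, mean = {statistics.mean(values):.2f}, "
          f"sd = {statistics.stdev(values):.2f}")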

Data Security

You will want to set up and document workflows to ensure that your data and other research outputs are secure. This includes making sure backups are properly timed and archived and that the appropriate collaborators have access.

A good rule of thumb is the "3-2-1" rule: keep three copies of your data, on two different types of storage media, with one copy stored offsite (for example, in the cloud).
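
As a rough illustration of the first two parts of the rule, the Python sketch below copies a working folder to an external drive. All paths are hypothetical, and the third (offsite/cloud) copy would normally be handled by a sync client or an institutional backup service rather than by a script.

    # Keep a second copy of a project folder on different storage media.
    # Both paths are hypothetical placeholders.
    import shutil
    from datetime import date
    from pathlib import Path

    working_copy = Path("~/projects/field-study").expanduser()   # copy 1: your machine
    external_drive = Path("/Volumes/BackupDrive/field-study")    # copy 2: external media

    # A dated folder keeps older backups from being silently overwritten.
    target = external_drive / f"backup-{date.today()}"
    shutil.copytree(working_copy, target, dirs_exist_ok=True)
    print(f"Copied {working_copy} -> {target}")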

Some research data may also fall under restricted categories because it contains human subjects information or other forms of confidential data. Your campus institutional review board and your campus IT office can help you evaluate your particular requirements.

After your research

Sharing your data

It is increasingly common for journals and funding agencies to require researchers to deposit their data in a publicly accessible repository. Statements in academic articles saying "data available upon request" are steadily being replaced by persistent links to datasets in a repository.

Researchers working in highly-collaborative fields may want to share their data in a structured way as well, even if they are not required to by a publication.

There are many options for data repositories, and we recommend you follow the guidance of your funder or publisher if they provide any. There are also preferred data repositories for certain academic fields. For example, scholars working on genomic sequencing will share their data in NIH’s Sequence Read Archive.

The Claremont Colleges Library is a member of the Dryad data repository, which supports a wide range of disciplines and meets many major journal and funder standards. The library's membership means that our affiliated researchers can deposit their data in Dryad, and receive data curation support, at no charge.

Reproducibility

Being able to publish research and accompanying data, code, and other materials in a way that allows other scholars to reproduce your findings is a growing concern in a number of fields. A solid data management plan, clear documentation, and keeping your data in a repository are all key to ensuring reproducibility.
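
One small, concrete piece of reproducibility is recording the software environment alongside your code and data. The Python sketch below writes the interpreter version and installed package versions to a text file; the filename is just an example. R users can capture similar information with sessionInfo().

    # Record the Python version and installed packages so others can
    # recreate the environment. "environment-info.txt" is an example name.
    import platform
    import sys
    from importlib import metadata

    lines = [f"Python {sys.version.split()[0]} on {platform.platform()}", "", "Installed packages:"]
    for dist in sorted(metadata.distributions(),
                       key=lambda d: (d.metadata["Name"] or "").lower()):
        lines.append(f"{dist.metadata['Name']}=={dist.version}")

    with open("environment-info.txt", "w") as f:
        f.write("\n".join(lines))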

If you work in R and RStudio, a number of resources are available to guide you through creating reproducible research.

Reproducible Research with R and RStudio (3rd ed.)

Gandrud, C. (2020). Reproducible Research with R and RStudio (3rd ed., The R Series). CRC Press.

The Whole Tale

Whole Tale is an NSF-funded Data Infrastructure Building Blocks (DIBBs) initiative to build a scalable, open-source, web-based, multi-user platform for reproducible research. It enables the creation, publication, and execution of "tales": executable research objects that capture the data, code, and complete software environment used to produce research findings.

Getting help with your research

We are happy to talk with Claremont scholars about their data management questions. Our services include:

  • Consultations related to your particular data management questions
  • Course-integrated instruction on data management best practices
  • Advice on repositories for long-term secure storage of research data