Managing Research Data
Learn about tools and resources to help with data management for research
Research Data Management (RDM) encompasses the processes of collecting, organizing, describing, sharing, and preserving research data, whether in tabular, statistical, numeric, geospatial, image, multimedia, or other formats. The most effective and efficient data management practices begin at the research planning stage. With early planning for data management you can:
- Save time by having a plan in place for your data from the beginning of your project.
- Comply with legal and funder requirements.
- Increase the visibility and impact of your research by making your data searchable and citable.
- Support open access and foster new research by preserving your data and making it accessible to other researchers.
For best results, model your data structure completely, from beginning to end, during the planning phase of your project. When planning the organization of your data, outline the following elements:
- The context of data collection: project history, aim, objectives, and hypotheses
- Data collection methods: sampling, data collection process, instruments used, hardware and software used, scale and resolution, temporal and geographic coverage, and secondary data sources used
- Dataset structure: data files, study cases, and relationships between files
- Data validation, checking, proofing, cleaning, and quality assurance procedures carried out
- Changes made to the data since their original creation, and identification of different versions of data files
- Conditions for access and use, and data confidentiality
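One way to capture these elements is a plain-text README kept alongside the data. The Python sketch below writes such a skeleton with one heading per element from the list above; the function name `readme_template`, the heading wording, and the layout are assumptions for illustration rather than a required standard.

```python
from datetime import date

# Headings follow the planning checklist above; the wording and layout
# are illustrative assumptions, not a required README standard.
SECTIONS = [
    "Context of data collection (project history, aim, objectives, hypotheses)",
    "Data collection methods",
    "Dataset structure",
    "Data validation and quality assurance",
    "Changes and versions",
    "Access and use conditions",
]


def readme_template(project: str, author: str) -> str:
    """Return a plain-text README skeleton with one numbered heading per section."""
    lines = [
        f"README for {project}",
        f"Prepared by {author} on {date.today().isoformat()}",
        "",
    ]
    for number, heading in enumerate(SECTIONS, start=1):
        lines.append(f"{number}. {heading}")
        lines.append("   [Describe here]")  # placeholder for the researcher to fill in
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical project and author names, for demonstration only.
    print(readme_template("Hypothetical Bird Survey", "A. Researcher"))
```

Keeping the README as plain text means it stays readable without special software, which suits long-term preservation.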
A consistent folder structure helps uniquely identify the files it contains. Plan the folder structure for your data files before you begin collecting data. Ideas for organizing your folders include:
- Data type (text, images, models, etc.)
- Time (year, month, session, etc.)
- Subject characteristic (species, age grouping, etc.)
- Research activity (interview, survey, experiment, etc.)
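If you settle on a layout before collecting data, a short script can create the same hierarchy for every collaborator. The Python sketch below assumes a hypothetical scheme of research activity, then year, then data type; the folder names and the functions `planned_folders` and `create_folders` are illustrative and not part of any tool mentioned on this page.

```python
from pathlib import Path

# Hypothetical layout: research activity / year / data type.
# Reorder or replace these levels to match your own plan.
ACTIVITIES = ["interviews", "surveys"]
YEARS = ["2023", "2024"]
DATA_TYPES = ["text", "images"]


def planned_folders(root: Path) -> list[Path]:
    """List every folder in the planned hierarchy without creating anything."""
    return [
        root / activity / year / kind
        for activity in ACTIVITIES
        for year in YEARS
        for kind in DATA_TYPES
    ]


def create_folders(root: Path) -> None:
    """Create the planned hierarchy; folders that already exist are left as they are."""
    for folder in planned_folders(root):
        folder.mkdir(parents=True, exist_ok=True)


if __name__ == "__main__":
    # Preview the layout before creating it on disk.
    for folder in planned_folders(Path("my_project")):
        print(folder.as_posix())
```

Listing the folders before creating them lets a team review the plan first; `create_folders` can then be run once per machine.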
A data management plan (DMP) is a document outlining how a researcher will manage data during and after a research project, including how the data will be organized, maintained, and shared. A growing number of funding agencies now require researchers to submit a formal DMP when applying for grants. The resources below explain funder requirements in more detail.
- See the California Digital Library’s funding agency requirement database to learn more about funder data management requirements.
- The UCSD Library has a collection of example data management plans attached to NSF proposals, organized by directorate.
- Browse the Sherpa/Juliet page for summaries of the policies that various research funders attach to their grant awards.
- Use the DMPTool to create your data management plan.
  - This free tool walks you step by step through the requirements of supported funders and, on completion, produces an exportable data management plan.
  - Most Claremont Colleges users can create their own account by selecting “Not in List” during login. Harvey Mudd College users can log in with their campus NetID.
- Refer to the Guidelines for Effective Data Management Plans from the Inter-University Consortium for Political and Social Research (ICPSR) for guidance on creating a data management plan.
- For templates, tools, funder requirements, best practices and data management plan examples in the natural sciences see the ICPSR Data Management Plan Resources & Examples
Open Science Framework
An open-source tool for sharing research data, code, and documentation from the Center for Open Science. Users can create a free account that can be linked to existing accounts on sites like Google Drive, GitHub, and Box.
A popular online platform that lets you combine live code with narrative descriptions and notes. Notebooks are useful for communicating experimental protocols that involve computer code.
Think of a file name as a unique identifier for each of your files. Following a naming convention simplifies the organization of your files and makes them easy to locate, and it also makes it easier for others to understand and reuse your data. This is particularly important on collaborative projects. Here are some recommended best practices for naming your files:
- Use names that are brief but descriptive
- Avoid spaces and special characters (e.g., *, #, %)
- Agree on a naming convention that everyone using the files follows
- Identify file versions using dates and version numbers in the file name
- Use three-letter file extensions to ensure backward compatibility (e.g., .doc, .tif, .txt)
- Do not rely on letter case alone to distinguish files (e.g., datasetA.txt vs. dataseta.txt)
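A naming convention is easier to follow when it can be checked automatically. The Python sketch below assumes a hypothetical convention of `project_description_YYYYMMDD_vNN.ext` (lowercase name parts, an eight-digit date, a two-digit version number, and a three-letter extension) and flags names that break it. It also reports names that differ only by letter case. The functions `check_name` and `case_collisions` are illustrative; adapt the pattern to whatever convention your team agrees on.

```python
import re
from datetime import datetime

# Hypothetical convention: project_description_YYYYMMDD_vNN.ext
PATTERN = re.compile(
    r"^[a-z0-9]+_[a-z0-9-]+_(?P<date>\d{8})_v(?P<version>\d{2})\.[a-z0-9]{3}$"
)
SPECIAL = re.compile(r"[*#%&?!@$]")  # characters to avoid in file names


def check_name(name: str) -> list[str]:
    """Return the problems found in a file name (an empty list means it conforms)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if SPECIAL.search(name):
        problems.append("contains special characters")
    match = PATTERN.match(name)
    if match is None:
        problems.append("does not follow project_description_YYYYMMDD_vNN.ext")
    else:
        try:
            # Reject impossible dates such as February 30.
            datetime.strptime(match.group("date"), "%Y%m%d")
        except ValueError:
            problems.append("date is not a valid YYYYMMDD date")
    return problems


def case_collisions(names: list[str]) -> list[tuple[str, str]]:
    """Return pairs of names that differ only by letter case."""
    seen: dict[str, str] = {}
    clashes = []
    for name in names:
        key = name.lower()
        if key in seen:
            clashes.append((seen[key], name))
        else:
            seen[key] = name
    return clashes
```

For example, `check_name("birdsurvey_counts_20240315_v02.csv")` returns an empty list, while `case_collisions(["datasetA.txt", "dataseta.txt"])` reports the pair as a clash.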
Use these resources for additional guidance in naming and organizing data files:
- Smithsonian Data Management Best Practices – Naming and Organizing Files
- Data Management for Researchers: Organize, maintain and share your data for research success (book) by Kristin Briney
- Software Carpentry video on Data management
- Consultations related to your particular data management questions
- Course-integrated instruction on data management best practices
- Advice on repositories for long-term secure storage of research data