CWML/research-data-management-cheat-sheet

Research Data Management Cheat Sheet

Research Data Management Lifecycle

The research data management lifecycle is an illustration of the research process as it relates to data management. In this system, data is meant to be organized, annotated and stored in ways that will facilitate data sharing and reuse and/or research validation.

data lifecycle

Project Planning

At the start of any research project, you should think ahead about what data you will need to use (if any) during your research processes. You may need to pay to access, compute against, or store data, so knowing about these costs upfront can inform grant budgets.
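As a rough illustration of how these costs might feed into a grant budget, the sketch below totals a few data-related cost lines. Every label and figure is a hypothetical placeholder, not a real Yale or vendor rate; replace them with actual quotes.

```python
from dataclasses import dataclass


@dataclass
class DataCost:
    """One data-related cost line for a grant budget (values are hypothetical)."""
    label: str
    unit_cost: float    # cost per unit, e.g. dollars per TB per year
    units: float        # number of units, e.g. TB stored
    years: float = 1.0  # how many years the cost recurs (1 for one-time fees)

    def total(self) -> float:
        return self.unit_cost * self.units * self.years


def estimate_budget(costs: list[DataCost]) -> float:
    """Sum all data-related cost lines, rounded to cents."""
    return round(sum(c.total() for c in costs), 2)


# Placeholder figures for illustration only.
costs = [
    DataCost("dataset license (one-time)", unit_cost=2000.0, units=1),
    DataCost("storage, per TB per year", unit_cost=100.0, units=5, years=3),
    DataCost("compute, per node-hour", unit_cost=0.5, units=1000),
]

for c in costs:
    print(f"{c.label}: {c.total():.2f}")
print(f"total: {estimate_budget(costs):.2f}")
```

Separating access, storage, and compute into their own lines makes it easier to see which cost dominates and to update a single figure when a quote changes.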

Data Creation / Collection

Data might be collected from vendors, databases, or other researchers. If the data you are searching for does not exist, you may need to collect it yourself from multiple sources, or you may need to create the data.

Support at Yale:

Data Processing

After data is collected or created, you will most likely need to process or clean it in some way. Data processing and cleaning can involve merging multiple datasets, selecting or filtering out specific portions of a dataset, standardizing categories found within a dataset, restructuring the spreadsheets that contain the data, and more. When you group data in ways that result in aggregation, data processing can start to bleed into data analysis.
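The processing steps described above can be sketched with the Python standard library alone. The records, field names, and the `SITE_LABELS` mapping are all made up for illustration; a real project would more likely read its data from files and might use a dedicated data analysis library.

```python
from collections import defaultdict

# Hypothetical records standing in for two datasets keyed by participant ID.
visits = [
    {"id": "p1", "site": "New Haven ", "score": 4},
    {"id": "p2", "site": "new haven", "score": 7},
    {"id": "p3", "site": "NYC", "score": 5},
    {"id": "p4", "site": "nyc", "score": None},  # missing measurement
]
demographics = {
    "p1": {"age": 34},
    "p2": {"age": 51},
    "p3": {"age": 29},
    "p4": {"age": 45},
}

# Assumed mapping from raw labels to one standard category name.
SITE_LABELS = {"new haven": "New Haven", "nyc": "New York"}


def clean(visits, demographics):
    """Filter, standardize, and merge the visit records."""
    rows = []
    for v in visits:
        if v["score"] is None:  # filter out incomplete rows
            continue
        site = SITE_LABELS[v["site"].strip().lower()]  # standardize categories
        rows.append({**v, "site": site, **demographics[v["id"]]})  # merge on id
    return rows


def mean_score_by_site(rows):
    """Aggregate scores per site; this is where processing meets analysis."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["site"]].append(r["score"])
    return {site: sum(s) / len(s) for site, s in grouped.items()}


rows = clean(visits, demographics)
print(rows)
print(mean_score_by_site(rows))
```

Keeping the cleaning steps in a function, rather than editing the raw data by hand, leaves the original records untouched and makes the processing repeatable.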

Support at Yale:

Data Analysis

Data analysis = generating findings from your data.

An important part of data analysis is data visualization (e.g., graphs).

Support at Yale:

Data Sharing / Retention

Generally, research data and materials that are commonly accepted in the scientific community as necessary to validate research findings must be retained by Yale researchers for three (3) years after publication of the findings or after all required final reports (e.g., progress and financial) for the project have been submitted to the sponsor. Source: Yale Policy 6001, Research Data & Materials Policy.

Data sharing refers to the process of making data public, typically via a data repository. Data retention refers to storing data so it remains usable, though not necessarily available to the public.
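To make the three-year retention period concrete, here is a minimal sketch that computes the earliest date on which the period would end. It assumes you already know which event starts the clock for your project (publication, or submission of final reports); the function name is hypothetical, and the policy itself remains the authority on which date applies.

```python
from datetime import date

RETENTION_YEARS = 3  # taken from the policy summary above


def retention_end(reference: date, years: int = RETENTION_YEARS) -> date:
    """Return the date `years` after `reference`.

    `reference` is whichever event starts the clock for the project
    (an assumption the caller must resolve against the policy).
    """
    try:
        return reference.replace(year=reference.year + years)
    except ValueError:
        # reference is Feb 29 and the target year is not a leap year
        return reference.replace(year=reference.year + years, day=28)


print(retention_end(date(2024, 5, 15)))  # 2027-05-15
```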

Research Data Management Lifecycle + Constant Themes

In addition to the (sometimes iterative) stages you will progress through during the Research Data Management Lifecycle, there are also themes that you will need to consider during multiple, if not all, of these phases.

data lifecycle

Version Control

Version control allows you to see the change history of a file and to restore a file to a previous iteration. You can apply manual version control by adding dates or v1/v2/vfinal notations to a file name, or by writing a change log within a README file. Cloud data storage systems like Box and Google Drive have version control capabilities. (Note: whenever you are exploring a data or content management system, check whether the system supports version control and how versions are retained.)
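A minimal sketch of the manual approach: one helper builds a dated, versioned file name and another formats a change log line that you could append to a README. The naming pattern and both function names are illustrative assumptions, not an established convention.

```python
from datetime import date
from pathlib import Path


def versioned_name(filename: str, version: int, on: date) -> str:
    """Build a name like 'survey_2024-03-01_v2.csv' from 'survey.csv'."""
    p = Path(filename)
    return f"{p.stem}_{on.isoformat()}_v{version}{p.suffix}"


def changelog_line(filename: str, note: str, on: date) -> str:
    """Format one change log entry, ready to append to a README file."""
    return f"{on.isoformat()}  {filename}  {note}"


name = versioned_name("survey.csv", 2, date(2024, 3, 1))
print(name)
print(changelog_line(name, "removed duplicate rows", date(2024, 3, 1)))
```

Using ISO dates (year-month-day) means the files sort chronologically in a folder listing.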

The most robust and independent way to maintain control over your file versioning is to apply a Version Control System like Git.

Support at Yale:

  • If you have questions about Git, or would like help getting started with Git or GitHub, email medicaldata@yale.edu.

Documentation

Documentation can include any notes and annotations related to your research data that make your data understandable to others (as well as your future self). Maintaining accurate and useful documentation can make the difference between your data being reusable in future research scenarios or not.
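One common form of documentation is a data dictionary that describes each variable in a dataset. The sketch below renders one as CSV text; the variables and the four column headings are hypothetical examples, and your field or repository may expect a different layout.

```python
import csv
import io

# Hypothetical variables for an illustrative dataset.
variables = [
    {"name": "id", "type": "string", "units": "",
     "description": "Participant identifier"},
    {"name": "age", "type": "integer", "units": "years",
     "description": "Age at enrollment"},
    {"name": "score", "type": "integer", "units": "points",
     "description": "Survey score, 0 to 10"},
]


def data_dictionary_csv(variables) -> str:
    """Render variable descriptions as CSV text for a data dictionary file."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["name", "type", "units", "description"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(variables)
    return out.getvalue()


print(data_dictionary_csv(variables))
```

Saving the result next to the dataset, for example as a file alongside the README, keeps the description and the data together.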

Support at Yale:

Data Storage

When choosing a data storage solution, you should think about how often you will use the data, whether others will need to access it too, how much the storage solution will cost, the level of risk associated with your data, the size of your data, and more.

Support at Yale:

Data Security

How can you know which software is cleared for moderate- or high-risk data? (And what counts as moderate- or high-risk data?) Check with Yale Information Security.

Research Data Management Lifecycle + Technology

When you start to think about how you would actually engage with any of these steps, different technology aspects come into play, along with themes including version control, documentation, and operational data storage.

data lifecycle + technology

Project Planning Tools

Data Creation Tools

  • Core Research Facilities - Yale’s Core Research Facilities provide Yale researchers access to state of the art scientific instrumentation with the intent to keep Yale’s scientific research at the cutting edge. Each Core employs highly trained staff that may provide training and assistance with use of instrumentation as well as aid in experimental design.

Data Collection Tools

  • APIs (Application Programming Interfaces)
  • Qualtrics - create and deploy surveys
  • Microsoft Excel
  • Databases - databases are more robust for storing and organizing interrelated data structures than spreadsheets or tables. Email medicaldata@yale.edu with questions about relational database design and set-up.
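To illustrate what "interrelated data structures" means for the Databases item above, here is a small sketch using Python's built-in `sqlite3` module. The `participant` and `visit` tables, their columns, and the sample rows are invented for the example; a real schema would follow your own study design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for illustration
conn.execute("PRAGMA foreign_keys = ON")  # enforce the link between tables
conn.executescript("""
CREATE TABLE participant (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE visit (
    id             INTEGER PRIMARY KEY,
    participant_id INTEGER NOT NULL REFERENCES participant(id),
    visit_date     TEXT NOT NULL,
    score          INTEGER
);
""")
conn.executemany(
    "INSERT INTO participant (id, name) VALUES (?, ?)",
    [(1, "A"), (2, "B")],
)
conn.executemany(
    "INSERT INTO visit (participant_id, visit_date, score) VALUES (?, ?, ?)",
    [(1, "2024-01-10", 4), (1, "2024-02-10", 6), (2, "2024-01-12", 7)],
)

# Join the two tables: number of visits and mean score per participant.
rows = conn.execute("""
    SELECT p.name, COUNT(v.id), AVG(v.score)
    FROM participant AS p
    JOIN visit AS v ON v.participant_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
print(rows)
```

In a spreadsheet, the participant's details would typically be repeated on every visit row; here they are stored once and linked by `participant_id`, which avoids inconsistent copies.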

Data Processing Tools

Data Analysis Tools

Data Sharing / Retention Tools

FAQs

Have other questions? Email medicaldata@yale.edu
