From f401f8a91796894c9d317341337da3799b9a0746 Mon Sep 17 00:00:00 2001
From: Julien Brun
Date: Mon, 15 Apr 2024 13:09:25 -0700
Subject: [PATCH] add motto and improving bullets

---
 preserve.qmd | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/preserve.qmd b/preserve.qmd
index ce7730c..616b118 100644
--- a/preserve.qmd
+++ b/preserve.qmd
@@ -5,6 +5,11 @@ title: "Archiving and preserving your data"
 
 As you finalize your project, an important task is to archive your data in a publicly available repository (pending sensitivity and by non-disclosure agreement exceptions). There are a few important steps to ensure that your data can be reused by others and thus make your work more reproducible.
 
+Your general philosophy when preparing the preservation of your scientific products should be: ***Document what you used and preserve what you produced***
+
+See below for more information.
+
+
 
 ## What scientific products to preserve?
 
@@ -16,14 +21,14 @@ Here are a few questions to ask yourself to determine if you should refer in you
 
 1. The **raw data is already publicly accessible**, and the hosting solution (website, FTP server, etc.) seems well maintained (ideally providing a recommended citation)
 
-=> Document the website or process you used to collect the data and when you accessed/downloaded the data you used. Try also to determine if a pecific version number is associated with the data you used.*
+ => *Document the website or process you used to collect the data and when you accessed/downloaded the data you used. Try also to determine if a specific version number is associated with the data you used.*
 
 2. The raw data is **not** publicly accessible
 
-Note that we are not talking about data under a non-disclosure agreement (NDA) here but more about data with an unclear reuse status or obtained by interactions with a person or an institution. For example, if the data you used were sent to you privately, then we recommend that you:
+ Note that we are not talking about data under a non-disclosure agreement (NDA) here but more about data with an unclear reuse status or obtained by interactions with a person or an institution. For example, if the data you used were sent to you privately, then we recommend that you:
 
-- inquire with your person of contact about the status of licensing and if they would be willing to let you share those data publicly. You might face resistance at first, so take the time to explain why you think it is valuable to your work to also share those data sets.
-- if, in the end, it is not possible to share the data, please still describe the data in your documentation and list the contact information (person or institution) to inquire about this data set.
+ - inquire with your person of contact about the status of licensing and if they would be willing to let you share those data publicly. You might face resistance at first, so take the time to explain why you think it is valuable to your work to also share those data sets.
+ - if, in the end, it is not possible to share the data, please still describe the data in your documentation and list the contact information (person or institution) to inquire about this data set.
 
 ### Intermediate data
 
@@ -44,6 +49,8 @@ Those services are often well-integrated with data repositories that link your c
 
 We recommend including any data set used to produce statistics, figures maps, and other visualizations that were used in your work, in this case, even if generated by scripts.
 
+
+
 ## Choosing a data repository
 
 OK, we know what we want to archive. Now let’s decide where we want to preserve things!
@@ -78,15 +85,16 @@ Note that data repositories often support a certain number of licenses, so this
 
 If you want to know more on how to best license your data, click [here](https://www.library.ucsb.edu/sites/default/files/dls-n10-2021-licensing-navy_0.pdf)
 
+
 ## Documenting your work
 
-To make your archiving process the most efficient, it is key to document your work as you progress throughout your project. If you do so, archiving your data will consist of collecting existing information about the various parts of your project rather than developing it from scratch a few months after you have generated this specific dataset.
+To make your archiving process the most efficient, it is key to document your work as you progress throughout your project. If you do so, archiving your data will consist of collecting existing information about the various parts of your project rather than developing it from scratch a few months after you have generated this specific data set.
 
 Add an image about the power of README
 
 ### Metadata
 
-Metadata aims at describing your data with enough information that should let you be able to reuse this data even if you know nothing about this specific dataset. It is sometimes defined as data about data. So what should you include? Here are some pointers:
+Metadata aims at describing your data with enough information that should let you be able to reuse this data even if you know nothing about this specific data set. It is sometimes defined as data about data. So what should you include? Here are some pointers:
 
 - Describe the contents of data files. If you are using complex jargon or concepts make sure you refer to external vocabulary or clearly define these terms as used in your project
 - Keep data entry consistent