forked from thefaylab/lab-manual
-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy path08-Realising-FAIR-Organisation.qmd
122 lines (78 loc) · 9.06 KB
/
08-Realising-FAIR-Organisation.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
# Realising FAIR: Data organisation {#realisingFAIR1}
*Estimated time: 45 minutes*
In this module, we will focus on the realisation of FAIR data. We will explore some best practices and tools that can facilitate the implementation of the FAIR principles. At the end of this module you should be able to:
- Understand the importance of good data organisation and be aware of best practices around data organisation
::: callout-important
## Activities
- Watch the videos about 'Data organisation' and complete the quiz
- Try out Cookiecutter, share your folder structure on [GitHub discussion #127 Folder Structure](https://github.com/EstherPlomp/TNW-RDM-101/discussions/127), and comment on another person's folder structure
:::
## Data Organisation
Good data documentation starts with good organisation and file naming of the data of your project! Good data organisation on your computer or your [Project Data (U:) drive](https://www.tudelft.nl/en/library/research-data-management/r/manage/storage) will allow you to make the data Findable both for yourself and for your collaborators who have access to the data. Making the data findable also indirectly improves the re-usability of the data. You need to be able to find the data before you can re-use it! Implementing a good folder structure, data organisation and a meaningful file naming convention are first steps towards making data FAIR.
![Image by [Chaz Hutton](https://twitter.com/chazhutton/status/1285955514241875968)](https://pbs.twimg.com/media/EdihtztWsAA9zaE){fig-alt="Title: anatomy of a file name. A file name that shows 'New_Final_final_new-23_fuckingfinal. jpeg'. For each component a box with explanation is added. For the first final the description 'the finished file' is added. For the second final the box says 'the finished file wihtout spelling mistakes. For New-final-final the box says 'the finished file after the client made a last minute complete change of direction. For the added new the box says 'the finished file after everyone in the pointless meeting managed to add in their own change. The 23 box says 'the finished file after was presumably 23 emails'. For fucking final the box says 'the finished version at the point that you stopped caring'"}
In the next video, we will go through some best practices to help you organise the data of your project efficiently. These best practices are using a good folder structure, tagging files (if possible) and an appropriate file naming convention to enhance the findability of the data in your directories.
- [04-1_Module-4_Video_Presentation_Data organisation](https://drive.google.com/file/d/1lCpNmT6P64blT4aX36U4fFWLIGAoZKlV/view?usp=share_link)(*15 minutes*)
- [The Turing Way information on File Naming Conventions](https://the-turing-way.netlify.app/reproducible-research/rdm/rdm-storage.html?highlight=file%20naming#file-naming-conventions)
![Image by [Allison Horst](https://twitter.com/allison_horst/status/1644020630780805121/photo/1)](https://raw.githubusercontent.com/EstherPlomp/TNW-RDM-101/main/images/Horst-image.jpg){width="70%"}
### Presentation Resources
- [Folder structure explanation of Neuroscientist Nikola Vukovic](http://nikola.me/folder_structure.html)
- [The Turing Way - Data storage and organisation](https://the-turing-way.netlify.app/reproducible-research/rdm/rdm-storage.html) [@community2022]
- [Tagging and Finding Your Files: Home. MIT Libraries](https://libguides.mit.edu/metadataTools)
![The Turing Way illustration by Scriberia. CC-BY 4.0. DOI: [10.5281/zenodo.3332807](https://doi.org/10.5281/zenodo.3332807)](https://the-turing-way.netlify.app/_images/research-compendium.svg){width="75%"}
### Quiz Data Organisation
Test your knowledge after watching the 'Data organisation' video. Think about whether the following statements are True or False (see answers below).
1. Good data organisation makes the data Findable for yourself and your collaborators.
2. Thinking about data organisation at the beginning of your project can save you a lot of time later on.
3. A Research Compendium does not follow a Hierarchical structure.
4. In a Research Compendium data, methods, and outputs should be clearly separated.
5. Consistency is not relevant in data organisation.
![Tweet by [\@varsha_khodiyar](https://twitter.com/varsha_khodiyar/status/1559453945478922240/photo/1)](https://pbs.twimg.com/media/FaRK7XcWIAA9xeo)
## Assignment: Using Cookiecutter
*Estimated time: 20 minutes*
**You can also manually set up a folder structure (go straight to step 4 below)! Cookiecutter is especially helpful to people who program/Python-users or if you have to set up a lot of folder structures for smaller projects (for example, with students).**
Note that you need to have Python and pip (or Anaconda) installed to easily follow the video. To install Anaconda:
- Windows ([instruction video](https://www.youtube.com/watch?v=xxQ0mzZ8UvA))
- Open <https://www.anaconda.com/products/individual#download-section> with your web browser.
- Download the Anaconda for Windows installer with Python 3. (If you are not sure which version to choose, you probably want the 64-bit Graphical Installer Anaconda3-...-Windows-x86_64.exe)
- Install Python 3 by running the Anaconda Installer, using all of the defaults for installation except make sure to check Add Anaconda to my PATH environment variable.
- Linux
- Open <https://www.anaconda.com/products/individual#download-section> with your web browser.
- Download the Anaconda Installer with Python 3 for Linux.
- Open a terminal window and navigate to the directory where the executable is downloaded (e.g., `cd ~/Downloads`).
- Type `'bash Anaconda3-` and then press Tab to autocomplete the full file name. The name of file you just downloaded should appear.
- Press `Enter` (or `Return` depending on your keyboard). You will follow the text-only prompts. To move through the text, press Spacebar.
- Type `Yes` and press `Enter` to approve the license.
- Press `Enter` to approve the default location for the files.
- Type `Yes` and press `Enter` to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).
- Close the terminal window.
- Mac ([instruction video](https://www.youtube.com/watch?v=TcSAln46u9U))
- Open <https://www.anaconda.com/products/individual#download-section> with your web browser.
- Download the Anaconda Installer with Python 3 for macOS (you can either use the Graphical or the Command Line Installer).
- Install Python 3 by running the Anaconda Installer using all of the defaults for installation.
1. Watch the video on [setting up a research compendium using Cookiecutter](https://vimeo.com/462773031) (3:36 min).
2. See the instructions on [how to clone the Cookiecutter template](https://utrechtuniversity.github.io/workshop-computational-reproducibility/slides/slides_project-setup.html#6https://utrechtuniversity.github.io/workshop-computational-reproducibility/slides/slides_project-setup.html#6)
a. [Information about hidden folders on Windows](https://web.archive.org/web/20230610055753/https://www.howtogeek.com/446/show-hidden-files-and-folders-in-windows/)
b. You can also find a visualisation of the template on the [GitHub repository](https://github.com/bvreede/good-enough-project)
3. Work on your own to populate the template with information/files from your project.
4. Share your folder structure on [GitHub discussion #127 Folder Structure](https://github.com/EstherPlomp/TNW-RDM-101/discussions/127) and comment on another person's folder structure
5. Discuss with your peers how similar your own folder structure is to the one you created using Cookiecutter. What additional elements are there in the Cookiecutter project? Is there anything that the Cookiecutter template is missing?
If you didn't find this template useful, you can browse these templates to see if one of these fits your needs better:
a. [Data Science](https://github.com/drivendata/cookiecutter-data-science)
b. [Python template](https://github.com/NLeSC/python-template)
c. [Reproducible Science](https://github.com/mkrapp/cookiecutter-reproducible-science)
d. [More templates](http://cookiecutter-templates.sebastianruml.name/)
::: callout-tip
## Problems with Cookiecutter?
You can also download the template from GitHub. Note that if you'd like to reuse this template a lot, it will be easier to do it as described above.
1. Go to "<https://github.com/bvreede/good-enough-project>"
2. Go to Code -\> Download ZIP
3. Extract all files in the folder where you'd like to save it, and rename the folder {{cookiecutter.project_slug}} to the name of your project
:::
::: callout-tip
## Sharing your folder structure
You can use the [tree command](https://www.geeksforgeeks.org/tree-command-unixlinux/) to generate a list of your folder structure and its files.
:::
![Image by [ErrantScience](https://mas.to/@errantscience/110661764681681686)](https://media.mas.to/masto-public/media_attachments/files/110/661/764/621/470/422/original/1b7fc2b2fe16d89a.jpeg){width="90%"}
## Quiz Answers
1. True / 2. True / 3. False / 4. True / 5. False
## References