# Essentials for Research Data {#essentials}
*Estimated time: 100 minutes*
In this module, we dig deeper into Research Data Management (RDM) by introducing several of its essential topics. At the end of this module you should be able to:
- Identify different types of research data
- Recognise what is considered confidential data in research
- Realise what RDM entails within a research project
- Recognise the responsibilities regarding RDM for TUD PhD candidates
- Store and back up the research data of your project in a secure manner
::: callout-important
## Activities
- Watch videos of TU Delft researchers telling you about the research data and confidential research data they work with
- If you work with personal data, read the information on the two linked websites
- Look at the research cycle image and the RDM questions you need to ask yourself at each step of your project
- Watch the video about data storage & infrastructure available at TU Delft
- Read the policy responsibilities
:::
## Research Data definition
Research data is any information that has been collected, observed, generated or created to validate research findings.
Depending on the discipline you work in, research data can be collected or produced in different ways.
You can capture them in real time (sensors, images), collect them using laboratory instruments, or derive them from interviews or numerical simulations, among others. Research Data can be digital, such as tabular data, videos, algorithms, scripts, transcripts, and codebooks.
They can also be non-digital, for example, laboratory samples, sketchbooks, and prototypes.
In the next two videos, researchers from TU Delft tell us about the confidential data they work with and which RDM best practices they follow:
- [Sian Jones talking about collaboration with industry](https://youtu.be/5MbwTjgc8Vk)
- [Wirawan Agahari talking about personal data in his research](https://youtu.be/YCXZce3qGXI)
Data can be classified in various ways, which is important for effective data management.
The following categories provide a structured framework for understanding and working with research data:
### Research Data categorised by its nature
**Quantitative data** refers to information that is numerical or measurable in nature.
It involves collecting data through structured methods, such as surveys or experiments, and is analysed using statistical techniques.
**Qualitative data** is descriptive in nature, focusing on non-numerical information such as opinions, experiences, or behaviours.
Qualitative data is typically collected through methods like interviews, observations, or open-ended survey questions and is analysed through thematic analysis or interpretation.
### Research Data categorised by collection method
**Experimental data**: Data collected through controlled experiments where variables are intentionally manipulated and measured to establish cause-and-effect relationships.
**Computational data**: Data generated or processed using computational methods, such as simulations, numerical calculations, or machine learning outputs.
**Observational data**: Data obtained through direct observations of real-world phenomena, documenting existing behaviours, events, or characteristics without manipulation.
**Derived/processed data**: Data obtained by analysing or processing raw data.
It is generated through calculations, algorithms, or transformations applied to the original data.
**Research software/code**: Computer programs or code developed and utilised in research activities to support data collection, analysis, modelling, or visualisation.
### Research Data categorised by recording medium
**Digital data**, including tabular data, images, videos, algorithms, scripts, transcripts, and codebooks.
**Non-digital data**, including laboratory samples, sketchbooks, and prototypes.
### Research Data categorised by file format
Research data can also be encountered in diverse formats during the data acquisition process, reflecting the specific needs and characteristics of each scientific domain.
The ability to access and reuse your data in the future depends on the chosen format.
If the associated software/hardware is no longer used, data may become inaccessible.
To ensure the longevity and accessibility of your research data, it is strongly recommended to **use standard, exchangeable, or open file formats**.
4TU.ResearchData provides a [list of preferred file formats](https://data.4tu.nl/s/documents/Preferred_File_Formats_2019.pdf) for which they guarantee long-term support.
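
As a minimal sketch of how you might screen a project folder for file formats that deserve a second look, the Python snippet below flags file names whose extension is not in a small set of open formats. The set `OPEN_EXTENSIONS`, the function name `flag_for_review`, and the file names are illustrative assumptions. They are not the 4TU.ResearchData list, so consult that list for your own data.

```python
from pathlib import PurePath

# Illustrative assumption: a few widely used open formats.
# This is NOT the 4TU.ResearchData preferred-formats list.
OPEN_EXTENSIONS = {".csv", ".txt", ".json", ".xml", ".png"}


def flag_for_review(filenames, open_extensions=OPEN_EXTENSIONS):
    """Return the file names whose extension is not in open_extensions."""
    flagged = []
    for name in filenames:
        # Compare extensions case-insensitively (".PNG" counts as ".png").
        suffix = PurePath(name).suffix.lower()
        if suffix not in open_extensions:
            flagged.append(name)
    return flagged


# Hypothetical file names, for illustration only.
files = ["measurements.csv", "interview_01.txt", "survey.sav", "figure.PNG"]
print(flag_for_review(files))
```

A flagged file is not necessarily a problem. It is a prompt to check whether an open or standard alternative exists and whether you should keep a converted copy alongside the original.
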
![Image by [Piotr Kononow](https://twitter.com/aabella/status/1527533226574680064)](https://pbs.twimg.com/media/FTLi4U2WIAAQdC9){width="50%"}
## Confidential data
There are multiple types of confidential data that you might be working with during your research project. Some examples include:
- personal data (information about an identified or identifiable natural person, such as names, addresses and social security numbers)
- national security data (such as nuclear research)
- data falling under export control regulations
- confidential data received from commercial or other external partners
- data related to competitive advantage (for example, patents, IP)
- data which could lead to reputation/brand damage (such as climate change, personal information, animal research)
- politically-sensitive data (such as research commissioned by public authorities, research on societal issues)
When working with confidential data, you need additional security measures for your data to make sure that they are not accidentally released.
## Personal Data
Only read these materials if you work with personal data (data that can identify a person).
- [TU Delft Information about privacy](https://www.tudelft.nl/en/privacy-security/privacy/)
- If you conduct research that involves human Research Subjects (where human participants are the source of your research data), you will have to submit an application for approval to the [TU Delft Human Research Ethics Committee](https://www.tudelft.nl/en/strategy/integrity-policy/human-research-ethics/hrec-approval-1-application).
- [Personal data management - The Turing Way](https://the-turing-way.netlify.app/reproducible-research/rdm/rdm-personal.html)
When working with personal data, you will likely encounter the following definitions:
- **Anonymised Data**: Anonymisation involves the removal of personal information.
Therefore anonymised data is often considered less sensitive and may have fewer data protection requirements.
- **Pseudonymised Data**: Pseudonymisation involves replacing identifiable information with pseudonyms or codes.
While it reduces the risk of identifying individuals, it may still be subject to data protection regulations depending on the level of re-identification risk.
- **De-identified Data**: De-identification goes a step further by removing or modifying both direct and indirect identifiers, making it extremely challenging to re-identify individuals.
De-identified data is generally subject to fewer data protection requirements.
[![The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807.](https://raw.githubusercontent.com/alan-turing-institute/the-turing-way/main/book/website/figures/sensitive-data.jpg){fig-alt="A visualisation on how sensitive data requires an additional track or process to be able to share (parts of it). Tools that can help to make research based on sensitive data reproducible are: encryption, consent, deidentification, synthetic data and data safe havens." width="68%"}](https://doi.org/10.5281/zenodo.3332807)
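
To make the idea of pseudonymisation concrete, here is a minimal Python sketch that replaces an identifying field with a random code and returns the key that links codes back to the original values. The function name `pseudonymise`, the field name `name`, and the `P-` code prefix are assumptions made for illustration. This is not a TU Delft tool or a complete privacy solution.

```python
import secrets


def pseudonymise(records, id_field="name"):
    """Replace the identifying field in each record with a random code.

    Returns (pseudonymised_records, key). The key maps each code back to
    the original value, so it is itself personal data and must be stored
    separately and securely.
    """
    code_for = {}  # original value -> code
    key = {}       # code -> original value
    pseudonymised = []
    for record in records:
        original = record[id_field]
        if original not in code_for:
            code = "P-" + secrets.token_hex(4)
            while code in key:  # regenerate in the unlikely case of a clash
                code = "P-" + secrets.token_hex(4)
            code_for[original] = code
            key[code] = original
        new_record = dict(record)  # copy, so the input is left unchanged
        new_record[id_field] = code_for[original]
        pseudonymised.append(new_record)
    return pseudonymised, key
```

As long as the key exists, individuals can be re-identified, which is why pseudonymised data can still fall under data protection rules. Indirect identifiers in the other fields (for example, a rare job title) are not handled by this sketch.
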
## Relevant RDM steps within a research project
The following presentation walks you through a simplified research cycle that can represent your project. Have a look at the RDM questions you might ask at each step of your research.
[02-1\_ Module-2_handout_RDM_steps_Interactive_image](https://surfdrive.surf.nl/files/index.php/s/KiQo8RsVXHPHUUp)
## Research Data infrastructure at TU Delft
Before you start collecting or creating data within your project, reflect on where you will store the data and how you will back it up. A storage and backup strategy keeps your data safe during your research project, including when unexpected problems occur. Following good data storage practices protects you from data loss and facilitates effective collaboration. In the next video, we will go through the infrastructure provided centrally at TU Delft for storing, backing up and sharing Research Data. Ask your supervisor whether your research group, department or project has a preferred approach to data storage and backup, or whether customised solutions are already in place.
[02-4_Module-2 \_Presentation \_Data_StorageSharing](https://drive.google.com/file/d/1_SbYKjdh5fcNNqCna3H2IqicLe0OvWQa/view?usp=share_link) (*12 minutes*)
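
One simple way to check that a backup copy is intact is to compare checksums of the original and the copy. The Python sketch below does this with SHA-256 from the standard library. The function names `sha256_of` and `backup_matches` are hypothetical, and the approach is a general technique rather than part of the TU Delft infrastructure described in the video.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original_path, backup_path):
    """Return True if both files have identical content (by SHA-256)."""
    return sha256_of(original_path) == sha256_of(backup_path)
```

For example, `backup_matches("data/run01.csv", "backup/run01.csv")` (both paths are placeholders) returns `False` if the copy was truncated or altered. Storing the digests next to your data also lets you detect silent corruption later.
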
### TU Delft Resources:
- [Additional information about storage infrastructure at TU Delft on TopDesk](https://tudelft.topdesk.net/tas/public/ssp/content/serviceflow?unid=21b6203ec6d74f00a45c32e6034dfc0c&openedFromService=true)
- [Storage solutions overview](https://estherplomp.github.io/TNW-OS-support/posts/storage-solutions/)
- [Recommended backup workflow](https://estherplomp.github.io/TNW-OS-support/posts/storage-backup/#recommended-backup-workflow)
- [Download WebDrive](https://webdata.tudelft.nl/) for the Project Drive
- [Additional information about EduVPN](https://intranet.tudelft.nl/-/openvpn)
## RDM responsibilities
In this section we would like to make you aware of the responsibilities of TU Delft PhD candidates regarding Research Data Management.
These responsibilities are detailed in the University and Faculty Policies. It is very important for TU Delft that researchers follow best practices on Research Data Management (RDM). That is why, since 2018, **TU Delft** has published a set of policies that provide a clear division of roles and responsibilities around RDM.
- [Research Data framework policy](http://doi.org/10.5281/zenodo.4088123)
This Framework policy is accompanied by **Faculty-specific data management policies**, which provide more detailed requirements and guidelines for the disciplines associated with each Faculty.
- [Faculty Research Data Management Policies](https://www.tudelft.nl/en/library/research-data-management/r/policies/tu-delft-faculty-policies) or [direct download of the TNW/AS policy](https://d2k0ddhflgrk1i.cloudfront.net/Library/Themaportalen/RDM/Beleid/2020_AS_Research_Data_Management_Policy.pdf).
At TU Delft, software is recognised as a valuable research output that needs to be well documented, preserved and, whenever possible, consistent with the [FAIR principles](https://estherplomp.github.io/TNW-RDM-101/07-FAIR-principles.html). The **TU Delft Research Software Policy** provides a clear division of roles and responsibilities and sets out a simplified, streamlined process to help researchers share software.
- [TUD Research Software Policy](https://zenodo.org/record/4629662)
### Policy summary
This section summarises your responsibilities as a TU Delft/Applied Sciences PhD candidate:
- Developing a written data management plan (DMP) for managing research outputs within the first 12 months of the PhD study. (As part of the Go/No-Go meeting. For all PhDs starting from 1 January 2020 onwards.)
- Attending the relevant training in data management, for which credits can be obtained through the Graduate School.
- Ensuring that all data and code underlying completed PhD theses are appropriately documented and accessible for at least 10 years from the end of the research project, in accordance with the FAIR principles (Findable, Accessible, Interoperable and Reusable), unless there are [valid reasons](https://www.tudelft.nl/en/library/research-data-management/r/buttons/faqs#c547686) which make research data unsuitable for sharing. (For all PhDs starting from 1 January 2019 onwards.)
When sharing software:
- Use a data repository to obtain a DOI for the software. If you use [4TU.ResearchData](https://data.4tu.nl/info//en/), your software will be automatically registered. If you use other repositories, such as [Zenodo](https://zenodo.org/), you will have to register the software via PURE.
- Choose one of the pre-approved licenses: MIT, BSD, Apache, GPL, AGPL, LGPL, EUPL, CC0.
See [this slide](https://surfdrive.surf.nl/files/index.php/s/4UtGIygt9940Cog) for more details on how to share your software.