Final changes to training
kracha committed Feb 16, 2024
1 parent f1f6d5b commit b76f359
Showing 9 changed files with 30 additions and 15 deletions.
8 changes: 5 additions & 3 deletions training/01_introduction.Rmd
Original file line number Diff line number Diff line change
@@ -2,7 +2,9 @@



These materials are meant to introduce you to the principles of open science, effective data management, and data archival with the DataONE data repository. It also provides an overview on the tools we will be using (remote servers, Rstudio, R, Troubleshooting, Exercises) throughout the training. This document is meant to take multiple days to complete depending on your previous knowledge on some of the topics.
These materials are meant to introduce you to the principles of open science, effective data management, and data archival with the DataONE data repository. It also provides an overview on the tools we will be using (remote servers, Rstudio, R, Troubleshooting, Exercises) throughout the training. This document is meant to take multiple days to complete, depending on your previous knowledge.

We believe in allowing employees the space to fully grasp concepts during training, even if it means taking longer than expected. Quality learning is our priority, and there's no pressure to finish within a specific timeframe. You may find it helpful to take notes on important concepts, and you will always be able to refer back to this training during your time at NCEAS.

If you see anything that needs fixing, submit an issue in the
<a href = 'https://github.com/NCEAS/datateam-training/issues' target='_blank'>GitHub issues</a>
@@ -51,11 +53,11 @@ On the servers, paths to files in your folder always start with `/home/yourusern

**Note** - if you are a more advanced user, you may use the method you prefer as long as it is evident where your file is from.

When you write scripts, try to avoid writing relative paths (which rely on what you have set your working directory to) as much as possible. Instead, write out the entire path as shown above, so that if another data team member needs to run your script, it is not dependent on a working directory.
When you write scripts, try to avoid writing relative paths (which rely on what you have set your working directory `~/` to) as much as possible. Instead, write out the entire path as shown above, so that if another data team member needs to run your script, it is not dependent on a working directory.
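A minimal sketch of the difference (the username and file name here are hypothetical):

```{r, eval = FALSE}
# Full path: works for any user who has read access to this folder,
# regardless of what their working directory is set to
fish_data <- read.csv("/home/yourusername/training/fish_length.csv")

# Relative path: breaks as soon as someone runs the script
# from a different working directory
fish_data <- read.csv("fish_length.csv")
```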

## A note on R

This training assumes basic knowledge of R and RStudio. Spend at least 30 minutes walking through Jenny Bryan's excellent materials [here](http://stat545.com/block002_hello-r-workspace-wd-project.html) for a refresher.
This training assumes basic knowledge of R and RStudio. Spend at least 15 minutes walking through Jenny Bryan's excellent materials [here](http://stat545.com/block002_hello-r-workspace-wd-project.html) for a refresher.

Throughout this training we will occasionally use the namespace syntax `package_name::function_name()` when writing a function. This syntax denotes which package a function came from. For example `dataone::getSystemMetadata` selects the `getSystemMetadata` function from the `dataone` R package. More detailed information on namespaces can be found [here](http://r-pkgs.had.co.nz/namespace.html).
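For example, a short sketch (assuming `mn` is a DataONE member node connection and `pid` is an object identifier, as used in later chapters):

```{r, eval = FALSE}
library(dataone)

# Explicit namespace: unambiguous even if another loaded package
# exports a function with the same name
sysmeta <- dataone::getSystemMetadata(mn, pid)
```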

3 changes: 2 additions & 1 deletion training/04_editing_eml.Rmd
@@ -5,7 +5,8 @@ This chapter is a practical tutorial for using R to read, edit, write, and valid
Most of the functions you will see in this chapter will use the `arcticdatautils` and `EML` packages.

```{block, type = "note"}
This chapter will be longest of all the sections! This is a reminder to take frequent breaks when completing this section.
This chapter will be the longest of all the sections! This is a reminder to take frequent breaks when completing this section.
If you struggle to get a piece of code to work for more than 10 minutes, reach out to your supervisor for help.
```
```{block, type = "note"}
When using R to edit EML documents, run each line individually by highlighting it and pressing CTRL+ENTER. Many EML functions only need to be run once, and will either produce errors or make the EML invalid if run multiple times.
4 changes: 3 additions & 1 deletion training/09_first_ticket.Rmd
@@ -1,6 +1,6 @@
# First Ticket

After completing the previous chapters, Daphne or Jeanette will assign a ticket from RT. Login using your LDAP credentials got get familiarized with RT.
After completing the previous chapters, your supervisor will assign a ticket from RT. Log in using your LDAP credentials to get familiarized with RT.

```{r, child = '../workflows/pi_correspondence/navigate_rt.Rmd'}
```
@@ -16,6 +16,8 @@ We have developed some partially filled R scripts to get you started on working

You can use this template where you can [fill in the blanks](data/dataset_processing_example_blanks.R) to get familiar with the functions and workflow we use. We also have a more minimal [skeleton example](data/dataset_processing_example_skeleton.R) as an intermediate step. You can look at the [filled example](data/dataset_processing_example_filled.R) if you get stuck, or message the #datateam.

In addition, you may find this [cheat sheet](https://docs.google.com/document/d/1DPhCmnxhoSWv5FEHvlIiNRBjcuFbKqwZviaZDf8UfVU/edit?usp=sharing) of data team R functions helpful.

Once you have updated the dataset to your satisfaction and reviewed the Final Checklist, post the link to the dataset on #datateam for peer review.

```{r, child = '../workflows/pi_correspondence/final_review_checklist.Rmd'}
10 changes: 7 additions & 3 deletions training/index.Rmd
@@ -39,6 +39,7 @@ favicon: "favicon.ico"
* If you are an intern, fill out anticipated quarterly schedule on the intern google calendar shared with you.
* <a href="https://timekeeping.ucsb.edu/" target="_blank">Electronic Timekeeping</a> - make sure you can log on to electronic timekeeping
via your UCSBNetID and password (may not be accessible on the first day; if you continue to have issues, please let Ana know). If you are an hourly employee, log your hours for your first day! Under today's date, select 'Hours Worked' under the Pay Code column, enter the number of hours under the Amount column, and click the 'Save' button in the top right. At the end of every two-week pay period you will also need to click the 'Approve Timecard' button to submit your timecard.
* <a href="https://www.ucpath.ucsb.edu/" target="_blank">UCPath</a> - you can set up your paycheck preferences here, including direct deposit and income tax withholding.
<a href="https://timekeeping.ucsb.edu/sites/default/files/employee_hours_worked_0.pdf" target="_blank">Detailed Instructions</a>
* Let Jeanette or Daphne know what email you would like to use for general NCEAS updates from all@nceas.ucsb.edu

@@ -47,8 +48,9 @@ via your UCSBNetID and password (may not be accessible on the first day, if you
NCEAS hosts a number of events that you are encouraged to attend. Keep an eye on your email but the recurring events are:

* Roundtable
+ weekly presentation and discussion of research by a visiting or local scientist
+ Wednesdays at 12:15 in the lounge
+ presentation and discussion of research by a visiting or local scientist
+ first or second Thursdays of the month at 3:30 in the lounge and via zoom
+ followed by happy hour at 4:30 on the terrace
* Coffee Klatch
+ coffee, socializing, and news updates for NCEAS
+ Tuesdays at 10:30 in the lounge
@@ -64,7 +66,9 @@ Check out their individual calendar entries and channels for more information
* NCEAS Book Club - #bookclub

## Internship Expectations {-}
As an intern with the data team, there are a few expectations that the Project Coordinators have of you. Overall, we expect you to be communicative and proactive. We want you to learn and grow in this position, but we don't want you spinning your wheels going nowhere fast! If you've spent 10-15 minutes on an issue and you're not making any progress, reach out to us and your peers for help in the #datateam slack channel. The #datateam slack channel is the main form of communication, and we expect all interns to become comfortable communicating in this space.
As an intern with the data team, there are a few expectations that the Project Coordinators have of you. Overall, we expect you to be communicative and proactive. We want you to learn and grow in this position, but we don't want you spinning your wheels going nowhere fast! If you've spent 10-15 minutes on an issue and you're not making any progress, reach out to us and your peers for help in the #datateam slack channel.

The #datateam slack channel is our main form of communication, and we expect all interns to become comfortable communicating in this space. By posting your questions and code in the #datateam channel (instead of sending direct messages), multiple people will be able to help at once, and we all can learn from the problems that our peers encounter.

Additionally, we expect interns to work within the standard business hours of 8am - 5pm (pacific time). We ask that you mark your expected work hours on the shared "Intern" Google Calendar. This is so that the Project Coordinators are aware of who's working day-to-day and can plan their days accordingly. We also use this to verify time sheets when they are submitted. Ideally, interns would input their proposed hours on the calendar at least one week in advance. During exams and other unusually busy weeks at school, we understand you may need to shift your hours or reduce your workload. When this occurs, please make sure to email either Daphne or Jeanette so that we know not to expect you during your usual schedule.

@@ -10,7 +10,7 @@ This is easier to accomplish using `arcticdatautils`
doc$dataset$otherEntity <- doc$dataset$otherEntity[order(entity_names)]
```

2. Data files
2. Data files and PIDs retrieved with `get_package`

```{r eval = F}
pkg <- get_package(adc, rm, file_names = T)
4 changes: 3 additions & 1 deletion workflows/data_portals/mosaic.Rmd
@@ -8,6 +8,8 @@ Look out for datasets that are part of the MOSAiC expedition from 2019-2020. Th

> We would like to ask for the event label associated with this dataset (see https://www.pangaea.de/expeditions/events/PS122%2F4).
Often, researchers do not have a full list of event labels, but instrument names are provided in the dataset metadata. In these cases, try to find the correct event labels by searching through the event lists for matching instrument names and dates. Each MOSAiC [campaign](https://www.pangaea.de/expeditions/byproject/MOSAiC) has its own event list. [Here](https://www.pangaea.de/expeditions/events/PS122%2F1) is an example of an event list for the first cruise by the Polarstern research vessel.

2. Find the appropriate dataset and attribute level annotations

- There are functions in `arcticdatautils` to help with annotating: `mosaic_annotate_dataset` and `mosaic_annotate_attribute`
@@ -27,7 +29,7 @@ The following shows how to add the annotations using `arcticdatautils` and manua

### Dataset Level Annotations

There are 5 main campaigns in the MOSAiC expedition. The main campaigns follow the pattern `PS122/#`. For the full campaign list it is easiest to see on the [PANGAEA website](https://www.pangaea.de/expeditions/byproject/MOSAiC)
There are 5 main campaigns in the MOSAiC expedition. The main campaigns follow the pattern `PS122/#`. The full campaign list is easiest to view on the [PANGAEA website](https://www.pangaea.de/expeditions/byproject/MOSAiC). The first two letters of each campaign name correspond with the ship or station name (ex: PS = Polarstern).

**arcticdatautils**
```{r, eval = F}
2 changes: 1 addition & 1 deletion workflows/edit_data_packages/01_datapack_background.Rmd
@@ -1,5 +1,5 @@
## datapack Background
*adapted from the dataone and datapack vingettes*
*adapted from the dataone and datapack vignettes*

`datapack` is written differently than most R packages you may have encountered in the past. This is because it uses the [S4](https://adv-r.hadley.nz/s4.html) system instead.
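A quick sketch of what that means in practice (a generic S4 example, not `datapack`-specific):

```{r, eval = FALSE}
# S4 objects are created with new() from a formal class definition,
# and their slots are accessed with @ rather than $
setClass("Point", representation(x = "numeric", y = "numeric"))
p <- new("Point", x = 1, y = 2)
p@x
```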

2 changes: 1 addition & 1 deletion workflows/edit_data_packages/set_rights_and_access.Rmd
@@ -29,7 +29,7 @@ set_rights_and_access(mn,
permissions = c('read','write','changePermission'))
```

If you ever need to remove/add public access to your package or object, you can use `remove_public_read()` or `set_public_read()`, respectively.
If you ever need to remove/add public access to your package or object, you can use `remove_public_read()` or `set_public_read()`, respectively. Making files publicly readable is especially useful when downloading large numbers of files to the server in order to use metadata helper functions that require a file path (e.g. `eml_get_raster_metadata()` and `get_ncdf4_attributes()`).

```{r, eval = FALSE}
remove_public_read(mn, c(pkg$metadata, pkg$data, pkg$resource_map))
10 changes: 7 additions & 3 deletions workflows/edit_eml/edit_attributelists.Rmd
@@ -74,9 +74,9 @@ attributes <- data.frame(
missingValueCodeExplanation = c(NA, NA, NA,NA, NA, NA, NA, 'no sampling comments'))
```

However, typing this out in R can be a major pain. Luckily, there's an app that you can use to build attribute information. You can use the app to build attributes from a data file loaded into R (recommended as the app will auto-fill some fields for you) to edit an existing attribute table, or to create attributes from scratch.
However, typing this out in R can be a major pain. Luckily, there's a Shiny app that you can use to build attribute information. You can use the app to build attributes from a data file loaded into R (recommended as the app will auto-fill some fields for you) to edit an existing attribute table, or to create attributes from scratch. Use the following commands to create or modify attributes.

Use the following commands to create or modify attributes. These commands will launch a "Shiny" app in your web browser. You must select "Quit App" in order to save your changes, and R will not run code while the app is open.
Use the following commands to create or modify attributes. These commands will launch a "Shiny" app in your web browser.

```{r, eval = FALSE}
#first download the CSV in your data package from Exercise #2
@@ -96,7 +96,11 @@ attribute_tables <- get_attributes(doc$dataset$dataTable[[i]]$attributeList)
attribute_tables <- EML::shiny_attributes(attributes = attribute_tables$attributes)
```

Once you are done editing a table in the app, quit the app and the tables will be assigned to the `attribute_tables` variable as a list of data frames (one for attributes, factors, and units). Be careful to not overwrite your completed `attribute_tables` object when trying to make edits. The last line of code can be used in order to make edits to an existing `attribute_tables` object.
Once you are done editing a table in the browser app, quit the app by pressing the red "Quit App" button in the top right corner of the page.

If you close the Shiny app tab in your browser instead of using the "Quit App" button, your work will not be saved, R will think that the Shiny app is still open, and you will not be able to run other code. You can tell R is confused if you have closed the Shiny app and the bottom line in the console still says `Listening on http://...`. If this happens, press the red stop-sign button on the right-hand side of the console window to interrupt R.

The tables you constructed in the app will be assigned to the `attribute_tables` variable as a list of data frames (one for attributes, factors, and units). Be careful to not overwrite your completed `attribute_tables` object when trying to make edits. The last line of code can be used in order to make edits to an existing `attribute_tables` object.
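Writing the finished tables back into the EML document can be sketched like this (assuming the same `doc` and dataTable index `i` used above, and that `attribute_tables` contains `attributes` and `factors` data frames):

```{r, eval = FALSE}
# Convert the edited data frames back into an EML attributeList
doc$dataset$dataTable[[i]]$attributeList <- EML::set_attributes(
  attributes = attribute_tables$attributes,
  factors = attribute_tables$factors
)
```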

Alternatively, each table can be exported to a CSV file by clicking the `Download` button. If you downloaded the table, read it back into your R session and assign it to a variable in your script (e.g. `attributes <- read.csv(...)`), or just use the variable that `shiny_attributes` returned.
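A minimal sketch of reading a downloaded table back in (the file name here is hypothetical):

```{r, eval = FALSE}
# Assumes you clicked Download in the app and saved the attributes
# table as attributes.csv in your current working directory
attributes <- read.csv("attributes.csv", stringsAsFactors = FALSE)
```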

