Welcome to the Hydro-JULES Summer/Winter DataLab sessions. Please go to the Hydro-JULES school webpage to find more information on the school and the sessions. In this GitHub repository you will find all the resources used during these sessions. The DataLabs sessions will be held in the DataLabs room in Gather.Town. During the sessions there will be live presentations, practical demonstrations and tutors available to answer questions. If you miss the live sessions, a number of video presentations are placed around the DataLabs room in Gather.Town, which you can follow instead; these video presentations are also linked within the specific training notebooks provided in this GitHub repository.
The GitHub repository and the associated instructions in the README.md are written for the training sessions on DataLabs during the Hydro-JULES Summer/Winter School. However, this repository remains freely available after the school has finished, and it can be used without DataLabs access. You can clone it to your own local or remote workspace and continue the training sessions there. To work on your own workspace, please make sure you have all the required Python packages installed (please see the following operating system requirements for the cf-python package). You will also need access to the datasets used in the notebooks; please contact Matt Fry or Amulya Chevuturi to get a copy of the required input data. Remember to change the path of the input data in the notebooks when you use this repository on your own workspace.
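Before running the notebooks on your own workspace, a quick check along the following lines can report which packages cannot be imported and whether your input-data directory exists. This is only a sketch: the module list is an assumption (`cf` is the import name of cf-python; the notebooks may need further packages), and `DATA_DIR` is a hypothetical placeholder, not the path used in the notebooks.

```python
import importlib.util
from pathlib import Path

# Assumed import names; adjust to match what the notebooks actually import.
REQUIRED_MODULES = ["netCDF4", "cf", "numpy"]

# Hypothetical location of your own copy of the input data.
DATA_DIR = Path("path/to/your/input_data")


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def data_dir_ready(path):
    """Return True if the input-data directory exists."""
    return Path(path).is_dir()


if __name__ == "__main__":
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
    print("Input data found:", data_dir_ready(DATA_DIR))
```

If a module is reported as missing, install it in your environment before opening the notebooks; if the data directory is not found, update the path both here and in the notebooks.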
In this document we provide a step-by-step guide to:
i. Overview and introduction to DataLabs
ii. Importing the GitHub repository into DataLabs
iii. Introduction to the repository notebooks
DataLabs is a NERC-funded, UKCEH-based, cloud-based collaborative environment for developing data pipelines and analytical methods. It provides a secure area in which you can collaborate with colleagues around the world. You can refer to this video by Matt Fry, which gives a brief introduction to DataLabs.
To use the DataLabs platform, you first need to get an account with DataLabs and request access to the HydroJules project on DataLabs (instructions were provided in an email sent out before the Hydro-JULES school). Once you have your username and password, please follow the steps below to set up your very own DataLabs lab.
-
Please go to the webpage https://datalab.datalabs.ceh.ac.uk/, and click on the “Log In” button shown with the red arrow.
-
Please log in to the ceh-datalab-webpage (shown with the red arrow) with your created username and password.
-
Please click on the “Open” tab for the HydroJules Project, shown with the red arrow.
-
Please click on the “Notebooks” tab (shown with the red arrow), in the left-hand side panel under the “Analysis” section.
-
Please click on the “Create Notebooks” tab, on the right side of the webpage, shown with the red arrow. This will actually create a JupyterLab for you, which can host multiple notebooks. Fill in the options as per step 6 below.
-
Fill in the following information in the pop-up, scrolling down if required.
- Display Name – anything that you like
- Type – select “JupyterLab” option
- URL Name – any word (ideally similar to the Display Name, without any spaces, dashes or underscores)
- Data Store to Mount – select “initialhj” option
- Description – a few words describing the JupyterLab
- Sharing Status – select “Private” option
- Assets – please leave empty (do not select any option)
Then click on the “Create” tab at the bottom, shown with the red arrow.
-
You should then see a new lab in your “Notebooks” page, with a blue “Requested” tab, shown with the red arrow. Please wait; this should turn into a green “Ready” tab in a few minutes (you may need to refresh your browser window).
-
Please click on the “Open” tab (shown with the red arrow), associated with your newly created lab.
-
This opens a new browser tab with your JupyterLab (shown with a red arrow). Sometimes you may see a “Build Recommended” pop up. If it appears, please click on the “Build” button shown with another red arrow.
-
Once you are in your JupyterLab in the new browser tab, please scroll all the way down (red downward arrow) in the "Launcher" section (left-facing red arrow).
-
After you scroll down you can see a "Terminal" tab (red box) in the "Other" section (red arrow) of the launcher. Please double click on the "Terminal" tab.
-
This opens up a new "Terminal" (first red arrow) in your launcher section with a command line prompt (second red arrow).
-
In the terminal, please type or copy and paste (using CTRL+C and CTRL+V) the following commands, pressing the return/enter key after each one. The commands are underlined with red lines in the image below. Please replace {NAME} in the first command with the JupyterLab URL name you chose in step 6. You can also find this {NAME} in the address bar of your JupyterLab browser tab: it is the word before "/lab" (shown using the red box). In the commands below, the curly brackets {} mark the {NAME} placeholder.
- cd /data/notebooks/jupyterlab-{NAME}/
- ls
- git clone --recursive https://github.com/hydro-jules/school.git
- ls
If all the commands are successful, the result of the last command should show a "school" directory below the command prompt (shown with a red circle).
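The outcome of these commands can also be checked from a notebook cell. The sketch below builds the lab directory from the URL name, following the path in the `cd` command, and tests whether the clone produced a "school" directory. The URL name `mylab` is hypothetical, and the sketch assumes that {NAME} is replaced together with its curly brackets.

```python
from pathlib import Path

# Base directory taken from the cd command in this step.
NOTEBOOKS_ROOT = Path("/data/notebooks")


def lab_directory(url_name, root=NOTEBOOKS_ROOT):
    """Return the directory of the JupyterLab with the given URL name."""
    return Path(root) / f"jupyterlab-{url_name}"


def has_school_clone(lab_dir):
    """Return True if a "school" directory exists inside lab_dir."""
    return (Path(lab_dir) / "school").is_dir()


if __name__ == "__main__":
    lab_dir = lab_directory("mylab")  # hypothetical URL name
    print(lab_dir, "contains school:", has_school_clone(lab_dir))
```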
-
The previous step imports the Hydro-JULES School GitHub repository into your DataLabs JupyterLab. You will also see a "school" directory appear in the File Browser panel on the left of your screen (shown with a rectangular box). Once the "school" directory shows up in your File Browser panel, please close "Terminal 1" by clicking on the cross button next to it (shown with a red circle).
-
If you double-click on the "school" directory, in the File Browser panel, you will see the "python" directory and a "README.md" file which is the current file you are reading.
-
Within the "python" directory, you will find the following three directories:
- cf-python – this directory contains notebooks with exercises to read, analyse and visualize netCDF4 files using the cf-python Python module
- netcdf4 – this directory contains a notebook with exercises to read, analyse and visualize netCDF4 files using the netCDF4 Python module
- unifhy – this directory contains training material for learning how to use the unifhy Python package.
-
All of the training notebooks are available within the three directories mentioned in point 14. You can navigate the directories by double-clicking on a directory name in the left-hand side panel (red arrow), and you can go back using the back button of your browser (red square) or by clicking on any of the directories in the path (red rectangle).
-
The following are brief introductions to the training notebooks available:
- netCDF-examples.ipynb – this notebook shows examples of reading in a netCDF file with the netCDF4 python module and discusses in detail the variables and metadata associated with a file having three dimensions (time, latitude and longitude). It shows different methods of analysing and plotting the netCDF data. A brief introduction to this notebook is given in this video by Matt Fry.
- cfpython_examples.ipynb – this notebook shows examples of reading in a netCDF file with the cf-python python module and discusses in detail the variables and metadata associated with a file having three dimensions (time, latitude and longitude). It shows different methods of analysing and plotting the netCDF data. The notebook also has some exercises at the end (with hints) for user self-evaluation. A brief introduction to this notebook is given in this video by Amulya Chevuturi.
- model-vs-observed_examples.ipynb – this notebook shows examples of how to compare model output against observed data, with the model output in netCDF format and the observed data in csv format. It discusses reading datasets in different formats into numpy arrays and plotting the datasets for comparison. The notebook also has some exercises at the end (with hints) for user self-evaluation. A brief introduction to this notebook is given in this video by Amulya Chevuturi.
- exercise_answers.ipynb – this notebook has answers to the exercise questions given in the previous two notebooks.
- unifhy directory – this is the training module for the unifhy model. The directory contains data, notebooks and outputs, along with its own README.md.
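To give a flavour of the comparison described for model-vs-observed_examples.ipynb, here is a minimal sketch that reads an observed series from CSV text and compares it with model values. It is not the notebook's code: it uses only the Python standard library rather than numpy, the column name `flow` and all values are made up, and bias and root-mean-square error are simply two common comparison metrics chosen for illustration.

```python
import csv
import io
import math

# Made-up observed record in CSV form; the real files used in the
# notebook have their own column names and units.
OBSERVED_CSV = """date,flow
2020-01-01,10.0
2020-01-02,12.0
2020-01-03,11.0
"""


def read_observed(text, column="flow"):
    """Parse CSV text and return the chosen column as a list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row[column]) for row in reader]


def bias_and_rmse(model, observed):
    """Return (mean error, root-mean-square error) of model minus observed."""
    if len(model) != len(observed):
        raise ValueError("model and observed series must have the same length")
    errors = [m - o for m, o in zip(model, observed)]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse


observed = read_observed(OBSERVED_CSV)
model = [11.0, 12.0, 9.0]  # stand-in for values extracted from a netCDF file
bias, rmse = bias_and_rmse(model, observed)
print(f"bias = {bias:.3f}, RMSE = {rmse:.3f}")
```

In the notebooks the model values come from a netCDF file and both series are held in numpy arrays, so the same metrics would normally be computed with array operations instead of list comprehensions.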
-
Please note that to be able to run these notebooks successfully on DataLabs, you need to make sure that the environment (or kernel) is set to "hj-38-nompi" (shown using a red circle in the image below). The "hj-38-nompi" environment/kernel has all the required python packages for the training notebooks already installed. The environment/kernel is usually automatically set to "hj-38-nompi" when you open any notebook (generally for the netCDF4 and cf-python notebooks).
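If you are unsure which environment a notebook is using, a cell like the following can help. This is a rough sketch that assumes the kernel runs from an environment whose directory is named after it (as is typical for conda environments); whether DataLabs lays out its environments this way is an assumption, so treat the menu shown in the JupyterLab interface as the authoritative indicator.

```python
import sys
from pathlib import Path

EXPECTED_ENV = "hj-38-nompi"


def interpreter_in_env(env_name, executable=None):
    """Return True if env_name is a directory in the interpreter's path."""
    executable = sys.executable if executable is None else executable
    return env_name in Path(executable).parts


if __name__ == "__main__":
    print("Python executable:", sys.executable)
    print("Looks like", EXPECTED_ENV + ":", interpreter_in_env(EXPECTED_ENV))
```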
However, if a different environment is selected (this generally happens for the unifhy notebooks), shown as an example using a red circle in the image below, please click on it (red circle). This gives you a drop-down menu (red square) which, when selected, lists multiple environment/kernel options; please select the "hj-38-nompi" option (right-handed arrow) and click on the "Select" button (left-handed arrow).
-
More details about running a Jupyter Notebook are provided through the images below: