From cf35de90553ee58f99dc18f6400060ce62faf9f7 Mon Sep 17 00:00:00 2001
From: Default user
Date: Tue, 13 Aug 2024 22:26:47 +0000
Subject: [PATCH] Minor changes to text and formatting

---
 .../cloud-computing/01-cloud-computing.ipynb   |  6 +++---
 .../cloud-computing/02-cloud-data-access.ipynb |  6 +++---
 .../03-cloud-optimized-data-access.ipynb       | 13 +++++++------
 .../04-cloud-optimized-icesat2.ipynb           |  4 ++--
 4 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/book/tutorials/cloud-computing/01-cloud-computing.ipynb b/book/tutorials/cloud-computing/01-cloud-computing.ipynb
index c3dbd41..a785196 100644
--- a/book/tutorials/cloud-computing/01-cloud-computing.ipynb
+++ b/book/tutorials/cloud-computing/01-cloud-computing.ipynb
@@ -26,12 +26,12 @@
     ":open:\n",
     "If you have your laptop available, open the terminal app and use the appropriate commands to determine CPU and memory.\n",
     "\n",
-    "<div>\n",
+    "<div align=\"center\">\n",
     "\n",
     "| Operating System (OS) | CPU command | Memory Command |\n",
     "|-----------------------|-----------------------------------------------------------------------------------|----------------------------|\n",
-    "| MacOS | sysctl -a \| grep hw.ncpu | top -l 1 \| grep PhysMem |\n",
-    "| Linux (cryocloud) | lscpu \| grep \"^CPU\(s\):\" | free -h | \n",
+    "| MacOS | `sysctl -a \| grep hw.ncpu` | `top -l 1 \| grep PhysMem` |\n",
+    "| Linux (cryocloud) | `lscpu \| grep \"^CPU\(s\):\"` | `free -h` | \n",
     "| Windows | https://www.top-password.com/blog/find-number-of-cores-in-your-cpu-on-windows-10/ | |\n",
     "</div>\n",
     "\n",
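Aside: the same numbers can be read from Python, which is handy inside a notebook. A minimal sketch, assuming `psutil` is installed in your environment (it ships with most JupyterHub images; `pip install psutil` otherwise):

```python
import os

import psutil  # assumption: psutil is available in your environment

print(f"CPU cores: {os.cpu_count()}")  # logical CPU count
print(f"Total memory: {psutil.virtual_memory().total / 1e9:.1f} GB")  # total RAM
```

This reports the same values as the shell commands in the table above, regardless of operating system.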
\n", + "
\n", "\n", "| Operating System (OS) | CPU command | Memory Command |\n", "|-----------------------|-----------------------------------------------------------------------------------|----------------------------|\n", - "| MacOS | sysctl -a \\| grep hw.ncpu | top -l 1 \\| grep PhysMem |\n", - "| Linux (cryocloud) | lscpu \\| grep \"^CPU\\(s\\):\" | free -h | \n", + "| MacOS | `sysctl -a \\| grep hw.ncpu` | `top -l 1 \\| grep PhysMem` |\n", + "| Linux (cryocloud) | `lscpu \\| grep \"^CPU\\(s\\):\"` | `free -h` | \n", "| Windows | https://www.top-password.com/blog/find-number-of-cores-in-your-cpu-on-windows-10/ | |\n", "
\n", "\n", diff --git a/book/tutorials/cloud-computing/02-cloud-data-access.ipynb b/book/tutorials/cloud-computing/02-cloud-data-access.ipynb index 3fabb67..f5a8b3f 100644 --- a/book/tutorials/cloud-computing/02-cloud-data-access.ipynb +++ b/book/tutorials/cloud-computing/02-cloud-data-access.ipynb @@ -50,7 +50,7 @@ "Navigate [https://search.earthdata.nasa.gov](https://search.earthdata.nasa.gov), search for ICESat-2 and answer the following questions:\n", "\n", "* Which DAAC hosts ICESat-2 datasets?\n", - "* Which ICESat-2 datasets are hosted on the AWS Cloud and how can you tell?\n", + "* How many ICESat-2 datasets are hosted on the AWS Cloud and how can you tell?\n", ":::\n", "\n", "\n", @@ -59,8 +59,8 @@ "Here are a likely few:\n", "1. Download data from a DAAC to your local machine.\n", "2. Download data from cloud storage to your local machine.\n", - "3. Download data from a DAAC to a virtual machine in the cloud (when would you do this?).\n", - "4. Login to a virtual machine in the cloud, like cryointhecloud, and access data directly.\n", + "3. Login to a virtual machine in the cloud and download data from a DAAC (when would you do this?).\n", + "4. Login to a virtual machine in the cloud, like CryoCloud, and access data directly.\n", "\n", "```{image} ./images/different-modes-of-access.png\n", ":width: 1000px\n", diff --git a/book/tutorials/cloud-computing/03-cloud-optimized-data-access.ipynb b/book/tutorials/cloud-computing/03-cloud-optimized-data-access.ipynb index 3f67c2b..86437aa 100644 --- a/book/tutorials/cloud-computing/03-cloud-optimized-data-access.ipynb +++ b/book/tutorials/cloud-computing/03-cloud-optimized-data-access.ipynb @@ -7,7 +7,8 @@ "# Cloud-Optimized Data Access\n", "\n", "
\n", - "Recall from the introduction that cloud object storage is accessed over the network. Local file storage access will always be faster but there are limitations. This is why the design of file formats in the cloud requires more consideration than local file storage.\n", + "\n", + "Recall from the [Cloud Data Access Notebook](./02-cloud-data-access.ipynb) that cloud object storage is accessed over the network. Local file storage access will always be faster but there are limitations. This is why the design of file formats in the cloud requires more consideration than local file storage.\n", "\n", "## 🏋️ Exercise\n", "\n", @@ -48,7 +49,7 @@ "\n", "### An analogy - Moving away from home\n", "\n", - "Imagine when you lived at home with your parents. Everything was right there when you needed it (like local file storage). Let's say you're about to move away to college (the cloud), and you are not allowed to bring anything with you. You put everything in your parent's (infinitely large) garage (cloud object storage). Given you would need to have things shipped to you, would it be better to leave everything unpacked? To put everything all in one box? A few different boxes? And what would be the most efficient way for your parents to know where things were when you asked for them?\n", + "Imagine when you lived at home with your parents. Everything was right there when you needed it (like local file storage). Let's say you're about to move away to college (the cloud), but you have decided to backpack there and so you can't bring any of your belongings with you. You put everything in your parent's (infinitely large) garage (cloud object storage). Given you would need to have things shipped to you, would it be better to leave everything unpacked? To put everything all in one box? A few different boxes? And what would be the most efficient way for your parents to know where things were when you asked for them?\n", "\n", "```{image} ./images/dalle-college.png\n", ":width: 400px\n", @@ -56,11 +57,11 @@ "```\n", "
     "image generated with ChatGPT 4\n",
     "\n",
-    "You are probably familiar with the following file formats: HDF5, NetCDF, GeoTIFF. You can actually make any of these formats \"cloud-optimized\" by:\n",
+    "You can actually make any common geospatial data format (HDF5/NetCDF, GeoTIFF, LAS (LIDAR Aerial Survey)) \"cloud-optimized\" by:\n",
     "\n",
-    "1. Separate metadata from data and store it contiguously data so it can be read with one request.\n",
-    "2. Store data in chunks, so the whole file doesn't have to be read to access a portion of the data.\n",
-    "3. Make sure chunks of data are not too small, so more data can be fetched with each request.\n",
+    "1. Separate metadata from data and store it contiguously so it can be read with one request.\n",
+    "2. Store data in chunks, so the whole file doesn't have to be read to access a portion of the data, and so each chunk can be compressed.\n",
+    "3. Make sure the chunks of data are not too small, so more data is fetched with each request.\n",
     "4. Make sure the chunks are not too large, which means more data has to be transferred and decompression takes longer.\n",
     "5. Compress these chunks so there is less data to transfer over the network.\n",
     "\n",
diff --git a/book/tutorials/cloud-computing/04-cloud-optimized-icesat2.ipynb b/book/tutorials/cloud-computing/04-cloud-optimized-icesat2.ipynb
index 78bcde3..8f2cc81 100644
--- a/book/tutorials/cloud-computing/04-cloud-optimized-icesat2.ipynb
+++ b/book/tutorials/cloud-computing/04-cloud-optimized-icesat2.ipynb
@@ -17,7 +17,7 @@
     "Recall from [03-cloud-optimized-data-access.ipynb](./03-cloud-optimized-data-access.ipynb) that we can make any HDF5 file cloud-optimized by restructuring the file so that all the metadata is in one place and chunks are \"not too big\" and \"not too small\". However, as users of the data, not archivers, we don't control how the file is generated and distributed, so if we're restructuring the data we might want to go with something even better - a **\"cloud-native\"** format.\n",
     "\n",
     ":::{important} Cloud-Native Formats\n",
-    "Cloud-native formats are formats that were designed specifically to be used in a cloud environment. This usually means that metadata and indexes for data is separated from metadata in a way that allows for logical dataset access across multiple files. In other words, it is fast to open a large dataset and access just the parts of it that you need.\n",
+    "Cloud-native formats are formats that were designed specifically to be used in a cloud environment. This usually means that metadata and indexes are separated from the data itself in a way that allows for logical dataset access across multiple files. In other words, it is fast to open a large dataset and access just the parts of it that you need.\n",
     ":::\n",
     "\n",
     ":::{warning}\n",
@@ -73,7 +73,7 @@
     "\n",
     "\n",
     "gdf = gpd.GeoDataFrame(df, geometry='geometry')\n",
-    "null_value = gdf['h_canopy'].max() \n",
+    "null_value = gdf['h_canopy'].max() # can we change this to a no data value?\n",
     "gdf_filtered = gdf.loc[gdf['h_canopy'] != null_value]\n",
     "gdf_filtered"
   ]