Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CTSM Glacier Dataset - mountain glaciers and how to handle floating ice shelves #1406

Closed
renerwijn opened this issue Jun 24, 2021 · 35 comments
Assignees
Labels
enhancement new capability or improved behavior of existing capability science Enhancement to or bug impacting science
Milestone

Comments

@renerwijn
Copy link

Background Issue
Over High Mountain Asia discrepancies were found in the glacier elevation distribution that were retrieved from the CTSM/CLM glacier dataset (mksrf_glacier_3x3min_simyr2000.c120926.nc). The discrepancies could in particular be found over the southeastern part of High Mountain Asia where the mean glacier elevation has found to be up to 2000 m lower (i.e., around 3000 m) than the mean CLM grid cell elevation (around 5000 m). This gave a reason to find out where these discrepancies could come from, which were eventually found to be related to 1) inaccuracies in the Randolph Glacier Inventory version 1 glacier outlines (relative to RGI version 6) (see screenshot), and 2) potential smoothing issues in the input topography (GLOBE). In addition, the glacier dataset cover elevations up to 6000 m altitude + one extra bin covering altitudes between 6000 m and 10000 m, whereas the HMA grid works with a 36-EC scheme that covers altitudes up to 7000 m + one extra EC covering altitudes between 7000 m and 10000 m. This means a part of the glaciers (between 6000 and 7000 m altitude) are not well captured by CLM. For this reason, we decided to update the existing CLM glacier dataset.

Afbeelding2

Brief description dataset
The first version of the glacier dataset, developed by Jeremy Fykes, Alex Gardner, and Bill Sacks, uses glacier outlines retrieved from RGI version 1 (Arendt et al., 2012), vector data from the University of Zurich (Rastner et al., 2012) for the Greenland Icesheet, and vector data from the SCAR Antarctic Digital Database for the Antarctic Icesheet. The topography was retrieved from the 30-arcsec GLOBE topography (Hastings et al., 1999).

The updated version of the glacier dataset uses glacier outlines retrieved from RGI version 6 (Arendt et al., 2017). The vector data for the GrIS and AIS are retrieved from the masks of BedMachine version 3 and version 2 (Morlighem et al., 2017,2019), respectively. The 30-arcsec topography and land mask are retrieved from the 30-arcsec GMTED2010. BedMachine and GMTED2010 were chosen to keep the datasets consistent with what is already used for the CAM topography (GMTED) and within CISM (BedMachine).

Workflow
The workflow is comparable with the workflow that was used to develop the first glacier dataset. The workflow (until this point) consists of the following points:

  1. Generation of 30 arcsec polygon grids for every RGI region, the GrIS, and AIS.
  2. Generation of glacier polygon grids for the glaciers specifically. This is performed by joining the RGI/BedMachine glacier/ice sheet outlines with the polygon grids generated in step 1. For those polygon grid cells where outlines overlay the grid cell the percent ice cover is determined.
  3. Rasterizing of the glacier polygon grids in step 2 (i.e., shapefile -> tiff).
  4. Merging the generated tiffs, correction of grid cells where PCT_GLACIER exceeds 100% (i.e., for those grid cells where RGI polygon features overlap), and the generation of 30-arcsec NetCDF files, one for glaciers, and one for the ice sheets.
  5. The generated NetCDF files are used to drape the 30-arcsec ice cover over the equivalent-resolution GMTED2010 to determine the fraction of total land area in a grid cell that is covered by glaciers/icesheets, which occur on an elevation band. To this end, 71 elevation bands with an interval of 100m are created where the last elevation band encompasses the ice at altitudes between 7000 m and 10000 m. Roughly said this step consists of two or three sub steps:
  • a. Correcting the GMTED2010 land mask???
  • b. Selecting the ice cover grid cells that occur on a specific elevation band.
  • c. Multiplying the ice percent cover by the land fraction from the corrected land mask in order to get the total land fraction per grid cell that is covered by ice.
  1. Regridding the 30-arcsec ice cover/topography datasets to 3-minute resolution by conservative weighting. Three datasets are created:
  • 3-minute fractional areal land ice coverage (incl. all ice cover).
  • 3-minute distributions of areal glacier fractional coverage by elevation.
  • 3-minute distributions of ice sheet fractional coverage by elevation.

Issues
In step 5a I mention the correction of the land mask. The land mask of GMTED2010 encompasses 2 possible values/flags: (1) for land and (0) for oceans, seas, and lakes. The generated 30-arcsec ice cover datasets contain the percent ice cover, including those of the floating ice shelves surrounding the GrIS and AIS. The grid cells containing floating ice shelf overlay land mask grid cells that are flagged as ocean. In order to get the total land fraction per grid cell covered by ice I need to multiply the ice cover datasets by the land mask. However, this becomes an issue for the floating ice shelves where the land mask is equal to 0. Therefore, I presume a correction of the land mask is required. My questions are as follows:

  1. What is the best way to deal with floating ice shelves? Just “updating” the land mask by changing the flags from ocean into land for those grid cells where floating ice/land ice is present? Or via another way? I have understood from the glacier datasets description of Jeremy Fykes that grid cells flagged as land-ice (incl. floating ice) but as ocean in the land mask were designated as land-ice with an elevation of 0 meter.
  2. If correction of the land mask is required, what is the right moment to do that? Before regridding from 30 arcsec to 3 minutes as in step 5a or after the regridding (so that would be in sequence: step 5b - step 6 - step 5a – step 5c)?

Goal
The goal is to use the updated glacier dataset as input for the HMA simulations. Further the updated glacier dataset can serve as an update in general for the input datasets required in the CTSM code base.

People involved: @whlipscomb @gunterl @adamrher

@billsacks billsacks added tag: enh - new science enhancement new capability or improved behavior of existing capability labels Jun 24, 2021
@billsacks
Copy link
Member

@renerwijn Thank you very much for taking this on, and for your detailed description. I know that this is a side-track from the main work you're trying to do, so I really appreciate the time you have put into this.

The issue of how to handle land mask always makes my head hurt, especially for PCT fields like this. I think every time I think about this, I come up with a different answer of what's "right". That said, my current thinking is that you probably don't want to do any adjustment of the pct glacier based on land mask. i.e., I'm not sure if this

In order to get the total land fraction per grid cell covered by ice I need to multiply the ice cover datasets by the land mask.

is right. But this might warrant a Zoom discussion.

I'm not sure which version of the Gardner-Fyke documentation you have read, but I just found a "full" version, which compared with the main version had these additional bullet points under the list of 3-minute datasets:

  • 3-minute grid of topography, where grid-point topographic heights represent the average GLOBE land elevation within the 3-minute cell (excluding ocean points)
  • 3-minute land-sea mask (set to land value if any GLOBE land point exists within the 3-minute cell, or if there is any land ice in the 3-minute cell)

The second bullet point seems partly relevant to your question, though it may not completely answer it.

Let me know if you'd like to schedule a Zoom call. My schedule is pretty open next week.

@billsacks billsacks added the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Jun 24, 2021
@whlipscomb
Copy link

@renerwijn, thanks so much for initiating this conversation and including all the details. Like @billsacks, my instinct would be not to adjust pct_glacier based on land mask, but I'm not sure I've thought through the issue completely.

I had a question about the Greenland data set. The latest version of Mathieu Morlighem's BedMachine data is version 4, which came out in the past few months: https://nsidc.org/data/IDBMG4. Was there a reason why you choose version 3 instead of version 4?

@renerwijn
Copy link
Author

@billsacks Thanks for the reply. A Zoom call sounds like a good plan. What about next week Monday or Tuesday (9 AM or 5 PM MDT)?

In order to get the total land fraction per grid cell covered by ice I need to multiply the ice cover datasets by the land mask.

I agree with you this is a good point to discuss. I guess since the land mask only has binary values (0 or 1) it cannot hurt to multiply the ice cover with the land mask since the ice cover will not change (i.e. as long the land mask indicate there is land of course). But let's discuss about it more. About the full description. I/We didn't have a full version of Jeremy Fykes description. I was wondering whether you can share the full description with us?

@whlipscomb Good point. I have checked which file I used an it is indeed version 4 instead of version 3. So I have used the most recent update. I guess I got confused since at the website of the ice sheet modeling group of Mathieu Morlighem (https://sites.uci.edu/morlighem/dataproducts/bedmachine-greenland/) they write about BedMachine Greenland v3. But via there website there is a link to NSIDC (https://nsidc.org/data/IDBMG4) (same link as yours) where I downloaded the source data, i.e. BedMachine Greenland v4. Maybe the website of ice sheet modelling group has not been updated yet and that's the reason why version 3 kept stuck in my mind.

@billsacks
Copy link
Member

A few summary points from today's discussion:

  1. It feels right to not multiply/divide by landmask when regridding the 30-second data to 3-minutes. For example, if a 3' cell is made up of 1/2 cells with glacier cover (landmask = 1) and 1/2 cells with landmask = 0, then we want the final glacier cover in the resulting 3' cell to be 50%, not 100%. One way to think about this is: if we generate a final surface dataset at exactly this 3' resolution, then we would want the glacier cover in this cell to be 50% (with the remaining 50% wetland): we would not want this grid cell to end up as 100% glacier.
  2. To create the landmask at 3': Conservatively regrid the 30" landmask to 3'. Anywhere where the 3' landmask is > 0, set it to 1. Then, anywhere where there is any glacier cover, also set the landmask to 1 (to handle points on ice shelves).

@adamrher
Copy link
Contributor

adamrher commented Jun 30, 2021

Thanks for summarizing that so concisely Bill. It's much clearer when you put it that way.

During our meeting, my final mapping file completed (AIS bedmachine->30arcsec), and so I now have all the ESMF conserve mapping weights required to complete this work, and will list them here for reference.

30arcsec -> 3x3min wgts:
/glade/scratch/aherring/for_rene/wgt_files/map_30arcsec_nomask_to_3x3min_nomask_aave_da_c210623.nc

GriS BedMachine -> 30arcsec:
/glade/scratch/aherring/for_rene/wgt_files/map_BedMachineGreenland_to_30arcsec_aave_da_c210625.nc

AIS BedMachine -> 30arcsec
/glade/scratch/aherring/for_rene/wgt_files/map_BedMachineAntarcticaRotate2RotateBack_to_30arcsec_aave_da_c210630.nc

I went ahead an mapped the BedMachine data using the above wgts:
/glade/scratch/aherring/for_rene/datasets/BedMachineGreenland-2021-04-20.map_TO_30arcsec.nc
/glade/scratch/aherring/for_rene/datasets/BedMachineAntarcticaRotate2RotateBack_2020-07-15_v02_lonshift.map_TO_30arcsec.nc

As we discussed I did some jiu-jitsu to generate the AIS wgts (hence the Rotate2RotateBack modifier in the filenames), but it seems to have worked. The grid rotations are analytical so I was able to recover the lat-lon's of the initial grid exactly. Here is what the mask looks like on the 30arcsec grid:

Screen Shot 2021-06-30 at 10 44 18 AM

And here's the surface elevations:

Screen Shot 2021-06-30 at 10 43 35 AM

You'll notice that weird white line going through the center of the grid. This seems to always show up when I do grid rotations, and I actually don't know what it is, but it doesn't have any values associated with it. @renerwijn, let me know if this becomes a problem when your stitching this into the gmted dataset, and I can try another method for generating the AIS mapping weights.

[edit - the white lines seem to be a plotting artifact; there are valid values in the array that coincide with these white lines.]

@billsacks billsacks added this to the ctsm5.2.0 milestone Jul 1, 2021
@billsacks billsacks removed the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Jul 1, 2021
@renerwijn
Copy link
Author

renerwijn commented Jul 2, 2021

@billsacks Thanks for the summary. It will help me to finish the job.

@adamrher Thanks for the mapping weights and remapped BedMachine data. About the white line in the surface topography, I think I can work around it. Maybe I can use the GMTED topography to fill up the white line area (I assume it is missing data). I will come back to you once I have an issue or completed the job.

[edit - I didn't see your edit. In that case I don't think it will be a problem and can just work with the data]

@renerwijn
Copy link
Author

renerwijn commented Jul 8, 2021

@billsacks @adamrher @whlipscomb @gunterl
I have managed to create the final glacier dataset and followed the steps we discussed. The final glacier dataset can be found here:
/glade/scratch/wijngaard/glacier_dataset/glacier2netcdf/mksrf_glacier_3x3min_simyr2000.c20210707.nc

Please check and let me know if something needs to be changed. Btw I have added one extra variable called TOPO, which represents the surface topography. Further I could not exactly remember what we agreed, but through the process to generate this dataset I have stitched in the BedMachine mask into the land mask as well (like the topography).

@billsacks I have a few questions concerning the generation of the CTSM surface datasets.

  1. With the new dataset I gave it a try to generate the surface dataset. In order to do that the first thing I had to do is change the filename of the new glacier dataset to the filename of the old dataset so the CTSM tools are able to find the glacier dataset (i.e. mksrf_glacier_3x3min_simyr2000.c120926.nc instead of mksrf_glacier_3x3min_simyr2000.c20210707.nc). So I was wondering whether there is some script or namelist where I can change/update the filename?

  2. Is it necessary to make a new SCRIP file and mapping file prior to generating the surface datasets? I assume not, since the grid information hasn't changed, but would like to check it just to be sure.

  3. My last question is related to the land mask and what you have explained about how surface datasets are generated. When I run the script for generating the surface datasets (i.e. in this case ./mksurfdata.pl -res usrspec -usr_gname HMACUBIT -usr_gdate 200810 -y 1979 -glc_nec 36 -dinlc ) I ran into an error (see screenshot below). My suspicion is that the land mask we use does not match with the land mask that is used to define the other land units (e.g. wetlands, lakes), which eventually causes a mismatch and subsequently this error. For example I noticed that first the CTSM land mask (domain mask) is read before reading the glacier dataset (and accompanying land mask (grid mask)). The domain mask and the grid mask have a different non-zero number of elements and have grid cells where it is ocean in one mask and land in the other. Is there a way to solve this issue?

Afbeelding1

[edit - I will complete the README and work flow description later]

@whlipscomb
Copy link

@renerwijn – Thanks for the update. This is a beautiful dataset.

When I was plotting some HMA glacier percentages, I noticed a possible double-counting issue. In general, we should have pct_glacier equal to the sum over k of pct_glc_gic, and in nearly all grid cells this is true to within roundoff. But in cell (5406,2364), we have pct_glacier = 100.0, while the sum over k of pct_glc_gic = 104.2. Could you please take a look at this?

@renerwijn
Copy link
Author

@whlipscomb - Thanks for pointing this out.
I already found out what went wrong. Apparently I used an old version of the 30 arcsec glacier fractions instead of the new one with the capped glacier fractions (I explained before that I had to cap the glacier fractions higher than 100% to 100% due to overlapping geometry features in the RGI, which causes some double counting). This is my bet. Will run the script again with the updated file tomorrow.

@adamrher
Copy link
Contributor

adamrher commented Jul 8, 2021

Is it necessary to make a new SCRIP file and mapping file prior to generating the surface datasets? I assume not, since the grid information hasn't changed, but would like to check it just to be sure.

So I'm thinking you may want to make a new mapping file. I am unclear about the "mask" / "nomask" options when running ESMF_RegridWeightGen in mkmapdata.sh. @ekluzek is this mask option used when generating the mksrf_glacier_3x3min -> target_grid mapping weights? If so, is it taking the mask from grid_imask in one of the SCRIP grid files, or maybe even the LANDMASK variable in the glacier dataset? As you can see I'm confused ... I'm trying to figure out this error Rene is getting, and whether it's because he is recycling the old mapping weights file:

My last question is related to the land mask and what you have explained about how surface datasets are generated. When I run the script for generating the surface datasets (i.e. in this case ./mksurfdata.pl -res usrspec -usr_gname HMACUBIT -usr_gdate 200810 -y 1979 -glc_nec 36 -dinlc ) I ran into an error (see screenshot below). My suspicion is that the land mask we use does not match with the land mask that is used to define the other land units (e.g. wetlands, lakes), which eventually causes a mismatch and subsequently this error. For example I noticed that first the CTSM land mask (domain mask) is read before reading the glacier dataset (and accompanying land mask (grid mask)). The domain mask and the grid mask have a different non-zero number of elements and have grid cells where it is ocean in one mask and land in the other. Is there a way to solve this issue?

@billsacks billsacks added next this should get some attention in the next week or two. Normally each Thursday SE meeting. and removed next this should get some attention in the next week or two. Normally each Thursday SE meeting. labels Jul 8, 2021
@whlipscomb
Copy link

@renerwijn – Thanks for the quick reply on the glacier double-counting issue. It makes sense that this would be related to the RGI overlaps.

@billsacks
Copy link
Member

So I was wondering whether there is some script or namelist where I can change/update the filename?

Yes, you can change the file name in CTSM's bld/namelist_files/namelist_defaults_ctsm_tools.xml – look for mksrf_fglacier in that file.

But we can also do that as a last step once everyone is happy with how this looks.

Regarding questions (2) and (3) from #1406 (comment), they are closely connected, and like too many other things, the answer is a little more complex than you might hope....

The answer to this depends on whether you are using a recent version of CTSM (ctsm5.1.dev040 or later). In ctsm5.1.dev040, we simplified the process for bringing in new datasets that have different masks. Before that tag, if you introduce a data file with a new mask, you also need to create a new SCRIP grid file with that mask (see the reference to file lnd/clm2/mappingdata/grids/SCRIPgrid_3minx3min_GLOBE-Gardner_c120922.nc in namelist_defaults_ctsm_tools.xml), and then you need to create a new mapping file from that scrip grid file to the model resolution. Starting in ctsm5.1.dev040, however, we apply masking on the fly in mksurfdata_map, so as long as you are using an existing raw data resolution (which I believe you are, if you're using a standard 3-minute grid), then those steps aren't needed.

Based on the error you're getting, I'm guessing you're using a CTSM tag earlier than ctsm5.1.dev040. Things will be easier if you update to a recent version and run mksurfdata_map from there. The downside is that there will be answer changes in other surface dataset fields due to the new method. The changes should be relatively small (unless there are other changes that have happened to mksurfdata_map in the intervening time that I'm not remembering). You will also be an early user of the new method, so it's possible that you'll run into other issues, but I'm happy to help if you do. I'm not sure how important it is that you avoid changes in other fields for this work. One approach could be: try with the new tag, then compare all fields in the new and old surface dataset to see the magnitude of changes (we have a tool to do this quickly if you don't), and reconsider if we're seeing too large changes in other fields.

Let me know how you'd like to proceed.

@billsacks
Copy link
Member

By the way, sorry for not bringing this to your attention earlier. I sometimes forget that people aren't always using the bleeding-edge version of the code....

@renerwijn
Copy link
Author

@billsacks Thanks, I managed to install the newest tag (ctsm5.1.dev047). I will give it a try over the weekend or after the weekend.

@whlipscomb @adamrher @gunterl @billsacks
I have updated the glacier dataset. As far I can see there are no bugs anymore in the dataset. Please let me know if you find one. The updated dataset is in the same aforementioned directory, but the name is mksrf_glacier_3x3min_simyr2000.c20210708.nc

@whlipscomb
Copy link

@renerwijn – The glacier dataset looks fantastic. I'm so glad to have an updated version.

Are you planning to make a document describing the steps used to make the dataset? As we've learned, this can be very helpful when it's time for a future update.

I was wondering if it would make sense to add this data set to the DASH repository and create a doi, with you as the lead author. Then the data set would be easily citable, and you're more likely to receive due credit. DASH info here: https://dashrepo.ucar.edu/

@renerwijn
Copy link
Author

@whlipscomb - Yes I am planning to make a document describing the steps used to make the dataset. As a matter of fact I already have a README file. I can use that one to finish the dataset description and its workflow. About the DASH repository, I guess that is a good idea, but maybe it is better to discuss which files need to be preserved. During my last meeting with @adamrher @billsacks @gunterl there was already some discussion on this point.

@billsacks - As I already mentioned I was able to install the newest version of CTSM and its tools (ctsm5.1.dev047). I was able to create the new mapping files and subsequently to run ./mksurfdata.pl -res usrspec -usr_gname HMACUBIT -usr_gdate 210712 -y 1979 -glc_nec 36. This time the aforementioned issue did not pop up, however I ran into another issue. It seems while processing the mkglacierregion the mapping areas are not conserved (see screenshot). Do you have an idea how this issue can be solved?
Schermafbeelding 2021-07-12 om 21 38 17

@whlipscomb
Copy link

@renerwijn – Sounds good. When your dataset description is done, we can talk about what's appropriate to submit to DASH. For now, I think the highest priority is to resolve the mapping issue and then restart the HMA runs with the new glacier dataset. Thanks!

@billsacks
Copy link
Member

@renerwijn regarding the mapping error: This is weird: it's saying that areas on the output grid are 0 and on the input grid are negative. How did you create this mapping file – using CTSM's mkmapdata or some other tool? It seems strange that you're hitting this error for glacier region but not for other fields. Did you use the same process for all of them?

I wonder: If you temporarily delete these lines:

call mkglacierregion (ldomain, mapfname=map_fglacierregion, &
datfname=mksrf_fglacierregion, ndiag=ndiag, &
glacier_region_o = glacier_region)

is this same error encountered for any other fields?

Let me know if you'd like to schedule a time to talk more about this, if you remain stuck.

@adamrher
Copy link
Contributor

@billsacks right now we have the HMA grid installed in a sandbox that uses tag ctsm1.0.dev105. Is that going to be a problem if we use surface datasets generated from this more recent ctsm5.1.dev047?

@billsacks
Copy link
Member

right now we have the HMA grid installed in a sandbox that uses tag ctsm1.0.dev105. Is that going to be a problem if we use surface datasets generated from this more recent ctsm5.1.dev047?

I think that should be okay, but I'm not 100% sure. @ekluzek do you know? If you want to return to the original attempt of using ctsm1.0.dev105 to generate the surface dataset (which might make sense anyway, given the new error @renerwijn is getting), then that should work as long as the scrip grid file you create for the 3min glacier dataset has a mask field that agrees with the mask on the dataset. It has been a while since I've created a new raw dataset, so I may be forgetting some details, but I can try to help with this if you need help (though I may be out in the second half of this week).

@adamrher
Copy link
Contributor

adamrher commented Jul 12, 2021

then that should work as long as the scrip grid file you create for the 3min glacier dataset has a mask field that agrees with the mask on the dataset. It has been a while since I've created a new raw dataset

I gave this a shot just to increase Rene's options. I used the ncl routine rectilinear_to_SCRIP and included landmask from the new 3x3 glacier dataset w/ this option: https://www.ncl.ucar.edu/Document/Functions/ESMF/rectilinear_to_SCRIP.shtml#GridMask. It's here: /glade/scratch/aherring/for_rene/scrip_files/mksrf_3x3min_lmask_scrip.nc

In the meantime I'm running mkmapdata scripts using ctsm5.1.dev047 to see if I can't at least generate the default fsurdat for the HMA grid (not using the new glacier dataset). So far it's working. I'll update this thread with whether I'm successful or not.

@adamrher
Copy link
Contributor

@renerwijn I was able to generate an HMACUBIT fsurdat file using ctsm5.1.dev047. I did not use the new glacier dataset or the modified glacier region file w/ the HMA glacier region. The only issue I had is I needed to modify mkmapdata.sh since it didn't recognize the CSMDATA variable (maybe the asterisk after cheyenne cheyenne* when it's trying to inquire the hostname?) ... So I hard code the cheyenne path at the top of the script: CSMDATA=/glade/p/cesm/cseg/inputdata

If you want to see my regridbatch.sh or other files in the mkmapdata and msurfdata_map directories, you can peruse them here: /glade/scratch/aherring/for_rene/ctsm5.1.dev040/tools/

@renerwijn
Copy link
Author

@billsacks @adamrher @whlipscomb @gunterl

@billsacks @adamrher - Thanks for the different solutions. Actually I also used regridbatch.sh to generate the mapping files when using ctsm5.1.dev047. The only thing I did different is that created an environmental variable for CSMDATA that points to the Cheyenne path. I was able to produce the surface datasets as well with the old files but not with the new ones.

In parallel I installed an older version ctsm1.0.dev105 as well. There I created the mapping files using the scrip file made by @adamrher amongst others. After that I was able to generate the surface dataset successful. The dataset can be found in

/glade/scratch/wijngaard/glacier_dataset/glacier2netcdf/surfdata_HMACUBIT_hist_78pfts_CMIP6_simyr1979_c210713.nc

I already checked the new version on its mean elevation differences between glaciers and the grid cells. I have attached one screenshot below with the elevation differences between the mean glacier elevation and mean grid cell elevation. As you can see the update did its work and the discrepancies that were largely there in the southeastern part of High Mountain Asia have disappeared.

Afbeelding3

@adamrher
Copy link
Contributor

I am super happy with this new dataset! Thank you so much, Rene. I think we should focus now on resuming our HMA runs that uses the older ctsm tag ... let's chat over email on how best to proceed with this work.

I was able to produce the surface datasets as well with the old files but not with the new ones.

I believe this is the final hurdle to closing this issue; getting the new glacier dataset to work with the new mkmapdata code avail in ctsm5.1dev040, or later. I may be able to look at this later this week, but since we are both mere users of the tools (i.e., mortals) our debugging abilities are limited and we may have to punt to next week when Bill Sacks is back, or until we can get the attention of Erik.

In terms of providing documentation, why don't you start by pasting your README into a google doc and share it so we can all comment on it? One issue we have to figure out is what to do with all these massive 50-100GB intermediate files needed to generate the final product. Ideally we can just provide the scripts to reproduce everything so we don't need to keep all these files around.

@renerwijn
Copy link
Author

Today I have created a document and made a draft for the dataset description and the README. I have send you an invitation to edit/read the document. What concerns the amount of data, I have made several compressed files (*tar.gz) including the input glacier/ice sheet outlines, the polygon grids, the glacier grids, and data required for generating the final dataset. That reduced the amount of space needed significantly. Including the final dataset, but excluding the conservative weighting map, the amount of storage is now 29GB (i.e. 15 GB for the final dataset and 14GB for the remaining files).

@billsacks
Copy link
Member

@ekluzek - this final dataset is in place at /glade/p/cesmdata/cseg/inputdata/lnd/clm2/rawdata/mksrf_glacier_3x3min_simyr2000.c20210708.nc . I'm assigning you to this because I assume you'll be the one to do the final piece of bringing this in as the new default, along with other surface dataset updates for ctsm5.2. I have not imported this to the svn repository, because I figured we should make sure everything looks right with a surface dataset generated with this new dataset before importing it. Note that @renerwijn @adamrher and others have looked closely at this, but they have not looked at a surface dataset generated with the latest version of CTSM, because they ran into problems with this latest version. So, when we get to the point of generating new surface datasets, it would be good to get someone to do a quick eyeball check to make sure that this field still looks right with the new surface dataset generation process.

@billsacks
Copy link
Member

Actually, I have gone ahead and tried generating a surface dataset out of latest ctsm master with the new mksrf_glacier file, by running ./mksurfdata.pl -d -no-crop -vic -glc_nec 10 -y 2000 -res 0.9x1.25 and then changing the mksrf_glacier file. This worked, and gave the following differences from the out-of-the-box version of that dataset:

 RMS PCT_NATVEG                       1.6275E+00            NORMALIZED  6.1765E-02
 RMS PCT_CROP                         4.4870E-03            NORMALIZED  1.8552E-03
 RMS PCT_NAT_PFT                      7.9285E-05            NORMALIZED  1.1893E-05
 RMS PCT_WETLAND                      1.6067E+00            NORMALIZED  2.7689E-02
 RMS PCT_LAKE                         2.0227E-03            NORMALIZED  3.5079E-03
 RMS PCT_GLACIER                      2.2871E+00            NORMALIZED  1.8379E-01
 RMS PCT_GLC_MEC                      3.7314E+00            NORMALIZED  3.7314E-01
 RMS TOPO_GLC_MEC                     1.7201E+01            NORMALIZED  1.1700E-02
 RMS PCT_URBAN                        6.7204E-19            NORMALIZED  1.1065E-17

I examined the diffs and/or actual fields (in old & new) for these variables, and this all looks reasonable. I haven't looked carefully, but from some quick looks nothing jumps out as problematic.

If anyone else wants to look and sign off on this, the files are here: /glade/work/sacks/ctsm_code/ctsm/tools/mksurfdata_map: see surfdata_0.9x1.25_hist_16pfts_Irrig_CMIP6_simyr2000_c210721.nc (original), surfdata_0.9x1.25_hist_16pfts_Irrig_CMIP6_simyr2000_c210721_NEW.nc and surfdata_0.9x1.25_hist_16pfts_Irrig_CMIP6_simyr2000_c210721_DIFF.nc (new minus original).

@slevis-lmwg
Copy link
Contributor

I have now confirmed that the new mksurdata_esmf has NOT been pointing to the new glacier dataset. I will make the change.

@slevis-lmwg
Copy link
Contributor

Does switching to the new glacier dataset eliminate the need for both old datasets, the standard and the mergeGreenland?

@billsacks
Copy link
Member

Good question. Yes, I'm pretty sure the plan was for this one dataset to replace both of the old ones. I think we were going to drop support for the mergeGreenland (aka mergeGIS or merge_gis) option, because I don't think it was actually being used.

@adamrher
Copy link
Contributor

Yes, I'm pretty sure the plan was for this one dataset to replace both of the old ones.

I also recall this conversation from Bill, that we couldn't think of a good reason to keep both around.

Note that this dataset will only be used over Greenland when running with a stub GLC. When CISM is active (even in NOEVOLVE), the ice mask is currently passed to CLM via CISM, which overrides the ice mask in the surfdat file. Since this new glacier dataset uses the same ice mask as used to initialize CISM (BedMachine), there should be very little difference (if any) between these two separate configurations.

@slevis-lmwg
Copy link
Contributor

This issue was addressed for ctsm5.2 purposes in #1732
Is that sufficient for closing the issue?

@adamrher
Copy link
Contributor

adamrher commented Oct 5, 2022

Looking at the PR you linked, the mksurf namelists has replaced the 3x3min glacier dataset with:

<data_filename>lnd/clm2/rawdata/mksrf_glacier_3x3min_simyr2000.c20210708.nc</data_filename>

Which is the new dataset created in this issue. Is it OK to close this issue before all the surfdat files have been re-generated w/ the new datasets, or has that been done already?

@slevis-lmwg
Copy link
Contributor

Is it OK to close this issue before all the surfdat files have been re-generated w/ the new datasets, or has that been done already?

We have not, yet, generated all the fsurdat files with the new raw data.

@slevis-lmwg
Copy link
Contributor

@ekluzek and @slevisconsulting decided that this issue can be closed before generating all the new fsurdat files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement new capability or improved behavior of existing capability science Enhancement to or bug impacting science
Projects
No open projects
Development

No branches or pull requests

7 participants