netCDF Front End for Cloud Model Simulations #2176

Merged 57 commits on Oct 7, 2020

Changes from 52 commits

Commits
0162faf
patched back in our API additions after pulling from master
keltonhalbert Sep 21, 2020
58126a4
fix grid_geometry_handler import, style fixes
chrishavlin Sep 21, 2020
922a4e6
more style fixes
chrishavlin Sep 21, 2020
de1b58a
fixed the units for simulated reflectivity
keltonhalbert Sep 21, 2020
30e6253
empty commit for tests
keltonhalbert Sep 21, 2020
5847686
empty commit for tests
keltonhalbert Sep 21, 2020
48505d0
fixed merge conflict that shouldn't have existed in the first place?
keltonhalbert Sep 21, 2020
c9d48dd
remove copyright from nc4 frontend files
munkm Sep 21, 2020
66b4e03
added the staggered velocity variables
keltonhalbert Sep 21, 2020
378c0b4
Merge branch 'nc4-griddata' of github.com:keltonhalbert/yt into nc4-g…
keltonhalbert Sep 21, 2020
42b6abf
moved xarray to on_demand_imports
chrishavlin Sep 21, 2020
4106814
Merge branch 'nc4-griddata' of https://github.com/keltonhalbert/yt in…
chrishavlin Sep 21, 2020
15ac1bf
updated the is_valid method to have a more meaningful check. This che…
keltonhalbert Sep 21, 2020
7fe75e9
added the engine kwarg
keltonhalbert Sep 21, 2020
96306e8
stupid typos...
keltonhalbert Sep 21, 2020
dc181b2
tested and working implementation of _is_valid now properly checks co…
keltonhalbert Sep 21, 2020
df98c1f
some dummy tests copied from the FLASH frontend
keltonhalbert Sep 21, 2020
11b8c9a
add xarray as optional dependency
munkm Sep 21, 2020
e0bcc5d
sorta maybe adding tests?
keltonhalbert Sep 21, 2020
306cf09
uncommitted changes to data structures
keltonhalbert Sep 21, 2020
c448a7b
Merge branch 'nc4-griddata' of github.com:keltonhalbert/yt into nc4-g…
keltonhalbert Sep 22, 2020
58149e1
pulling in appveyor update
chrishavlin Sep 22, 2020
557621b
Updated some of the data_structures file to have new metadata info in
keltonhalbert Sep 22, 2020
04aeb6c
Merge branch 'nc4-griddata' of github.com:keltonhalbert/yt into nc4-g…
keltonhalbert Sep 22, 2020
b58ead2
adjusting _is_valid
chrishavlin Sep 22, 2020
0fc8367
removed unnecessary imports and image test
keltonhalbert Sep 22, 2020
bd5abb5
updated the file to the testfile provided by Leigh
keltonhalbert Sep 22, 2020
f3dff05
added storage of global metadata in parameters dict
keltonhalbert Sep 22, 2020
e1914b3
added very simple test of cm1 output version
keltonhalbert Sep 22, 2020
1dc8694
I promise I know how the handlers work
keltonhalbert Sep 22, 2020
f13bf04
is_valid update
chrishavlin Sep 22, 2020
fa1ee32
style fixes
chrishavlin Sep 22, 2020
214de7f
I had untracked changes apparently and need to pull
keltonhalbert Sep 23, 2020
e19de77
Merge branch 'nc4-griddata' of github.com:keltonhalbert/yt into nc4-g…
keltonhalbert Sep 23, 2020
48aed38
added default MKS units system
keltonhalbert Sep 23, 2020
3f49b4c
Made recommended changes from Matt that helps fix the issues with
keltonhalbert Sep 23, 2020
2f32253
working answer test (locally)
chrishavlin Sep 23, 2020
f20ab6b
answer test update
chrishavlin Sep 23, 2020
2ea2cdd
test filename update
chrishavlin Sep 23, 2020
cc4f460
add check and message for cm1 vs cm1_lofs
chrishavlin Sep 23, 2020
fe74465
answer test update, tests.yaml bump
chrishavlin Sep 23, 2020
4ed2d23
open with netcdf and xarray safely
munkm Sep 23, 2020
991497a
style fixes
munkm Sep 23, 2020
d90a8b2
switch to netcdf4 loader
chrishavlin Sep 25, 2020
dfea412
set NetCDF4FileHandler file mode to r always
chrishavlin Sep 25, 2020
80c6a20
remove xarray from on demand imports
munkm Sep 25, 2020
eef55ac
move the netcdf4 import to isvalid try block
chrishavlin Sep 25, 2020
dec3f0b
ensure netcdf4 import before h5py
chrishavlin Sep 25, 2020
860fa19
Apply quick suggestions from code review
chrishavlin Sep 30, 2020
5c423ba
incorporating more suggested changes
chrishavlin Sep 30, 2020
0932088
switched netcdf handle to _handle instead of ds
chrishavlin Sep 30, 2020
1aada12
add kwargs to nc4 file handler, commenting
chrishavlin Sep 30, 2020
11116a2
remaining suggestions
chrishavlin Oct 2, 2020
58bca15
inline comment noting xh,yh,zh definition
chrishavlin Oct 2, 2020
53e6162
removing empty files
chrishavlin Oct 2, 2020
689cdf6
Apply suggestions from code review
chrishavlin Oct 5, 2020
12166da
need to also catch ImportError in is_valid
chrishavlin Oct 5, 2020
45 changes: 6 additions & 39 deletions appveyor.yml
@@ -1,48 +1,15 @@
# AppVeyor.com is a Continuous Integration service to build and run tests under
# Windows

environment:

global:
PYTHON: "C:\\Miniconda36-x64"

matrix:
- PYTHON_VERSION: "3.8"
# This is a dummy configuration file that *does not run any test*.
# The file should be deleted, and Appveyor deactivated as soon as all
# opened PRs have this configuration merged in.
# SEE PR#2908 (https://github.com/yt-project/yt/pull/2908/files)

platform:
- x64

install:
- "if not exist \"%userprofile%\\.config\\yt\" mkdir %userprofile%\\.config\\yt"
- "echo [yt] > %userprofile%\\.config\\yt\\ytrc"
- "echo suppressStreamLogging = True >> %userprofile%\\.config\\yt\\ytrc"
- "SET PATH=%PYTHON%;%PYTHON%\\Scripts;%PATH%"
- "copy tests\\matplotlibrc ."

# Install the build and runtime dependencies of the project.
# Create a conda environment
- "conda update -q --yes conda"
- "conda create -q --yes -n test python=%PYTHON_VERSION%"
- "activate test"
- "echo 'Not installing anything.'"

# Check that we have the expected version of Python
- "python --version"

# Install specified version of numpy and dependencies
- "conda install --yes -c conda-forge numpy scipy nose pytest setuptools ipython git unyt
Cython sympy fastcache h5py matplotlib=3.1.3 mock pandas cartopy conda-build pooch pyyaml
nose-timer pykdtree"
# install yt
- "pip install -e ."

# Not a .NET project
build: false

test_script:
- "nosetests --with-timer --timer-top-n=20 --nologcapture --with-xunit -sv --traverse-namespace yt"

# Enable this to be able to login to the build worker. You can use the
# `remmina` program in Ubuntu, use the login information that the line below
# prints into the log.
#on_finish:
#- ps: $blockRdp = $true; iex ((new-object net.webclient).DownloadString('https://raw.githubusercontent.com/appveyor/ci/master/scripts/enable-rdp.ps1'))
- "echo 'Not testing anything. This is done on Travis'"
3 changes: 3 additions & 0 deletions tests/tests.yaml
@@ -167,6 +167,9 @@ answer_tests:

  #local_particle_trajectory_001:
  #  - yt/data_objects/tests/test_particle_trajectories.py

  local_nc4_cm1_000:  # PR 2176
    - yt/frontends/nc4_cm1/tests/test_outputs.py:test_cm1_mesh_fields

other_tests:
  unittests:
1 change: 1 addition & 0 deletions yt/frontends/api.py
@@ -30,6 +30,7 @@

    "halo_catalog",
    "http_stream",
    "moab",
    "nc4_cm1",
    "open_pmd",
    "owls",
    "owls_subfind",
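
With this entry in place, yt's output-type detection will consider the new frontend when loading files. A minimal sanity check (a sketch, assuming the list shown in this hunk is the _frontends registry in yt/frontends/api.py):

    from yt.frontends.api import _frontends

    # "nc4_cm1" is now among the registered frontend names, so dataset
    # detection will try CM1Dataset._is_valid() on candidate files.
    assert "nc4_cm1" in _frontends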
6 changes: 6 additions & 0 deletions yt/frontends/nc4_cm1/__init__.py
@@ -0,0 +1,6 @@
"""
API for yt.frontends.nc4_cm1



"""
10 changes: 10 additions & 0 deletions yt/frontends/nc4_cm1/api.py
@@ -0,0 +1,10 @@
"""
API for yt.frontends.nc4_cm1



"""

from .data_structures import CM1Dataset, CM1Grid, CM1Hierarchy
from .fields import CM1FieldInfo
from .io import CM1IOHandler
221 changes: 221 additions & 0 deletions yt/frontends/nc4_cm1/data_structures.py
@@ -0,0 +1,221 @@
import os
import stat
import weakref
from collections import OrderedDict

import numpy as np

from yt.data_objects.index_subobjects.grid_patch import AMRGridPatch
from yt.data_objects.static_output import Dataset
from yt.geometry.grid_geometry_handler import GridIndex
from yt.utilities.file_handler import NetCDF4FileHandler, warn_netcdf
from yt.utilities.logger import ytLogger as mylog

from .fields import CM1FieldInfo


class CM1Grid(AMRGridPatch):
    _id_offset = 0

    def __init__(self, id, index, level, dimensions):
        super(CM1Grid, self).__init__(id, filename=index.index_filename, index=index)
        self.Parent = None
        self.Children = []
        self.Level = level
        self.ActiveDimensions = dimensions

    def __repr__(self):
        return f"CM1Grid_{self.id:d} ({self.ActiveDimensions})"


class CM1Hierarchy(GridIndex):
    grid = CM1Grid

    def __init__(self, ds, dataset_type="cm1"):
        self.dataset_type = dataset_type
        self.dataset = weakref.proxy(ds)
        # for now, the index file is the dataset!
        self.index_filename = self.dataset.parameter_filename
        self.directory = os.path.dirname(self.index_filename)
        # float type for the simulation edges; must be float64 now
        self.float_type = np.float64
        super(CM1Hierarchy, self).__init__(ds, dataset_type)

    def _initialize_state_variables(self):
        super(CM1Hierarchy, self)._initialize_state_variables()
        self.num_grids = 1

    def _detect_output_fields(self):
        # build list of on-disk fields for dataset_type 'cm1'
        self.field_list = []
        for vname in self.dataset.parameters["variable_names"]:
            self.field_list.append(("cm1", vname))

    def _count_grids(self):
        # This needs to set self.num_grids
        self.num_grids = 1

    def _parse_index(self):
        self.grid_left_edge[0][:] = self.ds.domain_left_edge[:]
        self.grid_right_edge[0][:] = self.ds.domain_right_edge[:]
        self.grid_dimensions[0][:] = self.ds.domain_dimensions[:]
        self.grid_particle_count[0][0] = 0
        self.grid_levels[0][0] = 1
        self.max_level = 1

    def _populate_grid_objects(self):
        self.grids = np.empty(self.num_grids, dtype="object")
        for i in range(self.num_grids):
            g = self.grid(i, self, self.grid_levels.flat[i], self.grid_dimensions[i])
            g._prepare_grid()
            g._setup_dx()
            self.grids[i] = g
Comment on lines +63 to +67

[Member] I can't help but notice that in the AMRVAC frontend this method seems to do much less:

    def _populate_grid_objects(self):
        # required method
        for g in self.grids:
            g._prepare_grid()
            g._setup_dx()

Note that the Athena frontend, for example, is closer to AMRVAC than to this. I'm wondering which one is wrong. @matthewturk?

[Member] @matthewturk can you weigh in here please?

[Member] So it depends. The reason we ever do self.grids[i] is if we are creating an object array in advance. I think it might also no longer be necessary -- there was a segfault that we ran into many years ago as a result of numpy object arrays, which is perhaps no longer operative.

But the idea was: create an empty numpy object array, then fill it. It's possible that AMRVAC creates the grid objects elsewhere, so they are already populated.
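
A minimal sketch of the object-array point above (illustrative only, not part of the diff): pre-allocating with dtype="object" guarantees one Python object per slot, whereas building the array directly from nested sequences lets numpy broadcast them into a multidimensional array.

    import numpy as np

    # Pre-allocate an object array, then fill it (the pattern in this frontend).
    grids = np.empty(2, dtype="object")
    grids[0] = [1, 2, 3]  # stand-ins for grid objects
    grids[1] = [4, 5, 6]
    print(grids.shape)  # (2,) -- one object per slot

    # Building directly from nested sequences broadcasts instead.
    naive = np.array([[1, 2, 3], [4, 5, 6]])
    print(naive.shape)  # (2, 3) -- a numeric array, not objects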



class CM1Dataset(Dataset):
    _index_class = CM1Hierarchy
    _field_info_class = CM1FieldInfo

    def __init__(
        self,
        filename,
        dataset_type="cm1",
        storage_filename=None,
        units_override=None,
        unit_system="mks",
    ):
        self.fluid_types += ("cm1",)
        self._handle = NetCDF4FileHandler(filename)
        # refinement factor between a grid and its subgrid
        self.refine_by = 1
        super(CM1Dataset, self).__init__(
            filename,
            dataset_type,
            units_override=units_override,
            unit_system=unit_system,
        )
        self.storage_filename = storage_filename
        self.filename = filename

    def _setup_coordinate_handler(self):
        # ensure correct ordering of axes so plots aren't rotated (z should always be
        # on the vertical axis).
        super(CM1Dataset, self)._setup_coordinate_handler()
        self.coordinates._x_pairs = (("x", "y"), ("y", "x"), ("z", "x"))
        self.coordinates._y_pairs = (("x", "z"), ("y", "z"), ("z", "y"))

    def _set_code_unit_attributes(self):
        # This is where quantities are created that represent the various
        # on-disk units. These are the currently available quantities which
        # should be set, along with examples of how to set them to standard
        # values.
        with self._handle.open_ds() as _handle:
            length_unit = _handle.variables["xh"].units
        self.length_unit = self.quan(1.0, length_unit)
        self.mass_unit = self.quan(1.0, "kg")
        self.time_unit = self.quan(1.0, "s")
        self.velocity_unit = self.quan(1.0, "m/s")

    def _parse_parameter_file(self):
        # This needs to set up the following items. Note that these are all
        # assumed to be in code units; domain_left_edge and domain_right_edge
        # will be converted to YTArray automatically at a later time.
        # This includes the cosmological parameters.
        #
        # self.unique_identifier <= unique identifier for the dataset
        #     being read (e.g., UUID or ST_CTIME)
        self.unique_identifier = int(os.stat(self.parameter_filename)[stat.ST_CTIME])
        self.parameters = {}  # code-specific items
        with self._handle.open_ds() as _handle:
            # _handle here is a netcdf Dataset object; we need to parse some
            # metadata for constructing our yt ds.
            dims = [_handle.dimensions[i].size for i in ["xh", "yh", "zh"]]

            # TODO: generalize this to be coordinate-variable-name agnostic in
            # order to make it useful for WRF or climate data. For now, we're
            # hard coding for CM1 specifically and have named the classes
            # appropriately.
            xh, yh, zh = [_handle.variables[i][:] for i in ["xh", "yh", "zh"]]
            self.domain_left_edge = np.array(
                [xh.min(), yh.min(), zh.min()], dtype="float64"
            )
            self.domain_right_edge = np.array(
                [xh.max(), yh.max(), zh.max()], dtype="float64"
            )
Comment on lines +136 to +141

[Member] Are xh, yh, zh the cell centers or the cell edges?

I would naively expect that if xh represents the cell edges, then the number of dimensions is ds.dimensions["xh"].size - 1, and if it represents the cell centers, then the left and right edges should be computed by adding -dx/2 and +dx/2 respectively.

I see, however, that you are using ds.dimensions and ds.variables, so there is definitely some subtlety that I am missing. In any case, it'd be worth a comment saying what xh (as well as yh and zh) represents.

[Contributor] We'll need @keltonhalbert to weigh in on xh, yh and zh (I think they are actually cell centers, but I'm not positive). I think adding a comment to the code would be good here -- maybe including a link to the cm1_lofs docs website?

But I think some confusion is coming from the fact that ds here is our netcdf dataset file handle, not a yt dataset. So ds.variables contains the netcdf objects with the actual data that need to be parsed and read, while ds.dimensions is a netcdf attribute with dimensionality metadata. I think this will be clearer by switching ds to a different name (_handle, as Matt suggested).

[Contributor, Author] @chrishavlin @cphyc xh, yh, and zh are the cell-center coordinates where scalars are defined.

[Contributor] Added comments around this section noting the xh, yh and zh coordinates.
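
A minimal sketch of the centers-vs-edges distinction discussed above (illustrative, assuming a uniform mesh): since xh holds cell centers, true cell edges would extend half a cell width beyond the extreme centers. The merged code uses xh.min() and xh.max() directly as the domain edges.

    import numpy as np

    xh = np.array([0.5, 1.5, 2.5, 3.5])  # hypothetical uniform cell centers
    dx = np.diff(xh).mean()              # cell width (uniform mesh assumed)
    left_edge = xh.min() - dx / 2        # 0.0, half a cell below the first center
    right_edge = xh.max() + dx / 2       # 4.0, half a cell above the last center
    print(left_edge, right_edge)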


            # loop over the variable names in the netCDF file, record only
            # those on the "zh", "yh", "xh" grid.
            varnames = []
            for key, var in _handle.variables.items():
                if all(x in var.dimensions for x in ["time", "zh", "yh", "xh"]):
                    varnames.append(key)
            self.parameters["variable_names"] = varnames
            self.parameters["lofs_version"] = _handle.cm1_lofs_version
            self.parameters["is_uniform"] = _handle.uniform_mesh
            self.current_time = _handle.variables["time"][:][0]

            # record the dimension metadata: _handle.dimensions contains netcdf
            # objects, so we need to manually copy over attributes.
            dim_info = OrderedDict()
            for dim, meta in _handle.dimensions.items():
                dim_info[dim] = meta.size
            self.parameters["dimensions"] = dim_info

        self.dimensionality = 3
        self.domain_dimensions = np.array(dims, dtype="int64")
        self.periodicity = (False, False, False)
[Member] Is this always the case? Or could you imagine cloud models produced with your code with periodic boundary conditions? If that is likely, add a # TODO comment to say that periodicity should be supported, or even better (if possible) raise a NotImplementedError if periodicity is detected. If no cloud model uses periodicity, then would you mind adding a comment to document it? Something like:

Suggested change:

        self.periodicity = (False, False, False)

    becomes

        # Cloud Model Simulations do not support periodicity, so we hard-code it here.
        self.periodicity = (False, False, False)


        # Set cosmological information to zero for non-cosmological.
        self.cosmological_simulation = 0
        self.current_redshift = 0.0
        self.omega_lambda = 0.0
        self.omega_matter = 0.0
        self.hubble_constant = 0.0

    @classmethod
    def _is_valid(cls, *args, **kwargs):
        # This accepts a filename or a set of arguments and returns True or
        # False depending on if the file is of the type requested.
        warn_netcdf(args[0])
        try:
            nc4_file = NetCDF4FileHandler(args[0])
            with nc4_file.open_ds(keepweakref=True) as _handle:
                is_cm1_lofs = hasattr(_handle, "cm1_lofs_version")
                is_cm1 = hasattr(_handle, "cm1 version")  # not a typo, it is a space...
                # ensure the coordinates of each variable array exist in the dataset
                coords = _handle.dimensions  # get the dataset-wide coordinates
                failed_vars = []  # list of failed variables
                for var in _handle.variables:  # iterate over the variables
                    vcoords = _handle[var].dimensions  # get the dims for the variable
                    ncoords = len(vcoords)  # number of coordinates in variable
                    # number of coordinates that pass for a variable
                    coordspassed = sum(vc in coords for vc in vcoords)
                    if coordspassed != ncoords:
                        failed_vars.append(var)

            if failed_vars:
                mylog.warning(
                    "Trying to load a cm1_lofs netcdf file but the coordinates "
                    "of the following fields do not match the coordinates of "
                    f"the dataset: {failed_vars}"
                )
                return False

            if not is_cm1_lofs:
                if is_cm1:
                    mylog.warning(
                        "It looks like you are trying to load a cm1 netcdf file, "
                        "but at present yt only supports cm1_lofs output. Until "
                        "support is added, you can likely use "
                        "yt.load_uniform_grid() to load your cm1 file manually."
                    )
                return False
        except Exception:
            return False

        return True
1 change: 1 addition & 0 deletions yt/frontends/nc4_cm1/definitions.py
@@ -0,0 +1 @@
# This file is often empty. It can hold definitions related to a frontend.
67 changes: 67 additions & 0 deletions yt/frontends/nc4_cm1/fields.py
@@ -0,0 +1,67 @@
from yt.fields.field_info_container import FieldInfoContainer

# We need to specify which fields we might have in our dataset. The field info
# container subclass here will define which fields it knows about. There are
# optional methods on it that get called which can be subclassed.


class CM1FieldInfo(FieldInfoContainer):
    known_other_fields = (
        # Each entry here is of the form
        # ( "name", ("units", ["fields", "to", "alias"], "display_name")),
        ("uinterp", ("m/s", ["velocity_x"], None)),
        ("vinterp", ("m/s", ["velocity_y"], None)),
        ("winterp", ("m/s", ["velocity_z"], None)),
        ("u", ("m/s", ["velocity_x"], None)),
        ("v", ("m/s", ["velocity_y"], None)),
        ("w", ("m/s", ["velocity_z"], None)),
        ("hwin_sr", ("m/s", ["storm_relative_horizontal_wind_speed"], None)),
        ("windmag_sr", ("m/s", ["storm_relative_3D_wind_speed"], None)),
        ("hwin_gr", ("m/s", ["ground_relative_horizontal_wind_speed"], None)),
        ("thpert", ("K", ["potential_temperature_perturbation"], None)),
        ("thrhopert", ("K", ["density_potential_temperature_perturbation"], None)),
        ("prespert", ("hPa", ["pressure_perturbation"], None)),
        ("rhopert", ("kg/m**3", ["density_perturbation"], None)),
        ("dbz", ("dB", ["simulated_reflectivity"], None)),
        ("qvpert", ("g/kg", ["water_vapor_mixing_ratio_perturbation"], None)),
        ("qc", ("g/kg", ["cloud_liquid_water_mixing_ratio"], None)),
        ("qr", ("g/kg", ["rain_mixing_ratio"], None)),
        ("qi", ("g/kg", ["cloud_ice_mixing_ratio"], None)),
        ("qs", ("g/kg", ["snow_mixing_ratio"], None)),
        ("qg", ("g/kg", ["graupel_or_hail_mixing_ratio"], None)),
        ("qcloud", ("g/kg", ["sum_of_cloud_water_and_cloud_ice_mixing_ratios"], None)),
        ("qprecip", ("g/kg", ["sum_of_rain_graupel_snow_mixing_ratios"], None)),
        ("nci", ("1/cm**3", ["number_concentration_of_cloud_ice"], None)),
        ("ncr", ("1/cm**3", ["number_concentration_of_rain"], None)),
        ("ncs", ("1/cm**3", ["number_concentration_of_snow"], None)),
        ("ncg", ("1/cm**3", ["number_concentration_of_graupel_or_hail"], None)),
        ("xvort", ("1/s", ["vorticity_x"], None)),
        ("yvort", ("1/s", ["vorticity_y"], None)),
        ("zvort", ("1/s", ["vorticity_z"], None)),
        ("hvort", ("1/s", ["horizontal_vorticity_magnitude"], None)),
        ("vortmag", ("1/s", ["vorticity_magnitude"], None)),
        ("streamvort", ("1/s", ["streamwise_vorticity"], None)),
        ("khh", ("m**2/s", ["khh"], None)),
        ("khv", ("m**2/s", ["khv"], None)),
        ("kmh", ("m**2/s", ["kmh"], None)),
        ("kmv", ("m**2/s", ["kmv"], None)),
    )

    known_particle_fields = (
        # Identical form to above
        # ( "name", ("units", ["fields", "to", "alias"], "display_name")),
    )

    def __init__(self, ds, field_list):
        super(CM1FieldInfo, self).__init__(ds, field_list)
        # If you want, you can check self.field_list

    def setup_fluid_fields(self):
        # Here we do anything that might need info about the dataset.
        # You can use self.alias, self.add_output_field (for on-disk fields)
        # and self.add_field (for derived fields).
        pass

    def setup_particle_fields(self, ptype):
        super(CM1FieldInfo, self).setup_particle_fields(ptype)
        # This will get called for every particle type.
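
A short usage sketch for the frontend as a whole (the filename below is a hypothetical stand-in, not a shipped sample dataset): loading a cm1_lofs file through yt.load() exercises CM1Dataset._is_valid(), and the aliases above expose on-disk variables under yt-style names.

    import yt

    ds = yt.load("cm1_out_000014.nc")  # hypothetical cm1_lofs netCDF output
    ad = ds.all_data()
    print(ad["cm1", "dbz"])  # raw on-disk field, in dB
    # Aliases from known_other_fields are registered under the "gas" field
    # type, e.g. ("gas", "simulated_reflectivity").
    print(ad["gas", "simulated_reflectivity"])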