Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GESLA obs netcdf file has multiple lon/lat values #862

Closed
7 tasks done
veenstrajelmer opened this issue May 28, 2024 · 0 comments · Fixed by #867
Closed
7 tasks done

GESLA obs netcdf file has multiple lon/lat values #862

veenstrajelmer opened this issue May 28, 2024 · 0 comments · Fixed by #867

Comments

@veenstrajelmer
Copy link
Collaborator

veenstrajelmer commented May 28, 2024

GESLA obs netcdf file has multiple lon/lat values, while it should have a single value for each like for instance the UHSLC file.

import os
import glob
import xarray as xr
import dfm_tools as dfmt
import matplotlib.pyplot as plt
plt.close("all")

dir_output = "./sealevel_data_subset"
os.makedirs(dir_output, exist_ok=True)

lon_min, lon_max, lat_min, lat_max = -6, -4, 49.5, 51
time_min, time_max = '2016-01-01','2016-02-01'

subset_kwargs = dict(lon_min=lon_min, lon_max=lon_max, lat_min=lat_min, lat_max=lat_max)
gesla_catalog_gpd_sel = dfmt.ssh_catalog_subset(source='gesla3', **subset_kwargs)
ioc_catalog_gpd_sel = dfmt.ssh_catalog_subset(source='ioc', **subset_kwargs)
cmems_list_gpd_sel = dfmt.ssh_catalog_subset(source='cmems', **subset_kwargs)
uhslc_json_rqds_sel = dfmt.ssh_catalog_subset(source='uhslc-rqds', **subset_kwargs)
uhslc_json_fast_sel = dfmt.ssh_catalog_subset(source='uhslc-fast', **subset_kwargs)
psmsl_gnssir_gpd_sel = dfmt.ssh_catalog_subset(source='psmsl-gnssir', **subset_kwargs)

subset_gpd_list = [gesla_catalog_gpd_sel,
                   ioc_catalog_gpd_sel,
                   cmems_list_gpd_sel, psmsl_gnssir_gpd_sel,
                   uhslc_json_rqds_sel, uhslc_json_fast_sel,
                   ]
# plot stations
fig,ax = plt.subplots(figsize=(12,7))
for subset_gpd in subset_gpd_list:
    if subset_gpd.empty:
        continue
    source = subset_gpd.iloc[0]["source"]
    nstations = len(subset_gpd)
    subset_gpd.geometry.plot(ax=ax, marker="x", label=f"{source} ({nstations} obs)")
ax.legend(loc=3)
ax.set_title("Sealevel observations in the English Channel")
ax.set_xlim(lon_min, lon_max)
ax.set_ylim(lat_min, lat_max)
dfmt.plot_coastlines(ax=ax, min_area=1000, linewidth=0.5, zorder=0)
dfmt.plot_borders(ax=ax, zorder=0)


# retrieve data (for all except gesla and ioc)
subset_gpd_list_retrieve = [gesla_catalog_gpd_sel,
                            ioc_catalog_gpd_sel,
                            cmems_list_gpd_sel, psmsl_gnssir_gpd_sel,
                            uhslc_json_rqds_sel, uhslc_json_fast_sel,
                            ]
for subset_gpd in subset_gpd_list_retrieve:
    dfmt.ssh_retrieve_data(subset_gpd, dir_output,
                           time_min=time_min, time_max=time_max)


# open all files an see what is inside
file_list = glob.glob(os.path.join(dir_output,"*.nc"))

for file_nc in file_list:
    fname = os.path.basename(file_nc)
    ds = xr.open_dataset(file_nc)
    # print(ds.attrs["source"])
    print(fname)
    print("> latitude shape:", ds.latitude.shape)
    print("> station_y_coordinate shape:", ds.station_y_coordinate.shape)
    print("> station_y_coordinate:", ds.station_y_coordinate.to_numpy())

Prints:

devonport-dev-gbr-bodc.nc
> latitude shape: (1, 3072)
> station_y_coordinate shape: ()
> station_y_coordinate: 50.368389
ioc-plym-plym.nc
> latitude shape: ()
> station_y_coordinate shape: ()
> station_y_coordinate: 50.37
ioc-ptis-ptis.nc
> latitude shape: ()
> station_y_coordinate shape: ()
> station_y_coordinate: 50.594
newlyn-new-gbr-bodc.nc
> latitude shape: (1, 3072)
> station_y_coordinate shape: ()
> station_y_coordinate: 50.103
newlyn-new-gbr-cmems.nc
> latitude shape: (1, 759)
> station_y_coordinate shape: ()
> station_y_coordinate: 50.099998
newlyn_cornwall-294a-gbr-uhslc.nc
> latitude shape: (1, 768)
> station_y_coordinate shape: ()
> station_y_coordinate: 50.102
plymouth-ply-gbr-cmems.nc
> latitude shape: (1, 759)
> station_y_coordinate shape: ()
> station_y_coordinate: 50.368401
port_isaac-pti-gbr-cco.nc
> latitude shape: (1, 4608)
> station_y_coordinate shape: ()
> station_y_coordinate: 50.59419

Todo:

  • remove gesla double variables
  • reduce timeseries dimension for uhslc datasets
  • align latitude and station_y_coordinate (same for lon), at least prevent these arrays with the same value
  • latitude is often not even a variable in the ds, but still accessible, how does this work?
  • the source attribute is not available in all datasets, fix this
  • also check for ddl
  • the accuracy of the coordinates differs per dataset, is this a bug or is it just all accuracy available within the dataset?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant