run_function metadata error #315
Comments
Hi @reint-fischer. Thanks for reporting this. We will look into it and get back to you.
@reint-fischer, thanks for raising this issue! I was able to move past the error you encountered as follows:

1. I ran the same code you posted above, but with setup_logging(level="DEBUG") to expose debugging logs.
2. The last few lines of the debug logs before the error revealed that we are hitting the issue when xarray tries to open one of the cached inputs from the following path:

path = "/tmp/tmpwxl4qo75/1EyjKoGo/777be2b9214151be7e2c4f211c36a334-http_tds.coaps.fsu.edu_thredds_fileserver_samos_data_research_zcyl5_2021_zcyl5_20210101v30001.nc"

3. Referring back to the traceback you posted, I noted that we appear to be deep into the h5py stack there. This led me to guess that our issue was with the xarray backend we are using to open your cached input. Pangeo Forge defaults to the h5netcdf xarray backend, but we can customize the backend as follows:
# Start coding here!
import pandas as pd
import xarray as xr
from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.recipes import XarrayZarrRecipe, setup_logging

def make_url(time):
    year = time.strftime('%Y')
    year_month_day = time.strftime('%Y%m%d')
    return f'http://tds.coaps.fsu.edu/thredds/fileServer/samos/data/research/ZCYL5/{year}/ZCYL5_{year_month_day}v30001.nc'

dates = pd.date_range('2021-01-01', '2021-01-03', freq='D')
time_concat_dim = ConcatDim("time", dates, nitems_per_file=1)
pattern = FilePattern(make_url, time_concat_dim)

recipe = XarrayZarrRecipe(
    pattern,
    inputs_per_chunk=30,
+   xarray_open_kwargs=dict(engine="netcdf4"),
)

setup_logging()
recipe_pruned = recipe.copy_pruned()
run_function = recipe_pruned.to_function()
run_function()
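(As an aside, a quick way to sanity-check the backend hypothesis directly on the cached file — this is just a sketch, and it assumes both engines are installed in the environment and that the cached path from the debug logs still exists:)

import xarray as xr

path = "/tmp/tmpwxl4qo75/1EyjKoGo/777be2b9214151be7e2c4f211c36a334-http_tds.coaps.fsu.edu_thredds_fileserver_samos_data_research_zcyl5_2021_zcyl5_20210101v30001.nc"

# Try to open the same cached input with each backend and report what happens.
for engine in ("h5netcdf", "netcdf4"):
    try:
        with xr.open_dataset(path, engine=engine) as ds:
            print(f"{engine}: opened OK, data_vars = {list(ds.data_vars)[:5]}")
    except Exception as err:
        print(f"{engine}: failed with {type(err).__name__}: {err}")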
With these xarray_open_kwargs added, I am able to get past the error you hit, but am now encountering a new error:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_369/1701840335.py in <module>
25 run_function = recipe_pruned.to_function()
26
---> 27 run_function()
/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/executors/python.py in function()
44 stage.function(m, config=pipeline.config)
45 else:
---> 46 stage.function(config=pipeline.config)
47
48 return function
/srv/conda/envs/notebook/lib/python3.9/site-packages/pangeo_forge_recipes/recipes/xarray_zarr.py in prepare_target(config)
497 "ignore"
498 ) # suppress the warning that comes with safe_chunks
--> 499 ds.to_zarr(target_mapper, mode="a", compute=False, safe_chunks=False)
500
501 # Regardless of whether there is an existing dataset or we are creating a new one,
/srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/core/dataset.py in to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
2035 encoding = {}
2036
-> 2037 return to_zarr(
2038 self,
2039 store=store,
/srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/api.py in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options)
1404
1405 if mode in ["a", "r+"]:
-> 1406 _validate_datatypes_for_zarr_append(dataset)
1407 if append_dim is not None:
1408 existing_dims = zstore.get_dimensions()
/srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/api.py in _validate_datatypes_for_zarr_append(dataset)
1299
1300 for k in dataset.data_vars.values():
-> 1301 check_dtype(k)
1302
1303
/srv/conda/envs/notebook/lib/python3.9/site-packages/xarray/backends/api.py in check_dtype(var)
1290 ):
1291 # and not re.match('^bytes[1-9]+$', var.dtype.name)):
-> 1292 raise ValueError(
1293 "Invalid dtype for data variable: {} "
1294 "dtype must be a subtype of number, "
ValueError: Invalid dtype for data variable: <xarray.DataArray 'flag' (time: 2879)>
dask.array<concatenate, shape=(2879,), dtype=|S35, chunksize=(1440,), chunktype=numpy.ndarray>
Coordinates:
* time (time) datetime64[ns] 2021-01-01 ... 2021-01-02T23:59:00
Attributes:
long_name: quality control flags
A: Units added
B: Data out of range
C: Non-sequential time
D: Failed T>=Tw>=Td
E: True wind error
F: Velocity unrealistic
G: Value > 4 s. d. from climatology
H: Discontinuity
I: Interesting feature
J: Erroneous
K: Suspect - visual
L: Ocean platform over land
M: Instrument malfunction
N: In Port
O: Multiple original units
P: Movement uncertain
Q: Pre-flagged as suspect
R: Interpolated data
S: Spike - visual
T: Time duplicate
U: Suspect - statistial
V: Spike - statistical
X: Step - statistical
Y: Suspect between X-flags
Z: Good data
    metadata_retrieved_from:  ZCYL5_20210101v10001.nc
dtype must be a subtype of number, datetime, bool, a fixed sized string, a fixed size unicode string or an object

Perhaps you or @rabernat have thoughts on how to resolve this error?
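One possible direction (just a sketch, not something I've tested): if I remember the XarrayZarrRecipe API correctly, it accepts a process_input callable that is applied to each input dataset before writing, so the byte-string flag variable could be cast to a fixed-width unicode dtype there. The "U35" width below simply mirrors the |S35 in the error message and is an assumption; pattern is the FilePattern defined earlier in this thread.

import xarray as xr
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

def cast_flag_to_unicode(ds: xr.Dataset, fname: str) -> xr.Dataset:
    # Cast the |S35 byte-string QC flags to fixed-width unicode so the
    # zarr append-mode dtype check will (hopefully) accept them.
    if "flag" in ds.data_vars:
        ds["flag"] = ds["flag"].astype("U35")
    return ds

recipe = XarrayZarrRecipe(
    pattern,                     # FilePattern defined in the code above
    inputs_per_chunk=30,
    xarray_open_kwargs=dict(engine="netcdf4"),
    process_input=cast_flag_to_unicode,  # hook name per my understanding of the API
)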
I also ran into this and filed it as an issue. There are other problems with the script that need to be addressed regarding the time dimension of the SAMOS NetCDF files (not being in date-time format) — see the quick inspection sketch below.
Please see: pangeo-forge/staged-recipes#120
/Eric
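A quick way to inspect what the raw time variable actually contains (a sketch only; the local filename and the presence of a CF-style units attribute are assumptions not verified against the SAMOS files):

import xarray as xr

# Open a single SAMOS file without decoding times, to see the raw representation.
ds = xr.open_dataset("ZCYL5_20210101v30001.nc", decode_times=False)
print(ds["time"].dtype, ds["time"].attrs.get("units"))

# If (and only if) the units attribute turns out to be CF-compliant,
# xarray can decode the numeric values into datetime64:
decoded = xr.decode_cf(ds)
print(decoded["time"].dtype)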
@SBS-EREHM, thanks for weighing in. I've just opened #320, which I believe is the root of the problem both you and @reint-fischer are seeing. Manually setting the engine as shown above works for now; I'll make a PR today to make this the default behavior, however. I'll ping this thread as soon as that's merged and released, after which point this should just work out-of-the-box (without any manual override).
I'm not sure we want […]
@reint-fischer and @SBS-EREHM, apologies for being too quick with a suggested fix in #315 (comment). Upon further reflection (and input from @rabernat), it looks like the source of this issue is deeper than I'd realized. We've opened h5netcdf/h5netcdf#157 to push these concerns upstream. I will report back on this thread when we have some progress on this.

Edit: The solution for this problem is now being tracked in #320. I will ping this thread when we have a fix.
@reint-fischer and @SBS-EREHM, thanks for your patience as we've been working through a series of issues that your recipe surfaced. I've made considerable headway with this recipe, which I will summarize in a new comment on pangeo-forge/staged-recipes#120. AFAICT, this issue is duplicative of that one, and that thread is less cluttered with my prior thoughts, so I figured it would be a better place to reset the conversation on this recipe. I'll close this issue now and we can continue the conversation on the linked thread.
Hi all,
When trying to create a recipe for SAMOS data (https://samos.coaps.fsu.edu/html/nav.php?s=2) with @hsosik and @SBS-EREHM, we ran into a problem when executing run_function(): something goes wrong with consolidating the metadata (as in #300), which raises FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmptzxt2o1f/Mnhnk1k0/.zmetadata'. I was using the binder environment created for the OSM tutorial. Are we missing something in the code?

The specific ship data that we were trying to write a recipe for is shown here in the catalog: http://tds.coaps.fsu.edu/thredds/catalog/samos/data/research/ZCYL5/catalog.html
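One way to check whether the consolidated metadata was ever written to the temporary target (a sketch; the path is the temporary directory from the error message and may already have been cleaned up):

import os
import zarr

target = "/tmp/tmptzxt2o1f/Mnhnk1k0"
print(".zmetadata exists:", os.path.exists(os.path.join(target, ".zmetadata")))

# If the zarr store itself is intact, the metadata can be consolidated manually:
zarr.consolidate_metadata(target)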
Below is the full code and error message:
Code
Error
Thank you,
Reint