Improvements to the DMR++ parser #230

TomNicholas · 2024-08-26T16:03:11Z

The DMR++ parser was merged in #133, but there are a few ways it could be improved.

1. Docs. It's not actually listed anywhere publicly that DMR++ files are supported, not even in the docstring of open_virtual_dataset.
2. HDF4 support (see #216).
3. Use ChunkManifest.from_arrays, which should increase performance and will reduce reliance on the kerchunk in-memory format (see the comment on #113).
4. Internal code improvements, e.g.:
   a. Use the pathlib module instead of os internally.
   b. Refactor to be more functional (see the comment on #113).
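To illustrate the pathlib suggestion, here is a minimal sketch comparing the two styles. It is not taken from the parser's code: the function names are hypothetical, and it assumes the common convention that a DMR++ sidecar is named by appending `.dmrpp` to the data file's name.

```python
import os
from pathlib import PurePosixPath


def data_path_os(dmrpp_path: str) -> str:
    """os.path style: strip a trailing '.dmrpp' extension (hypothetical helper)."""
    root, ext = os.path.splitext(dmrpp_path)
    return root if ext == ".dmrpp" else dmrpp_path


def data_path_pathlib(dmrpp_path: str) -> str:
    """pathlib style: same logic, using path objects instead of string splitting."""
    p = PurePosixPath(dmrpp_path)
    return str(p.with_suffix("")) if p.suffix == ".dmrpp" else str(p)


# Both helpers map the sidecar name back to the (assumed) data file name.
print(data_path_os("data/granule.nc.dmrpp"))
print(data_path_pathlib("data/granule.nc.dmrpp"))
```

One caveat with this design: pathlib normalizes repeated slashes, so a remote URL such as `s3://bucket/key` should not be passed through a path object; the sketch only applies to local-style paths.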

Mikejmnez · 2024-08-26T16:26:49Z

I would like to be involved in some of this work. I can definitely work on better understanding the complexities of HDF4 and the steps needed to enable HDF4 support.

TomNicholas · 2024-08-27T01:46:31Z

@ayushnag is there a way to identify a DMR++ file automatically? e.g. a file magic?

ayushnag · 2024-08-27T17:43:57Z

Not to my knowledge. All valid XML files must start with the string "<?xml"; beyond that, I think there would need to be some reading of the header tags (e.g. xmlns:dmrpp="http://xml.opendap.org/dap/dmrpp/1.0.0#") to know that it is a dmrpp file.
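A minimal sketch of that kind of header check, using only the Python standard library. The function name `looks_like_dmrpp` is hypothetical, and the sketch assumes the DMR++ namespace quoted above is declared on the root element. It does not rely on the "<?xml" prefix, because the XML declaration is optional in XML 1.0.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Namespace URI quoted in the comment above.
DMRPP_NS = "http://xml.opendap.org/dap/dmrpp/1.0.0#"


def looks_like_dmrpp(fileobj) -> bool:
    """Return True if the root element declares the DMR++ namespace.

    `fileobj` is a binary file-like object. Parsing stops at the first
    element, so the body of a large file is never walked.
    """
    try:
        for event, payload in ET.iterparse(fileobj, events=("start-ns", "start")):
            if event == "start-ns":
                _prefix, uri = payload
                if uri == DMRPP_NS:
                    return True
            else:
                # "start-ns" events precede the "start" event of the element
                # that declares them, so the root's namespaces are all seen.
                return False
    except ET.ParseError:
        # Not well-formed XML at all.
        return False
    return False


sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<Dataset xmlns="http://xml.opendap.org/ns/DAP/4.0#" '
    b'xmlns:dmrpp="http://xml.opendap.org/dap/dmrpp/1.0.0#" name="example"/>'
)
print(looks_like_dmrpp(BytesIO(sample)))
print(looks_like_dmrpp(BytesIO(b"<root/>")))
```

A check like this is not a file magic in the strict sense, since it needs an XML parser rather than a fixed byte prefix, but it only reads as far as the root element.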

cc @Mikejmnez @jgallagher59701

Mikejmnez · 2024-08-27T20:22:52Z

@ayushnag is right. The first four elements are not enough to distinguish a generic XML file from a dmrpp-generated one.

TomNicholas added the documentation and references generation labels Aug 26, 2024

TomNicholas added the DMR++ label Oct 24, 2024
