extract metadata (NcML XML) from NetCDF/HDF5 files, new "requirements" option for external tools #9239

Merged: 14 commits merged on Jan 20, 2023

Conversation

pdurbin
Member

@pdurbin pdurbin commented Dec 20, 2022

What this PR does / why we need it:

From talking to researchers who use NetCDF and HDF5 files, we believe there is value in extracting metadata from those files to give some insight about their contents.

I plan to also make a pull request in the https://github.com/gdcc/dataverse-previewers repo to add a preview tool for the extracted file, which is in NcML format.

Which issue(s) this PR closes:

Special notes for your reviewer:

Additional scope we could consider:

  • an API endpoint to extract NcML from existing NetCDF and HDF5 files
  • have the "requirements" option apply in more places than file level preview tools
  • making the extraction feature modular

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No, but here's an example of two HDF5 files where one doesn't show the eyeball icon because it doesn't meet the requirement of having an aux file:

[Screenshot, 2022-12-20: two HDF5 files, one without the eyeball icon]
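
To illustrate the idea behind the "requirements" option, here is a minimal sketch of how a tool's requirements might be checked against a file's aux files. This is not the code from this PR. The `auxFilesExist` key, the `formatTag`/`formatVersion` fields, and the `NcML`/`0.1` values are assumptions used for illustration, and `meets_requirements` is a hypothetical helper name.

```python
from typing import Iterable, Mapping, Tuple


def meets_requirements(requirements: Mapping, aux_files: Iterable[Tuple[str, str]]) -> bool:
    """Return True when every aux file the tool requires exists for the file.

    `requirements` is the (assumed) "requirements" block of a tool manifest;
    `aux_files` is an iterable of (formatTag, formatVersion) pairs for one file.
    """
    required = requirements.get("auxFilesExist", [])
    available = set(aux_files)
    return all((r["formatTag"], r["formatVersion"]) in available for r in required)


# Hypothetical requirements block for an NcML preview tool.
ncml_tool_requirements = {
    "auxFilesExist": [{"formatTag": "NcML", "formatVersion": "0.1"}]
}

file_with_aux = [("NcML", "0.1")]   # extraction produced an aux file
file_without_aux = []               # nothing was extracted

print(meets_requirements(ncml_tool_requirements, file_with_aux))     # tool shown
print(meets_requirements(ncml_tool_requirements, file_without_aux))  # tool hidden
```

A tool with no requirements block passes the check trivially, which matches the behavior of tools that predate the option.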

Is there a release notes update needed for this change?:

Yes, included.

Additional documentation:

Included.

@pdurbin
Member Author

pdurbin commented Dec 20, 2022

The failing test is this:

java.lang.AssertionError: expected:<201> but was:<500>
at edu.harvard.iq.dataverse.api.HarvestingClientsIT.testHarvestingClientRun(HarvestingClientsIT.java:184)

I just clicked "build now" to see if that fixes it.

@pdurbin pdurbin self-assigned this Jan 3, 2023
@mreekie

mreekie commented Jan 4, 2023

Update. This was reviewed by Stephen, but is on hold pending some code tweaks from Phil.
Will go back to review by Stephen after that.

@mreekie

mreekie commented Jan 5, 2023

Update: Permissions are being discussed.
Stephen/Phil: To keep this simple for now, superuser permissions will be required.

@pdurbin
Member Author

pdurbin commented Jan 5, 2023

an API endpoint to extract NcML from existing NetCDF and HDF5 files

I just added this in e2066c8.
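
For readers who want to try the endpoint, here is a rough sketch of building the request in Python. The path `/api/files/{id}/extractNcml`, the `X-Dataverse-key` header, the server URL, the file id, and the token are all assumptions for illustration (check the API guide for the exact form). Per the discussion above, the token would need to belong to a superuser.

```python
import urllib.request


def build_extract_request(server_url: str, file_id: int, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the server to extract NcML.

    The endpoint path and header name are assumptions, not confirmed by this thread.
    """
    url = f"{server_url.rstrip('/')}/api/files/{file_id}/extractNcml"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"X-Dataverse-key": api_token},
    )


# Placeholder values; replace with a real server, file id, and superuser token.
req = build_extract_request("http://localhost:8080", 42, "xxxxxxxx-xxxx")
print(req.get_method(), req.full_url)

# To actually send it: urllib.request.urlopen(req)
```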

I'm ready for more review so I'm unassigning myself.

@pdurbin pdurbin removed their assignment Jan 5, 2023
@pdurbin
Member Author

pdurbin commented Jan 10, 2023

@sekmiller and I talked a bit about the new "requirements" feature for external tools. I also removed an extraneous println he caught. Finally, I merged the latest from develop in.

Contributor

@sekmiller sekmiller left a comment


Spoke with Phil to get a fuller understanding of "meetsRequirements" on the external tool. Passing to QA.

@sekmiller sekmiller removed their assignment Jan 10, 2023
@mreekie

mreekie commented Jan 11, 2023

size=10 for the next sprint

@mreekie mreekie added the "Size: 10" label (a percentage of a sprint; 7 hours) Jan 11, 2023
@coveralls

coveralls commented Jan 11, 2023


Coverage: 20.0% (-0.01%) from 20.013% when pulling 2302989 on 9153-extract-metadata into 1bef93a on develop.

@pdurbin
Member Author

pdurbin commented Jan 12, 2023

I'm not sure why I'm seeing this at https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-9239/8/consoleFull:

TASK [dataverse : fire off installer] ******************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "/usr/bin/python3 /tmp/dvinstall/install.py -f --config_file=default.config --noninteractive > /tmp/dvinstall/install.out 2>&1", "delta": "0:02:23.416798", "end": "2023-01-11 23:03:08.721725", "msg": "non-zero return code", "rc": 1, "start": "2023-01-11 23:00:45.304927", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

@kcondon kcondon self-assigned this Jan 18, 2023
"Use a version like '4.11.0.1' in the example above where the
previously released version was 4.11" -- dev guide

That is, these scripts should have been 5.12.1.whatever since
the last release was 5.12.1. Fixing. (They were 5.13.whatever.)
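
The versioning rule quoted from the dev guide can be sketched as a small check: a script version should extend the last released version with at least one extra component. The function name is hypothetical, and the concrete `5.12.1.1` and `5.13.0.1` values are made-up stand-ins for the "whatever" suffixes mentioned above.

```python
def is_valid_script_version(script_version: str, last_release: str) -> bool:
    """Return True when script_version is last_release plus extra components."""
    release_parts = last_release.split(".")
    script_parts = script_version.split(".")
    # The script version must be strictly longer and start with the release version.
    return (
        len(script_parts) > len(release_parts)
        and script_parts[: len(release_parts)] == release_parts
    )


print(is_valid_script_version("4.11.0.1", "4.11"))    # dev guide example
print(is_valid_script_version("5.12.1.1", "5.12.1"))  # extends the last release
print(is_valid_script_version("5.13.0.1", "5.12.1"))  # numbered after an unreleased version
```

Comparing dot-separated components rather than raw string prefixes avoids false matches such as `5.12.10` against `5.12.1`.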
@kcondon
Contributor

kcondon commented Jan 19, 2023

Generally works well. Found 2 issues:

  1. Extracting metadata with the API endpoint works on a draft dataset but returns 404 on a published one.
    [Kevin] Retested and this also works. What was confusing is that the list aux file endpoint shows "code":404,"message":"API endpoint does not exist on this server." when really the aux file just isn't there. Once I correctly extracted it and ran the list again, it returned OK. The other part of the confusion is that the external tool isn't appearing for a number of the test files, which gives the impression that the extract didn't work, so you then need to do a list aux file.

  2. One test HDF file is processed and an NcML aux file is produced, but the eye icon for the aux file previewer tool is not displayed.
    [Kevin] I'm seeing many more that don't show the external tool aux file viewer icon, though the aux file is extracted and confirmed by list aux file. Might be worth looking at what the tool thinks when it fails?
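
Given the confusing 404 described in the first item, a client script could interpret the list response defensively. The sketch below is only an illustration: `describe_aux_listing` is a hypothetical helper, and the assumption that a successful listing returns a JSON array is not confirmed by this thread.

```python
import json


def describe_aux_listing(status_code: int, body: str) -> str:
    """Turn a list-aux-files response into a human-readable summary."""
    if status_code == 200:
        entries = json.loads(body)  # assumed to be a JSON array of aux files
        return f"{len(entries)} aux file(s) found"
    if status_code == 404:
        # As observed during QA, a 404 here may simply mean no aux file exists yet.
        message = json.loads(body).get("message", "")
        return f"no aux files found, or wrong path (server said: {message})"
    return f"unexpected status {status_code}"


not_found_body = '{"code":404,"message":"API endpoint does not exist on this server."}'
print(describe_aux_listing(404, not_found_body))
print(describe_aux_listing(200, '[{"formatTag": "NcML", "formatVersion": "0.1"}]'))
```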

@kcondon
Contributor

kcondon commented Jan 20, 2023

Since the first issue was not reproducible and the second affects an external tool, we're merging and will work on the external tool side of things separately.

Development

Successfully merging this pull request may close these issues.

Extract metadata from NetCDF and HDF5 files as XML in NcML format
5 participants