Skip to content

Latest commit

 

History

History
149 lines (77 loc) · 11.3 KB

20221014 - OG1.0 meeting.md

File metadata and controls

149 lines (77 loc) · 11.3 KB

OG1.0 meeting

Date: 12 October 2022, 17:00 - 18:30 CEST

Participants

Victor Turpin, Kevin O'Brien, Justin Buck, Johhn Kerfoot, Gui Castelao, Emma Gardner, Callum Rollo, Jennifer Sevadjian

Agenda (planned)

  1. Introduction Kevin and Jennifer
  2. Report from UG2 meeting
  3. Stepping back from group
  4. Process to finalize OG1.0 by the end of the year

Meeting notes

Primary notetaker: Callum

0. Preamble

Victor: High expectations in US for finishing this format. Victor Presented this plan Data Management session in Seattle. Recommendation of groups raised a lot of issues/challenges from Kevin and others. We want to use ERDDAP as a key element, so issues related to groups cannot be ignored.

Justin: Done trial ingestionsto ERDDAP. Found issues. Even pre-group version does not meet CF criteria. Consensus: IOOS have already solved metadata in a CF way that does not use groups. discussion here https://groups.google.com/g/erddap/c/7g6ecOZZNNU

The key comment from IOOS on a prefered non-group approach Bob/all, As for the N_PARAM-related information, doesn't it make more sense to have for each measured variable (like altitude, bbp700, cdom, chla, ...) to have an "instrument" attribute (matching one of the values of the global "instrument" attribute) and a "units" attribute (which they already have)? "instrument" isn't a "variable attribute" in ACDD, but it seems an appropriate extension. And this approach is simpler in the sense that it is immediately clear that a given variable was measured by a given instrument. What you describe here for variable-level use of the 'instrument' attribute is already recommended by both the NCEI netCDF Templates v2.0, and the IOOS Metadata Profile 1.2 (which inherited the recommendation from the NCEI's template https://www.ncei.noaa.gov/data/oceans/ncei/formats/netcdf/v2.0/index.html ). In the IOOS Metadata profile link above, the 'Convention' column shows which convention each attribute originates from as a reference. IOOS added a few 'instrument' variable attributes that were needed for our purposes (calibration_date, component, discriminant, make_model) as listed in the table.
I don't know very much about the OG format requirements and why n_param and n_sensor dimensions were used instead of this approach. I'm sure there are good reasons. I only want to reinforce your point by adding that the more CF/ACDD-friendly approach is already in use in other communities - and to point anyone looking for a good reference to our Metadata Profile documentation because it spells it all out clearly. At some point, the CF/ACDD standards have limits, however, and the OG format needs may surpass what CF/ACDD can accommodate. Lesser compatibility with software like ERDDAP is the resulting tradeoff, unsurprisingly. Micah

John: Took cdl file and put it on ERDDAP. Data in groups did not make it into ERDDAP. Could be user error, but it is not automatically handled by ERDDAP. What do groups provide that we can't do simpler without them? Issue with combining multiple trajectories, but this is an edge case, so should not dictate the format. Prioritise simplicity. Groups are relatively new and there is no obvious consensus on their use. Groups do not achieve what we want for most use cases

Emma: Why did we vote for groups? Whatis the use case for them? Can ERDDAP ingest these as variables?

John: take e.g. global atrtibutes likek wmoid. These can be promoted to variables. Bob (ERDDAP creator) says if you have things as global attributes can promote them to be variables. But this could explode the number of variables. Would end up with the same thing as just an INSTRUMENT_CTD variables and attatched attributes. Do not see what groups add.

Emma: Was for if we duplicate the sensors

John: If we have two trajectories (two deployments) if the are aggregated in ERDDAP, attributes attatched to the same variable name will get cloberred. Only the attributes form the last netCDF file by mod time will be preserved.

Kevin: only way around this is to promote each piece of metadata like calibration date to variables, with corresponding size. It it's crucial you could just accept having lots of variables on ERDDAP and the user opts to not always download them. This is possible. Makes a lot of work for the ERDDAP manager though.

Gui: Take a few steps back. Suggest disentangling ERDDAP from conventions. Focus on good data structure first. No surprise that file doesn't work with ERDDAP. It is currently broken. Moved to CDL so we could see the format evolve, but it is not a working prototype. Not a priority to aggregate between multiple platforms, as per previous meeting. Some solutions proposed require naming. How do we give different names for different sensors? Instead of aggregating data in the key, we should aggregate somewhere else. Hence groups. Important questions: how do groups violate CF? Can't find how it does. 2. ERDDAP does not see groups, groups are new so very few people are using it, but that is the nature of progress. Advantage of groups is to split the data that should not go together like ADCP data and CTD data which have completely seperate temporal sampling regime. Makes sense to put each sensor in its own unit, so post processers can combine how they want.

John: There is no community consensus on how to use groups. Doesn't necessarily violate CF. NetCDF handles spare data fine if fill values are there. Every dataset has this idea implicitly. It can handle different sampling frequencies.

Victor: Be pragmatic. If we see difficulties with groups and US community has issues with them, should step back from them. It may be too soon to make this move. History of groups: we agreed that aggregating multiple trajectories should not constrain the format. As this is not the priority, we should move to something that IMOS and US GDACs use "encapsulate variables". For practicalities: this has been going too long. Let's use encapsualte variables and get OG1.0 out the door. Must be done by the end of the year.

Justin: Reinforce Victor's comments. This is OG1.0. On ERDDAP side, there are many tools we need this format to be ingested by, e.g. panoply. For first release let's have something that works for CF tools and majority CF users. Meet the community where they are. Need to embed someone in CF, so when they agree on viable use of groups, we bring it in. Get a pragmatic use of encapsulated variables for 1.0. Then pull groups in from CF for a later release.

Gui: What is the issue with groups in CF?

Justin: see note from Micah (first section of this doc). Not against groups, just want a format that the community already used.

Emma: Groups are the future, but they're not utilised.

Gui: Me and Kevin are on CF. Not clear for me what is missing. Not using inheritence or other advanced features. Let's put aside groups for now. If we will use encapsulated variables, need people to step forward and engage/put in PRs. More people need to vote/approve PRs.

Kevin: Agree with Jusin. Let's get a simple format out the door. Groups are needed for multiple trajectories in one file. At this point groups are not needed. ERDDAP can aggregate trajectories down the line

John: With a baseline format you can do a lot. Need that atomic unit of data that is consistent. DACs choose how to aggregate and serve

Victor: Agree with Gui, got to engage more on GitHub. Rules aren't strict. Need just two people to approve a PR.

John: If community consensus is there. Just make sure before a merge that ew understand consequences.

Gui: If no objections, let's remove groups. No objections. Need to formalize governance. Expand voting group. Make clear rules. Get a list of prioritised tasks and volunteers for each task by the end of this meeting.

Justin: Need a list of things that are essential for 1.0. All others can wait until the next version. I try to visit every 2 weeks. Often more like 1 month but 2 weeks is a good goal.

Gui: how to make list of priorities?

Justin: once we have a file with these encapsulated variables we can put it through CF checkers until it works. Then make that 1.0 release.

Emma: Use a defined, agreed upon vocabulary for naming of variables. Want to call variables by model name. e.g. go to L22 vocabulary and use specific name of the instrument for the variable name. Can have an attribute in the measurmetn variable(e.g. TEMP) that points to the instrument. This is what is done in EGO

John: we can desribe an instrument using attributes. Look to BODC etc. to find that accepted vocabulary. John will put together a CDL for this

Gui: Let's focus on a strategy now and get down in the weeds of issues later.

John: should we add new seperate cdl? Or just push changes over the existing cdl.

Gui: We'll end up with multiple cdls, so fine to make more now. Use github milestones. These work as pathways toward how we make OG0.1, OG1.0 etc. This should keep us on track.

John: current two issues marked 0.1 are the correct ones.

Justin: Many currently tagges as 1.0 could be 1.X

Victor: Many 1.0s are typos etc. so should be kept at 1.0. Need the document to be consistent/robust

John: format first. The manual must describe the cdl.

1. Introduction Kevin and Jennifer

Jennifer is a data manager at Scripps. She is taking on Gui's old role.

Kevin is chair of the data group at OCG. Working at NOAA.

  1. Issues tagged - 0.1

2. cdl

John: volunteer to make example PR with encapsualted variables. Need a broader discussion on names of these encapsualted variables. How to handle gliders with e.g two CTDs on the same glider. Need guidelines on how we name variables to make harvesting easier. Will put in a cdl example.This will be uploaded very soon.

Justin: once the cdl works, how do we proceed?

3. governance

Gui: We need to improve governance. Should be done in a small group. John, Kevin, Gui, Justin, Emma. This is essential for using asynchronous tools like github. Request to kevin to oversee governance. Want more rules like how long a voting period lasts. 2 votes not enough time, need to wait a certain amount of time. How do we progress when we do not have consensus? Who votes? Which votes count? Need to expand the voter pool. We should rotate the leaders etc.

Justin: Governance can wait until after 1.0. Suggest a meeting in two weeks. We should have an approved, working cdl by then. After this we can nominate a group of writers. Split into two writing teams: once for governance, one for the spec. Make two big PRs.

Victor: John will propose cdl to be published on github. Then we work on this cdl with issues/PRs. Reserve 0.1 tag for issues related to cdl.

Gui: please review 0.1 tags directly after meeting. Be sure to prioritise all new issues.

Justin: Need a governance focused Issue

4. practicalities

John could use some help with github

Justin not so confident in markdown.

Gui: I will make a github demo. If you have any issues please contact me.

Gui: 15 years ago I was pushing for DAC against resistance. These things take time! The CF community are friendly. Get involved!

5. Agenda toward release

  • cdl and nc should be published next week (john)
  • The team review the cdl (through Github) and meet again in two weeks (victor)
  • OG1.0 must be finalize at the end of the year

6. Next meeting

Around end of October