Develop metadata solution for reporting #77

fschreyer · 2021-07-28T16:07:11Z

Dear all,

Following the remind2 task force meeting (see notes here), we decided that we need a solution to store metadata about mif files, in particular a list of output variables with their definitions. From the discussion, I understood that the solution should include:

- a system to automatically generate a list of all REMIND output variables with their definitions
- a system to make developers add/modify variable definitions to that list
- ideally: a system of quality flags for each of the variables (e.g. "model input", "high confidence", "low confidence")
- information on the origin/setup of the scenario, either in the mif file or in a related metadata file in the run folder, to make it easier to trace mif files back to the actual runs and their configuration

I guess we do not need all features at once, but the key aspect would be to have a system to document variable definitions. Please add to this or correct me if I misrepresented something.

Best,
Felix

cchrisgong · 2021-07-28T16:26:18Z

On the last point, I opened an issue in magclass and quitte:
pik-piam/magclass#101
pik-piam/quitte#19

In my opinion, we can add the run path, model version, and reporting library version as comments at the top of the mif file. That won't be more than 3 lines, so it won't noticeably enlarge the mif file. R scripts reading the mif file will automatically skip these lines.

Variable documentation should, imo, be in a separate file, since it might be large if all variables are defined. However, the comment header above can point to the path of this metadata file, so people can trace a mif to both the run and the bespoke variable definitions for cross comparison.
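
A minimal sketch of the comment-header idea in base R, to make the proposal concrete. The helper names `write_mif_with_header()` and `read_mif_with_header()`, the `# key: value` layout, and all metadata values are assumptions for illustration; they are not part of magclass or quitte, and real mif files may differ in detail (e.g. a trailing `;` per line).

```r
# Hypothetical helpers, not part of magclass or quitte.
write_mif_with_header <- function(data, path, metadata) {
  con <- file(path, open = "w")
  on.exit(close(con))
  # one comment line per metadata entry, e.g. "# run_path: /some/folder"
  writeLines(paste0("# ", names(metadata), ": ", unlist(metadata)), con)
  # semicolon-separated table below the header
  write.table(data, con, sep = ";", quote = FALSE, row.names = FALSE)
}

read_mif_with_header <- function(path) {
  lines <- readLines(path)
  # only the leading block of "#" lines counts as header
  is_header <- cumprod(startsWith(lines, "#")) == 1
  meta_lines <- sub("^#\\s*", "", lines[is_header])
  keys <- sub(":.*$", "", meta_lines)
  values <- trimws(sub("^[^:]*:", "", meta_lines))
  data <- read.table(text = lines[!is_header], sep = ";", header = TRUE,
                     check.names = FALSE, quote = "", comment.char = "",
                     stringsAsFactors = FALSE)
  list(metadata = setNames(as.list(values), keys), data = data)
}

# Illustrative data and placeholder metadata values
mif <- data.frame(Model = "REMIND", Scenario = "Base", Region = "World",
                  Variable = "Emi|CO2", Unit = "Mt CO2/yr",
                  `2030` = 123.4, check.names = FALSE)
path <- tempfile(fileext = ".mif")
write_mif_with_header(mif, path,
                      list(run_path = "/hypothetical/run/folder",
                           model_version = "x.y.z",
                           reporting_version = "x.y.z"))
res <- read_mif_with_header(path)
```

Reading the header into a named list rather than discarding it keeps the run path available to downstream scripts, which is the point of the traceability requirement.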

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q · 2021-10-14T13:20:58Z

Turning the eye of ~~Sauron~~ LOD-GEOSS on this issue …
As for variable definitions, this is an ongoing task in LOD-GEOSS, where this is pursued in tedious, painful detail. @giannou is involved with that, too. In the meantime, https://github.com/openENTRANCE/nomenclature might be a useful building block for this.

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q · 2021-10-15T08:05:20Z

> On the last point, I opened an issue in magclass and quitte:
> pik-piam/magclass#101
> pik-piam/quitte#19

I updated quitte (0.3093.0). read.quitte() now ignores any header of comments, and write.mif() can add such a header. But since the number of .mif files written by write.mif() per year is probably in the single digits, this feature depends on magclass, and RSE will need some prodding to look into this.

> a system to automatically generate a list of all REMIND output variables with their definitions

Tall order. This will require a system parallel to remind2::convGDX2MIF(), which has to be updated alongside the individual reportX() functions.
As a sketch, we could have a function variable_definitions() that in turn calls variable_definition_X() functions that return definitions for all variables returned by reportX() functions. The variable_definition_X() functions would go into the same files as the reportX() functions and we could automatically test that all variables returned by convGDX2MIF() also appear in the output of variable_definitions().
Problem is, the set of variables in the returned .mif file depends on the module realisations of the .gdx it is based on. Or at least will, since industry/fixed_shares will not report subsector information, only aggregate industry information. So that is something to be worked out. Possibly we can test the output of several .mif files with different realisations collectively against the variable definitions.
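
The sketch described above could look roughly like this in base R. Everything here is a placeholder under stated assumptions: the `variable_definition_A()`/`variable_definition_B()` functions, the `compare_to_definitions()` helper, and the `Example|…` variable names are hypothetical, and the real reportX() functions return magclass objects read from a .gdx rather than plain character vectors.

```r
# Hypothetical definition functions; each would live in the same file as
# the corresponding reportX() function.
variable_definition_A <- function() {
  c("Example|Industry (EJ/yr)" = "placeholder definition of the aggregate")
}
variable_definition_B <- function() {
  c("Example|Industry|Steel (EJ/yr)" = "placeholder definition of a subsector")
}

# Collect all definitions into one named character vector.
variable_definitions <- function() {
  c(variable_definition_A(), variable_definition_B())
}

# Compare reported variable names (one vector, or a list of vectors from
# several runs) against the definitions.
compare_to_definitions <- function(reported,
                                   definitions = variable_definitions()) {
  reported <- unique(unlist(reported))
  list(undefined  = setdiff(reported, names(definitions)),
       unreported = setdiff(names(definitions), reported))
}

# Variable names as they might come out of two runs with different
# module realisations: one reports only the aggregate.
aggregate_only  <- c("Example|Industry (EJ/yr)")
with_subsectors <- c("Example|Industry (EJ/yr)",
                     "Example|Industry|Steel (EJ/yr)")

single     <- compare_to_definitions(aggregate_only)
collective <- compare_to_definitions(list(aggregate_only, with_subsectors))
```

Checked alone, the aggregate-only run leaves the subsector definition unmatched; checked collectively, the union of both runs covers every definition, which is the behaviour the collective test would rely on.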

> a system to make developers add/modify variable definitions to that list

Since variables would only be added or changed when code is added or changed to/in remind2, that system should be remind2 code as well.

> ideally: a system of quality flags for each of the variables (e.g. "model input", "high confidence", "low confidence")

We would need a consensus on what these flags mean. There's some work on data quality being done in LOD-GEOSS; I can poke around to see whether they came up with something useful. "Confidence" is a term implying a quality that isn't actually what we want to communicate. It is probably more useful to distinguish between "proper model outputs" and "downscaled figures".

fschreyer · 2021-10-18T10:50:22Z

Ok, thanks for the comments, Michaja. I added you to the reporting task force email where we will meet next week again. We can discuss there.

fschreyer assigned Loisel, giannou and Renato-Rodrigues Jul 28, 2021
