-
-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hardcoded paths in MDIMs make future updates harder #3723
Comments
We discussed this in triage today. @Marigold thought this could be good to tackle at a time when we are fixing a related issue, to make MDIMs on the ETL side...
as it does now. |
To clarify, this should not include refactoring the currently existing mdims (e.g. covid). That should happen as a separate issue. |
This was fixed by #3945 It's now sufficient to specify |
That sounds like a great solution, thanks a lot @Marigold! |
Context
Currently, mdim steps require a config yaml file, which includes full paths of indicators in tables of grapher steps.
Potential problems
Possible solution
Treat the MDIM yaml as a template, where we fill in some variables at the time we ship it.
Ideally, we would avoid hardcoding paths (in steps and yaml files), and all dependencies would be specified in the DAG.
After a discussion with @lucasrodes we thought of a possible solution. The config yaml file (for example of the
covid
mdim step) could have a special placeholder for a dataset path, e.g.{ds:short_name}
(e.g.{ds:covid_cases}
), specifying the short name of a dataset listed as a dependency of the mdim step in the DAG.Then, the function
paths.load_mdim_config
would read the config yaml file, and replace those placeholders by the full URIs of the corresponding dataset.Possible rabbit holes or related issues
{ds:custom_short_name}
, and then pass a dictionary topaths.load_mdim_config
mapping those custom short names to the corresponding dataset URI.Table
does not have an URI, and we rely onTable.metadata.dataset.uri
. Maybe tables should also have a URI attribute.paths
to get the URI of a table in a dataset. Currently, the way we'd do that is by, e.g. `paths.load_dataset("dataset_path...") + "/table_name...".Impact
We're not encountering this problem so much yet, but it's more that we are currently setting precedents on how a large amount of work will be done, so we're interested in saving ourselves future work by getting this right.
The text was updated successfully, but these errors were encountered: