Allow Derived datasets to be parents, if one of their ancestors is Full #171

Open
14 tasks
noracato opened this issue Mar 4, 2025 · 0 comments


This is a small development plan for allowing any Dataset::Derived to be a parent, as long as one ancestor remains Dataset::Full. This is the first step of the new dataset pipelines project.

There are three identified steps to be taken in Atlas:

  1. Every dataset can be a parent, not just full datasets
  2. Validation of complete parent/grandparent
  3. Chain fallback locations to grandparents etc to pick up the correct curves

I will describe what needs to happen for each of them.

Prologue

Before we start, let's clean up some deprecated code and methods with confusing names from the Dataset and Dataset::Derived models.

Initializer Inputs

For as long as I can remember these have been deprecated, but the code is still present in the Dataset::Derived model. These inputs were used to initialise Derived sets that did not originate from ETLocal and thus had no graph values.

  • Remove the InitializerInput model
  • Remove any references to and validations of InitializerInput from the Dataset::Derived model. This includes everything surrounding uses_deprecated_initializer_inputs
  • Remove any affected specs

PARENT_VALUE

In Runtime, Atlas defines methods that can be called for the dataset within nodes and edges to build the present graph (so for Refinery). These include methods like EB and PRIMARY_PRODUCTION. The method PARENT_VALUE has been unused for a long time and references an obsolete CSV file, demands/parent_values.csv, within the dataset. Its name could become confusing while we work on this project, as well as in the future, so I'd like to get rid of it.

  • Remove PARENT_VALUE from Atlas::Runtime
  • Remove parent_values method from Dataset
  • Check whether any of these obsolete CSVs are still hanging around in ETSource
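
The last check could be scripted. This is only a sketch: the helper name obsolete_parent_value_files and the 'datasets' root directory are assumptions for illustration, not existing Atlas or ETSource code. It searches below a given root for any demands/parent_values.csv.

```ruby
# Returns every demands/parent_values.csv found below +root+, sorted.
# The method name and the directory layout are assumptions for this sketch.
def obsolete_parent_value_files(root)
  Dir.glob(File.join(root, '**', 'demands', 'parent_values.csv')).sort
end

# 'datasets' is a hypothetical path to the datasets folder of an ETSource
# checkout; nothing is printed when no obsolete files are left.
obsolete_parent_value_files('datasets').each { |path| puts path }
```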

1. Every dataset can be a parent, not just full datasets

Our first step does not include validation of a Full ancestor yet. We are merely setting up the hierarchy.

In the Dataset::Derived model, the parent should be able to be a Derived dataset:

def parent
  Dataset.find(base_dataset) # Was Dataset::Full.find(base_dataset)
end

The same goes for validate_presence_of_base_dataset.

  • Adjust parent method
  • Adjust validate_presence_of_base_dataset method
  • Adjust affected specs
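
To illustrate the resulting hierarchy, here is a minimal in-memory sketch. The ParentSketch module and its REGISTRY hash are stand-ins invented for this example; the real models are document-backed, so Dataset.find only approximates the actual lookup.

```ruby
module ParentSketch
  # Hypothetical in-memory registry standing in for Atlas' document lookup.
  REGISTRY = {}

  class Dataset
    attr_reader :key

    def initialize(key)
      @key = key
      REGISTRY[key] = self
    end

    # Finds a dataset of any type by key; raises KeyError if it is unknown.
    def self.find(key)
      REGISTRY.fetch(key)
    end
  end

  class Full < Dataset; end

  class Derived < Dataset
    attr_reader :base_dataset

    def initialize(key, base_dataset)
      super(key)
      @base_dataset = base_dataset
    end

    # The parent may now be Full or Derived.
    def parent
      Dataset.find(base_dataset)
    end
  end
end

nl       = ParentSketch::Full.new(:nl)
province = ParentSketch::Derived.new(:province, :nl)
town     = ParentSketch::Derived.new(:town, :province)

puts town.parent.key         # prints "province"
puts town.parent.parent.key  # prints "nl"
```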

2. Validation of complete parent/grandparent

Because a Full dataset is the only one that can use EB methods, and those are still in use, we need to ensure that one ancestor is still a Dataset::Full and the energy_balance method of the Derived dataset is delegated to that set. Otherwise Refinery will not be able to build the graph.

  • Add a spec that creates a hierarchy of three datasets Full -> Derived -> Derived and test the energy_balance method on the grandchild. (It could very well be that this will pass directly)
  • Add a spec that creates a hierarchy of three datasets Derived -> Derived -> Derived and test the creation of the grandchild. Test if the creation throws a validation error. (This we will build in the next todo)
  • Add validation method validate_presence_of_full_ancestor. It will be something like this:

def has_full_parent?
  Dataset::Full.exists?(base_dataset) || parent.has_full_parent?
end

def validate_presence_of_full_ancestor
  return if has_full_parent?

  errors.add(:base_dataset, 'has no Full parent')
end
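
One thing to watch with a recursive check is a chain that never reaches a Full dataset, for example two Derived datasets that point at each other, or a parent that does not exist. The sketch below walks the chain iteratively and keeps track of visited keys. Everything in it (the AncestorSketch module, REGISTRY, full_ancestor, the Struct-based models) is hypothetical and only illustrates the idea; it is not the Atlas implementation.

```ruby
require 'set'

module AncestorSketch
  # Hypothetical in-memory lookup standing in for Dataset.find.
  REGISTRY = {}

  Full = Struct.new(:key)

  Derived = Struct.new(:key, :base_dataset) do
    # Walks up the chain and returns the first Full dataset, or nil when a
    # parent is missing or the chain loops back on itself.
    def full_ancestor
      seen = Set.new([key])
      current = REGISTRY[base_dataset]

      while current.is_a?(Derived)
        return nil unless seen.add?(current.key)

        current = REGISTRY[current.base_dataset]
      end

      current if current.is_a?(Full)
    end

    # Collects validation messages; empty when a Full ancestor exists.
    def errors
      full_ancestor ? [] : ['base_dataset has no Full parent']
    end
  end

  def self.add(dataset)
    REGISTRY[dataset.key] = dataset
  end
end

AncestorSketch.add(AncestorSketch::Full.new(:nl))
AncestorSketch.add(AncestorSketch::Derived.new(:province, :nl))
town = AncestorSketch.add(AncestorSketch::Derived.new(:town, :province))

# Two Derived datasets that only point at each other.
loop_a = AncestorSketch.add(AncestorSketch::Derived.new(:loop_a, :loop_b))
AncestorSketch.add(AncestorSketch::Derived.new(:loop_b, :loop_a))

p town.full_ancestor.key  # prints :nl
p loop_a.errors           # prints ["base_dataset has no Full parent"]
```

Returning the Full ancestor itself, rather than only true or false, would also give the Derived dataset something to delegate energy_balance to.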

3. Chain fallback locations to grandparents etc to pick up the correct curves

A Dataset::Derived may be incomplete, because it picks up any items it is missing, such as curves, from its parent.

For curves and other CSVs, the PathResolver looks in all supplied locations (dataset folders). For each Derived dataset the PathResolver is initialised with the result of the method resolve_paths, which returns an array of locations to be inspected in order of importance. We should alter this method to recursively look at the parent sets. It will be something like this:

def resolve_paths
  # Own directory first, then the parent's locations, as one flat array
  [dataset_dir] + parent.resolve_paths
end

  • Update the resolve_paths method in Dataset::Derived
  • Update and write new specs to ensure correct behaviour of located curves etc.
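
A small sketch of the chained lookup order follows. It assumes that a Full dataset also responds to resolve_paths and returns only its own directory, which is not confirmed here; the PathSketch module and the directory names are made up for the example. Note that Array#<< would append the parent's array as a single nested element, so the sketch uses a splat to keep the result flat.

```ruby
module PathSketch
  # Assumption: a Full dataset only resolves to its own directory.
  Full = Struct.new(:dataset_dir) do
    def resolve_paths
      [dataset_dir]
    end
  end

  Derived = Struct.new(:dataset_dir, :parent) do
    # Own directory first, then every ancestor's, as one flat array.
    def resolve_paths
      [dataset_dir, *parent.resolve_paths]
    end
  end
end

nl       = PathSketch::Full.new('datasets/nl')
province = PathSketch::Derived.new('datasets/province', nl)
town     = PathSketch::Derived.new('datasets/town', province)

p town.resolve_paths
# prints ["datasets/town", "datasets/province", "datasets/nl"]
```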

Epilogue: Atlas::Scaler

We have now made it possible for Derived datasets to have children. However, Scaled datasets (these are Derived datasets with a scaler attached) look only at their direct parent for scaling. If this direct parent is not Full, this could lead to some fallout. I did not check this thoroughly, as I'd like to discuss our vision on scaled datasets first.

@mabijkerk Let's discuss in our brainstorm how we see the future of scaled datasets and check if and how they are still in use.

NB: if there is time to remove more stuff, I'd like to get rid of the old Preset model still present in Atlas. Presets represent what we now have as featured scenarios, in CSV format. They have not been used for years.
