Allow Derived datasets to be parents if one of their ancestors is Full #171

noracato · 2025-03-04T16:16:37Z

This is a small development plan for allowing any Dataset::Derived to be a parent, as long as at least one of its ancestors is a Dataset::Full. It is the first step of the new dataset pipelines project.

We have identified three steps to take in Atlas:

1. Every dataset can be a parent, not just full datasets
2. Validation of complete parent/grandparent
3. Chain fallback locations to grandparents etc. to pick up the correct curves

I will describe what needs to happen for each of them.

Prologue

Before we start, let's clean up some deprecated code and methods with confusing names from the Dataset and Dataset::Derived models.

Initializer Inputs

For as long as I can remember these inputs have been deprecated, but the code is still present in the Dataset::Derived model. They were used to initialise Derived sets that did not originate from ETLocal and therefore had no graph values.

- Remove the InitializerInput model
- Remove all references to and validations of InitializerInput from the Dataset::Derived model, including everything surrounding uses_deprecated_initializer_inputs
- Remove any affected specs
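To check that nothing is left behind after the removal, a small search over the Atlas source can help. This is only a sketch: the helper name leftover_references and the file extensions it scans are my own choices, not existing Atlas tooling.

```ruby
require 'find'

# Hypothetical helper: lists "path:line" for every line under +root+ that
# still mentions one of the given identifiers.
def leftover_references(root, identifiers, extensions: %w[.rb .yml])
  pattern = Regexp.union(identifiers) # strings are escaped, so matching is literal
  hits = []

  Find.find(root) do |path|
    next unless File.file?(path) && extensions.include?(File.extname(path))

    File.foreach(path).with_index(1) do |line, lineno|
      hits << "#{path}:#{lineno}" if line.match?(pattern)
    end
  end

  hits
end

# Example, run from an Atlas checkout:
# puts leftover_references('.', %w[InitializerInput uses_deprecated_initializer_inputs])
```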

PARENT_VALUE

In Atlas::Runtime, Atlas defines methods that can be called on the dataset from within nodes and edges to build the present graph (i.e. for Refinery). These include methods like EB and PRIMARY_PRODUCTION. The PARENT_VALUE method has been unused for a long time and references an obsolete CSV file, demands/parent_values.csv, within the dataset. Its name could become confusing, both while we work on this project and in the future, so I'd like to get rid of it.

- Remove PARENT_VALUE from Atlas::Runtime
- Remove the parent_values method from Dataset
- Check ETSource for any of these obsolete CSV files still lingering
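For the ETSource check, a glob along these lines should list any leftover files. It assumes the CSVs still sit at demands/parent_values.csv inside each dataset folder, as described above; the method name is hypothetical.

```ruby
# Sketch: lists obsolete parent_values.csv files anywhere below
# +etsource_root+, assuming they live in a "demands" folder of a dataset.
def obsolete_parent_value_csvs(etsource_root)
  # "**" matches zero or more directory levels.
  Dir.glob(File.join(etsource_root, '**', 'demands', 'parent_values.csv')).sort
end

# Example (path to the ETSource checkout is an assumption):
# obsolete_parent_value_csvs('../etsource').each { |path| puts path }
```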

1. Every dataset can be a parent, not just full datasets

Our first step does not include validation of a Full ancestor yet. We are merely setting up the hierarchy.

In the Dataset::Derived model, the parent should also be allowed to be a Derived dataset:

```ruby
def parent
  Dataset.find(base_dataset) # was Dataset::Full.find(base_dataset)
end
```

The same goes for validate_presence_of_base_dataset.

- Adjust parent method
- Adjust validate_presence_of_base_dataset method
- Adjust affected specs
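To illustrate what the change in parent does, here is a self-contained sketch with hypothetical in-memory stand-ins for the Dataset classes. It assumes that calling find on a subclass only returns documents of that subclass (returning nil otherwise), which is why a Dataset::Full lookup cannot see a Derived parent. The real Atlas lookup may behave differently, for example by raising when nothing is found.

```ruby
# Hypothetical in-memory stand-ins for Atlas' Dataset classes.
class Dataset
  REGISTRY = {} # key => dataset, shared by all subclasses

  attr_reader :key

  def initialize(key)
    @key = key
    REGISTRY[key] = self
  end

  # Assumption: find on a subclass only returns datasets of that subclass,
  # and returns nil (rather than raising) when nothing matches.
  def self.find(key)
    found = REGISTRY[key]
    found if found.is_a?(self)
  end
end

class Dataset::Full < Dataset; end

class Dataset::Derived < Dataset
  attr_reader :base_dataset

  def initialize(key, base_dataset)
    super(key)
    @base_dataset = base_dataset
  end

  # Old behaviour: only a Full dataset is accepted as parent.
  def old_parent
    Dataset::Full.find(base_dataset)
  end

  # New behaviour: any dataset, including a Derived one, can be the parent.
  def parent
    Dataset.find(base_dataset)
  end
end
```

With a Full -> Derived -> Derived chain, old_parent on the grandchild returns nil, while parent returns the Derived middle dataset.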

2. Validation of complete parent/grandparent

Because only a Full dataset can use the EB methods, and those are still in use, we need to ensure that at least one ancestor is a Dataset::Full and that the Derived dataset's energy_balance method is delegated to that set. Otherwise, Refinery will not be able to build the graph.
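Why the Full -> Derived -> Derived spec may pass straight away: assuming each Derived dataset delegates energy_balance to its direct parent, the call simply walks up the chain until it reaches the Full dataset. A sketch with hypothetical FullDataset and DerivedDataset classes (not Atlas' own) to show the idea:

```ruby
# Hypothetical stand-in for a Full dataset that owns the energy balance.
class FullDataset
  attr_reader :energy_balance

  def initialize(energy_balance)
    @energy_balance = energy_balance
  end
end

# Hypothetical stand-in for a Derived dataset.
class DerivedDataset
  attr_reader :parent

  def initialize(parent)
    @parent = parent
  end

  # Delegates to the parent; a Derived parent delegates onwards in turn.
  def energy_balance
    parent.energy_balance
  end
end

grandchild = DerivedDataset.new(DerivedDataset.new(FullDataset.new({ 'final_demand' => 42.0 })))
grandchild.energy_balance # the Full dataset's hash
```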

- Add a spec that creates a hierarchy of three datasets, Full -> Derived -> Derived, and tests the energy_balance method on the grandchild. (This may well pass straight away.)
- Add a spec that creates a hierarchy of three datasets, Derived -> Derived -> Derived, and tests that creating the grandchild fails validation. (We build this validation in the next task.)
- Add a validation method, validate_presence_of_full_ancestor. It will be something like this:

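```ruby
# True when the direct parent is Full, or when a Derived parent itself
# has a Full ancestor further up the chain.
def has_full_parent?
  Dataset::Full.exists?(base_dataset) ||
    (parent.is_a?(Dataset::Derived) && parent.has_full_parent?)
end

def validate_presence_of_full_ancestor
  return if has_full_parent?

  errors.add(:base_dataset, 'has no Full ancestor')
end
```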
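The snippet above recurses through parent and relies on the lookup succeeding. Below is a standalone sketch of the same check, written iteratively, that also stops on a missing base dataset or an accidental cycle (A -> B -> A). The Struct types and the registry hash are hypothetical stand-ins for Atlas' documents.

```ruby
require 'set'

# Hypothetical stand-ins: a Full dataset and a Derived one pointing at its base.
Full = Struct.new(:key)
Derived = Struct.new(:key, :base_dataset)

# Walks up from +dataset+; true as soon as a Full dataset is reached.
def full_ancestor?(dataset, registry)
  seen = Set.new
  current = dataset

  while current.is_a?(Derived)
    return false unless seen.add?(current.key) # already visited: cycle

    current = registry[current.base_dataset]   # nil when the parent is missing
  end

  current.is_a?(Full)
end
```

Walking iteratively with a visited set means a misconfigured chain yields a validation error instead of a stack overflow.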

3. Chain fallback locations to grandparents etc to pick up the correct curves

A Dataset::Derived may be incomplete: it relies on its parent for items it lacks, such as curves. If that parent is itself a Derived dataset, it may be missing those items too.

For curves and other CSVs, the PathResolver looks in all supplied locations (dataset folders). For each Derived dataset, the PathResolver is initialised using resolve_paths, which returns an array of locations to inspect in order of priority. We should change this method so that it recursively includes the locations of the parent datasets. It will be something like this:

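```ruby
# Own directory first, then the parent's locations (and so on up the chain),
# as one flat, ordered array. Assumes Dataset::Full also responds to
# resolve_paths, returning its own directory.
def resolve_paths
  [dataset_dir, *parent.resolve_paths]
end
```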
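One detail to watch in resolve_paths: appending with << pushes the parent's whole array in as a single nested element, while a splat (or +) keeps the result flat and ordered nearest-first. A small illustration with a hypothetical Node struct standing in for the datasets:

```ruby
# Hypothetical stand-in: each node knows its own directory and its parent.
Node = Struct.new(:dataset_dir, :parent) do
  # Nested variant: << pushes the parent's whole array as one element.
  def nested_paths
    parent ? [dataset_dir] << parent.nested_paths : [dataset_dir]
  end

  # Flat variant: own directory first, then every ancestor's, nearest first.
  def resolve_paths
    parent ? [dataset_dir, *parent.resolve_paths] : [dataset_dir]
  end
end

nl = Node.new('datasets/nl', nil)
region = Node.new('datasets/region', nl)
town = Node.new('datasets/town', region)

town.nested_paths  # => ["datasets/town", ["datasets/region", ["datasets/nl"]]]
town.resolve_paths # => ["datasets/town", "datasets/region", "datasets/nl"]
```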

- Update the resolve_paths method in Dataset::Derived
- Update existing specs and write new ones to ensure curves and other files are located correctly
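For the new specs, it helps to be explicit about the lookup we expect. Assuming first-match semantics, the first location in the ordered list that contains the file wins, so a curve in the parent shadows the same curve in the grandparent. A minimal sketch of that behaviour; resolve_file is hypothetical and not the actual PathResolver API.

```ruby
# Returns the first existing path for +relative_path+ across the ordered
# +locations+ (nearest dataset first), or nil when no location has it.
def resolve_file(locations, relative_path)
  locations
    .map { |dir| File.join(dir, relative_path) }
    .find { |path| File.exist?(path) }
end
```

In a spec, this translates to three cases: a file in the nearest set wins, a file only in the grandparent is still found, and a file nowhere resolves to nothing.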

Epilogue: `Atlas::Scaler`

At this point, Derived datasets can have children. However, scaled datasets (Derived datasets with a scaler attached) look only at their direct parent when scaling. If that direct parent is not Full, this could cause problems. I have not checked this thoroughly, as I'd like to discuss our vision for scaled datasets first.

@mabijkerk Let's discuss in our brainstorm how we see the future of scaled datasets and check if and how they are still in use.

NB: if there is time to remove more, I'd like to get rid of the old Presets still present in Atlas. They are a CSV-format version of what we now have as featured scenarios, and they have not been used for years.


noracato mentioned this issue on Mar 6, 2025 in quintel/etlocal#579, Allow Derived datasets to be parents (open, 5 tasks).
