Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include optimization parameter basis #79

Merged

Conversation

glatterf42
Copy link
Member

As promised in a recent team meeting, the next steps for the message_ix data model work will use a more trunk-based development approach. This is where I got to today with introducing optimization.Parameters. The tests for creating and getting them pass for the DB layer, others are still disabled. Ruff might complain that some temporarily-commented lines in the test file are now too long.
If the tests run at all, maybe they only trigger on PRs to main. I could enable them for PRs to include/optimization-parameter as well or we merge this one and then set up a soon-to-be-final PR to main to collect further changes.

I'm not yet sure this is the best way to handle the values and units: as separate lists. I will test if this behaves as desired/expected and will rework if it doesn't.

@glatterf42 glatterf42 added the enhancement New feature or request label Apr 15, 2024
@glatterf42 glatterf42 self-assigned this Apr 15, 2024
import importlib.metadata

from ixmp4.core import IndexSet as IndexSet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we really need to import all of these objects at the top-level? I guess that except for Platform, we will never call them directly, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if some downstream packages already use that syntax. We seem to only use it for our tests, where we regularly do from ixmp4 import Variable, from ixmp4 import Model, or from ixmp4 import IndexSet. I don't think this is necessary, but I also don't know what harm it's causing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are no downstream packages yet, so better to keep the top-namespace lean.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So out of all these imports, which would we want to keep? Only Platform?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd at least not add more imports at this point, and we can do a separate remove-imports-PR later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yes, I think that only Platform is the one that is already used - at least in pyam.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another useful one is "Datapoint" because we sometimes need the "Datapoint.Type" enum in user code.
But i think its time we find a better solution for that anyway...

from ixmp4.data.abstract.optimization import Column


class Parameter(BaseModelFacade):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not a child-object of the Table?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are extremely similar and to some extent, we could have Parameter inherit from Table. I didn't do that for now because I thought if we then go and update Table, it might not be obvious that Parameter changes, too. Maybe the cleanest option is to set up a BaseOptimizationItem somewhere that contains the common ground between Table, Parameter, Variable, and Equation, and have all of them inherit from there.

Copy link
Member

@danielhuppmann danielhuppmann Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there is any feature in Table that we would not want to be inherited by the other optimization-objects... So better to keep the codebase simple for now and extend later once this becomes relevant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not as straightforward as I thought it would be. Naively trying to extract the maximum number of properties to a common base, I receive

E   sqlalchemy.exc.NoForeignKeysError: Can't determine the inherit condition between inherited table 'optimization_baseoptimizationitemmodel' and inheriting table 'optimization_table'; tables have no foreign key relationships established.  Please ensure the inheriting table has a foreign key relationship to the inherited table, or provide an 'on clause' using the 'inherit_condition' mapper argument.

So it looks like if I set up columns in a common base class, sqlalchemy interprets that as a separate table and inheriting tables need to be linked via ForeignKeys, which is not what we want.

@glatterf42
Copy link
Member Author

glatterf42 commented Apr 26, 2024

I can't quite get the connection between Parameter and Unit to work. In theory, we want every Parameter to be able to have multiple Units and every Unit (defined for the RunPlatform) should be usable by multiple Parameters. The relationship pattern that most closely resembles this is a many to many relationship.
These typically work via an association table, i.e. a table in-between Parameter and Unit that simply holds Parameter.id and Unit.id pairs, so that each Parameter and Unit can refer to all pairs that include their own id. However, this association table wants to be as efficient as possible, so it half expects and is half recommended to only contain every pair of (Parameter.id, Unit.id) just once, and all examples I can find online only show how to set the relationship up that way.
Unfortunately, our needs are more complicated since we want Parameter.units to be a list of Units, which are not necessarily distinct (if I understand correctly, please correct me if I'm wrong). So a Parameter may contain several rows of data with the same Unit, which may even be ordered arbitrarily. Thus, our storage of this Unit list needs to preserve order and be able to store the same Unit more than once.
With the current many-to-many setup, I have not been able to make this work. I've also read up on association objects and association proxy in general and I'm not sure we can make it work. We would need to create a purposefully-inefficient table between Parameter and Unit or figure out a way to do something like this:

  1. Receive a list of Units (via unit.name, so we need to get() the Units first)
  2. Create entries in the association table with Parameter.id and Unit.ids
  3. Set Parameter.units to the Unit objects associated with Parameter, keeping the order of the input list and potentially get()ting the Units first

The advantages would be that other layers (such as the facade layer) would be able to access Parameter.units directly without having to get() Units first; that we could keep as many always-present columns as possible outside of the JSON field data (possibly increasing performance); that the relation between Parameter and Unit would be baked into the DB layer and apparent from there.

The alternative would be to simply keep units (and values, while we're at it) parts of data. This would eliminate both the need to care about order as this is taken care of by JSON and the need to invest more time now to come up with a solution that seems to be pretty non-standard, so would likely be fragile. We could also somewhat bake the relationship into the DB layer through the validator function for data. In there, we could try to get() all distinct Unit names once and raise an error if a Unit doesn't exist. This way, we would still ensure that all units (which we would store as their unit.names) exist in the Unit list of the RunPlatform, as I would understand the current ixmp docs. The only real downside would be that we would return a Parameter which doesn't come with the Unit objects pre-loaded, so users would have to manually get() them if they need them. That sounds acceptable to me.

@danielhuppmann, please let me know what you think.

@danielhuppmann
Copy link
Member

@glatterf42, I guess that this PR should now be directed to the main branch. The files-changed section show quite a number of changes that are already merged.

@danielhuppmann

This comment was marked as resolved.

@glatterf42
Copy link
Member Author

glatterf42 commented Apr 26, 2024

@glatterf42, I guess that this PR should now be directed to the main branch. The files-changed section show quite a number of changes that are already merged.

The idea was to merge this PR to include/optimization-parameter, which could collect changes to this part of the DB model before it's merged to main. That collection branch should also recognize which commits are already part of main, but maybe this one doesn't since it doesn't know where the collection branch is aiming.
The reason I didn't merge it yet was because I wanted to address your question why Parameter doesn't inherit from Table, which I did locally already based on #82, but didn't push here yet because I wanted to still expand the tests a little, too -- which was when I encountered the current issue.
And yes, that is technically not in line with trunk-based development and I should have just focused on the mixins here.

@danielhuppmann
Copy link
Member

danielhuppmann commented Apr 26, 2024

I guess that you have to rebase include/optimization-parameter onto main (now and every time other changes are merged there).

@glatterf42
Copy link
Member Author

I guess that you have to rebase optimization\parameter onto main (now and every time other changes are merged there).

True; at the moment, I'd have to rebase the mixin branch on top of main, the collection branch on top of that, and this branch on top of the new collection branch.

@danielhuppmann
Copy link
Member

danielhuppmann commented Apr 26, 2024

Re the actual question, without looking at the code yet (because of the rebase-branch-mixin-mixup):

I think that also our implementation of the "foreign-keys" using index-sets and columns can only be validated at the code level (not the database). Therefore, I think it's ok that we also do that for units, hence storing the columns plus units plus values in data.

@glatterf42
Copy link
Member Author

Great, thanks :)

glatterf42 added a commit that referenced this pull request Apr 26, 2024
* Does not work currently, see #79
* commit for future reference for many-to-many relationship setup attempt
@glatterf42
Copy link
Member Author

glatterf42 commented Apr 26, 2024

With this clarification, all uncommented tests should now work (and that's even more than before), so I'd be in favor of merging this PR and opening another one for the remaining tests and code layers :)

* Covers:
* run__id, data, name, uniqueness of name together with run__id
* Adapts tests since default order of columns changes
* Does not work currently, see #79
* commit for future reference for many-to-many relationship setup attempt
@glatterf42 glatterf42 force-pushed the include/optimization-parameter-basis branch from 5bd5a23 to efd1160 Compare June 27, 2024 12:30
* Enable remaining parameter tests
* Include optimization parameter api layer (#89)
* Bump several dependency versions
* Let api/column handle both tables and parameters
* Make api-layer tests pass
* Include optimization parameter core layer (#90)
* Enable parameter core layer and test it
* Fix things after rebase
* Ensure all intended changes survive the rebase
* Adapt data validation function for parameters
* Allow tests to pass again
@glatterf42 glatterf42 merged commit 5c8714d into include/optimization-parameter Jun 28, 2024
9 checks passed
@glatterf42 glatterf42 deleted the include/optimization-parameter-basis branch June 28, 2024 08:45
glatterf42 added a commit that referenced this pull request Jul 10, 2024
* Make Column generic enough for multiple parents
* Introduce optimization.Parameter
* Add tests for add_data
* Enable remaining parameter tests (#86)
* Enable remaining parameter tests
* Include optimization parameter api layer (#89)
* Bump several dependency versions
* Let api/column handle both tables and parameters
* Make api-layer tests pass
* Include optimization parameter core layer (#90)
* Enable parameter core layer and test it
* Fix things after rebase
* Ensure all intended changes survive the rebase
* Adapt data validation function for parameters
* Allow tests to pass again
glatterf42 added a commit that referenced this pull request Aug 22, 2024
* Make Column generic enough for multiple parents
* Introduce optimization.Parameter
* Add tests for add_data
* Enable remaining parameter tests (#86)
* Enable remaining parameter tests
* Include optimization parameter api layer (#89)
* Bump several dependency versions
* Let api/column handle both tables and parameters
* Make api-layer tests pass
* Include optimization parameter core layer (#90)
* Enable parameter core layer and test it
* Fix things after rebase
* Ensure all intended changes survive the rebase
* Adapt data validation function for parameters
* Allow tests to pass again
glatterf42 added a commit that referenced this pull request Sep 30, 2024
* Make Column generic enough for multiple parents
* Introduce optimization.Parameter
* Add tests for add_data
* Enable remaining parameter tests (#86)
* Enable remaining parameter tests
* Include optimization parameter api layer (#89)
* Bump several dependency versions
* Let api/column handle both tables and parameters
* Make api-layer tests pass
* Include optimization parameter core layer (#90)
* Enable parameter core layer and test it
* Fix things after rebase
* Ensure all intended changes survive the rebase
* Adapt data validation function for parameters
* Allow tests to pass again
glatterf42 added a commit that referenced this pull request Oct 1, 2024
* Make Column generic enough for multiple parents
* Introduce optimization.Parameter
* Add tests for add_data
* Enable remaining parameter tests (#86)
* Enable remaining parameter tests
* Include optimization parameter api layer (#89)
* Bump several dependency versions
* Let api/column handle both tables and parameters
* Make api-layer tests pass
* Include optimization parameter core layer (#90)
* Enable parameter core layer and test it
* Fix things after rebase
* Ensure all intended changes survive the rebase
* Adapt data validation function for parameters
* Allow tests to pass again
glatterf42 added a commit that referenced this pull request Oct 3, 2024
* Include optimization parameter basis (#79)
* Fix references to DB filters in docs
* Streamline naming in tests
* Fix and test parameter list and tabulate for specific runs
* Make indexset-creation a test utility
* Incorporate changes from #110
* Use pandas for updated add_data behaviour
* Raise minimum pandas version to enable add_data upsert
* Generalize UsageError for more optimization items
* Use generalized UsageError for Table
* Use own errors for Parameter
glatterf42 added a commit that referenced this pull request Oct 3, 2024
* Make Column generic enough for multiple parents
* Introduce optimization.Parameter
* Add tests for add_data
* Enable remaining parameter tests (#86)
* Enable remaining parameter tests
* Include optimization parameter api layer (#89)
* Bump several dependency versions
* Let api/column handle both tables and parameters
* Make api-layer tests pass
* Include optimization parameter core layer (#90)
* Enable parameter core layer and test it
* Fix things after rebase
* Ensure all intended changes survive the rebase
* Adapt data validation function for parameters
* Allow tests to pass again
glatterf42 added a commit that referenced this pull request Jan 16, 2025
* Include optimization parameter basis (#79)
* Fix references to DB filters in docs
* Streamline naming in tests
* Fix and test parameter list and tabulate for specific runs
* Make indexset-creation a test utility
* Incorporate changes from #110
* Use pandas for updated add_data behaviour
* Raise minimum pandas version to enable add_data upsert
* Generalize UsageError for more optimization items
* Use generalized UsageError for Table
* Use own errors for Parameter
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants