Database schema: set {} as default for DbNode.repository_metadata #4898

@sphuber sphuber commented Apr 30, 2021

Fixes #4893

Initially, when the column `repository_metadata` was added to the
`DbNode` table, it was made nullable, since a significant number of
nodes do not have any files and for those this value would be empty.
The main argument was that a nullable column avoids storing unnecessary
bytes in the database. To spare clients from having to deal with a null
value, the front-end ORM `Node.repository_metadata` property would
return an empty dictionary in this case, such that the return type is
always a dictionary.

However, a bug surfaced where some code expected a dictionary for the
`repository_metadata` but got `None`. This particular instance was in
the import code, which circumvents the ORM and goes straight to the
database. Bypassing the ORM is of course undesirable, but the same
happens through the `QueryBuilder`, which does not pass the returned
attributes of entities through the ORM interface. Given that there are
a number of layers between the ORM and the database, keeping the typing
consistent across all of them would be tricky and prone to more bugs.
The most robust solution is to simply set an empty dict as the default
at the database level. The added cost to the database size should still
be minimal and so is an acceptable downside for the increased stability
of the code.
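
The failure mode described above can be sketched in plain Python. This is
only an illustration, not the AiiDA code: `RawNodeRow`, `FrontendNode` and
`list_keys` are hypothetical names standing in for a database row with a
nullable column, the front-end ORM wrapper, and a caller that assumes it
always receives a dictionary.

```python
class RawNodeRow:
    """Hypothetical stand-in for a database row with a nullable column."""

    def __init__(self, repository_metadata=None):
        # A nullable column yields None when no value was ever stored.
        self.repository_metadata = repository_metadata


class FrontendNode:
    """Hypothetical front-end wrapper that hides the null value."""

    def __init__(self, row):
        self._row = row

    @property
    def repository_metadata(self):
        # Always hand back a dictionary, even if the column is NULL.
        return self._row.repository_metadata or {}


def list_keys(metadata):
    """Caller that assumes it is always given a dictionary."""
    return sorted(metadata.keys())


row = RawNodeRow()

# Going through the wrapper: the None is converted to an empty dict.
via_frontend = list_keys(FrontendNode(row).repository_metadata)

# Bypassing the wrapper: the raw None reaches the caller and it fails.
try:
    list_keys(row.repository_metadata)
    bypass_failed = False
except AttributeError:
    bypass_failed = True
```

Any code path that reads the row directly has to repeat the `or {}`
guard, which is the kind of duplication a database-level default avoids.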

Note that the column in the model is declared both with a server default
and with a default on the ORM level. The server default is required for
the migration: if the column were added as non-nullable without a
default, existing rows would violate the non-nullable constraint. For
consistency, the server default is also added to the table column
declaration. The ORM default is necessary to guarantee that an empty
dictionary is set on a new `DbNode` instance when it is created.
SQLAlchemy does not apply the server default on the Python side, so an
unstored instance would otherwise hold `None`, but we require that even
for unstored instances the value defaults to an empty dictionary.
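
The two kinds of default can be illustrated with a small, self-contained
sketch. It deliberately uses the standard-library `sqlite3` module and a
dataclass rather than the PostgreSQL database and the Django or
SQLAlchemy models touched by this PR, so the table name `node`, the
`TEXT` column type and the `UnstoredNode` class are all assumptions made
for illustration. Note also that SQLite rejects a `NOT NULL` column
without a default outright, whereas the concern described above is about
existing rows violating the constraint.

```python
import json
import sqlite3
from dataclasses import dataclass, field


def try_ddl(conn, statement):
    """Run a DDL statement and report whether the database accepted it."""
    try:
        conn.execute(statement)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, label TEXT)")
conn.execute("INSERT INTO node (label) VALUES ('existing node')")

# Non-nullable column without a default: the database refuses it.
without_default = try_ddl(
    conn,
    "ALTER TABLE node ADD COLUMN repository_metadata TEXT NOT NULL",
)

# Same column with a server-side default: accepted, and the row that
# already existed reads back the default value.
with_default = try_ddl(
    conn,
    "ALTER TABLE node ADD COLUMN repository_metadata TEXT NOT NULL DEFAULT '{}'",
)
stored = conn.execute("SELECT repository_metadata FROM node").fetchone()[0]
backfilled = json.loads(stored)


@dataclass
class UnstoredNode:
    """Hypothetical in-memory model with a Python-side default."""

    label: str
    # default_factory gives every instance its own fresh dictionary,
    # before anything has been written to the database.
    repository_metadata: dict = field(default_factory=dict)


unstored = UnstoredNode(label="new node")
```

The server default covers rows that already exist or are written without
going through the model, while the Python-side default covers instances
that have not been stored yet.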


codecov bot commented Apr 30, 2021

Codecov Report

Merging #4898 (bd87ecf) into develop (fa8d05f) will increase coverage by 0.01%.
The diff coverage is 100.00%.


@@             Coverage Diff             @@
##           develop    #4898      +/-   ##
===========================================
+ Coverage    80.11%   80.12%   +0.01%     
===========================================
  Files          517      517              
  Lines        36659    36658       -1     
===========================================
  Hits         29367    29367              
+ Misses        7292     7291       -1     
| Flag | Coverage Δ |
|---|---|
| django | 74.60% <75.00%> (+0.01%) ⬆️ |
| sqlalchemy | 73.53% <75.00%> (-<0.01%) ⬇️ |


| Impacted Files | Coverage Δ |
|---|---|
| ...db/migrations/0046_add_node_repository_metadata.py | 100.00% <ø> (ø) |
| ...migrations/versions/0edcdd5a30f0_dbgroup_extras.py | 100.00% <ø> (ø) |
| aiida/backends/djsite/db/models.py | 81.40% <100.00%> (ø) |
| ...sions/7536a82b2cc4_add_node_repository_metadata.py | 100.00% <100.00%> (ø) |
| aiida/backends/sqlalchemy/models/node.py | 81.82% <100.00%> (ø) |
| aiida/orm/nodes/node.py | 96.30% <100.00%> (+0.28%) ⬆️ |
| aiida/transports/util.py | 62.50% <0.00%> (-3.12%) ⬇️ |
| aiida/transports/plugins/local.py | 81.80% <0.00%> (+0.26%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa8d05f...bd87ecf. Read the comment docs.

@sphuber sphuber requested review from chrisjsewell and mbercx April 30, 2021 12:49

@mbercx mbercx left a comment

Thanks @sphuber! It all looks good, but I cannot field test the migration because of #4897. Will do some proper tests once that is fixed, but let's merge this for now.

sphuber commented Apr 30, 2021

I am almost ready with a fix for #4897, but it took quite a bit more rewriting than expected. I found another fundamentally problematic assumption in the code that went unnoticed until the recent changes.

@sphuber sphuber merged commit 4a347b6 into aiidateam:develop Apr 30, 2021
@sphuber sphuber deleted the fix/4893/repository-metadata-database-default-dict branch April 30, 2021 14:25
Successfully merging this pull request may close these issues.

Use an empty dictionary as the default for the repository_metadata column of DbNode
2 participants