Database schema: set `{}` as default for `DbNode.repository_metadata` #4898
Initially, when the column `repository_metadata` was added to the `DbNode` table, it was set to be nullable, since a significant number of nodes will not actually have any files and therefore this value would be empty. The main argument for making the column nullable was that this would prevent unnecessary bytes from being stored in the database. However, to prevent clients from having to deal with a null value, the front-end ORM `Node.repository_metadata` property would return an empty dictionary in this case, such that the return type is always a dictionary.

However, a bug surfaced where some code expected a dictionary for the `repository_metadata` but got `None`. This particular instance was in the import code, which circumvents the ORM and goes straight to the database. This is of course undesirable, but it also happens through the `QueryBuilder`, which doesn't transform the returned attributes of entities through the ORM interface. Given that there are a number of layers from the ORM to the database, making sure that the typing across all layers is consistent would be tricky and prone to more bugs. The most secure solution is to simply set an empty dict as the default on the database level. The added cost to the database size should still be minimal and so is an acceptable downside to the increased stability of the code.

Note that the column in the model is declared both with a server default and with a default on the ORM level. The server default is required for the migration: if the column were made non-nullable without a default, existing rows would violate the non-nullable clause. For consistency, the server default is also added to the table column declaration. The ORM default is necessary to guarantee that an empty dictionary is set on a new `DbNode` instance when it is created. SQLAlchemy cannot execute the server default and so would leave it as `None`, but we require that even for unstored instances, the value defaults to an empty dictionary.
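The two layers of defaults described above can be illustrated with a small standard-library sketch. It is not the code of this PR: SQLite stands in for the project's database, a dataclass stands in for an unstored ORM instance, and the table name `db_node`, the class `NodeRow` and the helper functions are hypothetical names chosen for illustration. It shows that a non-nullable column cannot be added without a default, that a server default fills in existing rows, and that a client-side default is still needed for objects that have not been stored yet.

```python
import json
import sqlite3
from dataclasses import dataclass, field


@dataclass
class NodeRow:
    """Hypothetical stand-in for an unstored ORM instance (not the real DbNode)."""

    label: str
    # Client-side default: a fresh empty dict per instance, set before any
    # database round trip, so unstored objects never expose None.
    repository_metadata: dict = field(default_factory=dict)


def add_column_without_default(conn):
    """Try to add a NOT NULL column without a default; return the error text, if any."""
    try:
        conn.execute("ALTER TABLE db_node ADD COLUMN repository_metadata TEXT NOT NULL")
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None


def add_column_with_server_default(conn):
    """Add the column with a database-level default so existing rows stay valid."""
    conn.execute(
        "ALTER TABLE db_node ADD COLUMN repository_metadata TEXT NOT NULL DEFAULT '{}'"
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE db_node (id INTEGER PRIMARY KEY, label TEXT NOT NULL)")
    conn.execute("INSERT INTO db_node (label) VALUES ('existing node')")

    # Without a default, the database rejects the non-nullable column.
    error = add_column_without_default(conn)

    # With a server default, the schema change succeeds.
    add_column_with_server_default(conn)
    conn.execute("INSERT INTO db_node (label) VALUES ('new node')")

    # A raw query bypasses any ORM property, yet every row yields a dict.
    rows = conn.execute("SELECT repository_metadata FROM db_node ORDER BY id")
    raw_values = [json.loads(value) for (value,) in rows]
    conn.close()
    return error, raw_values


if __name__ == "__main__":
    print(demo())
    print(NodeRow("unstored node").repository_metadata)
```

In this sketch the raw `SELECT` plays the role of code that circumvents the ORM: because the default lives in the database, no layer in between has to convert a null value into a dictionary.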
Codecov Report
| Coverage Diff | develop | #4898 | +/- |
|---|---|---|---|
| Coverage | 80.11% | 80.12% | +0.01% |
| Files | 517 | 517 | |
| Lines | 36659 | 36658 | -1 |
| Hits | 29367 | 29367 | |
| Misses | 7292 | 7291 | -1 |
I am almost ready with a fix for #4897, but it took quite a bit more rewriting than expected. I found another fundamentally problematic assumption in the code that went unnoticed until the recent changes.
Fixes #4893