Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code: add additional validation before storing #5204

Closed
wants to merge 1 commit into from

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Oct 28, 2021

Fixes #5203

Up till now it was possible to store a Code instance without a label;
either an empty string or even None. This is now checked in the
_validate method, which is automatically called when store is
called. It will raise a ValidationError if no valid label is defined.

In addition, a check is added for the uniqueness of the full label,
i.e., the string label@computer.label. Since this is usually used as
an identifier for a code, allowing duplicates will make it impossible to
load a code with it and one has to resort to using the PK or UUID.

One would imagine that these conditions should be enforced on the
database level, however, for the empty label this is not
straight-forward as the Code is just a special case of the node table
and there the label is optional and so None and the empty string are
perfectly possible. The same goes for the full label. This is a
uniqueness constraint across two tables, but it requires a WHERE
statement, since it should only apply if node_type == data.code.Code..
This is possible in Postgres through partial indexes, but it is not
clear if both Django and SqlAlchemy support defining these on the models.

@sphuber
Copy link
Contributor Author

sphuber commented Oct 28, 2021

@giovannipizzi @chrisjsewell @ramirezfranciscof any reason why a codes label should be allowed to be empty? I think this should be pretty uncontroversial as a change.

Making the full label unique is bit trickier. I think it is the right thing to do, but are there reasonable use cases to allow duplicates? And if not, are there any risks with disallowing this from v2.0 by merging this PR? I am trying to think if there are potential problems with databases that currently violate this uniqueness constraint. For example, if codes with duplicate full labels are exported and then imported (and this goes through the front-end ORM) it would raise. Good thing is that the label of the code is mutable so people can manually "fix" the problem if they have duplicates.

@sphuber sphuber force-pushed the fix/5203/code-validation branch from d81ed24 to b0f6ac6 Compare October 28, 2021 14:26
@giovannipizzi
Copy link
Member

  1. for a code is should probably not be allowed to be empty, but the property is just the label property of any node, so by default for all other node types it's empty
  2. as you say, the problem is when importing another DB, it's very common to have the same name. I wouldn't enforce a uniqueness at the DB level (we do for computers and we have the same issue, that is fixed "by hand" in the import code by renaming). If we prevent this from happening only when setting up a new code, then I'm OK (if you have an imported code, creating a new one with the same name will raise - then you tell users clearly in the error message to either pick a different name, or rename the other code). To be careful if verdi code list shows all codes by default (it might not if the code was created by someone else, or if it's on an unconfigured computer maybe? The second problem is less critical because probably if you are setting up a code, the computer is configured, the first might be very confusing because with the default flags users don't see it in verdi code list, so we should have a very clear message to check it using the -a -A flags - I don't remember if we need both for verdi code list, probably only one).

@sphuber sphuber force-pushed the fix/5203/code-validation branch 3 times, most recently from f71ee05 to a0e5ea8 Compare October 28, 2021 14:59
Up till now it was possible to store a `Code` instance without a label;
either an empty string or even `None`. This is now checked in the
`_validate` method, which is automatically called when `store` is
called. It will raise a `ValidationError` if no valid label is defined.

In addition, a check is added for the uniqueness of the full label,
i.e., the string `label@computer.label`. Since this is usually used as
an identifier for a code, allowing duplicates will make it impossible to
load a code with it and one has to resort to using the PK or UUID.

One would imagine that these conditions should be enforced on the
database level, however, for the empty label this is not
straight-forward as the `Code` is just a special case of the node table
and there the label is optional and so `None` and the empty string are
perfectly possible. The same goes for the full label. This is a
uniqueness constraint across two tables, but it requires a `WHERE`
statement, since it should only apply if `node_type == data.code.Code.`.
This is possible in Postgres through partial indexes, but it is not
clear if both Django and SqlAlchemy support defining these on the models.

Finally, some tests have been cleaned up:

 * `tests/test_generic.py`: has been deleted and its contents have been
   moved to `tests/orm/data/test_code.py` and `tests/orm/data/test_bool.py`
 * `tests/test_nodes.py`: tests related to the `Code` class have been
   moved to `tests/orm/data/test_code.py`.
@sphuber sphuber force-pushed the fix/5203/code-validation branch from a0e5ea8 to 7e748c9 Compare October 28, 2021 16:22
@sphuber
Copy link
Contributor Author

sphuber commented Oct 28, 2021

Thinking about this, the real problem is maybe that the uniqueness condition is not (label, computer.label) but rather (label, computer.label, user.email). Because User.email is the unique persistent identifier. I think having the same code of the same user for the same computer, doesn't really make sense. So what if I make the uniqueness condition more specific like that?

@codecov
Copy link

codecov bot commented Oct 28, 2021

Codecov Report

Merging #5204 (7e748c9) into develop (bb92dd5) will decrease coverage by 0.02%.
The diff coverage is 90.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5204      +/-   ##
===========================================
- Coverage    81.20%   81.18%   -0.01%     
===========================================
  Files          532      532              
  Lines        37307    37317      +10     
===========================================
+ Hits         30290    30293       +3     
- Misses        7017     7024       +7     
Flag Coverage Δ
django 76.03% <90.00%> (-0.01%) ⬇️
sqlalchemy 75.13% <90.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiida/orm/nodes/data/code.py 87.56% <90.00%> (-3.39%) ⬇️
aiida/transports/plugins/local.py 81.66% <0.00%> (+0.26%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb92dd5...7e748c9. Read the comment docs.

@sphuber
Copy link
Contributor Author

sphuber commented Dec 8, 2021

After discussion it was decided that the potential for breaking existing archives and databases was great and the added benefit was small, so we decide to leave this as is.

@sphuber sphuber closed this Dec 8, 2021
@sphuber sphuber deleted the fix/5203/code-validation branch December 8, 2021 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Code can be stored without a label whatsoever and duplicate full label
3 participants