Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify induced_class() / induced_slot() so it materializes non-scalar metaslots as well #335

Merged
merged 6 commits into from
Aug 8, 2024

Conversation

sujaypatil96
Copy link
Member

@sujaypatil96 sujaypatil96 commented Jul 31, 2024

Fixes linkml/linkml#2224

There are 3 ways in which we could have written

  1. if v2:
  2. if v2 is not None and ((isinstance(v2, (dict, list)) and v2) or (isinstance(v2, JsonObj) and as_dict(v2))):
  3. if not is_empty(v2):

Definition of is_empty()

def is_empty(v: Any) -> bool:
"""
Determine whether v is considered "empty" in the LinkML context.
An element is "empty" if:
1) it is None
2) It is an empty dictionary
3) It is an empty list
4) It is an empty JsonObj
"""
return v is None or (isinstance(v, (dict, list)) and not v) or (isinstance(v, JsonObj) and not as_dict(v))

Copy link

codecov bot commented Jul 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.49%. Comparing base (37297a8) to head (ba677c6).
Report is 14 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #335      +/-   ##
==========================================
+ Coverage   63.44%   63.49%   +0.04%     
==========================================
  Files          63       63              
  Lines        8942     8943       +1     
  Branches     2569     2569              
==========================================
+ Hits         5673     5678       +5     
+ Misses       2648     2646       -2     
+ Partials      621      619       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sneakers-the-rat
Copy link
Contributor

well it looks like it made the upstream tests fail in an interesting way. will put my comments on the actual PR in a review but first...

the error is here, keyerror on domain_of: https://github.com/linkml/linkml/blob/c26d8995b164223d0dc074721fc30cf96e4965e0/tests/test_generators/test_docgen.py#L320

first, the test is wrong - the source schema does not have domain_of set for Person.slot_usage.species_name (nor would it really make sense to set domain_of in slot_usage): https://github.com/linkml/linkml/blob/c26d8995b164223d0dc074721fc30cf96e4965e0/tests/test_generators/input/kitchen_sink.yaml#L142-L143

what's sort of funny is that what must have been happening there is that the list for Person.slot_usage['species_name'].domain_of was bound to Person.species_name.domain_of since that's what the old behavior was, so then later when that list is populated then the slot_usage version of that was also populated since it's the same object.

slot gotten earlier in the method is shallow copied so that should break the object identity relation, but ancestor_class.slot.slot_usage is not, so it is vulnerable to mutation, hence why even with the old old behavior of multiple deepcopies in induced_slot that this behavior persisted in the test.

that would be a real big problem if nonscalar slots metaslots were ever mutated outside of the schemaview (unfortunately they are), since then any values set on the slot would have the wildly unpredictable behavior of propagating back up to the slot usage of the last ancestor to have it set, which would then propagate back down to any other slots that inherit from it.

this is additionally a problem because induced_slot is often called twice or three times for each slot despite caching, depending on the generator (for each time the signature differs), so we should expect this to lead to nondeterministic behavior that depends on the number of times a slot is generated and how its inheritance tree branches * how slot_usage is used in it.

so anyway, obvi this PR should only be merged along with a patch to the upstream tests, but i think we are probably in for a bit of a ride as we figure out what other unanticipated behavior this method has in store for us lol

Copy link
Contributor

@sneakers-the-rat sneakers-the-rat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to induced_slot looks fine! probably the best we can do given the state of JsonObj et al making stuff so expensive. It does add a few %s more runtime per call to what is already the most expensive method in the library, which is not great, but it is what it is.

reason why i'm requesting changes is that the test here doesn't test for the problem in linkml/linkml#2224 - this test passes when I run it against the old behavior, while the MWE here correctly fails. good practice when bugfixing - write the test first, check that it fails, fix the bug, check that tests pass.

the bug is that slot usage for an unrelated metaslot causes a different nonscalar metaslot to be overridden - ie. tempo should have some nonscalar metaslot value set, DJController should have slot_usage for a different metaslot set, and then the test should assert that the first metaslot is unchanged. So, only override one or the other attr, check that the non-overridden one is the value of the base definition, check that the overridden one is the value that's set in slot_usage.

rest of comments are style things intended for maintainability :)

…_2224.yaml`

Co-authored-by: Jonny Saunders <sneakers-the-rat@protonmail.com>
Copy link
Member Author

@sujaypatil96 sujaypatil96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review @sneakers-the-rat!! i'll make the necessary changes based on your review comments and request another review!

Copy link
Contributor

@sneakers-the-rat sneakers-the-rat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Only thing I would change is adding a docstring to the test asserting what it tests, but good with me with or without ♥

tests/test_utils/test_schemaview.py Show resolved Hide resolved
@sneakers-the-rat
Copy link
Contributor

sneakers-the-rat commented Aug 1, 2024

Before we merge we should re-run upstream tests and patch that at the same time so we dont forget :)

(Should we just make upstream tests be on by default like the rest of the tests? Seems like in practice this way doesnt rly work bc have to keep manually re-running, even if it was a good idea in principle to save compute time.)

@sujaypatil96
Copy link
Member Author

I looked at the upstream linkml test that's causing the error in test_docgen.py, and I think it's okay to remove this assertion: https://github.com/linkml/linkml/blob/main/tests/test_generators/test_docgen.py#L320-L321

The contents of person_dict["slot_usage"] looks like below, there's no domain_of appearing in the materialization anymore:

{
    "name": {"name": "name", "pattern": "^\\S+ \\S+$"},
    "species name": {"equals_string": "human", "name": "species name"},
    "stomach count": {"equals_number": 1, "name": "stomach count"},
}

…t_utils/test_schemaview.py`

Co-authored-by: Jonny Saunders <sneakers-the-rat@protonmail.com>
@sneakers-the-rat
Copy link
Contributor

sneakers-the-rat commented Aug 2, 2024

Actually yno what would be a good idea is writing a specific test for that not happening. I can follow up with that.

We should probably consider anything that is specifically touched within induced slot worthy of close testing, since almost more than any other method im aware of, if it is doing unexpected things the whole everything could break

@sujaypatil96
Copy link
Member Author

Actually yno what would be a good idea is writing a specific test for that not happening. I can follow up with that.

Thank you!! will you make a PR for this? i'll be happy to review it 😁 once that patch is ready to go, we can merge this PR in first, and then the patch PR on linkml.

@sneakers-the-rat
Copy link
Contributor

Lets put that test over here, no? Since its a test of schemaview behavior?

@cmungall
Copy link
Member

cmungall commented Aug 2, 2024

@sierra-moxon noticed that the example here: linkml/linkml#2224 (comment) has an override, can we change the test example so the range at the slot level uses Any as a range?

@sujaypatil96
Copy link
Member Author

Made the change requested here in this commit (ba677c6)

@sujaypatil96 sujaypatil96 merged commit 2271044 into main Aug 8, 2024
9 checks passed
@sujaypatil96 sujaypatil96 deleted the issue-2224 branch August 8, 2024 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

some (multivalued?) slot definition metaslot values are lost after SchemaView.induced_class()
3 participants