Modify `induced_class()` / `induced_slot()` so it materializes non-scalar metaslots as well #335

sujaypatil96 · 2024-07-31T23:38:51Z

Fixes linkml/linkml#2224

There are 3 ways in which we could have written

linkml-runtime/linkml_runtime/utils/schemaview.py

Line 1376 in 798591b

if not is_empty(v2):

if v2:
if v2 is not None and ((isinstance(v2, (dict, list)) and v2) or (isinstance(v2, JsonObj) and as_dict(v2))):
if not is_empty(v2):

Definition of is_empty()

linkml-runtime/linkml_runtime/utils/formatutils.py

Lines 112 to 121 in 798591b

    
           def is_empty(v: Any) -> bool: 
        
               """ 
        
               Determine whether v is considered "empty" in the LinkML context. 
        
               An element is "empty" if: 
        
               1) it is None 
        
               2) It is an empty dictionary 
        
               3) It is an empty list 
        
               4) It is an empty JsonObj 
        
               """ 
        
               return v is None or (isinstance(v, (dict, list)) and not v) or (isinstance(v, JsonObj) and not as_dict(v))

codecov · 2024-07-31T23:40:54Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.49%. Comparing base (37297a8) to head (ba677c6).
Report is 14 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #335      +/-   ##
==========================================
+ Coverage   63.44%   63.49%   +0.04%     
==========================================
  Files          63       63              
  Lines        8942     8943       +1     
  Branches     2569     2569              
==========================================
+ Hits         5673     5678       +5     
+ Misses       2648     2646       -2     
+ Partials      621      619       -2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

sneakers-the-rat · 2024-08-01T05:33:55Z

well it looks like it made the upstream tests fail in an interesting way. will put my comments on the actual PR in a review but first...

the error is here, keyerror on domain_of: https://github.com/linkml/linkml/blob/c26d8995b164223d0dc074721fc30cf96e4965e0/tests/test_generators/test_docgen.py#L320

first, the test is wrong - the source schema does not have domain_of set for Person.slot_usage.species_name (nor would it really make sense to set domain_of in slot_usage): https://github.com/linkml/linkml/blob/c26d8995b164223d0dc074721fc30cf96e4965e0/tests/test_generators/input/kitchen_sink.yaml#L142-L143

what's sort of funny is that what must have been happening there is that the list for Person.slot_usage['species_name'].domain_of was bound to Person.species_name.domain_of since that's what the old behavior was, so then later when that list is populated then the slot_usage version of that was also populated since it's the same object.

slot gotten earlier in the method is shallow copied so that should break the object identity relation, but ancestor_class.slot.slot_usage is not, so it is vulnerable to mutation, hence why even with the old old behavior of multiple deepcopies in induced_slot that this behavior persisted in the test.

that would be a real big problem if nonscalar slots metaslots were ever mutated outside of the schemaview (unfortunately they are), since then any values set on the slot would have the wildly unpredictable behavior of propagating back up to the slot usage of the last ancestor to have it set, which would then propagate back down to any other slots that inherit from it.

this is additionally a problem because induced_slot is often called twice or three times for each slot despite caching, depending on the generator (for each time the signature differs), so we should expect this to lead to nondeterministic behavior that depends on the number of times a slot is generated and how its inheritance tree branches * how slot_usage is used in it.

so anyway, obvi this PR should only be merged along with a patch to the upstream tests, but i think we are probably in for a bit of a ride as we figure out what other unanticipated behavior this method has in store for us lol

tests/test_issues/test_issue_2224.py

tests/test_issues/input/linkml_issue_2224.yaml

tests/test_issues/test_issue_2224.py

linkml_runtime/utils/schemaview.py

tests/test_issues/test_issue_2224.py

sneakers-the-rat

change to induced_slot looks fine! probably the best we can do given the state of JsonObj et al making stuff so expensive. It does add a few %s more runtime per call to what is already the most expensive method in the library, which is not great, but it is what it is.

reason why i'm requesting changes is that the test here doesn't test for the problem in linkml/linkml#2224 - this test passes when I run it against the old behavior, while the MWE here correctly fails. good practice when bugfixing - write the test first, check that it fails, fix the bug, check that tests pass.

the bug is that slot usage for an unrelated metaslot causes a different nonscalar metaslot to be overridden - ie. tempo should have some nonscalar metaslot value set, DJController should have slot_usage for a different metaslot set, and then the test should assert that the first metaslot is unchanged. So, only override one or the other attr, check that the non-overridden one is the value of the base definition, check that the overridden one is the value that's set in slot_usage.

rest of comments are style things intended for maintainability :)

…_2224.yaml` Co-authored-by: Jonny Saunders <sneakers-the-rat@protonmail.com>

sujaypatil96

Thank you for the review @sneakers-the-rat!! i'll make the necessary changes based on your review comments and request another review!

…ot_usage()

sneakers-the-rat

Looks great! Only thing I would change is adding a docstring to the test asserting what it tests, but good with me with or without ♥

tests/test_utils/test_schemaview.py

sneakers-the-rat · 2024-08-01T23:07:04Z

Before we merge we should re-run upstream tests and patch that at the same time so we dont forget :)

(Should we just make upstream tests be on by default like the rest of the tests? Seems like in practice this way doesnt rly work bc have to keep manually re-running, even if it was a good idea in principle to save compute time.)

sujaypatil96 · 2024-08-01T23:55:44Z

I looked at the upstream linkml test that's causing the error in test_docgen.py, and I think it's okay to remove this assertion: https://github.com/linkml/linkml/blob/main/tests/test_generators/test_docgen.py#L320-L321

The contents of person_dict["slot_usage"] looks like below, there's no domain_of appearing in the materialization anymore:

{
    "name": {"name": "name", "pattern": "^\\S+ \\S+$"},
    "species name": {"equals_string": "human", "name": "species name"},
    "stomach count": {"equals_number": 1, "name": "stomach count"},
}

…t_utils/test_schemaview.py` Co-authored-by: Jonny Saunders <sneakers-the-rat@protonmail.com>

sneakers-the-rat · 2024-08-02T03:44:27Z

Actually yno what would be a good idea is writing a specific test for that not happening. I can follow up with that.

We should probably consider anything that is specifically touched within induced slot worthy of close testing, since almost more than any other method im aware of, if it is doing unexpected things the whole everything could break

sujaypatil96 · 2024-08-02T18:14:27Z

Actually yno what would be a good idea is writing a specific test for that not happening. I can follow up with that.

Thank you!! will you make a PR for this? i'll be happy to review it 😁 once that patch is ready to go, we can merge this PR in first, and then the patch PR on linkml.

sneakers-the-rat · 2024-08-02T19:55:34Z

Lets put that test over here, no? Since its a test of schemaview behavior?

cmungall · 2024-08-02T20:23:17Z

@sierra-moxon noticed that the example here: linkml/linkml#2224 (comment) has an override, can we change the test example so the range at the slot level uses Any as a range?

sujaypatil96 · 2024-08-08T00:59:09Z

Made the change requested here in this commit (ba677c6)

induced_slot() now materializes non-scalar metaslots

798591b

sujaypatil96 requested review from cmungall and sneakers-the-rat July 31, 2024 23:42

cmungall approved these changes Aug 1, 2024

View reviewed changes