lazy_image: fix cache collisions leading to unrelated data being returned #443

Merged
merged 2 commits into from
Sep 3, 2021

Conversation

@IRDonch IRDonch commented Sep 2, 2021

Summary

Currently, the key used to look up the cached image is based on a hash of a tuple containing id(self), path, and loader. This means there are two situations in which a lazy_image can look up the wrong data:

  • If there previously existed another lazy_image object whose self and loader had the same object IDs as the current lazy_image's self and loader. This is possible, because deleted objects' IDs can be reused.

  • If a hash collision occurs between the current lazy_image's tuple and some other's.
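
To make the first failure mode concrete, here is a minimal sketch of a cache keyed the old way. `LazyImageOld`, its class-level `_cache` dict and the `load` function are hypothetical stand-ins, not the project's actual code. The sketch shows that the numeric key outlives the object it was derived from, so a later object that happens to reuse the same IDs would find the stale entry.

```python
class LazyImageOld:
    """Hypothetical, simplified stand-in for the pre-fix lazy_image."""

    _cache = {}  # shared cache: numeric key -> loaded data (assumed layout)

    def __init__(self, path, loader):
        self.path = path
        self.loader = loader

    def _key(self):
        # The key is a plain integer derived from id(self), path and loader.
        return hash((id(self), self.path, self.loader))

    def __call__(self):
        key = self._key()
        if key not in self._cache:
            self._cache[key] = self.loader(self.path)
        return self._cache[key]


def load(path):
    # Hypothetical loader used only for this illustration.
    return "data for " + path


first = LazyImageOld("a.png", load)
first()                      # populates the cache
stale_key = first._key()
del first                    # the object is gone...

# ...but its entry is still reachable through the bare number.
stale_entry_survives = stale_key in LazyImageOld._cache
```

Whether a new object actually receives the same `id` depends on the interpreter's allocator, so the sketch does not try to force that part.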

Fix it by using a weak reference to self as the key instead. Different weak references will only compare equal if they point to the same object.
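
A rough sketch of the weak-reference approach follows. The class below is a simplified, hypothetical stand-in (the dict-based `_shared_cache` and the `load` function are assumptions), and it relies on the class not defining its own `__eq__`, so that references to live objects compare by identity.

```python
import weakref


class LazyImage:
    """Hypothetical, simplified stand-in for the fixed lazy_image."""

    _shared_cache = {}  # weak reference to instance -> loaded data

    def __init__(self, path, loader):
        self._path = path
        self._loader = loader

    def __call__(self):
        # The key is an object, not a number: dict lookup falls back to
        # equality, so a mere hash collision cannot return foreign data.
        key = weakref.ref(self)
        try:
            return self._shared_cache[key]
        except KeyError:
            data = self._loader(self._path)
            self._shared_cache[key] = data
            return data


calls = []


def load(path):
    # Hypothetical loader that records every real load.
    calls.append(path)
    return "data for " + path


a = LazyImage("a.png", load)
b = LazyImage("a.png", load)
a()
a()  # served from the cache
b()  # separate object, separate entry

same_object_equal = weakref.ref(a) == weakref.ref(a)
different_objects_equal = weakref.ref(a) == weakref.ref(b)
```

This sketch never evicts entries; a real cache would still need some eviction policy for references whose objects have died.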

This will only work correctly if the loader and path of a lazy_image are not modified after creation. I don't think there are any use cases for modifying them (and there are no instances of that happening in the codebase), so it shouldn't be an issue. To reduce the temptation of client code to modify these fields, mark them as private.
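
As an illustration of why the fields must stay fixed, the following sketch (again with a hypothetical, simplified `LazyImage` and a dict-based cache) shows that changing the path after the first load silently keeps returning the old data, because the key depends only on the object itself.

```python
import weakref


class LazyImage:
    """Hypothetical, simplified stand-in keyed by a weak reference to self."""

    _shared_cache = {}

    def __init__(self, path, loader):
        self._path = path
        self._loader = loader

    def __call__(self):
        key = weakref.ref(self)
        if key not in self._shared_cache:
            self._shared_cache[key] = self._loader(self._path)
        return self._shared_cache[key]


image = LazyImage("a.png", lambda path: "data for " + path)
before = image()
image._path = "b.png"  # what client code should not do
after = image()        # the cached entry for "a.png" is returned again
```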

Modifying the cache field should not cause issues, but just in case, make it private as well.

Fixes #409 (probably; I never managed to reliably reproduce it)

How to test

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2021 Intel Corporation
#
# SPDX-License-Identifier: MIT

Roman Donchenko added 2 commits September 2, 2021 16:18
…rned

Currently, the key used to look up the cached image is based on a hash of
a tuple containing `id(self)`, `path`, and `loader`. This means there are two
situations in which a `lazy_image` can look up the wrong data:

* If there previously existed another `lazy_image` object whose `self` and
  `loader` had the same object IDs as the current `lazy_image`'s `self` and
  `loader`. This is possible, because deleted objects' IDs can be reused.

* If a hash collision occurs between the current `lazy_image`'s tuple and
  some other's.

Fix it by using a weak reference to `self` as the key instead. Different weak
references will only compare equal if they point to the same object.

This will only work correctly if the loader and path of a `lazy_image` are
not modified after creation. I don't think there are any use cases for
modifying them (and there are no instances of that happening in the codebase),
so it shouldn't be an issue. To reduce the temptation of client code to modify
these fields, mark them as private.

Modifying the `cache` field should not cause issues, but just in case, make it
private as well.
@zhiltsov-max
Contributor

> This will only work correctly if the loader and path of a lazy_image are not modified after creation.

Maybe make it frozen / a tuple?

I don't see how, in the current version, different objects can be matched to the same cache entry, because all keys are unique. I won't insist on this requirement (especially since we discovered earlier that the cache should hold a small number of objects); however, I find such a property quite useful.
If you want to implement this, consider using hash(type(self), path, loader). I suspect there will be similar loaders for other media types.

@IRDonch
Author

IRDonch commented Sep 2, 2021

> Maybe make it frozen / a tuple?

That would require it being an attrs-based class, and I don't think it's worthwhile to make it so. Marking the attributes as private should be sufficient to let client code know not to mess with them.

> I don't see how, in the current version, different objects can be matched to the same cache entry, because all keys are unique.

That is true, but it was also true in the previous version (if you discount hash collisions). I haven't really changed the semantics here.

> I won't insist on this requirement (especially since we discovered earlier that the cache should hold a small number of objects); however, I find such a property quite useful.

I have thought about how one might implement this, and it seems like it would be pretty complicated.

In general, you can't decide, given an abstract pair of (path, loader), whether the result of calling loader(path) will be the same as it was before. If path is an actual disk path, the file at that path might have already changed. Therefore, to be able to reuse a cache entry of another lazy_image, we would have to have some way of asking the loader whether the underlying resource has changed.
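
For what it's worth, here is one way such a check might look. This is only a sketch under assumptions: the `version()` method, the `FileLoader` class and the `load_shared` helper are all hypothetical, and the version token (modification time plus size) is just one possible heuristic for detecting a changed file.

```python
import os
import tempfile


class FileLoader:
    """Hypothetical loader that can report a version token for a path."""

    def __init__(self):
        self.load_count = 0

    def version(self, path):
        # Assumed heuristic: mtime and size change when the file changes.
        st = os.stat(path)
        return (st.st_mtime_ns, st.st_size)

    def __call__(self, path):
        self.load_count += 1
        with open(path, "rb") as f:
            return f.read()


_cache = {}  # (path, loader) -> (version, data)


def load_shared(path, loader):
    key = (path, loader)  # a tuple of objects, not a hash number
    version = loader.version(path)
    entry = _cache.get(key)
    if entry is not None and entry[0] == version:
        return entry[1]  # the resource looks unchanged, reuse the data
    data = loader(path)
    _cache[key] = (version, data)
    return data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "image.bin")
    with open(path, "wb") as f:
        f.write(b"v1")

    loader = FileLoader()
    first = load_shared(path, loader)
    second = load_shared(path, loader)  # served from the cache

    with open(path, "wb") as f:
        f.write(b"version 2")  # different size, so the token changes
    third = load_shared(path, loader)   # reloaded from disk

    loads = loader.load_count
```

Every lookup here costs a `stat` call, and loaders that are not backed by files would need their own notion of a version.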

I think this could be done, but that's more effort than I signed up for. I'm just trying to fix the bug. 🙂

> If you want to implement this, consider using hash(type(self), path, loader).

FWIW, that would lead to the reoccurrence of this bug. Any kind of compound key would have to be a tuple of objects, not a number.
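
A small sketch of the difference, using a hypothetical `Loader` class whose hash is deliberately constant so that a collision is guaranteed rather than rare:

```python
class Loader:
    """Hypothetical loader with a deliberately weak hash to force a collision."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 0  # every Loader hashes the same

    def __eq__(self, other):
        return isinstance(other, Loader) and self.name == other.name

    def __call__(self, path):
        return f"{self.name} data for {path}"


png, jpg = Loader("png"), Loader("jpg")

# Keyed by a number: once hashed, the two keys are indistinguishable.
by_number = {}
by_number[hash(("img", png))] = png("img")
collided = by_number.get(hash(("img", jpg)))  # returns the png entry

# Keyed by the tuple itself: the dict also compares keys for equality.
by_tuple = {}
by_tuple[("img", png)] = png("img")
missed = by_tuple.get(("img", jpg))  # no entry, as it should be
```

With a tuple key the dict still uses the hash to find a bucket, but it confirms the match with `==` before returning anything.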

@zhiltsov-max
Contributor

I'm pretty sure this topic will be revisited later, once we start working with large objects, which aren't cheap to load.

@zhiltsov-max zhiltsov-max merged commit df4a0d6 into openvinotoolkit:develop Sep 3, 2021
@IRDonch IRDonch deleted the fix-image-cache-collisions branch September 29, 2021 10:51
Development

Successfully merging this pull request may close these issues.

Rare test failures on windows CI
2 participants