Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All sampled images in image list #88

Merged
merged 22 commits into from
Sep 11, 2024
Merged

All sampled images in image list #88

merged 22 commits into from
Sep 11, 2024

Conversation

PaulHax
Copy link
Collaborator

@PaulHax PaulHax commented Aug 6, 2024

Now, the "image list" shows all images sampled from the dataset. As you scroll down the table, images are loaded, transformed, and object detected. To know what images are in view within the table, the quasar.QTable component has some "virtual scrolling" features that trigger a callback with the "in view" row numbers.

When you select images from the embeddings plot, the app filters the image list. Before, you had to select images from the embeddings plot to show them in the image list.

images.py caches loaded images, transformed images, and their object detection results. images.py uses a least recently used cache strategy.

To seperate caching method from downstream processing, embeddings_extractor.py and object_detector.py now take only loaded PIL.Images as arguments.

@PaulHax PaulHax marked this pull request as draft August 6, 2024 20:29
@PaulHax PaulHax force-pushed the all-images branch 2 times, most recently from da0bb37 to 993a697 Compare August 6, 2024 20:40
@PaulHax PaulHax marked this pull request as ready for review August 6, 2024 21:57
@PaulHax PaulHax requested a review from vicentebolea August 7, 2024 15:18
@PaulHax PaulHax force-pushed the all-images branch 2 times, most recently from e65f08e to 98b0b6c Compare August 21, 2024 15:30
This was referenced Aug 21, 2024
@PaulHax
Copy link
Collaborator Author

PaulHax commented Aug 21, 2024

I looked at using functools.lru_cache for the images.get_image function. images.py code runs some side-effects when a image is dropped from the cache (deletes image from the trame state). I did not see a "item ejected" callback for lru_cache.

@PaulHax PaulHax force-pushed the all-images branch 2 times, most recently from 022d09b to 1418c18 Compare August 21, 2024 17:58
@vicentebolea
Copy link
Collaborator

@PaulHax it seems that the current BufferCache does not remove the item from the trame cache, in any case I believe that we should detach the cache from the trame state, the evicting instruction is self.cache.popitem(last=False). lru_cache decorator should be able to substitute this.

@vicentebolea
Copy link
Collaborator

@PaulHax looks good, I want to better understand this, we went through the changes a couple of weeks ago but I Want to review this in more detail. I wonder if you could edit the description to add a brief note of the feature, the reason for adding the proposed new modules and why we want to change the existing "library" modules.

@PaulHax
Copy link
Collaborator Author

PaulHax commented Aug 29, 2024

it seems that the current BufferCache does not remove the item from the trame cache,

Holy mem leak Batman! Good catch. I fix in PR cuz i can't push to this branch anymore =)

If you have an idea on how to use lru_cache as user scrolls through long image list i'm all ears. I still don't know how that would work.

@vicentebolea
Copy link
Collaborator

If you have an idea on how to use lru_cache as user scrolls through long image list i'm all ears. I still don't know how that would work.

How about wrapping the image into a class that implements the del method, the class also has a ref to the state. In this manner the imagewrapper class can deallocate itself from the state just before being destroyed.

@PaulHax
Copy link
Collaborator Author

PaulHax commented Sep 4, 2024

Refactored images.py to use @lru_cache for loading and tranforming images with a wrapper and the __del__ trick. 🎉

Copy link
Collaborator

@vicentebolea vicentebolea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes looks good to me, however, I am skeptical of moving data structures to singleton. I wrote a more detailed analysis in the comments among other comments.

I can see how this feature could be implemented without moving the ImageManager to a singleton implemented in image.py and I do not see any drawback other than having to instantiate in each application class when is it is not in a standalone mode.

I am also skeptical of having another annotations cache singleton too.

Other than that great work in this feature.

Comment on lines 30 to 63
class BufferCache:
"""Least recently accessed item is removed when the cache is full."""

def __init__(self, max_size: int, on_clear_item: Callable[[str], None]):
self.cache: OrderedDict[str, Any] = OrderedDict()
self.max_size = max_size
self.on_clear_item = on_clear_item

def add_item(self, key: str, item):
"""Add an item to the cache."""
self.cache[key] = item
self.cache.move_to_end(key)
if len(self.cache) > self.max_size:
oldest = next(iter(self.cache))
self.clear_item(oldest)

def get_item(self, key: str):
"""Retrieve an item from the cache."""
if key in self.cache:
self.cache.move_to_end(key)
return self.cache[key]
return None

def clear_item(self, key: str):
"""Remove a specific item from the cache."""
if key in self.cache:
self.on_clear_item(key)
del self.cache[key]

def clear(self):
"""Clear the cache."""
for key in self.cache.keys():
self.on_clear_item(key)
self.cache.clear()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this LRU be substutited as well by the functool lru_cache?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to use @lru_cache with the object detection function:

def get_annotations(detector: ObjectDetector, id_to_image: Dict[str, Image.Image]):

Comment on lines 193 to 220
@state.change("current_dataset")
def clear_all(**kwargs):
get_image.cache_clear()
get_cached_transformed_image.cache_clear()
annotation_cache.clear()


@change_checker(state, "dataset_ids")
def init_state(old, new):
if old is not None:
# clean old ids that are not in new
old_ids = set(old)
new_ids = set(new)
to_clean = old_ids - new_ids
for id in to_clean:
annotation_cache.clear_item(id) # ground truth
annotation_cache.clear_item(dataset_id_to_image_id(id)) # original image detection
keys = get_image_state_keys(id)
annotation_cache.clear_item(keys["transformed_image"])
delete_state(state, keys["meta_id"])

# create reactive annotation variables so ImageDetection component has live Refs
for id in new:
keys = get_image_state_keys(id)
for key in keys.values():
if not state.has(key):
state[key] = None
state.hovered_id = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new module contains a variety of functions and classes which do related things however different, this hurt readability.

  • I believe that things related to the state should ideal stay in their corresponding application (applet) classes. Common code can be factored out here.

  • Annotations related things could go into another module. lru cache pattern could reduce the LOC of it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this commit: d3bd004
I've factored out the annotation/obejct-detection code from images.py. annotations.py computes/retreaves the annotations and manages the caches. stateful_annotations.py copies the annotations to the trame state.

if len(paths) == 0:
return None
def extract(self, images, batch_size=32):
"""Extract features from images"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are benefits on keeping the imageManager here, it hides from the caller code the imaging loading while it allows to possibly having different ImageManager by extending the imageManager classes/ducktyping. Also eases testing and debugging.

Overall I take caution with global Objects/Variables, it is a well known anti-pattern to abuse them. I can see that we are moving our source code to that direction (I am did this with the dataset.py) and I am afraid that this will make our code harder to maintain in the future.

@PaulHax PaulHax force-pushed the all-images branch 3 times, most recently from b0a42ee to d54172f Compare September 6, 2024 22:50
@PaulHax PaulHax mentioned this pull request Sep 10, 2024
@PaulHax PaulHax force-pushed the all-images branch 3 times, most recently from d47fb47 to b52b183 Compare September 11, 2024 15:16
@jourdain jourdain dismissed vicentebolea’s stale review September 11, 2024 15:46

we'll address that on a followup PR

@PaulHax PaulHax merged commit 421ca9b into main Sep 11, 2024
12 checks passed
@PaulHax PaulHax deleted the all-images branch September 11, 2024 16:04
PaulHax added a commit that referenced this pull request Oct 8, 2024
* feat: show all images in list

* feat(embeddings): selection of points filters list

* refactor(object_detector): remove images_manager dependency

* refactor: move image modules into images folder

* refactor: use BufferCache for images

* refactor: use BufferCache for annotations

* refactor: clean dead code

* fix(embeddings): add transformed img point as computed

* chore(ci): run tests without depending on linters

* fix(image_list): respect client side sorting and filtering

* fix(ScatterPlot): correct is_transformed in hover

* fix(image_list): paginate grid view

Grid view does not do virtual scrolling

* perf(object_detector): reuse last successful batch size

* refactor: remove images_manager

* fix: flush transformed images to state before detection

* fix(images): actually call on_clear callback in BufferCache

* feat(image_list): hide dependant columns when transforms disabled

* refactor(images): use lru_cache for image functions

* refactor(ScatterPlot): rename props to use points

ScatterPlot could be used for non images data.

* refactor: reorder images.py and doc object_detector

* refactor(annotations): move logic from images.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants