-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
All sampled images in image list #88
Conversation
da0bb37
to
993a697
Compare
e65f08e
to
98b0b6c
Compare
I looked at using |
022d09b
to
1418c18
Compare
Grid view does not do virtual scrolling
@PaulHax it seems that the current BufferCache does not remove the item from the trame cache, in any case I believe that we should detach the cache from the trame state, the evicting instruction is |
@PaulHax looks good, I want to better understand this, we went through the changes a couple of weeks ago but I Want to review this in more detail. I wonder if you could edit the description to add a brief note of the feature, the reason for adding the proposed new modules and why we want to change the existing "library" modules. |
Holy mem leak Batman! Good catch. I fix in PR cuz i can't push to this branch anymore =) If you have an idea on how to use |
How about wrapping the image into a class that implements the del method, the class also has a ref to the state. In this manner the imagewrapper class can deallocate itself from the state just before being destroyed. |
ScatterPlot could be used for non images data.
Refactored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes looks good to me, however, I am skeptical of moving data structures to singleton. I wrote a more detailed analysis in the comments among other comments.
I can see how this feature could be implemented without moving the ImageManager
to a singleton implemented in image.py and I do not see any drawback other than having to instantiate in each application class when is it is not in a standalone
mode.
I am also skeptical of having another annotations cache singleton too.
Other than that great work in this feature.
class BufferCache: | ||
"""Least recently accessed item is removed when the cache is full.""" | ||
|
||
def __init__(self, max_size: int, on_clear_item: Callable[[str], None]): | ||
self.cache: OrderedDict[str, Any] = OrderedDict() | ||
self.max_size = max_size | ||
self.on_clear_item = on_clear_item | ||
|
||
def add_item(self, key: str, item): | ||
"""Add an item to the cache.""" | ||
self.cache[key] = item | ||
self.cache.move_to_end(key) | ||
if len(self.cache) > self.max_size: | ||
oldest = next(iter(self.cache)) | ||
self.clear_item(oldest) | ||
|
||
def get_item(self, key: str): | ||
"""Retrieve an item from the cache.""" | ||
if key in self.cache: | ||
self.cache.move_to_end(key) | ||
return self.cache[key] | ||
return None | ||
|
||
def clear_item(self, key: str): | ||
"""Remove a specific item from the cache.""" | ||
if key in self.cache: | ||
self.on_clear_item(key) | ||
del self.cache[key] | ||
|
||
def clear(self): | ||
"""Clear the cache.""" | ||
for key in self.cache.keys(): | ||
self.on_clear_item(key) | ||
self.cache.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this LRU be substutited as well by the functool lru_cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to use @lru_cache
with the object detection function:
def get_annotations(detector: ObjectDetector, id_to_image: Dict[str, Image.Image]):
@state.change("current_dataset") | ||
def clear_all(**kwargs): | ||
get_image.cache_clear() | ||
get_cached_transformed_image.cache_clear() | ||
annotation_cache.clear() | ||
|
||
|
||
@change_checker(state, "dataset_ids") | ||
def init_state(old, new): | ||
if old is not None: | ||
# clean old ids that are not in new | ||
old_ids = set(old) | ||
new_ids = set(new) | ||
to_clean = old_ids - new_ids | ||
for id in to_clean: | ||
annotation_cache.clear_item(id) # ground truth | ||
annotation_cache.clear_item(dataset_id_to_image_id(id)) # original image detection | ||
keys = get_image_state_keys(id) | ||
annotation_cache.clear_item(keys["transformed_image"]) | ||
delete_state(state, keys["meta_id"]) | ||
|
||
# create reactive annotation variables so ImageDetection component has live Refs | ||
for id in new: | ||
keys = get_image_state_keys(id) | ||
for key in keys.values(): | ||
if not state.has(key): | ||
state[key] = None | ||
state.hovered_id = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new module contains a variety of functions and classes which do related things however different, this hurt readability.
-
I believe that things related to the state should ideal stay in their corresponding application (applet) classes. Common code can be factored out here.
-
Annotations related things could go into another module. lru cache pattern could reduce the LOC of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this commit: d3bd004
I've factored out the annotation/obejct-detection code from images.py
. annotations.py
computes/retreaves the annotations and manages the caches. stateful_annotations.py
copies the annotations to the trame state.
if len(paths) == 0: | ||
return None | ||
def extract(self, images, batch_size=32): | ||
"""Extract features from images""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are benefits on keeping the imageManager
here, it hides from the caller code the imaging loading while it allows to possibly having different ImageManager by extending the imageManager
classes/ducktyping. Also eases testing and debugging.
Overall I take caution with global Objects/Variables, it is a well known anti-pattern to abuse them. I can see that we are moving our source code to that direction (I am did this with the dataset.py) and I am afraid that this will make our code harder to maintain in the future.
b0a42ee
to
d54172f
Compare
d47fb47
to
b52b183
Compare
we'll address that on a followup PR
* feat: show all images in list * feat(embeddings): selection of points filters list * refactor(object_detector): remove images_manager dependency * refactor: move image modules into images folder * refactor: use BufferCache for images * refactor: use BufferCache for annotations * refactor: clean dead code * fix(embeddings): add transformed img point as computed * chore(ci): run tests without depending on linters * fix(image_list): respect client side sorting and filtering * fix(ScatterPlot): correct is_transformed in hover * fix(image_list): paginate grid view Grid view does not do virtual scrolling * perf(object_detector): reuse last successful batch size * refactor: remove images_manager * fix: flush transformed images to state before detection * fix(images): actually call on_clear callback in BufferCache * feat(image_list): hide dependant columns when transforms disabled * refactor(images): use lru_cache for image functions * refactor(ScatterPlot): rename props to use points ScatterPlot could be used for non images data. * refactor: reorder images.py and doc object_detector * refactor(annotations): move logic from images.py
Now, the "image list" shows all images sampled from the dataset. As you scroll down the table, images are loaded, transformed, and object detected. To know what images are in view within the table, the
quasar.QTable
component has some "virtual scrolling" features that trigger a callback with the "in view" row numbers.When you select images from the embeddings plot, the app filters the image list. Before, you had to select images from the embeddings plot to show them in the image list.
images.py
caches loaded images, transformed images, and their object detection results.images.py
uses a least recently used cache strategy.To seperate caching method from downstream processing,
embeddings_extractor.py
andobject_detector.py
now take only loadedPIL.Image
s as arguments.