Add Image Caching Mechanism to Improve Performance of geom_image #53

Open: xiayh17 wants to merge 2 commits into master.

@xiayh17 commented Sep 1, 2024

Description:

This pull request adds an image cache to geom_image. It substantially improves performance for plots that contain many images and for plots that are regenerated repeatedly.

Consider the following example:

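```r
library(ggplot2)
library(ggimage)

# Example PNG files shipped with ggimage
images = list.files(system.file("extdata", package = "ggimage"),
                    pattern = "png", full.names = TRUE)

# 400 points on a 20 x 20 grid, each drawn with a randomly chosen image
df = data.frame(x = rep(1:20, each = 20),
                y = rep(1:20, 20),
                image = sample(images, 400, replace = TRUE))

ggplot(df, aes(x, y)) +
  geom_image(aes(image = image), size = 0.04)
```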

This example draws 400 image points, all sampled from the PNG files bundled with ggimage, so each file is reused many times. Without caching:

  1. Each of the 400 points loads its image from disk, even when the same file has already been read.
  2. For large datasets or repeated plot generation (e.g., in Shiny apps), these redundant reads become a significant bottleneck.

With caching:

  1. Each unique image is loaded only once and stored in memory.
  2. Subsequent uses of the same image retrieve it from the cache instead of reloading from disk.
  3. This significantly reduces I/O operations and improves rendering speed, especially for larger datasets or interactive applications.
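The reduction in disk reads can be estimated from the data alone: without a cache every point triggers one read, while with a per-file cache only each distinct file does. A minimal base-R sketch, using a hypothetical four-file stand-in for the bundled PNG paths so it runs without ggimage installed:

```r
# Hypothetical stand-in for the PNG paths returned by list.files() above
files <- c("a.png", "b.png", "c.png", "d.png")

set.seed(1)
image <- sample(files, 400, replace = TRUE)

reads_without_cache <- length(image)          # one read per point
reads_with_cache    <- length(unique(image))  # one read per distinct file

c(without = reads_without_cache, with = reads_with_cache)
```

For the real example, the same comparison is nrow(df) versus length(unique(df$image)).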

To demonstrate, here is a simple benchmark, run once with caching and once without:

```r
library(microbenchmark)

# With caching
microbenchmark(
  print(ggplot(df, aes(x, y)) + geom_image(aes(image = image), size = 0.04)),
  times = 10
)
#> Unit: milliseconds
#>                                                                             expr      min      lq     mean   median       uq      max neval
#>  print(ggplot(df, aes(x, y)) + geom_image(aes(image = image),      size = 0.04)) 570.8385 574.692 600.3304 582.8108 597.2787 733.2289    10

# Without caching
microbenchmark(
  print(ggplot(df, aes(x, y)) + geom_image(aes(image = image), size = 0.04)),
  times = 10
)
#> Unit: seconds
#>                                                                             expr     min      lq     mean   median       uq      max neval
#>  print(ggplot(df, aes(x, y)) + geom_image(aes(image = image),      size = 0.04)) 6.55834 48.1042 45.05545 49.15449 49.98102 51.79424    10
```

In these runs, caching reduces the median time per plot from about 49.2 s to about 0.58 s, roughly an 84-fold speedup.

Key changes:

  1. Implemented an internal cache using an environment to store loaded images.
  2. Modified imageGrob and related functions to utilize the cache.
  3. Added functions to manage the cache (clear cache, get cache size).
  4. Removed alpha; transparency is now controlled with opacity.
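The PR description does not include the cache code itself, so the following is only a hypothetical sketch of how an environment-backed image cache with "clear" and "size" helpers might look. All names (`.image_cache`, `cached_read`, `clear_image_cache`, `image_cache_size`) are illustrative and are not ggimage's API. The image loader is passed in as an argument (a real implementation might call something like `magick::image_read()`); here a counting stub stands in for it so the snippet runs with base R only.

```r
# Hypothetical sketch: an environment used as a key-value store for loaded images.
.image_cache <- new.env(parent = emptyenv())

# Return the image for `path`, calling `loader` only on a cache miss.
# `loader` is an assumption; a real version might use magick::image_read().
cached_read <- function(path, loader) {
  key <- normalizePath(path, mustWork = FALSE)
  if (!exists(key, envir = .image_cache, inherits = FALSE)) {
    assign(key, loader(path), envir = .image_cache)
  }
  get(key, envir = .image_cache, inherits = FALSE)
}

# Remove every cached image.
clear_image_cache <- function() {
  rm(list = ls(.image_cache, all.names = TRUE), envir = .image_cache)
  invisible(NULL)
}

# Number of images currently cached.
image_cache_size <- function() {
  length(ls(.image_cache, all.names = TRUE))
}

# Usage with a stub loader that counts how often it is called
n_loads <- 0
fake_loader <- function(path) {
  n_loads <<- n_loads + 1
  paste("pixels of", path)
}

paths <- rep(c("a.png", "b.png"), times = 200)  # 400 lookups, 2 distinct files
imgs <- lapply(paths, cached_read, loader = fake_loader)

n_loads             # 2
image_cache_size()  # 2
clear_image_cache()
image_cache_size()  # 0
```

An environment suits this because it is mutable and shared by reference, so every call sees the same store without global assignment; keying on normalizePath() lets different spellings of the same existing file share one entry.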

Benefits:

  • Reduced disk I/O: Each unique image is loaded only once.
  • Improved rendering speed: Subsequent uses of the same image retrieve it from memory.
  • Enhanced performance for large datasets and interactive applications.
  • Removes the ambiguity between alpha and opacity.

geom_subview also gains a cache, although it is not the main source of the speedup.
