ImageKit probing S3 excessively for existence of cache files #402
Hello @mgalgs, you are using the latest released version of django-imagekit rather than the version from GitHub, right? (I need to release the new version soon, sorry for the delay.) If you are not using the latest django-imagekit from GitHub, then caching is not enabled by default and you need to configure it. Also, if you have not set the default timeout for the cache backend to something very large, it is only 5 minutes by default, and I see that these requests are more than 5 minutes apart. You can try configuring the cache and see if it helps.
I'm using 3.3, yes.
Yeah I had thought about that but looking at the implementation it seems like the timeout only applies when the file does not exist. So we should only see the existence check once, then the cache file should actually be generated, then the fact that the cachefile exists should remain forever (or until it's evicted from the cache backend) with no further existence checks, right?

    def set_state(self, file, state):
        key = self.get_key(file)
        if state == CacheFileState.DOES_NOT_EXIST:
            self.cache.set(key, state, self.existence_check_timeout)
        else:
            self.cache.set(key, state)

(source)
@mgalgs I'm not sure how your cache is configured. Can you install imagekit from my branch and test it? I have one more place where I think caching can be implemented. Look at the docs https://django-imagekit.readthedocs.io/en/latest/caching.html#cache-file-backend and more precisely at the Note below.
Probably this also needs to be cached, and calls to the storage backend can be reduced.
I see that you added a timeout when the cache file does exist. I don't think that's actually helpful... If it exists it's better to just leave it there indefinitely.
Sounds like a good enhancement, but separate from this issue, FWIW, since the symptom here is excessive existence checks, not reads.
The timeout that I was adding is for when the file actually exists. Currently it uses the default timeout provided by the cache backend (and by default in Django that is 5 minutes if not configured). I wanted to have sane defaults but also give anybody the ability to tune it to their own needs. I will change the logic so that by default the timeout is set to never expire.
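For illustration, a rough sketch (not the actual patch) of how the excerpt quoted above could change so that positive existence results never expire; the only assumption is standard Django cache semantics, where a timeout of None means "cache forever":

    def set_state(self, file, state):
        key = self.get_key(file)
        if state == CacheFileState.DOES_NOT_EXIST:
            # Negative results keep a short timeout so newly generated
            # files are picked up without an explicit invalidation.
            self.cache.set(key, state, self.existence_check_timeout)
        else:
            # A timeout of None tells Django's cache to keep the entry
            # forever instead of applying the backend's default timeout.
            self.cache.set(key, state, None)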
Actually in order to see if this change will have any effect in your setup you can make your cache configuration now as follows:
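(The configuration snippet is not preserved in this copy of the thread; the following is a hedged reconstruction of the kind of setup being discussed. The 'imagekit' alias, the memcached backend, and the LOCATION are placeholders, and IMAGEKIT_CACHE_BACKEND is assumed to accept a cache alias from CACHES; check the imagekit docs for the exact form in your version.)

    # settings.py -- sketch, not the original snippet from the thread
    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
            'LOCATION': '127.0.0.1:11211',
        },
        # Dedicated cache for ImageKit whose entries never expire
        # (Django's default TIMEOUT is 300 seconds; None means forever).
        'imagekit': {
            'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
            'LOCATION': '127.0.0.1:11211',
            'TIMEOUT': None,
        },
    }

    IMAGEKIT_CACHE_BACKEND = 'imagekit'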
If this works with the current version of ImageKit on your setup, then the changes that I want to introduce in this merge request will have a positive effect by default for other people in your situation. If this config does not help you, then these changes will not make any difference for your case.
Ah sorry, you're absolutely right. I was thinking the Django default was to never expire. Let me try changing the default to never expire like you mentioned.
@mgalgs can you say if there is any benefit from the caching configured as in my previous comment, or do you need to wait for the next billing cycle?
Trying it tomorrow. I'll know by the end of the day. Will report back.
Thanks
This seems to be working! I'm just judging based on the logs, but I'll know for sure at the end of the day tomorrow when I can check what my total API usage was for this type of request.
Yeah I actually logged on yesterday and couldn't figure out a way to see my API requests over time... I forgot to make a note of my total requests before the change so I won't have a great way of confirming for sure that this made an impact until the end of the month... However, I'm 95% sure that this is going to reduce my API usage drastically, just based on the log files. I meant to submit a PR documenting this in the README since it can save your users potentially thousands of dollars, but since you might be solving it a different way I decided to hold off.
Disclaimer: I asked this question on the Google group a few days ago but the group seems pretty dead... I don't have any conclusive evidence that there is a problem in imagekit. This is more of a general discussion / asking for advice, so please feel free to spend as little time on it as you would like.
Recently I've been noticing that a HUGE portion of my S3 bill is from API requests (as opposed to actual data transfer/storage). For example, last month my S3 bill breakdown was:
The most interesting one is the PUT, COPY, POST, LIST line. ImageKit incurs a LIST request (by virtue of the underlying django-s3-storage library, in my case) each time it checks for the existence of a cache file. I don't think there are any COPYs going on, and uploading images will be a POST (or a PUT, not quite sure). I incurred a total of 850M+ requests in this category last month, but the total number of objects I have stored in S3 only increased by 25 million, and I'm not deleting anything, so the remaining ~825M requests must have been LIST requests. I assume most (or all?) of those LIST requests were from ImageKit checking for the existence of cached files.
I enabled logging on my S3 bucket and noticed that, indeed, there are a ton of duplicate LIST requests coming in for the same files in a relatively short time span:
(REST.GET.BUCKET appears to correspond to LIST requests in the logs)
I haven't changed IMAGEKIT_CACHE_BACKEND, which means it's using my default cache backend, which is memcached. As I understand it, the only reason we should be seeing these duplicate existence checks is if the info about the cached files is being evicted from memcached before the next time that it's needed. But looking at the logs above it just seems like those duplicate requests are too close together to be explained by eviction... I could be totally wrong on that and haven't yet validated that theory. I do see dozens of evictions per second in memcached, so maybe this really is what's happening...
Anyways, I was just hoping to get a second opinion on what might be going on here and ways to diagnose it before I take the somewhat drastic action of introducing another less ephemeral cache backend (e.g. redis) to my project.
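(If a less ephemeral backend does turn out to be necessary, a minimal sketch of that route, assuming django-redis is installed; the alias, URL, and database number are placeholders, not something recommended in the thread:)

    # settings.py -- sketch, assuming django-redis is installed
    CACHES = {
        # ... existing 'default' cache stays as-is ...
        'imagekit': {
            'BACKEND': 'django_redis.cache.RedisCache',
            'LOCATION': 'redis://127.0.0.1:6379/1',
            'TIMEOUT': None,  # keep cache-file state until explicitly invalidated
        },
    }

    IMAGEKIT_CACHE_BACKEND = 'imagekit'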
Thanks in advance.
/cc @dkeitel