-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question: does gatsby-source-filesystem
not cache downloaded remote files between builds?
#4464
Comments
gatsby-source-filesystem
not cache downloaded remote files?gatsby-source-filesystem
not cache downloaded remote files between builds?
Also, it appears that |
Hey @awesomebob, would you like making PR to change that in |
@pieh I would be willing to do that. It looks like I just need to check if |
Upon further reflection, I'm still going to run into problems even once #4493 is merged in. Should I try to parallelize that download, or what else can I do? I currently have 3.5GB in the WordPress media library, and that's going to keep growing. |
Yeah — files aren't redownloaded — it's the clearing of the cache that's the problem. Perhaps we should make the automatic cache clearing less aggressive? Just remove the redux-state.json data? |
Ah you're right, it's not redownloading files on a 304. Clearing the cache less often might help, but at some point I'm going to have to redownload the entire WordPress Media Library again. It seems like I might have shot myself in the foot for adding 3.5GB of media there. I might need to look into pre-processing the images first instead of during the Gatsby build if it's going to take more than an hour for my large site. |
That's why I was so confused as I thought this was already working but reading code didn't make it obvious. I'd be up for exploring less aggressive cache invalidation as this might be quick win for cases like this.
I think that's all the things gatsby dynamically generates on itself (part from |
Yeah, Yeah, this would be an easy quick win. You want to take this @pieh? |
@KyleAMathews Yeah, I will try to prepare PR for tomorrow. @awesomebob Downloading is already parallelized ( https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-source-wordpress/src/normalize.js#L382-L413 ). Maybe running so many concurrent requests is actually hurting more than it helps? Maybe it would be worth exploring some kind of async queue for that with sane limit of concurrent operations? |
On the second thought - less aggressive cache invalidating with just deleting This would have to be more robust and probably would need separate cache instances for all plugins and deleting just caches for plugins that has changed. Thoughts? |
Oh hmmm yeah. Perhaps the remote file downloader can just use its own cache for now. |
I'm not sure if it's worth time/effort to do this just for file downloader - it depends how do you see permanent cache implemented (one you mentioned here - #3977 (comment) ). I could do initial part of just splitting global key-value cache into separate caches dedicated for each plugin ( this one where headers are stored - https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby/src/utils/cache.js ) - mainly this would be needed to not redownload files after cache invalidation (if we don't change In next iterations we can implement helper function for plugins to get their dedicated file cache path and store nodes and plugin status in dedicated partial state jsons. Ultimately I would see it something like this:
|
Invalidating the cache by plugin version would be a great start to make the cache permanent/immutable. Instead of separate caches though, how about we just prepend cache keys with the plugin version which would accomplish the same thing & avoid having dozens of caches. |
Due to the high volume of issues, we're closing out older ones without recent activity. Please open a new issue if you need help! |
I'm still digging into why my site with
gatsby-source-wordpress
is taking so long to build. (#4293)For context: I have a site with about 1,000 posts and about 2,000 (large) images.
My builds are currently taking 20-30 minutes to finish. If I skip downloading images, it takes 2.
Looking at create-remote-file-node it looks like the
cache
is only used to remember headers that were received during the current build, I guess to prevent downloading the same image twice during a single build in case the image is referenced multiple times.From what I can tell, Gatsby will still download every remote file on every build.
Would it not be possible to store the whole image to prevent redownloading and reprocessing with
gatsby-transformer-sharp
?The text was updated successfully, but these errors were encountered: