Lazy FileFetcher (#1402) #1411
Conversation
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Nice approach. It's a lot cleaner. It does change the default behavior, though, and not all hosts may want that. Can you please include an option, either at this level or via a caching layer, to retain the default "load all at once and cache" behavior while adding an option for on-demand loading?
I have to give it some thought. The plan was to let the in_memory_cache do the storing, to avoid data duplication in memory. Getting the in_memory_cache layer primed by using something similar to the above filepath.Walk code can work, but it moves some of this directory handling to a module that should not care about it. Scanning and storing in this module makes this less of an improvement...
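As an aside for readers, a minimal sketch of the filepath.Walk-style priming being discussed (not code from this PR; the Cache interface and the file-name-as-ID convention are assumptions for illustration):

```go
package filefetcher

import (
	"io/ioutil"
	"os"
	"path/filepath"
	"strings"
)

// Cache is a stand-in for the in_memory_cache layer; the real interface differs.
type Cache interface {
	Save(id string, data []byte)
}

// primeCache walks dir and stores every .json file under its base name,
// which is roughly what the eager startup scan amounts to.
func primeCache(dir string, cache Cache) error {
	return filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() || !strings.HasSuffix(path, ".json") {
			return err
		}
		data, readErr := ioutil.ReadFile(path)
		if readErr != nil {
			return readErr
		}
		cache.Save(strings.TrimSuffix(filepath.Base(path), ".json"), data)
		return nil
	})
}
```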
I should also mention that prebid-server currently uses 2 to 4 times the memory for these files due to the prefetching behavior in https://github.com/prebid/prebid-server/blob/master/stored_requests/config/config.go#L119-L123, which this change indirectly addresses as well. I've verified this manually by adding a 500 MB JSON stored imp and watching the RSS of the process change by 3x that :)
Sure. The end goal is to keep the same behavior we have today and then let hosts opt in to new/better behavior. What do you think about an option for the file fetcher to fetch everything vs. fetch only the requested IDs, and then rely on the caching layer to call fetch just once and store everything in memory? Not sure how happy I am about that suggestion, but I wanted to throw some ideas out. I'm really happy with the direction this is going in.
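A rough sketch of the "fetch only the requested IDs" option (simplified signature; the real stored_requests Fetcher also takes imp IDs and reports missing IDs differently):

```go
package filefetcher

import (
	"context"
	"encoding/json"
	"io/ioutil"
	"path/filepath"
)

// fetchByID reads only the requested IDs from disk instead of preloading the
// whole directory. Error handling is simplified for illustration.
func fetchByID(ctx context.Context, dir string, ids []string) (map[string]json.RawMessage, []error) {
	data := make(map[string]json.RawMessage, len(ids))
	var errs []error
	for _, id := range ids {
		if err := ctx.Err(); err != nil {
			return data, append(errs, err)
		}
		b, err := ioutil.ReadFile(filepath.Join(dir, id+".json"))
		if err != nil {
			errs = append(errs, err)
			continue
		}
		data[id] = json.RawMessage(b)
	}
	return data, errs
}
```

Under the "fetch everything" option, the caching layer would call a bulk version of this once at startup and keep the results; under the lazy option, it would call it per cache miss.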
I didn't realize it did that. Nice to have that addressed. Maybe you can add a new cache type that just stores things forever, if we don't already have something like that today. Make that the default for the file fetcher and we should be good to go.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I implemented the previous behavior by adding a files event producer to populate the cache. This makes it consistent with the other modules, and able to use the cache. But it …
CC @bsardo if you think this should wait until your refactor.
Force-pushed from 5a90948 to 124a96d.
Integrated requested changes and rebased the branch; please review.
Force-pushed from 124a96d to 6121e25.
LGTM in general. I have left some minor comments.
Force-pushed from ee1c30e to 0c48b0b.
Integrated feedback and rebased on latest master.
…s event producer to populate the cache, to be consistent with the other sources.
Default behavior enables a static unbounded cache so all the objects are preloaded and do not expire.
If an LRU cache is explicitly defined, it will be used instead.
Setting lazy_load=true will disable preload and operate in lazy-load-backed-by-cache mode.
Addressing feedback:
Address https://github.com/prebid/prebid-server/pull/1411/files#r476907655
Address https://github.com/prebid/prebid-server/pull/1411/files#r476926516
Fixed bug uncovered by new unit test.
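For illustration, a minimal sketch of the "files event producer" approach named in the commit message: load everything once and emit it as a single save event so the cache is primed like the other sources. The Save type and channel shape below are simplified stand-ins, not the actual stored_requests/events API:

```go
package fileevents

import (
	"encoding/json"
	"io/ioutil"
	"path/filepath"
	"strings"
)

// Save is a simplified stand-in for the save event the cache layer consumes.
type Save struct {
	Requests map[string]json.RawMessage
}

// NewFilesProducer loads every stored request under dir once and exposes it as
// a single save event, so the cache gets primed the same way as other sources.
func NewFilesProducer(dir string) (<-chan Save, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "*.json"))
	if err != nil {
		return nil, err
	}
	save := Save{Requests: make(map[string]json.RawMessage, len(paths))}
	for _, p := range paths {
		b, readErr := ioutil.ReadFile(p)
		if readErr != nil {
			return nil, readErr
		}
		save.Requests[strings.TrimSuffix(filepath.Base(p), ".json")] = json.RawMessage(b)
	}
	out := make(chan Save, 1)
	out <- save
	close(out)
	return out, nil
}
```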
Force-pushed from 0c48b0b to 3a851af.
I tested the changes with the AMP endpoint. The default behavior is still a bit different from what we have today with lazy loading disabled: the files do indeed seem to be cached at startup, but if a file isn't cached the file system will still be checked dynamically. The expected behavior with lazy loading disabled is that the file system is never checked. Perhaps a new cache type can solve the problem, in which a cache miss returns immediately and does not hit the store. With lazy loading enabled, I verified that each request went out to the file system as expected. We might want to advise hosts of this behavior and encourage some kind of in-memory cache, even if short-lived, to avoid hammering their disks.
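One way to picture that "new cache type" is a composition in which a cache miss terminates the lookup instead of falling through to the store. A hedged sketch with illustrative shapes rather than the real stored_requests interfaces:

```go
package storedcache

import (
	"context"
	"encoding/json"
	"fmt"
)

// cacheOnlyFetcher answers purely from the cache and never falls through to
// the underlying store, so a miss returns immediately with a not-found error.
type cacheOnlyFetcher struct {
	cache map[string]json.RawMessage // stand-in for the in-memory cache
}

func (f *cacheOnlyFetcher) FetchRequests(ctx context.Context, ids []string) (map[string]json.RawMessage, []error) {
	found := make(map[string]json.RawMessage, len(ids))
	var errs []error
	for _, id := range ids {
		if data, ok := f.cache[id]; ok {
			found[id] = data
		} else {
			errs = append(errs, fmt.Errorf("stored request %q not found in cache", id))
		}
	}
	return found, errs
}
```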
Also, with a manually configured cache other than in-memory, the items loaded would expire and have to be refreshed, even if we bulk loaded them at start. There are ways around it, but the bottom line is that with the prefetch, this change ballooned from using less code than before to using twice as much, and my confidence that it's worth including in the code base is somewhat shaken.
Indeed, one is expected to configure a cache as shown in #1411 (comment). In fact, the current configuration allows setting a cache with a TTL that never works for files, which I'd argue is a little worse.
That's a good point.
Do you need this change made? I have no problem with proceeding. I'd rather we fix it right, and I think the approach this PR is heading in is very good.
Thank you. Appreciate all the review work that has gone into this. I would like to focus on #1426 (account configuration) instead, and put this PR on the back-burner or close it. Planning to keep the simple lazy fetcher internally for the time being and to start looking into setting up an HTTP API.
Replaces the eager file fetcher for stored requests and categories with a lazy one.
This allows the cache layer to work and control refresh intervals for stored requests, but not for categories yet.
With this change, the configuration now actually caches files with the given settings. Before the change, it would only cache data that was already loaded in memory.
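As a hedged illustration of the settings involved (the cache options echo the in_memory_cache discussion above, while the lazy_load key is taken from the commit message and may not match any merged schema), the relevant configuration section might map onto something like:

```go
package config

// StoredRequestsFiles is a hypothetical sketch of the settings this PR touches;
// the tags mirror plausible YAML keys, not a merged schema.
type StoredRequestsFiles struct {
	Enabled       bool          `mapstructure:"filesystem"`
	DirectoryPath string        `mapstructure:"directorypath"`
	LazyLoad      bool          `mapstructure:"lazy_load"` // false = preload everything into the cache at startup
	InMemoryCache InMemoryCache `mapstructure:"in_memory_cache"`
}

// InMemoryCache reflects the cache options discussed in the thread: an
// unbounded, never-expiring cache by default, or an LRU with a TTL if set.
type InMemoryCache struct {
	Type       string `mapstructure:"type"` // e.g. "unbounded" or "lru"
	TTLSeconds int    `mapstructure:"ttl_seconds"`
}
```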