-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storage watchdog #1056
Storage watchdog #1056
Conversation
…n for increased compatibility
…StorageWatchdog backend.
…StorageWatchdog backend.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #1056 +/- ##
==========================================
- Coverage 46.20% 45.08% -1.13%
==========================================
Files 34 35 +1
Lines 3398 3476 +78
==========================================
- Hits 1570 1567 -3
- Misses 1828 1909 +81
☔ View full report in Codecov by Sentry. |
Hey @gridl0ck,
I also accidentally pushed these changes directly to master and then had to force push over them, since I wasn't sure that all tests were passing. This is why your PR got closed. |
Oh @vringar I completely missed that. That is most definitely left over from an early design I had intended to use but I have since moved away from using it. Do you need me to remove it and push those changes? |
@gridl0ck I hit send too early and still need to update the previous message with the rest of my feedback
As I have rewritten a large part of your original implementation and have a couple of open questions, I'd rather have you as a reviewer than a contributor. |
Dang ok. Let me know what, if anything, I need to do to get this added because I do think it is a helpful addition. |
My primary question right now is: The memory_watchdog just checks at a random time, sets the flag and then the BrowserManagerHandle checks the flag after a CommandSequence has completed. Is this Scenario unacceptable to you?:
Please note that I'm not disagreeing with doing the checks synchronously after the CS. I might even pull out the memory_watchdog check to the same location, because it makes it easier to reason about what can cause a browser to reset. I'm just genuinely curious. |
When I created this for my capstone, the amount of data generated by each crawl varied per website so I needed to check the size of the folder. As to why its at the end of the CS, I ran into problems with the StorageController not saving the data to the database before the watchdog got to it (or thats how I interpreted the problem at the time).
This goes back what I was saying earlier about the StorageController not saving the data because I did have this running asynchronously at first (trying to queue up restarts for when its most convenient) but I didn't know how to communicate that internally so running after each CS ensured that the data from each CS was stored and then if the resulting crawl pushed the profile directory over the threshold, then the browser would be restarted. I originally had a function that would go in and simply wipe all non-essential files (It wouldnt touch configuration files or anything but it was a very hacky way of cleaning that ended up slowing down everything as time went on) but realized that having the browser just restart after a threshold reached cleared those files and did the necessary setup for each browser because you built that functionality in. The StorageWatchdog essentially just monitors the size of the browser_profile directories in each of the BrowserManager threads and uses your built-in reset functionality in moderation. Before, when you set the reset flag, you would get a browser restart after every CS, which slowed down our crawls and part of our project was crawling a certain number of websites in a timely manner so this was inconvenient. With the StorageWatchdog, you can let the crawls run with little impact to speed because the browsers arent being reset after each CS, but you can also work with limited space. |
Okay, so I wanted to write tests to ensure this functionality keeps working, but seeing as our other two watchdogs also don't have any test and I can't think of a good way to test it (as restarts are supposed to be transparent/invisible to the user anyway) I'll just set this to automerge and way for the tests to pass. I'll create a new release with this feature in the next couple of days. |
Original Implementation done by @gridl0ck.
With modifications by @vringar.