AEOSVDC memory usage and container failures #157
Comments
This is likely due to the memory leak identified on 26 October.
@cortlandstarrett, do you know when a version with the fix can be provided to @FreddieMatherSmartDCSIT?
@jt765487, we could provide it today if desired. I can give @FreddieMatherSmartDCSIT the option of running it now or waiting a day or two until we have run our own 24-hour test.
@cortlandstarrett, if you can provide us with a new version for defect retests, that would be great. We would need some time to build the PV and prepare for the tests, so the day or two would be useful. It's unlikely we would start the PV deployment process until tomorrow morning, as it is near the end of the day here, but if all goes well we would likely be ready to start retests by tomorrow afternoon or evening. (@jt765487 for your info)
Fixed in v1.1.3 (StoredJobId growth).
On longer endurance test runs of the Protocol Verifier (PV), the PV maxes out the CPU and falls behind on processing events. When this happens, the PV starts to consume memory holding on to the backlog of events, and its memory use grows as more events arrive. Once all of the host box's memory is consumed, the AEOSVDC containers start to fail one by one (each time the box's memory fills up) and never restart.
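The failure mode described above follows from simple arithmetic: once events arrive faster than the PV can process them, the backlog grows linearly, and so does the memory holding it. The sketch below estimates how long that takes to exhaust free memory. The function name, the constant-rate assumption, and the example figures (send rate, processing rate, bytes per backlogged event, free memory) are all hypothetical and are not measurements from this test.

```python
def hours_until_memory_full(send_rate, process_rate, bytes_per_event, free_bytes):
    """Estimate hours until a growing event backlog fills free memory.

    Hypothetical constant-rate model: events arrive at send_rate (events/s),
    the PV processes at most process_rate (events/s), and each backlogged
    event holds bytes_per_event in memory. Returns None if the PV keeps up.
    """
    excess_rate = send_rate - process_rate  # events/s left unprocessed
    if excess_rate <= 0:
        return None  # no backlog builds up
    growth_bytes_per_hour = excess_rate * bytes_per_event * 3600
    return free_bytes / growth_bytes_per_hour


# Hypothetical figures: 1000 events/s sent, 900 events/s processed,
# 1 KiB retained per backlogged event, 16 GiB of free memory.
print(round(hours_until_memory_full(1000, 900, 1024, 16 * 1024**3), 1))  # prints 46.6
```

Under this simplified model, any sustained shortfall in processing rate eventually exhausts memory; the size of the shortfall only determines how long it takes.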
Evidence
The 24-hour run used the following container counts and event send rates (events/s):
The cumulative events processed diverge from the cumulative events sent at approximately the 8-hour mark of the test.
At this point, response times begin to increase non-linearly, reaching a peak after which no more events are processed.
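The divergence point in the cumulative-events chart can also be located programmatically from sampled counters. This is a minimal sketch, assuming the cumulative sent and processed counts are available as equally spaced samples; the function name, the tolerance parameter, and the sample data are hypothetical.

```python
def divergence_index(sent, processed, tolerance):
    """Return the first sample index at which cumulative events sent exceed
    cumulative events processed by more than `tolerance`, or None if the
    two series never diverge by that much."""
    for i, (s, p) in enumerate(zip(sent, processed)):
        if s - p > tolerance:
            return i
    return None


# Hypothetical hourly cumulative counts.
sent = [0, 100, 200, 300, 400]
processed = [0, 100, 200, 250, 260]
print(divergence_index(sent, processed, tolerance=10))  # prints 3
```

A tolerance is used rather than exact equality because events that are in flight make the two counters differ slightly even while the PV is keeping up.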
After approximately 8 hours, the memory usage of all AEOSVDC containers starts to increase (claiming memory from Kafka) until the box's maximum memory is reached. The AEOSVDC containers then fail one by one as they exhaust the box's memory. Eventually only one of the four AEOSVDC containers remains, and its memory usage continues to climb toward the box's maximum.
Container memory usage
CPU usage showing containers failing