Description
Log files previously rolled by ATS are not being removed when disk limits specified by the following parameters are reached:
proxy.config.log.max_space_mb_for_logs
proxy.config.log.max_space_mb_headroom
and proxy.config.log.auto_delete_rolled_files is set to 1.
This issue could be related to #5966. The configuration works in ATS 8.x but not in ATS 9. Both rolling_max_count and rolling_min_count are set to zero. Setting rolling_max_count to a value greater than 0 does not appear to have any effect. Setting rolling_min_count to a value greater than 0 causes files to be rotated as expected, but before disk space is exhausted. It could be that the two settings influence the behavior in an unexpected manner, but our expectation is that the max_space parameters would work regardless of the min and max count settings.
Additionally, it appears that the proxy.config.log.max_space_mb_for_orphan_log setting was removed. The concern is that logs created by a previously running instance of ATS may not be selected as candidates for removal. I was unable to confirm whether this is the case, because ATS is not even deleting the files it rolled during its current run.
In the following example, ATS 9 was put under load such that logging would eventually cross the thresholds specified by the above parameters. The parameters do cause ATS to detect that disk space has been exhausted, as seen in this sequence, but ATS is unable to find the logs it had just rolled itself. The log directory was filled by ATS' usual log rotation, creating a number of rotated files as expected and filling up the disk. That is where the log messages begin.
...manually removed all rolled files...
NOTE: Logging space is no longer exhausted.
NOTE: Logging disk is no longer low; access logging to local log directory resumed.
...load sent to ATS...
...ATS log files roll as expected...
NOTE: Cannot clear space because there are no recognized Traffic Server rolled logs for auto deletion.
NOTE: Logging space exhausted, any logs writing to local disk will be dropped!
WARNING: Access logging to local log directory suspended - configured space allocation almost exhausted.
NOTE: [Alarms::signalAlarm] Skipping Alarm: 'Access logging to local log directory suspended - configured space allocation almost exhausted.'
NOTE: Cannot clear space because there are no recognized Traffic Server rolled logs for auto deletion.
NOTE: Cannot clear space because there are no recognized Traffic Server rolled logs for auto deletion.
...continues until files are manually removed...
The LogConfig::update_space_used() method is where files are detected for removal. There is logic wrapping the call to RolledLogDeleter's consider_for_candidacy() that could be failing. consider_for_candidacy() also has a side effect: it increments the candidate counter, which update_space_used() checks later through RolledLogDeleter's has_candidates() method. That method returns false, which produces the message shown above ("Cannot clear space because there are no recognized Traffic Server rolled logs for auto deletion"). We can infer that consider_for_candidacy() either never executed, or executed but found no candidate and therefore did not increment the candidate counter.
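To make the suspected failure mode concrete, here is a minimal sketch of the counter dependency described above. It is not the ATS source: ToyRolledLogDeleter, register_log_type(), the prefix-matching rule, the file names, and simulate() are all hypothetical stand-ins. Only the names consider_for_candidacy() and has_candidates() come from the real class, and their signatures here are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for RolledLogDeleter (NOT the ATS code).
class ToyRolledLogDeleter
{
public:
  // Assumption: rolled files are only recognized for registered log types.
  void
  register_log_type(std::string name)
  {
    types_.push_back(std::move(name));
  }

  // Side effect: increments the candidate counter only when the file name
  // is a registered log type followed by a '.' and a roll suffix.
  bool
  consider_for_candidacy(std::string_view file_name)
  {
    for (const auto &type : types_) {
      std::string_view prefix{type};
      if (file_name.size() > prefix.size() &&
          file_name.substr(0, prefix.size()) == prefix &&
          file_name[prefix.size()] == '.') {
        ++candidate_count_;
        return true;
      }
    }
    return false;
  }

  bool
  has_candidates() const
  {
    return candidate_count_ > 0;
  }

private:
  std::vector<std::string> types_;
  std::size_t candidate_count_ = 0;
};

// Scans a made-up directory listing, optionally registering the log type
// first, and reports whether any deletion candidate was found.
bool
simulate(bool register_type)
{
  ToyRolledLogDeleter deleter;
  if (register_type) {
    deleter.register_log_type("squid.log");
  }
  const std::vector<std::string> files = {"squid.log", "squid.log.rolled.old", "diags.log"};
  for (const auto &f : files) {
    deleter.consider_for_candidacy(f);
  }
  return deleter.has_candidates();
}
```

In this toy model, simulate(true) finds one candidate, while simulate(false) finds none even though a rolled file is present. The second case matches the symptom in the log output: if the wrapping logic skips the call, or the call does not recognize the file, the counter stays at zero and has_candidates() returns false.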