-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support log file size and number limits #879
Conversation
Thanks for your PR. The following commands are available:
These commands can only be run by members of the vmware-tanzu organization. |
@antoninbas @tnqn : let me know if you are fine with the approach or not. |
3a1f12c
to
a155dcd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the approach looks fine to me. I feel like we should be able to write a unit test for this?
|
||
// InitLogFileLimits initializes log file maximum size and maximum number limits based on the | ||
// command line flags. | ||
func InitLogFileLimits(fs *pflag.FlagSet) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should let this function return an error, I don't see any reason not to, and it is itself calling functions that may return errors (even if unlikely).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could return errors. Just feel no big harm to let Agent continue to run even at such errors. The errors could happen only when klog implementation changes.
pkg/log/log_file.go
Outdated
} | ||
|
||
// StartLogFileNumberMonitor starts monitoring the log files to make sure the | ||
// number of INFO log files does not exceed the maximum limit, when the log file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should include all verbosity levels, not just INFO. You never know, the WARNING / ERROR files may grow large as well, and I don't see a downside to monitoring their size as well. I don't think we need to be concerned about missing important errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you mean for example <10 INFO files, <10 ERROR files, <10WARNING files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to limit all severity levels.
Great to know. I was thinking to add integration tests. Could consider unit tests too. |
You may be able to use this for unit tests (in-memory filesystem): https://github.com/spf13/afero. It's already an Antrea dependency. |
How can I let klog log to in-memory FS? |
You're right, it wouldn't be possible if we want to use klog to generate the test logs, which is probably the right thing to do. |
@antoninbas @tnqn : I tested the PR and added unit tests. Could I request a review from you? Hope to get it in 0.8.0 for the DECC request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, two minor comments on cleanup style.
if err != nil { | ||
klog.Errorf("Failed to open log directory %s: %v", logDir, err) | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer f.Close()
here in case of Readdir failure?
The error of deferred call is normally ignored.
https://github.com/kubernetes/kubernetes/blob/a138be8722ebd0ce281029fa747315500a99ffd5/pkg/kubelet/kuberuntime/logs/logs.go#L295
https://github.com/moby/moby/blob/3b4cfa97237a8e1fb5eb985e4a7c0717cd14f5c8/daemon/graphdriver/devmapper/deviceset.go#L556
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I hope to close the file earlier. Changed the code to cover Readdir failure.
pkg/log/log_file_test.go
Outdated
assert.Equal(t, 2, infoLogFileNum, "info log file number after checking") | ||
assert.Equal(t, 2, warningLogFileNum, "warning log file number after checking") | ||
|
||
restoreFlagDefaultValues() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether there are other exits in the function, but is defer restoreFlagDefaultValues()
more safe and go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to use defer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I think the commit message / PR description still mention INFO
files only
} else if strings.Contains(file.Name(), ".log.WARNING.") { | ||
warningLogFiles = append(warningLogFiles, file) | ||
} else if strings.Contains(file.Name(), ".log.ERROR.") { | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to have a comment here stating that the test is purposely not generating any error logs, or remove the empty if statement altogether
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments.
pkg/log/log_file_test.go
Outdated
} | ||
defer os.RemoveAll(testLogDir) | ||
|
||
args := []string{"--logtostderr=false", "--log_dir=" + testLogDir, "--log_file_max_size=1", "--log_file_max_num=2"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think you can use a local const
for the max num here and use that in your asserts below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
pkg/log/log_file_test.go
Outdated
logs.InitLogs() | ||
defer restoreFlagDefaultValues() | ||
|
||
// Should generate 3 log files (100K * 30 / 1M). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use const
s for 100k, 1M and number of iterations
then we can add an assert like this to document our intent:
require.Greater(logSize * numIters / logFileMaxSize, logFileMaxNum, "test should generate enough logs to exceed --log_file_max_num")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is hard to accurately control the log message size/number, due to the log file/line header and klog flush time (I am not very sure about this but I did see log interval impact the number of log files created). So, I feel such assert does not help much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. You probably want to include that information in the comment though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Added some comments to explain this.
03b3538
to
e63b9b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
6755a23
to
c205232
Compare
/test-all |
pkg/log/log_file.go
Outdated
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
// Package main under directory cmd parses and validates user input, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
incorrect pkg description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
pkg/log/log_file.go
Outdated
if err != nil { | ||
return | ||
} | ||
if maxSize > 1024*1024 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defined a constant.
pkg/log/log_file.go
Outdated
return | ||
} | ||
if maxSize > 1024*1024 { | ||
klog.Errorf("The specified log file max size %d is too big, ignored", maxSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe quantify "too big" ? with a number that we allow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
klog does not enforce max log file size limit when the log file is not specifiec by the --log_file argument. This commit sets klog.MaxSize to the --log_file_max_size argument value when logging to file is enabled and --log_file is not provided. The commit also adds a new argument --log_file_max_num to define the max number (per severity level) of log files to be kept. The file number limit is enforced by Antrea code that periodically checks the log files and deletes the oldest files to keep at most the specified max number of log files. Fixes: antrea-io#788
/test-all |
// Check log file number every 10 mins. | ||
logFileCheckInterval = time.Minute * 10 | ||
// Allowed maximum value for the maximum file size limit. | ||
maxMaxSizeMB = 1024 * 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was 1024*1024 previously.. decided to reduce?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Do not really what is the appropriate number. klog default value is 18G.
klog does not enforce max log file size limit when the log file is not specifiec by the --log_file argument. This commit sets klog.MaxSize to the --log_file_max_size argument value when logging to file is enabled and --log_file is not provided. The commit also adds a new argument --log_file_max_num to define the max number (per severity level) of log files to be kept. The file number limit is enforced by Antrea code that periodically checks the log files and deletes the oldest files to keep at most the specified max number of log files. Fixes: antrea-io#788
klog does not enforce max log file size limit when the log file is not
specifiec by the --log_file argument. This commit sets klog.MaxSize to
the --log_file_max_size argument value when logging to file is enabled
and --log_file is not provided.
The commit also adds a new argument --log_file_max_num to define the
max number of log files to be kept. The file number limit is enforced by
Antrea code that periodically checks the INFO log files and deletes the
oldest files to keep at most the specified max number of log files.
Fixes: #788