update documentation on remote log level settings
Add a chapter to LOGGING.md explaining how different log levels are
handled in the system.

Updated CONFIG-PROPERTIES.md to clarify remote log level settings and
provide full list of log level parameters

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
europaul committed Nov 26, 2024
1 parent 07b895b commit 846b26b
Showing 4 changed files with 165 additions and 15 deletions.
40 changes: 31 additions & 9 deletions docs/CONFIG-PROPERTIES.md
@@ -34,10 +34,6 @@
| debug.enable.ssh | authorized ssh key | empty string(ssh disabled) | allow ssh to EVE |
| debug.enable.console | boolean | false | allow console access to EVE (reboot required to disable) |
| debug.enable.vnc.shim.vm | boolean | false | allow VNC access to the container application shim VM (reboot required to disable) |
| debug.default.loglevel | string | info | min level saved in files on device. Uses logrus log levels as described here ["https://pkg.go.dev/github.com/sirupsen/logrus"]: panic, fatal, error, warning, info, debug and trace. |
| debug.syslog.loglevel | string | info | min level of the syslog messages saved in files on device. System default loglevel string representation should be used as described here ["https://man7.org/linux/man-pages/man3/syslog.3.html"]: emerg, alert, crit, err, warning, notice, info, debug. |
| debug.kernel.loglevel | string | info | min level of the kernel messages saved in files on device. System default loglevel string representation should be used as described here ["https://man7.org/linux/man-pages/man3/syslog.3.html"]: emerg, alert, crit, err, warning, notice, info, debug. |
| debug.default.remote.loglevel | string | warning | min level sent to controller. Uses the log levels described under the "debug.syslog.loglevel" setting. |
| storage.dom0.disk.minusage.percent | integer percent | 20 | min. percent of persist partition reserved for dom0 |
| storage.zfs.reserved.percent | integer percent | 20 | min. percent of persist partition reserved for zfs performance |
| storage.apps.ignore.disk.check | boolean | false | Ignore disk usage check for Apps. Allows apps to create images bigger than available disk|
@@ -70,11 +66,37 @@
| goroutine.leak.detection.keep.stats.hours | integer (hours) | 24 | Amount of hours to keep the stats for leak detection. We keep more stats than the check window to be able to react to settings with a bigger check window via configuration. |
| goroutine.leak.detection.cooldown.minutes | integer (minutes) | 5 | Cooldown period in minutes after the leak detection is triggered. During this period, no stack traces are collected; only warning messages are logged. |

In addition, there can be per-agent settings.
The per-agent settings begin with "agent.*agentname*.*setting*".
The following per-agent settings override the corresponding default ones:

## Log levels
Log levels can be set for three different components of EVE: EVE microservices, syslog, and kernel.
The log levels set this way control the verbosity of the logs produced by the corresponding components.
All logs produced this way are saved locally in the /persist/newlog/keepSentQueue/ directory and are subject to rotation based on the maximum total size of stored logs.

Due to implementation specifics, there are two different sets of log levels that can be set: logrus and syslog levels.
Logrus levels are used by the EVE microservices, while syslog levels are used by syslog and kernel.

* the logrus levels are as follows: panic, fatal, error, warning, info, debug, and trace ["https://pkg.go.dev/github.com/sirupsen/logrus"].
* the syslog levels are as follows: emerg, alert, crit, err, warning, notice, info, debug ["https://man7.org/linux/man-pages/man3/syslog.3.html"].

Additionally, all log levels can be set to "none" to disable logging for the corresponding component or to "all" to enable all log levels.

Furthermore, the "remote" log levels control which subset of the produced logs is sent to the controller.
A corresponding "remote" log level can be set for each of the three components: EVE microservices, syslog, and kernel.

| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
| debug.default.loglevel | string | debug | default level of logs produced by EVE microservices. Can be overridden by agent.*agentname*.debug.loglevel. Uses logrus log levels as described here ["https://pkg.go.dev/github.com/sirupsen/logrus"]: panic, fatal, error, warning, info, debug and trace. |
| debug.default.remote.loglevel | string | warning | default level of logs sent by EVE microservices to the controller. Can be overridden by agent.*agentname*.debug.remote.loglevel. Uses logrus log levels as described here ["https://pkg.go.dev/github.com/sirupsen/logrus"]: panic, fatal, error, warning, info, debug and trace. |
| debug.syslog.loglevel | string | info | level of the produced syslog messages. System default loglevel string representation should be used as described here ["https://man7.org/linux/man-pages/man3/syslog.3.html"]: emerg, alert, crit, err, warning, notice, info, debug. |
| debug.syslog.remote.loglevel | string | info | level of the syslog messages sent to the controller. System default loglevel string representation should be used as described here ["https://man7.org/linux/man-pages/man3/syslog.3.html"]: emerg, alert, crit, err, warning, notice, info, debug. |
| debug.kernel.loglevel | string | info | level of the produced kernel log messages. System default loglevel string representation should be used as described here ["https://man7.org/linux/man-pages/man3/syslog.3.html"]: emerg, alert, crit, err, warning, notice, info, debug. |
| debug.kernel.remote.loglevel | string | info | level of the kernel log messages sent to the controller. System default loglevel string representation should be used as described here ["https://man7.org/linux/man-pages/man3/syslog.3.html"]: emerg, alert, crit, err, warning, notice, info, debug. |

In addition, there can be per-agent settings to override the default log level set for zedbox.
These use the same log levels as the default log level settings (logrus).
The per-agent settings begin with "agent.*agentname*.*setting*":

| Name | Type | Description |
| ---- | ---- | ----------- |
| agent.*agentname*.loglevel | string | if set overrides debug.default.loglevel (legacy setting debug.*agentname*.loglevel still supported) |
| agent.*agentname*.remote.loglevel | string | if set overrides debug.default.remote.loglevel (legacy setting debug.*agentname*.remote.loglevel) |
| agent.*agentname*.debug.loglevel | string | if set overrides debug.default.loglevel for this particular agent (legacy setting debug.*agentname*.loglevel still supported) |
| agent.*agentname*.debug.remote.loglevel | string | if set overrides debug.default.remote.loglevel for this particular agent (legacy setting debug.*agentname*.remote.loglevel) |
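
The override rule in the table above can be sketched as a lookup with fallbacks. This is a minimal Python illustration, not EVE's implementation: the function `effective_loglevel` and the plain dict of config items are hypothetical, the agent names are only examples, and the relative precedence of the current and legacy keys is an assumption (the table only says the legacy key is still supported).

```python
def effective_loglevel(config, agent, remote=False):
    """Return the log level that applies to one agent.

    `config` is a plain dict mapping config-property names to values,
    a hypothetical stand-in for the device configuration.
    """
    suffix = "remote.loglevel" if remote else "loglevel"
    candidates = [
        f"agent.{agent}.debug.{suffix}",  # per-agent setting
        f"debug.{agent}.{suffix}",        # legacy per-agent setting (priority is assumed)
        f"debug.default.{suffix}",        # default for all EVE microservices
    ]
    for key in candidates:
        value = config.get(key)
        if value:  # unset or empty values fall through to the next key
            return value
    # Documented defaults when nothing is configured.
    return "warning" if remote else "debug"


config = {
    "debug.default.loglevel": "info",
    "agent.zedagent.debug.loglevel": "trace",
}
print(effective_loglevel(config, "zedagent"))
print(effective_loglevel(config, "nim"))
print(effective_loglevel(config, "nim", remote=True))
```

With this configuration, only the first agent gets its own level; the second falls back to `debug.default.loglevel`, and its remote level falls back to the documented default.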
50 changes: 44 additions & 6 deletions docs/LOGGING.md
@@ -38,17 +38,54 @@ The following diagram shows the flow of logs from containers to newlogd and to c

## Log Aggregation, Reformatting and Compression for Persistent Log Files

All logs collected from the various containers, services, and the kernel in the system reach the newlogd daemon. Newlogd formats the log entries and writes them into temporary log files (with a temporary file name suffix, e.g. 12345678) on disk in the /persist/newlog/collect directory. Logs from the device host side are saved to files with the name prefix 'dev.log.' (e.g. dev.log.12345678), and logs from the application/guest side are saved to files with the name prefix 'app.APP-UUID.log.' (e.g. app.APP-UUID.log.12345678), where APP-UUID is the application UUID assigned to the guest application. A temporary file is kept on disk until either its size exceeds 400 KBytes or it has been open for longer than 5 minutes.
All logs collected from the various containers, services, and the kernel in the system reach the newlogd daemon.
Newlogd formats the log entries and writes them into temporary log files (with a temporary file name suffix, e.g. 12345678) on disk in the /persist/newlog/collect directory.
Logs that are meant to be sent to the controller are saved to files with the name prefix 'dev.log.upload', and logs intended to stay on the device host side are saved to files with the name prefix 'dev.log.keep' (e.g. dev.log.keep.12345678). Logs from the application/guest side are saved to files with the name prefix 'app.APP-UUID.log.' (e.g. app.APP-UUID.log.12345678), where APP-UUID is the application UUID assigned to the guest application.
A temporary file is kept on disk until either its size exceeds 400 KBytes or it has been open for longer than 5 minutes.
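
The two closing conditions for a temporary file can be written as a single predicate. The sketch below is illustrative only: `should_close` and the constant names are hypothetical, and treating 1 KByte as 1024 bytes is an assumption, since the text does not say which unit is meant.

```python
import time

MAX_TEMP_FILE_BYTES = 400 * 1024    # "400 KBytes"; 1 KByte = 1024 bytes is assumed
MAX_TEMP_FILE_AGE_SECONDS = 5 * 60  # "longer than 5 minutes"


def should_close(size_bytes, opened_at, now=None):
    """Return True once a temporary log file has hit the size or the age limit.

    `opened_at` and `now` are Unix timestamps in seconds.
    """
    if now is None:
        now = time.time()
    too_big = size_bytes > MAX_TEMP_FILE_BYTES
    too_old = (now - opened_at) > MAX_TEMP_FILE_AGE_SECONDS
    return too_big or too_old


print(should_close(100, opened_at=0, now=10))         # small and young
print(should_close(500 * 1024, opened_at=0, now=10))  # over the size limit
print(should_close(100, opened_at=0, now=301))        # open for more than 5 minutes
```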

There is a symbolic link, '/persist/newlog/collect/current.device.log', that points to the currently opened device logfile in the /persist/newlog/collect directory. One can use 'tail -F' on this symbolic link to monitor the output of all the device side (not application side) logs.
There is a symbolic link, '/persist/newlog/collect/current.device.log', that points to the temporary "keep" device logfile in the /persist/newlog/collect directory.
One can use 'tail -F' on this symbolic link to monitor the output of all the device side (not application side) logs.

When the above log file is closed because the size or time limit has been reached, it is moved and compressed with gzip into either the 'devUpload' or the 'appUpload' directory. The size of the gzip file is limited to 50 KBytes due to the northbound queueing configuration. If the compressed file is larger than the limit, it is split and compressed into two separate gzip files. The gzip filename encodes the current timestamp in Unix milliseconds, such as 'dev.log.1600831551491.gz' for a device log, and the timestamp and application UUID, such as 'app.62195aa9-7db4-4ac0-86d3-d8abe0ff0ea9.log.1599186248917.gz', for application logs. Metadata such as the device UUID, the image partition, and the image version (or the app name for applications) is encoded in the gzip metadata header of the file.
When the above log files are closed because the size or time limit has been reached, they are moved and compressed with gzip into

* either the 'devUpload' or the 'appUpload' directory for the "upload" files
* or the 'keepSentQueue' directory for the "keep" files

The size of the gzip file is limited to 50 KBytes due to the northbound queueing configuration.
If the compressed file is larger than the limit, it will be split and compressed into two separate gzip files.
The gzip filename encodes the current timestamp in Unix milliseconds, such as 'dev.log.upload.1600831551491.gz' for a device log, and the timestamp and application UUID, such as 'app.62195aa9-7db4-4ac0-86d3-d8abe0ff0ea9.log.1599186248917.gz', for application logs.
Metadata such as the device UUID, the image partition, and the image version (or the app name for applications) is encoded in the gzip metadata header of the file.
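
As an illustration of the naming scheme, the following Python sketch recovers the source, the application UUID, and the timestamp from a gzip file name. It is not part of EVE: `parse_gzip_log_name` is a hypothetical helper, and accepting a `keep` variant of the device file name (as well as the older name without `upload`) is an assumption based on the temporary file prefixes described above.

```python
import re
from datetime import datetime, timezone

# Device logs: 'dev.log.upload.<unix-ms>.gz'. The 'keep' variant and the
# form without a kind are assumptions, not confirmed by the text.
_DEV = re.compile(r"^dev\.log\.(?:(?P<kind>upload|keep)\.)?(?P<ms>\d+)\.gz$")
# Application logs: 'app.<APP-UUID>.log.<unix-ms>.gz'.
_APP = re.compile(r"^app\.(?P<uuid>[0-9a-fA-F-]{36})\.log\.(?P<ms>\d+)\.gz$")


def parse_gzip_log_name(name):
    """Return a dict describing the file name, or None if it does not match."""
    match, source = _DEV.match(name), "device"
    if match is None:
        match, source = _APP.match(name), "app"
    if match is None:
        return None
    info = match.groupdict()
    ms = int(info.pop("ms"))
    info["source"] = source
    info["timestamp_ms"] = ms
    # The timestamp is in Unix milliseconds; convert it to an aware UTC datetime.
    info["time"] = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return info


dev = parse_gzip_log_name("dev.log.upload.1600831551491.gz")
app = parse_gzip_log_name(
    "app.62195aa9-7db4-4ac0-86d3-d8abe0ff0ea9.log.1599186248917.gz"
)
print(dev["source"], dev["kind"], dev["time"].isoformat())
print(app["source"], app["uuid"], app["time"].isoformat())
```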

Upon device restart, any unfinished temporary log files left over from the previous run in the /persist/newlog/collect directory are first moved and compressed by the newlogd daemon into their upload gzip directories before any new log events are written to disk.

Once the gzip log files are uploaded to the cloud, they remain available on the device in the /persist/newlog/keepSentQueue directory. Log files still waiting to be uploaded are in the '/persist/newlog/devUpload' and '/persist/newlog/appUpload' directories. If the network connection to the cloud is good but a logfile has repeatedly failed to upload, it is moved out of the 'Upload' directory to the '/persist/newlog/failedUpload' directory. EVE developers who have enabled ssh to the device for debugging purposes can look at the log entries in those directories by using the "zcat" utility.
Once the "upload" gzip log files are uploaded to the cloud, they are removed.
Log files still waiting to be uploaded are in the '/persist/newlog/devUpload' and '/persist/newlog/appUpload' directories.
If the network connection to the cloud is good but a logfile has repeatedly failed to upload, it is moved out of the 'Upload' directory to the '/persist/newlog/failedUpload' directory.
EVE developers can use e.g. [edge-view](https://lf-edge.atlassian.net/wiki/spaces/EVE/pages/14584760/Edge-View+Architecture#Log-Search) to query the logs in those directories.

Users can set a maximum on-device log file quota in Mbytes with the 'newlog.gzipfiles.ondisk.maxmegabytes' config-item. The default is 2048 Mbytes, the configurable range is (10, 4294967295) Mbytes, and the quota is capped at 10% of the '/persist' disk size.
The device retains logs in the 'collect', 'appUpload', 'devUpload', 'keepSentQueue' and 'failedUpload' directories, which together form a circular buffer. When the quota is exceeded, log files are removed until the quota is met.
The removal process goes directory by directory and removes files starting from the oldest.
Once a directory has no files left, the process moves on to the next directory.
The directories are processed in the following order, from most likely to least likely to have files removed:

1. `keepSentQueue`
2. `failedUpload`
3. `devUpload`
4. `appUpload`
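
The quota cap and the removal order above can be sketched as follows. This is a simplified illustration rather than newlogd's code: `effective_quota_mb` and `files_to_remove` are hypothetical names, the directory contents are passed in as plain tuples instead of being read from disk, and the exact stopping condition (total size at or below the quota) is an assumption.

```python
REMOVAL_ORDER = ["keepSentQueue", "failedUpload", "devUpload", "appUpload"]


def effective_quota_mb(configured_mb=2048, persist_size_mb=None):
    """Apply the 10% cap of the '/persist' size to the configured quota."""
    if persist_size_mb is None:
        return configured_mb
    return min(configured_mb, persist_size_mb // 10)


def files_to_remove(files_by_dir, quota_bytes):
    """Pick the files to delete so that the total size fits the quota.

    `files_by_dir` maps a directory name to a list of
    (filename, mtime, size_bytes) tuples. Directories outside
    REMOVAL_ORDER (e.g. 'collect') count toward the total but are
    never selected for removal here.
    """
    total = sum(size for files in files_by_dir.values() for _, _, size in files)
    removed = []
    for directory in REMOVAL_ORDER:
        # Oldest files first within each directory.
        for name, _, size in sorted(files_by_dir.get(directory, []), key=lambda f: f[1]):
            if total <= quota_bytes:
                return removed
            removed.append(f"{directory}/{name}")
            total -= size
    return removed


files = {
    "keepSentQueue": [("b.gz", 2, 40), ("a.gz", 1, 40)],
    "failedUpload": [("c.gz", 1, 40)],
    "devUpload": [("d.gz", 1, 40)],
}
print(effective_quota_mb(2048, persist_size_mb=10000))
print(files_to_remove(files, quota_bytes=70))
```

In the example, both `keepSentQueue` files go first (oldest first), then one file from `failedUpload`; `devUpload` is left untouched because the quota is already met.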

## Log Levels

Here is the diagram explaining how log level settings work for logs generated by the device itself:

![Log Level Diagram](images/eve-log-levels.png)

**Note:** As can be seen from the flow diagram, the remote log levels (`debug.default.remote.loglevel`, `agent.agentname.debug.remote.loglevel`, `debug.kernel.remote.loglevel` and `debug.syslog.remote.loglevel`) should be set to levels equal to or less verbose than the baseline log levels.
Setting them to a more verbose level wouldn't result in additional logs being uploaded since those logs aren't generated at lower verbosity levels.
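
The interaction between the baseline and the remote level described in the note can be modelled with two comparisons. The sketch below is an illustration under assumptions, not EVE's implementation: the helper names are hypothetical, and representing "none" and "all" as the lowest and highest verbosity ranks is one possible reading of those two special values.

```python
# Levels ordered from least to most verbose.
LOGRUS_LEVELS = ["panic", "fatal", "error", "warning", "info", "debug", "trace"]
SYSLOG_LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def verbosity(level, scale):
    """Map a level name to its rank on the given scale."""
    if level == "none":
        return -1              # below every real level: nothing passes
    if level == "all":
        return len(scale) - 1  # same as the most verbose level
    return scale.index(level)


def is_produced(message_level, local_level, scale):
    """A message is produced if it is no more verbose than the baseline level."""
    return verbosity(message_level, scale) <= verbosity(local_level, scale)


def is_uploaded(message_level, local_level, remote_level, scale):
    """A message is uploaded only if it is produced AND passes the remote level."""
    return is_produced(message_level, local_level, scale) and (
        verbosity(message_level, scale) <= verbosity(remote_level, scale)
    )


# Baseline "info", remote "debug": debug messages are never produced,
# so the more verbose remote level adds nothing.
print(is_uploaded("debug", "info", "debug", LOGRUS_LEVELS))
print(is_uploaded("warning", "info", "debug", LOGRUS_LEVELS))
# Baseline "info", remote "warning": info messages stay on the device.
print(is_uploaded("info", "info", "warning", LOGRUS_LEVELS))
```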

There are no granularity knobs for the edge apps' log levels: all logs generated by the edge apps are sent to the controller OR kept on the device, depending on the `VmConfig.disableLogs` value (see this [section](#policy-for-application-logging-export-to-cloud-or-stay-on-device)).

Users can set a maximum on-device log file quota in Mbytes with the 'newlog.gzipfiles.ondisk.maxmegabytes' config-item. The default is 2048 Mbytes, the configurable range is (10, 4294967295) Mbytes, and the quota is capped at 10% of the '/persist' disk size. The device retains logs in the 'collect', 'appUpload', 'devUpload', 'keepSentQueue' and 'failedUpload' directories, which together form a circular buffer. When the quota is exceeded, log files are removed starting from the oldest in the 'keepSentQueue' directory until the total log file size is below the quota.
For the full list of log level parameters and their possible values, see the [config-properties](CONFIG-PROPERTIES.md#log-levels) doc.

## Log export to cloud

@@ -78,7 +115,8 @@ The uploading is controlled on a scheduled timer. When the timer fires, the "log

The "loguploader" collects stats of round-trip delay, controller CPU load percentage and log batch processing time. The current EVE implementation does not use those stats in calculating the uploading timer values.

The already uploaded gzip files are moved to the /persist/newlog/keepSentQueue directory. This directory, together with the 'collect', 'appUpload' and 'devUpload' directories, forms a circular buffer that is kept up to the quota limit (default is 2 Gbytes and can be changed by a user config-item).
Already uploaded gzip files with app logs are moved to the /persist/newlog/keepSentQueue directory, while already uploaded dev.log.upload files are removed.
This directory, together with the 'collect', 'appUpload' and 'devUpload' directories, forms a circular buffer that is kept up to the quota limit (default is 2 Gbytes and can be changed by a user config-item).

To prevent the log files from growing without bound over time, the 'failedUpload' directory keeps at most 1000 gzip files, each with a maximum size of 50K, so the directory stays under 50M. The '/persist' partition space is monitored, and if the available space drops below 100M, 'newlogd' starts the gzip file recycle operation, just as it does when the controller uplink is unreachable.
