From bd399a4eda59667d1998896f5427084770e20d0b Mon Sep 17 00:00:00 2001
From: Tom de Bruijn
Date: Wed, 29 May 2019 10:31:32 +0200
Subject: [PATCH] Fix CPU metrics calculations for containers

## The problem

The previous implementation of CPU metrics for containers was
inaccurate. It reported trends (more/less CPU usage), but didn't
accurately report the CPU metrics.

This was because the implementation used the "total CPU usage" as the
"total CPU time", which is not the same. The "total CPU usage" is how
much the CPU was used during our measurements, but it does not account
for the CPU idle time, which the "total CPU time" does include.

This caused the reported CPU metrics to add up to 100% usage, while the
container itself may have only used 1/2% of the CPU at that time. The
breakdown between `user` and `system` only applied to the `total usage`
and not to the "total CPU time".

The implementation didn't account for the `idle` time of a CPU, because
the `/sys/fs` virtual file system does not report this value. It
reports a total usage (combined `user` group, `system` group and
whatever it doesn't expose in separate groups), and a breakdown for the
`user` and `system` groups.

## The solution

Instead we can use the timeframe in which the measurements were done,
60 seconds, as the "total CPU time". The values can be compared because
`total usage`, `user` and `system` are also reported as time:
nanoseconds for `total usage` and `seconds / 100` for the `user` and
`system` groups, which we convert to nanoseconds.

60 seconds is the time in which we measure. We will use this as the
"total CPU time". The deltas of the values reported for `total usage`,
`user` and `system`, that is the time spent in these groups within
those 60 seconds, can then be used to calculate the percentages for
those values.
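
The unit conversion described above can be sketched as follows. This is
a minimal illustration, not the code from this patch: `NANOS_PER_TICK`,
`ticks_to_nanos` and `delta_ns` are hypothetical names, and it assumes
`USER_HZ` is 100, so that one `cpuacct.stat` tick equals 10,000,000
nanoseconds.

```rust
/// Nanoseconds per `cpuacct.stat` tick.
/// Assumption: USER_HZ is 100, so one tick is 1/100th of a second.
const NANOS_PER_TICK: u64 = 10_000_000;

/// Convert a `user` or `system` tick count to nanoseconds, so it shares
/// a unit with the `cpuacct.usage` value. Saturates instead of overflowing.
fn ticks_to_nanos(ticks: u64) -> u64 {
    ticks.saturating_mul(NANOS_PER_TICK)
}

/// Time spent between two readings of the same counter, in nanoseconds.
/// Returns `None` when the newer reading is lower than the older one.
fn delta_ns(previous: u64, current: u64) -> Option<u64> {
    current.checked_sub(previous)
}
```

With the fixture values in this patch, `ticks_to_nanos(14934)` gives
`149_340_000_000`, which is the value the updated
`test_read_sys_measurement` test expects for `user`.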

Using the calculation:
`delta value / total CPU time (60 seconds) * 100 = value in %`

For example:

```
time frame  = 60 seconds = 100% (60 / 60 * 100 = 100)
total usage = 30 seconds =  50% (30 / 60 * 100 =  50)
user        = 24 seconds =  40% (24 / 60 * 100 =  40)
system      =  6 seconds =  10% ( 6 / 60 * 100 =  10)
```

Whereas with the previous implementation the result would have been:

```
total usage = 30 seconds = 100% (30 / 30 * 100 = 100)
user        = 24 seconds =  80% (24 / 30 * 100 =  80)
system      =  6 seconds =  20% ( 6 / 30 * 100 =  20)
```

In addition, we now report the `total_usage` of the container CPU,
rather than only the breakdown between the `user` and `system` groups.
This implementation differs from the non-container implementation for
CPU metrics, where we do not have a `total_usage` metric.

### When a container has multiple CPU cores

When a container has multiple CPU cores the measurement time does not
change. But the container can use more resources during that time,
counting towards a higher reported usage value. This means that a
container with 2 CPU cores running both cores at their maximum during
the measurement time will report 200% in total usage.

This implementation will report the same metrics as the output from
`docker stats`. We've decided to keep the reported values the same so
there is less confusion about which metric is accurate.

This creates a difference in reporting for CPU usage between container
systems and non-container systems. For non-containers all the reported
CPU group metrics add up to 100% regardless of how many cores are
available. This is because we know the "total CPU time" there, as the
idle time and other groups are exposed.
See: https://github.com/appsignal/probes-rs/blob/e2ac0412f6357d64423073adf82009a116ae226b/src/cpu.rs#L141-L162

If we wanted the reported values to top out at 100% no matter how many
CPU cores a container has available, we could divide the reported
values by the number of CPU cores.
This is what we did in our test script:
https://gist.github.com/tombruijn/5f5e0c34af40cfb3967eca81ad8b5317#file-appsignal_stats-rb-L141
and it is what `ctop -scale-cpu` does (ctop
(https://bcicen.github.io/ctop/) being an alternative to
`docker stats`). But please consider the following section as well, as
that will not fix the problem with multiple containers on one host
system.

### About container neighbors

While we report the same values as `docker stats`, `docker stats`
doesn't actually report the container's CPU usage. Instead it shows the
impact of the container's CPU usage on the host system. This is not
immediately visible when using one container on the host system, but
when multiple containers are running at the same time this problem
becomes clearer.

Multiple containers running on the same host system will impact the
readings of `docker stats`, as the host's resources will be shared
among those containers.

For example: We have a Docker system on our host machine that has 3 CPU
cores available. We then start containers without any specific CPU
limits.

- In the scenario of one container maxing out its CPU cores, it will be
  reported as 300% CPU usage by `docker stats`, which is the total CPU
  time available.
- When three containers are maxing out their CPU cores, `docker stats`
  will report 100% CPU usage for every container, as the resources are
  shared evenly amongst them.

The problem here remains that we do not know the "total CPU time" the
container actually has available, total usage + idle time, isolated to
the container itself.

To make it more complicated, if all containers max out their CPU, the
containers' CPU time gets throttled as well. This is something we may
want to expose in the future, as it shows when a container has fewer
resources available.
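
The percentage calculation and the optional per-core scaling discussed
in the sections above can be sketched as follows. This is an
illustration under stated assumptions rather than the code in this
patch: `WINDOW_NS`, `percentage_of_window` and `scaled_by_cores` are
hypothetical names, the sketch uses `f64` where the patch uses `f32`,
and the core count is assumed to be supplied by the caller.

```rust
/// The 60-second measurement window in nanoseconds, used as "total CPU time".
const WINDOW_NS: f64 = 60_000_000_000.0;

/// Percentage of the measurement window that `delta_ns` represents.
/// On a multi-core container this can exceed 100, like `docker stats`.
fn percentage_of_window(delta_ns: u64) -> f64 {
    delta_ns as f64 / WINDOW_NS * 100.0
}

/// Hypothetical helper: divide by the core count so that all cores
/// running at their maximum read as 100%. Returns `None` for zero cores.
fn scaled_by_cores(percentage: f64, cores: u32) -> Option<f64> {
    if cores == 0 {
        None
    } else {
        Some(percentage / f64::from(cores))
    }
}
```

A delta of 30 seconds maps to 50%, and two fully used cores (a delta of
120 seconds) map to 200%, or 100% after scaling by 2 cores. The scaling
only normalizes for the core count. It does not account for neighboring
containers sharing the host's CPU time.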

## Acknowledgements and sources

Based on the implementation of this Ruby script written together with
Robert Beekman @matsimitsu:
https://gist.github.com/tombruijn/5f5e0c34af40cfb3967eca81ad8b5317
Thanks Robert!

Based on the docs as described here:
https://docs.docker.com/config/containers/runmetrics/

Co-authored-by: Robert Beekman
---
 .../sys/fs/cgroup/cpuacct_1/cpuacct.stat      |   4 +-
 .../sys/fs/cgroup/cpuacct_1/cpuacct.usage     |   2 +-
 .../sys/fs/cgroup/cpuacct_2/cpuacct.stat      |   4 +-
 .../sys/fs/cgroup/cpuacct_2/cpuacct.usage     |   2 +-
 src/cpu/cgroup.rs                             | 405 ++++++------------
 5 files changed, 132 insertions(+), 285 deletions(-)

diff --git a/fixtures/linux/sys/fs/cgroup/cpuacct_1/cpuacct.stat b/fixtures/linux/sys/fs/cgroup/cpuacct_1/cpuacct.stat
index 065a839..da15820 100644
--- a/fixtures/linux/sys/fs/cgroup/cpuacct_1/cpuacct.stat
+++ b/fixtures/linux/sys/fs/cgroup/cpuacct_1/cpuacct.stat
@@ -1,2 +1,2 @@
-user 404
-system 749
+user 14934
+system 98
diff --git a/fixtures/linux/sys/fs/cgroup/cpuacct_1/cpuacct.usage b/fixtures/linux/sys/fs/cgroup/cpuacct_1/cpuacct.usage
index cfe8384..56ca04f 100644
--- a/fixtures/linux/sys/fs/cgroup/cpuacct_1/cpuacct.usage
+++ b/fixtures/linux/sys/fs/cgroup/cpuacct_1/cpuacct.usage
@@ -1 +1 @@
-13953750329
+152657213021
diff --git a/fixtures/linux/sys/fs/cgroup/cpuacct_2/cpuacct.stat b/fixtures/linux/sys/fs/cgroup/cpuacct_2/cpuacct.stat
index 18a685f..5be1b68 100644
--- a/fixtures/linux/sys/fs/cgroup/cpuacct_2/cpuacct.stat
+++ b/fixtures/linux/sys/fs/cgroup/cpuacct_2/cpuacct.stat
@@ -1,2 +1,2 @@
-user 504
-system 849
+user 17783
+system 121
diff --git a/fixtures/linux/sys/fs/cgroup/cpuacct_2/cpuacct.usage b/fixtures/linux/sys/fs/cgroup/cpuacct_2/cpuacct.usage
index df24264..f5d38c2 100644
--- a/fixtures/linux/sys/fs/cgroup/cpuacct_2/cpuacct.usage
+++ b/fixtures/linux/sys/fs/cgroup/cpuacct_2/cpuacct.usage
@@ -1 +1 @@
-23953750329
+182405617026
diff --git a/src/cpu/cgroup.rs b/src/cpu/cgroup.rs
index e952e71..caf8c1d 100644
--- a/src/cpu/cgroup.rs
+++ b/src/cpu/cgroup.rs
@@ -2,90 +2,58 @@ use super::super::{Result,calculate_time_difference,time_adjusted};
 
 /// Measurement of cpu stats at a certain time
 #[derive(Debug,PartialEq)]
-pub struct CpuMeasurement {
+pub struct CgroupCpuMeasurement {
     pub precise_time_ns: u64,
-    pub stat: CpuStat
+    pub stat: CgroupCpuStat
 }
 
-impl CpuMeasurement {
-    /// Calculate the cpu stats based on this measurement and a measurement in the future.
-    /// It is advisable to make the next measurement roughly a minute from this one for the
-    /// most reliable result.
-    pub fn calculate_per_minute(&self, next_measurement: &CpuMeasurement) -> Result<CpuStat> {
+impl CgroupCpuMeasurement {
+    pub fn calculate_per_minute(&self, next_measurement: &CgroupCpuMeasurement) -> Result<CgroupCpuStat> {
         let time_difference = calculate_time_difference(self.precise_time_ns, next_measurement.precise_time_ns)?;
 
-        Ok(CpuStat {
-            total: time_adjusted("total", next_measurement.stat.total, self.stat.total, time_difference)?,
+        Ok(CgroupCpuStat {
+            total_usage: time_adjusted("total_usage", next_measurement.stat.total_usage, self.stat.total_usage, time_difference)?,
             user: time_adjusted("user", next_measurement.stat.user, self.stat.user, time_difference)?,
-            nice: time_adjusted("nice", next_measurement.stat.nice, self.stat.nice, time_difference)?,
-            system: time_adjusted("system", next_measurement.stat.system, self.stat.system, time_difference)?,
-            idle: time_adjusted("idle", next_measurement.stat.idle, self.stat.idle, time_difference)?,
-            iowait: time_adjusted("iowait", next_measurement.stat.iowait, self.stat.iowait, time_difference)?,
-            irq: time_adjusted("irq", next_measurement.stat.irq, self.stat.irq, time_difference)?,
-            softirq: time_adjusted("softirq", next_measurement.stat.softirq, self.stat.softirq, time_difference)?,
-            steal: time_adjusted("steal", next_measurement.stat.steal, self.stat.steal, time_difference)?,
-            guest: time_adjusted("guest", next_measurement.stat.guest, self.stat.guest, time_difference)?,
-            guestnice: time_adjusted("guestnice", next_measurement.stat.guestnice, self.stat.guestnice, time_difference)?
+            system: time_adjusted("system", next_measurement.stat.system, self.stat.system, time_difference)?
         })
     }
 }
 
-/// Cpu stats for a minute
+/// Container CPU stats for a minute
 #[derive(Debug,PartialEq)]
-pub struct CpuStat {
-    pub total: u64,
+pub struct CgroupCpuStat {
+    pub total_usage: u64,
     pub user: u64,
-    pub nice: u64,
-    pub system: u64,
-    pub idle: u64,
-    pub iowait: u64,
-    pub irq: u64,
-    pub softirq: u64,
-    pub steal: u64,
-    pub guest: u64,
-    pub guestnice: u64
+    pub system: u64
 }
 
-impl CpuStat {
+impl CgroupCpuStat {
     /// Calculate the weight of the various components in percentages
-    pub fn in_percentages(&self) -> CpuStatPercentages {
-        CpuStatPercentages {
+    pub fn in_percentages(&self) -> CgroupCpuStatPercentages {
+        CgroupCpuStatPercentages {
+            total_usage: self.percentage_of_total(self.total_usage),
             user: self.percentage_of_total(self.user),
-            nice: self.percentage_of_total(self.nice),
-            system: self.percentage_of_total(self.system),
-            idle: self.percentage_of_total(self.idle),
-            iowait: self.percentage_of_total(self.iowait),
-            irq: self.percentage_of_total(self.irq),
-            softirq: self.percentage_of_total(self.softirq),
-            steal: self.percentage_of_total(self.steal),
-            guest: self.percentage_of_total(self.guest),
-            guestnice: self.percentage_of_total(self.guestnice)
+            system: self.percentage_of_total(self.system)
         }
     }
 
     fn percentage_of_total(&self, value: u64) -> f32 {
-        (value as f64 / self.total as f64 * 100.0) as f32
+        // 60_000_000_000 being the total value. This is 60 seconds expressed in nanoseconds.
+        (value as f32 / 60_000_000_000.0) * 100.0
     }
 }
 
-/// Cpu stats converted to percentages
+/// Cgroup Cpu stats converted to percentages
 #[derive(Debug,PartialEq)]
-pub struct CpuStatPercentages {
+pub struct CgroupCpuStatPercentages {
+    pub total_usage: f32,
     pub user: f32,
-    pub nice: f32,
-    pub system: f32,
-    pub idle: f32,
-    pub iowait: f32,
-    pub irq: f32,
-    pub softirq: f32,
-    pub steal: f32,
-    pub guest: f32,
-    pub guestnice: f32
+    pub system: f32
 }
 
-/// Read the current CPU stats of the system.
+/// Read the current CPU stats of the container.
 #[cfg(target_os = "linux")]
-pub fn read() -> Result<CpuMeasurement> {
+pub fn read() -> Result<CgroupCpuMeasurement> {
     os::read()
 }
 
@@ -95,13 +63,12 @@ mod os {
     use std::io::BufRead;
     use time;
     use super::super::super::{Result,file_to_buf_reader,parse_u64,path_to_string,read_file_value_as_u64,dir_exists};
-    use super::{CpuMeasurement,CpuStat};
+    use super::{CgroupCpuMeasurement,CgroupCpuStat};
     use error::ProbeError;
 
     const CPU_SYS_NUMBER_OF_FIELDS: usize = 2;
 
-    #[inline]
-    pub fn read() -> Result<CpuMeasurement> {
+    pub fn read() -> Result<CgroupCpuMeasurement> {
         let sys_fs_dir = Path::new("/sys/fs/cgroup/cpuacct/");
         if dir_exists(sys_fs_dir) {
             read_and_parse_sys_stat(&sys_fs_dir)
@@ -111,23 +78,15 @@ mod os {
             }
         }
     }
-    pub fn read_and_parse_sys_stat(path: &Path) -> Result<CpuMeasurement> {
+    pub fn read_and_parse_sys_stat(path: &Path) -> Result<CgroupCpuMeasurement> {
         let time = time::precise_time_ns();
         let reader = file_to_buf_reader(&path.join("cpuacct.stat"))?;
-        let total = nano_to_user(read_file_value_as_u64(&path.join("cpuacct.usage"))?);
+        let total_usage = read_file_value_as_u64(&path.join("cpuacct.usage"))?;
 
-        let mut cpu = CpuStat {
-            total: total,
+        let mut cpu = CgroupCpuStat {
+            total_usage: total_usage,
             user: 0,
-            system: 0,
-            nice: 0,
-            idle: 0,
-            iowait: 0,
-            irq: 0,
-            softirq: 0,
-            steal: 0,
-            guest: 0,
-            guestnice: 0
+            system: 0
         };
 
         let mut fields_encountered = 0;
@@ -137,11 +96,11 @@ mod os {
             let value = parse_u64(&segments[1])?;
             fields_encountered += match segments[0] {
                 "user" => {
-                    cpu.user = value;
+                    cpu.user = value * 10_000_000;
                     1
                 },
                 "system" => {
-                    cpu.system = value;
+                    cpu.system = value * 10_000_000;
                     1
                 },
                 _ => 0
@@ -155,30 +114,23 @@ mod os {
         if fields_encountered != CPU_SYS_NUMBER_OF_FIELDS {
             return Err(ProbeError::UnexpectedContent("Did not encounter all expected fields".to_owned()))
         }
-
-        Ok(CpuMeasurement {
+        let measurement = CgroupCpuMeasurement {
             precise_time_ns: time,
             stat: cpu
-        })
-    }
-
-    // [CPU usage] times are expressed in ticks of 1/100th of a second, also called "user jiffies".
-    // There are USER_HZ “jiffies” per second, and on x86 systems, USER_HZ is 100.
-    // See: https://docs.docker.com/config/containers/runmetrics/#cpu-metrics-cpuacctstat
-    fn nano_to_user(value: u64) -> u64 {
-        value.checked_div(10_000_000).unwrap_or(0)
+        };
+        Ok(measurement)
     }
 }
 
 #[cfg(test)]
 mod test {
-    use super::{CpuMeasurement,CpuStat,CpuStatPercentages};
+    use super::{CgroupCpuMeasurement,CgroupCpuStat};
     use super::os::read_and_parse_sys_stat;
     use std::path::Path;
     use error::ProbeError;
 
     #[test]
-    fn test_read_cpu() {
+    fn test_read() {
         assert!(super::read().is_ok());
     }
 
@@ -186,17 +138,9 @@ mod test {
     fn test_read_sys_measurement() {
         let measurement = read_and_parse_sys_stat(&Path::new("fixtures/linux/sys/fs/cgroup/cpuacct_1/")).unwrap();
         let cpu = measurement.stat;
-        assert_eq!(cpu.total, 1395);
-        assert_eq!(cpu.user, 404);
-        assert_eq!(cpu.nice, 0);
-        assert_eq!(cpu.system, 749);
-        assert_eq!(cpu.idle, 0);
-        assert_eq!(cpu.iowait, 0);
-        assert_eq!(cpu.irq, 0);
-        assert_eq!(cpu.softirq, 0);
-        assert_eq!(cpu.steal, 0);
-        assert_eq!(cpu.guest, 0);
-        assert_eq!(cpu.guestnice, 0);
+        assert_eq!(cpu.total_usage, 152657213021);
+        assert_eq!(cpu.user, 149340000000);
+        assert_eq!(cpu.system, 980000000);
     }
 
     #[test]
@@ -226,37 +170,21 @@ mod test {
 
     #[test]
     fn test_calculate_per_minute_wrong_times() {
-        let measurement1 = CpuMeasurement {
+        let measurement1 = CgroupCpuMeasurement {
             precise_time_ns: 90_000_000_000,
-            stat: CpuStat {
-                total: 0,
+            stat: CgroupCpuStat {
+                total_usage: 0,
                 user: 0,
-                nice: 0,
-                system: 0,
-                idle: 0,
-                iowait: 0,
-                irq: 0,
-                softirq: 0,
-                steal: 0,
-                guest: 0,
-                guestnice: 0
+                system: 0
             }
         };
 
-        let measurement2 = CpuMeasurement {
+        let measurement2 = CgroupCpuMeasurement {
             precise_time_ns: 60_000_000_000,
-            stat: CpuStat {
-                total: 0,
+            stat: CgroupCpuStat {
+                total_usage: 0,
                 user: 0,
-                nice: 0,
-                system: 0,
-                idle: 0,
-                iowait: 0,
-                irq: 0,
-                softirq: 0,
-                steal: 0,
-                guest: 0,
-                guestnice: 0
+                system: 0
             }
         };
 
@@ -266,54 +194,31 @@ mod test {
         }
     }
 
+
     #[test]
-    fn test_calculate_per_minute_full_minute() {
-        let measurement1 = CpuMeasurement {
+    fn test_cgroup_calculate_per_minute_full_minute() {
+        let measurement1 = CgroupCpuMeasurement {
             precise_time_ns: 60_000_000_000,
-            stat: CpuStat {
-                total: 6380,
+            stat: CgroupCpuStat {
+                total_usage: 6380,
                 user: 1000,
-                nice: 1100,
-                system: 1200,
-                idle: 1300,
-                iowait: 1400,
-                irq: 50,
-                softirq: 10,
-                steal: 20,
-                guest: 200,
-                guestnice: 100
+                system: 1200
             }
         };
 
-        let measurement2 = CpuMeasurement {
+        let measurement2 = CgroupCpuMeasurement {
             precise_time_ns: 120_000_000_000,
-            stat: CpuStat {
-                total: 6440,
+            stat: CgroupCpuStat {
+                total_usage: 6440,
                 user: 1006,
-                nice: 1106,
-                system: 1206,
-                idle: 1306,
-                iowait: 1406,
-                irq: 56,
-                softirq: 16,
-                steal: 26,
-                guest: 206,
-                guestnice: 106
+                system: 1206
             }
         };
 
-        let expected = CpuStat {
-            total: 60,
+        let expected = CgroupCpuStat {
+            total_usage: 60,
             user: 6,
-            nice: 6,
-            system: 6,
-            idle: 6,
-            iowait: 6,
-            irq: 6,
-            softirq: 6,
-            steal: 6,
-            guest: 6,
-            guestnice: 6
+            system: 6
         };
 
         let stat = measurement1.calculate_per_minute(&measurement2).unwrap();
@@ -323,52 +228,28 @@ mod test {
 
     #[test]
     fn test_calculate_per_minute_partial_minute() {
-        let measurement1 = CpuMeasurement {
+        let measurement1 = CgroupCpuMeasurement {
             precise_time_ns: 60_000_000_000,
-            stat: CpuStat {
-                total: 6380,
-                user: 1000,
-                nice: 1100,
-                system: 1200,
-                idle: 1300,
-                iowait: 1400,
-                irq: 50,
-                softirq: 10,
-                steal: 20,
-                guest: 200,
-                guestnice: 100
+            stat: CgroupCpuStat {
+                total_usage: 1_000_000_000,
+                user: 10000_000_000,
+                system: 12000_000_000
             }
         };
 
-        let measurement2 = CpuMeasurement {
+        let measurement2 = CgroupCpuMeasurement {
             precise_time_ns: 90_000_000_000,
-            stat: CpuStat {
-                total: 6440,
-                user: 1006,
-                nice: 1106,
-                system: 1206,
-                idle: 1306,
-                iowait: 1406,
-                irq: 56,
-                softirq: 16,
-                steal: 26,
-                guest: 206,
-                guestnice: 106
+            stat: CgroupCpuStat {
+                total_usage: 1_500_000_000,
+                user: 10060_000_000,
+                system: 12060_000_000
             }
         };
 
-        let expected = CpuStat {
-            total: 120,
-            user: 12,
-            nice: 12,
-            system: 12,
-            idle: 12,
-            iowait: 12,
-            irq: 12,
-            softirq: 12,
-            steal: 12,
-            guest: 12,
-            guestnice: 12
+        let expected = CgroupCpuStat {
+            total_usage: 1_000_000_000,
+            user: 120_000_000,
+            system: 120_000_000
         };
 
         let stat = measurement1.calculate_per_minute(&measurement2).unwrap();
@@ -378,37 +259,21 @@ mod test {
 
     #[test]
     fn test_calculate_per_minute_values_lower() {
-        let measurement1 = CpuMeasurement {
+        let measurement1 = CgroupCpuMeasurement {
             precise_time_ns: 60_000_000_000,
-            stat: CpuStat {
-                total: 6380,
-                user: 1000,
-                nice: 1100,
-                system: 1200,
-                idle: 1300,
-                iowait: 1400,
-                irq: 50,
-                softirq: 10,
-                steal: 20,
-                guest: 200,
-                guestnice: 100
+            stat: CgroupCpuStat {
+                total_usage: 63800_000_000,
+                user: 10000_000_000,
+                system: 12000_000_000
             }
         };
 
-        let measurement2 = CpuMeasurement {
+        let measurement2 = CgroupCpuMeasurement {
             precise_time_ns: 90_000_000_000,
-            stat: CpuStat {
-                total: 1040,
-                user: 106,
-                nice: 116,
-                system: 126,
-                idle: 136,
-                iowait: 146,
-                irq: 56,
-                softirq: 16,
-                steal: 26,
-                guest: 206,
-                guestnice: 106
+            stat: CgroupCpuStat {
+                total_usage: 10400_000_000,
+                user: 1060_000_000,
+                system: 1260_000_000
             }
         };
 
@@ -420,85 +285,67 @@ mod test {
 
     #[test]
     fn test_in_percentages() {
-        let stat = CpuStat {
-            total: 1000,
-            user: 450,
-            nice: 70,
-            system: 100,
-            idle: 100,
-            iowait: 120,
-            irq: 10,
-            softirq: 20,
-            steal: 50,
-            guest: 50,
-            guestnice: 30
+        let stat = CgroupCpuStat {
+            total_usage: 24000000000,
+            user: 16800000000,
+            system: 1200000000
         };
 
-        let expected = CpuStatPercentages {
-            user: 45.0,
-            nice: 7.0,
-            system: 10.0,
-            idle: 10.0,
-            iowait: 12.0,
-            irq: 1.0,
-            softirq: 2.0,
-            steal: 5.0,
-            guest: 5.0,
-            guestnice: 3.0
-        };
+        let in_percentages = stat.in_percentages();
 
-        assert_eq!(stat.in_percentages(), expected);
+        // Rounding in the floating point calculations can vary, so check if this
+        // is in the correct range.
+        assert!(in_percentages.total_usage > 39.9);
+        assert!(in_percentages.total_usage <= 40.0);
+
+        assert!(in_percentages.user > 27.9);
+        assert!(in_percentages.user <= 28.0);
+
+        assert!(in_percentages.system > 1.9);
+        assert!(in_percentages.system <= 2.0);
     }
 
     #[test]
     fn test_in_percentages_fractions() {
-        let stat = CpuStat {
-            total: 1000,
-            user: 445,
-            nice: 65,
-            system: 100,
-            idle: 100,
-            iowait: 147,
-            irq: 1,
-            softirq: 2,
-            steal: 50,
-            guest: 55,
-            guestnice: 35
+        let stat = CgroupCpuStat {
+            total_usage: 24000000000,
+            user: 17100000000,
+            system: 900000000
         };
 
-        let expected = CpuStatPercentages {
-            user: 44.5,
-            nice: 6.5,
-            system: 10.0,
-            idle: 10.0,
-            iowait: 14.7,
-            irq: 0.1,
-            softirq: 0.2,
-            steal: 5.0,
-            guest: 5.5,
-            guestnice: 3.5
-        };
+        let in_percentages = stat.in_percentages();
+
+        // Rounding in the floating point calculations can vary, so check if this
+        // is in the correct range.
+        assert!(in_percentages.total_usage > 39.9);
+        assert!(in_percentages.total_usage <= 40.0);
+
+        assert!(in_percentages.user > 28.4);
+        assert!(in_percentages.user <= 28.5);
 
-        assert_eq!(stat.in_percentages(), expected);
+        assert!(in_percentages.system > 1.4);
+        assert!(in_percentages.system <= 1.5);
     }
 
     #[test]
     fn test_in_percentages_integration() {
-        let mut measurement1 = read_and_parse_sys_stat(&Path::new("fixtures/linux/sys/fs/cgroup/cpuacct_1")).unwrap();
-        measurement1.precise_time_ns = 60_000_000_000;
-        let mut measurement2 = read_and_parse_sys_stat(&Path::new("fixtures/linux/sys/fs/cgroup/cpuacct_2")).unwrap();
-        measurement2.precise_time_ns = 120_000_000_000;
+        let mut measurement1 = read_and_parse_sys_stat(&Path::new("fixtures/linux/sys/fs/cgroup/cpuacct_1/")).unwrap();
+        measurement1.precise_time_ns = 375953965125920;
+        let mut measurement2 = read_and_parse_sys_stat(&Path::new("fixtures/linux/sys/fs/cgroup/cpuacct_2/")).unwrap();
+        measurement2.precise_time_ns = 376013815302920;
 
         let stat = measurement1.calculate_per_minute(&measurement2).unwrap();
         let in_percentages = stat.in_percentages();
 
         // Rounding in the floating point calculations can vary, so check if this
         // is in the correct range.
+        assert!(in_percentages.total_usage > 49.70);
+        assert!(in_percentages.total_usage < 49.71);
 
-        assert!(in_percentages.user >= 10.0);
-        assert!(in_percentages.user <= 10.0);
+        assert!(in_percentages.user > 47.60);
+        assert!(in_percentages.user < 47.61);
 
-        assert!(in_percentages.system >= 10.0);
-        assert!(in_percentages.system <= 10.0);
+        assert!(in_percentages.system > 0.38);
+        assert!(in_percentages.system < 0.39);
     }
 }