Alternative cpu utilization reporting #1002

ArneTR · 2024-11-25T16:45:34Z

@ribalba Can you please review? There is still the same discrepancy between the sum of the cgroup-reported values and the procfs value.

Example: https://metrics.green-coding.io/stats.html?id=c87ef224-9137-46d4-b5d2-60fca94dfc39

I also tried on another Linux framebook and also tried with only one container.

If only one container is on the machine, the discrepancy is < 1% (absolute). With multiple containers it stays at around 6%.

github-actions · 2024-11-25T16:54:39Z

Old Energy Estimation

Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run (incl. overhead) | 27.1271 | 2155.04 | 4.06 | 531.45 |
| Measurement #1 | 27.0865 | 2155.04 | 4.07 | 529.77 |

🌳 CO2 Data:
City: Chicago, Lat: 41.8874, Lon: -87.6318
IP: 20.102.223.129
CO₂ from energy is: 0.795209760 g
CO₂ from manufacturing (embodied carbon) is: 0.151630023 g
Carbon Intensity for this location: 369 gCO₂eq/kWh
SCI: 0.946840 gCO₂eq / pipeline run emitted

ArneTR · 2024-11-25T19:25:00Z

OK, I have it: it is the overhead of the X11 window compositor.

If I turn that off, the values align closely. This is why the process was not spotted as existing overhead in the idle phase: it only becomes active once the windows are painted.

ArneTR · 2024-11-26T07:57:26Z

Reopening to finalize discussion.

Also worth discussing: how to distribute power between containers, given that some container "work" might be done outside of what the CPU utilization captures, e.g. in X11 / Wayland.
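To make the question above concrete, here is a minimal sketch of one possible attribution scheme: the machine power above the idle baseline is split between containers in proportion to their CPU utilization. The function name, its parameters, and the proportional rule are all assumptions for illustration, not GMT's actual implementation.

```python
def split_power_by_cpu_share(machine_power_w, baseline_power_w,
                             machine_util_pct, container_util_pct):
    """Hypothetical helper: split above-baseline power by CPU utilization share.

    Returns (watts per container, watts that no container accounts for).
    """
    # Only the power above the idle baseline is distributed.
    dynamic_power_w = max(machine_power_w - baseline_power_w, 0.0)
    if machine_util_pct <= 0:
        return {name: 0.0 for name in container_util_pct}, dynamic_power_w

    per_container = {
        name: dynamic_power_w * util / machine_util_pct
        for name, util in container_util_pct.items()
    }
    # Work done outside the containers (e.g. by a window compositor)
    # shows up as power that is not attributed to any container.
    unattributed_w = dynamic_power_w - sum(per_container.values())
    return per_container, unattributed_w


shares, unattributed = split_power_by_cpu_share(
    machine_power_w=10.0,
    baseline_power_w=4.0,
    machine_util_pct=30.0,
    container_util_pct={"web": 15.0, "db": 9.0},
)
```

In this made-up example the containers cover 24 of the 30 percentage points seen on the machine, so a fifth of the dynamic power stays unattributed. Deciding who should carry that remainder is exactly the open question.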

github-actions · 2024-11-26T12:41:51Z

Old Energy Estimation

Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run (incl. overhead) | 27.5502 | 2049.38 | 4.07 | 503.95 |
| Measurement #1 | 27.5201 | 2049.38 | 4.08 | 502.25 |

🌳 CO2 Data:
City: Boydton, Lat: 36.6676, Lon: -78.3875
IP: 172.200.181.112
CO₂ from energy is: 0.709085480 g
CO₂ from manufacturing (embodied carbon) is: 0.143783894 g
Carbon Intensity for this location: 346 gCO₂eq/kWh
SCI: 0.852869 gCO₂eq / pipeline run emitted

* main:
  - Setting run failed always
  - Bump tqdm from 4.67.0 to 4.67.1 (#1000)
  - Bump playwright/python in /docker/auxiliary-containers/gcb_playwright (#999)
  - Bump orjson from 3.10.11 to 3.10.12 (#1001)
  - Bump pytest-playwright from 0.5.2 to 0.6.2 (#1004)
  - Bump pydantic from 2.10.1 to 2.10.2 (#1005)

github-actions · 2024-11-28T19:19:02Z

Old Energy Estimation

Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run (incl. overhead) | 25.3787 | 2481.31 | 3.99 | 621.38 |
| Measurement #1 | 25.3473 | 2481.31 | 4.01 | 619.30 |

🌳 CO2 Data:
City: Phoenix, Lat: 33.4475, Lon: -112.0866
IP: 20.169.14.0
CO₂ from energy is: 0.471448900 g
CO₂ from manufacturing (embodied carbon) is: 0.177288294 g
Carbon Intensity for this location: 190 gCO₂eq/kWh
SCI: 0.648737 gCO₂eq / pipeline run emitted

* main: Adding measurement settings to the measurement view

github-actions · 2024-11-29T09:17:30Z

Old Energy Estimation

Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run (incl. overhead) | 27.5029 | 2184.39 | 4.07 | 536.54 |
| Measurement #1 | 27.4774 | 2184.39 | 4.09 | 534.72 |

🌳 CO2 Data:
City: Chicago, Lat: 41.8874, Lon: -87.6318
IP: 20.25.192.65
CO₂ from energy is: 0.779827230 g
CO₂ from manufacturing (embodied carbon) is: 0.153082271 g
Carbon Intensity for this location: 357 gCO₂eq/kWh
SCI: 0.932910 gCO₂eq / pipeline run emitted

github-actions · 2024-12-01T16:06:36Z

Eco-CI Output:

| Label | 🖥 avg. CPU utilization [%] | 🔋 Total Energy [Joules] | 🔌 avg. Power [Watts] | Duration [Seconds] |
|---|---|---|---|---|
| Total Run (incl. overhead) | 27.3226 | 2127.11 | 4.05 | 524.87 |
| Measurement #1 | 27.2867 | 2127.11 | 4.07 | 523.16 |

🌳 CO2 Data:
City: Boydton, Lat: 36.6676, Lon: -78.3875
IP: 20.57.44.193
CO₂ from energy is: 0.882750650 g
CO₂ from manufacturing (embodied carbon) is: 0.149752658 g
Carbon Intensity for this location: 415 gCO₂eq/kWh
SCI: 1.032503 gCO₂eq / pipeline run emitted

* main:
  - Tcp dump (#919)
  - Hash must be decoded to understand spaces [skip ci]
  - Allowing Deeplinks to specific phases [skip ci] (#1016)
  - Bump python from 3.13.0-slim-bookworm to 3.13.1-slim-bookworm in /docker (#1015)
  - Bump pydantic from 2.10.2 to 2.10.3 (#1010)
  - Bump fastapi[standard] from 0.115.5 to 0.115.6 (#1011)
  - Bump aiohttp from 3.11.9 to 3.11.10 (#1013)
  - Bump redis from 5.2.0 to 5.2.1 (#1014)
  - Bump python from 3.12.7-slim-bookworm to 3.13.0-slim-bookworm in /docker (#949)
  - Bump hiredis from 3.0.0 to 3.1.0 (#1012)
  - Bump pylint from 3.3.1 to 3.3.2 (#1008)
  - Bump pytest from 8.3.3 to 8.3.4 (#1007)
  - Bump aiohttp from 3.11.7 to 3.11.9 (#1009)
  - Added kill script for GMT
  - Adding cachetools as requirement
  - EE Update

* main:
  - Adding not implemented error
  - Kill script for GMT must use full commandline when killing reporters
  - +x for tcpdump

ArneTR · 2024-12-11T07:29:01Z

@greptileai

ArneTR · 2024-12-11T08:47:34Z

@greptileai

ArneTR · 2024-12-13T09:53:02Z

@greptileai

greptile-apps

PR Summary

This PR introduces system-level metric providers for CPU, memory, disk, and network monitoring using cgroups, with a focus on addressing CPU utilization reporting discrepancies between cgroup and procfs measurements.

- Identified a ~6% discrepancy in CPU utilization between cgroup and procfs when running multiple containers, but <1% with a single container
- Added new cgroup path detection patterns in detect_cgroup_path.c for Window Managers and Session applications
- Removed virtualization-related CPU times (steal_time, guest_time) from calculations in source.c files
- Potential memory management issues in several providers' source.c files, including memory leaks and unsafe realloc usage
- Incorrect super() calls in provider.py files could cause initialization problems in system metric providers
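As an illustration of the procfs side of the comparison, the following sketch computes CPU utilization from two snapshots of the aggregate `cpu` line in `/proc/stat` while leaving `steal`, `guest` and `guest_nice` out of the calculation. The helper names and the exact set of fields counted as busy or idle are assumptions. The C code in the PR's source.c files is not shown here and may differ.

```python
# Field order of the aggregate "cpu" line in /proc/stat.
FIELD_NAMES = ("user", "nice", "system", "idle", "iowait",
               "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of jiffies."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return dict(zip(FIELD_NAMES, (int(value) for value in fields[1:])))


def cpu_utilization_pct(before, after):
    """Utilization between two snapshots, ignoring steal and guest times."""
    busy_keys = ("user", "nice", "system", "irq", "softirq")
    idle_keys = ("idle", "iowait")
    busy = sum(after[key] - before[key] for key in busy_keys)
    idle = sum(after[key] - before[key] for key in idle_keys)
    total = busy + idle
    return 100.0 * busy / total if total > 0 else 0.0


# Two made-up snapshots; on a real system they would be read from /proc/stat.
snapshot_a = parse_cpu_line("cpu  100 0 50 800 50 0 0 10 5 0")
snapshot_b = parse_cpu_line("cpu  160 0 70 950 60 5 5 30 9 0")
utilization = cpu_utilization_pct(snapshot_a, snapshot_b)
```

Whether `iowait` counts as idle and whether `steal` belongs in the total are design choices. Both affect how well the procfs figure lines up with the sum of the cgroup values.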

23 file(s) reviewed, 21 comment(s)

greptile-apps · 2024-12-13T09:53:49Z

lib/phase_stats.py

@@ -179,7 +181,7 @@ def build_and_store_phase_stats(run_id, sci=None):
        if phase['name'] == '[RUNTIME]' and machine_carbon_in_ug is not None and sci is not None and sci.get('R', 0) != 0:
            csv_buffer.write(generate_csv_line(run_id, 'software_carbon_intensity_global', '[SYSTEM]', f"{idx:03}_{phase['name']}", (machine_carbon_in_ug + embodied_carbon_share_ug + network_io_carbon_in_ug) / sci['R'], 'TOTAL', None, None, f"ugCO2e/{sci['R_d']}"))

-        if machine_power_baseline and cpu_utilization_machine and cpu_utilization_containers:
+        if machine_power_phase and machine_power_baseline and cpu_utilization_machine and cpu_utilization_containers:


logic: The condition should also check if machine_energy_phase is not None before calculations since it's used in line 186

greptile-apps · 2024-12-13T09:54:05Z

lib/utils.py

+        if found_cgroups := len(lines) != 1:
+            raise RuntimeError(f"Could not find GMT\'s own cgroup or found too many. Amount: {found_cgroups}")


logic: Logic error in condition. found_cgroups := len(lines) != 1 assigns a boolean, but error message refers to amount. Should be found_cgroups = len(lines) followed by if found_cgroups != 1:
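The precedence issue can be reproduced in isolation. In the sketch below the function name `first_line_checked` is hypothetical; only the error message is taken from the diff. Without parentheses the walrus operator binds the result of the comparison, and with parentheses it binds the length.

```python
lines = ["0::/a\n", "0::/b\n", "0::/c\n"]

# Without parentheses the name is bound to the comparison result (a bool).
if unparenthesized := len(lines) != 1:
    pass


def first_line_checked(lines):
    """Hypothetical helper: return the single expected line or raise."""
    # Parentheses make the walrus bind the length, so the message shows a count.
    if (found_cgroups := len(lines)) != 1:
        raise RuntimeError(
            f"Could not find GMT's own cgroup or found too many. Amount: {found_cgroups}"
        )
    return lines[0]
```

Parenthesizing the assignment is an alternative to the two-statement form suggested in the comment. Both report the actual number of lines.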

greptile-apps · 2024-12-13T09:54:05Z

lib/utils.py

+    with open(f"/proc/{current_pid}/cgroup", 'r', encoding='utf-8') as file:
+        lines = file.readlines()


logic: No error handling for file not found or permission denied when accessing /proc/{pid}/cgroup
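A minimal sketch of what such error handling might look like follows. The function name `read_cgroup_lines` and the `proc_root` parameter (added so the code can be exercised without a real `/proc`) are assumptions, as is the choice to re-raise as `RuntimeError`.

```python
import os


def read_cgroup_lines(pid=None, proc_root="/proc"):
    """Hypothetical helper: read /proc/<pid>/cgroup with explicit error handling."""
    pid = os.getpid() if pid is None else pid
    path = f"{proc_root}/{pid}/cgroup"
    try:
        with open(path, "r", encoding="utf-8") as file:
            return file.readlines()
    except (FileNotFoundError, ProcessLookupError) as exc:
        # The process has exited or procfs is not mounted at proc_root.
        raise RuntimeError(f"Cannot read {path}: process or file does not exist") from exc
    except PermissionError as exc:
        raise RuntimeError(f"Cannot read {path}: permission denied") from exc
```

Chaining with `from exc` keeps the original OS error in the traceback, which helps when the failure only occurs on a particular machine.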

greptile-apps · 2024-12-13T09:54:06Z

lib/utils.py

+            raise RuntimeError(f"Could not find GMT\'s own cgroup or found too many. Amount: {found_cgroups}")
+        return lines[0].split('/')[-1].strip()


style: Assumes cgroup path format is consistent. Should handle different cgroup hierarchy formats and empty lines
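One way to make the parsing more tolerant is sketched below, under these assumptions: each line of `/proc/<pid>/cgroup` has the form `hierarchy-ID:controller-list:path`, the cgroup v2 entry is the one with hierarchy ID `0` and an empty controller list, and only that entry is of interest. The function name `own_cgroup_name` is hypothetical.

```python
def own_cgroup_name(lines):
    """Hypothetical helper: last path component of the single cgroup v2 entry."""
    v2_paths = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip empty lines
        parts = entry.split(":", 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        hierarchy_id, controllers, path = parts
        # cgroup v2 entries look like "0::/some/path"; v1 entries name controllers.
        if hierarchy_id == "0" and controllers == "":
            v2_paths.append(path)

    if len(v2_paths) != 1:
        raise RuntimeError(
            f"Could not find GMT's own cgroup or found too many. Amount: {len(v2_paths)}"
        )
    return v2_paths[0].rstrip("/").split("/")[-1]
```

For a process in the root cgroup (`0::/`) this returns an empty string. A caller would have to decide whether that counts as an error.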

greptile-apps · 2024-12-13T09:54:21Z

metric_providers/cpu/utilization/cgroup/system/Makefile

@@ -0,0 +1,4 @@
+CFLAGS = -O3 -Wall -I../../../../../lib/c


style: consider adding -Werror to catch potential issues at compile time

greptile-apps · 2024-12-13T09:56:19Z

metric_providers/memory/used/cgroup/system/provider.py

+
+class MemoryUsedCgroupSystemProvider(MemoryUsedCgroupContainerProvider):
+    def __init__(self, resolution, skip_check=False, cgroups: dict = None):
+        super(MemoryUsedCgroupContainerProvider, self).__init__( # this will call BaseMetricProvider


logic: super() call uses wrong class name - should be super().__init__() or super(MemoryUsedCgroupSystemProvider, self).__init__()
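For context, the two-argument form `super(SomeClass, self)` starts the method lookup after `SomeClass` in the method resolution order. Passing the parent class therefore skips the parent's own `__init__`, which is what the inline comment in the diff ("this will call BaseMetricProvider") describes. The sketch below uses hypothetical stand-in classes, not the real providers, to show the difference between the two forms.

```python
class Base:
    def __init__(self):
        self.calls = ["Base"]


class Container(Base):
    def __init__(self):
        super().__init__()
        self.calls.append("Container")


class SystemSkipping(Container):
    def __init__(self):
        # Lookup starts after Container in the MRO, so Container.__init__ is skipped.
        super(Container, self).__init__()
        self.calls.append("System")


class SystemChaining(Container):
    def __init__(self):
        # Zero-argument form: the direct parent's __init__ runs first.
        super().__init__()
        self.calls.append("System")


skipping_calls = SystemSkipping().calls
chaining_calls = SystemChaining().calls
```

Whether the skip is a bug or intended depends on whether the container provider's initializer does work that the system provider must avoid. The diff excerpt alone does not settle that.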

greptile-apps · 2024-12-13T09:56:28Z

metric_providers/network/io/cgroup/system/Makefile

@@ -0,0 +1,6 @@
+CFLAGS = -O3 -Wall -lc -I../../../../../lib/c


style: -lc flag is unnecessary since libc is linked by default in gcc

greptile-apps · 2024-12-13T09:56:28Z

metric_providers/network/io/cgroup/system/Makefile

+	sudo chown root $@
+	sudo chmod u+s $@


style: sudo commands in Makefile may fail in some CI environments - consider making these optional or documenting requirements

greptile-apps · 2024-12-13T09:56:46Z

metric_providers/network/io/cgroup/system/README.md

@@ -0,0 +1,3 @@
+# Documentation
+
+Please see https://docs.green-coding.io/docs/measuring/metric-providers/network-io-cgroup-container/ for details


logic: URL points to network-io-cgroup-container instead of network-io-cgroup-system documentation

greptile-apps · 2024-12-13T09:56:56Z

metric_providers/network/io/cgroup/system/provider.py

+    def start_profiling(self, containers=None):
+        super().start_profiling(self._cgroups) # we hook here into the mechanism that can supply container names to the parent function
+
+    def read_metrics(self, run_id, containers=None):
+        return super().read_metrics(run_id, self._cgroups) # this will call NetworkIoCgroupContainerProvider


style: check if containers parameter is actually needed since it's always overridden with self._cgroups

ArneTR added 3 commits November 25, 2024 16:36

Alternative CPU % reporting

3e34b52

Using full time again in cgroup

e35133f

Removed old comment

15d6560

ArneTR requested a review from ribalba November 25, 2024 16:45

ArneTR closed this Nov 25, 2024

ArneTR reopened this Nov 26, 2024

Added CPU Utilization Cgroup System reporter

78c84c1

ArneTR added 3 commits November 26, 2024 14:12

More checks [skip ci]

5303a1d

Self-adding GMT Overhead to the monitored cgroups

f1d3af8

Merge branch 'main' into alternative-cpu-utilization-reporting

2ef47a6

* main: Adding measurement settings to the measurement view

Adding more cgroup system providers

23b71e2

ArneTR added 3 commits December 2, 2024 11:31

Nice names for JS frontend [skip ci]

7af955a

Merge branch 'main' into alternative-cpu-utilization-reporting [skip ci]

97e298d

* main:
  - Adding not implemented error
  - Kill script for GMT must use full commandline when killing reporters
  - +x for tcpdump

greptile-apps bot reviewed Dec 13, 2024
