Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative cpu utilization reporting #1002

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

ArneTR
Copy link
Member

@ArneTR ArneTR commented Nov 25, 2024

@ribalba Can you please review. There is still the same discrepancy between the sum of cgroup reported values and the procfs one.

Example: https://metrics.green-coding.io/stats.html?id=c87ef224-9137-46d4-b5d2-60fca94dfc39

I also tried on another linux framebook and also tried with only one container.

If one container is on the machine the discrepancy is < 1% (absolute). But with multiple containers it stays at around 6%

@ArneTR ArneTR requested a review from ribalba November 25, 2024 16:45
Copy link

github-actions bot commented Nov 25, 2024

Old Energy Estimation

Eco-CI Output:

Label 🖥 avg. CPU utilization [%] 🔋 Total Energy [Joules] 🔌 avg. Power [Watts] Duration [Seconds]
Total Run (incl. overhead) 27.1271 2155.04 4.06 531.45
Measurement #1 27.0865 2155.04 4.07 529.77

🌳 CO2 Data:
City: Chicago, Lat: 41.8874, Lon: -87.6318
IP: 20.102.223.129
CO₂ from energy is: 0.795209760 g
CO₂ from manufacturing (embodied carbon) is: 0.151630023 g
Carbon Intensity for this location: 369 gCO₂eq/kWh
SCI: 0.946840 gCO₂eq / pipeline run emitted

@ArneTR
Copy link
Member Author

ArneTR commented Nov 25, 2024

Ok, I have it. it is the overhead of the X11 window compositor.

If I turn that off the values strongly align. This is why the process was not spotted in the idle phase as existing overhead. It only activates once the windows are painted ...

@ArneTR ArneTR closed this Nov 25, 2024
@ArneTR
Copy link
Member Author

ArneTR commented Nov 26, 2024

Reopening to finalize discussion.

Also worth discussing: How to distribute power between containers with idea that some container "work" might be done outside of what the cpu utilization captures e.g. in X11 / Wayland

@ArneTR ArneTR reopened this Nov 26, 2024
Copy link

github-actions bot commented Nov 26, 2024

Old Energy Estimation

Eco-CI Output:

Label 🖥 avg. CPU utilization [%] 🔋 Total Energy [Joules] 🔌 avg. Power [Watts] Duration [Seconds]
Total Run (incl. overhead) 27.5502 2049.38 4.07 503.95
Measurement #1 27.5201 2049.38 4.08 502.25

🌳 CO2 Data:
City: Boydton, Lat: 36.6676, Lon: -78.3875
IP: 172.200.181.112
CO₂ from energy is: 0.709085480 g
CO₂ from manufacturing (embodied carbon) is: 0.143783894 g
Carbon Intensity for this location: 346 gCO₂eq/kWh
SCI: 0.852869 gCO₂eq / pipeline run emitted

* main:
  Setting run failed always
  Bump tqdm from 4.67.0 to 4.67.1 (#1000)
  Bump playwright/python in /docker/auxiliary-containers/gcb_playwright (#999)
  Bump orjson from 3.10.11 to 3.10.12 (#1001)
  Bump pytest-playwright from 0.5.2 to 0.6.2 (#1004)
  Bump pydantic from 2.10.1 to 2.10.2 (#1005)
Copy link

github-actions bot commented Nov 28, 2024

Old Energy Estimation

Eco-CI Output:

Label 🖥 avg. CPU utilization [%] 🔋 Total Energy [Joules] 🔌 avg. Power [Watts] Duration [Seconds]
Total Run (incl. overhead) 25.3787 2481.31 3.99 621.38
Measurement #1 25.3473 2481.31 4.01 619.30

🌳 CO2 Data:
City: Phoenix, Lat: 33.4475, Lon: -112.0866
IP: 20.169.14.0
CO₂ from energy is: 0.471448900 g
CO₂ from manufacturing (embodied carbon) is: 0.177288294 g
Carbon Intensity for this location: 190 gCO₂eq/kWh
SCI: 0.648737 gCO₂eq / pipeline run emitted

* main:
  Adding measurement settings to the measurement view
Copy link

github-actions bot commented Nov 29, 2024

Old Energy Estimation

Eco-CI Output:

Label 🖥 avg. CPU utilization [%] 🔋 Total Energy [Joules] 🔌 avg. Power [Watts] Duration [Seconds]
Total Run (incl. overhead) 27.5029 2184.39 4.07 536.54
Measurement #1 27.4774 2184.39 4.09 534.72

🌳 CO2 Data:
City: Chicago, Lat: 41.8874, Lon: -87.6318
IP: 20.25.192.65
CO₂ from energy is: 0.779827230 g
CO₂ from manufacturing (embodied carbon) is: 0.153082271 g
Carbon Intensity for this location: 357 gCO₂eq/kWh
SCI: 0.932910 gCO₂eq / pipeline run emitted

Copy link

github-actions bot commented Dec 1, 2024

Eco-CI Output:

Label 🖥 avg. CPU utilization [%] 🔋 Total Energy [Joules] 🔌 avg. Power [Watts] Duration [Seconds]
Total Run (incl. overhead) 27.3226 2127.11 4.05 524.87
Measurement #1 27.2867 2127.11 4.07 523.16

🌳 CO2 Data:
City: Boydton, Lat: 36.6676, Lon: -78.3875
IP: 20.57.44.193
CO₂ from energy is: 0.882750650 g
CO₂ from manufacturing (embodied carbon) is: 0.149752658 g
Carbon Intensity for this location: 415 gCO₂eq/kWh
SCI: 1.032503 gCO₂eq / pipeline run emitted

* main:
  Tcp dump (#919)
  Hash must be decoded to understand spaces [skip ci]
  Allowing Deeplinks to specific phases [skip ci] (#1016)
  Bump python from 3.13.0-slim-bookworm to 3.13.1-slim-bookworm in /docker (#1015)
  Bump pydantic from 2.10.2 to 2.10.3 (#1010)
  Bump fastapi[standard] from 0.115.5 to 0.115.6 (#1011)
  Bump aiohttp from 3.11.9 to 3.11.10 (#1013)
  Bump redis from 5.2.0 to 5.2.1 (#1014)
  Bump python from 3.12.7-slim-bookworm to 3.13.0-slim-bookworm in /docker (#949)
  Bump hiredis from 3.0.0 to 3.1.0 (#1012)
  Bump pylint from 3.3.1 to 3.3.2 (#1008)
  Bump pytest from 8.3.3 to 8.3.4 (#1007)
  Bump aiohttp from 3.11.7 to 3.11.9 (#1009)
  Added kill script for GMT
  Adding cachetools as requirement
  EE Update
* main:
  Adding not implemented error
  Kill script for GMT must use full commandline when killing reporters
  +x for tcpdump
@ArneTR
Copy link
Member Author

ArneTR commented Dec 11, 2024

@greptileai

2 similar comments
@ArneTR
Copy link
Member Author

ArneTR commented Dec 11, 2024

@greptileai

@ArneTR
Copy link
Member Author

ArneTR commented Dec 13, 2024

@greptileai

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

This PR introduces system-level metric providers for CPU, memory, disk, and network monitoring using cgroups, with a focus on addressing CPU utilization reporting discrepancies between cgroup and procfs measurements.

  • Identified ~6% discrepancy in CPU utilization between cgroup and procfs when running multiple containers, but <1% with single containers
  • Added new cgroup path detection patterns in detect_cgroup_path.c for Window Managers and Session applications
  • Removed virtualization-related CPU times (steal_time, guest_time) from calculations in source.c files
  • Potential memory management issues in several providers' source.c files, including memory leaks and unsafe realloc usage
  • Incorrect super() calls in provider.py files could cause initialization problems in system metric providers

💡 (1/5) You can manually trigger the bot by mentioning @greptileai in a comment!

23 file(s) reviewed, 21 comment(s)
Edit PR Review Bot Settings | Greptile

@@ -179,7 +181,7 @@ def build_and_store_phase_stats(run_id, sci=None):
if phase['name'] == '[RUNTIME]' and machine_carbon_in_ug is not None and sci is not None and sci.get('R', 0) != 0:
csv_buffer.write(generate_csv_line(run_id, 'software_carbon_intensity_global', '[SYSTEM]', f"{idx:03}_{phase['name']}", (machine_carbon_in_ug + embodied_carbon_share_ug + network_io_carbon_in_ug) / sci['R'], 'TOTAL', None, None, f"ugCO2e/{sci['R_d']}"))

if machine_power_baseline and cpu_utilization_machine and cpu_utilization_containers:
if machine_power_phase and machine_power_baseline and cpu_utilization_machine and cpu_utilization_containers:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: The condition should also check if machine_energy_phase is not None before calculations since it's used in line 186

Comment on lines +189 to +190
if found_cgroups := len(lines) != 1:
raise RuntimeError(f"Could not find GMT\'s own cgroup or found too many. Amount: {found_cgroups}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Logic error in condition. found_cgroups := len(lines) != 1 assigns a boolean, but error message refers to amount. Should be found_cgroups = len(lines) followed by if found_cgroups != 1:

Comment on lines +187 to +188
with open(f"/proc/{current_pid}/cgroup", 'r', encoding='utf-8') as file:
lines = file.readlines()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: No error handling for file not found or permission denied when accessing /proc/{pid}/cgroup

Comment on lines +190 to +191
raise RuntimeError(f"Could not find GMT\'s own cgroup or found too many. Amount: {found_cgroups}")
return lines[0].split('/')[-1].strip()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Assumes cgroup path format is consistent. Should handle different cgroup hierarchy formats and empty lines

@@ -0,0 +1,4 @@
CFLAGS = -O3 -Wall -I../../../../../lib/c
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: consider adding -Werror to catch potential issues at compile time


class MemoryUsedCgroupSystemProvider(MemoryUsedCgroupContainerProvider):
def __init__(self, resolution, skip_check=False, cgroups: dict = None):
super(MemoryUsedCgroupContainerProvider, self).__init__( # this will call BaseMetricProvider
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: super() call uses wrong class name - should be super().init() or super(MemoryUsedCgroupSystemProvider, self).init()

@@ -0,0 +1,6 @@
CFLAGS = -O3 -Wall -lc -I../../../../../lib/c
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: -lc flag is unnecessary since libc is linked by default in gcc

Comment on lines +5 to +6
sudo chown root $@
sudo chmod u+s $@
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: sudo commands in Makefile may fail in some CI environments - consider making these optional or documenting requirements

@@ -0,0 +1,3 @@
# Documentation

Please see https://docs.green-coding.io/docs/measuring/metric-providers/network-io-cgroup-container/ for details
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: URL points to network-io-cgroup-container instead of network-io-cgroup-system documentation

Comment on lines +19 to +23
def start_profiling(self, containers=None):
super().start_profiling(self._cgroups) # we hook here into the mechanism that can supply container names to the parent function

def read_metrics(self, run_id, containers=None):
return super().read_metrics(run_id, self._cgroups) # this will call NetworkIoCgroupContainerProvider
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: check if containers parameter is actually needed since it's always overridden with self._cgroups

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant