Metric importer refactor #1035
Conversation
… / run_id run_id is now added in the segmented helper metric_importer. Run ID is a concept the metric providers should not be aware of, so it is now handled externally. Furthermore, the processing mechanism in the providers no longer allows overriding of the externally callable read_metrics(); new private sub-functions handle the separate steps of reading, parsing, validating, and adding data.
41 file(s) reviewed, 34 comment(s)
* main:
  - Removed the email column from the runs table
  - (change): Changed default resolution of powermetrics to a non-underflow resolution
  - Bump uvicorn-worker from 0.2.0 to 0.3.0 (#1034)
  - (fix): lm-sensors was not reacting to skip_check
  - (Tests): pytest running as default with -vv
…bout if provider can or cannot return empty rows
42 file(s) reviewed, 13 comment(s)
for _, row in detail_names.iterrows():
    measurement_metric_id = DB().fetch_one('''
        INSERT INTO measurement_metrics (run_id, metric, detail_name, unit)
        VALUES (%s, %s, %s, %s)
        RETURNING id
        ''', params=(run_id, metric_name, row['detail_name'], row['unit']))[0]
    df.loc[(df['detail_name'] == row['detail_name']) & (df['unit'] == row['unit']), 'measurement_metric_id'] = measurement_metric_id
style: Consider batching the INSERT statements for better performance with many detail_names
@greptileai: but how do I do that, when I need to return the ID for every one?
You can use a VALUES clause with multiple rows and still get the returned IDs using unnest. Here's how:
INSERT INTO measurement_metrics (run_id, metric, detail_name, unit)
VALUES %s
RETURNING id, detail_name, unit
Then construct the list of value tuples in Python and use psycopg2's execute_values(). This will return all IDs matched with their detail_name and unit, which you can then use to update the dataframe in one go.
Would you like me to show you the specific code changes needed to implement this?
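To make the batching concrete, here is a hedged sketch of the multi-row INSERT ... RETURNING approach. The function name batch_insert_metrics and the cursor-style interface are hypothetical; in practice you would reach for psycopg2's execute_values() with fetch=True, but the sketch builds the VALUES clause by hand so it stays dependency-free:

```python
def batch_insert_metrics(cur, run_id, metric_name, detail_rows):
    # Build a single multi-row INSERT ... RETURNING statement instead of
    # one round-trip per detail_name. `cur` is any DB-API-style cursor.
    placeholders = ','.join(['(%s, %s, %s, %s)'] * len(detail_rows))
    sql = ('INSERT INTO measurement_metrics (run_id, metric, detail_name, unit) '
           f'VALUES {placeholders} RETURNING id, detail_name, unit')
    # Flatten the per-row parameters into one sequence matching the placeholders.
    params = [p for r in detail_rows
              for p in (run_id, metric_name, r['detail_name'], r['unit'])]
    cur.execute(sql, params)
    # Each returned tuple (id, detail_name, unit) can then be mapped back
    # onto the dataframe with a single .loc assignment per metric.
    return cur.fetchall()
```

The RETURNING rows carry detail_name and unit alongside each generated id, so the dataframe update loop from the original snippet still works unchanged on the result.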
* main: (improvement): Added ICMP, Etherframe, QUIC and TCP-Streams to TCPDump [skip ci] (#1036)
44 file(s) reviewed, 12 comment(s)
resolution_avg DOUBLE PRECISION NOT NULL,
resolution_max DOUBLE PRECISION NOT NULL,
resolution_95p DOUBLE PRECISION NOT NULL,
style: Consider adding indexes on resolution columns if they will be frequently queried for analysis or filtering
f = StringIO(df[['measurement_metric_id', 'value', 'time']]
             .to_csv(index=False, header=False))
DB().copy_from(file=f, table='measurement_values', columns=['measurement_metric_id', 'value', 'time'], sep=',')
style: Consider using a context manager (with statement) for the StringIO object to ensure proper cleanup
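A minimal sketch of the context-manager suggestion, assuming a DB wrapper exposing copy_from like the one in the PR (the copy_values function name is hypothetical):

```python
from io import StringIO

def copy_values(db, df):
    # Scope the StringIO buffer in a `with` block so it is closed
    # even if copy_from raises midway through the COPY.
    with StringIO(df[['measurement_metric_id', 'value', 'time']]
                  .to_csv(index=False, header=False)) as f:
        db.copy_from(file=f, table='measurement_values',
                     columns=['measurement_metric_id', 'value', 'time'],
                     sep=',')
```

Since StringIO buffers live in memory the practical benefit is small, but the `with` block documents the buffer's lifetime and protects against leaks if the function later grows.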
df['value'] = df['received_bytes_intervals'] + df['transmitted_bytes_intervals']
df['value'] = df.value.astype(int)
style: consider using df.loc[] for assignment to avoid SettingWithCopyWarning in pandas
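A short sketch of the .loc suggestion, assuming df may be a slice of a larger frame (the sample column values are made up):

```python
import pandas as pd

df = pd.DataFrame({'received_bytes_intervals': [100, 200],
                   'transmitted_bytes_intervals': [10, 20]})

# Take an explicit copy so the assignment below targets an owned frame,
# then assign via .loc; together this avoids SettingWithCopyWarning
# when df originated as a slice of another DataFrame.
df = df.copy()
df.loc[:, 'value'] = (df['received_bytes_intervals']
                      + df['transmitted_bytes_intervals']).astype(int)
```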
df['detail_name'] = df.psys_id
df = df.drop('psys_id', axis=1)
style: Consider adding error handling in case psys_id column is missing from the dataframe
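One way to add that guard, as a hedged sketch (the rename_psys_column helper name and the error message are hypothetical, not from the PR):

```python
import pandas as pd

def rename_psys_column(df):
    # Fail fast with an explicit message instead of letting pandas raise
    # an opaque AttributeError/KeyError deep inside the provider.
    if 'psys_id' not in df.columns:
        raise ValueError("Expected column 'psys_id' missing from dataframe")
    df = df.copy()
    df['detail_name'] = df['psys_id']
    return df.drop('psys_id', axis=1)
```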
if not self.cpu_chips:
    raise MetricProviderConfigurationError(
        'Please set the CPUChips config option for PsuEnergyAcSdiaMachineProvider in the config.yml')
if not self.tdp:
    raise MetricProviderConfigurationError('Please set the TDP config option for PsuEnergyAcSdiaMachineProvider in the config.yml')
style: Consider moving configuration validation to check_system() to maintain consistency with other validation logic there
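A sketch of what moving the validation into check_system() could look like. The class body here is a simplified stand-in, not the real provider, and MetricProviderConfigurationError is redefined locally only to keep the example self-contained:

```python
class MetricProviderConfigurationError(Exception):
    """Stand-in for the exception class used in the PR."""

class PsuEnergyAcSdiaMachineProvider:
    def __init__(self, cpu_chips=None, tdp=None):
        self.cpu_chips = cpu_chips
        self.tdp = tdp

    def check_system(self):
        # Centralizing config validation here keeps it next to the other
        # system checks providers already run, instead of in __init__.
        for attr, option in (('cpu_chips', 'CPUChips'), ('tdp', 'TDP')):
            if not getattr(self, attr):
                raise MetricProviderConfigurationError(
                    f'Please set the {option} config option for '
                    'PsuEnergyAcSdiaMachineProvider in the config.yml')
```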
assert data[1]['metric'] == 'psu_energy_ac_mcp_machine'
assert data[1]['detail_name'] == '[machine]'
assert data[1]['unit'] == 'mJ'
assert data[1]['value'] == 13175452
style: hardcoded value 13175452 should be defined as a constant or pulled from test data for better maintainability
run_id = Tests.insert_run()
Tests.import_machine_energy(run_id)

sci = {"I":436,"R":0,"EL":4,"RS":1,"TE":181000,"R_d":"page request"}
style: consider moving sci configuration to a test constants file for reuse across tests
def test_powermetrics():
    obj = PowermetricsProvider(499, skip_check=True)
    obj._filename = os.path.join(GMT_ROOT_DIR, './tests/data/metrics/powermetrics.log')

    df = obj.read_metrics()

    assert list(df.metric.unique()) == ['cpu_time_powermetrics_vm', 'disk_io_bytesread_powermetrics_vm', 'disk_io_byteswritten_powermetrics_vm', 'energy_impact_powermetrics_vm', 'cores_energy_powermetrics_component', 'gpu_energy_powermetrics_component', 'ane_energy_powermetrics_component']

    assert math.isclose(df[df.metric == 'energy_impact_powermetrics_vm'].value.mean(), 430.823529, rel_tol=1e-5)
logic: test_powermetrics() should verify all expected metrics are present and have valid values, not just energy_impact_powermetrics_vm
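A sketch of the broader assertion this comment asks for. The helper name assert_metrics_complete is hypothetical; the metric names are the ones the existing test already expects:

```python
import pandas as pd

EXPECTED_VM_METRICS = [
    'cpu_time_powermetrics_vm',
    'disk_io_bytesread_powermetrics_vm',
    'disk_io_byteswritten_powermetrics_vm',
    'energy_impact_powermetrics_vm',
]

def assert_metrics_complete(df, expected):
    # Verify every expected metric is present and carries plausible values,
    # rather than sampling only energy_impact_powermetrics_vm.
    for metric in expected:
        subset = df[df.metric == metric]
        assert not subset.empty, f'missing metric: {metric}'
        assert (subset.value >= 0).all(), f'negative values in {metric}'
```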
def import_cpu_utilization(run_id):

    obj = CpuUtilizationCgroupContainerProvider(99, skip_check=True)
style: The hardcoded resolution value of 99 should be defined as a constant at the top of the file with other test constants for better maintainability
phases = [
    {"start": TEST_MEASUREMENT_START_TIME-8, "name": "[BASELINE]", "end": TEST_MEASUREMENT_START_TIME-7},
    {"start": TEST_MEASUREMENT_START_TIME-6, "name": "[INSTALL]", "end": TEST_MEASUREMENT_START_TIME-5},
    {"start": TEST_MEASUREMENT_START_TIME-4, "name": "[BOOT]", "end": TEST_MEASUREMENT_START_TIME-3},
    {"start": TEST_MEASUREMENT_START_TIME-2, "name": "[IDLE]", "end": TEST_MEASUREMENT_START_TIME-1},
    {"start": TEST_MEASUREMENT_START_TIME, "name": "[RUNTIME]", "end": TEST_MEASUREMENT_END_TIME},
    {"start": TEST_MEASUREMENT_END_TIME+1, "name": "[REMOVE]", "end": TEST_MEASUREMENT_END_TIME+2},
]
logic: Phase timestamps use simple arithmetic (-8, -7 etc) which could cause edge cases if TEST_MEASUREMENT_START_TIME is too small. Consider adding validation
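One way to add the validation this comment suggests, as a hedged sketch (the build_phases function is hypothetical, wrapping the literal list from the test):

```python
def build_phases(start, end, margin=8):
    # The phase offsets subtract up to `margin` ticks from `start`, so
    # reject timestamps too small for that arithmetic to stay positive.
    if start < margin:
        raise ValueError(f'start timestamp {start} smaller than offset margin {margin}')
    if end <= start:
        raise ValueError('end must be after start')
    return [
        {"start": start - 8, "name": "[BASELINE]", "end": start - 7},
        {"start": start - 6, "name": "[INSTALL]", "end": start - 5},
        {"start": start - 4, "name": "[BOOT]", "end": start - 3},
        {"start": start - 2, "name": "[IDLE]", "end": start - 1},
        {"start": start, "name": "[RUNTIME]", "end": end},
        {"start": end + 1, "name": "[REMOVE]", "end": end + 2},
    ]
```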
Rework metric provider processing mechanism and removed storing to DB / run_id
Greptile Summary
Major refactoring of the metric provider processing mechanism to improve separation of concerns by centralizing database operations and removing run_id handling from individual providers.
- Created lib/metric_importer.py to centralize database operations and run_id handling previously scattered across providers
- Added resolution_avg, resolution_max, and resolution_95p columns to the measurements table for better timing statistics
- Created _read_metrics(), _parse_metrics(), and _add_unit_and_metric() in the base provider class for better encapsulation
- Made read_metrics() final in the base provider class to prevent overriding and ensure a consistent processing flow
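The summary's "final read_metrics() driving private steps" describes a template-method pattern. Here is a minimal sketch of that shape; the class bodies are simplified stand-ins, not the real GMT provider code, and typing.final is only enforced by static checkers:

```python
from typing import final

class BaseMetricProvider:
    @final  # subclasses customize the private steps, never this entry point
    def read_metrics(self):
        rows = self._read_metrics()        # read raw provider output
        rows = self._parse_metrics(rows)   # parse / validate
        return self._add_unit_and_metric(rows)  # attach unit + metric name

    def _read_metrics(self):
        raise NotImplementedError

    def _parse_metrics(self, rows):
        return rows

    def _add_unit_and_metric(self, rows):
        return rows

class DummyProvider(BaseMetricProvider):
    def _read_metrics(self):
        return [{'value': 42}]

    def _add_unit_and_metric(self, rows):
        return [{**r, 'unit': 'mJ', 'metric': 'dummy'} for r in rows]
```

Because callers only ever invoke read_metrics(), every provider is guaranteed to run the same read → parse → annotate pipeline, which is the consistency the refactor is after.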