Publish recall as a kpi metric #581
Conversation
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
osbenchmark/metrics.py
Outdated
@@ -13,7 +13,7 @@
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
# http://www.apache.org/licenses/LICENSE-2.0
Not sure why my code formatter keeps on changing code outside of the selected lines. I'll revert this and the other spacing changes to out-of-scope code in the next revision.
I would recommend reverting for this PR, just so it's clearer what functional changes are being made.
osbenchmark/metrics.py
Outdated
"operation": operation, | ||
"recall@k": recall_at_k_stats, | ||
"recall@1":recall_at_1_stats, | ||
"recall_time_ms": recall_time_ms_stats, |
I spoke with @VijayanB and `recall_time_ms` is not a stat we want to display. I'll remove it from the `kpi_metrics` method in the next revision.
Can we remove from this PR?
Sorry, not sure what "this" is referring to. Do you mean remove the `add_kpi_metrics` function (and instead append directly to the `kpi_metrics` list), remove the `recall_time_ms` metric, or remove `kpi_metrics` altogether and go a different direction for passing recall from metadata to the metrics store?
Sorry, I meant the `recall_time_ms` metric; not sure if this is needed.
name=recall_metric_name,
value=sample.request_meta_data[recall_metric_name],
unit="",
task=sample.task.name,  # todo change unit to segment count unit...
I'll remove the TODO comment in the next revision.
tests/metrics_test.py
Outdated
@@ -1436,97 +1436,105 @@ def test_store_results(self):
"benchmark-version": "0.4.4",
"benchmark-revision": "123abc",
"environment": "unittest",
"test-execution-id": OsResultsStoreTests.TEST_EXECUTION_ID,
"test-execution-id": "6ebc6e53-ee20-4b0c-99b4-09697987e9f4",
Why modify so much in here?
I think the ordering of the JSON got scrambled when I added the `kpi_metrics` variable. I agree, though, that I shouldn't have changed so much; I'll revert most of the unit test and just add a single extra element in the JSON array with the `kpi_metrics` and associated data.
osbenchmark/metrics.py
Outdated
@@ -1966,6 +1986,7 @@ def single_latency(self, task, operation_type, metric_name="latency"):
class GlobalStats:
    def __init__(self, d=None):
        self.op_metrics = self.v(d, "op_metrics", default=[])
        self.kpi_metrics = self.v(d, "kpi_metrics", default=[])
What is the scope for KPI metrics? Aside from recall and maybe NDCG, what else would go in here?
I think any performance metric for k-NN is in scope for this. Right now recall seems to be the main one, but if k-NN adds NDCG we could also track that in OSB. (The process for adding new metrics would be to calculate the metric and add it to the metadata in the search query runner, and then pull it from the metadata into this `kpi_metrics` list.) Likewise if we or the anomaly detection plugin want to track false positive rate/specificity.
I mainly wanted to explicitly distinguish between algorithm metrics like recall and `op_metrics` like latency.
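To illustrate that flow, here is a minimal, hypothetical sketch (the function names and sample structure are made up for illustration; only `request_meta_data`, `kpi_metrics`, `recall@k`, and `recall@1` come from this PR):

```python
# Hypothetical sketch: a search runner computes recall and attaches it to the
# request meta-data; a later step copies it from the samples into kpi_metrics.

def compute_recall_meta_data(expected_ids, returned_ids, k):
    """Illustrative runner step: compute recall@k / recall@1 and return them as meta-data."""
    hits = len(set(expected_ids[:k]) & set(returned_ids[:k]))
    return {
        "recall@k": hits / k if k else 0.0,
        "recall@1": 1.0 if returned_ids[:1] == expected_ids[:1] else 0.0,
    }

def collect_kpi_metrics(samples):
    """Illustrative collection step: pull recall values out of each sample's meta-data."""
    kpi_metrics = []
    for sample in samples:
        meta = sample.get("request_meta_data", {})
        entry = {name: meta[name] for name in ("recall@k", "recall@1") if name in meta}
        if entry:
            entry["operation"] = sample.get("operation")
            kpi_metrics.append(entry)
    return kpi_metrics

# Example: a vector-search sample with recall meta-data, and an indexing sample without.
samples = [
    {"operation": "knn-search", "request_meta_data": compute_recall_meta_data([1, 2, 3], [1, 4, 3], 3)},
    {"operation": "bulk-index", "request_meta_data": {}},
]
print(collect_kpi_metrics(samples))  # only the knn-search sample contributes a KPI entry
```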
Oh, I see. KPI might include latency though, right? It seems like a good category name here would be `correctness_metrics` or `relevance_metrics`.
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
…bug) Signed-off-by: Finn Roblin <finnrobl@amazon.com>
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
osbenchmark/metrics.py
Outdated
@@ -13,7 +13,7 @@
# not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
# http://www.apache.org/licenses/LICENSE-2.0
can you remove unrelated formatting changes from this PR?
osbenchmark/results_publisher.py
Outdated
keys = record.keys()
recall_keys_in_task_dict = "recall@1" in keys and "recall@k" in keys
if recall_keys_in_task_dict and record["recall@1"] and record["recall@k"]:
Don't check the value of `record["recall@k"]`; if recall is 0, it will be skipped. Just check for the keys.
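A small sketch of the pitfall, using a plain dict in place of the actual record (an assumed structure, not the real `results_publisher.py` code):

```python
record = {"recall@1": 0.0, "recall@k": 0.0}  # legitimate result: recall really is zero
keys = record.keys()

# Truthiness check: 0.0 is falsy, so a zero-recall row is silently dropped.
if "recall@1" in keys and "recall@k" in keys and record["recall@1"] and record["recall@k"]:
    print("published (truthiness check)")  # never reached for zero recall

# Key-presence check only: a zero recall value is still published.
if "recall@1" in keys and "recall@k" in keys:
    print("published (key check)")  # reached, as intended
```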
osbenchmark/results_publisher.py
Outdated
except KeyError:
    return None
Why do we need to catch this? Can you return from line 214 if the keys don't exist?
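For illustration, a guard-clause version of that suggestion (the helper name and record shape are assumptions, not the actual code):

```python
def recall_stats(record):
    """Return the recall values for a task record, or None if the task produced no recall data."""
    # Early return on missing keys instead of indexing and catching a KeyError.
    if "recall@1" not in record or "recall@k" not in record:
        return None
    return {"recall@1": record["recall@1"], "recall@k": record["recall@k"]}
```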
@@ -1369,7 +1413,7 @@ def calculate_task_throughput(self, task, current_samples, bucket_interval_secs)
self.task_stats[task] = ThroughputCalculator.TaskStats(bucket_interval=bucket_interval_secs,
                                                        sample_type=first_sample.sample_type,
                                                        start_time=first_sample.absolute_time - first_sample.time_period)
current = self.task_stats[task]
current = self.task_stats[task]  # TaskStats object
why do we need this comment here and as part of this PR?
@@ -1190,6 +1233,7 @@ def __init__(self, start_timestamp, buffer_size=16384):
def add(self, task, client_id, sample_type, meta_data, absolute_time, request_start, latency, service_time,
        client_processing_time, processing_time, throughput, ops, ops_unit, time_period, percent_completed,
        dependent_timing=None):
    self.logger.debug("Logging with metadata: [%s]", meta_data)
do we need this?
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
sample.request_meta_data,
)

self.logger.debug(
Do we need this log?
Thanks for catching this. It's debug level, so it shouldn't show up with normal logging settings. I was using it to figure out how samples worked, so this log has served its purpose and I'll remove it now.
LGTM
@jmazanec15 Can you take a look at this PR? Thanks
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
This is awesome! Great job @finnroblin!
Description
Displays recall as part of the final benchmarking stats.
Issues Resolved
(From workloads repo): 282
While this PR is not a general solution to #199, it does address the specific need mentioned in that issue of displaying the recall metadata as a summary.
I have added a `kpi_metrics` field to the `GlobalStats` metric collector where additional metrics could be added. Users would then need to add display logic to the `results_publisher.py` file in order for the metrics to show up in the summary.
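For example, a minimal, hypothetical sketch of such display logic (the helper name, line format, and field names other than `recall@k`, `recall@1`, and `operation` are illustrative and not the actual `results_publisher.py` code):

```python
def format_kpi_summary(kpi_metrics):
    """Turn collected KPI entries into printable summary lines."""
    lines = []
    for entry in kpi_metrics:
        operation = entry.get("operation", "")
        for name in ("recall@k", "recall@1"):
            if name in entry:
                # One row per metric: metric name, task/operation, value (unitless).
                lines.append(f"| {name:<12} | {operation:<20} | {entry[name]:.3f} |")
    return lines

# Example usage with a single collected KPI entry.
for line in format_kpi_summary([{"operation": "knn-search", "recall@k": 0.97, "recall@1": 0.99}]):
    print(line)
```

Each returned line would then be appended to the final summary output alongside the existing latency and throughput rows.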
Testing
I added some unit tests to assert that recall metrics were collected. I also altered the `OsResultsStoreTest` to account for the new `kpi_metrics` field. I manually tested the `train-test` workload, which contains a vector search operation, and the `no-train-test-index-only` workload, which does not contain a vector search operation. Both workloads ran to completion and displayed the expected statistics (recall for `train-test`, and no recall metrics for `no-train-test-index-only`).
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.