Return metrics #506

Jeadie · 2023-06-13T00:13:34Z

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

Feature

What is the current behavior? (You can also link to an open issue here)
no query parameter telemetry=True/False

What is the new behavior (if this is a feature change)?
For each API endpoint, if:

The query parameter telemetry=True is provided2.
The Response payload is of Dict structure

We will return timing metrics for the following (when applicable)

Indexing time/search time (OpenSearch-side)
Roundtrip OpenSearch time
Vectorise time
Image download time

These timings are returned under the key telemetry. For example, for POST add documents,

{
    "errors": false,
    "processingTimeMs": 3221.0470419959165,
    "index_name": "image-chunk-test",
    "items": [
        {
            "_id": "ffb0e477-c47b-4d50-8140-18464125da11",
            "result": "created",
            "status": 201
        },
        {
            "_id": "7d1cf90e-e85c-4892-8a73-99128e0c6ea6",
            "result": "created",
            "status": 201
        }
    ],
    "telemetry": {
        "timesMs": {
            "image_download.ai-nc.com/images/pages/heat-map.png": [
                409.49116594856605,
                953.6753330030479
            ],
            "image_download.thread_time": [
                485.5935419909656,
                1026.966415985953
            ],
            "image_download.full_time": 1028.2984169898555,
            "add_documents.create_vectors": [
                86.7510829702951,
                250.8142919978127,
                27.595290972385556,
                163.21662499103695
            ],
            "add_documents.preprocess": 2777.188499982003,
            "add_documents.opensearch._bulk": 65.44245796976611,
            "add_documents.opensearch._bulk.internal": 45.0,
            "add_documents.postprocess": 339.04774999246,
            "POST /indexes/image-chunk-test/documents": 3228.921584028285
        }
    }
}

Search

{
    "hits": [
        {
            "Title": "The Travels of Marco Polo",
            "Image": "https://www.ai-nc.com/images/pages/heat-map.png",
            "_id": "f8541882-1af6-49fe-bd06-6d73f5a8471a",
            "_highlights": {
                "Image": [
                    0.0,
                    0.0,
                    2336.0,
                    1420.0
                ]
            },
            "_score": 0.9221934
        },
        .... # More hits
    ],
    "query": "https://www.ai-nc.com/images/pages/heat-map.png",
    "limit": 10,
    "offset": 0,
    "processingTimeMs": 4011,
    "telemetry": {
        "timesMs": {
            "image_download.https://www.ai-nc.com/images/pages/heat-map.png": 633.6382500012405,
            "vector_inference": 3939.2102920101024,
            "search.vector_inference_full_pipeline": 3939.5244999905117,
            "search.vector.preprocess": 3939.6441249991767,
            "search.opensearch._msearch": 71.01766596315429,
            "search.opensearch._msearch.internal": 32.0,
            "search.vector.postprocess": 0.06112497067078948,
            "POST /indexes/image-chunk-test/search": 4011.3002080470324
        }
    }
}

Similarily, bulk search

    ..., 
    "telemetry": {
        "timesMs": {
            "bulk_search.vector_inference_full_pipeline": 23.14070804277435,
            "bulk_search.vector.preprocess": 23.191249987576157,
            "search.opensearch._msearch": 321.9512079958804,
            "search.opensearch._msearch.internal": 218.0,
            "bulk_search.vector.postprocess": 2.3989580222405493,
            "bulk_search.rerank": 0.00662502134218812,
            "POST /indexes/bulk/search": 349.0034170099534
        }
    }

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

No, users must opt in to setting the new flag. Currently unavailable in py-marqo.

Have unit tests been run against this PR? (Has there also been any additional testing?)
Yep, and new ones added

Ran py-marqo, marqo-api-tests to ensure no regression
Ran py-marqo with hardcoded telemetry=True query parameter. Failed tests are all due to explicitly not expecting telemetry key in response.

Related Python client changes (link commit/PR here)
WIP

Related documentation changes (link commit/PR here)
WIP

Other information:

Please check if the PR fulfills these requirements

The commit message follows our guidelines
Tests for the changes have been added (for bug fixes/features)
Docs have been added / updated (for bug fixes / features)

pandu-k · 2023-06-20T05:29:20Z

Hi @Jeadie . Got some questions regarding search!

vector_inference. Is the time for a single vectorise call? What if we have to more than 1 vectorise call? For example multi query search? Does it become a list of floats?
search.create_vectors. Can we make this key self describing? Perhaps vector_inference_full_pipeline
search.vector.preprocess why is this higher than create_vectors? The name implies it captures processing operations before the vectorise step (and so should be a few ms at most). What does it represent?
For bulk search, is the telemetry object similar to single search (just aggregated across each search)?

Otherwise this is looking better!

Jeadie · 2023-06-20T05:37:44Z

I removed vector_inference I don't think it was adding much value.
I've made them search.vector_inference_full_pipeline and bulk_search.vector_inference_full_pipeline respectively
the *.preprocess tag represent the work done before the opensearch call (ignore some validation beforehand). So for vector search (unlike lexical for example), it contains vectorise.
Bulk search is similar. Will add to doc.

pandu-k · 2023-06-20T05:44:41Z

Thanks @Jeadie

For search.vector.preprocess. Can we also make this more self-explainable. For example: search.vector.processing_before_opensearch

Jeadie · 2023-06-20T05:45:54Z

Thanks @Jeadie

For search.vector.preprocess. Can we also make this more self-explainable. For example: search.vector.processing_before_opensearch

Do we want this for bulk_search (when lexical) and add_documents too? Currently preprocess means the same thing across endpoints (i.e. before opensearch)

pandu-k · 2023-06-20T05:47:23Z

Thanks @Jeadie
For search.vector.preprocess. Can we also make this more self-explainable. For example: search.vector.processing_before_opensearch

Do we want this for bulk_search (when lexical) and add_documents too? Currently preprocess means the same thing across endpoints (i.e. before opensearch)

yep, this can apply to other endpoints

Jeadie · 2023-06-20T06:38:21Z

Done

farshidz

In addition to inline comments, I think we will need a naming convention for metrics. It could be as simple as module_name.sth_unique_within_the_module

src/marqo/tensor_search/telemetry.py

Jeadie added 3 commits June 6, 2023 16:44

basic Request level metrics/telemetry

3229101

add unit tests for telemetry

ef8392b

add request metrics to add/search APIs

8120e53

Jeadie had a problem deploying to marqo-test-suite June 13, 2023 00:15 — with GitHub Actions Failure

python 3.8 fixes

da9f4a6

Jeadie had a problem deploying to marqo-test-suite June 13, 2023 01:23 — with GitHub Actions Failure

Jeadie and others added 2 commits June 13, 2023 13:27

simplify threaded metrics in download_images

80e437b

Merge branch 'mainline' into return-metrics

e65bf40

Jeadie had a problem deploying to marqo-test-suite June 13, 2023 03:35 — with GitHub Actions Failure

Jeadie added 3 commits June 13, 2023 15:05

fix bug

95ef839

simplify tests

8ce72e5

fix None in tests

c3ced91

Jeadie had a problem deploying to marqo-test-suite June 13, 2023 06:37 — with GitHub Actions Failure

make tests extra clean

20f05cd

Jeadie had a problem deploying to marqo-test-suite June 13, 2023 23:47 — with GitHub Actions Failure

make metric names more user friendly

f11504a

Jeadie temporarily deployed to marqo-test-suite June 15, 2023 05:55 — with GitHub Actions Inactive

add image download for search. Other improvements

4a9714f

Jeadie had a problem deploying to marqo-test-suite June 19, 2023 00:43 — with GitHub Actions Failure

fix unit tests for clip_utils

402e174

Jeadie temporarily deployed to marqo-test-suite June 19, 2023 02:42 — with GitHub Actions Inactive

remove 'vector_inference'

bb7e902

search.create_vectors -> search.vector_inference_full_pipeline

069e8e9

preprocess -> processing_before_opensearch

a8254f1

Jeadie requested a review from pandu-k June 21, 2023 03:35

farshidz reviewed Jun 21, 2023

View reviewed changes

better naming for RequestMetric(s)

4e072fa

farshidz approved these changes Jun 22, 2023

View reviewed changes

Jeadie merged commit 2c28905 into mainline Jun 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Return metrics #506

Return metrics #506

Jeadie commented Jun 13, 2023 •

edited

Loading

pandu-k commented Jun 20, 2023

Jeadie commented Jun 20, 2023

pandu-k commented Jun 20, 2023

Jeadie commented Jun 20, 2023

pandu-k commented Jun 20, 2023

Jeadie commented Jun 20, 2023

farshidz left a comment

Return metrics #506

Return metrics #506

Conversation

Jeadie commented Jun 13, 2023 • edited Loading

pandu-k commented Jun 20, 2023

Jeadie commented Jun 20, 2023

pandu-k commented Jun 20, 2023

Jeadie commented Jun 20, 2023

pandu-k commented Jun 20, 2023

Jeadie commented Jun 20, 2023

farshidz left a comment

Choose a reason for hiding this comment

Jeadie commented Jun 13, 2023 •

edited

Loading