Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return metrics #506

Merged
merged 17 commits into from
Jun 22, 2023
Merged

Return metrics #506

merged 17 commits into from
Jun 22, 2023

Conversation

Jeadie
Copy link
Contributor

@Jeadie Jeadie commented Jun 13, 2023

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

  • Feature

What is the current behavior? (You can also link to an open issue here)
no query parameter telemetry=True/False

What is the new behavior (if this is a feature change)?
For each API endpoint, if:

  1. The query parameter telemetry=True is provided2.
  2. The Response payload is of Dict structure

We will return timing metrics for the following (when applicable)

  • Indexing time/search time (OpenSearch-side)
  • Roundtrip OpenSearch time
  • Vectorise time
  • Image download time

These timings are returned under the key telemetry. For example, for POST add documents,

{
    "errors": false,
    "processingTimeMs": 3221.0470419959165,
    "index_name": "image-chunk-test",
    "items": [
        {
            "_id": "ffb0e477-c47b-4d50-8140-18464125da11",
            "result": "created",
            "status": 201
        },
        {
            "_id": "7d1cf90e-e85c-4892-8a73-99128e0c6ea6",
            "result": "created",
            "status": 201
        }
    ],
    "telemetry": {
        "timesMs": {
            "image_download.ai-nc.com/images/pages/heat-map.png": [
                409.49116594856605,
                953.6753330030479
            ],
            "image_download.thread_time": [
                485.5935419909656,
                1026.966415985953
            ],
            "image_download.full_time": 1028.2984169898555,
            "add_documents.create_vectors": [
                86.7510829702951,
                250.8142919978127,
                27.595290972385556,
                163.21662499103695
            ],
            "add_documents.preprocess": 2777.188499982003,
            "add_documents.opensearch._bulk": 65.44245796976611,
            "add_documents.opensearch._bulk.internal": 45.0,
            "add_documents.postprocess": 339.04774999246,
            "POST /indexes/image-chunk-test/documents": 3228.921584028285
        }
    }
}

Search

{
    "hits": [
        {
            "Title": "The Travels of Marco Polo",
            "Image": "https://www.ai-nc.com/images/pages/heat-map.png",
            "_id": "f8541882-1af6-49fe-bd06-6d73f5a8471a",
            "_highlights": {
                "Image": [
                    0.0,
                    0.0,
                    2336.0,
                    1420.0
                ]
            },
            "_score": 0.9221934
        },
        .... # More hits
    ],
    "query": "https://www.ai-nc.com/images/pages/heat-map.png",
    "limit": 10,
    "offset": 0,
    "processingTimeMs": 4011,
    "telemetry": {
        "timesMs": {
            "image_download.https://www.ai-nc.com/images/pages/heat-map.png": 633.6382500012405,
            "vector_inference": 3939.2102920101024,
            "search.vector_inference_full_pipeline": 3939.5244999905117,
            "search.vector.preprocess": 3939.6441249991767,
            "search.opensearch._msearch": 71.01766596315429,
            "search.opensearch._msearch.internal": 32.0,
            "search.vector.postprocess": 0.06112497067078948,
            "POST /indexes/image-chunk-test/search": 4011.3002080470324
        }
    }
}

Similarily, bulk search

    ..., 
    "telemetry": {
        "timesMs": {
            "bulk_search.vector_inference_full_pipeline": 23.14070804277435,
            "bulk_search.vector.preprocess": 23.191249987576157,
            "search.opensearch._msearch": 321.9512079958804,
            "search.opensearch._msearch.internal": 218.0,
            "bulk_search.vector.postprocess": 2.3989580222405493,
            "bulk_search.rerank": 0.00662502134218812,
            "POST /indexes/bulk/search": 349.0034170099534
        }
    }

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

  • No, users must opt in to setting the new flag. Currently unavailable in py-marqo.

Have unit tests been run against this PR? (Has there also been any additional testing?)
Yep, and new ones added

  • Ran py-marqo, marqo-api-tests to ensure no regression
  • Ran py-marqo with hardcoded telemetry=True query parameter. Failed tests are all due to explicitly not expecting telemetry key in response.

Related Python client changes (link commit/PR here)
WIP

Related documentation changes (link commit/PR here)
WIP

Other information:

Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes/features)
  • Docs have been added / updated (for bug fixes / features)

@Jeadie Jeadie temporarily deployed to marqo-test-suite June 15, 2023 05:55 — with GitHub Actions Inactive
@Jeadie Jeadie temporarily deployed to marqo-test-suite June 19, 2023 02:42 — with GitHub Actions Inactive
@pandu-k
Copy link
Collaborator

pandu-k commented Jun 20, 2023

Hi @Jeadie . Got some questions regarding search!

  1. vector_inference. Is the time for a single vectorise call? What if we have to more than 1 vectorise call? For example multi query search? Does it become a list of floats?
  2. search.create_vectors. Can we make this key self describing? Perhaps vector_inference_full_pipeline
  3. search.vector.preprocess why is this higher than create_vectors? The name implies it captures processing operations before the vectorise step (and so should be a few ms at most). What does it represent?
  4. For bulk search, is the telemetry object similar to single search (just aggregated across each search)?

Otherwise this is looking better!

@Jeadie
Copy link
Contributor Author

Jeadie commented Jun 20, 2023

  1. I removed vector_inference I don't think it was adding much value.
  2. I've made them search.vector_inference_full_pipeline and bulk_search.vector_inference_full_pipeline respectively
  3. the *.preprocess tag represent the work done before the opensearch call (ignore some validation beforehand). So for vector search (unlike lexical for example), it contains vectorise.
  4. Bulk search is similar. Will add to doc.

@pandu-k
Copy link
Collaborator

pandu-k commented Jun 20, 2023

Thanks @Jeadie

For search.vector.preprocess. Can we also make this more self-explainable. For example: search.vector.processing_before_opensearch

@Jeadie
Copy link
Contributor Author

Jeadie commented Jun 20, 2023

Thanks @Jeadie

For search.vector.preprocess. Can we also make this more self-explainable. For example: search.vector.processing_before_opensearch

Do we want this for bulk_search (when lexical) and add_documents too? Currently preprocess means the same thing across endpoints (i.e. before opensearch)

@pandu-k
Copy link
Collaborator

pandu-k commented Jun 20, 2023

Thanks @Jeadie
For search.vector.preprocess. Can we also make this more self-explainable. For example: search.vector.processing_before_opensearch

Do we want this for bulk_search (when lexical) and add_documents too? Currently preprocess means the same thing across endpoints (i.e. before opensearch)

yep, this can apply to other endpoints

@Jeadie
Copy link
Contributor Author

Jeadie commented Jun 20, 2023

Done

@Jeadie Jeadie requested a review from pandu-k June 21, 2023 03:35
Copy link
Collaborator

@farshidz farshidz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to inline comments, I think we will need a naming convention for metrics. It could be as simple as module_name.sth_unique_within_the_module

src/marqo/tensor_search/telemetry.py Show resolved Hide resolved
src/marqo/tensor_search/telemetry.py Show resolved Hide resolved
src/marqo/tensor_search/telemetry.py Outdated Show resolved Hide resolved
src/marqo/tensor_search/telemetry.py Outdated Show resolved Hide resolved
src/marqo/tensor_search/telemetry.py Outdated Show resolved Hide resolved
src/marqo/tensor_search/telemetry.py Show resolved Hide resolved
@Jeadie Jeadie merged commit 2c28905 into mainline Jun 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants