Exclude msmarco from IT tests #708

gareth-ellis · 2024-11-28T16:46:22Z

While checking the output from another PR, I spotted that msmarco-v2-vector takes around 50 minutes to run, even in test-mode.
Checking track.py, there's a very suspicious loop where we read 12000 lines from a file that we decompress on the fly. This runs many times during the track, because we load the file for each operation. I ran it on a very large instance I had running and it took 25 minutes; the machines we're using for CI are obviously a lot slower.
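One way to avoid repeating the decompression for every operation would be to cache the parsed lines. The following is only a minimal sketch under stated assumptions: it assumes the file is bz2-compressed JSON lines, and `load_lines`, `path` and `limit` are hypothetical names, not taken from track.py.

```python
import bz2
import functools
import itertools
import json


# Hypothetical helper, not the code in track.py.
# Assumption: the source file is bz2-compressed with one JSON document per line.
@functools.lru_cache(maxsize=None)
def load_lines(path, limit=12000):
    """Decompress and parse at most `limit` lines, once per (path, limit)."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        # islice stops reading after `limit` lines, so the rest of the
        # file is never decompressed.
        return tuple(json.loads(line) for line in itertools.islice(f, limit))
```

With this shape, each operation that calls `load_lines` with the same arguments in the same process reuses the tuple already in memory instead of decompressing the file again.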

It would be good if we could get the test-mode config down to the paramsource etc., and then consider doing something different for this track (or others), but that doesn't seem simple right now. I'll create a ticket in Rally to address this, but in the meantime I think this track should be excluded from system tests.

It would be nice if we could add a comment that triggers a specific test (e.g. when making changes to tracks that are excluded).

ebadyano

lgtm

github-actions · 2024-11-28T20:10:54Z

💔 All backports failed

| Status | Branch | Result |
| --- | --- | --- |
| ❌ | 8.15 | Backport failed because of merge conflicts |

Manual backport

To create the backport manually run:

backport --pr 708

Questions ?

Please refer to the Backport tool documentation and see the Github Action logs for details

gbanasiak · 2024-11-28T20:15:17Z

Perhaps we need to backport #610 first ?

gareth-ellis · 2024-11-28T20:17:30Z

Ah ha, yes - I hadn't checked, just assumed it was already in 8.15.

* Add recall and NDCG operations in msmarco-v2-vector (#610)

This change adds an operation called knn-recall that computes the following metrics:

* Recall
* NDCG
* Avg number of nodes visited during search

Given the size of the corpus, the true top N values used for recall operations have been approximated offline for each query as follows:

```
{
  "knn": {
    "field": "emb",
    "query_vector": query['emb'],
    "k": 10000,
    "num_candidates": 10000
  },
  "rescore": {
    "window_size": 10000,
    "query": {
      "query_weight": 0,
      "rescore_query": {
        "script_score": {
          "query": {
            "match_all": {}
          },
          "script": {
            "source": "double value = dotProduct(params.query_vector, 'emb'); return sigmoid(1, Math.E, -value);",
            "params": {
              "query_vector": vec
            }
          }
        }
      }
    }
  }
}
```

This means that the computed recall is measured against the system's best possible approximate neighbor run rather than the actual top N.

For the relevance metrics, the `qrels.tsv` file contains annotations for all the queries listed in `queries.json`. This file is generated from the original training data available at [ir_datasets/msmarco_passage_v2](https://ir-datasets.com/msmarco-passage-v2.html#msmarco-passage-v2/train).

(cherry picked from commit b6f3535)

* Exclude msmarco from IT tests (#708)

---------

Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
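For readers unfamiliar with the two relevance metrics named in the commit message above, here is a minimal sketch of how recall and NDCG at a cutoff are commonly computed. It is not the track's implementation: the function names, the linear-gain DCG variant, and the dict shape used for the relevance labels are all assumptions made for illustration.

```python
import math


def recall_at_k(retrieved, reference, k):
    """Fraction of reference ids that appear in the top k retrieved ids."""
    if not reference:
        return 0.0
    return len(set(retrieved[:k]) & set(reference)) / len(reference)


def dcg(gains):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(retrieved, qrels, k):
    """NDCG@k, where `qrels` maps doc id -> relevance label (assumed shape)."""
    gains = [qrels.get(doc_id, 0) for doc_id in retrieved[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

In this sketch, `reference` would be the offline-approximated top N ids for a query, and `qrels` would hold that query's annotations from a file such as `qrels.tsv`.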

Exclude msmarco from IT tests

e8aca3c

gareth-ellis added the backport-to-8.15 Automatically backport to 8.15 branch label Nov 28, 2024

gareth-ellis requested a review from a team November 28, 2024 16:46

gareth-ellis mentioned this pull request Nov 28, 2024

Allow access to test-mode in runners and ParamSources elastic/rally#1895

Open

ebadyano approved these changes Nov 28, 2024

gareth-ellis merged commit 2df96de into master Nov 28, 2024
26 checks passed

gareth-ellis mentioned this pull request Nov 28, 2024

Add recall and NDCG operations in msmarco-v2-vector #610

Merged

gareth-ellis added a commit to gareth-ellis/rally-tracks that referenced this pull request Dec 6, 2024

Exclude msmarco from IT tests (elastic#708)

bc8962e

gareth-ellis added a commit to gareth-ellis/rally-tracks that referenced this pull request Dec 6, 2024

Exclude msmarco from IT tests (elastic#708)

d730072
