Refactor `tensor_search::search` to use shared functions from `tensor_search::bulk_search` #469

Jeadie · 2023-05-10T01:08:28Z

Changeset

Bulk search now supports:
- Context vectors
- Score modifiers

Standard search is now refactored to use the modular functions already used in bulk search.

Jeadie · 2023-05-10T01:23:38Z

src/marqo/tensor_search/tensor_search.py

            index_info.get_vector_properties().keys()
        ))
+
+    # Validation for offset (pagination is single field) if offset not provided, validation is not run.


Moved here so it can be used by both search and bulk_search.

nit: perhaps this could be moved to a separate validation function (may be out of scope for this refactor)

Agreed, but this is just a move within the script.

Jeadie · 2023-05-10T01:24:19Z

src/marqo/tensor_search/tensor_search.py



-def construct_msearch_body_elements(vector_properties_to_search: List[str], offset: int, filter_string: str, index_info: IndexInfo, result_count: int, query_vector: List[float], attributes_to_retrieve: List[str], index_name: str, contextualised_filter: str) -> List[Dict[str, Any]]:
+def construct_msearch_body_elements(searchableAttributes: List[str], offset: int, filter_string: str, index_info: IndexInfo, result_count: int, query_vector: List[float], attributes_to_retrieve: List[str], index_name: str, score_modifiers: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:


This change lets:
- search use construct_msearch_body_elements
- bulk search use score modifiers
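To illustrate why an optional `score_modifiers` parameter lets one builder serve both callers, here is a minimal sketch. It is not Marqo's implementation: `build_body_elements_sketch`, the keys of the returned dictionaries, and the example modifier payload are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


def build_body_elements_sketch(
    searchable_attributes: List[str],
    offset: int,
    result_count: int,
    score_modifiers: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Build one hypothetical body element per searchable attribute."""
    elements: List[Dict[str, Any]] = []
    for attribute in searchable_attributes:
        # The keys below are illustrative, not the real msearch body.
        element: Dict[str, Any] = {
            "field": attribute,
            "from": offset,
            "size": result_count,
        }
        # Attach score modifiers only when the caller supplied them, so
        # callers without modifiers get exactly the same body as before.
        if score_modifiers is not None:
            element["score_modifiers"] = score_modifiers
        elements.append(element)
    return elements


# A caller without score modifiers and a caller with them share one function.
plain = build_body_elements_sketch(["title"], offset=0, result_count=10)
modified = build_body_elements_sketch(
    ["title"],
    offset=0,
    result_count=10,
    # Hypothetical modifier payload, used only to show the pass-through.
    score_modifiers={"multiply_score_by": [{"field_name": "popularity", "weight": 1.0}]},
)
```

Defaulting the parameter to `None` keeps existing call sites valid while letting new callers opt in.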

Jeadie · 2023-05-10T01:24:52Z

src/marqo/tensor_search/tensor_search.py

@@ -1419,14 +1406,17 @@ def bulk_msearch(config: Config, body: List[Dict]) -> List[Dict]:
    except KeyError as e:
        # KeyError indicates we have received a non-successful result
        try:
-            if "index.max_result_window" in response["responses"][0]["error"]["root_cause"][0]["reason"]:


This just makes it cleaner, and adds `elif root_cause_type == 'query_shard_exception' and root_cause_reason.startswith("Failed to parse query")` to replicate what is in search.

good to clean this up 👍
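The branching discussed in this thread can be sketched roughly as follows. The response path (`responses[0].error.root_cause[0]`) and the two conditions come from the diff and comment above; the function name `classify_msearch_error` and the string labels it returns are hypothetical, and the real code may raise specific Marqo errors instead.

```python
from typing import Any, Dict


def classify_msearch_error(response: Dict[str, Any]) -> str:
    """Return a hypothetical label for the first error in an msearch response."""
    root_cause = response["responses"][0]["error"]["root_cause"][0]
    # Pull the two fields out once, instead of repeating the long lookup.
    root_cause_type = root_cause.get("type", "")
    root_cause_reason = root_cause.get("reason", "")

    if "index.max_result_window" in root_cause_reason:
        return "too_many_results"  # hypothetical label
    elif (root_cause_type == "query_shard_exception"
          and root_cause_reason.startswith("Failed to parse query")):
        return "unparseable_filter"  # hypothetical label
    return "unknown_backend_error"  # hypothetical label


def _make_response(error_type: str, reason: str) -> Dict[str, Any]:
    """Build a minimal fake response with the shape read above (test helper)."""
    return {"responses": [{"error": {"root_cause": [{"type": error_type, "reason": reason}]}}]}


window_error = _make_response(
    "illegal_argument_exception",
    "Result window is too large. See the [index.max_result_window] index level setting.",
)
parse_error = _make_response("query_shard_exception", "Failed to parse query [title:(]")
other_error = _make_response("some_other_exception", "something else went wrong")
```

Extracting `root_cause_type` and `root_cause_reason` up front is what makes the additional `elif` branch cheap to add.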

Jeadie · 2023-05-10T01:25:11Z

src/marqo/tensor_search/tensor_search.py

-            merged_vector = np.mean(weighted_vectors, axis=0)
+
+            custom_tensors = q.get_context_tensor() 
+            if custom_tensors is not None:


Adds custom tensor support to bulk_search (and lets search use this method).

should we call this var context_tensors to be consistent?

Jeadie · 2023-05-10T01:25:39Z

src/marqo/tensor_search/tensor_search.py

@@ -1647,6 +1649,25 @@ def create_empty_query_response(queries: List[BulkSearchQueryEntity]) -> List[Di
        )
    )

+def run_vectorise_pipeline(config: Config, queries: List[BulkSearchQueryEntity], selected_device: Union[Device, str]) -> Dict[Qidx, List[float]]:


This abstracts the query -> vector pipeline so we can use it for (bulk) search
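As a rough picture of the abstraction, one shared function can map each query index to a vector, and both entry points can call it. Everything in this sketch is hypothetical: the function names, the plain-string queries, and the `embed` callable stand in for the real `BulkSearchQueryEntity` objects and model inference, which are not shown in this thread.

```python
from typing import Callable, Dict, List

Qidx = int  # query index, mirroring the alias used in the signature above


def run_pipeline_sketch(
    queries: List[str], embed: Callable[[str], List[float]]
) -> Dict[Qidx, List[float]]:
    """Map each query's position in the list to its vector."""
    return {idx: embed(query) for idx, query in enumerate(queries)}


def search_sketch(query: str, embed: Callable[[str], List[float]]) -> List[float]:
    """Single search: wrap the query in a list and reuse the shared pipeline."""
    return run_pipeline_sketch([query], embed)[0]


def bulk_search_sketch(
    queries: List[str], embed: Callable[[str], List[float]]
) -> List[List[float]]:
    """Bulk search: one pipeline call, results returned in query order."""
    vectors = run_pipeline_sketch(queries, embed)
    return [vectors[idx] for idx in range(len(queries))]


def toy_embed(text: str) -> List[float]:
    """Toy stand-in for a model: encodes only the text length."""
    return [float(len(text)), 1.0]
```

Treating a single search as a bulk search of length one is the design choice that removes the duplicated vectorisation code.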

tests/tensor_search/test_lexical_search.py

tests/tensor_search/test_search.py

Jeadie · 2023-05-10T01:29:57Z

https://github.com/marqo-ai/marqo/actions/runs/4932152115

Jeadie · 2023-05-12T02:00:29Z

https://github.com/marqo-ai/marqo/actions/runs/4954372137

src/marqo/tensor_search/models/api_models.py

src/marqo/tensor_search/models/score_modifiers_object.py

pandu-k · 2023-05-15T03:10:31Z

src/marqo/tensor_search/models/score_modifiers_object.py

2 main points:

- Make pydantic models immutable 99.9% of the time, unless there's a good reason not to.
- Lots of unit tests, including edge cases, for all logic, including object methods. Even super trivial unit tests help make Marqo more stable.

src/marqo/tensor_search/tensor_search.py

vicilliar

looks good! just added some questions

src/marqo/tensor_search/tensor_search.py

src/marqo/tensor_search/models/score_modifiers_object.py

src/marqo/tensor_search/tensor_search.py

pandu-k · 2023-05-17T04:35:30Z

src/marqo/tensor_search/tensor_search.py

            index_info.get_vector_properties().keys()
        ))
+
+    # Validation for offset (pagination is single field) if offset not provided, validation is not run.


nit: perhaps this could be moved to a separate validation function (may be out of scope for this refactor)

pandu-k · 2023-05-17T05:00:34Z

src/marqo/tensor_search/tensor_search.py

@@ -1419,14 +1406,17 @@ def bulk_msearch(config: Config, body: List[Dict]) -> List[Dict]:
    except KeyError as e:
        # KeyError indicates we have received a non-successful result
        try:
-            if "index.max_result_window" in response["responses"][0]["error"]["root_cause"][0]["reason"]:


good to clean this up 👍

pandu-k · 2023-05-17T05:10:56Z

src/marqo/tensor_search/tensor_search.py

-            merged_vector = np.mean(weighted_vectors, axis=0)
+
+            custom_tensors = q.get_context_tensor() 
+            if custom_tensors is not None:


should we call this var context_tensors to be consistent?

pandu-k · 2023-05-17T05:12:37Z

src/marqo/tensor_search/tensor_search.py

+                merged_vector = np.mean(weighted_vectors, axis=0)
+            except ValueError as e:
+                raise errors.InvalidArgError(f"The provided vectors are not in the same dimension of the index."
+                                             f"This causes the error when we do `numpy.mean()` over all the vectors.\n"


let's leave out numpy as this is a Marqo implementation detail (just mean is fine).

This is a copy paste. I want to minimise refactors during a code reshuffle.
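For reference, the merge step under discussion can be sketched without any library dependency. This is a minimal illustration, not the code in the PR: it assumes query vectors and context tensors have already been gathered into one list with a weight each, it checks dimensions explicitly rather than relying on an exception from the mean computation, and `InvalidArgSketchError` is a hypothetical stand-in for `errors.InvalidArgError`.

```python
from typing import List, Sequence


class InvalidArgSketchError(ValueError):
    """Hypothetical stand-in for the `errors.InvalidArgError` seen in the diff."""


def merge_weighted_vectors(
    vectors: Sequence[Sequence[float]],
    weights: Sequence[float],
    index_dimension: int,
) -> List[float]:
    """Scale each vector by its weight, then take the element-wise mean."""
    if not vectors or len(vectors) != len(weights):
        raise InvalidArgSketchError("Expected one weight per vector and at least one vector.")
    for vector in vectors:
        if len(vector) != index_dimension:
            # The message describes the problem without naming the library
            # used internally to compute the mean.
            raise InvalidArgSketchError(
                f"Received a vector of dimension {len(vector)}, "
                f"but the index expects dimension {index_dimension}."
            )
    weighted = [[x * w for x in vector] for vector, w in zip(vectors, weights)]
    # zip(*weighted) walks the vectors column by column (axis 0).
    return [sum(column) / len(weighted) for column in zip(*weighted)]


merged = merge_weighted_vectors([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], index_dimension=2)
```

Validating dimensions before averaging keeps the user-facing error independent of how the mean is computed.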

pandu-k

tests are looking good

pandu-k · 2023-05-19T03:56:32Z

src/marqo/tensor_search/tensor_search.py

-    )
+    vector_jobs_tuple: Tuple[Dict[Qidx, List[VectorisedJobPointer]], Dict[JHash, VectorisedJobs]] = create_vector_jobs(queries, config, selected_device)
+
+    print(vector_jobs_tuple, create_vector_jobs)


rm this print statement

ahhh, I ctrl-F ed for prints in my git diff :(

pandu-k · 2023-05-19T03:57:55Z

tests/tensor_search/test_bulk_search.py

+        self.job_to_vectors: Dict[JHash, Dict[str, List[float]]] = {
+            123: {"a test query":  [0.5, 0.5]},
+            456: {"another": [0.5, 0.6]},
+            789: {"red herring": [0.6, 0.5]}


What about a multimodal combination field?

pandu-k

cheers

Jeadie marked this pull request as draft May 10, 2023 01:08

Jeadie commented May 10, 2023

tests/tensor_search/test_lexical_search.py

Jeadie commented May 10, 2023

tests/tensor_search/test_search.py

Jeadie marked this pull request as ready for review May 10, 2023 01:31

Jeadie added 2 commits May 12, 2023 11:28

refactor tensor_search/search

ce262ad

move SearchQuery.context and SearchQuery.scoreModifiers to pydantic

7b914ed

Jeadie force-pushed the jack/simplify-search branch from ea44e22 to 7b914ed Compare May 12, 2023 01:32

fix removed SearchContext object

558d89f

Jeadie had a problem deploying to marqo-test-suite May 12, 2023 02:01 — with GitHub Actions Failure

fix tests

830c66b

Jeadie had a problem deploying to marqo-test-suite May 15, 2023 00:27 — with GitHub Actions Failure

pandu-k requested a review from vicilliar May 15, 2023 03:00

pandu-k reviewed May 15, 2023

vicilliar reviewed May 15, 2023

src/marqo/tensor_search/tensor_search.py (resolved)

src/marqo/tensor_search/tensor_search.py (resolved)

vicilliar reviewed May 15, 2023

src/marqo/tensor_search/models/score_modifiers_object.py (outdated, resolved)

fix inference for model auth tests

4cedaab

Jeadie force-pushed the jack/simplify-search branch from fa16411 to 4cedaab Compare May 16, 2023 01:07

Jeadie temporarily deployed to marqo-test-suite May 16, 2023 01:12 — with GitHub Actions Inactive

Merge branch 'mainline' into jack/simplify-search

22be91c

pandu-k reviewed May 17, 2023

Jeadie and others added 3 commits May 18, 2023 10:04

Merge branch 'mainline' into jack/simplify-search

6682e24

PR fixes

90de212

Merge branch 'mainline' into jack/simplify-search

f6de1c6

add more tests

25c514c

pandu-k reviewed May 19, 2023

Jeadie temporarily deployed to marqo-test-suite May 19, 2023 04:08 — with GitHub Actions Inactive

pandu-k reviewed May 19, 2023

multi modal test +1

67247a3

pandu-k reviewed May 19, 2023

Jeadie had a problem deploying to marqo-test-suite May 21, 2023 23:01 — with GitHub Actions Failure

Jeadie added 2 commits May 22, 2023 09:21

Merge branch 'mainline' into jack/simplify-search

11c52f6

Merge branch 'mainline' into jack/simplify-search

5c02fd6

pandu-k temporarily deployed to marqo-test-suite May 22, 2023 00:26 — with GitHub Actions Inactive

pandu-k approved these changes May 22, 2023

Jeadie merged commit c707686 into mainline May 22, 2023

Jeadie deleted the jack/simplify-search branch May 22, 2023 03:01

Jeadie mentioned this pull request May 24, 2023

Bulk search now supports context and score modifiers marqo-ai/py-marqo#91

Merged
