-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data plane integration testing suite #268
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for all the nitpicks and questions, you can obviously ignore and land as-is. This is fantastic, thanks for expanding our coverage, streamlining the integration flows, and packing in additional GRPC quality of life improvements. Also hot on the heels of all the dependency coverage work, wow! 🚢
outputs: | ||
index_name: | ||
description: 'The name of the index, including randomized suffix' | ||
value: ${{ steps.create-index.outputs.index_name }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this referring to a different create-index
step that's present in this action, or is it assuming it's been run prior? It's working properly so I'm just going to assume the scope for steps.<whatever>
is global to the whole workflow. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this outputs
declaration is just accidental copy/pasta from the create-index
action I made earlier this week for dependency testing purposes so it can be removed. If I wanted to use it I could by having my test suite write the name of the index it creates to the GITHUB_OUTPUT
location (see here in script used by create-index action) but in this case it's not needed since the test creates and deletes the index.
# - euclidean | ||
# - dotproduct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you want to keep these commented out for now, but if not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, leaving them out while still fleshing out the tests and iterating heavily. Dotproduct is having some kind of performance issue at the moment the db team are looking into.
- 3.9 | ||
- '3.10' | ||
- 3.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything but 3.8
is commented out in dependency-matrix-grpc
, is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, not exactly. Mainly just trying to save some time and resources while iterating. I should bring them back on the grpc job.
SparseValues as NonGRPCSparseValues | ||
) | ||
|
||
class VectorFactoryGRPC: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work getting the grpc side of things a bit more aligned with the non-grpc paths. 🙏
def build(item: Union[GRPCVector, NonGRPCVector, Tuple, Dict]) -> GRPCVector: | ||
if isinstance(item, GRPCVector): | ||
return item | ||
elif isinstance(item, NonGRPCVector): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would, but would the VectorFactory
benefit from having the flexibility of GRPCVector
being added to the union in build
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps in some circumstances, but it's probably a lot less likely to happen than people trying to pass the NonGRPCVector
object into into the GRPC-flavored client. The GRPC-flavored version of this object is annoying because you have to convert metadata dicts yourself into Struct
before passing the kwarg param. So the ergonomics leave a lot to be desired.
return random_string(10) | ||
|
||
@pytest.fixture(scope='session') | ||
def idx(client, index_name, index_host): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Any specific reason for using idx
?
total_time += delta_t | ||
time.sleep(delta_t) | ||
|
||
def poll_fetch_for_ids_in_namespace(idx, ids, namespace): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Lots of goodies for improving thinking around how to streamline the integration flows on typscript side, thank you again.
@@ -1,5 +1,6 @@ | |||
import pytest | |||
from pinecone.utils import convert_to_list | |||
from pinecone import SparseValues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Might not be needed.
def md1(): | ||
return {"genre": "action", "year": 2021} | ||
|
||
|
||
@pytest.fixture | ||
def md2(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: More descriptive metadata1()
and metadata2()
? Very nitpicky, but since these fixtures are used across tests in other files might help readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find everything about these tests loathsome even after some refactoring. 💀
They were previously using attributes on a test class called self.md1
, etc, so I kept the names the same for now just to simplify the refactor.
self.config = Config(api_key="test-api-key", host="foo") | ||
self.index = GRPCIndex(config=self.config, index_name="example-name", _endpoint_override="test-endpoint") | ||
|
||
def test_describeIndexStats_callWithoutFilter_CalledWithoutFilter(self, mocker): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: I was going to ask about the duplicative seeming naming for tests in the grpc suite, but it seems to be across the board in some of these files.
Problem
Want to add a much more comprehensive integration testing suite against the data plane. This is a large undertaking because of the amount of variables involved, but this is a start.
Some variables to consider:
Solution
Conceptually, I want every test run on github actions to target one combination of the factors above. This means some things which might otherwise be handled through test parameterization in a big monolithic test suite that is run with a single command are now toggled through environment variables. This makes it very easy to setup parallel runs with different configurations, but at the cost of some additional complexity. There's currently no single command to run every combo from my local machine in one go.
To run these locally right now, I might have an incantation like this:
test-data-plane
that allow tests to easily be run using different configurations for target environment/cloud/region, REST vs GRPC, and metric type (which is relevant because hybrid search / sparse vector stuff only makes sense in the context of indexes with metric=dotproduct).IndexGRPC#upsert
to acceptVector
orVectorGRPC
args. This greatly simplifies test reuse and will be a nice UX improvement for users also because it simplifies docs.IndexGRPC
class to use the newVectoryFactoryGRPC
class and triggering a few failures due to missed a few edge cases expressed through the unit tests. I split things up into smaller files and transitioned from usingself.<field>
everywhere to using pytest fixtures instead. It still sucks, but readability maybe slightly improved.None
via GRPC orFetchResult(vectors=[])
via REST. The simplicity ofNone
was nice, but now that there isread_unit
information to display even when there are no results, theFetchResponse
return value makes more sense.To save resources/time while initially building out these tests, for now I'm only running with serverless indexes. But it should be easy to add other configurations soon.
Some possible bugs/questions to follow up:
$ne
and$nin
aren't working as documentedreadUnits
toread_units
before shipping 3.0High-level coverage so far
Type of Change