-
Notifications
You must be signed in to change notification settings - Fork 109
Data plane integration testing suite #268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
austin-denoble
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for all the nitpicks and questions, you can obviously ignore and land as-is. This is fantastic, thanks for expanding our coverage, streamlining the integration flows, and packing in additional GRPC quality of life improvements. Also hot on the heels of all the dependency coverage work, wow! 🚢
| outputs: | ||
| index_name: | ||
| description: 'The name of the index, including randomized suffix' | ||
| value: ${{ steps.create-index.outputs.index_name }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this referring to a different create-index step that's present in this action, or is it assuming it's been run prior? It's working properly so I'm just going to assume the scope for steps.<whatever> is global to the whole workflow. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this outputs declaration is just accidental copy/pasta from the create-index action I made earlier this week for dependency testing purposes so it can be removed. If I wanted to use it I could by having my test suite write the name of the index it creates to the GITHUB_OUTPUT location (see here in script used by create-index action) but in this case it's not needed since the test creates and deletes the index.
| # - euclidean | ||
| # - dotproduct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming you want to keep these commented out for now, but if not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, leaving them out while still fleshing out the tests and iterating heavily. Dotproduct is having some kind of performance issue at the moment the db team are looking into.
| - 3.8 | ||
| - 3.9 | ||
| - '3.10' | ||
| - 3.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything but 3.8 is commented out in dependency-matrix-grpc, is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, not exactly. Mainly just trying to save some time and resources while iterating. I should bring them back on the grpc job.
| SparseValues as NonGRPCSparseValues | ||
| ) | ||
|
|
||
| class VectorFactoryGRPC: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work getting the grpc side of things a bit more aligned with the non-grpc paths. 🙏
| def build(item: Union[GRPCVector, NonGRPCVector, Tuple, Dict]) -> GRPCVector: | ||
| if isinstance(item, GRPCVector): | ||
| return item | ||
| elif isinstance(item, NonGRPCVector): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would, but would the VectorFactory benefit from having the flexibility of GRPCVector being added to the union in build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps in some circumstances, but it's probably a lot less likely to happen than people trying to pass the NonGRPCVector object into into the GRPC-flavored client. The GRPC-flavored version of this object is annoying because you have to convert metadata dicts yourself into Struct before passing the kwarg param. So the ergonomics leave a lot to be desired.
| return random_string(10) | ||
|
|
||
| @pytest.fixture(scope='session') | ||
| def idx(client, index_name, index_host): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Any specific reason for using idx?
| total_time += delta_t | ||
| time.sleep(delta_t) | ||
|
|
||
| def poll_fetch_for_ids_in_namespace(idx, ids, namespace): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Lots of goodies for improving thinking around how to streamline the integration flows on typscript side, thank you again.
| @@ -1,5 +1,6 @@ | |||
| import pytest | |||
| from pinecone.utils import convert_to_list | |||
| from pinecone import SparseValues | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Might not be needed.
| def md1(): | ||
| return {"genre": "action", "year": 2021} | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def md2(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: More descriptive metadata1() and metadata2()? Very nitpicky, but since these fixtures are used across tests in other files might help readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find everything about these tests loathsome even after some refactoring. 💀
They were previously using attributes on a test class called self.md1, etc, so I kept the names the same for now just to simplify the refactor.
| self.config = Config(api_key="test-api-key", host="foo") | ||
| self.index = GRPCIndex(config=self.config, index_name="example-name", _endpoint_override="test-endpoint") | ||
|
|
||
| def test_describeIndexStats_callWithoutFilter_CalledWithoutFilter(self, mocker): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: I was going to ask about the duplicative seeming naming for tests in the grpc suite, but it seems to be across the board in some of these files.
Problem
Want to add a much more comprehensive integration testing suite against the data plane. This is a large undertaking because of the amount of variables involved, but this is a start.
Some variables to consider:
Solution
Conceptually, I want every test run on github actions to target one combination of the factors above. This means some things which might otherwise be handled through test parameterization in a big monolithic test suite that is run with a single command are now toggled through environment variables. This makes it very easy to setup parallel runs with different configurations, but at the cost of some additional complexity. There's currently no single command to run every combo from my local machine in one go.
To run these locally right now, I might have an incantation like this:
test-data-planethat allow tests to easily be run using different configurations for target environment/cloud/region, REST vs GRPC, and metric type (which is relevant because hybrid search / sparse vector stuff only makes sense in the context of indexes with metric=dotproduct).IndexGRPC#upsertto acceptVectororVectorGRPCargs. This greatly simplifies test reuse and will be a nice UX improvement for users also because it simplifies docs.IndexGRPCclass to use the newVectoryFactoryGRPCclass and triggering a few failures due to missed a few edge cases expressed through the unit tests. I split things up into smaller files and transitioned from usingself.<field>everywhere to using pytest fixtures instead. It still sucks, but readability maybe slightly improved.Nonevia GRPC orFetchResult(vectors=[])via REST. The simplicity ofNonewas nice, but now that there isread_unitinformation to display even when there are no results, theFetchResponsereturn value makes more sense.To save resources/time while initially building out these tests, for now I'm only running with serverless indexes. But it should be easy to add other configurations soon.
Some possible bugs/questions to follow up:
$neand$ninaren't working as documentedreadUnitstoread_unitsbefore shipping 3.0High-level coverage so far
Type of Change