Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROPOSAL] Merge with opensearch-py? #372

Open
dblock opened this issue Jan 18, 2024 · 11 comments
Open

[PROPOSAL] Merge with opensearch-py? #372

dblock opened this issue Jan 18, 2024 · 11 comments
Labels
enhancement New feature or request

Comments

@dblock
Copy link
Member

dblock commented Jan 18, 2024

Is your feature request related to a problem?

Users are confused with the many OpenSearch clients. Is opensearch-py-ml one too many?

The dsl client has a high-level and a low-level interface. For AWS users there’s also boto3 (see https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-sdk.html#serverless-sdk-python), used to create collections. We have opensearch-dsl-py client that has been deprecated since 2.1 and will be archived.

What solution would you like?

Merge opensearch-py-ml into opensearch-py.

What alternatives have you considered?

Leave things as is.

@saimedhi
Copy link

I've reviewed the repository code, and I feel merging opensearch-py-ml into opensearch-py is feasible. I estimate it would take approximately 3 weeks to 1 month to merge all the code and tests, ensuring everything runs smoothly.

@dblock dblock removed the untriaged label Jun 17, 2024
@dblock
Copy link
Member Author

dblock commented Jun 17, 2024

Catch All Triage - 1 2 3 4 5

@rbpasker
Copy link

I think it might be worthwhile to conduct a high-level review of the ML API before merging it in. There are consistency and orthogonality issues that caused me a lot of confusion.

Here would be some things to discuss:

  1. HTTP or CRUD idiom -- should the methods be named put, get, post, delete (as in the OpenSearch main API); or create, update, delete (as in ML)?
  2. should the methods be suffixed with the object name? eg, (using the HTTP idiom) Connector.put_connector() or Connector.put()?
  3. There are some differences in kwarg names that should be the same across the APIs, eg body and payload for the HTTP contents

my personal preference is for CRUD without repeating the object name, eg Connector.create()

@dblock dblock pinned this issue Nov 4, 2024
@saimedhi
Copy link

saimedhi commented Nov 20, 2024

Merging opensearch-py-ml to opensearch-py will include below steps:

  • Merge the content of opensearch-py-ml into opensearch-py (excluding the opensearch-py-ml tests). Ensure all the existing tests pass in opensearch-py.
    Go file by file, checking the content of each file to determine if it already exists in opensearch-py. If it does not, assess whether it is necessary. If needed, merge the file content into opensearch-py.
  • Then, add all the opensearch-py-ml tests to opensearch-py. Ensure that all the opensearch-py-ml tests also pass successfully in opensearch-py
  • Update UPGRADING.md file in opensearch-py explaining what are the changes.
  • Additionally, you might need to add async support for the features integrated into opensearch-py
  • Add deprecation warning to opensearch-py-ml as done for opensearch-dsl-py
  • Add deprecation banner to opensearch-py-ml in opensearch documentation.

Example PRs from opensearch-dsl-py to opensearch-py merging :

@nathaliellenaa
Copy link

I talked with the ml commons team about this proposal and we were thinking to move the ml client APIs to opensearch-py but keep notebooks, dataframes, and other heavy libraries in py-ml. What do you think about this? @saimedhi @dblock

@saimedhi
Copy link

saimedhi commented Dec 9, 2024

I talked with the ml commons team about this proposal and we were thinking to move the ml client APIs to opensearch-py but keep notebooks, dataframes, and other heavy libraries in py-ml. What do you think about this? @saimedhi @dblock

  • @nathaliellenaa, could you please state the reasons behind keeping notebooks, dataframes, and other libraries in py-ml.
  • In my opinion, I feel everything needs to at one place. The primary goal of this merging is to decrease the repo maintenance effort and also reduce user effort/confusion. I feel neither will be achieved by having both separate libraries at the end.

@nathaliellenaa
Copy link

nathaliellenaa commented Dec 9, 2024

Thanks for the input @saimedhi. The reason to keep other heavy libraries in py-ml is to maintain a lightweight opensearch-py package, since most of the py-ml libraries are for data analysis. Merging all these libraries into opensearch-py would make it significantly larger and potentially less efficient for users who don't need these specific functionalities.

@dblock
Copy link
Member Author

dblock commented Dec 9, 2024

Merging all these libraries into opensearch-py would make it significantly larger and potentially less efficient for users who don't need these specific functionalities.

How much larger would it make it and how much less efficient?

@nathaliellenaa
Copy link

nathaliellenaa commented Dec 10, 2024

How much larger would it make it and how much less efficient?

I will need some time to analyze the exact size implications and weigh all the pros and cons of merging all libraries. I believe I can work parallel with this and get started on merging the ML APIs first, since everyone is on board with that idea. This way, we can make progress while I'm conducting a thorough evaluation of the full merger's impact.

@saimedhi
Copy link

CC: @minalsha

@ylwu-amzn
Copy link
Collaborator

ylwu-amzn commented Dec 12, 2024

I would suggest keep the scope clear :

  1. opensearch-py is for exposing OpenSearch and plugin APIs. We can specify API spec in opensearch-py. Developers can install opensearch-py, then use all of the APIs.
  2. opensearch-py-ml is for ML/AI purpose. Let's keep these non-client-API functions in opensearch-py-ml: data analysis, sample data, demo notebooks etc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants