
Add a prototype tab; support ingest and search with guardrails #139

Merged: @ohltyler merged 7 commits into opensearch-project:main on Apr 19, 2024

@ohltyler (Member) commented Apr 19, 2024

Description

This PR adds a Prototype tab that can be used for testing a workflow end-to-end. Where this belongs in the UX is still TBD, so for now I've placed it in its own sandboxed tab for ease of use. It dynamically parses out the necessary fields to generate both 1/ a valid document schema to ingest into the configured index, and 2/ a valid query to run some variation of neural search against the configured index, all based on the workflow's use case. For example, a SEMANTIC_SEARCH use case will have the index mapping configured with a specific set of fields (input field, vector field, model ID) that define the document schema, and a particular way to execute a search with the neural query clause using those fields. Other neural search variations will have different schemas and will construct valid queries differently.
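To make the use-case-driven parsing concrete, here is a minimal sketch of pulling the semantic search fields out of a workflow. `WorkflowSketch`, `configuredFields`, `extractFields`, and `requireField` are hypothetical names chosen for illustration; they are not the plugin's actual `data_extractor_utils` API, and the flattened workflow shape is an assumption.

```typescript
// Hypothetical, flattened view of a workflow; the plugin's real types differ.
type UseCase = 'SEMANTIC_SEARCH';

interface WorkflowSketch {
  useCase: UseCase;
  configuredFields: Record<string, string | undefined>;
}

// The fields that define both the document schema and the neural query.
interface SemanticSearchFields {
  indexName: string;
  inputField: string;
  vectorField: string;
  modelId: string;
}

// Read one configured field, failing loudly if it has not been set.
function requireField(workflow: WorkflowSketch, key: string): string {
  const value = workflow.configuredFields[key];
  if (value === undefined || value === '') {
    throw new Error(`Missing configured field: ${key}`);
  }
  return value;
}

// Dispatch on the use case; each new neural search variation would add a case
// returning its own field set.
function extractFields(workflow: WorkflowSketch): SemanticSearchFields {
  switch (workflow.useCase) {
    case 'SEMANTIC_SEARCH':
      return {
        indexName: requireField(workflow, 'indexName'),
        inputField: requireField(workflow, 'inputField'),
        vectorField: requireField(workflow, 'vectorField'),
        modelId: requireField(workflow, 'modelId'),
      };
    default: {
      // Compile-time exhaustiveness check: unreachable while every use case is handled.
      const unsupported: never = workflow.useCase;
      throw new Error(`Unsupported use case: ${String(unsupported)}`);
    }
  }
}
```

Throwing on a missing field gives the tab a single place to fall back to an empty or error state instead of building a half-valid document or query.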

In particular (the current scope is only semantic search), UI guardrails make everything read-only except:

  • the value of the plaintext field used when ingesting
  • the value of the plaintext field used when querying

The rest of the construction is all generated based on the use case (for now, only semantic search) and the values of the configured fields (input field, vector field, model ID, index name).
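A minimal sketch of how these guardrails could work, assuming the four configured fields have already been extracted: the plaintext is the only caller-supplied argument, so the document key, vector field, model ID, and k are all fixed by configuration. `buildIngestDoc`, `buildNeuralQuery`, `requireText`, and `DEFAULT_K` (including its value of 5) are hypothetical. The `neural` clause shape (`query_text`, `model_id`, `k`) follows the OpenSearch neural-search query syntax; excluding the vector field from `_source` is an optional choice, not something this PR specifies.

```typescript
// Configured fields for a SEMANTIC_SEARCH workflow (illustrative shape).
interface SemanticSearchFields {
  indexName: string;
  inputField: string;
  vectorField: string;
  modelId: string;
}

// Assumed fixed value; k is not user-editable in this first pass.
const DEFAULT_K = 5;

// Reject blank plaintext before anything is sent to the cluster.
function requireText(text: string): string {
  if (text.trim() === '') {
    throw new Error('Plaintext input must not be empty');
  }
  return text;
}

// Ingest: only the plaintext is user-supplied. The ingest pipeline set up at
// provisioning time is expected to compute the vector field from this input field.
function buildIngestDoc(fields: SemanticSearchFields, text: string): Record<string, string> {
  return { [fields.inputField]: requireText(text) };
}

// Query: a neural clause on the vector field, embedding the plaintext with the
// configured model. Everything except query_text is derived from configuration.
function buildNeuralQuery(fields: SemanticSearchFields, text: string) {
  return {
    // Omit the (large) embedding from returned documents.
    _source: { excludes: [fields.vectorField] },
    query: {
      neural: {
        [fields.vectorField]: {
          query_text: requireText(text),
          model_id: fields.modelId,
          k: DEFAULT_K,
        },
      },
    },
  };
}
```

Both bodies would then be sent to the configured `indexName` through the index document and search APIs. Opening up k or freeform queries later would mean turning one of these fixed values into an additional argument.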

This is just a starting point; we can easily expose more or less depending on how much flexibility is desired (e.g., freeform queries, or editing k and other neural clause params).

Implementation details:

  • added Ingestor component to format, execute, and display the response from running an indexing command against an index
  • added QueryExecutor component to format, execute, and display a formatted list of results from running a query against an index
  • added base Prototype component to handle global prototype state and tab state
  • added data_extractor_utils to organize all utility functions used for parsing the workflow data to get different fields based on the workflow use case. This will be expanded upon and refactored as more use cases are added
  • onboarded the search index and index document APIs
  • added minor handling of the empty state in the Prototype tab when the workflow has not been provisioned and/or no resources are available
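For the QueryExecutor's formatted list of results, one plausible approach is to map the raw search hits into display rows. The interfaces below cover only a subset of the OpenSearch search response (`hits.hits[]` with `_id`, `_score`, `_source`); `formatResults` and `FormattedResult` are hypothetical names, and showing only the plaintext input field is an assumption about what the list displays.

```typescript
// Minimal subset of an OpenSearch search response (only the fields used here).
interface SearchHit {
  _id: string;
  _score: number | null;
  _source?: Record<string, unknown>;
}

interface SearchResponseSketch {
  hits: { hits: SearchHit[] };
}

// One display row per hit (hypothetical shape).
interface FormattedResult {
  id: string;
  score: number | null;
  text: string;
}

// Turn raw hits into display rows, keeping only the configured input field
// (the plaintext) rather than the large embedding vector. Missing or
// non-string values render as an empty string.
function formatResults(response: SearchResponseSketch, inputField: string): FormattedResult[] {
  return response.hits.hits.map((hit) => {
    const value = hit._source?.[inputField];
    return {
      id: hit._id,
      score: hit._score,
      text: typeof value === 'string' ? value : '',
    };
  });
}
```

Hits keep the order returned by the cluster, which for a neural query is by descending relevance score.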

Demo video:

  • creating a semantic search workflow and defining all of the configured fields
  • provisioning to set up the index and ingest pipeline
  • using the Ingest tab to ingest some sample documents, requiring only plaintext input
  • using the Query tab to search against those documents, requiring only plaintext input
  • error handling on an empty search
  • basic validation that semantic search works as expected

screen-capture.31.webm

Issues Resolved

Makes progress on #68

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Tyler Ohlsen <ohltyler@amazon.com>
@amitgalitz (Member)

Looks awesome! Quick question: if provisioning takes a long time, should we have some refresh capability, or will it auto-query until provisioning is done?

@ohltyler (Member, Author)

Yep, currently it's very basic: it only does one refresh after 5s, with no auto-refresh. My idea for improving this is to add logic to the state rendering so that, if the workflow is in a transitive state (provisioning/deprovisioning), it auto-refreshes on a 1 or 2s interval or so. It's not that fine-grained yet, though.
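The auto-refresh idea could look roughly like this: poll only while the workflow is in a transitive state, on a fixed interval, with an upper bound so a stuck workflow doesn't poll forever. The state names, the `fetchState` callback, `pollUntilSettled`, and the 2s default interval and 30-attempt cap are all assumptions for illustration, not the plugin's actual types or behavior.

```typescript
// Hypothetical workflow states; only the transitive ones trigger polling.
type WorkflowState = 'NOT_STARTED' | 'PROVISIONING' | 'DEPROVISIONING' | 'COMPLETED' | 'FAILED';

const TRANSITIVE_STATES: ReadonlySet<WorkflowState> = new Set<WorkflowState>([
  'PROVISIONING',
  'DEPROVISIONING',
]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Fetch the state once, then keep re-fetching every `intervalMs` while it is
// transitive, making at most `maxAttempts` fetches in total. Returns the last
// state seen (which may still be transitive if the cap was hit).
async function pollUntilSettled(
  fetchState: () => Promise<WorkflowState>,
  intervalMs = 2000,
  maxAttempts = 30
): Promise<WorkflowState> {
  let state = await fetchState();
  let attempts = 1;
  while (TRANSITIVE_STATES.has(state) && attempts < maxAttempts) {
    await sleep(intervalMs);
    state = await fetchState();
    attempts += 1;
  }
  return state;
}
```

Awaiting each fetch before sleeping, rather than using `setInterval`, keeps slow responses from overlapping. A manual refresh button could simply call `fetchState` once.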

@amitgalitz (Member)

Sounds good. Even a refresh button later on would help if some jobs take a while, like reindexing or deploying a large pretrained model; we have refresh buttons all over core OSD too.

@ohltyler (Member, Author)

Agreed, I was thinking something similar. The same goes for the base workflow list page; it could be helpful there too. I've already seen on my local single-node cluster how pretrained models can take a good 10-15s to finish deploying.

@ohltyler merged commit e18bb11 into opensearch-project:main on Apr 19, 2024
10 checks passed
@ohltyler deleted the sparse-search branch on April 19, 2024 16:35
opensearch-trigger-bot pushed a commit that referenced this pull request on Apr 19, 2024
Signed-off-by: Tyler Ohlsen <ohltyler@amazon.com>
(cherry picked from commit e18bb11)
ohltyler added a commit that referenced this pull request Apr 19, 2024
…#140)

Signed-off-by: Tyler Ohlsen <ohltyler@amazon.com>
(cherry picked from commit e18bb11)

Co-authored-by: Tyler Ohlsen <ohltyler@amazon.com>
@owaiskazi19 (Member)

@ohltyler are we not adding tests for now?

@ohltyler (Member, Author)

No. Until there's at least an idea of a final proposed design, it's not worth adding them, as the code is too fluid.

3 participants