Branch out to make Elasticsearch a generic search module and support more search engines #534

Open
krickert opened this issue Aug 25, 2024 · 2 comments

@krickert

Feature description

tl;dr: create a common search interface, micronaut-search. I can work on creating the Solr clients (micronaut-search-solr) and make this project the Elasticsearch module (micronaut-search-opensearch).

I would then create the following for micronaut-search:

  • gRPC interface for searching
  • REST interface for searching
  • common admin features that integrate with search
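To make the idea concrete, here is a minimal sketch of what the engine-neutral contract behind those gRPC and REST interfaces could look like, written in Java 17 since the project targets Micronaut. Everything in it is hypothetical: `SearchClient`, `SearchRequest`, `SearchHit` and the toy `InMemorySearchClient` are illustrative names, not an existing Micronaut API, and the in-memory backend only counts matching terms so the contract can be exercised without a running Solr or Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical engine-neutral request: free-text query plus paging.
record SearchRequest(String query, int offset, int limit) {}

// Hypothetical engine-neutral hit: document id, relevance score, stored fields.
record SearchHit(String id, double score, Map<String, String> fields) {}

// Common contract that each engine module (Solr, Elasticsearch/OpenSearch, ...)
// would implement; gRPC and REST endpoints would depend only on this interface.
interface SearchClient {
    void index(String id, Map<String, String> fields);

    List<SearchHit> search(SearchRequest request);
}

// Toy backend for tests: score = number of query terms that occur as a
// substring of the document's concatenated field values (case-insensitive).
class InMemorySearchClient implements SearchClient {
    private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

    @Override
    public void index(String id, Map<String, String> fields) {
        docs.put(id, Map.copyOf(fields));
    }

    @Override
    public List<SearchHit> search(SearchRequest request) {
        String[] terms = request.query().toLowerCase(Locale.ROOT).split("\\s+");
        List<SearchHit> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> doc : docs.entrySet()) {
            String text = String.join(" ", doc.getValue().values()).toLowerCase(Locale.ROOT);
            double score = 0;
            for (String term : terms) {
                if (!term.isEmpty() && text.contains(term)) {
                    score++;
                }
            }
            if (score > 0) {
                hits.add(new SearchHit(doc.getKey(), score, doc.getValue()));
            }
        }
        // Highest score first; List.sort is stable, so ties keep indexing order.
        hits.sort(Comparator.comparingDouble(SearchHit::score).reversed());
        int from = Math.min(request.offset(), hits.size());
        int to = Math.min(from + request.limit(), hits.size());
        return List.copyOf(hits.subList(from, to));
    }
}
```

A Solr module and an Elasticsearch/OpenSearch module would each provide their own `SearchClient` implementation, so the gRPC and REST layers never need to know which engine sits underneath.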

Proposal: Creating a Unified Search Interface for Micronaut

I'm proposing we develop a common search interface, micronaut-search, for Micronaut applications. This would simplify the integration of various search engines, such as Solr and Elasticsearch.

My Role:
Create common search components that work with both Solr and Elasticsearch clients. (Other engines such as Vespa or Pinecone are possible too, but Elasticsearch and Solr have nearly identical setups since both are built on Lucene.)
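Because both engines sit on Lucene, a common query model mostly comes down to rendering the same request in each engine's syntax. The sketch below assumes a hypothetical `FieldQuery` record and `QueryTranslator` helper (not part of any existing module) and maps a single-field query onto Solr's standard request parameters (`q`, `start`, `rows`) and onto an Elasticsearch/OpenSearch `match` query with `from`/`size`. It deliberately skips escaping of Lucene special characters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical engine-neutral query: match `text` against a single field, with paging.
record FieldQuery(String field, String text, int offset, int limit) {}

final class QueryTranslator {
    private QueryTranslator() {}

    // Solr standard query parser syntax, URL-encoded as request parameters:
    // q=field:(text), start = offset, rows = page size.
    // Sketch only: Lucene special characters in `text` are not escaped.
    static String toSolrParams(FieldQuery q) {
        String clause = q.field() + ":(" + q.text() + ")";
        return "q=" + URLEncoder.encode(clause, StandardCharsets.UTF_8)
                + "&start=" + q.offset()
                + "&rows=" + q.limit();
    }

    // Elasticsearch/OpenSearch Query DSL body: a match query with from/size paging.
    static String toElasticsearchJson(FieldQuery q) {
        return "{\"from\":" + q.offset()
                + ",\"size\":" + q.limit()
                + ",\"query\":{\"match\":{\"" + jsonEscape(q.field()) + "\":\""
                + jsonEscape(q.text()) + "\"}}}";
    }

    // Minimal JSON string escaping (backslash and double quote only) for this sketch.
    private static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

For `new FieldQuery("title", "vector search", 0, 10)` this yields `q=title%3A%28vector+search%29&start=0&rows=10` for Solr and `{"from":0,"size":10,"query":{"match":{"title":"vector search"}}}` for Elasticsearch.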

What I need help with:

Solr client: a Solr client module (micronaut-search-solr). Is this a difficult ask?
Elasticsearch client: this existing client, extended so it can interact with a generic search module.

Motivation:

Simplify Search Integration: Indexing and searching can be complex. I aim to streamline this process for Micronaut developers.
Create common indexing components: Right now this is very weak in the market and requires ETL-style transforms. I am trying to simplify that.
Sentence embedding integration: Unfortunately for us, almost all good embedding calculations are either done in Python or require JNI. We would create an interface that bridges that gap (as I've already done with both OpenAPI and gRPC).
Job management: I'd integrate with Micronaut's job runner to create a job manager for indexing and searching.
Common search interface: get search results out of the box with either search engine.

Current Progress:

Indexer Development: I've already developed an indexer that simplifies the creation of vector indexes on Solr. I'm going to do the same with Elasticsearch.

Feature Integration: I'm working on integrating features like automatic Solr document creation from Google Protocol Buffers and Tika parsing of web-crawled content. I already have unit tests that demonstrate this with Wikipedia and Tika.
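The document-creation step boils down to walking a message's fields and copying each into a search document. protobuf-java exposes a message's set fields through `getAllFields()`; to keep the sketch dependency-free, it swaps in plain Java records and reflects over their components instead. `DocumentMapper` and `WikiArticle` are hypothetical names, and the output is a generic field map rather than a SolrJ `SolrInputDocument`.

```java
import java.lang.reflect.RecordComponent;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: copies every record component into a field of the same name,
// in declaration order. A protobuf-based version would iterate getAllFields() instead.
final class DocumentMapper {
    private DocumentMapper() {}

    static Map<String, Object> toDocument(Record message) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (RecordComponent component : message.getClass().getRecordComponents()) {
            try {
                doc.put(component.getName(), component.getAccessor().invoke(message));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read field " + component.getName(), e);
            }
        }
        return doc;
    }
}

// Hypothetical message type standing in for a generated protobuf class.
record WikiArticle(String id, String title, String body, int revision) {}
```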

Next Steps:

Generic Search Client: create a more generic search client to support both Solr and Elasticsearch.
Advanced Features: This would enable us to explore advanced features like vector and semantic search.

Request for Collaboration:

I'm seeking assistance with the client code development. If we can create a shared client, it will make the project more attractive to a wider audience.

Thoughts?

@graemerocher
Contributor

Seems useful

@krickert
Author

> Seems useful

Can you guide me on what the next steps would be?

I already have a few fully open-source components built on Micronaut.

A semantic search component depends on two components I made:

  • Chunker
  • Sentence embeddings
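For readers unfamiliar with the first component: a chunker splits long documents into smaller passages before they are embedded, usually with some overlap so context cut at a window boundary also appears in the neighbouring chunk. The sketch below is a generic word-window chunker under that assumption; it is not the code of the Chunker component mentioned above, and `chunkSize` and `overlap` are arbitrary illustrative parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical word-window chunker: fixed-size windows of words, with `overlap`
// words repeated between consecutive chunks so boundary context is not lost.
final class Chunker {
    private final int chunkSize;
    private final int overlap;

    Chunker(int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        this.chunkSize = chunkSize;
        this.overlap = overlap;
    }

    List<String> chunk(String text) {
        List<String> chunks = new ArrayList<>();
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return chunks;
        }
        String[] words = trimmed.split("\\s+");
        int step = chunkSize - overlap; // how far each window advances
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // this window already reaches the end of the text
            }
        }
        return chunks;
    }
}
```

Production chunkers often split on sentences or tokens rather than whitespace-separated words; the windowing logic stays the same.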

The next step would be to use a third component that I built:

  • Semantic indexer

Finally, you can try it all together with the fourth component:

  • Search API

These components run in containers and communicate with each other over gRPC. Where possible, I am using Micronaut components.
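The flow across the four components (chunk, embed, index, search) can be sketched in-process, with plain Java interfaces standing in for the gRPC services. This is an assumption-heavy illustration: `Embedder`, `VectorIndex`, `IndexedChunk` and `LetterFrequencyEmbedder` are hypothetical stand-ins (the toy embedder is not semantic at all), and search is a brute-force cosine-similarity scan rather than the approximate vector search Solr or Elasticsearch would provide.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the sentence-embedding service (a gRPC call in practice).
interface Embedder {
    float[] embed(String text);
}

// Toy embedder: 26-dimensional letter-frequency vector. NOT semantic; wiring tests only.
final class LetterFrequencyEmbedder implements Embedder {
    @Override
    public float[] embed(String text) {
        float[] vector = new float[26];
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                vector[c - 'a']++;
            }
        }
        return vector;
    }
}

// One indexed chunk: source document id, chunk text, and its embedding.
record IndexedChunk(String docId, String text, float[] vector) {}

// Hypothetical stand-in for the semantic indexer plus search API:
// stores chunk embeddings and answers queries with a brute-force cosine scan.
final class VectorIndex {
    private final Embedder embedder;
    private final List<IndexedChunk> chunks = new ArrayList<>();

    VectorIndex(Embedder embedder) {
        this.embedder = embedder;
    }

    // Indexing step: embed each pre-chunked passage of a document.
    void add(String docId, List<String> textChunks) {
        for (String text : textChunks) {
            chunks.add(new IndexedChunk(docId, text, embedder.embed(text)));
        }
    }

    // Search step: embed the query, return the k most similar chunks.
    List<IndexedChunk> search(String query, int k) {
        float[] q = embedder.embed(query);
        return chunks.stream()
                .sorted(Comparator.comparingDouble((IndexedChunk c) -> cosine(q, c.vector())).reversed())
                .limit(k)
                .toList();
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Swapping `LetterFrequencyEmbedder` for a client of a real sentence-embedding service, and `VectorIndex` for a Solr or Elasticsearch vector field, would not change the shape of the pipeline.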

Thoughts?
