Aggregations #18

Open
jonian opened this issue May 21, 2022 · 7 comments
Labels
enhancement New feature or request

Comments

@jonian

jonian commented May 21, 2022

Hi @baygeldin, this is an awesome project! Do you have any plans to add support for aggregations?

@baygeldin
Owner

Hey @jonian, thanks! Supporting aggregations would be nice, but currently I don't have any use for them, so it wasn't in my plans. What's your use case? And what kind of API are you interested in by the way?

@jonian
Author

jonian commented May 21, 2022

I'm using Elasticsearch aggregations to build filters. Currently I use only term counts and min/max on numeric fields. I want to move away from ES and was thinking of using Meilisearch with its facet distribution, but I saw your project and really liked it.

Any API would be great, I don't have something specific in mind. Maybe something like the API provided by searchkick.

I would really like to contribute to this feature, but unfortunately my experience with Rust is very limited.
If it is of any help/inspiration, here is a tantivy-aggregations repo.

@jonian
Author

jonian commented May 21, 2022

Tantivy has collectors that can be used for aggregations, in particular MultiCollector, which can be used when the collector types are unknown at compile time.

@baygeldin
Owner

Yeah, I'm aware of collectors, but at the time I was writing the code I couldn't come up with a good use case for them (apart from the obvious one, for which I used the TopDocs collector), so I decided to keep things simple. But looking at the screenshot in the searchkick documentation makes me realize that aggregations are actually pretty useful, and it would be great to add the ability to customize what data is aggregated during the search (but only to some extent, because implementing fully custom collectors is tricky for the same reason it's difficult to implement #17).

So, here's what I propose:

  • Add a Tantiny::Collector object (in the same fashion as Tantiny::Tokenizer). There will be some predefined collectors (TopDocs and other collectors from the tantivy docs) that the user could configure individually (again, same as tokenizers).
  • Add a collectors: option to Tantiny::Index.new, where we would pass an array of Tantiny::Collector objects to be used by default during the search (and also allow overriding it when calling the #search method).
  • Make #search return whatever the collector collected (in case of multiple collectors it should probably be a hash with collectors as keys and the data they collected as values).
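
To make the shape of this proposal concrete, here is a minimal pure-Ruby sketch. It is not Tantiny code: `ToyIndex`, `TopDocsCollector`, `CountCollector`, and the `collect` method are hypothetical names invented for illustration, and the matching is a naive substring check rather than a real search. It only shows default collectors passed at construction, an override at search time, and a result hash keyed by collector.

```ruby
# Hypothetical sketch: none of these classes exist in Tantiny.
# Each collector folds the matched documents into one result.
class TopDocsCollector
  def initialize(limit:)
    @limit = limit
  end

  # Returns the ids of the first `limit` matches.
  def collect(docs)
    docs.first(@limit).map { |doc| doc[:id] }
  end
end

class CountCollector
  # Returns the number of matches.
  def collect(docs)
    docs.size
  end
end

class ToyIndex
  def initialize(docs, collectors:)
    @docs = docs
    @collectors = collectors
  end

  # Runs every collector over the matches and returns a hash with the
  # collectors as keys and the data they collected as values.
  # Passing `collectors:` overrides the defaults given to `new`.
  def search(term, collectors: @collectors)
    matches = @docs.select { |doc| doc[:title].downcase.include?(term.downcase) }
    collectors.to_h { |collector| [collector, collector.collect(matches)] }
  end
end

DOCS = [
  { id: 1, title: "Ruby search" },
  { id: 2, title: "Fast Ruby" },
  { id: 3, title: "Rust bindings" },
  { id: 4, title: "Ruby gems" }
].freeze

TOP = TopDocsCollector.new(limit: 2)
COUNT = CountCollector.new
INDEX = ToyIndex.new(DOCS, collectors: [TOP])

DEFAULT_RESULT = INDEX.search("ruby")                        # uses the default collectors
OVERRIDDEN = INDEX.search("ruby", collectors: [TOP, COUNT])  # per-search override
```

Keying the result by the collector object itself means a caller reads `OVERRIDDEN[COUNT]` without inventing separate names for each collector, at the cost of having to keep a reference to every collector it passes in.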

@baygeldin
Owner

This does require some work, but at least it's more or less straightforward to implement. However, I don't know which collectors would cover your use case. As for filtering by numeric fields, it's already supported (check out the range_query), but I'm not sure what collector would work for aggregating the term count.

Maybe I will draft a PR in the next couple of weeks if I have spare time. You can help if you want (btw I don't have much experience with Rust myself tbh, but I don't find it particularly difficult to write as long as it's just bindings to another library).

P.S: btw would love to hear what motivated you to move away from elastic :) also, if you decide to go with Meilisearch it would be cool if you share your experience because I wanted to try it out myself, but didn't have an opportunity for that just yet.

@baygeldin added the enhancement label on May 21, 2022

@jonian
Author

jonian commented May 22, 2022

The proposed collector API looks great! Thank you for considering adding this feature. If only term count is added, I think other aggregations like min, max, and average can be calculated by the user. Is counting on numeric fields possible?
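
As a rough illustration of that point, the sketch below derives min, max, and average from a term-count result on a numeric field. The `PRICE_COUNTS` hash (value => number of matching documents) and the `stats_from_counts` helper are hypothetical; the thread does not define what a term-count collector would actually return.

```ruby
# Hypothetical term-count result for a numeric field:
# each key is a field value, each value is how many matches had it.
PRICE_COUNTS = { 10 => 2, 25 => 1, 40 => 1 }.freeze

# Derives min, max, and average from value counts alone.
# Returns nil when there are no matches.
def stats_from_counts(counts)
  total = counts.values.sum
  return nil if total.zero?

  min, max = counts.keys.minmax
  # Weight each value by its count to recover the sum over all documents.
  weighted_sum = counts.sum { |value, count| value * count }
  { min: min, max: max, avg: weighted_sum.to_f / total }
end

STATS = stats_from_counts(PRICE_COUNTS)
```

This only works when the counts cover every distinct value among the matches; a truncated "top N terms" result would give wrong min, max, and average.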

> You can help if you want (btw I don't have much experience with Rust myself tbh, but I don't find it particularly difficult to write as long as it's just bindings to another library).

I was thinking I could contribute by adding some features like meilisearch-rails to make it easier to integrate with ActiveRecord, but if you think I can help with the Rust extension I will give it a try.

> P.S: btw would love to hear what motivated you to move away from elastic :) also, if you decide to go with Meilisearch it would be cool if you share your experience because I wanted to try it out myself, but didn't have an opportunity for that just yet.

I want to move away from Elastic mainly because of the high memory usage. Right now it uses more memory than my apps, one of which is multi-tenant. The recent change in their license is another factor.

I will test Meilisearch in the coming weeks and share my experience. My use case is a bit complex: users define filters from the admin interface, and Elastic works well with that.

@baygeldin
Owner

baygeldin commented May 23, 2022

> Is counting on numeric fields possible?

Nope, currently only filtering.
