Add summary endpoint for number of species #27

Open
fmichonneau opened this issue Dec 22, 2017 · 1 comment
Comments

@fmichonneau
Member

It would be nice to have a summary endpoint (similar to summary/top/records/ and summary/count/records/) that returns the number of species (e.g. distinct scientificname values) for a given query. That would make it possible to answer questions such as "how many species of phylum X are in country Y?"
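To make the request concrete, here is a minimal sketch of the semantics such an endpoint would have, written against an in-memory list of records. The function name `count_species`, the sample records, and the use of plain dicts are all assumptions for illustration; this is not part of the existing API.

```python
def count_species(records, **filters):
    """Return the number of distinct scientificname values among matching records.

    `records` is a list of dicts; `filters` maps field names to required values.
    Both are hypothetical stand-ins for the real index and query format.
    """
    matching = (
        r for r in records
        if all(r.get(field) == value for field, value in filters.items())
    )
    # Records without a scientificname are skipped rather than counted.
    return len({r["scientificname"] for r in matching if r.get("scientificname")})


# Made-up sample data, only to exercise the function.
records = [
    {"scientificname": "holothuria atra", "phylum": "echinodermata", "country": "fiji"},
    {"scientificname": "holothuria atra", "phylum": "echinodermata", "country": "fiji"},
    {"scientificname": "linckia laevigata", "phylum": "echinodermata", "country": "fiji"},
    {"scientificname": "conus textile", "phylum": "mollusca", "country": "fiji"},
    {"scientificname": "holothuria atra", "phylum": "echinodermata", "country": "palau"},
]

print(count_species(records, phylum="echinodermata", country="fiji"))
```

The point of the sketch is that duplicate records of the same species count once, which is what distinguishes this from summary/count/records/.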

@mjcollin

mjcollin commented Jan 2, 2018

We've talked about this as a "unique values" API endpoint, i.e. "show me the unique values of this field and their counts". Adding a query to filter the records as you describe above, like "phylum == X and country == Y", would be a good refinement.

The difficulty is that Elasticsearch is great at top-style queries that don't rely on collecting 100% of the results, and terrible at distinct- and count-type queries. We're evaluating how to provide this in a performant manner. @godfoder
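For reference, one option that could be evaluated is Elasticsearch's `cardinality` aggregation, which returns an approximate distinct count rather than an exact one. The sketch below only builds the request body; the helper name `species_cardinality_body`, the field names, and the assumption that those fields are indexed as exact-match terms are illustrative, and whether this performs acceptably on our index is not established here.

```python
import json


def species_cardinality_body(filters, field="scientificname"):
    """Build a search body asking for an approximate distinct count of `field`.

    `filters` maps field names to exact values (hypothetical field names).
    """
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "query": {
            "bool": {
                "filter": [{"term": {name: value}} for name, value in filters.items()]
            }
        },
        "aggs": {
            # The cardinality aggregation is approximate by design.
            "distinct_values": {"cardinality": {"field": field}}
        },
    }


body = species_cardinality_body({"phylum": "echinodermata", "country": "fiji"})
print(json.dumps(body, indent=2))
```

Because the result is an estimate, it would answer "roughly how many species" rather than give an exact checklist length.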

If you have an immediate research need, these are really easy to do in Spark, and we can talk about how to get the numbers you need off our cluster:

https://github.com/bio-guoda/guoda-examples/blob/master/iDigBio%20Country%20Checklist.ipynb

(Rendering of that notebook seems busted at the moment, but it's typical filter, groupby, count stuff.)
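Since the notebook isn't rendering, here is a rough sketch of the filter, groupby, count pattern in plain Python. It is not the notebook's code; the helper name `unique_value_counts`, the field names, and the sample rows are assumptions. On a Spark DataFrame the same shape would be a `filter(...)` followed by `groupBy(field).count()`.

```python
from collections import Counter


def unique_value_counts(records, field, **filters):
    """Filter records, group by `field`, and count the records in each group."""
    matching = (
        r for r in records
        if all(r.get(name) == value for name, value in filters.items())
    )
    # Counter does the groupby + count in one step; empty values are skipped.
    return Counter(r[field] for r in matching if r.get(field))


# Made-up rows, only to show the output shape.
rows = [
    {"scientificname": "holothuria atra", "country": "fiji"},
    {"scientificname": "holothuria atra", "country": "fiji"},
    {"scientificname": "linckia laevigata", "country": "fiji"},
    {"scientificname": "holothuria atra", "country": "palau"},
]

counts = unique_value_counts(rows, "scientificname", country="fiji")
for name, n in counts.most_common():
    print(name, n)
print("distinct values:", len(counts))
```

The length of the result is the species count asked for above, and the per-value counts are the "unique values" output.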
