Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ES: Configurable query result sizes #2437

Closed
albertteoh opened this issue Aug 31, 2020 · 4 comments · Fixed by #2453
Closed

ES: Configurable query result sizes #2437

albertteoh opened this issue Aug 31, 2020 · 4 comments · Fixed by #2453

Comments

@albertteoh
Copy link
Contributor

albertteoh commented Aug 31, 2020

Requirement - what kind of business use case are you trying to solve?

We limit the number of results returned from Elasticsearch (ES) from user queries to prevent large queries from negatively impacting the ES cluster. With ES as our Jaeger backing store, we also need to ensure jaeger-query -> ES requests abide by these limits, ideally as a configurable parameter.

Problem - what in Jaeger blocks you from solving the requirement?

Currently, defaultDocCount is applied to the ES query's size attribute and I was not able to find a means of overriding this value. Examples:

https://github.com/jaegertracing/jaeger/search?q=defaultDocCount&unscoped_q=defaultDocCount

In some cases, this value is hard-coded directly instead of using the const:

Size(10000). // the default elasticsearch allowed limit

Resulting in the need for less desirable workarounds to ensure requests from jaeger-query to ES set the appropriate result size limit.

Proposal - what do you suggest to solve the problem or improve the existing situation?

Introduce a new integer flag, perhaps es.doc-count (ES_DOC_COUNT env var) for consistency, that represents an override to the 10000 defaultDocCount const. Open to suggestions on better parameter names as well.

Any open questions to address

  • Given 10,000 is applied across service/operation aggregations and dependency store (and also in OTEL), does it make sense to have the one configurable variable in all these places? Or should we separate, aggregation and dependency store doc counts?

  • The es.max-num-spans (used here) also defaults to a value of 10,000. Please let me know if there is any desire to consolidate this flag with the proposed es.doc-count.

@pavolloffay
Copy link
Member

The flag

 --es.max-num-spans int                                     The maximum number of spans to fetch at a time per query in Elasticsearch (default 10000)

seems very similar to the proposed --es.doc-count which limits the number of documents not only spans. To me it makes sense to have a single flag to control the number of fetched documents. Otherwise users can overload the cluster by running other queries.

Can I ask what what value do you want to use to keep the impact of the query on the cluster low?

@albertteoh
Copy link
Contributor Author

albertteoh commented Sep 1, 2020

Can I ask what what value do you want to use to keep the impact of the query on the cluster low?

We're limiting ES aggregation queries to 1000 at the moment, but for Jaeger queries, we're considering increasing this to 4000.

To me it makes sense to have a single flag to control the number of fetched documents.

Yup, I agree. Although --es.max-num-spans parameter could be re-used for the more general case to limit the number of fetched documents, the name implies a more specific use case and I feel, would not be appropriate for the more general use case (could be confusing). Changing its name would be also be a breaking change.

Perhaps we can introduce the more general --es.doc-count (or a better alternative name :) ) which would be used in all cases where ES fetch limits are used. --es.max-num-spans would take the value of min(es.max-num-spans, es.doc-count). Both --es.max-num-spans and --es.doc-count would default to 10,000.

It may then make sense to deprecate --es.max-num-spans and remove it after a few minor revisions or on the next major revision bump, but not sure what the community thinks about this.

Does that sound reasonable? Open to other thoughts and suggestions!

@pavolloffay
Copy link
Member

Even if use the --es.max-num-spans to limit search of other objects strictly speaking it might be a breaking change for somebody. I am wondering if it would impact somebody cc) @jaegertracing/elasticsearch? We have made a similar behavior changes to flags before It might be fine then.

We can introduce a new flag an deprecate the old one after a couple of releases.

@albertteoh
Copy link
Contributor Author

Thanks for the feedback @pavolloffay.

If there are no more objections, I'll go ahead with the following as discussed:

  • Add a new flag --es.max-doc-count (to be consistent with both the existing es.max-num-spans param and the existing defaultDocCount const) which will be used where ES doc limits are required (except where es.max-num-spans is used).
  • Default it to 10_000.
  • Leave --es.max-num-spans param, but add a deprecation note both in the parameter docs as well as code comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants