Shard request cache and script queries/aggregations #49321

AlexP-Elastic · 2019-11-19T15:40:41Z

Support for caching queries including scripts:

Although the documentation for the shard request cache currently says:

> If your query uses a script whose result is not deterministic (e.g. it uses a random function or references the current time) you should set the request_cache flag to false to disable caching for that request

in practice the cache is skipped whenever ScriptService is used.

This is intentional, per @jimczi:

> this is intentional ... we cannot ensure that the result is deterministic.

An alternative (which, per the docs, seems consistent with how some other scenarios are handled) would be to skip the cache by default in such cases, but let clients pass the existing request_cache=true param to declare that their script is deterministic and can be cached.

Note that scripted aggregations are often very expensive and therefore great candidates to be cached!
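
For illustration, here is a minimal sketch of what such an opt-in request could look like from a client. The index name `my-index`, the field `category`, and the helper `build_cached_search` are hypothetical; the sketch only builds the request path and body and sends nothing. As described above, at the time of this issue the flag would not have caused a scripted request to be cached.

```python
import json
from urllib.parse import urlencode


def build_cached_search(index, body, request_cache=True):
    """Return (path, json_body) for a search that sets the request_cache param.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    params = {"request_cache": "true" if request_cache else "false"}
    path = f"/{index}/_search?{urlencode(params)}"
    return path, json.dumps(body, sort_keys=True)


# A scripted terms aggregation. size is 0 because the shard request cache
# is aimed at requests that return aggregations rather than hits.
body = {
    "size": 0,
    "aggs": {
        "scripted_terms": {
            "terms": {
                # Assumed field name; the script itself is deterministic.
                "script": {"lang": "painless", "source": "doc['category'].value"}
            }
        }
    },
}

path, payload = build_cached_search("my-index", body)
print(path)
```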


elasticmachine · 2019-11-19T15:45:13Z

Pinging @elastic/es-search (:Search/Search)

jpountz · 2019-11-19T17:15:28Z

I see scripted queries/aggs as a way to trade performance for flexibility, as they allow you to do things that had not been planned at index time. Since these are already trading performance for something else, it doesn't feel right to me to now trade correctness for performance by enabling caching when the user declares it is safe.

Maybe tell us more about your usage of scripts? I wonder whether you might be using scripts as a workaround for a missing aggregation feature?

elasticmachine · 2019-11-19T17:44:53Z

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

AlexP-Elastic · 2019-11-19T18:59:35Z

I'm personally a heavy script user and my usage patterns certainly shouldn't be taken as representative :) but for the purposes of discussion, my uses of scripts include:

- Formatting and transforming fields in Kibana using the script field functionality
  - (unclear to what extent cache is needed for this scenario ... eg if I create a visualization and share the link, a cache is one way of handling the resulting spike? Is the shard request the right cache for that?)
- Similarly, I use a spreadsheet connector (https://github.com/Alex-At-Home/elasticsearch-sheets) which lets (/encourages!) you create script fields and scripts for queries and aggregations (and build quite complex transforms between the source data and the spreadsheet's cell range using the scripted_metric aggregation)
  - (obviously a random app I built isn't evidence of any requirement though! The case for caching would be similar to the Kibana one, ie sharing a link with lots of people)
- An aggregation I use somewhat commonly involves having a fairly frequently changing (or user entered) table of weights, and then using that lookup table to weight the results of a terms aggregation
  - (this is actually the thing whose performance I was experimenting with when I came upon the out-of-date documentation and started asking around)

So it could be summarized as a mix of "missing aggregation features", (related) "trading off performance to provide (query-time) flexibility", and to a lesser extent "trading off performance to keep all logic in one place".

In all cases I'm not so much trading off "correctness for performance" with the cache as trading off memory for performance (based on the knowledge/expectation that there will be a large number of queries with the same results in a given time period).

jpountz · 2019-11-19T21:53:24Z

Thanks for the detailed reply!

> eg if I create a visualization and share the link, a cache is one way of handling the resulting spike? Is the shard request the right cache for that?

This is exactly the reason why we have this cache. :)

> a fairly frequently changing (or user entered) table of weights, and then using that lookup table to weight the results of a terms aggregation

That one sounds interesting to me. Do I understand correctly that instead of sorting terms by doc_count descending, you want to sort them by descending weight? Or maybe even descending weight*doc_count? Can you tell me more about the higher-level use-case, is it something like a rollup?
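
To make the two orderings in this question concrete, here is a small client-side sketch. The bucket data, the weights table, and the `rank_terms` helper are all made up for illustration; the thread does not say which ordering the original use case needs.

```python
def rank_terms(buckets, weights, use_doc_count=False, default_weight=0):
    """Sort terms buckets by weight, or by weight * doc_count, descending.

    `weights` is a hypothetical lookup table keyed by term; terms missing
    from the table fall back to `default_weight`.
    """
    def score(bucket):
        weight = weights.get(bucket["key"], default_weight)
        return weight * bucket["doc_count"] if use_doc_count else weight

    return sorted(buckets, key=score, reverse=True)


# Made-up buckets in the shape a terms aggregation returns.
buckets = [
    {"key": "a", "doc_count": 100},
    {"key": "b", "doc_count": 10},
    {"key": "c", "doc_count": 40},
]
weights = {"a": 1, "b": 5, "c": 2}

by_weight = [b["key"] for b in rank_terms(buckets, weights)]
by_weighted_count = [b["key"] for b in rank_terms(buckets, weights, use_doc_count=True)]
print(by_weight)          # ['b', 'c', 'a']
print(by_weighted_count)  # ['a', 'c', 'b']
```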

To be clear I'm not against caching scripted queries or aggs, but I'm worried about allowing users to cache data that is not cacheable. My preferred way of fixing this would be by enabling Painless to tell us when a script is deterministic or not, so that we could make caching decisions accordingly. @jdconrad @stu-elastic Do you think it'd be doable?

polyfractal · 2019-11-19T22:02:33Z

> An aggregation I use somewhat commonly involves having a fairly frequently changing (or user entered) table of weights, and then using that lookup table to weight the results of a terms aggregation

This caught my eye as well, would love to know more. We've talked about making bucket_sort scriptable, which would allow a lot more custom sorting of agg buckets. I realize that's still using a script, but being a post-processing step it'd also be a lot faster since it would only invoke the script once.

(although it would have different semantics since it's only sorting the final list of buckets, instead of all the buckets at runtime).
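
A toy model of that difference in semantics, assuming a terms aggregation that keeps the top N buckets by doc_count. The data and the `top_n` helper are hypothetical, and this is not how bucket_sort is implemented; it only shows that re-sorting an already truncated list can yield different buckets from ranking all of them.

```python
def top_n(buckets, n, key):
    """Return the n buckets with the highest key(bucket), in descending order."""
    return sorted(buckets, key=key, reverse=True)[:n]


# Made-up buckets and weights for illustration.
buckets = [
    {"key": "a", "doc_count": 100},
    {"key": "b", "doc_count": 10},
    {"key": "c", "doc_count": 40},
    {"key": "d", "doc_count": 5},
]
weights = {"a": 1, "b": 5, "c": 2, "d": 9}


def by_count(bucket):
    return bucket["doc_count"]


def by_weight(bucket):
    return weights[bucket["key"]]


# Post-processing: take the top 2 by doc_count first, then re-sort by weight.
post_sorted = [b["key"] for b in top_n(top_n(buckets, 2, by_count), 2, by_weight)]

# Sorting all buckets by weight before truncating to the top 2.
sorted_at_runtime = [b["key"] for b in top_n(buckets, 2, by_weight)]

print(post_sorted)        # ['c', 'a']
print(sorted_at_runtime)  # ['d', 'b']
```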

jdconrad · 2019-11-19T22:13:00Z

So, I think we could make it possible for Painless to report which scripts are deterministic, but I don't think it would be all that useful unless it is safe to assume that anything read from docvalues (or _source), or whatever else the user is accessing, would be flushed from the cache upon changes. And if anything is done from user-defined params (are weights done this way, or is a new script created every time with constants?) then it's also not deterministic, as we explicitly expect those to be changed throughout a script's life.

The other thing is right now Painless isn't really aware of something like doc and just views this input as a simple Map. We would need to specialize certain inputs to be known as deterministic.

Edit: After thinking about this I realized that all those values are deterministic because otherwise the cache wouldn't work. (Oops.) I think Painless only has one non-deterministic method right now, in randomUUID.
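
As a rough illustration of the idea discussed above (cache only when the script is known to be deterministic), here is a toy model. It is not Elasticsearch's or Painless's implementation: the `RequestCache` class and `is_deterministic` function are hypothetical, and substring matching on the script source is a crude stand-in for a check made when the script is compiled. The only non-deterministic call listed is the one named in this comment.

```python
# Assumption: a deny-list of known non-deterministic calls, seeded with
# the single method mentioned in the thread.
NON_DETERMINISTIC_CALLS = {"randomUUID"}


def is_deterministic(script_source):
    """Crude check: the script uses no call from the deny-list."""
    return not any(name in script_source for name in NON_DETERMINISTIC_CALLS)


class RequestCache:
    """Toy cache that bypasses itself for non-deterministic scripts."""

    def __init__(self):
        self._entries = {}
        self.hits = 0

    def get_or_compute(self, request_key, script_source, compute):
        if not is_deterministic(script_source):
            return compute()  # never cached
        if request_key in self._entries:
            self.hits += 1
            return self._entries[request_key]
        result = compute()
        self._entries[request_key] = result
        return result


calls = []


def compute():
    """Stand-in for executing the request; counts how often it runs."""
    calls.append(1)
    return len(calls)


cache = RequestCache()
first = cache.get_or_compute("q1", "doc['x'].value * 2", compute)
second = cache.get_or_compute("q1", "doc['x'].value * 2", compute)
third = cache.get_or_compute("q2", "UUID.randomUUID().toString()", compute)
fourth = cache.get_or_compute("q2", "UUID.randomUUID().toString()", compute)
print(first, second, third, fourth, cache.hits)
```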


stu-elastic · 2019-12-19T23:31:08Z

Fixed by the following changes:

- Scripting: Groundwork for caching script results #49895 (backport)
- Scripting: Cache script results if deterministic #50106 (backport)
- [TEST] Unknown scripting annotations raise error #50343 (backport)
- Scripting: ScriptFactory not required by compile #50344 (backport)
- [DOCS] Deterministic scripted queries are cached #50408 (backport)

mayya-sharipova added the :Search/Search label Nov 19, 2019

mayya-sharipova added the >enhancement label Nov 19, 2019

polyfractal added the :Analytics/Aggregations label Nov 19, 2019

stu-elastic mentioned this issue Nov 21, 2019

Enable caching deterministic scripted queries in shard request cache #49466

Closed

stu-elastic added a commit to stu-elastic/elasticsearch that referenced this issue Dec 19, 2019

[DOCS] Deterministic scripted queries are cached

8766b53

Refs: elastic#49321

stu-elastic mentioned this issue Dec 19, 2019

[DOCS] Deterministic scripted queries are cached #50408

Merged

stu-elastic added a commit that referenced this issue Dec 19, 2019

[DOCS] Deterministic scripted queries are cached (#50408)

fb6ef69

Refs: #49321

stu-elastic added a commit to stu-elastic/elasticsearch that referenced this issue Dec 19, 2019

[DOCS] Deterministic scripted queries are cached (elastic#50408)

a6430c0

Refs: elastic#49321

stu-elastic mentioned this issue Dec 19, 2019

[DOCS] Deterministic scripted queries are cached (#50408) #50411

Merged

stu-elastic added a commit that referenced this issue Dec 19, 2019

[DOCS] Deterministic scripted queries are cached (#50408) (#50411)

2e76865

**Backport** Refs: #49321

stu-elastic closed this as completed Dec 19, 2019

SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this issue Jan 23, 2020

[DOCS] Deterministic scripted queries are cached (elastic#50408)

b6fdfb7

Refs: elastic#49321
