
[BUG] Faster search queries might see a timeout discrepancy in search response due to cached time #2000

Closed
Bukhtawar opened this issue Jan 27, 2022 · 10 comments · Fixed by #3454
Labels: bug (Something isn't working), distributed framework

Comments

@Bukhtawar
Collaborator

Bukhtawar commented Jan 27, 2022

Describe the bug
With the search timeout set to 200ms, a user can end up seeing the response below, which looks inconsistent: the timeout was set to 200ms, the reported took time is below that timeout, and yet the query is reported as timed out.

{
    "took": 60,
    "timed_out": true,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

The bigger issue is that the wrong timeout can be enforced, either prematurely or too late, depending on the estimated time interval. For example, a 200ms timeout might fire anywhere between 0ms and 400ms.

This happens because the elapsed-time computation uses an optimization over System#nanoTime that caches the time, refreshing it every 200ms by default, as controlled by the setting thread_pool.estimated_time_interval.
Latency-sensitive queries can see a discrepancy with these defaults.

We need to determine a reasonable default for the estimated time interval based on JMH benchmarks, since today it is a static value with essentially no documentation on what to expect from a search timeout. We could make it a dynamic setting with a sensible default and let users choose the interval within appropriate limits.
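For context, a minimal sketch of the caching idea, with illustrative names (this is not the actual ThreadPool cached-time code): a background thread samples System#nanoTime once per interval, and callers read that stale sample, so any reading can lag real time by up to one interval.

import java.util.concurrent.TimeUnit;

class CachedNanoClock implements Runnable {
    // Refresh period for the cached sample, e.g. 200ms like thread_pool.estimated_time_interval.
    private final long intervalMillis;
    private volatile long cachedNanos = System.nanoTime();

    CachedNanoClock(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    // Cheap volatile read instead of a System.nanoTime() call; may lag real time
    // by up to intervalMillis, which is exactly the discrepancy described above.
    long relativeNanos() {
        return cachedNanos;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            cachedNanos = System.nanoTime();   // refresh once per interval
            try {
                TimeUnit.MILLISECONDS.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}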

@Bukhtawar added the "bug (Something isn't working)" and "untriaged" labels on Jan 27, 2022
@Bukhtawar changed the title from "[BUG] Faster search queries might see a discrepancy in search response due to cached time" to "[BUG] Faster search queries might see a timeout discrepancy in search response due to cached time" on Jan 28, 2022
@dblock
Member

dblock commented Feb 2, 2022

So in this case is the value of "took" plain incorrect?

@Bukhtawar
Collaborator Author

Bukhtawar commented Feb 2, 2022

@dblock there are two problems

  1. Premature or too relaxed timeouts: a requested timeout of 200ms could kick in at 0ms or 400ms in the worst case. So essentially, at low timeout values, the actual timeout can be completely off, even resulting in missing results.
  2. The took time can be off by up to 200ms regardless.

Both are due to the relative time being cached at 200ms intervals.

Yes, the actual took time looks wrong here.

@dblock
Member

dblock commented Feb 2, 2022

Caching time at any rate seems problematic. There must have been a good reason it was introduced in the first place, though?

@Bukhtawar
Collaborator Author

I think System#nanoTime is a somewhat expensive system call, and long-running queries can probably live with the inaccuracy; that's my understanding of the motivation behind the cache.

@dblock
Member

dblock commented Feb 3, 2022

Sounds like we use nanoTime to get more resolution, but then cache it, so should this be swapped for System.currentTimeMillis()? I am just brainstorming - worried that we optimize a thing that shouldn't exist.

https://stackoverflow.com/questions/19052316/why-is-system-nanotime-way-slower-in-performance-than-system-currenttimemill

@Bukhtawar
Collaborator Author

Bukhtawar commented Feb 3, 2022

System.currentTimeMillis is based on wall-clock time, which can give negative deltas if the clock goes back due to NTP. System.nanoTime uses a monotonically increasing clock, which is the right choice for measuring elapsed time.
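A toy example of the difference (hypothetical code, not from the code base): an elapsed-time delta taken from the wall clock can come out negative if NTP steps the clock back, while the nanoTime delta cannot.

public class ElapsedTimeDemo {
    public static void main(String[] args) throws InterruptedException {
        long startWall = System.currentTimeMillis();   // wall clock, can be stepped by NTP
        long startMono = System.nanoTime();            // monotonic, only valid for durations

        Thread.sleep(50);                              // stand-in for the work being timed

        long wallElapsedMs = System.currentTimeMillis() - startWall;        // can go negative after a clock step
        long monoElapsedMs = (System.nanoTime() - startMono) / 1_000_000L;  // always >= 0

        System.out.println("wall: " + wallElapsedMs + "ms, monotonic: " + monoElapsedMs + "ms");
    }
}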

@reta
Collaborator

reta commented Feb 3, 2022

Agree with @Bukhtawar, nanoTime is the best option in this case; there is a larger essay on the subject [1], highly recommended. Thinking out loud: setting thread_pool.estimated_time_interval to -1 should disable time caching altogether, correct?

[1] https://shipilev.net/blog/2014/nanotrusting-nanotime/#_building_performance_models

@dblock
Member

dblock commented Feb 3, 2022

> No. System.currentTimeMillis is based on physical clock time that can give negative deltas if clocks go back due to NTP. System.nanoTime uses a monotonically increasing time which is the right choice for measuring elapsed times

Yes, of course. I should have read the documentation. Thx.

@aabukhalil
Contributor

I'm looking into this.

@aabukhalil
Contributor

aabukhalil commented May 26, 2022

Root cause:

Right now, when the timeout cancellation Runnable is created in QueryPhase.java, it uses searchContext.getRelativeTimeInMillis() both as startTime and to check whether the elapsed time has passed the timeout (multiple calls). searchContext.getRelativeTimeInMillis() returns a cached time and will always lag the actual System.nanoTime(). The cached time's lifespan is controlled by thread_pool.estimated_time_interval.

These premature false-positive timeouts can happen when timeout < thread_pool.estimated_time_interval AND startTime + timeout falls in the next cached time slot AND the actual query execution end time falls in the next cached time slot.

E.g.

time_cache_lifespan = 200ms 
cached_start_time = X, where X is multiple of time_cache_lifespan
timeout = 30ms 
max_end_time = X + 30ms
actual_time = X + 190ms (System.nanoTime() when the execution started)
actual_query_execution_time = 20ms

Cached Timeline:  |1X|------------------------------|2X|------------------------------|3X|
Events Timeline:  -----------------------------E1----E2----E3---------------------------

E1: at 1X + 190ms, the timeout cancellation runnable is created with start_time = 1X (because the time is cached)
E2: at 2X the cached time becomes 2X
E3: at 2X + 5ms, the cancellation runnable is called to check whether the execution time exceeded the timeout,
      which evaluates (current_cached_time - start_cached_time > timeout); in this
      edge case 2X - 1X = 200ms, which is > timeout, so it triggers a timeout. That is
      invalid, because at E3 the query has only been executing for 15ms.

To fix this, only startTime should use System.nanoTime() (which would be called only once per query). This eliminates the issue, but it might still produce false-negative timeouts (no timeout when there should be one) within the resolution of thread_pool.estimated_time_interval. In other words, timeouts can still be delayed from the specified value by up to thread_pool.estimated_time_interval, which is more acceptable, I guess.
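Roughly, that direction would look like the sketch below (illustrative names only; timeoutCheck and cachedRelativeNanos are not the actual QueryPhase/SearchContext code): the start time comes from a single System.nanoTime() call, so a stale cached start slot can no longer cause a premature timeout.

import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

class TimeoutCheckSketch {
    static Runnable timeoutCheck(long timeoutMillis, LongSupplier cachedRelativeNanos) {
        final long startNanos = System.nanoTime();     // one real clock read per query
        final long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        return () -> {
            // The "now" side may still come from the cached clock, so the worst case
            // is a late timeout (by up to thread_pool.estimated_time_interval),
            // never a premature one caused by a stale start time.
            long elapsedNanos = cachedRelativeNanos.getAsLong() - startNanos;
            if (elapsedNanos > timeoutNanos) {
                throw new RuntimeException("query timed out");   // placeholder for the real cancellation path
            }
        };
    }
}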

Steps to consistently reproduce:

# change thread_pool.estimated_time_interval to 10ms instead of 200ms to increase the probability of false-positive timeouts when executing fast queries with a short query timeout
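# One way to apply this (an assumption about the setup, not spelled out in this issue):
# add the line below to opensearch.yml and restart the node
#   thread_pool.estimated_time_interval: 10ms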

# index data
for i in {1..20}; do
    curl -XPOST localhost:9200/hyd_10000_01/_bulk?pretty --data-binary @hyd_ready_to_index_10000.json -H 'Content-Type: application/json'
done

# will result in
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   hyd_10000_01 rxPDNe0oSImDAZgmplgr0g   1   1      99720            0     73.2mb         73.2mb


# run the test (query below typically takes 2~3ms)
for i in {1..1000}; do
    curl -s --location --request POST 'localhost:9200/hyd_10000_01/_search' --header 'Content-Type: application/json' --data-raw '{
        "timeout": "5ms",
      "query": {
        "fuzzy": {
          "tweet": {
            "value": "moh"
          }
        }
      }
    }' | jq '.took,.timed_out' | tr '\n' ',' >> output.csv
    echo >> output.csv
done

# check result stats 
cat output.csv | wc -l
1000
cat output.csv | grep -i false | wc -l
938
cat output.csv | grep -i true | wc -l
62 # false positives, as all "took" values are below the timeout

After switching to System.nanoTime() as startTime, I didn't see any false-positive timeouts.
