Skip to content

Regression in 6.4: Scroll failes with large scroll_id #971

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ChrisPortman opened this issue Jun 14, 2019 · 18 comments
Closed

Regression in 6.4: Scroll failes with large scroll_id #971

ChrisPortman opened this issue Jun 14, 2019 · 18 comments

Comments

@ChrisPortman
Copy link

changes to the scoll method in 6.4 submits the scroll id as part of the URL. This causes:

elasticsearch.exceptions.RequestError: RequestError(400, 'too_long_frame_exception', 'An HTTP line is larger than 4096 bytes.')

When there are a large number of shards involved creating a large scroll id.

def scroll(self, scroll_id=None, body=None, params=None):

@mwsmith2
Copy link

I struggled with this same problem today. I found that I could resolve the problem by making some minor changes to the scroll method, using POST instead of GET and stuffing the scroll_id into the body.

https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/client/__init__.py#L1300

    @query_params("scroll", "rest_total_hits_as_int", "scroll_id")
    def scroll(self, body=None, params=None):
        """
        Scroll a search request created by specifying the scroll parameter.
        `<http://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html>`_

        :arg scroll_id: The scroll ID
        :arg body: The scroll ID if not passed by URL or query parameter.
        :arg scroll: Specify how long a consistent view of the index should be
            maintained for scrolled search
        :arg rest_total_hits_as_int: This parameter is used to restore the total hits as a number
            in the response. This param is added version 6.x to handle mixed cluster queries where nodes
            are in multiple versions (7.0 and 6.latest)
        """
        _body = f'"scroll_id": {body}'
        return self.transport.perform_request(
            "POST", "/_search/scroll", params=params, body=_body
        )

@fxdgear
Copy link
Contributor

fxdgear commented Jun 21, 2019

I'm sorry you're running into issues, but unfortunately (or fortunately, because it's a config change and not a code change/new release) this error is not to do with the Elasticsearch-py client, but Elasticsearch itself.

The error you're getting is coming back directly from Elasticsearch.

For reference please see: elastic/elasticsearch#3210

In order to fix this you need to adjust some of the parameters in Elasticsearch itself.
For this particular error you need to change http.max_initial_line_length.

I hope this helps

@ChrisPortman
Copy link
Author

Respectfully, I would like to disagree. Reverting to version 6.3 of the client, with no other changes in server configuration resolved my issue.

6.3.1 of the client would include the scroll_id in the body where there was not other body provided. If there is a specific body supplied, it would resort to putting the scroll_id in the query params.

6.4 skips attempting to put the scroll_id in the body and only uses the query params.

@jtyoung
Copy link
Contributor

jtyoung commented Jun 24, 2019

I agree with @ChrisPortman. This change caught me off guard when I upgraded my cluster from 6.x to 7.x last week. Given that the scroll_id length scales with the number of shards, my smaller test cluster did not run into the issue with long scroll IDs, and I ended up with an unexpectedly broken production app.

While increasing http.max_initial_line_length will likely solve this issue for most clusters, larger clusters will require ever larger values, and the setting is not dynamic so changing it can be pretty time consuming, especially so for large clusters. Reverting back to passing scroll_id in the body instead of as a query param neatly solves this problem more completely, in my opinion.

In the meantime, I am using a forked version of this package where I switched back to passing the scroll_id in the request body, which has solved this issue for me. I've opened a PR in #973 with that change.

@gmazzola
Copy link

One additional comment. Commit 4030400, created in December 2013, had the scroll method send a POST request to /search/_scroll. This behaviour was changed to a HTTP GET in commit 619e7eb, in June 2014.

@Winterflower
Copy link

Hi everyone - thank you very much for raising this issue. I wanted to dig in into this change a bit to find out why it has been introduced into the codebase in the first place to make sure that reverting this back to POST won't introduce problems for someone else. As @gmazzola noted, back in December 2013, the scroll method would send a POST request to the endpoint and then in June 2014 this was changed to a GET method in order to enable method-based access control as you can see from the commit message that @gmazzola linked above. The use case that this commit envisioned was a situation where someone is using a reverse proxy to secure a read-only cluster. In this scenario, the reverse proxy filters out all other requests except for GET requests and thus in order to allow folks with this kind of an architecture to read data using the scroll endpoint, the method was altered to use GET instead of POST.

@ChrisPortman
Copy link
Author

ChrisPortman commented Jun 27, 2019

I need to clarify that this has issue, is not about POST vs GET. The breaking change is in two different approaches to performing the GET.

This issue is caused by a change to the client library code that occurred in version 6.4. Where the behaviour of the scroll method changed such that it would no longer send the scroll I'd in the GET body, only in the query params.

This is the 6.3 version of the method:

def scroll(self, scroll_id=None, body=None, params=None):

This is the 6.4 version:

def scroll(self, scroll_id=None, body=None, params=None):

You can see that in 6.3, it first tries to use the body to carry the scroll ID. 6.4 makes no such attempt and just puts it in the path

@ChrisPortman
Copy link
Author

ChrisPortman commented Jun 27, 2019

Specifically, it was the changes to the scroll method in this commit:
7ff0cd5?diff=unified#diff-2696ade2db2bc9506566dc21df9e00e8L1227

@gmazzola
Copy link

As discussed above, the root cause of this problem is Elasticsearch rejecting HTTP GET queries that are longer than http.max_initial_line_length -- a server-side configurable setting. Certain scroll operations can produce a large scroll_id that exceed this value.

The good news is that this http.max_initial_line_length setting is exposed via the cluster API:

>>> client=Elasticsearch()
>>> client.cluster.get_settings(include_defaults=True)['defaults']['http']['max_initial_line_length']
'4kb'
>>>

Would you consider a patch to the scroll method that did the following:

  1. Query cluster.get_settings for the current http.max_initial_line_length value. Perhaps we cache this result; I'll accept feedback there.
  2. If the scroll_id will cause us to exceed max_initial_line_length, switch to HTTP POST.
  3. Otherwise use GET.

This is obviously more complex than the current implementation, but it preserves the reverse-proxy and large-scroll use cases.

Once we achieve consensus, I'm happy to implement these changes in my PR.

@ChrisPortman
Copy link
Author

Is there any reason not to just roll the method back to the 6.3 state?

@jtyoung
Copy link
Contributor

jtyoung commented Jun 28, 2019

Yeah, the only issue here is that the method was recently switched from passing scroll_id in the request body to passing it as a query parameter. It can remain a GET, it just needs to return to passing scroll_id in the request body.

I personally don't call scroll directly. I ran into this issue using the scan helper, which I patched in #973, but I think it might make more sense to return to the original scroll functionality rather than patch scan to get around the new functionality.

@Winterflower
Copy link

As first priority, I'll check if we can go back to the 6.3 functionality where scroll_id was passed in the request body. If not, let's take a look at the other options suggested.

@skbly7
Copy link

skbly7 commented Aug 1, 2019

It seems the fix (#973) is present on master but no new release containing it?

@Winterflower
Copy link

@skbly7 correct, it's been merged to master and we're working on the release. Hopefully won't be too long!

@bbohen
Copy link

bbohen commented Aug 30, 2019

Has #973 made it's way into a release yet? Currently seeing this behavior via the scan helper in the 7.0.4 version.

@philkra
Copy link
Contributor

philkra commented Sep 2, 2019

@bbohen, yes #973 was released with the Client's 7.0.4 version.

@digizeph
Copy link

Can confirm, I had this issue for 7.0.3 and upgrading to 7.0.5 (currently most recent release on pip) resolved this issue. Thank you @ChrisPortman for creating this issue, and @jtyoung and @Winterflower for fixing and merging it!

@sethmlarson
Copy link
Contributor

I believe this is closed in both 6.x and 7.x, if I'm mistaken please let me know or reopen a new issue. Thanks all! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests