Regression in 6.4: Scroll fails with large scroll_id #971
Comments
I struggled with this same problem today. I found that I could resolve the problem by making some minor changes to the scroll method, using POST instead of GET and stuffing the
|
I'm sorry you're running into issues, but unfortunately (or fortunately, because it's a config change and not a code change/new release) this error is not to do with the Elasticsearch-py client, but Elasticsearch itself. The error you're getting is coming back directly from Elasticsearch. For reference please see: elastic/elasticsearch#3210 In order to fix this you need to adjust some of the parameters in Elasticsearch itself. I hope this helps |
Respectfully, I would like to disagree. Reverting to version 6.3 of the client, with no other changes in server configuration, resolved my issue. Version 6.3.1 of the client would include the scroll_id in the body when no other body was provided. If a specific body was supplied, it would fall back to putting the scroll_id in the query params. Version 6.4 skips attempting to put the scroll_id in the body and only uses the query params. |
I agree with @ChrisPortman. This change caught me off guard when I upgraded my cluster from 6.x to 7.x last week. Given that the scroll_id length scales with the number of shards, my smaller test cluster did not run into the issue with long scroll IDs, and I ended up with an unexpectedly broken production app. While increasing In the meantime, I am using a forked version of this package where I switched back to passing the scroll_id in the request body, which has solved this issue for me. I've opened a PR in #973 with that change. |
Hi everyone - thank you very much for raising this issue. I wanted to dig into this change a bit to find out why it was introduced into the codebase in the first place, to make sure that reverting this back to POST won't introduce problems for someone else. As @gmazzola noted, back in December 2013, the |
I need to clarify that this issue is not about POST vs GET. The breaking change is between two different approaches to performing the GET. This issue is caused by a change to the client library code in version 6.4, where the behaviour of the scroll method changed such that it no longer sends the scroll ID in the GET body, only in the query params. This is the 6.3 version of the method:
This is the 6.4 version: elasticsearch-py/elasticsearch/client/__init__.py Line 1341 in 99effab
You can see that in 6.3, it first tries to use the body to carry the scroll ID. 6.4 makes no such attempt and just puts it in the URL as a query parameter |
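For readers following along without the embedded snippets, the difference described above can be sketched roughly as follows. This is an illustration based only on the descriptions in this thread, not the library's actual source: the function names `build_scroll_request_63` and `build_scroll_request_64` are hypothetical, and each simply returns the pieces of the request that would be sent.

```python
def build_scroll_request_63(scroll_id=None, body=None, params=None):
    """Hypothetical sketch of the 6.3-style behaviour described in this thread."""
    params = dict(params or {})
    if not scroll_id and not body:
        raise ValueError("You need to supply scroll_id or body.")
    if scroll_id and not body:
        # No body supplied: carry the scroll ID in the request body.
        body = {"scroll_id": scroll_id}
    elif scroll_id:
        # A body was supplied: fall back to the query string.
        params["scroll_id"] = scroll_id
    return "GET", "/_search/scroll", params, body


def build_scroll_request_64(scroll_id=None, body=None, params=None):
    """Hypothetical sketch of the 6.4-style behaviour described in this thread."""
    params = dict(params or {})
    if scroll_id:
        # The scroll ID always ends up in the query string.
        params["scroll_id"] = scroll_id
    return "GET", "/_search/scroll", params, body
```

With a long scroll ID and no explicit body, the first variant keeps the URL short, while the second one grows the URL by the full length of the ID.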
Specifically, it was the changes to the scroll method in this commit: |
As discussed above, the root cause of this problem is Elasticsearch rejecting HTTP GET queries that are longer than The good news is that this
Would you consider a patch to the
This is obviously more complex than the current implementation, but it preserves the reverse-proxy and large-scroll use cases. Once we achieve consensus, I'm happy to implement these changes in my PR. |
Is there any reason not to just roll the method back to the 6.3 state? |
Yeah, the only issue here is that the method was recently switched from passing scroll_id in the request body to passing it as a query parameter. It can remain a GET, it just needs to return to passing scroll_id in the request body. I personally don't call |
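Until a fixed release is available, one possible workaround is to pass the scroll ID in the request body yourself. The sketch below assumes a client object whose `scroll` method accepts a `body` keyword argument and forwards it unchanged; `scroll_with_body` and `RecordingClient` are hypothetical names used only for illustration, and the `"1m"` keep-alive is an arbitrary example value.

```python
def scroll_with_body(client, scroll_id, keep_alive="1m"):
    """Fetch the next page, sending scroll_id in the body instead of the URL.

    Assumes client.scroll(body=...) forwards the body as-is.
    """
    return client.scroll(body={"scroll_id": scroll_id, "scroll": keep_alive})


class RecordingClient:
    """Stand-in for a real client; it only records what it was asked to send."""

    def __init__(self):
        self.calls = []

    def scroll(self, body=None, **params):
        self.calls.append({"body": body, "params": params})
        return {"hits": {"hits": []}}


fake = RecordingClient()
scroll_with_body(fake, "very-long-scroll-id")
print(fake.calls[0]["body"])
```

In real use, `client` would be your existing client instance rather than the stand-in.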
As first priority, I'll check if we can go back to the 6.3 functionality where scroll_id was passed in the request body. If not, let's take a look at the other options suggested. |
It seems the fix (#973) is present on master but no new release containing it? |
@skbly7 correct, it's been merged to master and we're working on the release. Hopefully won't be too long! |
Has #973 made its way into a release yet? Currently seeing this behavior via the |
Can confirm, I had this issue for |
I believe this is closed in both 6.x and 7.x, if I'm mistaken please let me know or reopen a new issue. Thanks all! :) |
Changes to the scroll method in 6.4 submit the scroll ID as part of the URL. This causes:
elasticsearch.exceptions.RequestError: RequestError(400, 'too_long_frame_exception', 'An HTTP line is larger than 4096 bytes.')
This happens when a large number of shards are involved, which produces a large scroll ID.
elasticsearch-py/elasticsearch/client/__init__.py
Line 1341 in 99effab
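To get a feel for when a scroll ID becomes too long for the URL, here is a rough sketch that estimates the size of the HTTP request line. The 4096-byte figure is taken from the error message above; the helper names and the exact way the line is assembled are assumptions for illustration, and the server may count bytes slightly differently.

```python
from urllib.parse import urlencode

MAX_LINE_BYTES = 4096  # limit quoted in the error message above


def request_line_length(scroll_id, path="/_search/scroll"):
    """Estimate the byte length of 'GET <path>?scroll_id=<id> HTTP/1.1'."""
    query = urlencode({"scroll_id": scroll_id})
    line = f"GET {path}?{query} HTTP/1.1"
    return len(line.encode("ascii"))


def fits_in_request_line(scroll_id):
    """Return True if the estimated request line stays within the limit."""
    return request_line_length(scroll_id) <= MAX_LINE_BYTES
```

A check like `fits_in_request_line(scroll_id)` could be used in a test suite to catch the problem on a small cluster before it shows up in production.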