Description
The reference documentation at https://www.elastic.co/guide/en/elasticsearch/client/php-api/current/search_operations.html tells us to pass scroll_id as a top-level parameter to scroll(). This causes an Elasticsearch error when the scroll_id is very long: in that case it is sent in the request URI, which then exceeds Elasticsearch's limit on URI length.
The example from https://www.elastic.co/guide/en/elasticsearch/client/php-api/current/search_operations.html is:
$client = ClientBuilder::create()->build();
$params = [
    'scroll' => '30s',   // how long between scroll requests. should be small!
    'size'   => 50,      // how many results *per shard* you want back
    'index'  => 'my_index',
    'body'   => [
        'query' => [
            'match_all' => new \stdClass()
        ]
    ]
];

// Execute the search
// The response will contain the first batch of documents
// and a scroll_id
$response = $client->search($params);

// Now we loop until the scroll "cursors" are exhausted
while (isset($response['hits']['hits']) && count($response['hits']['hits']) > 0) {

    // **
    // Do your work here, on the $response['hits']['hits'] array
    // **

    // When done, get the new scroll_id
    // You must always refresh your _scroll_id! It can change sometimes
    $scroll_id = $response['_scroll_id'];

    // Execute a Scroll request and repeat
    $response = $client->scroll([
        'scroll_id' => $scroll_id, //...using our previously obtained _scroll_id
        'scroll'    => '30s'       // and the same timeout window
    ]);
}
After looking at src/Elasticsearch/Endpoints/Scroll.php, we can see that:
- there is a warning about passing scroll_id as a top-level parameter, which we never saw (we are running on Kubernetes; perhaps it is misconfigured?)
- after reading a little further, we guessed that we could put scroll_id in the "body" parameter instead, which results in a sane URI: a POST request whose body contains the scroll_id. It worked.
With the current version, the correct form of scroll() that avoids the URI overflow is:
$response = $client->scroll([
    'body' => [
        'scroll_id' => $scroll_id,
        'scroll'    => '30s'
    ]
]);
The same goes for deletion. The only way to clear a scroll with a large scroll_id is:
$client->clearScroll([
    'body' => [
        'scroll_id' => $scroll_id
    ]
]);
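Putting the two calls together, the loop from the documentation could be rewritten roughly as follows. This is only a sketch of what worked for us, not an official API: scrollAll() and its parameters ($handleHits, $keepAlive) are hypothetical names, and the only client methods it relies on are search(), scroll() and clearScroll() with the "body" form shown above.

```php
<?php
// Hypothetical helper (not part of elasticsearch-php): walks every page of a
// scroll, always sending scroll_id in the request body instead of the URI.
function scrollAll($client, array $searchParams, callable $handleHits, string $keepAlive = '30s'): void
{
    // Ask for a scroll context on the initial search.
    $searchParams['scroll'] = $keepAlive;
    $response = $client->search($searchParams);
    $scrollId = $response['_scroll_id'] ?? null;

    try {
        // Loop until a page comes back with no hits.
        while (!empty($response['hits']['hits'])) {
            $handleHits($response['hits']['hits']);

            // scroll_id goes in 'body', so it never ends up in the URI.
            $response = $client->scroll([
                'body' => [
                    'scroll_id' => $scrollId,
                    'scroll'    => $keepAlive,
                ],
            ]);

            // The scroll_id can change between pages, so refresh it each time.
            $scrollId = $response['_scroll_id'] ?? $scrollId;
        }
    } finally {
        // Release the scroll context, again passing scroll_id in the body.
        if ($scrollId !== null) {
            $client->clearScroll(['body' => ['scroll_id' => $scrollId]]);
        }
    }
}

// Usage, assuming $client was built with ClientBuilder::create()->build():
// scrollAll($client, ['index' => 'my_index', 'size' => 50], function (array $hits) {
//     // work on $hits here
// });
```

The finally block is a design choice on our side: it clears the scroll even if the callback throws, so the scroll context is not left open until it times out.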
We are using elasticsearch-php 7.4.1.
Regards