
[Bug]: Can't iterate over documents with number of chunks > 1000 #6137

Open
4 tasks done
shasha79 opened this issue Mar 16, 2025 · 1 comment
Labels
🐞 bug Something isn't working, pull request that fix bug.

Comments

@shasha79

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

RAGFlow workspace code commit ID

9400725

RAGFlow image version

v0.17.2 slim

Other environment information

Actual behavior

If a document has more than 1000 chunks, iterating over them is not possible in either the UI or the API due to an Elasticsearch limitation. This is the error received from the API when calling Document.list_chunks:

data      = {'keywords': '', 'page': 834, 'page_size': 12}
keywords  = ''
page      = 834
page_size = 12
res       = {
    'code': 100,
    'data': None,
    'message': "BadRequestError('search_phase_execution_exception', meta=ApiResponseMeta(status="+2168
}
self      = <ragflow_sdk.modules.document.Document object at 0x78f6064a2870>

Exception: BadRequestError('search_phase_execution_exception', meta=ApiResponseMeta(status=400, http_version='1.1', 
headers={'X-elastic-product': 'Elasticsearch', 'content-type': 'application/vnd.elasticsearch+json;compatible-with=8', 
'content-length': '1495'}, duration=0.012050151824951172, node=NodeConfig(scheme='http', host='es01', port=9200, path_prefix='', 
headers={'user-agent': 'elasticsearch-py/8.12.1 (Python/3.10.12; elastic-transport/8.12.0)'}, connections_per_node=10, 
request_timeout=10.0, http_compress=False, verify_certs=False, ca_certs=None, client_cert=None, client_key=None, 
ssl_assert_hostname=None, ssl_assert_fingerprint=None, ssl_version=None, ssl_context=None, ssl_show_warn=True, _extras={})), 
body={'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'Result window is too large, from + size must be 
less than or equal to: [10000] but was [10008]. See the scroll api for a more efficient way to request large data sets. This limit 
can be set by changing the [index.max_result_window] index level setting.'}], 'type': 'search_phase_execution_exception', 'reason': 
'all shards failed', 'phase': 'query', 'grouped': True, 'failed_shards': [{'shard': 0, 'index': 
'ragflow_index', 'node': 'node', 'reason': {'type': 'illegal_argument_exception', 
'reason': 'Result window is too large, from + size must be less than or equal to: [10000] but was [10008]. See the scroll api for a 
more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level 
setting.'}}], 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'Result window is too large, from + size must be less 
than or equal to: [10000] but was [10008]. See the scroll api for a more efficient way to request large data sets. This limit can be
set by changing the [index.max_result_window] index level setting.', 'caused_by': {'type': 'illegal_argument_exception', 'reason': 
'Result window is too large, from + size must be less than or equal to: [10000] but was [10008]. See the scroll api for a more 
efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.'}}}, 
'status': 400})
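
For context, page/page_size maps onto Elasticsearch's from/size pagination: from = (page - 1) * page_size = 833 * 12 = 9996, so from + size = 10008, which exceeds the default index.max_result_window of 10,000 and matches the [10008] in the error. The scroll API the error points to can stream all chunks without deep from/size pagination. Below is a minimal sketch using elasticsearch-py's scan helper (a wrapper over scroll); the host and index name are taken from the error output above, while the doc_id filter field is only an assumption about the chunk schema, not RAGFlow's actual query.

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

# Host and index taken from the error output above; the doc_id field used to
# restrict results to one document is an assumed field name, not RAGFlow's schema.
es = Elasticsearch("http://es01:9200")

# scan() wraps the scroll API, so it streams every matching chunk without ever
# issuing a deep from/size request and never trips the 10,000-result window.
for hit in scan(
    es,
    index="ragflow_index",
    query={"query": {"term": {"doc_id": "<document-id>"}}},
    size=1000,
):
    chunk = hit["_source"]
    # process the chunk here
```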

Expected behavior

Enable iteration over an unlimited number of chunks per document.

Steps to reproduce

- Add a long document with more than 1000 chunks
- Call Document.list_chunks(page=834, page_size=12) (see the sketch below)
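
A minimal reproduction sketch with the Python SDK. Only Document.list_chunks(keywords, page, page_size) comes from the report above; the connection details and the dataset/document lookup are placeholders to make the snippet self-contained.

```python
from ragflow_sdk import RAGFlow

# Hypothetical connection details; adjust api_key, base_url, and names to your setup.
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://localhost:9380")
dataset = rag.list_datasets(name="<dataset-with-long-document>")[0]
doc = dataset.list_documents(keywords="<long-document>")[0]

# Paging works until from + size crosses Elasticsearch's 10,000-result window;
# with page_size=12 that happens at page 834 (833 * 12 + 12 = 10,008).
page = 1
while True:
    chunks = doc.list_chunks(keywords="", page=page, page_size=12)
    if not chunks:
        break
    page += 1
```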

Additional information

No response

shasha79 added the 🐞 bug label on Mar 16, 2025
@KevinHuSh
Collaborator

Fetching more than 10K chunks is not supported yet.
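
Until that is supported, one possible stopgap (my own suggestion, not an official recommendation) is to raise Elasticsearch's result window for the RAGFlow index so from + size can go past 10,000. Deep from/size requests still get more expensive the deeper they go, so the scroll-based approach sketched earlier scales better.

```python
from elasticsearch import Elasticsearch

# Index name taken from the error output; adjust it to match your deployment.
es = Elasticsearch("http://es01:9200")

# Allow from + size up to 100,000 on this index. This trades memory per deep-page
# request for the ability to paginate past the default 10,000-result window.
es.indices.put_settings(
    index="ragflow_index",
    settings={"index": {"max_result_window": 100000}},
)
```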
