
[Bug]: Can't iterate over documents with number of chunks > 1000 #6137

Open
4 tasks done
shasha79 opened this issue Mar 16, 2025 · 1 comment
Labels
🐞 bug Something isn't working, pull request that fix bug.

Comments

@shasha79

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (Language Policy).
  • Non-English title submissions will be closed directly (Language Policy).
  • Please do not modify this template :) and fill in all the required fields.

RAGFlow workspace code commit ID

9400725

RAGFlow image version

v0.17.2 slim

Other environment information

Actual behavior

If a document has more than 1000 chunks, iterating over them is not possible in either the UI or the API due to an Elasticsearch limitation. This is the error received from the API when calling Document.list_chunks:

data      = {'keywords': '', 'page': 834, 'page_size': 12}
keywords  = ''
page      = 834
page_size = 12
res       = {
    'code': 100,
    'data': None,
    'message': "BadRequestError('search_phase_execution_exception', meta=ApiResponseMeta(status="+2168
}
self      = <ragflow_sdk.modules.document.Document object at 0x78f6064a2870>

Exception: BadRequestError('search_phase_execution_exception', meta=ApiResponseMeta(status=400, http_version='1.1', 
headers={'X-elastic-product': 'Elasticsearch', 'content-type': 'application/vnd.elasticsearch+json;compatible-with=8', 
'content-length': '1495'}, duration=0.012050151824951172, node=NodeConfig(scheme='http', host='es01', port=9200, path_prefix='', 
headers={'user-agent': 'elasticsearch-py/8.12.1 (Python/3.10.12; elastic-transport/8.12.0)'}, connections_per_node=10, 
request_timeout=10.0, http_compress=False, verify_certs=False, ca_certs=None, client_cert=None, client_key=None, 
ssl_assert_hostname=None, ssl_assert_fingerprint=None, ssl_version=None, ssl_context=None, ssl_show_warn=True, _extras={})), 
body={'error': {'root_cause': [{'type': 'illegal_argument_exception', 'reason': 'Result window is too large, from + size must be 
less than or equal to: [10000] but was [10008]. See the scroll api for a more efficient way to request large data sets. This limit 
can be set by changing the [index.max_result_window] index level setting.'}], 'type': 'search_phase_execution_exception', 'reason': 
'all shards failed', 'phase': 'query', 'grouped': True, 'failed_shards': [{'shard': 0, 'index': 
'ragflow_index', 'node': 'node', 'reason': {'type': 'illegal_argument_exception', 
'reason': 'Result window is too large, from + size must be less than or equal to: [10000] but was [10008]. See the scroll api for a 
more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level 
setting.'}}], 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'Result window is too large, from + size must be less 
than or equal to: [10000] but was [10008]. See the scroll api for a more efficient way to request large data sets. This limit can be
set by changing the [index.max_result_window] index level setting.', 'caused_by': {'type': 'illegal_argument_exception', 'reason': 
'Result window is too large, from + size must be less than or equal to: [10000] but was [10008]. See the scroll api for a more 
efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.'}}}, 
'status': 400})
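
For context, page/page_size maps onto Elasticsearch's from/size pagination: from = (page - 1) * page_size = 833 * 12 = 9996, so from + size = 10008, which exceeds the default index.max_result_window of 10,000 and matches the [10008] in the error. The scroll API the error points to can stream all chunks without deep from/size pagination. Below is a minimal sketch using elasticsearch-py's scan helper (a wrapper over scroll); the host and index name are taken from the error output above, while the doc_id filter field is only an assumption about the chunk schema, not RAGFlow's actual query.

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

# Host and index taken from the error output above; the doc_id field used to
# restrict results to one document is an assumed field name, not RAGFlow's schema.
es = Elasticsearch("http://es01:9200")

# scan() wraps the scroll API, so it streams every matching chunk without ever
# issuing a deep from/size request and never trips the 10,000-result window.
for hit in scan(
    es,
    index="ragflow_index",
    query={"query": {"term": {"doc_id": "<document-id>"}}},
    size=1000,
):
    chunk = hit["_source"]
    # process the chunk here
```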

Expected behavior

Enable iteration over an unlimited number of chunks per document.

Steps to reproduce

- Add a long document with more than 1000 chunks
- Call Document.list_chunks(page=834, page_size=12) (see the sketch below)
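
A minimal reproduction sketch with the Python SDK. Only Document.list_chunks(keywords, page, page_size) comes from the report above; the connection details and the dataset/document lookup are placeholders to make the snippet self-contained.

```python
from ragflow_sdk import RAGFlow

# Hypothetical connection details; adjust api_key, base_url, and names to your setup.
rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://localhost:9380")
dataset = rag.list_datasets(name="<dataset-with-long-document>")[0]
doc = dataset.list_documents(keywords="<long-document>")[0]

# Paging works until from + size crosses Elasticsearch's 10,000-result window;
# with page_size=12 that happens at page 834 (833 * 12 + 12 = 10,008).
page = 1
while True:
    chunks = doc.list_chunks(keywords="", page=page, page_size=12)
    if not chunks:
        break
    page += 1
```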

Additional information

No response

shasha79 added the 🐞 bug label on Mar 16, 2025
@KevinHuSh
Collaborator

Fetching more than 10K chunks is not supported yet.
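
Until that is supported, one possible stopgap (my own suggestion, not an official recommendation) is to raise Elasticsearch's result window for the RAGFlow index so from + size can go past 10,000. Deep from/size requests still get more expensive the deeper they go, so the scroll-based approach sketched earlier scales better.

```python
from elasticsearch import Elasticsearch

# Index name taken from the error output; adjust it to match your deployment.
es = Elasticsearch("http://es01:9200")

# Allow from + size up to 100,000 on this index. This trades memory per deep-page
# request for the ability to paginate past the default 10,000-result window.
es.indices.put_settings(
    index="ragflow_index",
    settings={"index": {"max_result_window": 100000}},
)
```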
