[SPO] harden error handling for single-document issues #1582

Open
2 tasks
seanstory opened this issue Sep 5, 2023 · 3 comments

@seanstory

Problem Description

The SPO connector fails the entire sync job if a single request errors.

Screenshot 2023-09-05 at 12 26 09 PM

We should:

  • make sure that this specific error does not occur
  • make the whole connector more flexible and resilient to these types of errors
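
A minimal sketch of what the second bullet could mean in practice, not the connector's actual code: wrap each per-list fetch so that one failing request is logged and skipped while the rest of the sync keeps going. `FetchError`, `fetch_list_items`, and `resilient_docs` are hypothetical names invented for this illustration, and the fetcher is a stub that simulates one failing list.

```python
import asyncio
import logging

logger = logging.getLogger("spo_sketch")


class FetchError(Exception):
    """Hypothetical stand-in for a per-request failure (e.g. an HTTP error)."""


async def fetch_list_items(list_id):
    """Hypothetical fetcher: simulates one list whose request fails."""
    if list_id == "bad-list":
        raise FetchError(f"request for {list_id} failed")
    for n in range(2):
        yield {"list": list_id, "item": n}


async def resilient_docs(list_ids, skipped):
    """Yield documents from every list, skipping lists whose request fails."""
    for list_id in list_ids:
        try:
            async for doc in fetch_list_items(list_id):
                yield doc
        except FetchError as e:
            # Record the failure and move on instead of aborting the sync.
            logger.warning("Skipping %s: %s", list_id, e)
            skipped.append(list_id)


async def collect(list_ids):
    skipped = []
    docs = [doc async for doc in resilient_docs(list_ids, skipped)]
    return docs, skipped


docs, skipped = asyncio.run(collect(["a", "bad-list", "b"]))
```

Here `docs` holds the items from lists `a` and `b`, and `skipped` records `bad-list` so the failure can still be reported at the end of the job.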
@seanstory

seanstory commented Sep 15, 2023

Example of another error we should be able to retry or move past:

Received 400 response from https://graph.microsoft.com/v1.0/sites/<site_id>/lists/<list_id>/items?$select=createdDateTime,id,lastModifiedDateTime,weburl,createdBy,lastModifiedBy,contentType&$expand=fields($select=Title,Link,Attachments,LinkTitle,LinkFilename,Description,Conversation)
Full stack trace:
[FMWK][12:44:01][WARNING] [Sync Job id: DjoWlIoB26kkwmCbnNr-, connector id: DToVlIoB26kkwmCb0dpW, index name: search-retail] Received 400 response from https://graph.microsoft.com/v1.0/sites/<site_id>/lists/<list_id>/items?$select=createdDateTime,id,lastModifiedDateTime,weburl,createdBy,lastModifiedBy,contentType&$expand=fields($select=Title,Link,Attachments,LinkTitle,LinkFilename,Description,Conversation)
[FMWK][12:44:01][CRITICAL] [Sync Job id: DjoWlIoB26kkwmCbnNr-, connector id: DToVlIoB26kkwmCb0dpW, index name: search-retail] The document fetcher failed
Traceback (most recent call last):
  File "/path/to/connectors-python/connectors/sources/sharepoint_online.py", line 402, in _get
    async with self._http_session.get(
  File "/path/to/connectors-python/lib/python3.10/site-packages/aiohttp/client.py", line 1141, in __aenter__
    self._resp = await self._coro
  File "/path/to/connectors-python/lib/python3.10/site-packages/aiohttp/client.py", line 643, in _request
    resp.raise_for_status()
  File "/path/to/connectors-python/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1005, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request', url=URL('https://graph.microsoft.com/v1.0/sites/<site_id>/lists/<list_id>/items?$select=createdDateTime,id,lastModifiedDateTime,weburl,createdBy,lastModifiedBy,contentType&$expand=fields($select=Title,Link,Attachments,LinkTitle,LinkFilename,Description,Conversation)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/path/to/connectors-python/connectors/es/sink.py", line 387, in get_docs
    async for count, doc in aenumerate(generator):
  File "/path/to/connectors-python/connectors/utils.py", line 689, in aenumerate
    async for elem in asequence:
  File "/path/to/connectors-python/connectors/logger.py", line 134, in __anext__
    return await self.gen.__anext__()
  File "/path/to/connectors-python/connectors/es/sink.py", line 360, in _decorate_with_metrics_span
    async for doc in generator:
  File "/path/to/connectors-python/connectors/sync_job_runner.py", line 310, in prepare_docs
    async for doc, lazy_download, operation in self.generator():
  File "/path/to/connectors-python/connectors/sync_job_runner.py", line 342, in generator
    async for doc, lazy_download in self.data_provider.get_docs(
  File "/path/to/connectors-python/connectors/sources/sharepoint_online.py", line 1547, in get_docs
    async for list_item, download_func in self.site_list_items(
  File "/path/to/connectors-python/connectors/sources/sharepoint_online.py", line 1777, in site_list_items
    async for list_item in self.client.site_list_items(site_id, site_list_id):
  File "/path/to/connectors-python/connectors/sources/sharepoint_online.py", line 745, in site_list_items
    async for page in self._graph_api_client.scroll(
  File "/path/to/connectors-python/connectors/sources/sharepoint_online.py", line 347, in scroll
    graph_data = await self._get_json(scroll_url)
  File "/path/to/connectors-python/connectors/sources/sharepoint_online.py", line 371, in _get_json
    async with self._get(absolute_url) as resp:
  File "/Users/gustavollermalylarrain/miniforge3/envs/connector/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/path/to/connectors-python/connectors/sources/sharepoint_online.py", line 295, in wrapped
    async for item in func(*args, **kwargs):
  File "/path/to/connectors-python/connectors/sources/sharepoint_online.py", line 413, in _get
    await self._handle_client_response_error(absolute_url, e)
  File "/path/to/connectors-python/connectors/sources/sharepoint_online.py", line 443, in _handle_client_response_error
    raise BadRequestError from e
connectors.sources.sharepoint_online.BadRequestError

Why should we not consider this a real 400? Because it's really not. SPO lies.

Screenshot 2023-09-14 at 4 56 43 PM
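
If the 400 is spurious, as described above, a bounded retry with backoff is one option before giving up on the request. The following is a rough sketch under assumptions: `BadRequestError` here is a local stand-in class (not imported from the connector), and `retry_bad_request` and `flaky_get` are hypothetical helpers that simulate a request failing twice before succeeding.

```python
import asyncio


class BadRequestError(Exception):
    """Local stand-in for the error raised on an HTTP 400 response."""


async def retry_bad_request(call, retries=3, base_delay=0.01):
    """Await call() up to `retries` times, backing off between attempts.

    Re-raises the last BadRequestError if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        try:
            return await call()
        except BadRequestError:
            if attempt == retries:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


async def flaky_get():
    """Hypothetical request: spurious 400 on the first two calls, then OK."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise BadRequestError("400 Bad Request")
    return {"value": []}


result = asyncio.run(retry_bad_request(flaky_get))
```

A real 400 would still surface after the final attempt, so this only delays failure for genuinely bad requests rather than hiding it.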

@seanstory

There's a draft PR here: #1584, but it's nowhere near ready, and there are other priorities right now. I'm going to unassign myself and remove it from the current sprint until this can be prioritized.

The urgent piece has been fixed.

seanstory removed their assignment Sep 19, 2023
seanstory removed this from the 2023-08-29 - 2023-09-11 milestone Sep 19, 2023
@seanstory

Another single-document failure issue: https://github.com/elastic/enterprise-search-team/issues/7044
