
[Sharepoint Online] Invalid JSON will cause unactionable "generator didn't stop after athrow" errors #2309

Open
artem-shelkovnikov opened this issue Mar 29, 2024 · 10 comments
Labels: bug, effort:low, priority:high

Comments

@artem-shelkovnikov
Member

artem-shelkovnikov commented Mar 29, 2024

Bug Description

An error was reported with the following stacktrace:

generator didn't stop after athrow()
  File "/app/connectors/es/sink.py", line 365, in run
    await self.get_docs(generator)
  File "/app/connectors/es/sink.py", line 415, in get_docs
    async for count, doc in aenumerate(generator):
  File "/app/connectors/utils.py", line 800, in aenumerate
    async for elem in asequence:
  File "/app/connectors/logger.py", line 176, in __anext__
    return await self.gen.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/connectors/es/sink.py", line 389, in _decorate_with_metrics_span
    async for doc in generator:
  File "/app/connectors/sync_job_runner.py", line 309, in prepare_docs
    async for doc, lazy_download, operation in self.generator():
  File "/app/connectors/sync_job_runner.py", line 341, in generator
    async for doc, lazy_download in self.data_provider.get_docs(
  File "/app/connectors/sources/sharepoint_online.py", line 1660, in get_docs
    async for list_item, download_func in self.site_list_items(
  File "/app/connectors/sources/sharepoint_online.py", line 2009, in site_list_items
    async for list_item_attachment in self.client.site_list_item_attachments(
  File "/app/connectors/sources/sharepoint_online.py", line 909, in site_list_item_attachments
    list_item = await self._rest_api_client.fetch(url)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/connectors/sources/sharepoint_online.py", line 359, in fetch
    return await self._get_json(url)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/connectors/sources/sharepoint_online.py", line 399, in _get_json
    async with self._get(absolute_url) as resp:
  File "/usr/local/lib/python3.11/contextlib.py", line 257, in __aexit__
    raise RuntimeError("generator didn't stop after athrow()")

Upon investigation, it became clear that this problem happens when the resource returns invalid JSON on a GET request. The following test reproduces the error (the assert here is incorrect, but the test will reproduce the failure):

    @pytest.mark.asyncio
    async def test_fetch_with_retrying_multiple_times(
        self, microsoft_api_session, mock_responses, patch_sleep
    ):
        url = "http://localhost:1234/url"
        # Truncated JSON: the closing "]" is missing, so parsing fails
        payload = '[{"test": "hello world"}, { "test": "hello another world"}'

        # Return the same invalid payload on every retry attempt
        for _ in range(7):
            mock_responses.get(url, body=payload)

        response = await microsoft_api_session.fetch(url)

        assert response == payload

To Reproduce

Steps to reproduce the behavior:

  1. Find a way to make the Sharepoint Online endpoint return XML (or otherwise invalid JSON) instead of JSON (not sure how yet, but it can be reproduced with the test above)
  2. Run a sync
  3. The sync will end with a RuntimeError with the message "generator didn't stop after athrow()"

Expected behavior

The error is clearly communicated and printed to the logs. The sync might continue if it's a non-critical error.
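
One possible direction for the fix, just to illustrate the idea (a sketch only, not the actual connector code; InvalidResponseError is a hypothetical exception class):

import json


class InvalidResponseError(Exception):
    """Hypothetical error type that makes a malformed payload actionable."""


async def get_json(session, absolute_url):
    # Sketch: read the body and parse it explicitly, so invalid JSON surfaces
    # as a clear, loggable error instead of escaping through the retry
    # generator as an opaque RuntimeError
    async with session.get(absolute_url) as resp:
        body = await resp.text()
        try:
            return json.loads(body)
        except json.JSONDecodeError as e:
            raise InvalidResponseError(
                f"Expected JSON from {absolute_url} (status {resp.status}), "
                f"got: {body[:200]!r}"
            ) from e

An error like this could then be caught per-document in the sync loop and either logged and skipped or re-raised, depending on whether it's critical.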

@artem-shelkovnikov artem-shelkovnikov changed the title [Sharepoint Online] Retry code can raise errors that terminate sync without clear reason [Sharepoint Online] Invalid JSON will cause unactionable "generator didn't stop after athrow" errors Mar 29, 2024
@bean710
Contributor

bean710 commented Jun 20, 2024

We are seeing this too, and it is preventing our connector from completing a full sync. The issue I see looks very similar to yours, but it does not stem from getting XML back. It looks like there is a payload error caused by a connection reset. Due to the long-running nature of our syncs, I think this happens somewhat frequently:
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>. ConnectionResetError(104, 'Connection reset by peer')

Then, it looks like the retry wrapper is what's causing that "generator didn't stop after athrow" error.

It seems like a ConnectionResetError would just get handled by the retry wrapper the same as all of the other retries:

def retryable_aiohttp_call(retries):
    # TODO: improve utils.retryable to allow custom logic
    # that can help choose what to retry
    def wrapper(func):
        @wraps(func)
        async def wrapped(*args, **kwargs):
            retry = 1
            while retry <= retries:
                try:
                    async for item in func(*args, **kwargs, retry_count=retry):
                        yield item
                    break
                except (NotFound, BadRequestError):
                    raise
                except Exception:
                    if retry >= retries:
                        raise
                    retry += 1

        return wrapped

    return wrapper

Unfortunately, I'm not familiar enough with how this connector handles async retries, or with how Python generators work in general.
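
For what it's worth, here's a minimal standalone sketch (toy code, not from the connector) that reproduces the same RuntimeError: a retry loop around a yield inside an async context manager resumes the generator after athrow(), which contextlib forbids.

import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def fetch_with_retries(retries=2):
    # Toy version of the pattern above: a retry loop wrapping a yield
    for attempt in range(1, retries + 1):
        try:
            yield "response"  # the body of the `async with` runs here
            break
        except ValueError:
            if attempt >= retries:
                raise
            # Swallowing the error and looping makes the generator yield a
            # second time after athrow() -- exactly what contextlib rejects


async def main():
    async with fetch_with_retries() as resp:
        # Simulates aiohttp raising while the response body is being read
        raise ValueError("payload not completed")


try:
    asyncio.run(main())
except RuntimeError as e:
    print(e)  # -> generator didn't stop after athrow()

So the original ClientPayloadError gets swallowed by the retry loop, and this opaque RuntimeError is raised in its place.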

@artem-shelkovnikov
Member Author

@bean710 are you running connectors on-prem?

Do you have a stack trace?

Unfortunately, we don't have a Sharepoint Online instance that raises such errors, so we cannot reliably test fixes. It would help immensely if we could test the bug fix together with you.

@bean710
Contributor

bean710 commented Jun 21, 2024

@artem-shelkovnikov Yes, we are running the connector client on-prem and I have access to full logs with some additional logging I've added.

I'll add the relevant stack trace below.

I'd be happy to work with you in any way possible to get this resolved :)

We've run into a few bugs, but I've "resolved" most of them, and this is the one that most consistently prevents our connector from completing a full sync.

[FMWK][08:03:43][CRITICAL] [Connector id: JVur4Y4BM4YOrFqvPEzu, index name: search-sharepoint-debugging, Sync job id: Fn_mMJABM4YOrFqvddr2] Document extractor failed
Traceback (most recent call last):
  File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/client_proto.py", line 94, in connection_lost
    uncompleted = self._parser.feed_eof()
  File "aiohttp/_http_parser.pyx", line 507, in aiohttp._http_parser.HttpParser.feed_eof
aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 325, in wrapped
    yield item
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 401, in _get_json
    return await resp.json()
  File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1171, in json
    await self.read()
  File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1111, in read
    self._body = await self.content.read()
  File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/streams.py", line 383, in read
    block = await self.readany()
  File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/streams.py", line 405, in readany
    await self._wait("readany")
  File "/home/ec2-user/connector-2/connector-main/lib/python3.10/site-packages/aiohttp/streams.py", line 312, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>. ConnectionResetError(104, 'Connection reset by peer')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/connector-2/connector-main/connectors/es/sink.py", line 487, in run
    await self.get_docs(generator)
  File "/home/ec2-user/connector-2/connector-main/connectors/es/sink.py", line 539, in get_docs
    async for count, doc in aenumerate(generator):
  File "/home/ec2-user/connector-2/connector-main/connectors/utils.py", line 856, in aenumerate
    async for elem in asequence:
  File "/home/ec2-user/connector-2/connector-main/connectors/logger.py", line 247, in __anext__
    return await self.gen.__anext__()
  File "/home/ec2-user/connector-2/connector-main/connectors/es/sink.py", line 521, in _decorate_with_metrics_span
    async for doc in generator:
  File "/home/ec2-user/connector-2/connector-main/connectors/sync_job_runner.py", line 458, in prepare_docs
    async for doc, lazy_download, operation in self.generator():
  File "/home/ec2-user/connector-2/connector-main/connectors/sync_job_runner.py", line 494, in generator
    async for doc, lazy_download in self.data_provider.get_docs(
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 1678, in get_docs
    async for list_item, download_func in self.site_list_items(
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 2027, in site_list_items
    async for list_item_attachment in self.client.site_list_item_attachments(
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 924, in site_list_item_attachments
    list_item = await self._rest_api_client.fetch(url)
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 360, in fetch
    return await self._get_json(url)
  File "/home/ec2-user/connector-2/connector-main/connectors/sources/sharepoint_online.py", line 400, in _get_json
    async with self._get(absolute_url) as resp:
  File "/usr/local/lib/python3.10/contextlib.py", line 249, in __aexit__
    raise RuntimeError("generator didn't stop after athrow()")
RuntimeError: generator didn't stop after athrow()

@bean710
Contributor

bean710 commented Jul 8, 2024

Hi @artem-shelkovnikov, is there anything I can get to you to help resolve this? This is currently a blocker for us: although the issue is rare, our syncs are long-running, which means we run into it regularly and are unable to complete a full sync.

@artem-shelkovnikov
Member Author

Hey @bean710,

Sorry I did not get back to you. I've added a simple error ignore in the branch artem/try-fix-spo-attachment-download. Can you cherry-pick the change onto the version of connectors you're running and try it out?

There's also work going on in the meantime in this PR to add a component that allows connectors to ignore transient errors like this one. So let's first test whether ignoring this specific error helps; if so, I'll make a proper fix and merge it.
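
For reference, the shape of the ignore is roughly this (a sketch only; fetch_or_skip is a hypothetical helper, not the exact code in the branch):

import aiohttp


async def fetch_or_skip(rest_api_client, url, logger):
    # Sketch: treat a truncated payload for a single list item attachment
    # as transient -- log it and skip the document instead of failing the
    # whole sync
    try:
        return await rest_api_client.fetch(url)
    except aiohttp.ClientPayloadError as e:
        logger.warning(f"Skipping {url}: response payload not completed ({e})")
        return None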

@bean710
Contributor

bean710 commented Jul 11, 2024

Hi @artem-shelkovnikov, it looks like I'm still getting the error related to the Content-Length header, causing the generator error. Here's a full stack trace:

[FMWK][08:43:32][CRITICAL] [Connector id: JVur4Y4BM4YOrFqvPEzu, index name: search-sharepoint-debugging, Sync job id: oaAYnZABRdqpVZ6xIUX6] Document extractor failed
Traceback (most recent call last):
  File "/home/ec2-user/connector-git/lib/python3.10/site-packages/aiohttp/client_proto.py", line 94, in connection_lost
    uncompleted = self._parser.feed_eof()
  File "aiohttp/_http_parser.pyx", line 507, in aiohttp._http_parser.HttpParser.feed_eof
aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ec2-user/connector-git/connectors/sources/sharepoint_online.py", line 326, in wrapped
    yield item
  File "/home/ec2-user/connector-git/connectors/sources/sharepoint_online.py", line 402, in _get_json
    return await resp.json()
  File "/home/ec2-user/connector-git/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1171, in json
    await self.read()
  File "/home/ec2-user/connector-git/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 1111, in read
    self._body = await self.content.read()
  File "/home/ec2-user/connector-git/lib/python3.10/site-packages/aiohttp/streams.py", line 383, in read
    block = await self.readany()
  File "/home/ec2-user/connector-git/lib/python3.10/site-packages/aiohttp/streams.py", line 405, in readany
    await self._wait("readany")
  File "/home/ec2-user/connector-git/lib/python3.10/site-packages/aiohttp/streams.py", line 312, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>. ConnectionResetError(104, 'Connection reset by peer')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/connector-git/connectors/es/sink.py", line 505, in run
    await self.get_docs(generator, skip_unchanged_documents=True)
  File "/home/ec2-user/connector-git/connectors/es/sink.py", line 557, in get_docs
    async for count, doc in aenumerate(generator):
  File "/home/ec2-user/connector-git/connectors/utils.py", line 856, in aenumerate
    async for elem in asequence:
  File "/home/ec2-user/connector-git/connectors/logger.py", line 247, in __anext__
    return await self.gen.__anext__()
  File "/home/ec2-user/connector-git/connectors/es/sink.py", line 539, in _decorate_with_metrics_span
    async for doc in generator:
  File "/home/ec2-user/connector-git/connectors/sync_job_runner.py", line 457, in prepare_docs
    async for doc, lazy_download, operation in self.generator():
  File "/home/ec2-user/connector-git/connectors/sync_job_runner.py", line 493, in generator
    async for doc, lazy_download in self.data_provider.get_docs(
  File "/home/ec2-user/connector-git/connectors/sources/sharepoint_online.py", line 1679, in get_docs
    async for list_item, download_func in self.site_list_items(
  File "/home/ec2-user/connector-git/connectors/sources/sharepoint_online.py", line 2028, in site_list_items
    async for list_item_attachment in self.client.site_list_item_attachments(
  File "/home/ec2-user/connector-git/connectors/sources/sharepoint_online.py", line 924, in site_list_item_attachments
    list_item = await self._rest_api_client.fetch(url)
  File "/home/ec2-user/connector-git/connectors/sources/sharepoint_online.py", line 361, in fetch
    return await self._get_json(url)
  File "/home/ec2-user/connector-git/connectors/sources/sharepoint_online.py", line 401, in _get_json
    async with self._get(absolute_url) as resp:
  File "/usr/local/lib/python3.10/contextlib.py", line 249, in __aexit__
    raise RuntimeError("generator didn't stop after athrow()")
RuntimeError: generator didn't stop after athrow()

@artem-shelkovnikov
Member Author

Hi @bean710,

I've pushed another fix to the same branch artem/try-fix-spo-attachment-download - can you try again?

If it does not work, is it possible to share the response that you get from Sharepoint - specifically status code + body?

@bean710
Contributor

bean710 commented Jul 18, 2024

Hi @artem-shelkovnikov

We ran another full sync and are still seeing the same issue. I'm not seeing any response body, but the status code is 200.

@artem-shelkovnikov
Member Author

@bean710 is the stack trace the same? Can you share debug logs with all sensitive information wiped if you have any?

@jedrazb
Member

jedrazb commented Aug 23, 2024

Linking a relevant issue from aiohttp: aio-libs/aiohttp#4581. Maybe this will give us more insight into handling this edge case better for SPO. People there also report that the connection being broken from the server side causes this.
