Non-ParseError exceptions during update_feeds() prevent updates for the remaining feeds #218

lemon24 · 2021-02-18T14:07:17Z

Currently, storage or plugin-raised exceptions prevent updates for the following feeds, but that's not necessarily by design.

We should look at this when we expose the plugin API (#80).

Related to #218 / #96.

lemon24 · 2023-06-15T20:42:43Z

...like here: #310 (comment) (the Twitter plugin).

lemon24 · 2023-06-17T09:31:17Z

The main goal here is to allow updates to continue after a feed has failed.

Things to take into consideration:

We don't really need to wrap hook (including Session hooks) and retriever/(sub)parser calls one by one, we may get away with doing it somewhere at "top level", e.g. here in Pipeline.update().
- But, we should be consistent / uniform.
OTOH, when should unexpected exceptions (bugs) not allow updates to other feeds to continue?
1. reader "infrastructure" code (Parser, updater) should raise immediately; this is a reader bug, and Should Never Happen™.
  - How about StorageErrors (e.g. when the updater stores the feed)?
2. reader built-in retrievers/parsers should also probably raise immediately, but it may be acceptable to continue.
3. User-/plugin-supplied retrievers/parsers should continue (we've seen at least one bug here, in the Twitter plugin).
4. Update hooks should continue (we've seen a bug here too, IIRC).
  - What about {before,after}_feeds_update_hooks, which go "around" the update loop? We don't need to wrap them, but we may want to, for consistency.
  - What if an after_feeds_update_hooks depends on an after_feed_update_hooks having run successfully? Arguably, it shouldn't (we might want to mention this somewhere in the docs).
  - What about a Session hook, when called by a built-in retriever? It should continue.
    - Update: These already raise ParseError (when called by the HTTP retriever).
Update: How granular should we be with allowing things to continue?
- What do we return after_{entry,feed}_update_hooks fails?
  - In theory, the feed was updated at this point, so we have an UpdatedFeed we can return; we could return the exception "on the side" somehow (should have UpdateResult and UpdatedFeed been a single struct from the beginning?).
  - OTOH, hooks "should not fail" – if the user said a plugin should run, the plugin failing should be an error.
  - Of course, we can go as granular as it is useful – but how granular is that? (See the next question too.)
- What happens if after_entry_update_hooks fails? Do we try to run it for the rest of the entries?

For #218.

(update hooks, session hooks, retriever/parser unexpected exceptions). For #218.

lemon24 · 2023-08-06T11:26:23Z

Some thoughts on update hook exception handling (also see the "How granular ..." question above):

Hooks "should not fail". Treating them otherwise opens up a lot of other decisions that need to be made, that potentially makes the end user API too complicated.
Currently, a hook failure for a feed raises immediately – that is, it prevents all other hooks from running for that feed.
Hooks that want to be resilient can follow the "tag before, clear tag after" pattern mentioned here: Handle redirects and gone feeds gracefully #246 (comment). This allows completing the work the plugin was supposed to do on a subsequent run.
- This might prove to not be enough / make writing plugins too complicated. We can add more granularity later, if needed, but at the expense of API churn for plugins.
Nevertheless, failures happen; ideally, it should be hard for an updated feed to remain unprocessed because of a failure; moreso, a hook that works should not have to pay the price for one at the same stage that doesn't.
We can run all the hooks and raise an ExceptionGroup (Python 3.11) for the hooks that failed.
- But, if we do this for all stages, we or the plugins might need to coordinate between hooks belonging to the same plugin, e.g. before_feed_... and a corresponding after_feed_....
- For stages before the feed is updated (before_feed_...), does it make sense to continue with the feed update? If we fail early here, it simplifies the mental model quite a bit.
- For after_entry_..., raising an ExceptionGroup isn't all that useful (and we might end up with thousands of exceptions).

Somewhat related: We continue with at least the granularity of one feed mostly because update_feeds_iter() is a thin wrapper over for feed in get_feeds(): update_feed(feed) (but with exception handling; we should document this).

lemon24 · 2023-08-06T13:57:30Z

OK, here's how update hooks get called (pseudocode; might want to document it when we're sure the order stays):

for hook in before_feeds_update_hooks:
    hook(reader)

for feed in get_feeds():
    try:
        parsed_feed = parse(feed)
    except ParseError as e:
        parsed_feed = e

    intents = make_intents(feed, parsed_feed)

    for hook in before_feed_update_hooks:
        hook(reader, feed.url)

    store(intents)

    for entry in intents.entries:
        for hook in after_entry_update_hooks:
            hook(reader, entry.entry, entry.status)

    for hook in after_feed_update_hooks:
        hook(reader, feed.url)

    if not parsed_feed or isinstance(parsed_feed, Exception):
        yield feed.url, parsed_feed
    else:
        yield feed.url, make_updated_feed(intents)
    
for hook in after_feeds_update_hooks:
    hook(reader)

Following is a proposal that takes into account the notes from previous comments.

Goals:

Wrap hook failures so they don't prevent updates for other feeds during update_feeds_iter().
Wrap hook failures so it is clear where the error came from (which phase, optionally which hook).
After data has been written to storage, do our best to allow hooks to run (if one fails, the others should still run).

Hooks wrap unexpected exceptions as follows (new exceptions detailed below):

before_feeds_update_hooks -> raise SingleUpdateHookError
- fail early at the first hook failure
- not backwards compatible; however, exceptions are currently raised directly, which is arguably a bug
before_feed_update_hooks -> yield url, SingleUpdateHookError
- fail early at the first hook failure
- backwards compatible, we can already yield ReaderError
after_entry_update_hooks -> yield url, UpdateHookErrorGroup
- after also running after_feed_update_hooks
- if there are concerns about keeping to many exceptions around (max is entries * entry hooks), we can keep the first 5 or something, and log the rest; there is precedent for this
- backwards compatible, we can already yield ReaderError
after_feed_update_hooks -> yield url, UpdateHookErrorGroup
- including errors from after_entry_update_hooks
- shadows UpdatedFeed, acceptable (for now)
- backwards compatible, we can already yield ReaderError
after_feeds_update_hooks -> raise UpdateHookErrorGroup
- not backwards compatible, same as before_feeds_update_hooks

We add the following new exceptions:

ReaderError (old)
 +--- UpdateError
       +--- UpdateHookError
       |     +--- SingleUpdateHookError
       |     +--- UpdateHookErrorGroup [ExceptionGroup]
       +--- ParseError (old)

UpdateHookError will not be raised directly; it exists and has the shorter name because most people won't care about the individual errors (more than to log them), and will do except UpdateHookError; interested people can do except* SingleUpdateHookError to look at individual errors. SingleUpdateHookError cannot just be UpdateHookError because it will have attributes about the hook and stage that UpdateHookErrorGroup cannot not have.

For Python 3.10, which doesn't have ExceptionGroup, we use a shim like this one:

import traceback

class _ExceptionGroup(Exception):
    """ExceptionGroup shim for Python 3.10.

    Caveat: The tracebacks always show in str(exc).
    
    Avoids dependency on https://pypi.org/project/exceptiongroup/

    """
    def __init__(self, msg, excs):
        super().__init__(msg, tuple(excs))
        
    @property
    def message(self):
        return self.args[0]
    
    @property
    def exceptions(self):
        return self.args[1]
            
    def _format_lines(self):
        count = len(self.exceptions)
        s = 's' if count != 1 else ''
        yield f"{self.message} ({count} sub-exception{s})\n"
        for i, exc in enumerate(self.exceptions, 1):
            yield f"+{f' {i} '.center(36, '-')}\n"
            for line in traceback.format_exception(exc):
                for line in line.rstrip().splitlines():
                    yield f"| {line}\n"
        yield f"+{'-' * 36}\n"
            
    def __str__(self):
        return ''.join(self._format_lines()).rstrip()

try:
    ExceptionGroup
except NameError:
    ExceptionGroup = _ExceptionGroup
    
# example

try:
    1/0
except Exception as e:
    one = e

raise _ExceptionGroup('stuff happened', [
    NameError('name'), 
    _ExceptionGroup('more stuff happened', [
        one, 
        AttributeError('attr'),
    ]),
])

"""
Traceback (most recent call last):
  File ".../exc.py", line 47, in <module>
    raise _ExceptionGroup('stuff happened', [
__main__._ExceptionGroup: stuff happened (2 sub-exceptions)
+---------------- 1 -----------------
| NameError: name
+---------------- 2 -----------------
| _ExceptionGroup: more stuff happened (2 sub-exceptions)
| +---------------- 1 -----------------
| | Traceback (most recent call last):
| |   File ".../exc.py", line 43, in <module>
| |     1/0
| | ZeroDivisionError: division by zero
| +---------------- 2 -----------------
| | AttributeError: attr
| +------------------------------------
+------------------------------------
"""

~~Open~~ Closed questions:

Is it OK for after_feed_update_hooks errors to shadow UpdatedFeed?
- Yes; we got the side effects that we expected, but we still signal errors.
Is it OK for after_feed_update_hooks errors to shadow a ParseError?
- No; arguably, after_*_update_hooks should not even run if there was an error getting the feed. ACTION ITEM
- If this isn't enough, we can add later an UpdateErrorGroup to include both the parse and hook errors.
- before_feed_update_hooks may run if we move them to before getting the feed, or if they become able to transform the feed update intent before it is stored (a feed update intent exists even if there was a ParseError, to set last_exception).
Should the feed or entry hooks set Feed.last_exception?
- Not really, per Don't know when a feed is broken #68 last_exception is about the feed being broken, not a plugin (different audiences).

…rorGroup. For #218.

For #218.

lemon24 · 2023-08-13T12:34:16Z

For #218.

…218

lemon24 · 2023-08-21T21:52:05Z

OK, moving on to unexpected retriever/parser unexpected errors: per #218 (comment), they should not prevent updates for remaining feeds; wrapping seems the way to go.

(1) A possible maximal exception hierarchy:

ReaderError (old)
 +--- UpdateError (old)
       +--- ParseError [FeedError, ReaderWarning] (old)
       |     +--- UnexpectedParseError
       +--- RetrieveError [ParseError (temporary)]
             +--- UnexpectedRetrieverError

This has the benefit of distinguishing clearly between the errors caused by a retriever and by those caused by a (sub-)parser, while not making the hierarchy to deep.

Some disadvantages:

RetrieveError must remain a ParseError subclass until 4.0 to maintain backwards compatibility
exceptions raised in reader._parser don't have a specific common parent
there is no exception for non-retriever, non-sub-parser errors (e.g. raised in the meta-Parser ... but we don't have case like this now)

(2) Alternative that doesn't have them, at the expense of deeper hierarchy and slightly confusing naming (ParserError):

ReaderError (old)
 +--- UpdateError (old)
       +--- ParseError [FeedError, ReaderWarning] (old)
             +--- RetrieverError
             |     +--- UnexpectedRetrieverError
             +--- ParserError
                   +--- UnexpectedParserError

The minimal solution is to either (3) add no new classes, or (4) add a single UnexpectedParseError subclass.

An additional complication is that currently, Feed.last_exception contains details about the cause of the ParseError, if any, not the ParseError itself. If ParseError gets subclasses, it becomes imperative to also have details about the exception itself (type, message); arguably, we should have done this from the start (it would also allow capturing UpdateHookError details, if we ever decide to).

Update: I went with solution 3. ccf8d26

…se__. #218

lemon24 · 2023-08-24T16:37:16Z

Session hook error already get wrapped in ParseError, and we have a test for it; good enough, for now.

lemon24 added the core label Feb 18, 2021

lemon24 added a commit that referenced this issue Sep 10, 2022

Test unexpected exceptions in update hooks aren't suppressed.

f81aaee

Related to #218 / #96.

lemon24 mentioned this issue Jun 17, 2023

Handle redirects and gone feeds gracefully #246

Open

lemon24 added a commit that referenced this issue Aug 3, 2023

Rename fixture call_update_method to update_feed.

05aefc1

For #218.

lemon24 added a commit that referenced this issue Aug 3, 2023

Add (failing) tests for non-ParseErrors raised during update_feeds()

669fb03

(update hooks, session hooks, retriever/parser unexpected exceptions). For #218.

lemon24 added a commit that referenced this issue Aug 12, 2023

Add UpdateError, UpdateHookError, SingleUpdateHookError, UpdateHookEr…

1e52e3c

…rorGroup. For #218.

lemon24 added a commit that referenced this issue Aug 12, 2023

Fix mypy complaining about ExceptionGroup used-before-def on 3.10.

524b961

For #218.

lemon24 added a commit that referenced this issue Aug 12, 2023

Fix mypy complaining about ExceptionGroup used-before-def on 3.10.

ffb891a

For #218.

lemon24 added a commit that referenced this issue Aug 13, 2023

Wrap unexpected update hook errors in UpdateHookError (#218).

711127c

lemon24 added a commit that referenced this issue Aug 13, 2023

Fix _ExceptionGroup.__init__() bug (#218).

15bdf10

lemon24 added a commit that referenced this issue Aug 13, 2023

Narrow down UpdateResult.value from ReaderError to UpdateError. (#218)

da8f78f

lemon24 added a commit that referenced this issue Aug 13, 2023

Narrow down UpdateResult.value from ReaderError to UpdateError. (#218)

3a77991

lemon24 added a commit that referenced this issue Aug 13, 2023

Handle update_feeds() hook errors (#218).

dda9675

lemon24 added a commit that referenced this issue Aug 14, 2023

Don't assume feeds are updated in order in update hook error tests.

7826842

For #218.

lemon24 added a commit that referenced this issue Aug 14, 2023

Extract update hook running in a class (#218).

47b1aa7

lemon24 added a commit that referenced this issue Aug 15, 2023

UpdateHook cleanup (#218).

0f61c9d

lemon24 added a commit that referenced this issue Aug 17, 2023

_FancyExceptionBase.message should be compatible with ExceptionGroup. #…

0e1e9d8

…218

lemon24 added a commit that referenced this issue Aug 17, 2023

Test UpdateHookErrorGroup. #218

55d0b9f

lemon24 added a commit that referenced this issue Aug 17, 2023

Test update hook error handling in the CLI. #218

0ca0503

lemon24 added a commit that referenced this issue Aug 23, 2023

Wrap unexpected retriever/parser errors in ParseError. #218

ccf8d26

lemon24 added a commit that referenced this issue Aug 24, 2023

Feed.last_exception can be any UpdateError, not just ParseError.__cau…

04f593d

…se__. #218

lemon24 added a commit that referenced this issue Aug 24, 2023

Feed.last_exception can be any UpdateError, not just ParseError.__cau…

e183267

…se__. #218

lemon24 added a commit that referenced this issue Aug 24, 2023

Dev notes for #218.

e7a1de0

lemon24 closed this as completed Aug 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-ParseError exceptions during update_feeds() prevent updates for the remaining feeds #218

Non-ParseError exceptions during update_feeds() prevent updates for the remaining feeds #218

lemon24 commented Feb 18, 2021 •

edited

Loading

lemon24 commented Jun 15, 2023 •

edited

Loading

lemon24 commented Jun 17, 2023 •

edited

Loading

lemon24 commented Aug 6, 2023 •

edited

Loading

lemon24 commented Aug 6, 2023 •

edited

Loading

lemon24 commented Aug 13, 2023 •

edited

Loading

lemon24 commented Aug 21, 2023 •

edited

Loading

lemon24 commented Aug 24, 2023

Non-ParseError exceptions during update_feeds() prevent updates for the remaining feeds #218

Non-ParseError exceptions during update_feeds() prevent updates for the remaining feeds #218

Comments

lemon24 commented Feb 18, 2021 • edited Loading

lemon24 commented Jun 15, 2023 • edited Loading

lemon24 commented Jun 17, 2023 • edited Loading

lemon24 commented Aug 6, 2023 • edited Loading

lemon24 commented Aug 6, 2023 • edited Loading

lemon24 commented Aug 13, 2023 • edited Loading

lemon24 commented Aug 21, 2023 • edited Loading

lemon24 commented Aug 24, 2023

lemon24 commented Feb 18, 2021 •

edited

Loading

lemon24 commented Jun 15, 2023 •

edited

Loading

lemon24 commented Jun 17, 2023 •

edited

Loading

lemon24 commented Aug 6, 2023 •

edited

Loading

lemon24 commented Aug 6, 2023 •

edited

Loading

lemon24 commented Aug 13, 2023 •

edited

Loading

lemon24 commented Aug 21, 2023 •

edited

Loading