Skip to content

Exception handling in batch processing doesn't work as expected #275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pcolazurdo opened this issue Jan 21, 2021 · 5 comments
Closed

Exception handling in batch processing doesn't work as expected #275

pcolazurdo opened this issue Jan 21, 2021 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@pcolazurdo
Copy link

In this code:

processor = PartialSQSProcessor(suppress_exception=False)
with processor(records, record_handler) as proc:
    result = proc.process()

If there is an exception raised inside the record_handler function, I expect the exception to be raised in the function and get some details about what caused the exception. The code in https://github.com/awslabs/aws-lambda-powertools-python/blob/e104bfb5dcedf7b52d4a7925ae446ff719b8c110/aws_lambda_powertools/utilities/batch/sqs.py#L126 tries to be smart about it and raise a list of all the exceptions in the batch, but this doesn't work as expected, as the Exception code is not smart enough to show a list of exceptions. The actual result looks like this:

Traceback (most recent call last):
  File "/var/task/aws_lambda_powertools/logging/logger.py", line 270, in decorate
    return lambda_handler(event, context)
  File "/var/task/aws_lambda_powertools/tracing/tracer.py", line 303, in decorate
    response = lambda_handler(event, context, **kwargs)
  File "/var/task/aws_lambda_powertools/metrics/metrics.py", line 144, in decorate
    response = lambda_handler(event, context)
  File "/var/task/app.py", line 179, in lambda_handler
    result = proc.process()  # Returns a list of all results from record_handler
  File "/var/task/aws_lambda_powertools/utilities/batch/base.py", line 58, in __exit__
    self._clean()
  File "/var/task/aws_lambda_powertools/utilities/batch/sqs.py", line 126, in _clean
    raise SQSBatchProcessingError(list(self.exceptions))

What were you trying to accomplish?

Expected Behavior

I would except the Exception to be raised in a manner that all the exceptions are shown together.
Also, reviewing the code I see there is a fail_message property that doesn't seem to be exposed anywhere. Is there a plan for this? This may be very useful in some cases.

Current Behavior

Please review the Stack trace above

Possible Solution

I think re raising an exception may need some extra work like what is commented here: https://nedbatchelder.com/blog/200711/rethrowing_exceptions_in_python.html

Steps to Reproduce (for bugs)

Raise an exception inside a function called from the record handler. Without debug mode, you can't see the details of this exception.

Environment

  • Powertools version used: aws-lambda-powertools-1.10.1
  • Packaging format (Layers, PyPi): PyPi
  • AWS Lambda function runtime: Python 3.8
  • Debugging logs

How to enable debug mode**

LogStream&Time:13.430000 START RequestId: fc69d380-4601-5cf1-b606-8b044590e830 Version: $LATEST
LogStream&Time:13.769000 2021-01-21 22:34:13,769 aws_lambda_powertools.utilities.batch.base [DEBUG] Record processing exception: 'Metadata' object has no attribute '_sk'
LogStream&Time:13.769000 2021-01-21 22:34:13,769 aws_lambda_powertools.utilities.batch.sqs [DEBUG] 1 records failed processing, raising exception
LogStream&Time:13.949000 aws_lambda_powertools.utilities.batch.exceptions.SQSBatchProcessingError: [AttributeError("'Metadata' object has no attribute '_sk'")]
LogStream&Time:13.950000 [ERROR] SQSBatchProcessingError: [AttributeError("'Metadata' object has no attribute '_sk'")]
Traceback (most recent call last):
  File "/var/task/aws_lambda_powertools/logging/logger.py", line 270, in decorate
    return lambda_handler(event, context)
  File "/var/task/aws_lambda_powertools/tracing/tracer.py", line 303, in decorate
    response = lambda_handler(event, context, **kwargs)
  File "/var/task/aws_lambda_powertools/metrics/metrics.py", line 144, in decorate
    response = lambda_handler(event, context)
  File "/var/task/app.py", line 183, in lambda_handler
    result = proc.process()  # Returns a list of all results from record_handler
  File "/var/task/aws_lambda_powertools/utilities/batch/base.py", line 58, in __exit__
    self._clean()
  File "/var/task/aws_lambda_powertools/utilities/batch/sqs.py", line 126, in _clean
    raise SQSBatchProcessingError(list(self.exceptions))
LogStream&Time:14.008000 END RequestId: fc69d380-4601-5cf1-b606-8b044590e830
LogStream&Time:14.008000 REPORT RequestId: fc69d380-4601-5cf1-b606-8b044590e830  Duration: 343.54 ms     Billed Duration: 344 ms    Memory Size: 128 MB     Max Memory Used: 105 MB Init Duration: 1149.70 ms
XRAY TraceId: 1-600a0164-4d5d5e616bb97d8a4e3f804b       SegmentId: 1972c55e04b1b0f3     Sampled: true
@pcolazurdo pcolazurdo added bug Something isn't working triage Pending triage from maintainers labels Jan 21, 2021
@to-mc
Copy link
Contributor

to-mc commented Jan 22, 2021

Hi @pcolazurdo thanks for raising! I just tried to reproduce this with a minimal Lambda using a record handler that throws different exceptions, and this is the logged output I get:

1611349258953,"[ERROR] SQSBatchProcessingError: [ZeroDivisionError('division by zero'), TypeError('can only concatenate str (not ""int"") to str'), ZeroDivisionError('division by zero'), TypeError('can only concatenate str (not ""int"") to str'), TypeError('can only concatenate str (not ""int"") to str'), TypeError('can only concatenate str (not ""int"") to str')]
Traceback (most recent call last):
  File ""/opt/python/aws_lambda_powertools/logging/logger.py"", line 270, in decorate
    return lambda_handler(event, context)
  File ""/opt/python/aws_lambda_powertools/middleware_factory/factory.py"", line 133, in wrapper
    response = middleware()
  File ""/opt/python/aws_lambda_powertools/utilities/batch/sqs.py"", line 181, in sqs_batch_processor
    processor.process()
  File ""/opt/python/aws_lambda_powertools/utilities/batch/base.py"", line 58, in __exit__
    self._clean()
  File ""/opt/python/aws_lambda_powertools/utilities/batch/sqs.py"", line 126, in _clean
    raise SQSBatchProcessingError(list(self.exceptions))"
1611349258968,"END RequestId: b34f1073-dd51-5db7-822a-48f6e26a197a"

The first line in the log shows a list of the exceptions that were raised - can you confirm that you don't see this line without debug mode enabled? If its not being logged for you, I'll need some more details about your code / function configuration to help me reproduce it.

If you need to log the tracebacks of individual errors which occur in the record handler, a quick solution would be to catch the exceptions and add a logging statement in your own code before re-raising (example below). We could also log the tracebacks one by one as they occur, but would probably want to implement that with a parameter defaulting to disabled.

    try:
        do_something_with_record(record)
    except Exception as exc:
        logger.exception(exc)
        raise

@pcolazurdo
Copy link
Author

pcolazurdo commented Jan 23, 2021

@cakepietoast thanks for investigating this. I think your code reproduces the problem quite well. I'm copying mine just in case but in your traceback you can see that, while the list of Exception message is preserved, the actual location of the exception is lost. This is problematic for nested calls or a complex if then structures (among many others).
Here is an actual code that reproduce my problem. This happens with even only 1 record in the Queue.

You can see in this Traceback that it loses the information of where the exception actually happened as it should point to def process_request

2021/01/23/[$LATEST]83e299d9a2484bfe8af7882b76aeb868 2021-01-23T10:20:51.344000 [ERROR] SQSBatchProcessingError: [NameError("name 'non_existant_var' is not defined")]
Traceback (most recent call last):
  File "/var/task/aws_lambda_powertools/logging/logger.py", line 270, in decorate
    return lambda_handler(event, context)
  File "/var/task/aws_lambda_powertools/tracing/tracer.py", line 303, in decorate
    response = lambda_handler(event, context, **kwargs)
  File "/var/task/app.py", line 54, in lambda_handler
    result = proc.process()  # Returns a list of all results from record_handler
  File "/var/task/aws_lambda_powertools/utilities/batch/base.py", line 58, in __exit__
    self._clean()
  File "/var/task/aws_lambda_powertools/utilities/batch/sqs.py", line 126, in _clean
    raise SQSBatchProcessingError(list(self.exceptions))
2021/01/23/[$LATEST]83e299d9a2484bfe8af7882b76aeb868 2021-01-23T10:20:51.363000 END RequestId: a3371043-940b-5a88-aedd-48c66ec9aa33

The code for this is:

import json
# import os

# import boto3


# from aws_lambda_powertools.metrics import Metrics, MetricUnit
from aws_lambda_powertools.logging import Logger
from aws_lambda_powertools.tracing import Tracer
from aws_lambda_powertools.utilities.batch import PartialSQSProcessor
# from aws_lambda_powertools.logging.logger import set_package_logger


SERVICE = 'powertools_test'

# metrics = Metrics(service=SERVICE)
tracer = Tracer(service=SERVICE)
logger = Logger(service=SERVICE)


@tracer.capture_method
def process_request():

    return {
        "statusCode": 200,
        "body": json.dumps({
            'status': 'PageSaved',
            'url': non_existant_var,
            'crawledFile': non_existant_var
        })
    }


@tracer.capture_method
def record_handler(record):
    # This will be called for each individual message from a batch
    # It should raise an exception if the message was not processed successfully
    try:
        res = process_request()
        return res
    except Exception as e:
        raise e


@logger.inject_lambda_context(log_event=True)
@tracer.capture_lambda_handler
def lambda_handler(event, context):
    """
    """

    records = event["Records"]
    processor = PartialSQSProcessor(suppress_exception=False)
    with processor(records, record_handler) as proc:
        result = proc.process()  # Returns a list of all results from record_handler

    logger.structure_logs(append=True, results=result)
    return result

Hope this helps,
Pablo

@to-mc
Copy link
Contributor

to-mc commented Jan 25, 2021

Gotcha, thanks for clarifying and providing the extra detail! Working on a solution for this now - would this be a reasonable format for the exception output?

Traceback (most recent call last):
  File "cause_batch_exceptions.py", line 60, in <module>
    lambda_handler(event, {})
  File "/tmp/aws-lambda-powertools-python/aws_lambda_powertools/middleware_factory/factory.py", line 133, in wrapper
    response = middleware()
  File "/tmp/aws-lambda-powertools-python/aws_lambda_powertools/utilities/batch/sqs.py", line 186, in sqs_batch_processor
    processor.process()
  File "/tmp/aws-lambda-powertools-python/aws_lambda_powertools/utilities/batch/base.py", line 58, in __exit__
    self._clean()
  File "/tmp/aws-lambda-powertools-python/aws_lambda_powertools/utilities/batch/sqs.py", line 127, in _clean
    raise SQSBatchProcessingError(
aws_lambda_powertools.utilities.batch.exceptions.SQSBatchProcessingError: Not all records processed succesfully. 2 individual errors logged separately below.

Traceback (most recent call last):
  File "/tmp/aws-lambda-powertools-python/aws_lambda_powertools/utilities/batch/sqs.py", line 94, in _process_record
    result = self.handler(record)
  File "cause_batch_exceptions.py", line 36, in record_handler
    return aa
NameError: name 'aa' is not defined

Traceback (most recent call last):
  File "/tmp/aws-lambda-powertools-python/aws_lambda_powertools/utilities/batch/sqs.py", line 94, in _process_record
    result = self.handler(record)
  File "cause_batch_exceptions.py", line 38, in record_handler
    print(42/0)
ZeroDivisionError: division by zero

@pcolazurdo
Copy link
Author

Yes, that looks great!!!!

@to-mc to-mc added area/utilities pending-release Fix or implementation already in dev waiting to be released and removed triage Pending triage from maintainers labels Feb 1, 2021
@heitorlessa
Copy link
Contributor

Now available in 1.10.2!! Thanks for reporting and reproducing it @pcolazurdo !! and for @cakepietoast for the fix (As always)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Development

No branches or pull requests

3 participants