Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match error handling to other runtimes #241

Closed
rimutaka opened this issue Jul 20, 2020 · 35 comments
Closed

Match error handling to other runtimes #241

rimutaka opened this issue Jul 20, 2020 · 35 comments
Labels
enhancement New feature or request

Comments

@rimutaka
Copy link
Contributor

The format of error messages returned by the runtime is hard to work with at the client end.

Example:

{
  "errorType": "alloc::boxed::Box<dyn std::error::Error+core::marker::Sync+core::marker::Send>",
  "errorMessage": "LambdaError { msg: \"cust error\", source: \"somewhere in code\", code: InvalidInput }"
}

It is produced by this code in

Err(e) => EventErrorRequest {

Err(e) => EventErrorRequest {
                request_id,
                diagnostic: Diagnostic {
                    error_message: format!("{:?}", e),
                    error_type: type_name_of_val(e).to_owned(),
                },
            }
            .into_req()?,

with help from

fn type_name_of_val<T>(_: T) -> &'static str {
    std::any::type_name::<T>()
}

Proposed change

  1. Log the error message, probably via error! so that it appears in CloudWatch.
  2. Serialise e into JSON instead of Debug output ({:?}).
  3. Allow to set error type and fall back on std::any::type_name if none was provided.

Here are examples of what other runtimes are producing:

Discussion

I'm not 100% sure how it all can be implemented yet. Just wondering if this is a problem that impacts others and is worth looking into.

@softprops
Copy link
Contributor

Serialise e into JSON instead of Debug output ({:?}).

I have an even simpler suggestion, use Display instead of Debug. Display is often implemented precisely for displaying a message. Debug is not meant for this purpose. You get this for free as implementing Error requires it so no extra type constraints are needed .

@softprops
Copy link
Contributor

Allow to set error type and fall back on std::any::type_name if none was provided.

We had a similar set up in the past but required muuuch more machinery and was a bit onerous on the user end. There might still.be a way to have the best of both worlds with fancy downcasting. I'd propose some unit tests here. Might have caught the usability issue sooner

@softprops
Copy link
Contributor

Log the error message, probably via error! so that it appears in CloudWatch.

+1

@rimutaka
Copy link
Contributor Author

I'll attempt a PR in a few days.

@rimutaka
Copy link
Contributor Author

Actually, this is a breaking change. The existing code will compile, but may output nothing or some dummy text.

@softprops
Copy link
Contributor

Actually, this is a breaking change.

You are fortunately afforded the luxury of timing here here. The current state of code has evolved much beyond what's been released in a breaking change sort of way. If there was a good time to make a change to improve usability but might be considered a breaking change, now is that time.

@rimutaka
Copy link
Contributor Author

rimutaka commented Jul 20, 2020

Logging errors from inside async fn run_inner requires these 2 dependencies:

simple_logger = "1.6.0"
log = "0.4.8"

and an init statement somewhere before error!() is called

simple_logger::init_with_level(log::Level::Info)?;

I'm inclined to make the loggers an optional feature, disabled by default.
The init will have to be done in the body of the lambda handler function to avoid conflicts and select the right logging level.

@davidbarsky
Copy link
Contributor

Thanks for opening this issue! I'm personally inclined to use tracing over log(given that I'm one of the authors of tracing...), if only because it integrates with XRay already.

@softprops
Copy link
Contributor

requires these 2 dependencies

One is an interface and the other should be a user choice. When you bind simple logger you remove a user choice. Please try to avoid removing user choice for logging!

An alternative is opting for your simple logger in the mains where you want the extra logging and set that to error or what have you.

I'm a fan of log if for no other reason is it's ubiquity. Trying to be unbiased here :)

tracing is nice but also feels like it should be a user choice

@davidbarsky
Copy link
Contributor

davidbarsky commented Jul 20, 2020

Here are examples of what other runtimes are producing...

Thanks for the reference. I'm worried that we might limited in what information we can send to Lambda, but I'll need to dig into this: https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html.

Log the error message, probably via error! so that it appears in CloudWatch.

+1. @nmoutschen and I are working on finishing up a tracing-xray integration that already works in this runtime. That might be another option here.

One is an interface and the other should be a user choice. When you bind simple logger you remove a user choice. Please try to avoid removing user choice for logging!

+1

I'm a fan of log if for no other reason is it's ubiquity. Trying to be unbiased here :)
tracing is nice but also feels like it should be a user choice

For sure! One of the things we've put a lot of effort into tracing is to enable existing log user to consume tracing-instrumented libraries (for instance, see the PRs I've opened on hyper and h2) with no performance or interop issues whatsoever. From the perspective of of something of like env_logger, tracing is log. tracing users, however, are able to get richer diagnostics and see higher performance when using a tracing Subscriber. We're not attempting to boil the ocean with tracing :)

@davidbarsky
Copy link
Contributor

Sorry for accidentally closing this issue! Didn't mean to.

@rimutaka
Copy link
Contributor Author

rimutaka commented Jul 21, 2020

This snippet is from Cargo.toml (https://github.com/awslabs/aws-lambda-rust-runtime/blob/master/lambda/Cargo.toml#L9):

[features]
# no features by default
default = ["simulated", "derive"]
simulated = []

which doesn't make sense to me.

  1. It says "# no features by default" while enabling derive and simulated.
  2. I can't find any references to simulated in the code. Looks like mod simulated is used for testing only.

Shall we remove simulated from the list of features while I am at it?

@rimutaka
Copy link
Contributor Author

rimutaka commented Jul 22, 2020

@davidbarsky David, does tracing actually work in lambda?

It works fine with log, but when I switch to tracing nothing is being logged. What am I missing?

  1. I added #[instrument(skip(incoming, handler))]
  2. The error is logged with error!("{}", e);
  3. A test log entry is written unconditionally with error!("Run inner started");
  4. use tracing::{error, instrument}; sits at the top of the module

It all compiles and runs on AWS Lambda, but no log entries are written into CloudWatch via tracing at all.
That same code works just fine if I use a properly initiated log.

I could not get any logging via tracing from client.rs either.

Here is the full snippet ...

#[instrument(skip(incoming, handler))]
async fn run_inner<A, B, F>(
    client: &Client,
    incoming: impl Stream<Item = Result<http::Response<hyper::Body>, Error>>,
    handler: &mut F,
) -> Result<(), Error>
where
    F: Handler<A, B>,
    <F as Handler<A, B>>::Error: fmt::Display,
    A: for<'de> Deserialize<'de>,
    B: Serialize,
{
    tokio::pin!(incoming);
    error!("Run inner started");
    while let Some(event) = incoming.next().await {
        let event = event?;
        let (parts, body) = event.into_parts();

        let ctx: Context = Context::try_from(parts.headers)?;
        let body = hyper::body::to_bytes(body).await?;
        let body = serde_json::from_slice(&body)?;

        let request_id = &ctx.request_id.clone();
        let f = handler.call(body, ctx);

        let req = match f.await {
            Ok(res) => EventCompletionRequest { request_id, body: res }.into_req()?,
            Err(e) => {
                error!("{}", e);
                EventErrorRequest {
                    request_id,
                    diagnostic: Diagnostic {
                        error_message: format!("{}", e),
                        error_type: type_name_of_val(e).to_owned(),
                    },
                }
                .into_req()?
            }
        };
        client.call(req).await?;
    }

    Ok(())
}

@davidbarsky
Copy link
Contributor

I'm assuming you're using tracing_subscriber::fmt::Layer, which only emits logs when an event occurs, as it tries to reducing buffering as much as possible. Try setting https://docs.rs/tracing-subscriber/0.2.8/tracing_subscriber/fmt/struct.Layer.html#method.with_span_events to FmtSpan::FULL and creating a global subscriber like this: https://docs.rs/tracing-subscriber/0.2.8/tracing_subscriber/fmt/index.html#composing-layers.

You can also add additional log-like info! statements to write data out.

@rimutaka
Copy link
Contributor Author

I'm just using error! macros, exactly as in client.rs. The tracing docs say to only use macros in libs.
The test executable uses log/simple_logger for its own logging. My understanding is it should not affect how tracing works.
If this makes no sense I'll make a commit for your review.

@softprops
Copy link
Contributor

Just wanted to echo the voice of a minimal runtime again so something can ship soon. The problem to be solved here is to log an error, correct?

tracing feels a bit much just to print an error to the stdout/err. What was wrong with log::error!("{}", e) and letting the user select the log impl of their choice in their main again?

@davidbarsky
Copy link
Contributor

It makes sense; I wrote a decent portion of those docs!

The test executable uses log/simple_logger for its own logging. My understanding is it should not affect how tracing works.

tracing's spans will be a different target in log. I forget which target it is (tracing?) as I'm getting ready for bed. I'd also try the tracing_subscriber logger and see if you can reproduce this issue there.

@davidbarsky
Copy link
Contributor

Just wanted to echo the voice of a minimal runtime again so something can ship soon. The problem to be solved here is to log an error, correct?

Yup on both counts. Wanted to provide support where I could. Plus, had an off-day/break today, so the README changes didn't get posted.

tracing feels a bit much just to print an error to the stdout/err. What was wrong with log::error!("{}", e) and letting the user select the log impl of their choice in their main again?

Nothing wrong with it! You can write the exact same thing in tracing, but record the error as a proper key/value (tracing and log will end up sharing the same K/V recording system, for what it's worth). It also has an XRay exporter.

@rimutaka
Copy link
Contributor Author

  1. The output of tracing looks complicated. Here is an example at DEBUG level.
    image

What do all those extra brackets and characters mean?
It will probably break most log parsers.

The CloudWatch log looks cleaner, but I don't get what those 2m, 34m and other prefixes mean. [1} - wrong closing bracket ?

image

  1. The executable with the Lambda handler will have to have a reference to tracing-subscriber crate and this init line:
tracing_subscriber::fmt().with_max_level(tracing::Level::DEBUG).init();

which took me a while to dig out from tracing examples. It is not even documented (not that I noticed).

Initializing the logger via a different crate has no effect on tracing. So this one will work, but the runtime with tracing won't log anything:

simple_logger::init_with_level(log::Level::Info)?;
  1. Tracing is used in 2 places only via macros:
error!(message = "Got error serving connection", e = %e);
info!("Received cancelation signal");

Proposal

  1. Replace tracing with log
  2. Implement tracing properly in the next release

Here is an example of log output at the same level for the same call, but without tracing.

image

This line was logged by the runtime in response to a handler error, which was the whole point of my quest :)

2020-07-22 04:25:43,093 ERROR [lambda] {"is_authenticated":false,"msg":"A custom error as requested!","req_id":"102c7ecc-c805-4e1c-8f1b-62d4b849c18f"}

@davidbarsky
Copy link
Contributor

What do all those extra brackets and characters mean?

They're ansi escape codes that can be disabled using using this method: https://docs.rs/tracing-subscriber/0.2.8/tracing_subscriber/fmt/struct.Layer.html#method.with_ansi.

This line was logged by the runtime in response to a handler error, which was the whole point of my quest :)

Yeah, we went down a wild goose chase. Sorry about that!

Proposal

  1. Replace tracing with log
  2. Implement tracing properly in the next release

I'd normally agree with this approach, but I think we can resolve this quickly. In your PR, try replacing log with tracing, but ensure the tracing dependency is declared as tracing = { version = "0.1", features = ["log"] }.

@rimutaka
Copy link
Contributor Author

tracing adds complexity for no added benefit. Consider a case where some lambdas are in Node.JS and some are in Rust. Even if the handler outputs the same error message the output will look different with tracing. So now I have to make changes to my Appsync code, Vue, Graphana, and anything else that touches it to handle both variants.

I understand tracing is required to enable X-Ray, but can we introduce it so that it doesn't double everyone's development cost?

@davidbarsky
Copy link
Contributor

davidbarsky commented Jul 23, 2020

@rimutaka

tracing adds complexity for no added benefit. Consider a case where some lambdas are in Node.JS and some are in Rust. Even if the handler outputs the same error message the output will look different with tracing. So now I have to make changes to my Appsync code, Vue, Graphana, and anything else that touches it to handle both variants.

This is where I disagree and regret mentioning tracing. tracing will not impact how your error will be rendered or logged; I only mentioned tracing in reference to your comment on rust-lang/log#357 because it'll allow you to preserve the fact that you're recording an error if and only if you're using a tracing Subscriber. log users will not benefit from this feature of tracing, but they will not be harmed either—they will merely get the fallback Debug or Display rendering.


With that being said—and please correct me if I'm wrong—this issue was opened to resolve how this runtime should report errors to the Invocation error endpoint. The current implementation has lots of room for improvement—I'll accept ownership and blame for that.

I've personally found eyre::Report to be a really compelling illustration of the distinction between error handling and error reporting, the latter of which this issue focuses on. I'll say this off the bat: I don't think we can use eyre, but I think we can get some inspiration from its ideas. One approach I've been thinking about is keeping the Into<Box<dyn std::error::Error + Send + Sync + 'static>> bounds but (re)introducing a std::panic::set_hook-style error reporting hook to this runtime. It'll have a closure that accepts a Box<dyn std::error::Error + Send + Sync + 'static>, but you'd be able to downcast to specific error variants and convert them to a Diagnostic accordingly. Here's a limited example:

lambda::set_hook(Box::new(|e: Box<dyn std::error::Error + Send + Sync + 'static>| {
    if let Some(e) = e.downcast_ref::<std::io::Error>() {
        Diagnostic {
            error_message: format!("{}", e),
            error_type: "std::io::Error"
        } 
    }
    // handle any other variant you care to report, falling back to the `type_name`-based approach as a default.
    // I didn't write a fallback, but let's assume this closure _must_ return a `lambda::Diagnostic`.
}));

As @Veetaha mentioned in #243 (comment), we're between a rock (the current AWS APIs) and a hard place (the lack of stable specialization in Rust). This downcasting-based approach feels like the least bad approach to me, but what do you think?

Additionally, free to ping me directly over Discord or even setup a call. I'd be happy to walk through and discuss my thinking, which might be lossy over text.

@rimutaka
Copy link
Contributor Author

David, can you give me a quick example how to init tracing so that it produces a clean output, preferably identical to that of log?
I'll change it all back to tracing and resubmit the PR.

@davidbarsky
Copy link
Contributor

davidbarsky commented Jul 23, 2020

@rimutaka Take a look at there examples I just pushed: https://github.com/awslabs/aws-lambda-rust-runtime/tree/davidbarsky/update-readme/lambda/examples. The example hello-without-macro-tracing.rs shows how to use tracing_subscriber without ansi color codes and hello-without-macro-log.rs shows how to use simple_logger. Here are the respective outputs:
Screen Shot 2020-07-22 at 9 54 29 PM

@davidbarsky
Copy link
Contributor

davidbarsky commented Jul 23, 2020

Don't worry about undoing your PR; rebasing your very helpful documentation changes atop of that branch would be the most helpful.

@brainstorm
Copy link

brainstorm commented Jul 23, 2020

I had tracing testing on my backlog for a while now, nice to see this discussion (thanks @davidbarsky for roping me in).

tracing adds complexity for no added benefit. Consider a case where some lambdas are in Node.JS and some are in Rust. Even if the handler outputs the same error message the output will look different with tracing. So now I have to make changes to my Appsync code, Vue, Graphana, and anything else that touches it to handle both variants.

I think that disabling the ANSI color codes would solve that for most the "output will look different" part. Now, I hear @softprops and @rimutaka when it comes to adding more (unnecessary?) deps: I'm biased towards AWS-managed ways.

Having X-Rays is a big plus on my book. OTOH, if that introduces several MB on the final release lambda bin, then that's not cool and I would rather prefer leaner bins for lambdas.

What does the release binsize delta look like? I think that cargo-bloat can help us making a more informed decision, since binary bloat can affect runtime unnecessarily (thus hurting the main point of short-lived, lean lambdas).

OT-ish: I can PR cargo-bloat as an additional step on GitHub Actions, btw, if there's enough interest and that's a metric that will be contemplated in the future?

@rimutaka
Copy link
Contributor Author

@davidbarsky , I got it about .with_ansi(false). Can we tell tracing NOT to output the timestamp? It comes up twice in CloudWatch. That would be the only remaining difference with log.

Also, when I tried to init the logger via simple_logger it had no effect on tracing - nothing was logged by the runtime.
I presume that https://github.com/awslabs/aws-lambda-rust-runtime/blob/davidbarsky/update-readme/lambda/examples/hello-without-macro-log.rs#L10 enables tracing somehow.
Will get onto it shortly.

@davidbarsky
Copy link
Contributor

I got it about .with_ansi(false). Can we tell tracing NOT to output the timestamp? It comes up twice in CloudWatch. That would be the only remaining difference with log.

Yup, here's how: https://github.com/awslabs/aws-lambda-rust-runtime/blob/davidbarsky/update-readme/lambda/examples/hello-without-macro-tracing.rs#L16.

I presume that https://github.com/awslabs/aws-lambda-rust-runtime/blob/davidbarsky/update-readme/lambda/examples/hello-without-macro-log.rs#L10 enables tracing somehow.

Kinda. That'll enable tracing's log facade, which is in turn enabled by this feature. This feature was initially missing.

Also, when I tried to init the logger via simple_logger it had no effect on tracing - nothing was logged by the runtime.

Hrm, I'm unable to reproduce this. The first example below makes use of simple_logger and it has output from the lambda crate. tracing-subscriber output has a similar output, also from the lambda crate.
Screen Shot 2020-07-22 at 10 53 36 PM


@brainstorm

I think using cargo bloat is a good idea, but I'd hold off on making a PR.

@rimutaka
Copy link
Contributor Author

@davidbarsky, I applied my changes on top of your branch, kept tracing and your examples. Both lambda-http/examples failed to compile. It complained about a missing trait. Nothing to do with my changes. Do you know what needs to be fixed? Shall I commit as-is or attempt to fix it?

@davidbarsky
Copy link
Contributor

Both lambda-http/examples failed to compile. It complained about a missing trait. Nothing to do with my changes. Do you know what needs to be fixed? Shall I commit as-is or attempt to fix it?

Commit as is to your PR; we can dig into those issues as needed.

@rimutaka
Copy link
Contributor Author

Further to the discussion on the bloat and the size of the binaries. Compiled with cargo build --release --target x86_64-unknown-linux-musl --examples. I have no idea if this is too big or OK. Just stating the facts.

Example Size
basic 6,771,664
error-handling 6,876,392
hello 6,703,288
hello-without-macro-log 6,793,136
hello-without-macro-tracing 6,960,136

@softprops
Copy link
Contributor

softprops commented Jul 23, 2020

I'm wondering if it helps to move things forward. can we separate how the error name is derived from how it is logged? there's lots of ways to log an error but I assume there's like one pretty good way to format the name of the error type.

On picking a log impl and size and all that. cargo feature flags can help with that!

@rimutaka
Copy link
Contributor Author

davidbarsky/update-readme branch has more changes than just the readme. E.g., lib.rs was re-factored quite a bit. We moved from a working build that had a minor error logging problem to a non-working build with changes to the core code.

I agree with @softprops that we should deal with the error_type separately. It's not trivial and is probably not that critical right now.

@davidbarsky
Copy link
Contributor

I keep accidentally closing this issue; sorry about that.

markdingram pushed a commit to markdingram/aws-lambda-rust-runtime that referenced this issue Dec 30, 2020
* rimutaka/master: (26 commits)
  Removed unnecessary + Sync on handler's future
  Added docs and missing _vars to reduce warnings
  Enforced Send+Sync for http Handler impl.
  Added some tracing!() to lib.rs.
  Fixed "Process exited" error in lib.rs.
  client.rs formatting fix
  Log panic as Debug+Display, improved examples
  Added err logging to lib.rs, consolidated examples
  Added error handling for awslabs#241, interim, broken.
  Removed unused refs from basic.rs
  disable time on `tracing_subscriber`
  Add log and tracing examples; fix `tracing` dependency features.
  Upper-cased and edited examples/readme.md
  Corrected handler_fn interface misconception in basic.rs
  Fixed comment in lib.rs
  Formatting fix in error-handling.rs.
  Fixed headers in examples readme.
  Tidied up examples readme.
  Added basic.rs example, comments to other examples
  Log handler errors in runtime. Issue awslabs#241
  ...
@jkshtj jkshtj added the enhancement New feature or request label Jan 21, 2021
bahildebrand added a commit that referenced this issue Feb 4, 2021
* reorg

* Replace `genawaiter` with `async-stream` (#240)

* don't depend on serde_derive directly

* Add Cargo.lock to gitignore; whoops.

* Cleanup docs; handle panics in lambda functions correctly.

* Create a reusable runtime struct + builder

* Log handler errors in runtime. Issue #241

* Added basic.rs example, comments to other examples

* Tidied up examples readme.

* Fixed headers in examples readme.

* Formatting fix in error-handling.rs.

* Fixed comment in lib.rs

* Corrected handler_fn interface misconception in basic.rs

Co-authored-by: Veetaha <veetaha2@gmail.com>

* Upper-cased and edited examples/readme.md

* Add log and tracing examples; fix `tracing` dependency features.

* disable time on `tracing_subscriber`

* Removed unused refs from basic.rs

* Added error handling for #241, interim, broken.

* Added err logging to lib.rs, consolidated examples

* Log panic as Debug+Display, improved examples

* client.rs formatting fix

* Fixed "Process exited" error in lib.rs.

* Added some tracing!() to lib.rs.

* Enforced Send+Sync for http Handler impl.

* Added docs and missing _vars to reduce warnings

* Removed unnecessary + Sync on handler's future

* All tests passing

* Remove deprecated simple_logger calls

* Remove Cargo.lock

* cargo fmt changes

Co-authored-by: David Barsky <me@davidbarsky.com>
Co-authored-by: Lucio Franco <luciofranco14@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-47-28.ap-southeast-2.compute.internal>
Co-authored-by: rimutaka <max@onebro.me>
Co-authored-by: Veetaha <veetaha2@gmail.com>
@bahildebrand
Copy link
Contributor

This change was merged with #284. Closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants