Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing nbd logging to host console on uncaught errors (CA-373776) #4878

Conversation

jameshensmancitrix
Copy link
Contributor

Added unhandled exception handler to nbd to log errors instead of the messages being printed to the host console

Signed-off-by: jameshensmancitrix james.hensman@citrix.com

@lindig
Copy link
Contributor

lindig commented Dec 19, 2022

A better subject line would be "CA-373776 catch unhandled exceptions and log them." "Added unhandled exception handler" is ambiguous.

@@ -255,4 +255,9 @@ let () =
in the failed state, and a backtrace will show up in the logs. *)
Cleanup.Runtime.register_signal_handler () ;
setup_logging () ;
let handler exn _ =
Lwt_log.ign_error (Printexc.to_string exn) ;
Lwt_log.ign_error (Printexc.get_backtrace ())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the logging call destroy the backtrace? I think it would be best to let-bind the backtrace before doing anything else in the handler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure on that one, I can do a let bind to ensure that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handler does have a backtrace on it I am no using to print the backtrace.

@jameshensmancitrix jameshensmancitrix force-pushed the private/jameshen/nbd-logs branch 2 times, most recently from 5c29e9c to 9b63143 Compare December 19, 2022 14:07
@stormi
Copy link
Contributor

stormi commented Dec 19, 2022

A better subject line would be "CA-373776 catch unhandled exceptions and log them." "Added unhandled exception handler" is ambiguous.

It would also be nice to update the PR title with this, as the CA number's signification is not obvious at first read (nor at second read if you don't have access to the internal issue tracker it refers to).

@jameshensmancitrix jameshensmancitrix changed the title CA-373776 Fixing nbd logging to host console on uncaught errors (CA-373776) Dec 19, 2022
@jameshensmancitrix jameshensmancitrix force-pushed the private/jameshen/nbd-logs branch 7 times, most recently from 76f3717 to dca1fe4 Compare January 4, 2023 15:29
>>= fun () -> Lwt.fail e
)
)
try Lwt_main.run (ignore_exn_delayed t ())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct I think: Lwt_main.run according to its documentation will raise an exception if the promise failed, and the try/with will also catch any OCaml exceptions that might escape.

You might also want to set the async exception hook https://ocsigen.org/lwt/latest/api/Lwt#VALasync_exception_hook, which by default prints the exception and terminates the program.

Copy link
Contributor

@edwintorok edwintorok Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another approach would be to use 'Lwt.wrap' to catch the OCaml exception and turn it into an Lwt one, at which point the Lwt_log from Lwt.catch will work.
And that might be better because then we can run the logger correctly (otherwise if the logger would call into any blocking function it'd wait for the Lwt main loop to resume running it, but because the 'with _... ' below is outside of it I'm not sure it'd run to completion or just hang waiting.

)
)
try Lwt_main.run (ignore_exn_delayed t ())
with e -> Lwt_log.ign_fatal (Printexc.to_string e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this changes the exitcode of the process which isn't necessarily right. If we had an exception we want a non-zero exitcode, like before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try adding a call to 'exit 3' here or something similar after logging the exception. Although the Lwt main loop is not running anymore at this point so I doubt we'll succeed in logging it.
Maybe it'd be better to use Lwt_main.run (Lwt_log.fatal...)

@jameshensmancitrix jameshensmancitrix force-pushed the private/jameshen/nbd-logs branch 3 times, most recently from 72c51db to 07887a5 Compare January 10, 2023 11:08
(Lwt.catch t (fun e ->
Lwt_log.fatal_f "Caught unexpected exception: %s"
(Printexc.to_string e)
>>= fun () -> Lwt.fail e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception is no longer reraised, instead it's ignored. I think in case of unhandled exception nbd should fail with an error code.

I recommend changing this back to how it was and then change fatal_f for error_f

…ead of the messages being printed to the host console

Signed-off-by: jameshensmancitrix <james.hensman@citrix.com>
@jameshensmancitrix
Copy link
Contributor Author

Will do a final test before merging

@jameshensmancitrix jameshensmancitrix merged commit 14ee043 into xapi-project:master Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants