Write panic errors to logs (#445) #446

gvbgduh · 2024-04-04T14:49:44Z

Loggin from panic::set_hook.

It uses error macro from the same package module log

It works, but only after the logger is initialized by the kaspad daemon, before that it does nothing, so there is a gap.

>:~/.rusty-kaspa/kaspa-testnet-11/logs$ cat rusty-kaspa_err.log
2024-04-05 01:17:20.187+11:00 [ERROR] Panic at kaspad/src/daemon.rs:239: Test logger panic
>:~/.rusty-kaspa/kaspa-testnet-11/logs$ cat rusty-kaspa.log
2024-04-05 01:17:20.187+11:00 [INFO ] Logger is initiated with the log level: INFO and the log directory: "/home/george/.rusty-kaspa/kaspa-testnet-11/logs"
2024-04-05 01:17:20.187+11:00 [INFO ] kaspad v0.13.5
2024-04-05 01:17:20.187+11:00 [INFO ] Application directory: /home/george/.rusty-kaspa
2024-04-05 01:17:20.187+11:00 [INFO ] Data directory: /home/george/.rusty-kaspa/kaspa-testnet-11/datadir
2024-04-05 01:17:20.187+11:00 [INFO ] Logs directory: /home/george/.rusty-kaspa/kaspa-testnet-11/logs
2024-04-05 01:17:20.187+11:00 [ERROR] Panic at kaspad/src/daemon.rs:239: Test logger panic

gvbgduh · 2024-04-04T14:51:37Z

core/src/panic.rs

@@ -8,6 +9,23 @@ pub fn configure_panic() {
        default_hook(panic_info);
        println!("Exiting...");
        // TODO: setup a wait time and fold the log system properly


There is also this comment, it's probably partially addressed, but I'm not sure about the case when the logger is not initialized yet and also what's meant by the wait time.

I think this means we should make sure that logs are actually flushed into the files before the process exits. This can possibly be done by making special calls to the underlying log library (maybe @tiram88 would know how), or at least adding a short sleep here

Interesting, it's also the requirement I haven't clarified yet. When is it ok to lose the logs?

Even if we try to wait, the logger might never be initialized as the panic condition might happen before.
Another approach is probably to initialize a fallback logger within the hook or even write to the log file directly if needed.
But, it can cause a race condition, as the logger system can only be initialized once (link), unless we use the different mechanism for the fallback.
But, even with this, logs still will be lost in cases like no disk space or potentially the lack of resources/permissions.

Also, the default behavior of the log facade is to ignore failures writing logs:

Faillible Log interface rust-lang/log#382

Is this possible to make a failable log facade ? rust-lang/log#588

And it seems that there is not much can be done other than deciding whether to panic on log failures globally or not. (Unless we can rely on the specific loggers implementations, but more research needed.)
It's not very helpful in this case, as panicking on a log failure inside the panic itself should causes an infinite recursion code flow, but it should be possible to avoid.

So, what would be the reasonable approach and what trade offs are acceptable?

I'd probably propose to keep this implementation, but also write the panic error to the dedicated file.
In this case, we write to logs when the logger is available, and we write the panic file regardless of the logger status (as long as we can write to the file system).
The issue here is it probably needs some standardization, like location of the file and its content/structure, as it might also be helpful to extend it to the panic report of sorts, providing additional details about the host.
But, I'm not too sure how suitable this approach is.

gvbgduh · 2024-04-04T14:53:23Z

I'm still thinking about unit tests

aspect · 2024-04-07T15:24:13Z

I am concerned that calling error!() before the logger is initialized could cause a panic by itself. If this is the case perhaps a separate function that sets a static atomic bool after the logger init indicating that it is safe to use error!() would make sense. If this is not the case I guess this comment can be ignored.

gvbgduh · 2024-04-07T16:54:33Z

I am concerned that calling error!() before the logger is initialized could cause a panic by itself. If this is the case perhaps a separate function that sets a static atomic bool after the logger init indicating that it is safe to use error!() would make sense. If this is not the case I guess this comment can be ignored.

It's quite the opposite, the default behavior is to silently swallow all log failures, I have corresponding links in another comment. But a static atomic bool might still be of the help.

gvbgduh · 2024-04-09T22:59:48Z

core/src/log/mod.rs

@@ -16,6 +17,8 @@ mod logger;
    }
 }

+pub static LOGGING_INITIALIZED: AtomicBool = AtomicBool::new(false);


on a second thought, it might be better to keep it private and have only a public getter in the module

coderofstuff

I tested this with these configs, adding a panic in the code

--logdir=./logs - panic is logged in both std and error log files as an Error
--nologfiles - still works fine

PR is functional.

biryukovmaxim · 2024-04-23T06:57:15Z

@gvbgduh hi! Thanks for your efforts, it's nice to see our gang of contributors getting bigger!

there are some locations where panic handler is settled before logger initialization. like here

impl Runtime {
    pub fn from_args(args: &Args) -> Self {
        // Configure the panic behavior
        kaspa_core::panic::configure_panic();

        let log_dir = get_log_dir(args);

        // Initialize the logger
        kaspa_core::log::init_logger(log_dir.as_deref(), &args.log_level);

        Self { log_dir: log_dir.map(|log_dir| log_dir.to_owned()) }
    }
}

could you check them?

log::logger() function already has a fallback
we can rely on it.
so if we set hook using the function. when we panic before initialization - it won't produce logs
after - it will work.
Imo no need to use atomic bool, moreover the current approach blocks the thread(since it's sync function) waiting for logger initialization. it means the only way to set the logger - is from another thread, unlikely it will ever be done
to handle the case where we panic before the initialization. (if we really want to handle it now and here) I would suggest to use OnceBool or lazy_static unit () to make it impossible to set false in purposes to determine if logger is initialized.
the state machine would look like:

   None before logger initialized 
   Some(()) after initialization
   No way to make it None again
Only one possible transition

and direct writing to file when it's not initialized

gvbgduh · 2024-04-23T11:30:54Z

thanks @biryukovmaxim, very interesting.

thb, i haven't exercised the thought about 3, but it makes perfect sense as i call sleep in the sync function.

i'd agree we don't really need atomic bool as i see it just recreates the fallback logic from the link in 2, it's just that the initialization state is private to the log crate.
i think it should probably be possible to hack around some logic about the logger itself, all in all, it'll be NoOp logger otherwise, but, i think it's a bit too hacky.

if we really want to handle it now and here

yes, from the practical point of view, i'm really on the fence it's worth the trouble, as i'm not sure there is a lot of business value in covering the gaps for panic before the logging system is up as in most cases it would be within the interaction with CLI.

i'd personally propose to revert to the first revision: we do our best effort to log, but it'll be up to the logging system;
and have a separate ticket for the panic report: on every panic write a dedicate report directly to a file. that file should be modelled and have additional context like host details, host health, etc. that would cover cases when logging wasn't up, but there was still unattended panic during the execution.

but, i'm happy to follow the OnceBool path as well.

coderofstuff

Please address @biryukovmaxim's comments

Reason: Waiting here will block the thread, and as the thread is locked and we wait for the logger system to be initiated that would wait for the thread to be released itself, making the waiting pointless

gvbgduh · 2024-05-01T04:43:46Z

As waiting there is pointless, reverting to the initial commit, where logging is done relying on the logging fallback itself.
Errors caused by panic before the logging system is ready would be ignored (follow up issue #465).

biryukovmaxim · 2024-05-01T05:01:27Z

core/src/panic.rs

+            },
+        };
+        // Log the panic
+        error!("Panic at {}:{}: {}", file, line, message);


PanicInfo implements Display almost the same way as yours. Could you check how the hook work with panic_any passing custom type?

There's method message , however it's only accessable on nightly toolchain. Although std uses it when implements Display

I would suggest to use Display fmt if it works with panic_any, otherwise there's no difference to me

coderofstuff

I retested this by injecting a panic! in the code and also running with --nologfiles and it's functional.

Sample logged panic found in the log files:

2024-04-30 22:52:39.202-06:00 [INFO ] kaspad v0.13.6
2024-04-30 22:52:39.202-06:00 [INFO ] Application directory: /home/dev/.rusty-kaspa
2024-04-30 22:52:39.202-06:00 [INFO ] Data directory: /home/dev/.rusty-kaspa/kaspa-mainnet/datadir
2024-04-30 22:52:39.202-06:00 [INFO ] Logs directory: ./logs
thread 'main' panicked at kaspad/src/daemon.rs:249:5:
At the disco
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Exiting...
2024-04-30 22:52:39.203-06:00 [ERROR] Panic at kaspad/src/daemon.rs:249: At the disco

PR looks much simpler than before.

gvbgduh · 2024-05-01T05:06:08Z

Testing
Setup:

cargo run --release --bin kaspad -- --testnet --netsuffix=11

Manually introduced panic after login init:

>~/rusty-kaspa$ git diff
diff --git a/kaspad/src/daemon.rs b/kaspad/src/daemon.rs
index dd8a215..d60c3b0 100644
--- a/kaspad/src/daemon.rs
+++ b/kaspad/src/daemon.rs
@@ -240,6 +240,9 @@ pub fn create_core_with_runtime(runtime: &Runtime, args: &Args, fd_total_budget:
         }
     }

+    // Point of failure after the logs are initialized
+    panic!("This is a test panic with logging initiated");
+
     let consensus_db_dir = db_dir.join(CONSENSUS_DB);

Stderr:

2024-05-01 14:53:43.032+10:00 [INFO ] kaspad v0.13.6
2024-05-01 14:53:43.033+10:00 [INFO ] Application directory: /home/george/.rusty-kaspa
2024-05-01 14:53:43.033+10:00 [INFO ] Data directory: /home/george/.rusty-kaspa/kaspa-testnet-11/datadir
2024-05-01 14:53:43.033+10:00 [INFO ] Logs directory: /home/george/.rusty-kaspa/kaspa-testnet-11/logs
thread 'main' panicked at kaspad/src/daemon.rs:244:5:
This is a test panic with logging initiated
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Exiting...
2024-05-01 14:53:43.033+10:00 [ERROR] Panic at kaspad/src/daemon.rs:244: This is a test panic with logging initiated

Logs:

george@Darksteel:~/rusty-kaspa$ ls /home/george/.rusty-kaspa/kaspa-testnet-11/logs
rusty-kaspa.log  rusty-kaspa_err.log
george@Darksteel:~/rusty-kaspa$ cat /home/george/.rusty-kaspa/kaspa-testnet-11/logs/rusty-kaspa.log
2024-05-01 14:57:01.610+10:00 [INFO ] kaspad v0.13.6
2024-05-01 14:57:01.610+10:00 [INFO ] Application directory: /home/george/.rusty-kaspa
2024-05-01 14:57:01.610+10:00 [INFO ] Data directory: /home/george/.rusty-kaspa/kaspa-testnet-11/datadir
2024-05-01 14:57:01.611+10:00 [INFO ] Logs directory: /home/george/.rusty-kaspa/kaspa-testnet-11/logs
2024-05-01 14:57:01.611+10:00 [ERROR] Panic at kaspad/src/daemon.rs:244: This is a test panic with logging initiated
$ cat /home/george/.rusty-kaspa/kaspa-testnet-11/logs/rusty-kaspa_err.log
2024-05-01 14:57:01.611+10:00 [ERROR] Panic at kaspad/src/daemon.rs:244: This is a test panic with logging initiated

biryukovmaxim · 2024-05-01T05:06:43Z

@gvbgduh could you go through the code and reorder all places where panic handler is settled before logger initialization?

coderofstuff

Oops, my bad. Please check the comments again.

Also, I noticed in the full panic log it looks like this:

thread 'main' panicked at kaspad/src/daemon.rs:249:5:
At the disco

Notice the :249:5: above but the log in the file only has this:

2024-04-30 22:52:39.203-06:00 [ERROR] Panic at kaspad/src/daemon.rs:249: At the disco

The panic log in the file is missing the :5 - this could be useful in lines where there are multiple things happening.

gvbgduh · 2024-05-01T05:08:38Z

@biryukovmaxim yep, can do
@coderofstuff indeed, thanks, let me fix it

gvbgduh · 2024-05-01T05:28:42Z

Updated test:

$ cat /home/george/.rusty-kaspa/kaspa-testnet-11/logs/rusty-kaspa.log | tail -n 2
2024-05-01 15:25:36.190+10:00 [INFO ] Logs directory: /home/george/.rusty-kaspa/kaspa-testnet-11/logs
2024-05-01 15:25:36.190+10:00 [ERROR] Panic at kaspad/src/daemon.rs:244:5: This is a test panic with logging initiated
$ cat /home/george/.rusty-kaspa/kaspa-testnet-11/logs/rusty-kaspa_err.log | tail -n 2
2024-05-01 15:05:35.678+10:00 [ERROR] Panic at kaspad/src/daemon.rs:202: This is a test panic without logging initiated
2024-05-01 15:25:36.190+10:00 [ERROR] Panic at kaspad/src/daemon.rs:244:5: This is a test panic with logging initiated

gvbgduh · 2024-05-01T05:34:57Z

@biryukovmaxim just for a thought exercise, is it worth considering cases when there is panic before kaspa_core::panic::configure_panic();?

biryukovmaxim · 2024-05-01T05:44:22Z

@biryukovmaxim just for a thought exercise, is it worth considering cases when there is panic before kaspa_core::panic::configure_panic();?

I think it's a rare case, that mostly can appear during development phase, where regular panic should be okay.
What can we do with that? Setting another panic(call it prehook) handler before current? Should we considering panics before prehook again? =D

gvbgduh · 2024-05-01T05:50:07Z

@biryukovmaxim thanks, the last commit should cover all places where the panic hook is set

michaelsutton · 2024-05-01T10:03:19Z

The name of the panicking thread is important too

gvbgduh · 2024-05-01T10:50:54Z

good point, update the error message

    Finished release [optimized] target(s) in 1m 06s
     Running `target/release/kaspad --testnet --netsuffix=11`
thread 'main' panicked at kaspad/src/daemon.rs:225:5:
Test panic with logging
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Exiting...
2024-05-01 20:46:43.740+10:00 [ERROR] Panic at the thread main at kaspad/src/daemon.rs:225:5: Test panic with logging

~/rusty-kaspa$ cat /home/george/.rusty-kaspa/kaspa-testnet-11/logs/rusty-kaspa.log | tail -n 2
2024-05-01 20:15:01.498+10:00 [ERROR] Panic at kaspad/src/daemon.rs:225:5: Test panic with logging
2024-05-01 20:46:43.740+10:00 [ERROR] Panic at the thread main at kaspad/src/daemon.rs:225:5: Test panic with logging

~/rusty-kaspa$ cat /home/george/.rusty-kaspa/kaspa-testnet-11/logs/rusty-kaspa_err.log | tail -n 2
2024-05-01 20:15:01.498+10:00 [ERROR] Panic at kaspad/src/daemon.rs:225:5: Test panic with logging
2024-05-01 20:46:43.740+10:00 [ERROR] Panic at the thread main at kaspad/src/daemon.rs:225:5: Test panic with logging

gvbgduh · 2024-05-01T11:58:35Z

core/src/panic.rs

@@ -19,18 +15,17 @@ pub fn configure_panic() {
            Some(s) => *s,
            None => match panic_info.payload().downcast_ref::<String>() {
                Some(s) => &s[..],
-                None => "unknown",
+                None => "Box<dyn Any>",


interesting, what would it produce?

or it's still the string? is there a reason for such a value?

Just copying https://doc.rust-lang.org/src/std/panicking.rs.html#256

i see, thanks :)

We can also find a way to use this https://doc.rust-lang.org/src/core/panic/location.rs.html#198 but I think we're good

michaelsutton · 2024-05-01T12:46:26Z

Merged. Congratulations @gvbgduh for the first contribution 💯

* Write panic errors to logs (kaspanet#445) * Remove the comment as it doesn't need to be addressed Reason: Waiting here will block the thread, and as the thread is locked and we wait for the logger system to be initiated that would wait for the thread to be released itself, making the waiting pointless * Add column to the log message for the panic error * Move panic setup after the logger init in all places * Add the thread name to the panic error * default hook invoke position + minor style changes

gvbgduh commented Apr 4, 2024

View reviewed changes

gvbgduh commented Apr 9, 2024

View reviewed changes

coderofstuff approved these changes Apr 23, 2024

View reviewed changes

coderofstuff requested changes Apr 29, 2024

View reviewed changes

Write panic errors to logs (kaspanet#445)

b0eb783

gvbgduh force-pushed the panic-logs branch from bfe0c71 to b0eb783 Compare May 1, 2024 04:00

Remove the comment as it doesn't need to be addressed

823e5d7

Reason: Waiting here will block the thread, and as the thread is locked and we wait for the logger system to be initiated that would wait for the thread to be released itself, making the waiting pointless

gvbgduh mentioned this pull request May 1, 2024

Create a panic report with details #465

Closed

biryukovmaxim reviewed May 1, 2024

View reviewed changes

coderofstuff approved these changes May 1, 2024

View reviewed changes

coderofstuff requested changes May 1, 2024

View reviewed changes

Add column to the log message for the panic error

392b6de

Move panic setup after the logger init in all places

6f18944

Add the thread name to the panic error

f395122

default hook invoke position + minor style changes

1268a29

michaelsutton approved these changes May 1, 2024

View reviewed changes

gvbgduh commented May 1, 2024

View reviewed changes

michaelsutton merged commit 034d25d into kaspanet:master May 1, 2024
6 checks passed

coderofstuff mentioned this pull request May 6, 2024

Panic logs should appear both in the standard log and the error log files #445

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Write panic errors to logs (#445) #446

Write panic errors to logs (#445) #446

gvbgduh commented Apr 4, 2024

gvbgduh Apr 4, 2024

michaelsutton Apr 7, 2024

gvbgduh Apr 7, 2024 •

edited

Loading

gvbgduh Apr 7, 2024

gvbgduh commented Apr 4, 2024

aspect commented Apr 7, 2024

gvbgduh commented Apr 7, 2024

gvbgduh Apr 9, 2024

coderofstuff left a comment

biryukovmaxim commented Apr 23, 2024 •

edited

Loading

gvbgduh commented Apr 23, 2024

coderofstuff left a comment

gvbgduh commented May 1, 2024

biryukovmaxim May 1, 2024 •

edited

Loading

coderofstuff left a comment

gvbgduh commented May 1, 2024

biryukovmaxim commented May 1, 2024

coderofstuff left a comment

gvbgduh commented May 1, 2024

gvbgduh commented May 1, 2024

gvbgduh commented May 1, 2024

biryukovmaxim commented May 1, 2024 •

edited

Loading

gvbgduh commented May 1, 2024

michaelsutton commented May 1, 2024

gvbgduh commented May 1, 2024

gvbgduh May 1, 2024

gvbgduh May 1, 2024 •

edited

Loading

michaelsutton May 1, 2024

gvbgduh May 1, 2024

michaelsutton May 1, 2024 •

edited

Loading

michaelsutton commented May 1, 2024

Write panic errors to logs (#445) #446

Write panic errors to logs (#445) #446

Conversation

gvbgduh commented Apr 4, 2024

gvbgduh Apr 4, 2024

Choose a reason for hiding this comment

michaelsutton Apr 7, 2024

Choose a reason for hiding this comment

gvbgduh Apr 7, 2024 • edited Loading

Choose a reason for hiding this comment

gvbgduh Apr 7, 2024

Choose a reason for hiding this comment

gvbgduh commented Apr 4, 2024

aspect commented Apr 7, 2024

gvbgduh commented Apr 7, 2024

gvbgduh Apr 9, 2024

Choose a reason for hiding this comment

coderofstuff left a comment

Choose a reason for hiding this comment

biryukovmaxim commented Apr 23, 2024 • edited Loading

gvbgduh commented Apr 23, 2024

coderofstuff left a comment

Choose a reason for hiding this comment

gvbgduh commented May 1, 2024

biryukovmaxim May 1, 2024 • edited Loading

Choose a reason for hiding this comment

coderofstuff left a comment

Choose a reason for hiding this comment

gvbgduh commented May 1, 2024

biryukovmaxim commented May 1, 2024

coderofstuff left a comment

Choose a reason for hiding this comment

gvbgduh commented May 1, 2024

gvbgduh commented May 1, 2024

gvbgduh commented May 1, 2024

biryukovmaxim commented May 1, 2024 • edited Loading

gvbgduh commented May 1, 2024

michaelsutton commented May 1, 2024

gvbgduh commented May 1, 2024

gvbgduh May 1, 2024

Choose a reason for hiding this comment

gvbgduh May 1, 2024 • edited Loading

Choose a reason for hiding this comment

michaelsutton May 1, 2024

Choose a reason for hiding this comment

gvbgduh May 1, 2024

Choose a reason for hiding this comment

michaelsutton May 1, 2024 • edited Loading

Choose a reason for hiding this comment

michaelsutton commented May 1, 2024

gvbgduh Apr 7, 2024 •

edited

Loading

biryukovmaxim commented Apr 23, 2024 •

edited

Loading

biryukovmaxim May 1, 2024 •

edited

Loading

biryukovmaxim commented May 1, 2024 •

edited

Loading

gvbgduh May 1, 2024 •

edited

Loading

michaelsutton May 1, 2024 •

edited

Loading