-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Write panic errors to logs (#445) #446
Conversation
core/src/panic.rs
Outdated
@@ -8,6 +9,23 @@ pub fn configure_panic() { | |||
default_hook(panic_info); | |||
println!("Exiting..."); | |||
// TODO: setup a wait time and fold the log system properly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also this comment, it's probably partially addressed, but I'm not sure about the case when the logger is not initialized yet and also what's meant by the wait time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this means we should make sure that logs are actually flushed into the files before the process exits. This can possibly be done by making special calls to the underlying log library (maybe @tiram88 would know how), or at least adding a short sleep here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, it's also the requirement I haven't clarified yet. When is it ok to lose the logs?
Even if we try to wait, the logger might never be initialized as the panic condition might happen before.
Another approach is probably to initialize a fallback logger within the hook or even write to the log file directly if needed.
But, it can cause a race condition, as the logger system can only be initialized once (link), unless we use the different mechanism for the fallback.
But, even with this, logs still will be lost in cases like no disk space or potentially the lack of resources/permissions.
Also, the default behavior of the log
facade is to ignore failures writing logs:
- Faillible Log interface rust-lang/log#382
- Is this possible to make a failable log facade ? rust-lang/log#588
And it seems that there is not much can be done other than deciding whether to panic on log failures globally or not. (Unless we can rely on the specific loggers implementations, but more research needed.)
It's not very helpful in this case, as panicking on a log failure inside the panic itself should causes an infinite recursion code flow, but it should be possible to avoid.
So, what would be the reasonable approach and what trade offs are acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably propose to keep this implementation, but also write the panic error to the dedicated file.
In this case, we write to logs when the logger is available, and we write the panic file regardless of the logger status (as long as we can write to the file system).
The issue here is it probably needs some standardization, like location of the file and its content/structure, as it might also be helpful to extend it to the panic report of sorts, providing additional details about the host.
But, I'm not too sure how suitable this approach is.
I'm still thinking about unit tests |
I am concerned that calling |
It's quite the opposite, the default behavior is to silently swallow all log failures, I have corresponding links in another comment. But a static atomic bool might still be of the help. |
core/src/log/mod.rs
Outdated
@@ -16,6 +17,8 @@ mod logger; | |||
} | |||
} | |||
|
|||
pub static LOGGING_INITIALIZED: AtomicBool = AtomicBool::new(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on a second thought, it might be better to keep it private and have only a public getter in the module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this with these configs, adding a panic in the code
--logdir=./logs
- panic is logged in both std and error log files as anError
--nologfiles
- still works fine
PR is functional.
@gvbgduh hi! Thanks for your efforts, it's nice to see our gang of contributors getting bigger!
impl Runtime {
pub fn from_args(args: &Args) -> Self {
// Configure the panic behavior
kaspa_core::panic::configure_panic();
let log_dir = get_log_dir(args);
// Initialize the logger
kaspa_core::log::init_logger(log_dir.as_deref(), &args.log_level);
Self { log_dir: log_dir.map(|log_dir| log_dir.to_owned()) }
}
} could you check them?
and direct writing to file when it's not initialized |
thanks @biryukovmaxim, very interesting. thb, i haven't exercised the thought about i'd agree we don't really need atomic bool as i see it just recreates the fallback logic from the link in
yes, from the practical point of view, i'm really on the fence it's worth the trouble, as i'm not sure there is a lot of business value in covering the gaps for panic before the logging system is up as in most cases it would be within the interaction with CLI. i'd personally propose to revert to the first revision: we do our best effort to log, but it'll be up to the logging system; but, i'm happy to follow the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address @biryukovmaxim's comments
Reason: Waiting here will block the thread, and as the thread is locked and we wait for the logger system to be initiated that would wait for the thread to be released itself, making the waiting pointless
As waiting there is pointless, reverting to the initial commit, where logging is done relying on the logging fallback itself. |
core/src/panic.rs
Outdated
}, | ||
}; | ||
// Log the panic | ||
error!("Panic at {}:{}: {}", file, line, message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PanicInfo
implements Display
almost the same way as yours. Could you check how the hook work with panic_any
passing custom type?
There's method message
, however it's only accessable on nightly toolchain. Although std
uses it when implements Display
I would suggest to use Display
fmt if it works with panic_any, otherwise there's no difference to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I retested this by injecting a panic!
in the code and also running with --nologfiles
and it's functional.
Sample logged panic found in the log files:
2024-04-30 22:52:39.202-06:00 [INFO ] kaspad v0.13.6
2024-04-30 22:52:39.202-06:00 [INFO ] Application directory: /home/dev/.rusty-kaspa
2024-04-30 22:52:39.202-06:00 [INFO ] Data directory: /home/dev/.rusty-kaspa/kaspa-mainnet/datadir
2024-04-30 22:52:39.202-06:00 [INFO ] Logs directory: ./logs
thread 'main' panicked at kaspad/src/daemon.rs:249:5:
At the disco
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Exiting...
2024-04-30 22:52:39.203-06:00 [ERROR] Panic at kaspad/src/daemon.rs:249: At the disco
PR looks much simpler than before.
Testing
Manually introduced panic after login init:
Stderr:
Logs:
|
@gvbgduh could you go through the code and reorder all places where panic handler is settled before logger initialization? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, my bad. Please check the comments again.
Also, I noticed in the full panic log it looks like this:
thread 'main' panicked at kaspad/src/daemon.rs:249:5:
At the disco
Notice the :249:5:
above but the log in the file only has this:
2024-04-30 22:52:39.203-06:00 [ERROR] Panic at kaspad/src/daemon.rs:249: At the disco
The panic log in the file is missing the :5
- this could be useful in lines where there are multiple things happening.
@biryukovmaxim yep, can do |
Updated test:
|
@biryukovmaxim just for a thought exercise, is it worth considering cases when there is panic before |
I think it's a rare case, that mostly can appear during development phase, where regular panic should be okay. |
@biryukovmaxim thanks, the last commit should cover all places where the panic hook is set |
The name of the panicking thread is important too |
good point, update the error message
|
@@ -19,18 +15,17 @@ pub fn configure_panic() { | |||
Some(s) => *s, | |||
None => match panic_info.payload().downcast_ref::<String>() { | |||
Some(s) => &s[..], | |||
None => "unknown", | |||
None => "Box<dyn Any>", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, what would it produce?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or it's still the string? is there a reason for such a value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see, thanks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also find a way to use this https://doc.rust-lang.org/src/core/panic/location.rs.html#198 but I think we're good
Merged. Congratulations @gvbgduh for the first contribution 💯 |
* Write panic errors to logs (kaspanet#445) * Remove the comment as it doesn't need to be addressed Reason: Waiting here will block the thread, and as the thread is locked and we wait for the logger system to be initiated that would wait for the thread to be released itself, making the waiting pointless * Add column to the log message for the panic error * Move panic setup after the logger init in all places * Add the thread name to the panic error * default hook invoke position + minor style changes
* Write panic errors to logs (kaspanet#445) * Remove the comment as it doesn't need to be addressed Reason: Waiting here will block the thread, and as the thread is locked and we wait for the logger system to be initiated that would wait for the thread to be released itself, making the waiting pointless * Add column to the log message for the panic error * Move panic setup after the logger init in all places * Add the thread name to the panic error * default hook invoke position + minor style changes
* Write panic errors to logs (kaspanet#445) * Remove the comment as it doesn't need to be addressed Reason: Waiting here will block the thread, and as the thread is locked and we wait for the logger system to be initiated that would wait for the thread to be released itself, making the waiting pointless * Add column to the log message for the panic error * Move panic setup after the logger init in all places * Add the thread name to the panic error * default hook invoke position + minor style changes
Loggin from
panic::set_hook
.It uses
error
macro from the same package modulelog
It works, but only after the logger is initialized by the kaspad daemon, before that it does nothing, so there is a gap.