Describe problem solved by this pull request
The logger sometimes doesn't create a log file.
If the logger is stopped at an unfortunate time, the `buffer.close_file()` call in the `log_writer_file` run loop can be skipped. The lock is released here:
https://github.com/PX4/Firmware/blob/master/src/modules/logger/log_writer_file.cpp#L274-L277
which allows `_buffers[0]._should_run` to change to false. The file has not been closed yet, because `_buffers[0]._should_run` was still true when that check happened. The logger now has no new data, so `notify()` is not called, which leaves the wait at
https://github.com/PX4/Firmware/blob/master/src/modules/logger/log_writer_file.cpp#L299
blocked indefinitely.
When the logger is started again, the while loop at
https://github.com/PX4/Firmware/blob/master/src/modules/logger/log_writer_file.cpp#L90-L95
runs forever, waiting for the file to be closed, while the close itself is blocked in `pthread_cond_wait(&_cv, &_mtx)`. As a result, no new log file is created.
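The interleaving above is a classic lost wakeup. The sketch below replays it step by step in a single thread so the ordering is explicit. It is a hypothetical model, not PX4 code: `SharedState`, `notify()` and `replay_lost_wakeup()` are illustrative names, and it assumes the run loop decides between the write path and the `close_file()` path while holding the lock, before releasing it to write. Whether the real stop path calls `notify()` is not shown in this description; the model covers both cases, because a notification sent while the writer is not yet waiting has no effect.

```cpp
// Hypothetical, single-threaded model of the shared state; the names only
// mirror identifiers mentioned above and are not PX4's actual types.
struct SharedState {
    bool should_run = true;       // stands in for _buffers[0]._should_run
    bool file_open = true;        // the log file has not been closed yet
    bool writer_waiting = false;  // writer is blocked in pthread_cond_wait()
};

// A condition variable does not remember notifications: notify() only
// wakes a thread that is already waiting, otherwise it has no effect.
void notify(SharedState &s) {
    if (s.writer_waiting) {
        s.writer_waiting = false;
    }
}

// Replays the problematic interleaving one step at a time.
SharedState replay_lost_wakeup() {
    SharedState s;

    // Writer, lock held: _should_run is still true, so this iteration takes
    // the write path instead of the close_file() path.
    const bool take_close_path = !s.should_run && s.file_open;  // false here
    if (take_close_path) {
        s.file_open = false;  // close_file() would run here, but is skipped
    }

    // Writer releases the lock to write data (L274-L277). Meanwhile the
    // logger stops: _should_run becomes false. Even if the stop path calls
    // notify(), the writer is not waiting yet, so nothing is woken.
    s.should_run = false;
    notify(s);

    // Writer re-acquires the lock and waits for the next notify() (L299).
    // No new data means no further notify(), so the file stays open.
    s.writer_waiting = true;
    return s;
}
```

The resulting state is the stuck one: `should_run` is false, the file is still open, and the writer is waiting for a notification that will not come.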
Describe your solution
Do not wait for new data if the logger is not running. Instead, run the loop once more so that the remaining data is written and the file is closed.
Test data / coverage
Because this is a race condition, the problem appears at random when stopping the logger; it also occurs in SITL. I debugged the problem and tested a solution using print statements. Since the failure is random, I validated the fix by catching cases where the loop would have been stuck in the wait but was "saved" by the fix instead: one more loop iteration ran and the file was closed.
Additional context
Our system basically never shuts down, so once this bug happens we stop getting logs until we notice (usually too late) and reboot the drone.