Revise the logging infrastructure #722

ivg · 2017-10-05T14:21:26Z

Our current system writes one log per run and provides a log rotation mechanism that ages the previous logs before starting a new one, thus keeping a history of the last 100 invocations.

The major limitation of this scheme is that it does not work well when multiple instances of bap run in parallel. Once a bap instance starts, it creates a new file named log and starts writing into it. If another instance is started before the first one finishes, a new file is created and the old one is renamed to log~1 (and log~1 is renamed to log~2, and so on until the oldest file, log~99, is reached, which is simply deleted). So in the end, the log file corresponds to the instance of bap that started last. That is not too bad, if we ignore the fact that it is hard to figure out which log corresponds to which instance. However, on a heavily loaded system, several instances of bap may enter the log-rotate sequence at the same time, and that sequence is neither reentrant nor protected. The ramifications are: (1) a log may simply be unlinked (a lost log), or (2) two instances may open the same file (a corrupted log). Basically, this is the classic TOCTOU class of bug.

The latter two issues can be fixed by introducing some kind of locking mechanism (as is done in the caching subsystem). However, locking will not solve the usability issue: it will still be hard to guess which log belongs to which instance.
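To make the locking idea concrete, here is a minimal sketch of a rotation step serialized by an advisory file lock. It is only an illustration under stated assumptions, not how bap or its caching subsystem actually does it: the lock file name `.rotate.lock`, the function names, and the use of `fcntl.flock` (Unix only) are all hypothetical choices made for this example.

```python
import fcntl
from pathlib import Path

MAX_BACKUPS = 99  # the oldest backup kept is log~99, as described above


def rotate(log_dir: Path, max_backups: int = MAX_BACKUPS) -> None:
    """Shift log -> log~1 -> log~2 ... and delete the oldest backup."""
    oldest = log_dir / f"log~{max_backups}"
    if oldest.exists():
        oldest.unlink()
    # Rename from the oldest backup down so that nothing is overwritten.
    for n in range(max_backups - 1, 0, -1):
        src = log_dir / f"log~{n}"
        if src.exists():
            src.rename(log_dir / f"log~{n + 1}")
    current = log_dir / "log"
    if current.exists():
        current.rename(log_dir / "log~1")


def open_fresh_log(log_dir: Path, max_backups: int = MAX_BACKUPS):
    """Rotate and create the new log while holding an exclusive lock."""
    log_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical lock file, kept separate from the logs themselves.
    with open(log_dir / ".rotate.lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until other rotators finish
        try:
            rotate(log_dir, max_backups)
            # Mode "x" fails if "log" already exists, which would reveal a race.
            return open(log_dir / "log", "x")
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Because the existence checks, the renames, and the creation of the new file all happen while the lock is held, no second instance can observe a half-rotated directory. The lock is advisory, so it only helps if every instance goes through the same function.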

Basically, we have the following options:

Keep everything as it is (just fix the TOCTOU bug), possibly logging each job into its own directory.
Push data from all instances to the same log, prefixing each entry with the PID. Enable log rotation based on the log size, or make it daily.
Create a unique log file for each instance, probably mixing in the PID, and implement some sort of GC that removes logs that are too old.
Something else?

Fix it and keep it

Basically, the scheme works fine with multiple instances of BAP if each instance uses its own folder. This is how we use it with IDA Pro. If there are several chains of calls, then each chain may have its own logging folder, so the sequence of logs will still be useful. We still need to fix the TOCTOU bug, as we cannot expect a user to adhere to this scheme. We also need to provide a command line interface for the logger directory location, as right now we rely on the BAP_LOG variable.

This is the solution of the least effort.

Big mess of messages

This is a standard solution in the style of syslog. All instances write to the same file, prefixing messages with the PID of the process. This approach has two main drawbacks:

It doesn't scale well. The log file soon becomes a point of contention, since we need to synchronize multiple instances of BAP on every event. The problem can be delayed by buffering, but it will still be a problem.
Logs will be unreadable without preprocessing. Basically, we will need to build tools that extract data from a log file. That doesn't sound like a lot of fun from either a user's or a developer's point of view.

The good things are:

We can reuse the system logger for that.
It is always easy to get one and only one log file, as we will in fact have a master log file (which can be piped to a console, so that we can always monitor the current activity in the platform).
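As a rough illustration of both the shared log and the preprocessing it would require, the sketch below appends PID-prefixed lines to one file and then splits that file back into per-instance streams. The `PID: message` line format and both function names are assumptions made for this example; the issue does not specify a format. It also assumes single-line messages and relies on each entry being written with one `write` call on a descriptor opened with `O_APPEND`.

```python
import os
from collections import defaultdict


def append_entry(path: str, message: str) -> None:
    """Append one PID-prefixed line using a single write on an O_APPEND fd."""
    line = f"{os.getpid()}: {message}\n".encode()
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, line)
    finally:
        os.close(fd)


def split_by_pid(path: str) -> dict:
    """Group the messages of a shared log by the PID prefix of each line."""
    per_pid = defaultdict(list)
    with open(path) as log:
        for line in log:
            pid, sep, message = line.rstrip("\n").partition(": ")
            # Skip lines that do not follow the assumed "PID: message" shape.
            if sep and pid.isdigit():
                per_pid[int(pid)].append(message)
    return dict(per_pid)
```

Even this small demultiplexer shows the cost mentioned above: every reader of the log needs a tool like `split_by_pid` before the output of a single instance becomes readable.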

Lots of logs

We may just rely on temporary files with a PID and a timestamp mixed in, so that each running instance always gets a unique but more or less identifiable file. This is a simple and well-established mechanism that scales well and has basically only one significant drawback: it is hard for a user to find the log file that corresponds to a particular instance. Especially after the instance has finished, who knows what its PID was?
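A minimal sketch of this option, together with the GC mentioned in the list of options, could look as follows. The `bap-<timestamp>-<pid>.log` naming pattern, the age-based cleanup policy, and the function names are assumptions made for illustration only.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def unique_log_path(log_dir: Path) -> Path:
    """Build a file name from the current time and the PID of this process."""
    stamp = time.strftime("%Y%m%dT%H%M%S")
    return log_dir / f"bap-{stamp}-{os.getpid()}.log"


def collect_garbage(log_dir: Path, max_age_seconds: float,
                    now: Optional[float] = None) -> List[Path]:
    """Delete logs whose modification time is older than max_age_seconds."""
    now = time.time() if now is None else now
    removed = []
    for path in log_dir.glob("bap-*.log"):
        if now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

Opening the resulting path with mode `"x"` would additionally make a name collision (for example, a reused PID within the same second) fail loudly instead of silently sharing a file.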

Best of all?

Maybe we can try some mixture of these solutions, such as having unique log files that are linked into a chain of logs in order of creation, with everything governed by a master log, kept by the logging system itself, that records every time an instance starts and finishes.
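One possible reading of this mixed scheme is sketched below: each instance writes to its own uniquely named file, while a small master log records when the instance starts and finishes and which file it used. The file name `master.log`, the record format, and the `instance_log` helper are hypothetical; the issue leaves all of these details open.

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path


def _record(master: Path, event: str, log_path: Path) -> None:
    """Append one start/finish record to the master log in a single write."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    line = f"{stamp} {event} pid={os.getpid()} log={log_path.name}\n"
    fd = os.open(master, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, line.encode())
    finally:
        os.close(fd)


@contextmanager
def instance_log(log_dir: Path):
    """Yield a per-instance log file and register it in the master log."""
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    log_path = log_dir / f"bap-{stamp}-{os.getpid()}.log"
    master = log_dir / "master.log"
    _record(master, "start", log_path)
    try:
        with open(log_path, "a") as log:
            yield log
    finally:
        # Recorded even if the instance fails, so the chain stays complete.
        _record(master, "finish", log_path)
```

The master log then answers the question raised for the per-instance files: reading it in order gives the chain of runs and the log file that belongs to each one, while the high-volume messages never contend for a shared file.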

Discussion

This is marked as a discussion issue, so everyone is welcome to support this or that option, suggest their own solutions, or just ask questions.

issue-sh · 2017-11-08T21:03:16Z

ivg set pipeline to Icebox

ivg · 2018-08-08T14:35:39Z

In the current state we are able to specify the log file directory location via command line parameters, so the common workflow now is to run each analysis in a specific directory with its own log files.

The TOCTOU bug is still not fixed. @gitoleg, please fix it for 1.5

ivg added the discuss label Oct 5, 2017

issue-sh bot added the Icebox label Nov 8, 2017

ivg removed the Icebox and discuss labels Aug 8, 2018

ivg assigned gitoleg Aug 8, 2018

ivg added this to the 1.5.0 milestone Aug 8, 2018

ivg modified the milestones: 1.5.0, 1.6.0 Feb 18, 2019

gitoleg mentioned this issue Mar 28, 2019

fixes TOCTOU bug in bap log #937

Merged

gitoleg closed this as completed in #937 Apr 2, 2019
