Revise the logging infrastructure #722

Closed
ivg opened this issue Oct 5, 2017 · 2 comments · Fixed by #937

Member

ivg commented Oct 5, 2017

Our current system provides one log per run, together with a log rotation mechanism that ages the previous logs before starting a new one, thus keeping a history of the last 100 invocations.

The major limitation of this scheme is that it does not work well when multiple instances of bap run in parallel. When a bap instance starts, it creates a new file named log and starts writing into it. If another instance starts before the first one finishes, a new file is created and the old one is renamed to log~1 (log~1 is renamed to log~2, and so on, until the oldest file, log~99, is reached, which is simply deleted). So, in the end, the log file corresponds to the instance of bap that started last. That is not too bad, if we ignore the fact that it is hard to figure out which log corresponds to which instance.

However, on a heavily loaded system, several instances of bap may enter the log-rotate sequence at the same time, and that sequence is non-reentrant and unprotected. The ramifications are: (1) a log may simply be unlinked, so it is lost, or (2) two instances may open the same file, so the log is corrupted. Basically, this is the classic TOCTOU class of bug.

These two issues can be fixed by introducing some kind of locking mechanism (as is done in the caching subsystem). However, locking will not solve the usability issue: it will still be hard to tell which log belongs to which instance.
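To make the locking idea concrete, here is a minimal sketch in Python (not BAP's actual implementation, and not necessarily how the caching subsystem does it). It assumes a POSIX system with advisory `flock` locks; the lock file name `.lock` and the function names are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager

MAX_BACKUPS = 99  # log plus log~1 .. log~99 gives a history of 100 runs


@contextmanager
def rotation_lock(log_dir):
    """Hold an exclusive advisory lock for the duration of the rotation."""
    lock_path = os.path.join(log_dir, ".lock")  # hypothetical lock file name
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another process rotates
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def rotate_and_open(log_dir):
    """Shift log -> log~1 -> ... -> log~99, then open a fresh log file."""
    os.makedirs(log_dir, exist_ok=True)
    current = os.path.join(log_dir, "log")
    with rotation_lock(log_dir):
        oldest = f"{current}~{MAX_BACKUPS}"
        if os.path.exists(oldest):
            os.unlink(oldest)
        # Rename from the oldest to the newest so nothing is overwritten.
        for i in range(MAX_BACKUPS - 1, 0, -1):
            src = f"{current}~{i}"
            if os.path.exists(src):
                os.rename(src, f"{current}~{i + 1}")
        if os.path.exists(current):
            os.rename(current, f"{current}~1")
        # O_EXCL makes the open fail instead of silently sharing a file.
        fd = os.open(current, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    return os.fdopen(fd, "w")
```

The checks and renames are still separate system calls, but they are now serialized between cooperating processes, so no instance can observe a half-finished rotation. An instance whose file was renamed keeps writing to it through its open descriptor.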

Basically, we have the following options:

  1. Keep everything as it is (just fix the TOCTOU bug), possibly logging each job into its own directory.
  2. Push data from all instances to the same log, prefixing each entry with the PID. Enable log rotation based on the log size, or rotate daily.
  3. Create a unique log file for each instance, probably mixing in the PID, and implement some sort of GC that removes logs that are too old.
  4. Something else?

Fix it and keep it

Basically, the scheme works fine with multiple instances of BAP if each instance uses its own folder. This is how we use it with IDA Pro. If there are several chains of calls, then each chain may have its own logging folder, so the sequence of logs will be useful. We still need to fix the TOCTOU bug, as we cannot expect a user to adhere to this scheme. We also need to provide a command line interface for the logger directory location, as right now we rely on the BAP_LOG variable.

This is the least-effort solution.
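As a rough illustration of the command line part, the lookup order could be: explicit option first, then the BAP_LOG variable, then a default. The sketch below is in Python and is only an assumption about how this might look; the `--log-dir` flag name, the default value, and the reading of BAP_LOG as a directory path are all hypothetical.

```python
import argparse
import os


def resolve_log_dir(argv, environ=None, default="log"):
    """Pick the log directory: command line, then BAP_LOG, then a default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--log-dir", default=None)  # hypothetical flag name
    # Unknown arguments are left alone for the rest of the program to parse.
    args, _rest = parser.parse_known_args(argv)
    if args.log_dir is not None:
        return args.log_dir
    if environ.get("BAP_LOG"):  # assumed here to hold a directory path
        return environ["BAP_LOG"]
    return default  # hypothetical fallback location
```

With this order, a wrapper script can give every job its own folder on the command line without touching the environment of the other jobs.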

Big mess of messages

This is a standard solution in the style of syslogger. All instances write to the same file, prefixing each message with the PID of the process. This approach has two main drawbacks:

  1. It doesn't scale well. The log file becomes a congestion point very soon, since we need to synchronize multiple instances of BAP on every event. The problem can be delayed by buffering, but it will still be a problem.
  2. Logs will be unreadable without preprocessing. Basically, we will need to build tools that extract data from a log file. That doesn't sound like a lot of fun from either a user's or a developer's point of view.

The good things are:

  1. We can reuse the system logger for that.
  2. It is always easy to get one and only one log file, as we will in fact have a master log file (that can be piped to a console, so that we can always monitor the current activity in the platform).
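For illustration, a shared log with PID-prefixed entries and the kind of extraction tool mentioned above could look like the following Python sketch. The `PID: message` line format and both function names are hypothetical, and messages are assumed to be single lines. Each entry is written with one `write` call on a descriptor opened with `O_APPEND`; on a local POSIX filesystem such small appends normally do not interleave, but that is not a hard guarantee (for example on NFS).

```python
import os


def append_entry(path, message):
    """Append one PID-prefixed line to the shared log in a single write."""
    line = f"{os.getpid()}: {message}\n".encode()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line)
    finally:
        os.close(fd)


def entries_for(path, pid):
    """Return the messages that the process with the given PID has logged."""
    prefix = f"{pid}: "
    with open(path) as log:
        return [line[len(prefix):].rstrip("\n")
                for line in log if line.startswith(prefix)]
```

Opening the file for every entry is the simplest variant, not the fastest one. A real logger would keep the descriptor open, which is where the congestion concern from the first drawback comes in.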

Lots of logs

We may just rely on temporary files with a PID and a time stamp mixed in, so that each running instance always gets a unique but more or less identifiable file. This is a simple and well-established mechanism that scales well and has, basically, only one significant drawback: it is hard for a user to find the log file that corresponds to a particular instance. Especially after the instance has finished, who knows what its PID was?
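A minimal Python sketch of this option, together with the GC mentioned in the list of options, might look as follows. The `bap-<timestamp>-<pid>-` naming scheme, the `.log` suffix, and the age-based policy are assumptions made for the example.

```python
import os
import tempfile
import time


def open_unique_log(log_dir):
    """Create a log file whose name carries a time stamp and the PID."""
    os.makedirs(log_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    prefix = f"bap-{stamp}-{os.getpid()}-"  # hypothetical naming scheme
    # mkstemp adds a random part and creates the file exclusively, so the
    # name is unique even if a PID is reused within the same second.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".log", dir=log_dir)
    return os.fdopen(fd, "w"), path


def collect_garbage(log_dir, max_age_seconds, now=None):
    """Remove log files that were last modified too long ago."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".log"):
            continue
        path = os.path.join(log_dir, name)
        try:
            if now - os.stat(path).st_mtime > max_age_seconds:
                os.unlink(path)
                removed.append(path)
        except FileNotFoundError:
            pass  # another instance has already collected this file
    return removed
```

Because no two instances ever share a file name, there is nothing to rotate and nothing to lock. The GC tolerates concurrent runs by ignoring files that disappear underneath it.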

Best of all?

Maybe we can try some mixture of solutions. For example, each instance could have a unique log file, with the files linked into a chain of logs in the order of creation, and everything governed by a master log of the logging system itself, which records every time an instance starts and finishes.
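One possible reading of this mixture, sketched in Python: every instance gets its own file, and a small master log records when each instance starts and finishes and where its log lives. The `master.log` name, the record format, and the function names are hypothetical. The order of the records in the master log then plays the role of the chain of logs.

```python
import os
import tempfile
import time
from contextlib import contextmanager


def _record(master_path, event, log_path):
    """Append one line to the master log with a single O_APPEND write."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    line = f"{stamp} {event} pid={os.getpid()} log={log_path}\n"
    fd = os.open(master_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line.encode())
    finally:
        os.close(fd)


@contextmanager
def instance_log(log_dir):
    """Open a per-instance log and register it in the master log."""
    os.makedirs(log_dir, exist_ok=True)
    master = os.path.join(log_dir, "master.log")  # hypothetical file name
    fd, path = tempfile.mkstemp(prefix=f"bap-{os.getpid()}-",
                                suffix=".log", dir=log_dir)
    _record(master, "started", path)
    try:
        with os.fdopen(fd, "w") as log:
            yield log, path
    finally:
        _record(master, "finished", path)
```

The master log receives only two short lines per run, so it should not become the congestion point that a fully shared log would be, and it answers the question of which file belongs to which instance.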

Discussion

This is marked as a discussion issue, so everyone is welcome to support this or that option, suggest their own solutions, or just ask questions.

ivg added the discuss label Oct 5, 2017

issue-sh bot commented Nov 8, 2017

ivg set pipeline to Icebox

Member Author

ivg commented Aug 8, 2018

Currently, the log directory can be specified via command line parameters, so the common workflow now is to run each analysis in its own directory with its own log files.

The TOCTOU bug is still not fixed. @gitoleg, please fix it for 1.5.

ivg added this to the 1.5.0 milestone Aug 8, 2018
ivg modified the milestones: 1.5.0, 1.6.0 Feb 18, 2019