[Question] Tainting Program Input #1094
Comments
Do you mean the command-line input? It is surprisingly hard :) Like for simple binaries, it is totally doable, just taint the
;; file taint-main.lisp -- taints the inputs of the main function
(require posix)
(defmethod call (name argc argv)
  (when (= name 'main)
    (+= argv (sizeof ptr_t)) ; skip the binary name
    (while (read-word ptr_t argv)
      (let ((arg (read-word ptr_t argv)))
        (taint-introduce-indirectly
         'command-line-arguments arg (strlen arg)))
      (+= argv (sizeof ptr_t)))))
So far so good, and if you take a simple example from our testsuite it will work like a charm
Instead of propagating the taint event to the static representation of the program we just record all observations of the
You can further cut the address from the event using
These are the instructions that are tainted by the input. If your binary is compiled with the debugging information, you can even pipe them through
But here comes the real world, and there it is never that easy. In the real world (e.g., in
To analyze real binaries en masse, we do not focus on command-line arguments only, but generalize to all inputs that come to the binary. So, we track the known user-input functions and taint the data that they define. In some of our projects, we even employed a special module that taints all upper exposed variables, so that we can be sure that argv-pointed data and environment variables are also included in the set. I hope that it gives enough starting points :)
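As a rough illustration of "track the known user-input functions and taint the data that they define", here is a minimal Python toy model. It is not BAP or Primus code: the table `INPUT_SOURCES`, the class `TaintState`, and the hook `on_call_return` are hypothetical names invented for this sketch, and memory is modelled simply as a set of tainted byte addresses.

```python
# Toy model only: INPUT_SOURCES, TaintState and on_call_return are
# hypothetical names, not part of BAP or Primus.
from dataclasses import dataclass, field

# For each known input function: the index of the argument that points to
# the buffer it fills, and where the number of defined bytes comes from
# (an argument index, or "ret" for the return value).
INPUT_SOURCES = {
    "read": {"buf": 1, "len": "ret"},
    "recv": {"buf": 1, "len": "ret"},
    # fgets writes at most n-1 characters plus a NUL, so using n
    # over-approximates the tainted region slightly.
    "fgets": {"buf": 0, "len": 1},
}


@dataclass
class TaintState:
    tainted: set = field(default_factory=set)  # tainted byte addresses

    def introduce(self, addr, size):
        """Mark size bytes starting at addr as tainted."""
        self.tainted.update(range(addr, addr + size))

    def is_tainted(self, addr, size=1):
        """True if any byte in [addr, addr + size) is tainted."""
        return any(a in self.tainted for a in range(addr, addr + size))


def on_call_return(state, name, args, ret):
    """Taint the buffer defined by a known input function.

    Returns True if taint was introduced, False otherwise.
    """
    spec = INPUT_SOURCES.get(name)
    if spec is None:
        return False  # not a known input source
    size = ret if spec["len"] == "ret" else args[spec["len"]]
    if size is None or size <= 0:
        return False  # error or end of input: nothing was defined
    state.introduce(args[spec["buf"]], size)
    return True
```

For example, `on_call_return(state, "read", [3, 0x1000, 64], 10)` marks the ten bytes starting at `0x1000` as tainted, while a call to a function that is not in the table leaves the state unchanged.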
Indeed, thanks for noticing this. We had to delete the old documentation because we ran out of space on github.io. I think that I have fixed all of them, tell me if you find any non-working reference. |
Indeed, thank you very much already :)
Luckily, I have already compiled all the coreutils myself, disabling ASLR as well as stack canaries, and also not stripping debug information. (I have done that to speed up CFG generation). Using
Using taint-sources instead of taint-main, I do not get any (taint-attached) results. Do I perhaps need to further change the arguments in some way?
At the time of writing this, the two links under here are still invalid. |
There is no taint propagation between the filename and its contents by default, as those are indeed two different pieces of data that do not intersect. But you can, of course, enable it with the following simple Primus Lisp script (sorry, I didn't have time to check if it works)
(defmethod call-return (name path fd)
  (when (and (= name 'open)
             (taint-get-direct 'command-line-arguments path))
    (dict-add 'tainted-file-descriptors fd path)))
(defmethod call-return (name fd buf _ bytes-read)
  (when (and (= name 'read)
             (dict-has 'tainted-file-descriptors fd))
    (taint-introduce-indirectly 'user-input buf bytes-read)))
Just add it to the
Now, we see the call to read that returns
yeah, add also |
Hm, it doesn't seem to work quite yet, I get the same addresses.
Also here I still get no taint-attached results. :/ |
Can you drop the binary here (in a zip to make GitHub happy), so that we can be on the same page?
then |
Here it is: cat.zip
Like this: |
I see, we forgot that there is the flags parameter in
So, with the following code
(require posix)
(require pointers)
(defmethod call (name argc argv)
  (when (= name 'main)
    (+= argv (sizeof ptr_t)) ; skip the binary name
    (while (read-word ptr_t argv)
      (let ((arg (read-word ptr_t argv)))
        (taint-introduce-indirectly
         'command-line-arguments arg (strlen arg)))
      (+= argv (sizeof ptr_t)))))
(defmethod call-return (name path _ fd)
  (when (and (= name 'open)
             (taint-get-indirect 'command-line-arguments path))
    (dict-add 'tainted-file-descriptors fd path)))
(defmethod call-return (name fd buf _ bytes-read)
  (when (and (= name 'read)
             (/= bytes-read 0)
             (/= bytes-read -1)
             (dict-has 'tainted-file-descriptors fd))
    (taint-introduce-indirectly 'user-input buf bytes-read)))
we can run bap as
This will add only one address to the set of tainted addresses, the long awaited
You can even then zip it (together with the
On my machine it will investigate 765 more paths and discover about 30 more instructions that potentially depend on the user input. It will now cover nearly half of the program. This is the code that we were able to reach from main. It looks like the binary is static and includes a lot of unreachable code, as far as I can see. |
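To make the logic of the three hooks in the script above easier to follow, here is a minimal Python model of the same bookkeeping. It is only a sketch and not how Primus implements taint: the class `Taints` and the functions `on_main`, `on_open_return` and `on_read_return` are hypothetical names, strings are represented by their start address and length, and the `bytes_read > 0` check stands in for the two `/=` tests on the return value of read.

```python
# Toy model of the taint-main script; all names here are hypothetical.
class Taints:
    def __init__(self):
        self.by_kind = {}      # taint kind -> set of tainted byte addresses
        self.tainted_fds = {}  # file descriptor -> address of the path string

    def introduce(self, kind, addr, size):
        """Mark size bytes starting at addr as tainted with the given kind."""
        self.by_kind.setdefault(kind, set()).update(range(addr, addr + size))

    def has(self, kind, addr):
        """True if the byte at addr carries taint of the given kind."""
        return addr in self.by_kind.get(kind, set())


def on_main(taints, argv):
    """argv is a list of (address, length) pairs; argv[0] is the binary name."""
    for addr, length in argv[1:]:  # skip the binary name
        taints.introduce("command-line-arguments", addr, length)


def on_open_return(taints, path, flags, fd):
    """open(path, flags) returned fd: remember fd if the path is tainted."""
    if taints.has("command-line-arguments", path):
        taints.tainted_fds[fd] = path


def on_read_return(taints, fd, buf, count, bytes_read):
    """read(fd, buf, count) returned bytes_read: taint what was read."""
    # Skip end of file (0) and errors (-1), as the script does.
    if bytes_read > 0 and fd in taints.tainted_fds:
        taints.introduce("user-input", buf, bytes_read)
```

Note how forgetting the flags parameter would shift the arguments of the open hook, so the value bound to `fd` would not be the descriptor returned by open and the later lookup in the read hook would never match.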
First of all, thanks a lot for your efforts! Running the updated taint-main I run into some weird problems though. |
Nope, it's a bug, fixed in #1049
That's strange. Did you restore the contents of the file? As in the standard interpretation mode the empty file will not induce any data dependency on write. Also, take a look in the log file (it is either in
In the standard interpretation mode, yes, it will contain only the contents of those files. In promiscuous mode it may contain random values as the interpreter will randomly explore various paths with random inputs.
Yep, |
Running it again (making 100% sure foo and bar are not empty), I don't get anything with "error", "fail", "missing" etc in the log. The only remotely interesting lines I found are
as well as
which probably are nothing out of the ordinary. This might be far-fetched, but does this look alright? (I can only suspect that the execution potentially stops prematurely)
|
Yep, everything looks alright. But do you see the file contents in the stdout file? It looks like you need to update. I don't remember the details, but without #1049 it won't work, as the file contents are deleted before they are read. |
Using the current master (instead of the 2.0.0) version, I now got it working :) Thanks a lot again. |
You may also find this discussion in our Gitter channel interesting. |
@ivg One question about the observation log, is there a way of adding or recovering information about the order in which the different addresses have been hit? (Like a control flow recovery, in order to determine the "last" address tainted). |
Hello,
I want to taint the input of a program (64-bit ELF on Linux, to be precise). For example, when running cat on one or more files, I'd like to know at which instructions (specifically, at what addresses) the input changes or influences the program execution, up until one or more sinks are reached (in this case a sink would be cat writing to stdout).
So far, I have tried to use primus-taint, however I am not quite sure how I could use the IR-results to get the info I need. Is there a simple way of achieving the aforementioned goal?
On a side note I might add that the links under Running Primus Taint Analysis point to deleted posts.
All help is appreciated, thanks in advance.