Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend SymCC to perform forking exploration (not just concolic execution) #14

Open
pgarba opened this issue Aug 15, 2020 · 7 comments
Open
Labels
enhancement New feature or request

Comments

@pgarba
Copy link

pgarba commented Aug 15, 2020

Hi,

I'm testing with a simple test case and it seems to fail. Any idea what's going wrong ?

I'm compiling with -O0 and the expected result should be 0x10. It works with >=-O2 as the
for loop will be simplified by LLVM.

Thanks,
Peter

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

uint64_t g_value = 0x10;

int main(int argc, char *argv[]) {
  uint64_t x;
  if (read(STDIN_FILENO, &x, sizeof(x)) != sizeof(x)) {
    printf("Failed to read x\n");
    return -1;
  }

  uint64_t y=0;
  for (int i=0;i<x;i++) {
    ++y;
  }

  printf("%s\n", (y == g_value) ? "yes" : "no");

  return 0;
}
@sebastianpoeplau
Copy link
Collaborator

Hi @pgarba,

The behavior in your test case unfortunately depends on the input you pass to the program: if it's less than 0x10, it's very unlikely that you get 0x10 immediately (although you will eventually if you keep feeding the new test cases back to SymCC), while higher inputs are likely to give you 0x10 on the first attempt.

The reason is that, when the compiler doesn't simplify the for loop, new test cases are generated by the check i < x (because this is where the symbolic input x enters into a branch condition). Now, with a small x, say 0x00, the only time this check happens is when evaluating 0x00 < x to false - the solver will generate some larger x which indeed flips the outcome of the conditional, but it's very unlikely that you get exactly 0x10 (simply because the solver doesn't know that it's an important value at this point in the program). Now let's assume you run with a large input, say 0x20. Then the check happens in each iteration of the loop, one of them being i = 0x10 and thus evaluating 0x10 < x to true. At this point, the solver can flip the conditional by generating a test case x := 0x10 (which it does in my local tests). There is one more complication: The QSYM backend has a back-off mechanism that stops generating new test cases for the same conditional after some number of repetitions - if you're unlucky, it makes the analysis skip the comparison 0x10 < x... (But again, feeding SymCC's outputs back into the program will eventually get you there.)

In the optimized case, in contrast, the program probably sets y := x directly and then compares y == g_value, so it is easy for the solver to produce the test case x := g_value.

As far as I know, this is a problem in all symbolic executors. The best countermeasure, I suppose, is to have the compiler optimize the program as much as possible - this also helps to reduce the cost of solving, e.g., when the compiler replaces multiplications with shifts.

Hope this makes sense :)

Cheers,
Sebastian

@sebastianpoeplau sebastianpoeplau added the question Further information is requested label Aug 17, 2020
@pgarba
Copy link
Author

pgarba commented Aug 17, 2020

Thanks for your answer. After reading the SymCC paper it's a little bit more clear to me why this test case is failing.

What I still don't understand is why the true path for (y == g_value) will be not explored. Is it because you assume that y has become concrete as no test case has been generated that proves that its not ?

For this test cases I think KLEE can create a valid test cases that reaches the true path. It just takes longer to prove that one.

Thanks,
Peter

@sebastianpoeplau
Copy link
Collaborator

What I still don't understand is why the true path for (y == g_value) will be not explored. Is it because you assume that y has become concrete as no test case has been generated that proves that its not ?

When the program compares y == g_value, both variables are concrete: g_value is a constant, and y is initialized to the constant 0 and possibly incremented by the constant 1. Its value is, of course, dependent on control flow, which in turn depends on symbolic input, but the executor lacks that abstract view.

For this test cases I think KLEE can create a valid test cases that reaches the true path. It just takes longer to prove that one.

KLEE lacks an abstract view of loops just like SymCC, but it explores all possible paths where SymCC only gives you the "closest" new paths. Effectively, KLEE solves the test by trying all possible numbers of loop iterations (which is why it can take very long). You get the same result by running SymCC in a loop that feeds the newly generated test cases back into the program.

Cheers,
Sebastian

@pgarba
Copy link
Author

pgarba commented Aug 18, 2020

Would be it possible to have an option in SymCC to enforce exploration of all possible paths ? I'm asking as we would like to use SymCC in our tool SATURN to explorer all possible paths in a partial control flow graph that is lifted from an input binary.

Thanks,
Peter

@aurelf
Copy link
Member

aurelf commented Aug 18, 2020

Hi,

Would be it possible to have an option in SymCC to enforce exploration of all possible paths ?

Yes this would be possible, and nice to have. I imagine this could rely on doing a fork() at each branch instead of only following the concrete one, plus a way to schedule processes. Using the copy on write memory would allow to limit the performance impact. However, I imagine there will be weirdness with sockets and file handles and I'm not sure how best to handle this.

Best,
Aurélien

@aurelf aurelf changed the title Failing on simple test case Extend SymCC to perform forking exploration (not just concolic execution) Aug 18, 2020
@aurelf aurelf added the enhancement New feature or request label Aug 18, 2020
@sebastianpoeplau sebastianpoeplau removed the question Further information is requested label Oct 1, 2020
@AndrewQuijano
Copy link
Contributor

Hello,
Is there any update on this task and whether PR #157 would also fix this? Will PR #157 be completed relatively soon?

@sebastianpoeplau
Copy link
Collaborator

@AndrewQuijano I don't think #157 addresses this particular issue; it's an introspection feature, mainly intended to help with debugging incorrect expressions in SymCC and other projects using the same runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants