Implement Delta Debugging and compare to FLiT Bisect #171

Open · mikebentley15 opened this issue Jul 10, 2018 · 0 comments
Labels: documentation, enhancement, python

I just read the original paper introducing Delta Debugging. It is a very good read, and it is surprising that the technique is not better known. It can be coerced into doing exactly what FLiT bisect sets out to accomplish: it looks for a minimal set of items that still causes the test function to return false. It could potentially be used in one of the following ways (and perhaps more):

  1. Have a test that returns false only when you get the exact bad answer (all bad files)
  2. Have a test that returns false only when you get the exact good answer (all good files)
  3. Have a test that returns false if there is any difference from the good answer; find the minimal set, remove it, rinse, and repeat (similar to the bisection algorithm used now)

Numbers 1 and 2 will not necessarily produce the same files. In fact, if some change only becomes noticeable when the optimization is applied to two files together, then the second approach will place one of those two bad files in the list of good files, which would be misleading.

I'm thinking number 1 will be the most performant way to find all necessary optimization effects, but we can compare numbers 1 and 3 against each other and against the existing bisect implementation. The advantage of doing number 3 is that it identifies which files are independent (i.e., cause a problem by themselves) and which ones need to be optimized together (in groups of two, three, or more). A sketch of these three test predicates is below.
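
To make the three variants concrete, here is a minimal sketch of what the test predicates could look like. The run_with_optimized() helper and the reference values are hypothetical stand-ins for illustration only, not FLiT's actual API:

#include <set>
#include <string>

// Hypothetical helper (not FLiT's real API): compile the given files
// with the optimization under study and everything else at baseline,
// then run the test program and return its numerical result.
double run_with_optimized(const std::set<std::string>& files);

// Reference answers, measured once up front (placeholder values):
const double bad_answer  = 0.0;  // result with every file optimized
const double good_answer = 1.0;  // result with no files optimized

// Approach 1: fail (return false) only on the exact bad answer.
bool test_exact_bad(const std::set<std::string>& files) {
  return run_with_optimized(files) != bad_answer;
}

// Approach 2: fail only on the exact good answer.
bool test_exact_good(const std::set<std::string>& files) {
  return run_with_optimized(files) != good_answer;
}

// Approach 3: fail on any deviation from the good answer.
bool test_any_difference(const std::set<std::string>& files) {
  return run_with_optimized(files) == good_answer;
}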

The Delta Debugging algorithm can be defined by the following:

Let C be the full set of inputs that causes a function test() to return failure, where test(<empty>) returns success. The Delta Debugging algorithm determines an approximation to the smallest subset c of C that still causes test() to fail, such that if any single element of c were removed, the test would no longer fail (the paper calls this property 1-minimality).

set ddmin(set C, function test) {
  return ddmin_recursive(C, test, 2);
}

set ddmin_recursive(set C, function test, int n) {
  list<set> deltas = split_set(C, n); // split into n nearly equal pieces
  // Try to reduce to a subset
  for (auto delta : deltas) {
    if (test(delta) == false) {
      return ddmin_recursive(delta, test, 2);
    }
  }
  // Try to reduce to a complement
  for (auto delta : deltas) {
    set complement = C.set_minus(delta);
    if (test(complement) == false) {
      return ddmin_recursive(complement, test, max(n - 1, 2));
    }
  }
  // Try to increase the granularity
  if (n < C.size()) {
    return ddmin_recursive(C, test, min(C.size(), 2 * n));
  }
  // Otherwise, C is 1-minimal; we are done
  return C;
}
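
For concreteness, here is one possible runnable C++ rendering of the algorithm, including a simple split_set(). This is a sketch under the same assumptions as the pseudocode; since the input is an unordered set, any partition into n disjoint, nearly equal pieces works. The toy driver uses a test that fails whenever elements 3 and 7 are both present:

#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <set>
#include <vector>

using Set = std::set<int>;
using Test = std::function<bool(const Set&)>;

// Split C into n disjoint, nearly equal pieces (round-robin).
std::vector<Set> split_set(const Set& C, int n) {
  std::vector<Set> deltas(n);
  int i = 0;
  for (int elem : C) {
    deltas[i++ % n].insert(elem);
  }
  return deltas;
}

Set set_minus(const Set& a, const Set& b) {
  Set result;
  std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                      std::inserter(result, result.begin()));
  return result;
}

Set ddmin_recursive(const Set& C, const Test& test, int n) {
  std::vector<Set> deltas = split_set(C, n);
  for (const Set& delta : deltas) {             // reduce to a subset
    if (!test(delta)) return ddmin_recursive(delta, test, 2);
  }
  for (const Set& delta : deltas) {             // reduce to a complement
    Set complement = set_minus(C, delta);
    if (!test(complement))
      return ddmin_recursive(complement, test, std::max(n - 1, 2));
  }
  if (n < (int)C.size())                        // increase granularity
    return ddmin_recursive(C, test, std::min((int)C.size(), 2 * n));
  return C;                                     // C is 1-minimal
}

Set ddmin(const Set& C, const Test& test) {
  return ddmin_recursive(C, test, 2);
}

int main() {
  // Toy test: "fails" (returns false) whenever both 3 and 7 are present.
  Test test = [](const Set& s) {
    return !(s.count(3) && s.count(7));
  };
  Set result = ddmin({1, 2, 3, 4, 5, 6, 7, 8}, test);
  for (int e : result) std::cout << e << ' ';   // prints: 3 7
  std::cout << '\n';
}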

An optimization mentioned in the paper is that the test() function should memoize its return values for specific inputs, since the algorithm may end up testing the same configuration more than once. For those who do not know what memoization is, it is caching previous input/output pairs and returning the cached output when an already-seen input comes up again.
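
A memoizing wrapper could be as simple as the following sketch (keyed on the input set via std::map; again illustrative, not FLiT code):

#include <functional>
#include <map>
#include <memory>
#include <set>

using Set = std::set<int>;
using Test = std::function<bool(const Set&)>;

// Wrap a test function so each distinct input set is evaluated at most
// once; repeated queries return the cached result.
Test memoize(Test test) {
  auto cache = std::make_shared<std::map<Set, bool>>();
  return [cache, test](const Set& s) {
    auto it = cache->find(s);
    if (it != cache->end()) return it->second;  // cache hit
    bool result = test(s);                      // expensive call
    cache->emplace(s, result);
    return result;
  };
}

The call site then becomes ddmin(C, memoize(test)). For FLiT bisect, where every invocation of test() implies a recompile and a run, this caching is likely to pay off quickly.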
