Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve test discovery algorithm #8

Open
domanchi opened this issue Aug 29, 2019 · 1 comment
Open

Improve test discovery algorithm #8

domanchi opened this issue Aug 29, 2019 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@domanchi
Copy link
Contributor

Issue

Currently, we perform a naive search for request sequences longer than n=1. That is, if we have three endpoints (A, B, and C) and n=2, we would generate 6 request sequences:

  1. A,B
  2. A,C
  3. B,A
  4. B,C
  5. C,A
  6. C,B

We can do better than this.

Solution

In this research paper (Section 3.1), Microsoft describes their test generation algorithm which narrows the realm of possibilities with this simple rule:

Check that all objects referenced in a request are produced by some response in a prior request.

This gets tricky with the use of fixtures though, because input precedence is important. I propose a pre-processing test discovery step to address this issue.

Example

Let's say we have four endpoints:

paths:
  /a:
    post:
      parameters:
        - name: "five"
          in: "formData"
          type: "string"
      responses:
        200:
          schema:
            type: "object"
            properties:
              two:
                type: "string"
  /b:
    post:
      parameters:
        - name: "three"
          in: "formData"
          type: "string"
      responses:
        200:
          schema:
            type: "object"
            properties:
              four:
                type: "string"
              five:
                type: "string"
  /c:
    post:
      parameters:
        - name: "six"
          in: "formData"
          type: "string"
      responses:
        200:
          schema:
            type: "object"
            properties:
              seven:
                type: "string"
  /vulnerable:
    post:
      parameters:
        - name: "one"
          in: "formData"
          type: "string"
        - name: "two"
          in: "formData"
          type: "string"
        - name: "four"
          in: "formData"
          type: "string"
        - name: "eight"
          in: "formData"
          type: "string"
      responses:
        200:
          description: "success"

We also have a factory fixture for one, and we want to test /vulnerable.

Algorithm Details

With the pre-processing step, the above snippet can be distilled into this input-output diagram:

5 --> A --> 2
3 --> B --> 4,5
6 --> C --> 7
1,2,4,8 --> vulnerable --> none

Then, working backwards, N dictates how many hops we look back to derive our input parameters. Using the example to illustrate:

  • With n=1, we don't have prior knowledge of what requests could create the data we need for 1, 2, 4 and 8. Since we have a factory fixture for 1, we can use that to generate that parameter's value. For everything else, we need to fuzz it with the default fuzzing strategy.

  • When n=2, we search the outputs of the known endpoints to try and get values for what we need. We can see that A and B gives us 2 and 4 respectively, so know we can call A and B to generate our values. However, since we're only limited to a request sequence of 2, we can only call one of them at a time. We would still use our factory fixture for 1, and still need to fuzz 8. Since we don't have prior values for inputs for A and B, we will have to fuzz those values.

  • When n=3, we can call both A and B to generate all the values we need to properly test the vulnerable endpoint. Furthermore, we should acknowledge some sort of ordering in this: that B needs to be called first, so that its output can be fed into A.

Future Improvements

Assuming this works, we might even be able to suggest an optimal N for the user to run, to capture all cases.

@domanchi domanchi added the enhancement New feature or request label Aug 29, 2019
@domanchi
Copy link
Contributor Author

NOTE: The reason why the algorithm works backwards is because we need to be in a situation when we have knowledge of all possibilities that the endpoints provide, before we know which parameters to fuzz with a factory.

This is to say, we are unable to check that all objects referenced in a request are produced by some response in a prior request, because they may simply be fuzzed, or generated with a factory.

Using the example to illustrate, if we proceed with the forwards approach as the paper suggests, we would still generate (/c, /vulnerable) because inputs can still be fuzzed or factory-generated. If we only proceed with requests that use at least one input from all prior output from a request, we would ignore (/a, /vulnerable), (/b, /vulnerable) and even (/b, /a, /vulnerable). Or even this case:

none --> X --> 1
2 --> Y --> 3
1,3 --> vulnerable --> none

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants