Skip to content

JuanCarlosAlonsoValenzuela/daikon_modified

Repository files navigation

Table of contents

  1. Minimizing false positives
  2. Added invariants
  3. Disabled invariants
  4. Enabled invariants
  5. Changes in computeConfidence functions
  6. Suppression of derived variables
  7. Invariants disabled in the default Daikon configuration

This document describes the changes performed in Daikon version 5.8.10 (released on November 2021)

Section Invariants disabled in the default Daikon configuration contains the 13 invariants that we disabled in the default Daikon configuration to avoid a combinatorial explosions of string comparisons and a high number of false positives.

Minimizing false positives

In the evaluation of the first version of AGORA, we identified the invariants of the arithmetic comparison category as the cause of 3 out of every 4 false positives. Specifically, most false positives were invariants comparing two independent output properties. For example, the invariant return.duration_ms > size(return.artists[]) states that the duration of a song in milliseconds should always be greater than the number of artists in the song. While this may be true in most cases, it clearly does not represent the expected API behaviour and is considered a false positive. Finding counterexamples to automatically rule out these false positives is extremely unlikely, and the number of reported false positives can become unbearable if the API has several numerical response fields.

To address this, AGORA integrates a novel heuristic to minimize false positives derived from arithmetic comparisons based on the following observation: parameters with a different value range are unlikely to be related. In the previous example, for instance, the duration of a song in milliseconds may oscillate in the order of thousands, whereas the number of artists of a song may vary from 1 to around 10.

In practice, we propose a heuristic that discards the invariants that represent arithmetic inequalities (i.e., x < y, x <= y, x > y and x >= y) if there is no overlap between the values of the two variables involved. Specifically, we modified a total of 32 Daikon invariants so that invariants are not reported if the maximum observed value for one of the involved variables (e.g., 10 artists) is less than the minimum observed value of the other variable (e.g., 30K milliseconds).

This heuristic is enabled by default, to disable it, set the boolean parameter use_agora_pp of the modified version of Daikon to false.

Added invariants

  1. IsUrl: Indicates that the value of a string variable is always a URL. Represented as x is Url.

  2. SequenceStringElementsAreUrl: Indicates that all elements of an array of strings are URLs. Represented as All the elements of x are URLs.

  3. FixedLengthString: Indicates that the value of a string variable always has a fixed length n. Represented as LENGTH(x)==n.

  4. SequenceFixedLengthString: Indicates that all the elements of an array of strings have a fixed length n. Represented as All the elements of x have LENGTH=n.

  5. IsNumeric: Indicates that the characters of a string variable are always numeric. Represented as x is Numeric.

  6. SequenceStringElementsAreNumeric: Indicates that the characters of all the elements of an array of strings are always numeric. Represented as All the elements of x are Numeric.

  7. IsEmail: Indicates that the value of a string type variable is always an email. Represented as x is Email.

  8. SequenceStringElementsAreEmail: Indicates that all elements of an array of strings are emails. Represented as All the elements of x are emails.

  9. IsDateDDMMYYYY: Indicates that the value of a string variable is always a date following the format DD/MM/YYYY (the separator can be “/” or “-“). Represented as x is a Date. Format: DD/MM/YYYY.

  10. SequenceStringElementsAreDateDDMMYYYY: Indicates that all the elements of an array of strings are dates following the format DD/MM/YYYY. Represented as All the elements of x are dates. Format: DD/MM/YYYY.

  11. IsDateMMDDYYYY: Indicates that the value of a string variable is always a date following the format MM/DD/YYYY (the separator can be “/” or “-“). Represented as x is a Date. Format: MM/DD/YYYY.

  12. SequenceStringElementsAreDateMMDDYYYY: Indicates that all the elements of an array of strings are dates following the format MM/DD/YYYY. Represented as All the elements of x are dates. Format: MM/DD/YYYY.

  13. IsDateYYYYMMDD: Indicates that the value of a string variable is always a date following the format YYYY/MM/DD (the separator can be “/” or “-“). Represented as x is a Date. Format: YYYY/MM/DD.

  14. SequenceStringElementsAreDateYYYYMMDD: Indicates that all the elements of an array of strings are dates following the format YYYY/MM/DD. Represented as All the elements of x are dates. Format: YYYY/MM/DD.

  15. IsHour: Indicates that the value of a string variable is always a time in 24-hour format. Represented as x is Hour: HH:MM 24-hour format, optional leading 0.

  16. SequenceStringElementsAreHour: Indicates that all elements of an array of strings are hours in 24-hour format. Represented as All the elements of x are Hours: HH:MM 24-hour format, optional leading 0.

  17. IsHourAMPM: Indicates that the value of a string variable is always a time in 12-hour format. Represented as x is Hour: HH:MM 12-hour format, optional leading 0.

  18. SequenceStringElementsAreHourAMPM: Indicates that all elements of an array of strings are hours in 12-hour format. Represented as All the elements of x are Hours: HH:MM 12-hour format, optional leading 0, mandatory meridiems (AM/PM).

  19. IsHourWithSeconds: Indicates that the value of a string variable is always a time in 24-hour format, including seconds. Represented as x is Hour: HH:MM:SS 24-hour format with optional leading 0.

  20. SequenceStringElementsAreHourWithSeconds: Indicates that all elements of an array of strings are hours in 24-hour format, including seconds. Represented as All the elements of x are Hours: HH:MM:SS 24-hour format with optional leading 0.

  21. IsTimestampYYYYMMHHThhmmssmm: Indicates that the value of a string variable is always a timestamp. Represented as x is Timestamp. Format: YYYY-MM-DDTHH:MM:SS.mmZ (Miliseconds are optional).

  22. SequenceStringElementsAreTimestampYYYYMMHHThhmmssmm: Indicates that all elements of an array of strings are timestamps. Represented as All the elements of x are Timestamp. Format: YYYY-MM-DDTHH:MM:SS.mmZ (Miliseconds are optional)

Disabled invariants

Section 5.5 Invariant list of the Daikon user manual contains a description of each one of these invariants.

  1. daikon.inv.unary.scalar.NonZero
  2. daikon.inv.unary.scalar.NonZeroFloat
  3. daikon.inv.unary.scalar.RangeInt.PowerOfTwo
  4. daikon.inv.unary.sequence.EltNonZero
  5. daikon.inv.unary.sequence.EltNonZeroFloat
  6. daikon.inv.unary.sequence.EltRangeInt.PowerOfTwo
  7. daikon.inv.binary.twoScalar.IntNonEqual
  8. daikon.inv.binary.twoScalar.FloatNonEqual
  9. daikon.inv.binary.twoScalar.LinearBinary
  10. daikon.inv.binary.twoScalar.LinearBinaryFloat
  11. daikon.inv.binary.twoString.StringNonEqual
  12. daikon.inv.binary.twoString.StringLessThan
  13. daikon.inv.binary.twoString.StringGreaterThan
  14. daikon.inv.binary.twoString.StringLessEqual
  15. daikon.inv.binary.twoString.StringGreaterEqual
  16. daikon.inv.binary.twoSequence.SeqSeqStringEqual
  17. daikon.inv.binary.twoSequence.SeqSeqStringLessThan
  18. daikon.inv.binary.twoSequence.SeqSeqStringGreaterThan
  19. daikon.inv.binary.twoSequence.SeqSeqStringLessEqual
  20. daikon.inv.binary.twoSequence.SeqSeqStringGreaterEqual
  21. daikon.inv.binary.twoSequence.PairwiseStringLessThan
  22. daikon.inv.binary.twoSequence.PairwiseStringGreaterThan
  23. daikon.inv.binary.twoSequence.PairwiseStringLessEqual
  24. daikon.inv.binary.twoSequence.PairwiseStringGreaterEqual
  25. daikon.inv.ternary.threeScalar.LinearTernary
  26. daikon.inv.ternary.threeScalar.LinearTernaryFloat
  27. daikon.inv.binary.twoScalar.NumericInt.Divides
  28. daikon.inv.binary.twoScalar.NumericInt.Square
  29. daikon.inv.binary.twoScalar.NumericFloat.Divides
  30. daikon.inv.binary.twoScalar.NumericFloat.Square
  31. daikon.inv.binary.twoSequence.PairwiseNumericInt.Divides
  32. daikon.inv.binary.twoSequence.PairwiseNumericInt.Square
  33. daikon.inv.binary.twoSequence.PairwiseNumericFloat.Divides
  34. daikon.inv.binary.twoSequence.PairwiseNumericFloat.Square
  35. daikon.inv.binary.twoSequence.PairwiseLinearBinary
  36. daikon.inv.binary.twoSequence.PairwiseLinearBinaryFloat

Enabled invariants

Section 5.5 Invariant list of the Daikon user manual contains a description of each one of these invariants.

  1. daikon.inv.binary.twoString.StdString.SubString: We applied two modifications on this invariant:

    • We added a property that sets a minimum string length (2 by default) in order to avoid reporting that a string that is always empty is always a substring of every other string in the program point.

    • This invariant may report redundant information that would bloat our results, for example, if x is a substring of y and y is a substring of z, Daikon would report the invariants: “x is substring of y”, “y is substring of z” and “x is substring of z”. In this situation we consider the “x is substring of z” invariant redundant and modified Daikon so it would not report redundant invariants, in order to avoid bloating the results.

  2. daikon.inv.binary.twoSequence.SubSequence

  3. daikon.inv.binary.twoSequence.SubSequenceFloat

  4. daikon.inv.binary.twoSequence.SubSet

  5. daikon.inv.binary.twoSequence.SubSetFloat

  6. daikon.inv.binary.twoSequence.SuperSequence

  7. daikon.inv.binary.twoSequence.SuperSequenceFloat

  8. daikon.inv.binary.twoSequence.SuperSet

  9. daikon.inv.binary.twoSequence.SuperSetFloat

Changes in computeConfidence functions

At the end of the Daikon execution, the computeConfidence function is executed for every invariant that has not been falsified, returning a number between 0 and 1. If this number achieves a certain threshold, the invariant is considered to be statistically justified and it is reported to the user. We found some invariants for which Daikon always returned a confidence of 1.

protected double computeConfidence() {
   return CONFIDENCE_JUSTIFIED;
}

We modified these functions to return 1 only if there was at least one occurrence of the invariant during the program execution.

protected double computeConfidence() {
    if (ppt.num_samples() == 0) {
      return CONFIDENCE_UNJUSTIFIED;
    }
return CONFIDENCE_JUSTIFIED;
}

These are the invariants for which we modified the computeConfidence function:

  • Unary:
    • Scalar:
      • RangeFloat
      • RangeInt
    • Sequence:
      • EltRangeFloat
      • EltRangeInt
  • Binary:
    • SequenceString:
      • MemberString
    • SequenceScalar:
      • Member
      • MemberFloat
    • TwoSequence:
      • Reverse
    • ReverseFloat
      • SubSequence
      • SubSequenceFloat
      • Subset

Suppression of derived variables

A derived variable is an expression that does not appear directly in the instrumented program, but which Daikon uses as a variable for invariant detection. For example, for array variables, Daikon generates a derived variable whose value is its size, allowing to detect invariants such as input.limit >= size(return.items).

However, there are derived variables that do not provide any relevant information, this is the case of the derived variable orig(), which indicates the value of a variable at the input program point and is used to make comparisons with the value of the same variable at the output program point. In the context of black box testing of RESTful APIs, the value of the input parameters does not change at any time, so Daikon would only detect equality invariants between the parameter value at the input and output (input.limit == orig(input.limit)) that will always be satisfied. For this reason, this derived variable has been disabled.

In some cases, Daikon applies the shift derived variable (consisting on decreasing the value of a numerical variable by a shift value) to try to find relations between numerical variables (e.g., input-1 > return), we disabled this derived variable because it would most likely report misleading information.

Invariants disabled in the default Daikon configuration

Section 5.5 Invariant list of the Daikon user manual contains a description of each one of these invariants.

  1. daikon.inv.binary.twoString.StringNonEqual
  2. daikon.inv.binary.twoString.StringLessThan
  3. daikon.inv.binary.twoString.StringGreaterThan
  4. daikon.inv.binary.twoString.StringLessEqual
  5. daikon.inv.binary.twoString.StringGreaterEqual
  6. daikon.inv.binary.twoSequence.SeqSeqStringLessThan
  7. daikon.inv.binary.twoSequence.SeqSeqStringGreaterThan
  8. daikon.inv.binary.twoSequence.SeqSeqStringLessEqual
  9. daikon.inv.binary.twoSequence.SeqSeqStringGreaterEqual
  10. daikon.inv.binary.twoSequence.PairwiseStringLessThan
  11. daikon.inv.binary.twoSequence.PairwiseStringGreaterThan
  12. daikon.inv.binary.twoSequence.PairwiseStringLessEqual
  13. daikon.inv.binary.twoSequence.PairwiseStringGreaterEqual

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published