- Minimizing false positives
- Added invariants
- Disabled invariants
- Enabled invariants
- Changes in computeConfidence functions
- Suppression of derived variables
- Invariants disabled in the default Daikon configuration
This document describes the changes performed in Daikon version 5.8.10 (released on November 2021)
Section Invariants disabled in the default Daikon configuration contains the 13 invariants that we disabled in the default Daikon configuration to avoid a combinatorial explosions of string comparisons and a high number of false positives.
In the evaluation of the first version of AGORA, we identified the
invariants of the arithmetic comparison category as the cause of 3 out of every 4 false positives. Specifically, most
false positives were invariants comparing two independent output properties. For example, the invariant
return.duration_ms > size(return.artists[])
states that the duration of a song in milliseconds should always be
greater than the number of artists in the song. While this may be true in most cases, it clearly does not represent the
expected API behaviour and is considered a false positive. Finding counterexamples to automatically rule out these false
positives is extremely unlikely, and the number of reported false positives can become unbearable if the API has several
numerical response fields.
To address this, AGORA integrates a novel heuristic to minimize false positives derived from arithmetic comparisons based on the following observation: parameters with a different value range are unlikely to be related. In the previous example, for instance, the duration of a song in milliseconds may oscillate in the order of thousands, whereas the number of artists of a song may vary from 1 to around 10.
In practice, we propose a heuristic that discards the invariants that represent arithmetic inequalities
(i.e., x < y
, x <= y
, x > y
and x >= y
) if there is no overlap between the values of the two variables
involved. Specifically, we modified a total of 32 Daikon invariants so that invariants are not reported if the maximum
observed value for one of the involved variables (e.g., 10 artists) is less than the minimum observed value of the other
variable (e.g., 30K milliseconds).
This heuristic is enabled by default, to disable it, set the boolean parameter use_agora_pp
of the modified version
of Daikon to false.
-
IsUrl: Indicates that the value of a string variable is always a URL. Represented as
x is Url
. -
SequenceStringElementsAreUrl: Indicates that all elements of an array of strings are URLs. Represented as
All the elements of x are URLs
. -
FixedLengthString: Indicates that the value of a string variable always has a fixed length n. Represented as
LENGTH(x)==n
. -
SequenceFixedLengthString: Indicates that all the elements of an array of strings have a fixed length n. Represented as
All the elements of x have LENGTH=n
. -
IsNumeric: Indicates that the characters of a string variable are always numeric. Represented as
x is Numeric
. -
SequenceStringElementsAreNumeric: Indicates that the characters of all the elements of an array of strings are always numeric. Represented as
All the elements of x are Numeric
. -
IsEmail: Indicates that the value of a string type variable is always an email. Represented as
x is Email
. -
SequenceStringElementsAreEmail: Indicates that all elements of an array of strings are emails. Represented as
All the elements of x are emails
. -
IsDateDDMMYYYY: Indicates that the value of a string variable is always a date following the format DD/MM/YYYY (the separator can be “/” or “-“). Represented as
x is a Date. Format: DD/MM/YYYY
. -
SequenceStringElementsAreDateDDMMYYYY: Indicates that all the elements of an array of strings are dates following the format DD/MM/YYYY. Represented as
All the elements of x are dates
. Format: DD/MM/YYYY. -
IsDateMMDDYYYY: Indicates that the value of a string variable is always a date following the format MM/DD/YYYY (the separator can be “/” or “-“). Represented as
x is a Date. Format: MM/DD/YYYY
. -
SequenceStringElementsAreDateMMDDYYYY: Indicates that all the elements of an array of strings are dates following the format MM/DD/YYYY. Represented as
All the elements of x are dates
. Format: MM/DD/YYYY. -
IsDateYYYYMMDD: Indicates that the value of a string variable is always a date following the format YYYY/MM/DD (the separator can be “/” or “-“). Represented as
x is a Date. Format: YYYY/MM/DD
. -
SequenceStringElementsAreDateYYYYMMDD: Indicates that all the elements of an array of strings are dates following the format YYYY/MM/DD. Represented as
All the elements of x are dates. Format: YYYY/MM/DD
. -
IsHour: Indicates that the value of a string variable is always a time in 24-hour format. Represented as
x is Hour: HH:MM 24-hour format, optional leading 0
. -
SequenceStringElementsAreHour: Indicates that all elements of an array of strings are hours in 24-hour format. Represented as
All the elements of x are Hours: HH:MM 24-hour format, optional leading 0
. -
IsHourAMPM: Indicates that the value of a string variable is always a time in 12-hour format. Represented as
x is Hour: HH:MM 12-hour format, optional leading 0
. -
SequenceStringElementsAreHourAMPM: Indicates that all elements of an array of strings are hours in 12-hour format. Represented as
All the elements of x are Hours: HH:MM 12-hour format, optional leading 0, mandatory meridiems (AM/PM)
. -
IsHourWithSeconds: Indicates that the value of a string variable is always a time in 24-hour format, including seconds. Represented as
x is Hour: HH:MM:SS 24-hour format with optional leading 0
. -
SequenceStringElementsAreHourWithSeconds: Indicates that all elements of an array of strings are hours in 24-hour format, including seconds. Represented as
All the elements of x are Hours: HH:MM:SS 24-hour format with optional leading 0
. -
IsTimestampYYYYMMHHThhmmssmm: Indicates that the value of a string variable is always a timestamp. Represented as x is Timestamp. Format:
YYYY-MM-DDTHH:MM:SS.mmZ (Miliseconds are optional)
. -
SequenceStringElementsAreTimestampYYYYMMHHThhmmssmm: Indicates that all elements of an array of strings are timestamps. Represented as
All the elements of x are Timestamp. Format: YYYY-MM-DDTHH:MM:SS.mmZ (Miliseconds are optional)
Section 5.5 Invariant list of the Daikon user manual contains a description of each one of these invariants.
- daikon.inv.unary.scalar.NonZero
- daikon.inv.unary.scalar.NonZeroFloat
- daikon.inv.unary.scalar.RangeInt.PowerOfTwo
- daikon.inv.unary.sequence.EltNonZero
- daikon.inv.unary.sequence.EltNonZeroFloat
- daikon.inv.unary.sequence.EltRangeInt.PowerOfTwo
- daikon.inv.binary.twoScalar.IntNonEqual
- daikon.inv.binary.twoScalar.FloatNonEqual
- daikon.inv.binary.twoScalar.LinearBinary
- daikon.inv.binary.twoScalar.LinearBinaryFloat
- daikon.inv.binary.twoString.StringNonEqual
- daikon.inv.binary.twoString.StringLessThan
- daikon.inv.binary.twoString.StringGreaterThan
- daikon.inv.binary.twoString.StringLessEqual
- daikon.inv.binary.twoString.StringGreaterEqual
- daikon.inv.binary.twoSequence.SeqSeqStringEqual
- daikon.inv.binary.twoSequence.SeqSeqStringLessThan
- daikon.inv.binary.twoSequence.SeqSeqStringGreaterThan
- daikon.inv.binary.twoSequence.SeqSeqStringLessEqual
- daikon.inv.binary.twoSequence.SeqSeqStringGreaterEqual
- daikon.inv.binary.twoSequence.PairwiseStringLessThan
- daikon.inv.binary.twoSequence.PairwiseStringGreaterThan
- daikon.inv.binary.twoSequence.PairwiseStringLessEqual
- daikon.inv.binary.twoSequence.PairwiseStringGreaterEqual
- daikon.inv.ternary.threeScalar.LinearTernary
- daikon.inv.ternary.threeScalar.LinearTernaryFloat
- daikon.inv.binary.twoScalar.NumericInt.Divides
- daikon.inv.binary.twoScalar.NumericInt.Square
- daikon.inv.binary.twoScalar.NumericFloat.Divides
- daikon.inv.binary.twoScalar.NumericFloat.Square
- daikon.inv.binary.twoSequence.PairwiseNumericInt.Divides
- daikon.inv.binary.twoSequence.PairwiseNumericInt.Square
- daikon.inv.binary.twoSequence.PairwiseNumericFloat.Divides
- daikon.inv.binary.twoSequence.PairwiseNumericFloat.Square
- daikon.inv.binary.twoSequence.PairwiseLinearBinary
- daikon.inv.binary.twoSequence.PairwiseLinearBinaryFloat
Section 5.5 Invariant list of the Daikon user manual contains a description of each one of these invariants.
-
daikon.inv.binary.twoString.StdString.SubString: We applied two modifications on this invariant:
-
We added a property that sets a minimum string length (2 by default) in order to avoid reporting that a string that is always empty is always a substring of every other string in the program point.
-
This invariant may report redundant information that would bloat our results, for example, if x is a substring of y and y is a substring of z, Daikon would report the invariants: “x is substring of y”, “y is substring of z” and “x is substring of z”. In this situation we consider the “x is substring of z” invariant redundant and modified Daikon so it would not report redundant invariants, in order to avoid bloating the results.
-
-
daikon.inv.binary.twoSequence.SubSequence
-
daikon.inv.binary.twoSequence.SubSequenceFloat
-
daikon.inv.binary.twoSequence.SubSet
-
daikon.inv.binary.twoSequence.SubSetFloat
-
daikon.inv.binary.twoSequence.SuperSequence
-
daikon.inv.binary.twoSequence.SuperSequenceFloat
-
daikon.inv.binary.twoSequence.SuperSet
-
daikon.inv.binary.twoSequence.SuperSetFloat
At the end of the Daikon execution, the computeConfidence
function is executed for every invariant that has not been falsified, returning a number between 0 and 1.
If this number achieves a certain threshold, the invariant is considered to be statistically justified and it is reported to the user. We found some invariants for which
Daikon always returned a confidence of 1.
protected double computeConfidence() {
return CONFIDENCE_JUSTIFIED;
}
We modified these functions to return 1 only if there was at least one occurrence of the invariant during the program execution.
protected double computeConfidence() {
if (ppt.num_samples() == 0) {
return CONFIDENCE_UNJUSTIFIED;
}
return CONFIDENCE_JUSTIFIED;
}
These are the invariants for which we modified the computeConfidence
function:
- Unary:
- Scalar:
- RangeFloat
- RangeInt
- Sequence:
- EltRangeFloat
- EltRangeInt
- Scalar:
- Binary:
- SequenceString:
- MemberString
- SequenceScalar:
- Member
- MemberFloat
- TwoSequence:
- Reverse
- ReverseFloat
- SubSequence
- SubSequenceFloat
- Subset
- SequenceString:
A derived variable is an expression that does not appear directly in the instrumented program,
but which Daikon uses as a variable for invariant detection.
For example, for array variables, Daikon generates a derived variable whose value is its size,
allowing to detect invariants such as input.limit >= size(return.items)
.
However, there are derived variables that do not provide any relevant information, this is the case of the derived variable
orig()
, which indicates the value of a variable at the input program point and is used to make comparisons with
the value of the same variable at the output program point. In the context of black box testing of RESTful APIs, the value
of the input parameters does not change at any time, so Daikon would only detect equality invariants between the parameter
value at the input and output (input.limit == orig(input.limit)
) that will always be satisfied.
For this reason, this derived variable has been disabled.
In some cases, Daikon applies the shift
derived variable (consisting on decreasing the value of a numerical variable by a shift value)
to try to find relations between numerical variables (e.g., input-1 > return
), we disabled this derived variable because it would most likely
report misleading information.
Section 5.5 Invariant list of the Daikon user manual contains a description of each one of these invariants.
- daikon.inv.binary.twoString.StringNonEqual
- daikon.inv.binary.twoString.StringLessThan
- daikon.inv.binary.twoString.StringGreaterThan
- daikon.inv.binary.twoString.StringLessEqual
- daikon.inv.binary.twoString.StringGreaterEqual
- daikon.inv.binary.twoSequence.SeqSeqStringLessThan
- daikon.inv.binary.twoSequence.SeqSeqStringGreaterThan
- daikon.inv.binary.twoSequence.SeqSeqStringLessEqual
- daikon.inv.binary.twoSequence.SeqSeqStringGreaterEqual
- daikon.inv.binary.twoSequence.PairwiseStringLessThan
- daikon.inv.binary.twoSequence.PairwiseStringGreaterThan
- daikon.inv.binary.twoSequence.PairwiseStringLessEqual
- daikon.inv.binary.twoSequence.PairwiseStringGreaterEqual