Use scores to evaluate test results #4

Merged on Aug 19, 2015
Commits (28):
8d7c2de  Test that the order of expected blocks is not important (orangejulius, Jul 16, 2015)
83d88f2  Refactor locations substitution so it's testable (orangejulius, Jul 16, 2015)
b0d2d01  Extract functional behavior and improve error messages for locations (orangejulius, Jul 16, 2015)
25bea1f  Extract testing for unexpected properties to a function (orangejulius, Jul 17, 2015)
06e8949  Replace tertiary logic with if/return (orangejulius, Jul 17, 2015)
9e0a242  Simplify and extract functionality to evaluate expected output (orangejulius, Jul 17, 2015)
9c94d9e  Add helpful comments (orangejulius, Jul 20, 2015)
2355022  Add scoreProperties initial implementation (orangejulius, Jul 20, 2015)
7da7521  Extract scorePrimitiveProperty method (orangejulius, Jul 20, 2015)
5e7c53c  Add support for an array specifying property weights (orangejulius, Jul 20, 2015)
cb13c20  Use deep-diff to print details when tests fail (orangejulius, Jul 20, 2015)
6021ced  Add complete score functionality (orangejulius, Jul 21, 2015)
2194e18  Add message with score on test pass (orangejulius, Jul 22, 2015)
157197c  Whitespace (orangejulius, Jul 22, 2015)
54f3090  Remove unnecessary error check (orangejulius, Jul 22, 2015)
cee113b  Move test case checking logic to sanitiseTestCase (orangejulius, Jul 22, 2015)
7efab95  Move more formatting logic into helper (orangejulius, Jul 22, 2015)
ff49245  Rename reduceScores to combineScores (orangejulius, Jul 22, 2015)
4af396a  Add comments (orangejulius, Jul 22, 2015)
b454ba9  Remove unnecessary error handling (orangejulius, Jul 22, 2015)
2faabc8  Improve formatting of a big, dense array (orangejulius, Jul 22, 2015)
82b804d  Build context object for each test suite and test case (orangejulius, Jul 22, 2015)
36026bb  Handle setting default weights in one place (orangejulius, Jul 22, 2015)
a7d9b53  Pass full context deeper into scoring functions (orangejulius, Jul 22, 2015)
b079931  Add description of scores and example weights to readme (orangejulius, Jul 22, 2015)
1140b1e  Use Tape's deepEqual instead of deep-diff (orangejulius, Jul 23, 2015)
61b101c  Check score recursively only if both sides are objects (orangejulius, Aug 5, 2015)
695f979  Remove empty array and empty string values from diff (orangejulius, Aug 17, 2015)
28 changes: 27 additions & 1 deletion README.md
@@ -9,10 +9,13 @@ This is the pelias fuzzy tester library, used for running our
What are fuzzy tests? See the original [problem statement](https://github.com/pelias/acceptance-tests/issues/109)
that led to the creation of this library.

Most importantly, fuzzy tests deliver more than just a single bit of pass or fail for each test:
they specify a total number of points (a score) for the test, and return how many points out of the
maximum were achieved. The weighting of individual parts of the test can be adjusted.

**Note:** fuzzy-tester requires NPM version 2 or greater. The NPM team
[recommends](http://blog.npmjs.org/post/85484771375/how-to-install-npm) you update NPM using NPM
itself with `sudo npm install -g npm`.

## Usage

```
@@ -32,6 +35,8 @@ properties:
+ `priorityThresh` indicates the expected result must be found within the top N locations. This can be set for the entire suite as well as overwritten in individual test cases.
+ `tests` is an array of test cases that make up the suite.
+ `endpoint` the API endpoint (`search`, `reverse`, `suggest`) to target. Defaults to `search`.
+ `weights` (optional) test-suite-wide weighting for the scores of individual expectations. See the
Weights section below.

`tests` consists of objects with the following properties:
+ `id` is a unique identifier within the test suite (this could be unnecessary, let's discuss)
@@ -55,6 +60,8 @@ properties:

+ `unexpected` is analogous to `expected`, except that you *cannot* specify a `priorityThresh` and the `properties`
array does *not* support strings.
+ `weights` (optional) test-case-specific weighting for the scores of individual expectations. See the
Weights section below and the example sketch that follows this list.
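
For orientation, here is a minimal test case sketch combining the fields described above. It is illustrative only and not taken from this repository; in particular, the `in` query block and its `text` field are assumptions.

```javascript
{
  "id": 1,
  "status": "pass",
  "in": { "text": "main st" },   // assumed query shape, for illustration only
  "expected": {
    "priorityThresh": 1,
    "properties": [ { "name": "Main St" } ]
  },
  "unexpected": {
    "properties": [ { "name": "Main St, Springfield" } ]
  },
  "weights": {
    "properties": { "name": 10 }
  }
}
```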

## output generators
The acceptance-tests support multiple output generators, such as email and terminal output. See `node test
@@ -89,3 +96,22 @@ override the default aliases and define your own in `pelias-config`:
}
}
```

## Weights

Weights influence how much each individual expectation contributes to the total score
for a test. By default, each field in the expected properties, passing the priority threshold, and the
absence of each unexpected property contribute one point apiece.

The score for any individual property can be changed by specifying a `weights` object in a test
suite, or in an individual test case. For example, to weight the `name` property more heavily by
giving it 10 points, set `weights` to the following:
```javascript
{
  "properties": {
    "name": 10
  }
}
```

Weights can be nested and are entirely optional; when omitted, the defaults apply.
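
As a hypothetical sketch (the nested `address` object and the specific values are assumptions, not defaults shipped with this library), a `weights` object touching all three scored areas might look like:

```javascript
{
  "priorityThresh": 5,
  "unexpected": 2,
  "properties": {
    "name": 10,
    "address": {
      "zip": 2
    }
  }
}
```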
136 changes: 54 additions & 82 deletions lib/eval_test.js
@@ -1,28 +1,63 @@
'use strict';

var equalProperties = require( '../lib/equal_properties' );
var scoreTest = require( '../lib/scoreTest' );
var sanitiseTestCase = require( '../lib/sanitiseTestCase' );
var isObject = require( 'is-object' );
var util = require( 'util' );
var path = require( 'path' );

var locations;
try {
locations = require( path.resolve(process.cwd() + '/locations.json') );
} catch (e) {
locations = [];

function formatTestErrors(score) {
var message = 'score ' + score.score + ' out of ' + score.max_score;

if (score.score < score.max_score) {
message += '\ndiff: ' + JSON.stringify(score.diff, null, 2);
}

return message;
}

/**
* Ensure the weights object is valid by filling in any missing
* default values.
*/
function setDefaultWeights(weights) {
weights = weights || {};
weights.properties = weights.properties || {};
weights.priorityThresh = weights.priorityThresh || 1;
weights.unexpected = weights.unexpected || 1;

return weights;
}

/**
* Combine a context passed in from a test suite with properties
* from one individual test case to create the final context for this
* test case. It handles locations, weights, and priorityThresh
*/
function makeTestContext( testCase, context ) {
context.locations = context.locations || {};
context.weights = setDefaultWeights(context.weights);

if( 'expected' in testCase && 'priorityThresh' in testCase.expected ){
context.priorityThresh = testCase.expected.priorityThresh;
}

return context;
}

/**
* Given a test-case, the API results for the input it specifies, and a
* priority-threshold to find the results in, return an object indicating the
* status of this test (whether it passed, failed, is a placeholder, etc.)
*/
function evalTest( priorityThresh, testCase, apiResults ){
if( (!( 'expected' in testCase ) || testCase.expected.properties === null) &&
!( 'unexpected' in testCase ) ){
function evalTest( testCase, apiResults, context ){
context = makeTestContext( testCase, context );

testCase = sanitiseTestCase(testCase, context.locations);

// on error, sanitiseTestCase returns an error message string
if (typeof testCase === 'string') {
return {
result: 'placeholder',
msg: 'Placeholder test, no `expected` specified.'
msg: testCase
};
}

@@ -33,77 +68,14 @@ function evalTest( priorityThresh, testCase, apiResults ){
};
}

var ind;
var expected = [];
if( 'expected' in testCase ){
for( ind = 0; ind < testCase.expected.properties.length; ind++ ){
var testCaseProps = testCase.expected.properties[ ind ];
if( typeof testCaseProps === 'string' ){
if( testCaseProps in locations ){
expected.push(locations[ testCaseProps ]);
}
else {
return {
result: 'placeholder',
msg: 'Placeholder test, no `out` object matches in `locations.json`.'
};
}
}
else {
expected.push( testCaseProps );
}
}

if( 'priorityThresh' in testCase.expected ){
priorityThresh = testCase.expected.priorityThresh;
}
}

var unexpected = ( testCase.hasOwnProperty( 'unexpected' ) ) ?
testCase.unexpected.properties : [];

var expectedResultsFound = [];

for( ind = 0; ind < apiResults.length; ind++ ){
var result = apiResults[ ind ];
for( var expectedInd = 0; expectedInd < expected.length; expectedInd++ ){
if( expectedResultsFound.indexOf( expectedInd ) === -1 &&
equalProperties( expected[ expectedInd ], result.properties ) ){
var success = ( ind + 1 ) <= priorityThresh;
if( !success ){
return {
result: 'fail',
msg: util.format( 'Result found, but not in top %s. (%s)', priorityThresh, ind+1 )
};
}
else {
expectedResultsFound.push( expectedInd );
}
}
}

for( var unexpectedInd = 0; unexpectedInd < unexpected.length; unexpectedInd++ ){
if( equalProperties( unexpected[ unexpectedInd ], result.properties ) ){
return {
result: 'fail',
msg: util.format( 'Unexpected result found.' )
};
}
}
}

if ( expectedResultsFound.length === expected.length ) {
return { result: 'pass' };
}

if ( expected.length === 0 && unexpected.length > 0 ) {
return {result: 'pass'};
}
var score = scoreTest(testCase, apiResults, context);

return {
result: 'fail',
msg: 'No matching result found.'
};
result: (score.score < score.max_score) ? 'fail' : 'pass',
score: score.score,
max_score: score.max_score,
msg: formatTestErrors(score)
};
}

module.exports = evalTest;
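
For reference (not part of the diff), the object `evalTest` now returns has roughly the following shape; the values below are invented for a test that earned two of three possible points.

```javascript
{
  result: 'fail',                      // 'pass' only when score === max_score
  score: 2,
  max_score: 3,
  msg: 'score 2 out of 3\ndiff: ...'   // produced by formatTestErrors()
}
```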
16 changes: 15 additions & 1 deletion lib/exec_test_suite.js
@@ -7,10 +7,18 @@
var evalTest = require( '../lib/eval_test' );
var ExponentialBackoff = require( '../lib/ExponentialBackoff');
var request = require( 'request' );
var path = require( 'path' );
var util = require( 'util' );

var validTestStatuses = [ 'pass', 'fail', undefined ];

var locations;
try {
locations = require( path.resolve(process.cwd() + '/locations.json') );
} catch (e) {
locations = {};
}

function validateTestSuite(testSuite) {
testSuite.tests.forEach( function ( testCase ){
if( validTestStatuses.indexOf( testCase.status ) === -1 ){
@@ -136,8 +144,14 @@ function execTestSuite( apiUrl, testSuite, stats, cb ){
var results;

if( res.statusCode === 200 ){
var context = {
priorityThresh: testSuite.priorityThresh,
locations: locations,
weights: testSuite.weights
};

test_interval.decreaseBackoff();
results = evalTest( testSuite.priorityThresh, testCase, res.body.features );
results = evalTest( testCase, res.body.features, context );
} else { // unexpected (non 200 or retry) status code
test_interval.increaseBackoff();
printRequestErrorMessage(testCase, res);
66 changes: 66 additions & 0 deletions lib/sanitiseTestCase.js
@@ -0,0 +1,66 @@
'use strict';

/**
 * Given the properties of a test case,
 * construct the actual expected object.
 * This simply accounts for pre-defined locations.
 */
function constructExpectedOutput(properties, locations) {
  return properties.map(function(property) {
    if ( typeof property === 'string' && property in locations ) {
      return locations[property];
    // this intentionally leaves unmatched location strings as strings,
    // so that it is possible to go back and look for them later
    } else {
      return property;
    }
  });
}

/**
 * Find unmatched location strings left from running constructExpectedOutput.
 */
function findPlaceholders(expected) {
  return expected.filter(function(item) {
    return typeof item === 'string';
  });
}

/**
 * Some tests don't have a properties array: if the properties
 * value is just a single string, turn it into a one-element array.
 */
function normalizeProperties(properties) {
  if (typeof properties === 'string') {
    properties = [properties];
  }
  return properties;
}

function sanitiseTestCase(testCase, locations) {
  locations = locations || {};

  if (!testCase.expected && !testCase.unexpected) {
    return 'Placeholder test: no `expected` or `unexpected` given';
  }

  if (testCase.expected) {
    if (!testCase.expected.properties) {
      return 'Placeholder test: `expected` block is empty';
    }

    testCase.expected.properties = normalizeProperties(testCase.expected.properties);
    testCase.expected.properties = constructExpectedOutput(testCase.expected.properties, locations);

    var unmatched_placeholders = findPlaceholders(testCase.expected.properties);
    if (unmatched_placeholders.length > 0) {
      return 'Placeholder: no matches for ' + unmatched_placeholders.join(', ') + ' in `locations.json`.';
    }
  }

  return testCase;
}

module.exports = sanitiseTestCase;
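
A short usage sketch (inputs invented) of the sanitiser's contract: it returns a message string for placeholder tests, and otherwise hands back the test case with `expected.properties` expanded from `locations.json`.

```javascript
var sanitiseTestCase = require('./lib/sanitiseTestCase');

// hypothetical locations.json contents
var locations = { 'portland city hall': { name: 'Portland City Hall' } };

// a string `properties` value is normalized to an array, then looked up in locations
var testCase = { expected: { properties: 'portland city hall' } };
var sanitised = sanitiseTestCase(testCase, locations);
// sanitised.expected.properties => [ { name: 'Portland City Hall' } ]

// with neither `expected` nor `unexpected`, a placeholder message string comes back
var placeholder = sanitiseTestCase({}, locations);
// placeholder => 'Placeholder test: no `expected` or `unexpected` given'
```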
70 changes: 70 additions & 0 deletions lib/scoreHelpers.js
@@ -0,0 +1,70 @@
var deepDiff = require( 'deep-diff' );

var initial_score = {score: 0, max_score: 0, diff: []};

/**
 * Use the deep-diff library to create an (almost too) detailed description
 * of the differences between the expected and actual properties. The data is
 * massaged so that only the parts we care about are shown.
 */
function createDiff(expectation, result) {
  var diff = deepDiff.diff(expectation, result);

  // objects with no differences have an undefined diff
  if (diff === undefined) {
    return ''; // return an empty string for less confusing output later
  }

  // filter out diff elements corresponding to a new element on the right side;
  // these are ignored by our tests and would just be noise
  return diff.filter(function(diff_part) {
    return diff_part.kind !== 'N';
  });
}

function filterDiffs(diff) {
  if( diff === '' || (Array.isArray(diff) && diff.length === 0)) {
    return false;
  }
  return true;
}


/**
 * Function to be used with Array.reduce to combine several subscores for
 * the same object into one overall score. It totals up the actual and
 * maximum score, and concatenates multiple diff or error outputs into an array.
 */
function combineScores(total_score, this_score) {
  var new_diff = total_score.diff;
  if (this_score.diff) {
    new_diff = total_score.diff.concat(this_score.diff);
    new_diff = new_diff.filter(filterDiffs);
  }

  return {
    score: total_score.score + this_score.score,
    max_score: total_score.max_score + this_score.max_score,
    diff: new_diff
  };
}

/**
 * Small helper function to determine if a given apiResult is high
 * enough in a set of results to pass a priority threshold.
 * Caveat: the result object must be the exact same JavaScript object
 * as the one taken from apiResults, not just another object with
 * identical properties.
 */
function inPriorityThresh(apiResults, result, priorityThresh) {
  var index = apiResults.indexOf(result);
  return index !== -1 && index <= priorityThresh - 1;
}

module.exports = {
  initial_score: initial_score,
  createDiff: createDiff,
  combineScores: combineScores,
  inPriorityThresh: inPriorityThresh
};
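
As a usage sketch (the subscores below are invented), these helpers are meant to fold several per-property scores into one total with `Array.prototype.reduce`:

```javascript
var scoreHelpers = require('./lib/scoreHelpers');

// two made-up subscores: one full match, one mismatch carrying a deep-diff entry
var subscores = [
  { score: 1, max_score: 1, diff: '' },
  { score: 0, max_score: 1, diff: [ { kind: 'E', path: [ 'name' ] } ] }
];

var total = subscores.reduce(scoreHelpers.combineScores, scoreHelpers.initial_score);
// total => { score: 1, max_score: 2, diff: [ { kind: 'E', path: [ 'name' ] } ] }
```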