Tests for visualization screenshot comparison #17545

Merged
merged 9 commits into elastic:master on Jun 5, 2018

Conversation

bhavyarm
Contributor

@bhavyarm bhavyarm commented Apr 4, 2018

This test adds baseline screenshots for all visualizations except TSVB and Vega, and compares new screenshots against them to ensure the visualizations are displayed correctly.

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@bhavyarm
Contributor Author

Jenkins, test this

@elasticmachine
Contributor

💔 Build Failed

@bhavyarm bhavyarm changed the title [WIP] tests for visualization screenshot comparison Tests for visualization screenshot comparison Apr 16, 2018
@jbudz jbudz added the test and Feature:Visualizations labels Apr 16, 2018
@elasticmachine
Contributor

💔 Build Failed

@LeeDr

LeeDr commented Apr 16, 2018

You have a baseline/screenshot_tag_cloud.png which isn't used in the test? Maybe it was replaced by screenshot_tagcloud_single.png?

@LeeDr

LeeDr commented Apr 16, 2018

My first run of this PR had rather bad results:
17 passing (2.0m)
13 failing

Through some trial and error and looking at the diff screenshots, I found that if I set the window size in the visualize/index.js page to 1291 x 811 with remote.setWindowSize(1292, 811); then I got much better results. Only 1 test failed, and it was pretty close to passing.

29 passing (1.0m)
1 failing



     └- ✖ fail: "visualize app visualize app should compare visualization screenshot for data table with minimum bucket metric agg and geohash"
     │        Error: expected 0.050852997448979594 to be below 0.05

Based on this, I'd like us to try to devise a screen size tuning routine that takes some screenshot and adjusts the Kibana window to get the smallest diff. This could even allow the tests to pass across multiple browsers.
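In rough terms, such a routine could look like the sketch below. This is only an illustration of the idea: getSessionScreenshotSize and the target dimensions are made-up placeholders, and remote.setWindowSize is the only call taken from the existing tests.

```js
async function calibrateWindowSize(remote) {
  // Hypothetical target: the dimensions the baseline screenshots were taken at.
  const target = { width: 1280, height: 686 };
  let windowWidth = 1292;
  let windowHeight = 811;

  for (let attempt = 0; attempt < 5; attempt++) {
    await remote.setWindowSize(windowWidth, windowHeight);
    // Hypothetical helper returning { width, height } of a freshly taken screenshot.
    const actual = await getSessionScreenshotSize();
    if (actual.width === target.width && actual.height === target.height) {
      return; // screenshots now match the baseline size, so diffs should be small
    }
    // Nudge the window by however far off the screenshot is and try again.
    windowWidth += target.width - actual.width;
    windowHeight += target.height - actual.height;
  }
}
```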

@bhavyarm
Contributor Author

Updated the screenshot for data table options and trying again to see if CI goes through.

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💚 Build Succeeded

@elasticmachine
Contributor

💚 Build Succeeded

@LeeDr

LeeDr commented Apr 18, 2018

@bhavyarm On my Ubuntu machine, if I set setWindowSize(1290, 811) then my screenshots are exactly the size of the baseline screenshots and I get very good matching. Some are less than 0.01 difference and the worst one was about 0.022.
If I don't adjust it, then compare_pngs shows me the difference in the size of the images.

The question is why I have to set setWindowSize differently than you did when creating the baselines. Even in the last passing Jenkins job you can see that the new session screenshot size didn't match the baseline size; the session is 9 pixels taller.

16:58:37 expected height 686 and width 1280
16:58:37 actual height 695 and width 1280
16:58:38        │ debg  calculating diff pixels...
16:58:38        │ debg  percentSimilar: 0.026081905976676385
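Despite its name, percentSimilar here behaves as a difference metric: lower is better, and the failing test earlier in this thread expected it to be below 0.05. As a reference point, a minimal sketch of how such a number can be computed with pixelmatch and pngjs follows; it is an illustration of the idea, not Kibana's actual compare_pngs code.

```js
const fs = require('fs');
const { PNG } = require('pngjs');
const pixelmatch = require('pixelmatch');

function percentDifferent(baselinePath, sessionPath) {
  const baseline = PNG.sync.read(fs.readFileSync(baselinePath));
  const session = PNG.sync.read(fs.readFileSync(sessionPath));

  // If the dimensions differ (as in the log above: 686 vs. 695 pixels tall), the
  // pixel-by-pixel comparison is already on shaky ground. Kibana's version still
  // produces a diff number in that case; here we simply require equal sizes.
  if (baseline.width !== session.width || baseline.height !== session.height) {
    throw new Error(
      `size mismatch: baseline ${baseline.width}x${baseline.height}, ` +
      `session ${session.width}x${session.height}`
    );
  }

  const { width, height } = baseline;
  const diff = new PNG({ width, height });
  const mismatched = pixelmatch(baseline.data, session.data, diff.data, width, height, {
    threshold: 0.1, // per-pixel color sensitivity
  });
  return mismatched / (width * height); // fraction of pixels that differ
}
```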

@LeeDr

LeeDr commented Apr 18, 2018

I'm going to try this PR on Windows and see what kind of sizes and diffs I get.

@bhavyarm
Contributor Author

@LeeDr see if this works when you get a moment. Thanks!

@elasticmachine
Contributor

💚 Build Succeeded

@LeeDr

LeeDr commented Apr 19, 2018

On my Windows 4k 15" laptop with display scaling set to 100% and setWindowSize(1308, 822); the session screenshots match the baseline screenshot sizes exactly and these are the results (1 failing timelion_colors);

$ time node scripts/functional_test_runner --debug | grep -E "(percentSimilar|Taking)"

       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_area_chart_bar.png"
       │ debg  percentSimilar: 0.012302979227405248
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_area_chart_options.png"
       │ debg  percentSimilar: 0.012031933309037901
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_barchart_percentile.png"
       │ debg  percentSimilar: 0.014806168002915452
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_barchart_tophit.png"
       │ debg  percentSimilar: 0.01845276056851312
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_datatable_average.png"
       │ debg  percentSimilar: 0.021078944970845483
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_datatable_options.png"
       │ debg  percentSimilar: 0.03068740889212828
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_datatable_significant.png"
       │ debg  percentSimilar: 0.02009384110787172
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_gaugecircle_options.png"
       │ debg  percentSimilar: 0.03494442419825073
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_goal_chart_options.png"
       │ debg  percentSimilar: 0.009010568513119533
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_heatmap_alloptions.png"
       │ debg  percentSimilar: 0.010871446793002915
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_horizontal_bar_chart.png"
       │ debg  percentSimilar: 0.011112882653061224
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_inputcontrol_options.png"
       │ debg  percentSimilar: 0.04303708090379009
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_line_chart_options.png"
       │ debg  percentSimilar: 0.012067237609329446
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_line_chart_parent.png"
       │ debg  percentSimilar: 0.012034211005830903
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_linechart_bubbles.png"
       │ debg  percentSimilar: 0.01107757835276968
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_linechart_derivative.png"
       │ debg  percentSimilar: 0.011325847303206998
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_linechart_serial.png"
       │ debg  percentSimilar: 0.011358873906705539
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_markdown_options.png"
       │ debg  percentSimilar: 0.04337190233236152
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_metrictable_median.png"
       │ debg  percentSimilar: 0.013550018221574344
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_piechart_donut.png"
       │ debg  percentSimilar: 0.009208728134110788
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_piechart_unique_count.png"
       │ debg  percentSimilar: 0.010436406705539358
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_regionmap_options.png"
       │ debg  percentSimilar: 0.009946701895043732
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_tagcloud_single.png"
       │ debg  percentSimilar: 0.028306076895043733
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_tilemap_geohash.png"
       │ debg  percentSimilar: 0.009027651239067055
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_tilemap_heatmap.png"
       │ debg  percentSimilar: 0.008657525510204081
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_tilemap_options.png"
       │ debg  percentSimilar: 0.009185951166180758
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_timelion_colors.png"
       │ debg  percentSimilar: 0.06928298104956268
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\failure\visualize app visualize app should compare visualization screenshot for timelion.png"
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_vertical_bar_chart_options.png"
       │ debg  percentSimilar: 0.012228954081632652
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_vertical_movingagg.png"
       │ debg  percentSimilar: 0.011375956632653061
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_verticalbar_average.png"
       │ debg  percentSimilar: 0.011980685131195335

@LeeDr

LeeDr commented Apr 19, 2018

The latest changes with the calibration routine pass on my Ubuntu desktop and my Windows laptop with multiple different display scaling settings.

@elasticmachine
Contributor

💔 Build Failed

@bhavyarm
Contributor Author

Jenkins, test this

@elasticmachine
Contributor

💔 Build Failed

@LeeDr

LeeDr commented Jun 4, 2018

I feel like we need to have version control and branching on these images if they're going to fit into our standard processes. So to me, that means they go in some GitHub repository in the elastic org.
If we don't want all the binary history of images in the kibana repo, we can create a new public screenshots repository.
Then the screenshots could be pulled down to where tests are running and cached. I think we could design a way to pull only the latest versions of the screenshots for the branch we're testing, without history. Something like wget on the repo zip?
Somewhere in the test framework, or even in the tests themselves, we check whether the screenshot files exist and whether some timestamp or version file matches the upstream version. If they don't exist or the version doesn't match, we pull the branch as a zip file and unzip it, overwriting existing files.

When a Kibana PR makes a change that requires updating a screenshot, it should reference the PR or commit in the screenshots repo (and always update the screenshots version file?). Maybe there's an automated way to do this?
I'm thinking that the tests wouldn't need to reference the version or commit specifically.

Another approach is for each test to reference the specific screenshot blob URL. That way, only individually changed screenshots would have to be refreshed when there was a change.

We should consider solutions that make the PR process as easy as possible for both the case where a single screenshot needs to be updated or added, and the case where many or all of them have to be updated.
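A minimal sketch of that caching flow, assuming a hypothetical elastic/kibana-screenshots repository, a VERSION file convention, and curl/unzip available on the test machine (none of which exist today):

```js
const { execSync } = require('child_process');
const { existsSync, readFileSync } = require('fs');

const SCREENSHOTS_DIR = 'test/functional/screenshots/baseline';
const VERSION_FILE = `${SCREENSHOTS_DIR}/VERSION`;
// Hypothetical public repo holding the baseline images for the branch under test.
const ZIP_URL = 'https://github.com/elastic/kibana-screenshots/archive/master.zip';

function ensureBaselines(upstreamVersion) {
  const localVersion = existsSync(VERSION_FILE)
    ? readFileSync(VERSION_FILE, 'utf8').trim()
    : null;

  if (localVersion !== upstreamVersion) {
    // Pull only the latest state of the branch (no git history) and overwrite
    // any existing baseline files, as described above. A real version would also
    // need to strip the top-level folder GitHub puts inside the archive.
    execSync(`curl -sSL -o /tmp/screenshots.zip ${ZIP_URL}`);
    execSync(`unzip -o /tmp/screenshots.zip -d ${SCREENSHOTS_DIR}`);
  }
}
```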

@timroes
Contributor

timroes commented Jun 5, 2018

> Then the screenshots could be pulled down to where tests are running and cached. I think we could design a way to pull only the latest versions of the screenshots for the branch we're testing without history. Something like wget on the repo zip?

git clone --depth 1, or in case we were using submodules, git clone --shallow-submodules, would clone just the most recent commit of a repository.

@bhavyarm
Contributor Author

bhavyarm commented Jun 5, 2018

@tylersmalley @cjcenizal @jbudz @silne30 - can you please add your inputs here? Lee and I will talk with Viz team in their next sync. Thanks!

@tylersmalley
Contributor

While I understand this adds much-needed coverage for Visualizations, I have a few additional concerns. The screenshot comparisons as implemented rely on a threshold, which in some cases allows for an 8% difference. With the subtleties of visualizations, I feel like this reduces the actual coverage. Understanding that the problem is the test coverage of visualizations, is there a reason we can't implement these tests in any of the current frameworks without adding an additional way of testing?

Additionally, will the new visualization pipeline changes help with the problem of test coverage?
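To illustrate the threshold concern in code (a sketch only, assuming a hypothetical 8% tolerance; the expect.js style matches the "to be below 0.05" failure message earlier in this thread):

```js
const expect = require('expect.js');

// Illustration: the assertion only checks that the pixel difference stays under
// the tolerance, so a visualization that drifts by 7.9% of its pixels would
// still pass silently.
function assertScreenshotCloseEnough(percentDifference, tolerance = 0.08) {
  expect(percentDifference).to.be.below(tolerance);
}
```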

@cuff-links cuff-links merged commit 84d678b into elastic:master Jun 5, 2018
@cuff-links
Contributor

cuff-links commented Jun 5, 2018

I had a run-in with visual regression tests while developing a test suite for EUI and ran into a few snags that seem to be inherent to that type of testing.

Comes With The Territory

  1. Cross Machine Variance - In the issue I linked to above, there was a comment revealing that there was variance between the snapshots on different machines. What made things worse was that both machines were running the same OS (macOS High Sierra). Differences in installed font packages and other environmental variables could cause the tests to have unpredictable variance numbers. I address how we compensated below.
  2. Cross Browser Variance - Because each browser implements the web in its own way, there is inherent variance between them. I will address how we compensated for that one below too.
  3. Cross Environment Variance - You guessed it. Running on CI vs. running on a local machine caused window size differences and other factors that I was not able to pin down. Again, the compensation strategy is listed below.

Compensating For Variances

  1. For Cross Machine Variance - Since EUI is small and does not have a ton of branching, we were able to leverage git to help with this scenario. Basically, tests run twice: the first run creates baselines, and the second tests against them. This wouldn't happen every time on CI, only when there was a change to a baseline. That work is here. We ran into a snag with node-git and our Jenkins jobs, so there are some kinks to work out before that is working in CI.
  2. For Cross Browser and Environment Variance - The tool wdio-visual-regression-service uses filenames to figure out which images to compare against. In the filename, we included the browser, the viewport size, the environment (whether or not you are in CI), and the operating system, to ensure that images are only compared within their own scope (see the sketch after this list).
  • This means you have more images to check in, but each image is very focused, so variances really tell you something. If you take a screenshot with a lot on the page, you have more moving parts, making it hard to do a pixel-by-pixel comparison.
  • This sort of testing is helpful as a supplement to other types, so using it at small scale and for focused screenshots seems to be where most of the value lies.
  • We want to make sure we aren't just testing the visualization libraries that we are using and that these tests put confidence in the right areas of our product.
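As a rough sketch of that naming scheme (the exact fields and format here are assumptions, not the wdio-visual-regression-service configuration EUI actually used), encoding browser, viewport, environment, and OS into the filename keeps comparisons within a single scope:

```js
const os = require('os');

// Hypothetical helper: build a baseline filename that is unique per browser,
// viewport, environment (CI vs. local), and operating system.
function baselineFileName(testName, browser, viewport) {
  const environment = process.env.CI ? 'ci' : 'local';
  return [
    testName,
    browser,                                  // e.g. 'chrome'
    `${viewport.width}x${viewport.height}`,   // e.g. '1280x800'
    environment,
    os.platform(),                            // e.g. 'linux', 'darwin', 'win32'
  ].join('_') + '.png';
}

// baselineFileName('tagcloud_single', 'chrome', { width: 1280, height: 800 })
//   -> 'tagcloud_single_chrome_1280x800_ci_linux.png' on a Linux CI worker
```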

@tylersmalley
Contributor

@silne30 why did you merge this PR?

@cjcenizal
Contributor

I think it was an accident, but I'll let @silne30 explain that one. 😄

In terms of what to do next, should we revert this? If we do revert it and then decide we want to add this change, then we'd have to revert the revert. Would that re-add the images, thus ballooning the repo size further?

cjcenizal added a commit to cjcenizal/kibana that referenced this pull request Jun 6, 2018
cjcenizal added a commit that referenced this pull request Jun 6, 2018
@cuff-links
Contributor

cuff-links commented Jun 6, 2018

@tylersmalley Definitely wasn't intentional. I am not sure if there is some kind of default action on the page for the Enter key or other keystrokes, but the only action I actually meant to take on this issue was to leave the comment that I left. I had no interaction with the merge button whatsoever. Prior to being asked to comment on this issue, I had no knowledge of it and it was not on my radar. Sorry for causing the merge, however it happened.

kindsun pushed a commit that referenced this pull request Jun 12, 2018
* undoing a messy merge

* updating screenshots

* changing the variance to account for data table failure

* trying a different variance for data table and a general one for the rest of the screenshots

* changing the variance for general to .065

* adding xy position to adjust the screensize

* changing variance and setting a small window

* create calibrateForScreenshots method

* remove empty lines
kindsun pushed a commit that referenced this pull request Jun 12, 2018
@bhavyarm bhavyarm self-assigned this Jun 19, 2018
maryia-lapata pushed a commit to maryia-lapata/kibana that referenced this pull request Jun 25, 2018
maryia-lapata pushed a commit to maryia-lapata/kibana that referenced this pull request Jun 25, 2018
@LeeDr LeeDr mentioned this pull request Apr 8, 2019