Tests for visualization screenshot comparison #17545

Merged
merged 9 commits into elastic:master on Jun 5, 2018

Conversation

bhavyarm
Contributor

@bhavyarm bhavyarm commented Apr 4, 2018

This test adds baseline screenshots for all visualizations except TSVB and Vega, and compares new screenshots against them to ensure the visualizations are displayed correctly.

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@bhavyarm
Contributor Author

Jenkins, test this

@elasticmachine
Contributor

💔 Build Failed

@bhavyarm bhavyarm changed the title [WIP] tests for visualization screenshot comparison Tests for visualization screenshot comparison Apr 16, 2018
@jbudz jbudz added the test and Feature:Visualizations labels Apr 16, 2018
@elasticmachine
Contributor

💔 Build Failed

@LeeDr

LeeDr commented Apr 16, 2018

You have a baseline/screenshot_tag_cloud.png which isn't used in the test? Maybe it was replaced by screenshot_tagcloud_single.png?

@LeeDr

LeeDr commented Apr 16, 2018

My first run of this PR had rather bad results:
17 passing (2.0m)
13 failing

Through some trial and error and looking at the diff screenshots, I found that if I set the window size in the visualize/index.js page to 1291 x 811 with remote.setWindowSize(1292, 811); then I got much better results. Only 1 test failed, and it was pretty close to passing.

29 passing (1.0m)
1 failing



     └- ✖ fail: "visualize app visualize app should compare visualization screenshot for data table with minimum bucket metric agg and geohash"
     │        Error: expected 0.050852997448979594 to be below 0.05

Based on this, I'd like us to try to devise a screen size tuning routine that takes some screenshot and adjusts the Kibana window to get the smallest diff. This could even allow the tests to pass across multiple browsers.
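In rough terms, such a routine could look like the sketch below. This is only an illustration of the idea: getSessionScreenshotSize and the target dimensions are made-up placeholders, and remote.setWindowSize is the only call taken from the existing tests.

```js
async function calibrateWindowSize(remote) {
  // Hypothetical target: the dimensions the baseline screenshots were taken at.
  const target = { width: 1280, height: 686 };
  let windowWidth = 1292;
  let windowHeight = 811;

  for (let attempt = 0; attempt < 5; attempt++) {
    await remote.setWindowSize(windowWidth, windowHeight);
    // Hypothetical helper returning { width, height } of a freshly taken screenshot.
    const actual = await getSessionScreenshotSize();
    if (actual.width === target.width && actual.height === target.height) {
      return; // screenshots now match the baseline size, so diffs should be small
    }
    // Nudge the window by however far off the screenshot is and try again.
    windowWidth += target.width - actual.width;
    windowHeight += target.height - actual.height;
  }
}
```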

@bhavyarm
Contributor Author

Updated the screenshot for data table options and trying again to see if CI goes through.

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💔 Build Failed

@elasticmachine
Contributor

💚 Build Succeeded

@elasticmachine
Contributor

💚 Build Succeeded

@LeeDr

LeeDr commented Apr 18, 2018

@bhavyarm On my Ubuntu machine, if I set setWindowSize(1290, 811) then my screenshots are exactly the size of the baseline screenshots and I get very good matching. Some are less than 0.01 difference and the worst one was about 0.022.
If I don't adjust it, then compare_pngs shows me the difference in the size of the images.

The question is why I have to set setWindowSize differently than you did when creating the baselines. Even in the last passing Jenkins job you can see that the new session screenshot size didn't match the baseline size; the session is 9 pixels taller.

16:58:37 expected height 686 and width 1280
16:58:37 actual height 695 and width 1280
16:58:38        │ debg  calculating diff pixels...
16:58:38        │ debg  percentSimilar: 0.026081905976676385
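Despite its name, percentSimilar here behaves as a difference metric: lower is better, and the failing test earlier in this thread expected it to be below 0.05. As a reference point, a minimal sketch of how such a number can be computed with pixelmatch and pngjs follows; it is an illustration of the idea, not Kibana's actual compare_pngs code.

```js
const fs = require('fs');
const { PNG } = require('pngjs');
const pixelmatch = require('pixelmatch');

function percentDifferent(baselinePath, sessionPath) {
  const baseline = PNG.sync.read(fs.readFileSync(baselinePath));
  const session = PNG.sync.read(fs.readFileSync(sessionPath));

  // If the dimensions differ (as in the log above: 686 vs. 695 pixels tall), the
  // pixel-by-pixel comparison is already on shaky ground. Kibana's version still
  // produces a diff number in that case; here we simply require equal sizes.
  if (baseline.width !== session.width || baseline.height !== session.height) {
    throw new Error(
      `size mismatch: baseline ${baseline.width}x${baseline.height}, ` +
      `session ${session.width}x${session.height}`
    );
  }

  const { width, height } = baseline;
  const diff = new PNG({ width, height });
  const mismatched = pixelmatch(baseline.data, session.data, diff.data, width, height, {
    threshold: 0.1, // per-pixel color sensitivity
  });
  return mismatched / (width * height); // fraction of pixels that differ
}
```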

@LeeDr

LeeDr commented Apr 18, 2018

I'm going to try this PR on Windows and see what kind of sizes and diffs I get.

@bhavyarm
Contributor Author

@LeeDr see if this works when you get a moment. Thanks!

@elasticmachine
Contributor

💚 Build Succeeded

@LeeDr

LeeDr commented Apr 19, 2018

On my Windows 4k 15" laptop with display scaling set to 100% and setWindowSize(1308, 822); the session screenshots match the baseline screenshot sizes exactly and these are the results (1 failing timelion_colors);

$ time node scripts/functional_test_runner --debug | grep -E "(percentSimilar|Taking)"

       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_area_chart_bar.png"
       │ debg  percentSimilar: 0.012302979227405248
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_area_chart_options.png"
       │ debg  percentSimilar: 0.012031933309037901
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_barchart_percentile.png"
       │ debg  percentSimilar: 0.014806168002915452
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_barchart_tophit.png"
       │ debg  percentSimilar: 0.01845276056851312
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_datatable_average.png"
       │ debg  percentSimilar: 0.021078944970845483
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_datatable_options.png"
       │ debg  percentSimilar: 0.03068740889212828
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_datatable_significant.png"
       │ debg  percentSimilar: 0.02009384110787172
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_gaugecircle_options.png"
       │ debg  percentSimilar: 0.03494442419825073
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_goal_chart_options.png"
       │ debg  percentSimilar: 0.009010568513119533
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_heatmap_alloptions.png"
       │ debg  percentSimilar: 0.010871446793002915
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_horizontal_bar_chart.png"
       │ debg  percentSimilar: 0.011112882653061224
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_inputcontrol_options.png"
       │ debg  percentSimilar: 0.04303708090379009
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_line_chart_options.png"
       │ debg  percentSimilar: 0.012067237609329446
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_line_chart_parent.png"
       │ debg  percentSimilar: 0.012034211005830903
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_linechart_bubbles.png"
       │ debg  percentSimilar: 0.01107757835276968
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_linechart_derivative.png"
       │ debg  percentSimilar: 0.011325847303206998
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_linechart_serial.png"
       │ debg  percentSimilar: 0.011358873906705539
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_markdown_options.png"
       │ debg  percentSimilar: 0.04337190233236152
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_metrictable_median.png"
       │ debg  percentSimilar: 0.013550018221574344
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_piechart_donut.png"
       │ debg  percentSimilar: 0.009208728134110788
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_piechart_unique_count.png"
       │ debg  percentSimilar: 0.010436406705539358
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_regionmap_options.png"
       │ debg  percentSimilar: 0.009946701895043732
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_tagcloud_single.png"
       │ debg  percentSimilar: 0.028306076895043733
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_tilemap_geohash.png"
       │ debg  percentSimilar: 0.009027651239067055
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_tilemap_heatmap.png"
       │ debg  percentSimilar: 0.008657525510204081
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_tilemap_options.png"
       │ debg  percentSimilar: 0.009185951166180758
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_timelion_colors.png"
       │ debg  percentSimilar: 0.06928298104956268
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\failure\visualize app visualize app should compare visualization screenshot for timelion.png"
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_vertical_bar_chart_options.png"
       │ debg  percentSimilar: 0.012228954081632652
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_vertical_movingagg.png"
       │ debg  percentSimilar: 0.011375956632653061
       │ info  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\session\screenshot_verticalbar_average.png"
       │ debg  percentSimilar: 0.011980685131195335

@LeeDr

LeeDr commented Apr 19, 2018

The latest changes with the calibration routine pass on my Ubuntu desktop and my Windows laptop with multiple different display scaling settings.

@elasticmachine
Contributor

💔 Build Failed

@bhavyarm
Contributor Author

Jenkins, test this

@elasticmachine
Contributor

💔 Build Failed

@LeeDr

LeeDr commented Jun 4, 2018

I feel like we need to have version control and branching on these images if they're going to fit into our standard processes. So to me, that means they go in some GitHub repository in the elastic org.
If we don't want all the binary history of images in the kibana repo, we can create a new public screenshots repository.
Then the screenshots could be pulled down to where tests are running and cached. I think we could design a way to pull only the latest versions of the screenshots for the branch we're testing, without history. Something like wget on the repo zip?
Somewhere in the test framework, or even in the tests themselves, we check whether the screenshot files exist and whether some timestamp or version file matches the upstream version. If they don't exist or the version doesn't match, we pull the branch as a zip file and unzip it, overwriting existing files.

When a Kibana PR makes a change that requires updating a screenshot, it should reference the PR or commit in the screenshots repo (and always update the screenshots version file?). Maybe there's an automated way to do this?
I'm thinking that the tests wouldn't need to reference the version or commit specifically.

Another approach is for each test to reference the specific screenshot blob URL. That way, only individually changed screenshots would have to be refreshed when there was a change.

We should consider solutions that make the PR process as easy as possible for both the case where a single screenshot needs to be updated or added, and the case where many or all of them have to be updated.
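A minimal sketch of that caching flow, assuming a hypothetical elastic/kibana-screenshots repository, a VERSION file convention, and curl/unzip available on the test machine (none of which exist today):

```js
const { execSync } = require('child_process');
const { existsSync, readFileSync } = require('fs');

const SCREENSHOTS_DIR = 'test/functional/screenshots/baseline';
const VERSION_FILE = `${SCREENSHOTS_DIR}/VERSION`;
// Hypothetical public repo holding the baseline images for the branch under test.
const ZIP_URL = 'https://github.com/elastic/kibana-screenshots/archive/master.zip';

function ensureBaselines(upstreamVersion) {
  const localVersion = existsSync(VERSION_FILE)
    ? readFileSync(VERSION_FILE, 'utf8').trim()
    : null;

  if (localVersion !== upstreamVersion) {
    // Pull only the latest state of the branch (no git history) and overwrite
    // any existing baseline files, as described above. A real version would also
    // need to strip the top-level folder GitHub puts inside the archive.
    execSync(`curl -sSL -o /tmp/screenshots.zip ${ZIP_URL}`);
    execSync(`unzip -o /tmp/screenshots.zip -d ${SCREENSHOTS_DIR}`);
  }
}
```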

@timroes
Contributor

timroes commented Jun 5, 2018

> Then the screenshots could be pulled down to where tests are running and cached. I think we could design a way to pull only the latest versions of the screenshots for the branch we're testing without history. Something like wget on the repo zip?

git clone --depth 1, or in case we were using submodules, git clone --shallow-submodules, would clone just the most recent commit of a repository.

@bhavyarm
Contributor Author

bhavyarm commented Jun 5, 2018

@tylersmalley @cjcenizal @jbudz @silne30 - can you please add your inputs here? Lee and I will talk with Viz team in their next sync. Thanks!

@tylersmalley
Contributor

While I understand this adds much-needed coverage for Visualizations, I have a few additional concerns. The screenshot comparisons as implemented rely on a threshold, which in some cases allows for an 8% difference. With the subtleties of visualizations, I feel like this reduces the actual coverage. Understanding that the problem is the test coverage of visualizations, is there a reason we can't implement these tests in any of the current frameworks without adding an additional way of testing?

Additionally, will the new visualization pipeline changes help with the problem of test coverage?
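To illustrate the threshold concern in code (a sketch only, assuming a hypothetical 8% tolerance; the expect.js style matches the "to be below 0.05" failure message earlier in this thread):

```js
const expect = require('expect.js');

// Illustration: the assertion only checks that the pixel difference stays under
// the tolerance, so a visualization that drifts by 7.9% of its pixels would
// still pass silently.
function assertScreenshotCloseEnough(percentDifference, tolerance = 0.08) {
  expect(percentDifference).to.be.below(tolerance);
}
```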

@cuff-links cuff-links merged commit 84d678b into elastic:master Jun 5, 2018
@cuff-links
Contributor

cuff-links commented Jun 5, 2018

I had a run-in with visual regression tests while developing a test suite for EUI and ran into a few snags that seem to be inherent to that type of testing.

Comes With The Territory

  1. Cross Machine Variance - In the issue I linked to above, there was a comment revealing that there was variance between the snapshots on different machines. What made things worse was that both machines were running the same OS (macOS High Sierra). Differences in installed font packages and other environmental variables could cause the tests to have unpredictable variance numbers. I address how we compensated below.
  2. Cross Browser Variance - Because each browser implements the web in its own way, there is inherent variance between them. I will address how we compensated for that one below too.
  3. Cross Environment Variance - You guessed it. Running on CI vs. running on a local machine caused window size differences and other factors that I was not able to pin down. Again, the compensation strategy is listed below.

Compensating For Variances

  1. For Cross Machine Variance - Since EUI is small and does not have a ton of branching, we were able to leverage git to help with this scenario. Basically, tests run twice: the first run creates baselines, and the second tests against them. This wouldn't happen every time on CI, only when there was a change to a baseline. That work is here. We ran into a snag with node-git and our Jenkins jobs, so there are some kinks to work out before that is working in CI.
  2. For Cross Browser and Environment Variance - The tool wdio-visual-regression-service uses filenames to figure out which images to compare against. In the filename, we included the browser, the viewport size, the environment (whether or not you are in CI), and the operating system, to ensure that images are only compared within their own scope (see the sketch after this list).
  • This means you have more images to check in, but each image is very focused, so variances really tell you something. If you take a screenshot with a lot on the page, you have more moving parts, making it hard to do a pixel-by-pixel comparison.
  • This sort of testing is helpful as a supplement to other types, so using it at small scale and for focused screenshots seems to be where most of the value lies.
  • We want to make sure we aren't just testing the visualization libraries that we are using and that these tests put confidence in the right areas of our product.
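As a rough sketch of that naming scheme (the exact fields and format here are assumptions, not the wdio-visual-regression-service configuration EUI actually used), encoding browser, viewport, environment, and OS into the filename keeps comparisons within a single scope:

```js
const os = require('os');

// Hypothetical helper: build a baseline filename that is unique per browser,
// viewport, environment (CI vs. local), and operating system.
function baselineFileName(testName, browser, viewport) {
  const environment = process.env.CI ? 'ci' : 'local';
  return [
    testName,
    browser,                                  // e.g. 'chrome'
    `${viewport.width}x${viewport.height}`,   // e.g. '1280x800'
    environment,
    os.platform(),                            // e.g. 'linux', 'darwin', 'win32'
  ].join('_') + '.png';
}

// baselineFileName('tagcloud_single', 'chrome', { width: 1280, height: 800 })
//   -> 'tagcloud_single_chrome_1280x800_ci_linux.png' on a Linux CI worker
```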

@tylersmalley
Contributor

@silne30 why did you merge this PR?

@cjcenizal
Contributor

I think it was an accident, but I'll let @silne30 explain that one. 😄

In terms of what to do next, should we revert this? If we do revert it and then decide we want to add this change, then we'd have to revert the revert. Would that re-add the images, thus ballooning the repo size further?

cjcenizal added a commit to cjcenizal/kibana that referenced this pull request Jun 6, 2018
cjcenizal added a commit that referenced this pull request Jun 6, 2018
@cuff-links
Contributor

cuff-links commented Jun 6, 2018

@tylersmalley Definitely wasn't intentional. I am not sure if there is some kind of default action on the page for the Enter key or other keystrokes, but the only action I actually meant to take on this issue was to leave the comment that I left. I had no interaction with the merge button whatsoever. Prior to being asked to comment on this issue, I had no knowledge of it and it was not on my radar. Sorry for causing the merge, however it happened.

kindsun pushed a commit that referenced this pull request Jun 12, 2018
* undoing a messy merge

* updating screenshots

* changing the variance to account for data table failure

* trying a different variance for data table and a general one for the rest of the screenshots

* changing the variance for general to .065

* adding xy position to adjust the screensize

* changing variance and setting a small window

* create calibrateForScreenshots method

* remove empty lines
kindsun pushed a commit that referenced this pull request Jun 12, 2018
@bhavyarm bhavyarm self-assigned this Jun 19, 2018
maryia-lapata pushed a commit to maryia-lapata/kibana that referenced this pull request Jun 25, 2018
maryia-lapata pushed a commit to maryia-lapata/kibana that referenced this pull request Jun 25, 2018
@LeeDr LeeDr mentioned this pull request Apr 8, 2019