
Inconsistent handling of script errors between the three engines #731

Closed
AkA84 opened this issue Apr 13, 2018 · 20 comments

AkA84 (Contributor) commented Apr 13, 2018

(This is with backstopjs@3.2.9)

I noticed an inconsistency in how the three engines handle script errors, with Puppeteer being the most troublesome, imho.

I came up with this setup, which you can use to replicate the issue:

Three scenarios, one of which is designed to fail

"scenarios": [
  {
    "label": "Google",
    "url": "https://google.com"
  },
  {
    "label": "Google / Wait for wrong selector",
    "url": "https://google.com",
    "onReadyScript": "wait-for-wrong-selector.js"
  },
  {
    "label": "Google / Type",
    "url": "https://google.com",
    "onReadyScript": "type.js"
  }
]

For the two scenarios with the onReadyScript, there is a variant of the script for each of the engines:

Casper

// casper/type.js
module.exports = casper => {
  casper.then(() => {
    casper.fillSelectors('form.tsf', {
      'input#lst-ib': 'foobarbar'
    });
  });
};

// casper/wait-for-wrong-selector.js
module.exports = casper => {
  casper.then(() => {
    casper.fillSelectors('form.tsf', {
      'input#lst-ib': 'foobarbar'
    });
  });

  casper.then(() => {
    casper.waitForSelector('.does-not-exist');
  });
};

Chromy

// chromy/type.js
module.exports = chromy => {
  chromy.type('input#lst-ib', 'foobarbar');
}

// chromy/wait-for-wrong-selector.js
module.exports = chromy => {
  chromy.type('input#lst-ib', 'foobarbar');
  chromy.wait('.does-not-exist');
}

Puppeteer

// puppet/type.js
module.exports = async puppet => {
  await puppet.type('input#lst-ib', 'foobarbar');
}

// puppet/wait-for-wrong-selector.js
module.exports = async puppet => {
  await puppet.type('input#lst-ib', 'foobarbar');
  await puppet.waitFor('.does-not-exist', { timeout: 1000 });
}

Now backstop reference and backstop test are run with all three engines, and these are the results I've found:

Casper

COMMAND | Executing core for `reference`
# ...
CasperJS:  CREATING NEW REFERENCE FILES
CasperJS:  Current location is https://google.com
CasperJS:  Capturing screenshots for desktop (1920x900)
CasperJS:  Ready event received.
CasperJS:  Current location is https://google.com
CasperJS:  Wait timeout of 5000ms expired, exiting.

Testing script failed with code: 1
# ...
      COMMAND | Command `reference` ended with an error after [9.965s]

With CasperJS the script error makes the entire command immediately fail and exit, meaning that BackstopJS can't capture the remaining screenshots in the list.

Given that reference fails, there's no point in running test, since it will fail as well and no report will be generated.

Chromy

COMMAND | Executing core for `reference`
# ...
CREATING NEW REFERENCE FILES
9222 INFO >  BackstopTools have been installed.
CREATING NEW REFERENCE FILES
9223 Chrome v65 detected.
9223 INFO >  BackstopTools have been installed.
CREATING NEW REFERENCE FILES
9224 Chrome v65 detected.
9224 INFO >  BackstopTools have been installed.

# ...
      COMMAND | Command `reference` successfully executed in [10.396s]

Chromy marks the task as successful and doesn't report any error whatsoever, even though the screenshot for "Google / Wait for wrong selector" has not been generated.

Running test gives the same output, but it's marked as failed. This is the report's outcome:

COMMAND | Executing core for `report`
      compare | Chromy error: WaitTimeoutError. See scenario – Google / Wait for wrong selector (desktop)
      compare | OK: Google backstop_default_Google_0_document_0_desktop.png
      compare | OK: Google / Type backstop_default_Google___Type_0_document_0_desktop.png
      # ...
      report | 2 Passed
      report | 1 Failed
      # ...
      COMMAND | Command `report` ended with an error after [0.177s]
      COMMAND | Command `test` ended with an error after [10.845s]

Now report reports on the WaitTimeoutError and fails, showing the missing screenshot for "Google / Wait for wrong selector" in the HTML report.

[Screenshot: Chromy HTML report]

Puppeteer

COMMAND | Executing core for `reference`
# ...
CREATING NEW REFERENCE FILE
Browser Console Log 0: JSHandle:BackstopTools have been installed.
# ...
CREATING NEW REFERENCE FILE
Browser Console Log 0: JSHandle:BackstopTools have been installed.
######## Error running Puppeteer ######### Error: waiting failed: timeout 1000ms exceeded
# ...
CREATING NEW REFERENCE FILE
Browser Console Log 0: JSHandle:BackstopTools have been installed.

# ...

      COMMAND | Command `reference` successfully executed in [5.624s]

While it reports that something went wrong with a script (which is good for debugging), Puppeteer still allows the entire test suite to run, and the reference command is marked as successful.

Running test gives the same output, and it's marked as successful as well. This is the report's outcome:

COMMAND | Executing core for `report`
      compare | OK: Google / Type backstop_default_Google___Type_0_document_0_desktop.png
      compare | OK: Google backstop_default_Google_0_document_0_desktop.png
       # ...
       report | 2 Passed
       report | 0 Failed
       # ...
      COMMAND | Command `report` successfully executed in [0.476s]
      COMMAND | Command `test` successfully executed in [6.424s]

There is no mention whatsoever of "Google / Wait for wrong selector", which at this point is simply ignored; report is marked as successful and the HTML report shows that everything is fine... but with only 2 screenshots instead of 3.

[Screenshot: Puppeteer HTML report]


While CasperJS is too eager to stop everything as soon as it finds an error, Puppeteer's handling of script errors imho is worse as it simply marks the whole run as successful even if there are screenshots missing. If you have a big test suite (we have 120+ screenshots in our project), it's very easy to trust the report and not realize that some screenshots might be missing.

I think the best behaviour would be a mix of Chromy + Puppeteer: show the script error in the output for debugging purposes, let the command run to completion, and then mark test/report as failed and show the missing screenshot(s) in the report.
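To make that concrete, here is a minimal sketch of the desired flow (a hypothetical illustration, not BackstopJS internals; the function and parameter names are made up): each scenario script runs inside a try/catch, the error is logged for debugging, the run continues, and the error is carried forward so the report step can mark the scenario as failed.

// Hypothetical sketch of the proposed behaviour; not actual BackstopJS code.
// `runOnReadyScript` stands in for however the engine loads the user script.
async function captureScenario (page, scenario, screenshotPath, runOnReadyScript) {
  try {
    if (scenario.onReadyScript) {
      // Any throw inside the user script (wait timeout, missing method, ...) lands in catch.
      await runOnReadyScript(page, scenario);
    }
    await page.screenshot({ path: screenshotPath });
    return { label: scenario.label, error: null };
  } catch (err) {
    // Log for debugging, keep processing the remaining scenarios,
    // and carry the error forward so `test`/`report` can mark this scenario as failed.
    console.error(`Script error in "${scenario.label}": ${err.message}`);
    return { label: scenario.label, error: err.message };
  }
}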

garris (Owner) commented Apr 13, 2018

@AkA84 thank you very much for doing this comparison! The intended Puppeteer behavior in a case like this is to throw an unexpected error, so this is definitely a bug. I would love to see this fixed, and I have been putting off closing #704 and announcing an official release of puppeteer until this is addressed.

@krisdigital — 👆 any ideas here?

krisdigital (Contributor) commented:

@garris With this PR it should behave like Chromy, plus the error messages.

garris (Owner) commented Apr 13, 2018

@krisdigital — thank you so much! Will follow up soon.

garris (Owner) commented Apr 15, 2018

The puppeteer version is now fixed on @latest. The error messages are also being sent to the front end -- now I just need to add this information to the layout.

npm install -g backstopjs@3.2.11

garris (Owner) commented Apr 16, 2018

Hello All, Puppeteer errors are now correctly surfaced in the web report...

[Screenshot: Puppeteer errors surfaced in the web report, 2018-04-16]

AkA84 (Author) commented Apr 18, 2018

@garris @krisdigital I installed the latest version (v3.2.12) and ran those 3 scenarios again, and now backstop (with Puppeteer) exits as soon as an error is encountered:

COMMAND | Executing core for `reference`
createBitmaps | Selected 3 of 3 scenarios.
CREATING NEW REFERENCE FILE
Browser Console Log 0: JSHandle:BackstopTools have been installed.
x Close Browser
CREATING NEW REFERENCE FILE
Browser Console Log 0: JSHandle:BackstopTools have been installed.
Puppeteer encountered an error while running scenario "Google / Wait for wrong selector"
Error: waiting for selector ".does-not-exist" failed: timeout 1000ms exceeded
      COMMAND | Command `reference` ended with an error after [3.799s]
      COMMAND | TypeError: Cannot read property 'join' of undefined
                    at processScenarioView (node_modules/backstopjs/core/util/runPuppet.js:252:151)
                    at <anonymous>

I noticed that the line referenced in the error is this one:

const testPair = engineTools.generateTestPair(config, scenario, viewport, variantOrScenarioLabelSafe, scenarioLabelSafe, 0, `${scenario.selectors.join('__')}`);

scenario.selectors in my case is undefined. This is the scenario object for me:

{
  label: 'Google / Wait for wrong selector',
  url: 'https://google.com',
  onReadyScript: 'wait-for-wrong-selector.js',
  sIndex: 1
}
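For what it's worth, a defensive fallback along these lines (just a sketch against the line quoted above; the 'document' default is an assumption, not the actual commit that closed this) would avoid the TypeError when a scenario defines no selectors:

// Sketch only: fall back to a default selector list when the scenario doesn't define one,
// so the test-pair file name can still be built.
const selectors = scenario.selectors || ['document'];
const testPair = engineTools.generateTestPair(
  config, scenario, viewport,
  variantOrScenarioLabelSafe, scenarioLabelSafe,
  0, selectors.join('__')
);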

krisdigital (Contributor) commented:

@AkA84 @garris I think that error was introduced here ddce7a5#diff-2b0d3e4bf62b40b86f9b984a5f9003ecR252

garris (Owner) commented Apr 18, 2018

Thank you for finding this! I will fix this ASAP.

garris closed this as completed in d4f9956 on Apr 18, 2018
AkA84 (Author) commented Apr 19, 2018

@garris (and @krisdigital ) everything works great now, thanks!

AkA84 (Author) commented Apr 20, 2018

I'm sorry @garris, I just noticed that it still doesn't work as expected: the missing screenshots are marked as green, as if Backstop compared the two error images.

[Screenshot: missing screenshots shown as passing in the report]

This wasn't the case from what I see here, so something must have broken after that.

garris (Owner) commented Apr 20, 2018

@AkA84 ah, so did you approve that error state? That is actually legal! 😄

You are allowed to check for (and approve) selectors not being available.

AkA84 (Author) commented Apr 22, 2018

@garris no, I've never approved any report in any of the tests I did as part of this issue, which is why I'm reporting this as a bug. I think the behaviour should be the one displayed in your screenshot

[Screenshot: the report from garris's comment above, with the error marked as failed]

and the one that Chromy has (allowing for the relevant differences):

Running test gives the same output, but it's marked as failed. [...]
Now report reports on the WaitTimeoutError and it fails, showing in the html report the missing screenshot for "Google / Wait for wrong selector"

otherwise imho this could be even worse than it was before: before, if you knew the total number of scenarios, you could quickly see that the number reported on was not right; now you can't even do that! You basically have to scroll down yourself and see if there are any errors reported (= you can't trust the report).

And I'm not even considering the scenario where you are running backstop as part of a CI process, where neither of the two options is even possible :/

garris (Owner) commented Apr 22, 2018

Did you run reference at any point? You had to run reference or approve for that reference image to get there. If you did not do either of those things then there is an error. Otherwise this is working as designed.

igorpavlov-zz commented Apr 23, 2018

@garris Does this mean that I would need to check all my screenshots manually? We are a relatively large project currently running 100+ scenarios, and we cannot really afford to check that each reference screenshot is valid.

I think it would make sense to just mark the scenario as failed if:

  1. there are script or selector errors
  2. the screenshot could not be taken for any reason

Please let us know what you think.

AkA84 (Author) commented Apr 23, 2018

@garris I think I understand where we are diverging with our opinions on how things should work.

The way we use backstop (the first time, at least) in our project is to first run backstop reference directly, so that we automatically get the baseline screenshots, and then to run backstop test to catch any diffs. In this case, if a screenshot had a script error, backstop would not report it after running test, on the assumption that if the same error appeared on the reference screenshot then the dev must have approved it and therefore must be fine with it.

Instead, if I understand correctly, the way you expect it to be used is to run test directly even the first time, then manually check that all the screenshots look OK, and finally approve them so they get promoted to the original set of baseline screenshots. By checking the would-be baseline screenshots manually, the dev is sure to be aware of any screenshot with script errors, and if they're fine with having that as a reference, then Backstop will be fine with it as well.

I see two possible problems with the latter approach (on the assumption that I got you right):

  1. It requires a manual check to approve the first set of baseline screenshots. This is fine if you have a small set of scenarios, but if you have a considerable number (we have more than 120 scenarios at the moment), this step could not only become an annoyance but also be prone to human error
  2. It might not work well with CI. We intend to use backstop at some point in Jenkins, but in order to do that we need to create the reference screenshots automatically, and thus to trust that backstop reports any script errors that it might find, however consistent they might be

The last point I thought worth discussing was:

You are allowed to check for (and approve) selectors not being available.

But wouldn't it be odd to rely on a scenario with a failing script to check whether a selector is available or not? Shouldn't backstop encourage devs, in that case, to write a script that successfully checks that a selector doesn't exist, instead of using a failing script that expects a selector to exist?
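For example, a script along these lines (a hypothetical Puppeteer sketch, not something from this issue) would pass deliberately when the selector is absent, instead of relying on a wait timeout:

// puppet/expect-selector-absent.js -- hypothetical sketch
module.exports = async page => {
  // page.$() resolves to null when the selector does not match anything.
  const element = await page.$('.does-not-exist');
  if (element) {
    throw new Error('Did not expect ".does-not-exist" to be present');
  }
};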

Also please note that this is not simply about a selector existing or not, it's about any script error. For example, I changed my type.js script to call a non-existing method, and backstop still passed the comparison despite it:

module.exports = async engine => {
  // ...
  await engine.nonExistingMethod('.does-not-exist', { timeout: 1000 }); // <---
}

[Screenshot: report still passing despite the script error]

And I tried one last thing: to have, for the same scenario, one error thrown for reference, and a different error thrown for test

// when running "reference"
module.exports = async engine => {
  // ...
  await engine.nonExistingMethod('.does-not-exist', { timeout: 1000 });
}

// when running "test"
module.exports = async engine => {
  // ...
  await engine.waitFor('.does-not-exist', { timeout: 1000 });
}

and backstop still passed the comparison, reporting only on the second error

[Screenshot: report passing, with only the second error reported]

but this means that a scenario with a failing script will always pass comparison as long as the script keeps failing, no matter what the reason for the failure is (it could be a non-existent selector, a wrong method, an internal engine error, etc.)

garris (Owner) commented Apr 23, 2018

I understand your thinking — however, this has not been a primary use case as of yet.

That said — I think this is a useful feature and it would be very easy to add. How about, at the end of a backstop reference flow, we look through the results collection to see if there were any client errors — if there were any errors, we exit with a 1. Or, you could add this behavior to the reference and test commands behind a config flag (e.g. failOnException). I think that would be cool.

Or maybe simpler for you on the CI side, you could just grep for error messages that pop up during the run and decide specifically what kinds of things you want to check for.

Do any of these sound good to you?
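For the CI-side option, one possible sketch (the report location and the engineErrorMsg field name are assumptions about BackstopJS's JSON output, not a documented contract) would be to scan the JSON report after the run and exit non-zero if any scenario recorded an engine error:

// ci-check.js -- hypothetical sketch; report path and field names are assumptions.
const report = require('./backstop_data/json_report/jsonReport.json');

const errored = (report.tests || []).filter(t => t.pair && t.pair.engineErrorMsg);
if (errored.length > 0) {
  errored.forEach(t =>
    console.error(`Engine error in "${t.pair.label}": ${t.pair.engineErrorMsg}`));
  process.exit(1);
}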

igorpavlov-zz commented Apr 23, 2018

@garris Thanks a lot for the reply. I think we should still allow reference to continue running, even if there are errors. Having all these errors at the end is actually very handy: imagine we have 2 errors, this allows us to see them all by running BS once, not twice.

The issue is that the test is still considered passed while there are errors; it failed, so the test is not working.

garris (Owner) commented Apr 23, 2018

@igorpavlov Sure. So then you would want failOnException: true to fail an individual scenario if there were any exceptions (even if the comparison passes), right? This seems like a useful feature. What do you guys think?

AkA84 (Author) commented Apr 24, 2018

Thanks @garris for your reply!

If by adding that failOnException flag (or flags, one for reference and one for test) we can achieve the following, it sounds great to me!

  • reference can be successful (exit code = 0) even if there are script errors, basically as it is now. If someone wants the command to end with exit code 1, they can simply set failOnException: true on it
  • test, and by extension report, should fail (exit code = 1) if there are script errors. And by fail I don't mean "exit immediately"; I mean "finish the whole run and then report the error". The report should then mark as failed any scenario with both screenshots missing due to script errors, and that scenario should be counted in the "failed" total. This, of course, only if failOnException: true (see the sketch after this list)
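For illustration, the flag being discussed might end up looking something like this in backstop.json (a sketch only; the flag did not exist at this point, and whether it lives at the top level or per scenario is an open question):

{
  "id": "backstop_default",
  "engine": "puppeteer",
  "failOnException": true,
  "scenarios": [
    {
      "label": "Google / Wait for wrong selector",
      "url": "https://google.com",
      "onReadyScript": "wait-for-wrong-selector.js"
    }
  ]
}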

Does that match what you had in mind for the failOnException flag?

garris (Owner) commented Apr 24, 2018

Yes. That is very specifically exactly what I was thinking.
