Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEWS-6530 Chromification #578

Merged
merged 14 commits into from
Aug 20, 2018
Merged

NEWS-6530 Chromification #578

merged 14 commits into from
Aug 20, 2018

Conversation

bigmartyn
Copy link
Contributor

@bigmartyn bigmartyn commented Aug 13, 2018

https://jira.dev.bbc.co.uk/browse/NEWS-6530

Improve how Wraith works with chrome by:

  • adding additional command line parameters to make it more lenient to ssl errors
  • setting a user-configurable 'settle' time, that the driver will wait for things to become stable
  • retrying 'failed' pages three times before giving up
  • having two 100% different 'error' images, so you never get a false positive if both pages fail
  • incorporating useful pull requests from the greater Wraith community

Copy link

@kodikos kodikos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor tidy ups, but otherwise a good job 👍

@@ -98,10 +98,12 @@ def get_driver
options = Selenium::WebDriver::Chrome::Options.new
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we tidy this up? A pattern like [opt1, opt2, ...].each |opt| options.add_argument(opt)

@@ -132,6 +132,14 @@ def comp_domain_label
domains.keys[1]
end

def timeout_ms
@config['timeout_ms'] ? @config['timeout_ms'] : 1000
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be @config['timeout_ms'] || 1000?

end

context 'non-standard config values' do
let(:config) { YAML.load "browser: phantomjs\nthreads: 2\ntimeout_ms: 4000"}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't quite follow the use of YAML.load to create the object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YAML.load takes a YAML string that it parses and returns as an object. To do that from a YAML file, YAML.load_file would be used instead.

It's just easier than creating a fixture file when the contents of that file is small.

Copy link

@kodikos kodikos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, but there is an issue with the tests, but it may be just it needs a rerun.

@bigmartyn bigmartyn merged commit f4859e5 into master Aug 20, 2018
@bigmartyn bigmartyn deleted the chromification branch August 20, 2018 10:36
@bigmartyn bigmartyn changed the title Chromification NEWS-6530 Chromification Aug 20, 2018
@patricknelson
Copy link

patricknelson commented Jun 25, 2019

It looks like these changes aren't yet available when performing a gem install wraith. I was originally assuming from the online documentation that the settle configuration option was available (which appears to have been added in this PR). However, the current published version of this gem is 4.2.3 and predates this new feature.

I was banging my head against the wall since it looked like I had a race condition where some pages would simply not be captured, likely due to timeout, so I resorted to using settle but the configuration didn't seem to have any effect at all. I traced it down to the fact that apparently that's because it's not yet available, so once I updated to the latest version (by manually wiping out my gem install'ed version with the latest files in this repo) my problems were finally solved in combination with a large settle (e.g. 60s in my case since I have a slow local dev server).

Why I'm here (and to help future me and any other folks getting here from a Google search): The error that I was repeatedly seeing, possibly some kind of timeout or race condition, was along the lines of the following:

ERROR: unable to bind to locking port 9514 within 45 seconds
WARN: Using fallback image instead

So, the fix (TL;DR): Running chromedriver-update (via chromedriver-helper) to ensure the chromedriver version it was in sync with my version of chrome along with using the latest code in this PR (not yet published) to then apply a large settle parameter (documented here) seemed to work for me.

Edit: Created issue to track this here: #613

@patricknelson
Copy link

Quick update to anyone reading: These code updates are now available in v4.2.4 via gem install wraith.

So if you've got a ton of paths in your configuration like I do and have similar issues to above, try updating now.

@ajoah
Copy link
Contributor

ajoah commented Jun 27, 2019

Hi @patricknelson and @bigmartyn ,
Apparently there is a backward compability issue with the chrome wait. The default code in wait--chrome.js doesnt work anymore.

I tried to use the new parameter 'settle' but the same.

My knowledge of ruby and selenium is very limited but if i read the documentation of implicit_wait (of which settle changes the value) it works only when we want to find a DOM element :
An implicit wait is to tell WebDriver to poll the DOM for a certain amount of time when trying to find an element or elements if they are not immediately available. The default setting is 0. Once set, the implicit wait is set for the life of the WebDriver object instance.
https://docs.seleniumhq.org/docs/04_webdriver_advanced.jsp#explicit-and-implicit-waits

I missed something ?

@patricknelson
Copy link

patricknelson commented Jun 27, 2019

@ajoah Exactly what errors are you seeing (if any)?

I will say I haven't noticed any change in my before_capture script (which, at the end, effectively does exactly the same thing as wait--chrome.js). Since the feature for Chromedriver doesn't appear to be well documented yet I'll paste my working code (give this a shot):

var doneTesting = arguments[0];
setTimeout(doneTesting, 1000);

Maybe there are more arguments being passed into the wrapping closure now? Since I'm directly targeting the 0th index when the existing boilerplate currently just pulls the last argument, i.e. in wait--chrome.js:

var callback = arguments[arguments.length-1];
setTimeout(callback, 2000);

Also worth noting: You've got a great point on the use of implicit_wait when no call to driver.find_element is occurring. So, theoretically, changing settle should have no effect here. In my case I also set threads: 1 since I'm running it in Windows (via Mingw-w64) and parallel isn't compatible with Windows. So now that I think of it my unable to bind to locking port error above might have been related to that as well (not sure yet since I haven't dived into this deeply enough, unfortunately).

@ajoah
Copy link
Contributor

ajoah commented Jun 28, 2019

@patricknelson no difference with your code :/
There is no error just chrome doesn't wait. To check, I set 20 seconds and i run on one page : the "saving images" step takes less than 5sec. How do you check ?

I think it is due to the fact that execute_async_script is replaced by execute_script : https://github.com/BBC-News/wraith/pull/578/files#diff-b6c1420e088ed8e1127a35c6ac8ec84eR139

The first function wait a callback, the second not : https://seleniumhq.github.io/selenium/docs/api/java/org/openqa/selenium/JavascriptExecutor.html

There is a reason of this replacement ? if not we can set back execute_async_script :)

ping @staylos92

patricknelson added a commit to patricknelson/wraith that referenced this pull request Jul 20, 2019
…gression caused by bbc#578) and just reverting to hard-coded 'invalid.jpg' from before.
@nicrodgers
Copy link

We are also seeing the same behaviour (wait not working, settle not working) with Chrome and wraith 4.2.4.

@patricknelson
Copy link

Crossposting my potential solution from #616 (comment) here in case it's helpful for any other folks that may land on this PR 😄

What do you guys see when you run chromedriver -v? Also, what version of Chrome are you using? If those two aren't the same, try replacing the current chromedriver binary currently in your path with one from here that is aligned with your version of Chrome: https://chromedriver.chromium.org/

That's usually the issue for it going out of sync. I think it'd be a rad feature to have wraith somehow find a way to stay in sync with your currently installed Chrome (either that or just isolate/ship it's own version... hint hint... 😎).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants