
Crawler does not work with selenium>=4.11.0 #5500

Closed
anakin87 opened this issue Aug 3, 2023 · 0 comments · Fixed by #5515
Labels: Contributions wanted!, topic:crawler

anakin87 (Member) commented Aug 3, 2023

Describe the bug
Due to recent changes in selenium, the Chromium driver can't be found even though it is properly installed.
Related issue: SeleniumHQ/selenium#12466
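
For context, selenium has shipped Selenium Manager since 4.6, and as of 4.11 it also resolves the driver binary automatically; at the same time, a bare executable name passed to Service (as the Crawler does with Service("chromedriver")) appears to no longer be looked up on the PATH, which is what the linked issue describes. A minimal sketch of the difference, assuming Chrome/Chromium is installed (the options here are illustrative, not the Crawler's actual configuration):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument("--headless=new")

# What the Crawler does today: pass a bare executable name.
# On selenium >= 4.11 this raises NoSuchDriverException instead of
# resolving "chromedriver" via the PATH.
# driver = webdriver.Chrome(service=Service("chromedriver"), options=options)

# Omitting the explicit Service lets Selenium Manager locate (or download)
# a chromedriver matching the installed browser:
driver = webdriver.Chrome(options=options)
driver.get("https://example.com")
driver.quit()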

Error message

NoSuchDriverException Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/haystack/nodes/connector/crawler.py in __init__(self, urls, crawler_depth, filter_urls, id_hash_keys, extract_hidden_text, loading_wait_time, output_dir, overwrite_existing_files, file_path_meta_field_name, crawler_naming_function, webdriver_options)
122 try:
--> 123 self.driver = webdriver.Chrome(service=Service("chromedriver"), options=options)
124 except WebDriverException as exc:

... 5 frames elided ...
NoSuchDriverException: Message: Unable to locate or obtain driver for chrome; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors/driver_location

The above exception was the direct cause of the following exception:

NodeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/haystack/nodes/connector/crawler.py in __init__(self, urls, crawler_depth, filter_urls, id_hash_keys, extract_hidden_text, loading_wait_time, output_dir, overwrite_existing_files, file_path_meta_field_name, crawler_naming_function, webdriver_options)
123 self.driver = webdriver.Chrome(service=Service("chromedriver"), options=options)
124 except WebDriverException as exc:
--> 125 raise NodeError(
126 """
127 'chromium-driver' needs to be installed manually when running colab. Follow the below given commands:

NodeError:
'chromium-driver' needs to be installed manually when running colab. Follow the below given commands: ...

Expected behavior
The crawler should find the driver and work properly.

Additional context
As a workaround, selenium can be downgraded: pip install "selenium<4.11.0"
We should support the new version and make the Crawler work with it. (It doesn't seem like a difficult change...)
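
One possible shape for the fix (a sketch only, not necessarily what #5515 does; the helper name is hypothetical): drop the hard-coded Service("chromedriver") and let Selenium Manager do the lookup, keeping the existing Colab hint as the fallback error. Assuming NodeError comes from haystack.errors, as in the traceback above:

from selenium import webdriver
from selenium.common.exceptions import WebDriverException

from haystack.errors import NodeError


def create_chrome_driver(options):
    # Hypothetical helper mirroring the driver setup in Crawler.__init__.
    try:
        # No explicit Service path: selenium >= 4.6 uses Selenium Manager
        # to locate or download a chromedriver matching the installed Chrome.
        return webdriver.Chrome(options=options)
    except WebDriverException as exc:
        raise NodeError(
            "'chromium-driver' needs to be installed manually when running colab. ..."
        ) from exc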

To Reproduce

from haystack.nodes import Crawler

crawler = Crawler(output_dir="crawled_files")

System:

  • OS: Ubuntu 22.04
  • Haystack version (commit or version number): 1.19.0