Can't use it on Colab #108
Comments
Curious how you run Chrome from Colab. |
Was a solution for this found? |
I know it's a bit late, but this is how to solve it. First, paste the code below in a cell and run it:
!apt-get update
!apt install chromium-chromedriver
!apt install -y xvfb
!pip install undetected-chromedriver
!pip install PyVirtualDisplay
Here we install all the packages needed to run the webdriver. Since the default chromedriver is built from Chrome, it will throw the same error as well. As you may have guessed, the one built from Chromium runs just fine. To use it, run the code below:
!zip -j /content/chromedriver_linux64.zip /usr/bin/chromedriver
# Replace python3.7 with your own Python version if it differs
patcher_src = "/usr/local/lib/python3.7/dist-packages/undetected_chromedriver/patcher.py"
with open(patcher_src, "r") as f:
    contents = f.read()
# Make the patcher read the local Chromium driver archive instead of downloading one
contents = contents.replace("return urlretrieve(u)[0]",
                            "return urlretrieve('file:///content/chromedriver_linux64.zip',"
                            "filename='/tmp/chromedriver_linux64.zip')[0]")
with open(patcher_src, "w") as f:
    f.write(contents)
Then initialize the webdriver as you normally would:
import undetected_chromedriver.v2 as uc
from pyvirtualdisplay import Display
display = Display(visible=0, size=(800, 600))
display.start()
options = uc.ChromeOptions()
options.add_argument("--no-sandbox")
driver = uc.Chrome(options=options)
Make sure to add the
Creating a virtual display is not mandatory, but a headless browser has a higher chance of being caught by anti-bot detection, so you might as well include it.
EDIT: if you already used
Now we can properly say that this issue is closed ;) |
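The hard-coded `python3.7` path in the snippet above has to be edited by hand whenever the runtime's Python version changes. A minimal sketch of one way to avoid that, assuming the package is installed in the current environment: look it up with the standard library's `importlib.util.find_spec`. The helper name `find_package_file` is hypothetical and not part of `undetected_chromedriver`.

```python
import importlib.util
import pathlib
import typing


def find_package_file(package: str, filename: str) -> typing.Optional[pathlib.Path]:
    """Return the path of `filename` inside an installed package, or None.

    Hypothetical helper, for illustration only.
    """
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None  # Not installed, or a plain module rather than a package

    # A package can have several search locations; take the first match
    for location in spec.submodule_search_locations:
        candidate = pathlib.Path(location) / filename
        if candidate.is_file():
            return candidate
    return None
```

With this, `patcher_src` could be obtained as `find_package_file("undetected_chromedriver", "patcher.py")` and checked for `None` before patching.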
Please see the updated version of the code, as this has been broken since Google Colab devices moved to Ubuntu 20.04. |
@DiTo97 Thank you so much for this. However, I keep getting an error pointing to this line that reads "Bad Zip File". Do you know how to fix it? |
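A "Bad Zip File" error generally means that the file handed to the unzip step is not a valid archive, for example because it is empty or was never created. One way to check, sketched here with the standard `zipfile` module. The function name `describe_zip` is hypothetical, and the expectation that the archive holds an entry named `chromedriver` is an assumption based on what `zip -j ... /usr/bin/chromedriver` produces.

```python
import pathlib
import zipfile


def describe_zip(path: str) -> str:
    """Return a short diagnosis of the archive at `path`."""
    p = pathlib.Path(path)
    if not p.is_file():
        return "missing"
    if not zipfile.is_zipfile(p):
        return "not a zip archive"

    with zipfile.ZipFile(p) as zf:
        names = zf.namelist()
        bad = zf.testzip()  # Name of the first corrupt member, or None

    if bad is not None:
        return f"corrupt member: {bad}"
    # Assumption: the patcher expects a top-level `chromedriver` entry
    if "chromedriver" not in names:
        return "no chromedriver entry"
    return "ok"
```

Running `describe_zip("/content/chromedriver_linux64.zip")` right after the `zip -j` step shows whether the archive was actually written.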
FloatingMind12, can you please help? Your solution #108 (comment) worked perfectly before, but it has now started causing the Maybe you know how to fix it? |
same issue |
I do not know if it helps, but a working solution was proposed for Selenium, along with an example Colab notebook that works. I hope someone will also suggest how to adapt it for |
Hi @ThreadedLinx, @maiiabocharova, @ali-arjmandi, I have put together an updated version of the code following @maiiabocharova's suggestion.
import pathlib
import re
import subprocess
import typing


def is_in_jupyter_notebook() -> bool:
    """It checks whether a Jupyter notebook is being run"""
    try:
        get_ipython
        return True
    except NameError:
        return False


def is_on_gcolab() -> bool:
    """It checks whether a Jupyter notebook is being run on Google Colab"""
    if not is_in_jupyter_notebook():
        return False

    return "google.colab" in str(get_ipython())


def is_ubuntu_20_04() -> bool:
    import lsb_release
    metadata = lsb_release.get_os_release()

    distro = metadata["ID"].lower()
    release = metadata["RELEASE"]

    return distro == "ubuntu" and release == "20.04"


def setup_ubuntu_20_04() -> None:
    """It sets up a Ubuntu 20.04 container with `chromium-browser`

    For more information, see
    https://github.com/googlecolab/colabtools/issues/3347#issuecomment-1387453484
    """
    # It adds debian buster
    EOF_debian_buster = """\
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main
deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main
"""
    !echo "$EOF_debian_buster" > /etc/apt/sources.list.d/debian.list

    # It adds keys
    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517
    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138
    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A

    !apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg
    !apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg
    !apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg

    # It adds the debian repo for chromium* packages only
    # Note the double-blank lines between entries
    EOF_chromium_pref = """\
Package: *
Pin: release a=eoan
Pin-Priority: 500


Package: *
Pin: origin "deb.debian.org"
Pin-Priority: 300


Package: chromium*
Pin: origin "deb.debian.org"
Pin-Priority: 700
"""
    !echo "$EOF_chromium_pref" > /etc/apt/preferences.d/chromium.pref

    # It installs the packages
    !apt-get update
    !apt-get install chromium chromium-driver
    !apt-get install -y xvfb


def setup_requirements() -> None:
    PIP_requirements = " ".join([
        "PyVirtualDisplay",  # To run a virtual display
        "undetected-chromedriver",
    ])

    !python3 -m pip install --upgrade pip
    !python3 -m pip install --upgrade $PIP_requirements


def get_py_module_path(module: str) -> typing.Optional[pathlib.Path]:
    """It gets the absolute path of a Python module"""
    r = subprocess.run(
        ["pip", "show", module],
        capture_output=True
    )

    try:
        r.check_returncode()
    except subprocess.CalledProcessError:
        return None

    stdout = r.stdout.decode()

    try:
        RE_abspath = "\nLocation: (?P<abspath>.*)\n"

        matches = re.search(RE_abspath, stdout)
        abspath = matches.group("abspath")
    except AttributeError:
        return None

    dist_packages = pathlib.Path(abspath).resolve()
    return dist_packages / module


def patch_undetected_chromedriver() -> None:
    """It forces `undetected_chromedriver` to run the Chromium webdriver

    For more information, see
    https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/108#issuecomment-1170269377
    """
    chromedriver_filename = "chromedriver_linux64.zip"

    src_chromedriver_filepath = ROOT / chromedriver_filename
    dst_chromedriver_filepath = pathlib.Path("/tmp") / chromedriver_filename

    !zip -j "$src_chromedriver_filepath" /usr/bin/chromedriver

    PY_module = "undetected_chromedriver"
    module_path = get_py_module_path(PY_module)

    patcher_filepath = module_path / "patcher.py"

    with patcher_filepath.open("rt") as f:
        contents = f.read()

    src = f"'file://{src_chromedriver_filepath}'"
    dst = f"'{dst_chromedriver_filepath}'"

    # It is forced to use the local webdriver
    contents = contents.replace(
        f"return urlretrieve(u)[0]",
        f"return urlretrieve({src}, filename={dst})[0]"
    )

    with patcher_filepath.open("wt") as f:
        f.write(contents)


def setup_container() -> None:
    """It sets up the container which is being run"""
    if is_ubuntu_20_04():
        setup_ubuntu_20_04()

    setup_requirements()
    patch_undetected_chromedriver()


ROOT = pathlib.Path("/content")
anchor = ROOT / "anchor.txt"


assert is_on_gcolab(), "It seems you are not on Google Colab"

# It will set the Google Colab container up only
# after disconnections, not after restarts
if not anchor.exists():
    setup_container()
    anchor.touch()
After running the above cell, you may try it out on a Cloudflare-protected website:
import time
import pyvirtualdisplay
import undetected_chromedriver.v2 as uc # Note import before selenium
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
with pyvirtualdisplay.Display(visible=0, size=(800, 600)) as _:
    URL = "https://nowsecure.nl"  # A Cloudflare-protected website

    options = uc.ChromeOptions()
    options.add_argument("--no-sandbox")

    driver = uc.Chrome(options=options)
    driver.get(URL)

    STR_message = "oh yeah, you passed!"

    try:
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.XPATH, "//main/h1"))
        )

        message = element.text.strip().lower()
        assert STR_message == message
    finally:
        driver.quit()
Please mention this issue comment if you plan to paste it elsewhere! |
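One caveat about the patching step in the setup cell: `str.replace` returns the text unchanged when the target string is absent, so the patch can fail silently if an installed `undetected_chromedriver` release does not contain `return urlretrieve(u)[0]`. A minimal sketch of a stricter variant follows. `replace_once` is a hypothetical helper, not part of the library or of the code above.

```python
def replace_once(contents: str, old: str, new: str) -> str:
    """Replace `old` with `new`, failing loudly instead of silently.

    Raises ValueError if `old` is absent and `new` is not already present.
    """
    if old in contents:
        return contents.replace(old, new)
    if new in contents:
        return contents  # Already patched on a previous run
    raise ValueError(f"patch target not found: {old!r}")
```

Swapping this in for the plain `contents.replace(...)` call would turn a silent no-op into an immediate error that names the missing line.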
Hi @DiTo97
WebDriverException Traceback (most recent call last)
4 frames
WebDriverException: Message: Service /root/.local/share/undetected_chromedriver/350167f31e3b62c9_chromedriver unexpectedly exited. Status code was: 1 |
Hi @enok, I have just run my code and it works. What I can suggest is 1) to disconnect and delete the Google Colab runtime (start over), 2) to make sure it is running on Ubuntu 20.04 (you can use the provided function Please, consider upvoting the previous comment, if this helps you solve your problems with Google Colab! |
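One possible reason why starting over helps: the setup cell is guarded by the `anchor.txt` marker file, so once that file exists the setup is skipped until the runtime's disk is wiped. The guard can be sketched in isolation as follows. `run_once` and its arguments are hypothetical names used only for illustration.

```python
import pathlib
import typing


def run_once(marker: pathlib.Path, setup: typing.Callable[[], None]) -> bool:
    """Run `setup` only if `marker` does not exist yet.

    Returns True if `setup` ran. The marker is created only after `setup`
    returns, so a setup that raises is retried on the next call.
    """
    if marker.exists():
        return False
    setup()
    marker.touch()
    return True
```

Deleting the marker file by hand would have the same effect on this guard as deleting the runtime, without reinstalling from scratch.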
Thanks for the reply, @DiTo97. Can you share your working Colab project so I can use it as a template? |
Unfortunately, I cannot upload the IPYNB file, as that file type is not supported (nor can I share the Google Colab link). I will paste the (JSON) contents of the IPYNB file below, even though they are exactly the same as what I shared above. Just copy-paste them into a blank file, save the file with the .ipynb extension, and open it on Google Colab.
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"[#108](https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/108)"
],
"metadata": {
"id": "soj_LuU2J3uu"
}
},
{
"cell_type": "code",
"source": [
"import pathlib\n",
"import re\n",
"import subprocess\n",
"import typing\n",
"\n",
"\n",
"def is_in_jupyter_notebook() -> bool:\n",
"    \"\"\"It checks whether a Jupyter notebook is being run\"\"\"\n",
"    try:\n",
"        get_ipython\n",
"        return True\n",
"    except NameError:\n",
"        return False\n",
"\n",
"\n",
"def is_on_gcolab() -> bool:\n",
"    \"\"\"It checks whether a Jupyter notebook is being run on Google Colab\"\"\"\n",
"    if not is_in_jupyter_notebook():\n",
"        return False\n",
"\n",
"    return \"google.colab\" in str(get_ipython())\n",
"\n",
"\n",
"def is_ubuntu_20_04() -> bool:\n",
"    import lsb_release\n",
"    metadata = lsb_release.get_os_release()\n",
"\n",
"    distro = metadata[\"ID\"].lower()\n",
"    release = metadata[\"RELEASE\"]\n",
"\n",
"    return distro == \"ubuntu\" and release == \"20.04\"\n",
"\n",
"\n",
"def setup_ubuntu_20_04() -> None:\n",
"    \"\"\"It sets up a Ubuntu 20.04 container with `chromium-browser`\n",
"\n",
"    For more information, see \n",
"    https://github.com/googlecolab/colabtools/issues/3347#issuecomment-1387453484\n",
"    \"\"\"\n",
"    # It adds debian buster\n",
"    EOF_debian_buster = \"\"\"\\\n",
"deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster.gpg] http://deb.debian.org/debian buster main\n",
"deb [arch=amd64 signed-by=/usr/share/keyrings/debian-buster-updates.gpg] http://deb.debian.org/debian buster-updates main\n",
"deb [arch=amd64 signed-by=/usr/share/keyrings/debian-security-buster.gpg] http://deb.debian.org/debian-security buster/updates main\n",
"\"\"\"\n",
"    !echo \"$EOF_debian_buster\" > /etc/apt/sources.list.d/debian.list\n",
"\n",
"    # It adds keys\n",
"    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys DCC9EFBF77E11517\n",
"    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 648ACFD622F3D138\n",
"    !apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 112695A0E562B32A\n",
"\n",
"    !apt-key export 77E11517 | gpg --dearmour -o /usr/share/keyrings/debian-buster.gpg\n",
"    !apt-key export 22F3D138 | gpg --dearmour -o /usr/share/keyrings/debian-buster-updates.gpg\n",
"    !apt-key export E562B32A | gpg --dearmour -o /usr/share/keyrings/debian-security-buster.gpg\n",
"\n",
"    # It adds the debian repo for chromium* packages only\n",
"    # Note the double-blank lines between entries\n",
"    EOF_chromium_pref = \"\"\"\\\n",
"Package: *\n",
"Pin: release a=eoan\n",
"Pin-Priority: 500\n",
"\n",
"\n",
"Package: *\n",
"Pin: origin \"deb.debian.org\"\n",
"Pin-Priority: 300\n",
"\n",
"\n",
"Package: chromium*\n",
"Pin: origin \"deb.debian.org\"\n",
"Pin-Priority: 700\n",
"\"\"\"\n",
"    !echo \"$EOF_chromium_pref\" > /etc/apt/preferences.d/chromium.pref\n",
"\n",
"    # It installs the packages\n",
"    !apt-get update\n",
"    !apt-get install chromium chromium-driver\n",
"    !apt-get install -y xvfb\n",
"\n",
"\n",
"def setup_requirements() -> None:\n",
"    PIP_requirements = \" \".join([\n",
"        \"PyVirtualDisplay\",  # To run a virtual display\n",
"        \"undetected-chromedriver\",\n",
"    ])\n",
"\n",
"    !python3 -m pip install --upgrade pip\n",
"    !python3 -m pip install --upgrade $PIP_requirements\n",
"\n",
"\n",
"def get_py_module_path(module: str) -> typing.Optional[pathlib.Path]:\n",
"    \"\"\"It gets the absolute path of a Python module\"\"\"\n",
"    r = subprocess.run(\n",
"        [\"pip\", \"show\", module], \n",
"        capture_output=True\n",
"    )\n",
"\n",
"    try:\n",
"        r.check_returncode()\n",
"    except subprocess.CalledProcessError:\n",
"        return None\n",
"\n",
"    stdout = r.stdout.decode()\n",
"\n",
"    try:\n",
"        RE_abspath = \"\\nLocation: (?P<abspath>.*)\\n\"\n",
"\n",
"        matches = re.search(RE_abspath, stdout)\n",
"        abspath = matches.group(\"abspath\")\n",
"    except AttributeError:\n",
"        return None\n",
"\n",
"    dist_packages = pathlib.Path(abspath).resolve()\n",
"    return dist_packages / module\n",
"\n",
"\n",
"def patch_undetected_chromedriver() -> None:\n",
"    \"\"\"It forces undetected_chromedriver to run the Chromium webdriver\n",
"\n",
"    For more information, see \n",
"    https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/108#issuecomment-1170269377\n",
"    \"\"\"\n",
"    chromedriver_filename = \"chromedriver_linux64.zip\"\n",
"\n",
"    src_chromedriver_filepath = ROOT / chromedriver_filename\n",
"    dst_chromedriver_filepath = pathlib.Path(\"/tmp\") / chromedriver_filename\n",
"\n",
"    !zip -j \"$src_chromedriver_filepath\" /usr/bin/chromedriver\n",
"\n",
"    PY_module = \"undetected_chromedriver\"\n",
"    module_path = get_py_module_path(PY_module)\n",
"\n",
"    patcher_filepath = module_path / \"patcher.py\"\n",
"\n",
"    with patcher_filepath.open(\"rt\") as f:\n",
"        contents = f.read()\n",
"\n",
"    src = f\"'file://{src_chromedriver_filepath}'\"\n",
"    dst = f\"'{dst_chromedriver_filepath}'\"\n",
"\n",
"    # It is forced to use the local webdriver\n",
"    contents = contents.replace(\n",
"        f\"return urlretrieve(u)[0]\",\n",
"        f\"return urlretrieve({src}, filename={dst})[0]\"\n",
"    )\n",
"\n",
"    with patcher_filepath.open(\"wt\") as f:\n",
"        f.write(contents)\n",
"\n",
"\n",
"def setup_container() -> None:\n",
"    \"\"\"It sets up the container which is being run\"\"\"\n",
"    if is_ubuntu_20_04():\n",
"        setup_ubuntu_20_04()\n",
"\n",
"    setup_requirements()\n",
"    patch_undetected_chromedriver()\n",
"\n",
"\n",
"ROOT = pathlib.Path(\"/content\")\n",
"anchor = ROOT / \"anchor.txt\"\n",
"\n",
"\n",
"assert is_on_gcolab(), \"It seems you are not on Google Colab\"\n",
"\n",
"# It will set the Google Colab container up only\n",
"# after disconnections, not after restarts\n",
"if not anchor.exists():\n",
"    setup_container()\n",
"    anchor.touch()"
],
"metadata": {
"id": "l6ORjDeTwZvx"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"import time\n",
"\n",
"import pyvirtualdisplay\n",
"import undetected_chromedriver.v2 as uc # Note import before selenium\n",
"from selenium.webdriver.common.by import By\n",
"from selenium.webdriver.support import expected_conditions as EC\n",
"from selenium.webdriver.support.ui import WebDriverWait\n",
"\n",
"\n",
"with pyvirtualdisplay.Display(visible=0, size=(800, 600)) as _:\n",
"    URL = \"https://nowsecure.nl\"  # A Cloudflare-protected website\n",
"\n",
"    options = uc.ChromeOptions()\n",
"    options.add_argument(\"--no-sandbox\")\n",
"\n",
"    driver = uc.Chrome(options=options)\n",
"    driver.get(URL)\n",
"\n",
"    STR_message = \"oh yeah, you passed!\"\n",
"\n",
"    try:\n",
"        element = WebDriverWait(driver, 10).until(\n",
"            EC.presence_of_element_located((By.XPATH, \"//main/h1\"))\n",
"        )\n",
"\n",
"        message = element.text.strip().lower()\n",
"        assert STR_message == message\n",
"    finally:\n",
"        driver.quit()"
],
"metadata": {
"id": "E_picQQnOSAi"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "SzdIGmJ-BQ-n"
},
"execution_count": null,
"outputs": []
}
]
} |
@DiTo97 |
Fixed by pinning undetected_chromedriver==3.2.1 |
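To confirm which release is actually installed before and after pinning, the standard `importlib.metadata` module can be queried. A small sketch, in which `installed_version` is a hypothetical helper name and 3.2.1 is simply the release reported to work in the comment above:

```python
import importlib.metadata
import typing


def installed_version(distribution: str) -> typing.Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return importlib.metadata.version(distribution)
    except importlib.metadata.PackageNotFoundError:
        return None
```

In a Colab cell, the pin itself would be `!pip install undetected-chromedriver==3.2.1`, after which `installed_version("undetected-chromedriver")` should report `"3.2.1"`.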
How to add an extension? |
zip error: Nothing to do! (/content/chromedriver_linux64.zip) |
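`zip` typically prints `Nothing to do!` when none of the input files exist, which here would suggest that `/usr/bin/chromedriver` was not installed. A sketch that checks for the driver first and builds the archive with the standard `zipfile` module instead of the `zip` command. `zip_driver` is a hypothetical helper, and the paths in the usage note are the ones from the setup cell.

```python
import pathlib
import zipfile


def zip_driver(driver: pathlib.Path, archive: pathlib.Path) -> pathlib.Path:
    """Store `driver` in `archive` without its directory, as `zip -j` does."""
    if not driver.is_file():
        raise FileNotFoundError(f"driver not found: {driver}")

    # Mode "w" overwrites any archive left over from an earlier attempt
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(driver, arcname=driver.name)
    return archive
```

For example, `zip_driver(pathlib.Path("/usr/bin/chromedriver"), pathlib.Path("/content/chromedriver_linux64.zip"))` raises a clear `FileNotFoundError` when the Chromium driver package did not install.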
This is because the Colab Linux image has changed to 22.04, so the function that checks for 20.04 now returns false. Change all occurrences of 20.04 to 22.04. |
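Rather than editing the hard-coded release string each time Colab upgrades, the check could accept a set of releases. A sketch that parses `/etc/os-release`-style text instead of relying on the `lsb_release` module. The function names and the set of accepted releases are assumptions, and whether the Debian buster packages still install cleanly on a newer release is not verified here.

```python
import typing


def parse_os_release(text: str) -> typing.Dict[str, str]:
    """Parse KEY=value lines in the format used by /etc/os-release."""
    info: typing.Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def is_ubuntu_release(
    info: typing.Dict[str, str],
    releases: typing.Sequence[str] = ("20.04", "22.04"),  # Assumed set
) -> bool:
    """Check whether the parsed info describes one of the given Ubuntu releases."""
    return info.get("ID", "").lower() == "ubuntu" and info.get("VERSION_ID") in releases
```

On a real runtime the input would come from `pathlib.Path("/etc/os-release").read_text()`.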
this code gives
WebDriverException: Message: Service ./chromedriver unexpectedly exited. Status code was: -6