This repository has been archived by the owner on Jan 3, 2024. It is now read-only.

Kubernetes DNS and Remote Webdriver #220

Closed
prabhatverma286 opened this issue Feb 18, 2021 · 13 comments
@prabhatverma286

Thank you for such an amazing tool you have built!

If I understand the functionality correctly, selenium-wire opens up a proxy socket on the addr FQDN (and random port), and any requests from selenium are routed through this proxy. This allows the proxy to capture and process all the requests.

It took me a while to understand that the addr FQDN is used for both purposes: creating the proxy socket, and letting Selenium connect to the proxy.

I have set up a remote selenium grid on my kubernetes cluster, and I am trying to connect to it from another pod within my kubernetes cluster. The services for these pods are of type ClusterIP - meaning that the IP is randomly generated with each deployment. Kubernetes has intelligent DNS resolution where you can specify http://service-name:port, and it will resolve it to the IP address. So I should be able to open a port with service-name in the addr option, however when I try to do that, I get the following error:
seleniumwire.thirdparty.mitmproxy.exceptions.ServerException: Error starting mitmproxy server: gaierror(-5, 'No address associated with hostname')

I tried using 127.0.0.1 or 0.0.0.0, and although I am able to create the proxy server, selenium is of course unable to connect to it and it fails with
Message: unknown error: net::ERR_PROXY_CONNECTION_FAILED

It would be beneficial if I could define, for example, an addr to start the proxy server (where I could use 127.0.0.1) and another option to be able to reach that proxy server from selenium (where I can leverage the kubernetes DNS resolution).
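
To illustrate the distinction between the address a server listens on and the address a client uses to reach it, here is a minimal stdlib-only sketch. It does not use Selenium Wire at all; `try_bind` is a hypothetical helper name, and `no-such-service.invalid` is just a placeholder hostname that should never resolve. The exact error text will depend on the resolver, so it may not match the gaierror above word for word.

```python
import socket


def try_bind(addr: str, port: int = 0) -> str:
    """Try to bind a TCP socket on addr and report what happened.

    Hypothetical helper for illustration; port 0 lets the OS pick a free port.
    """
    try:
        # A server has to resolve its listen address before it can bind to it.
        family, socktype, proto, _, sockaddr = socket.getaddrinfo(
            addr, port, type=socket.SOCK_STREAM
        )[0]
    except socket.gaierror as exc:
        return f"cannot resolve: {exc}"
    with socket.socket(family, socktype, proto) as sock:
        try:
            sock.bind(sockaddr)
        except OSError as exc:
            # Typical when the name resolves to an address that is not
            # assigned to a local network interface.
            return f"cannot bind: {exc}"
        return "ok"


print(try_bind("127.0.0.1"))
print(try_bind("no-such-service.invalid"))
```

A local address such as 127.0.0.1 or 0.0.0.0 can be bound, while a name that does not resolve (or resolves to an address that is not local) cannot, even though that same name might be exactly what a remote client should connect to.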

@wkeeling
Owner

wkeeling commented Feb 19, 2021

Thanks for raising this. You're right, Selenium Wire sends all browser traffic through an internal proxy it spins up in the background, and it uses the same address for both the proxy server and for Selenium itself when it configures the browser. There isn't the ability to separate these currently in Selenium Wire itself, but I think there may be a workaround.

If you're using the latest version of Selenium Wire, then there's an option called auto_config. This tells Selenium Wire to configure the browser, via Selenium, with the IP/port of its internal proxy. The option is set to True by default, but if you set it to False then Selenium Wire won't configure the browser and will assume you will do it manually. You can configure the browser by passing a browser-specific option (I'm assuming Chrome here) and specify the Kubernetes service-name at that point.

Here's some code that demonstrates how to do it:

from seleniumwire import webdriver

sw_options = {
    'auto_config': False,  # Ensure this is set to False
    'addr': '0.0.0.0',  # The address the proxy will listen on
    'port': 8087,
}

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=service-name:8087')  # Specify your Kubernetes service-name here
chrome_options.add_argument('--ignore-certificate-errors')

driver = webdriver.Remote(
    desired_capabilities=chrome_options.to_capabilities(),
    seleniumwire_options=sw_options,
)

driver.get(...)

Would that allow you to take advantage of the Kubernetes DNS resolution?
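
One thing that is easy to get wrong in the example above is that the port appears twice: once in the Selenium Wire options and once in the Chrome `--proxy-server` argument. A small sketch of keeping the two in sync follows. `build_proxy_settings` and its `proxy_host` parameter are hypothetical names made up for illustration, not part of Selenium Wire; only the option keys and Chrome arguments are taken from the example above.

```python
def build_proxy_settings(proxy_host: str, port: int = 8087):
    """Return (seleniumwire_options, chrome_arguments) that share one port.

    Hypothetical helper: proxy_host is the name the remote browser should use
    to reach the machine running Selenium Wire (e.g. a Kubernetes service name).
    """
    sw_options = {
        'auto_config': False,  # We point the browser at the proxy ourselves
        'addr': '0.0.0.0',     # Listen on all local interfaces
        'port': port,
    }
    chrome_args = [
        f'--proxy-server={proxy_host}:{port}',
        '--ignore-certificate-errors',
    ]
    return sw_options, chrome_args


sw_options, chrome_args = build_proxy_settings('service-name', 8087)
print(sw_options)
print(chrome_args)

# Usage with the objects from the example above (not executed here):
# for arg in chrome_args:
#     chrome_options.add_argument(arg)
```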

@wkeeling
Owner

wkeeling commented Mar 3, 2021

@prabhatverma286 did the suggested workaround above work for you?

@prabhatverma286
Author

prabhatverma286 commented Mar 9, 2021

@wkeeling apologies for taking so long to reply - got a bit busy. I tried the workaround again today and it works, so thank you so much!

FWIW, my code is below (I am using a forward proxy because I am behind a firewall):

from selenium.webdriver.common.by import By
from seleniumwire import webdriver

options = {
    'suppress_connection_errors': False,
    'auto_config': False,
    'addr': '0.0.0.0',
    'port': 8087,
    'proxy': {
        'http': '<forward proxy details like scheme://user:pass@ip:port>',
        'https': '<forward proxy details like scheme://user:pass@ip:port>',
    },
}

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=kubernetes-service-name:8087')
chrome_options.add_argument('--ignore-certificate-errors')

browser = webdriver.Remote('http://selenium-service-name:4444/wd/hub',
                           desired_capabilities=chrome_options.to_capabilities(), seleniumwire_options=options)

print("Browser setup done.")
# Use try/finally so the browser quits even if there is an exception
try:
    print("Getting yt.")
    browser.get("https://www.youtube.com/")
    print("Saving screenshot for yt")
    browser.save_screenshot('yt.png')
    print("Extracting Xpath.")
    text = browser.find_element(By.XPATH,'/html/body/ytd-app/div/ytd-page-manager/ytd-browse/ytd-two-column-browse'
                                         '-results-renderer/div[1]/ytd-rich-grid-renderer/div['
                                         '6]/ytd-rich-item-renderer[1]/div/ytd-rich-grid-media/div[1]/div/div['
                                         '1]/h3/a/yt-formatted-string').text
    print(f'The title of the first video on youtube is : {text}')
except Exception as e:
    print(e)
finally:
    browser.quit()
    print(browser.requests)

Which produces the output:

python selenium_test.py
Browser setup done.
Getting yt.
Saving screenshot for yt
Extracting Xpath.
The title of the first video on youtube is : Positive Mood JAZZ - Sunny Jazz Cafe and Bossa Nova Music
[]

with screenshot:

Took me a couple of tries though, because I remember trying it a few weeks ago and it didn't work. But most probably I was doing something wrong.

One question though: the requests dictionary at the end of the output is empty. Any idea why?

@wkeeling
Owner

wkeeling commented Mar 9, 2021

Great news that the workaround works! I may now look at adding the above example to the readme, as it may prove useful for other people running a container setup.

Regarding printing the requests, you just need to make sure that you print them before calling browser.quit(). Quitting the browser will shut down Selenium Wire and clear out all captured requests. So switching the statements around should fix it:

finally:
    print(browser.requests)
    browser.quit()  # Clears out request storage
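
The same ordering can be wrapped in a small helper so it is hard to get wrong. This is only a sketch: `snapshot_requests_then_quit` is a hypothetical name, and `FakeDriver` is a made-up stand-in that mimics the behaviour described above (requests are cleared on quit) so the snippet runs without a browser. With a real driver you would pass the Selenium Wire driver instead.

```python
def snapshot_requests_then_quit(driver):
    """Copy the captured requests into a list, then quit the driver.

    The copy is made before quit() runs, so it survives the cleanup.
    """
    try:
        return list(driver.requests)
    finally:
        driver.quit()


class FakeDriver:
    """Hypothetical stand-in for a Selenium Wire driver (illustration only)."""

    def __init__(self):
        self.requests = ['GET https://example.com/']

    def quit(self):
        # Mimics the described behaviour: quitting clears request storage.
        self.requests = []


driver = FakeDriver()
captured = snapshot_requests_then_quit(driver)
print(captured)         # ['GET https://example.com/']
print(driver.requests)  # []
```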

@wkeeling wkeeling closed this as completed Mar 9, 2021
@TrinityHC

@prabhatverma286
I'm trying to use webdriver.Remote and a user/pass proxy with Selenium Wire in Docker, but I'm still confused about some params in your code:

options = {
     ...
    'addr': '0.0.0.0',
    'port': 8087,
    'proxy': {
        'http': 'http://user:pass@ip:port',
        'https': 'https://user:pass@ip:port',
    },
}

What is the port here?
And in

chrome_options.add_argument('--proxy-server=kubernetes-service-name:8087')

what is kubernetes-service-name? If I'm using Docker, what should I put here?

My docker-compose.yml looks like this:

  ...
  chrome:
      image: selenium/standalone-chrome:latest
      hostname: chrome
      ports:
        - "4444:4444"
      privileged: true
      shm_size: 2g

and my current code looks like this:

        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        
        self.browser = webdriver.Remote(command_executor='http://chrome:4444/wd/hub', options=chrome_options, desired_capabilities=capabilities)

Appreciate your help in advance

@pradeepjay02

What is selenium-service-name here?

@kaletvintsev

kaletvintsev commented Sep 4, 2023

This is my working setup with proxy and chrome-node in Docker.
Docker containers:
selenium/node-chrome:115.0
selenium/hub:4.11

from seleniumwire import webdriver

PROXY_IP = 'your proxy ip'
PROXY_PORT = 'your proxy port'

sw_options = {
    'addr': "0.0.0.0", 
    'auto_config': False,
    'port': 8087,  # You can choose any other, the main thing is that it is free
    # proxy settings 
    'proxy': {
        'http': f'http://{PROXY_IP}:{PROXY_PORT}',
        'https': f'https://{PROXY_IP}:{PROXY_PORT}'
    }
}

chrome_options = webdriver.ChromeOptions()
# here should be port from sw_options['port']
chrome_options.add_argument('--proxy-server=host.docker.internal:8087')
chrome_options.add_argument('--ignore-certificate-errors')

driver = webdriver.Remote(
    command_executor="http://localhost:4444/wd/hub", # docker selenium-hub address
    options=chrome_options,
    seleniumwire_options=sw_options
    )

@diyoyo

diyoyo commented Oct 3, 2023

Hi, thanks @kaletvintsev for sharing.
Let's say I'm using JupyterLab in a Docker container on machine A. Then, from a notebook, I want to call the hub on server B, which decides on its own which node C to call. So there are 3 machines, plus Docker proxies on each machine.

I am a bit confused about what addr, port, PROXY_IP and PROXY_PORT are supposed to target.

Because:

  • A has ip: 192.168.0.a
  • JupyterLab in docker container on A has ip: 172.17.0.xx
  • B has ip: 192.168.0.b
  • Hub in docker container on B has ip: 172.18.0.xx
  • C has ip: (well, it depends on what Hub decides, but you get where I'm going with this)...

Just to be as clear as possible:

  • web UI for jupyterlab is available at http://192.168.0.a:8888
  • web UI for hub is available at http://192.168.0.b:4444
  • and my node config in docker-compose is:
    - "SE_EVENT_BUS_SUBSCRIBE_PORT=4443"
    - SE_NODE_HOST=${SE_NODE_IP}
    - "SE_NODE_PORT=5555"
    - SE_EVENT_BUS_HOST=${SE_HUB_IP}
    - "SE_EVENT_BUS_PUBLISH_PORT=4442"

(Quoting @kaletvintsev) This is my working setup with proxy and chrome-node in Docker. Docker containers: selenium/node-chrome:115.0, selenium/hub:4.11

from seleniumwire import webdriver

PROXY_IP = 'your proxy ip'
PROXY_PORT = 'your proxy port'

sw_options = {
    'addr': "0.0.0.0", 
    'auto_config': False,
    'port': 8087, 
    # proxy settings 
    'proxy': {
        'http': f'http://{PROXY_IP}:{PROXY_PORT}',
        'https': f'https://{PROXY_IP}:{PROXY_PORT}'
    }
}

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=host.docker.internal:8087')
chrome_options.add_argument('--ignore-certificate-errors')

driver = webdriver.Remote(
    command_executor="http://localhost:4444/wd/hub", # docker selenium-hub address
    options=chrome_options,
    seleniumwire_options=sw_options
    )

@diyoyo

diyoyo commented Oct 3, 2023

Maybe I should open the PROXY_PORT in the docker-compose.yml config, so that the http request can go through?

@danztensai

    'addr': "0.0.0.0",
    'auto_config': False,
    'port': 8087,

what is port: 8087, what is it for? @diyoyo

@diyoyo

diyoyo commented Oct 23, 2023

Quoting @danztensai:

    'addr': "0.0.0.0",
    'auto_config': False,
    'port': 8087,

what is port: 8087, what is it for? @diyoyo

I have no clue, @danztensai. I just tried to use the code previously posted 😄

@kaletvintsev

Quoting @danztensai: what is port: 8087, what is it for? @diyoyo

This is the port for the seleniumwire proxy. You can choose any other, the main thing is that it is free
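
If you want to check up front that the chosen port is free, a rough stdlib-only sketch could look like the following. `is_port_free` is a hypothetical helper, not something Selenium Wire provides; it only checks IPv4, and the result can go stale if another process grabs the port between the check and the moment the proxy starts.

```python
import socket


def is_port_free(port: int, addr: str = '0.0.0.0') -> bool:
    """Return True if a TCP socket can currently be bound to (addr, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((addr, port))
        except OSError:
            # Usually "address already in use" (or no permission for the port).
            return False
        return True


if __name__ == '__main__':
    print(is_port_free(8087))
```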

@danztensai

Do I have to expose the port in Docker? @kaletvintsev
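
Whether the port needs to be published probably depends on where the browser runs relative to the script: containers on the same Docker network can normally reach each other's ports directly, while a browser on another host can only reach ports that are published. One way to find out rather than guess is a plain TCP check, run from the container or machine where the browser lives, while the Selenium Wire script is running. This is a stdlib-only sketch; `can_reach` is a hypothetical helper, and the host and port in the example call are assumptions taken from the setup posted earlier in this thread.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


if __name__ == '__main__':
    # Same host:port as in the --proxy-server argument (assumed values).
    print(can_reach('host.docker.internal', 8087))
```

If this prints False from the browser's side, the browser will most likely fail with ERR_PROXY_CONNECTION_FAILED as well, and the port mapping or hostname is the first thing to look at.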
