Differences between Windows and Linux when handling HTTPS requests through HTTP proxy #1434

Closed
iyanmv opened this issue Sep 5, 2018 · 8 comments


iyanmv commented Sep 5, 2018

Hi,

Let me explain the problem I was facing and how I ended up here. I'll try to keep the summary brief.

I work at a company that protects access to the Internet through a proxy that uses NTLMv2 for authentication. This is not a problem for Windows computers, but it is a pain when working with GNU/Linux machines. Anyway, there is a great solution for that: CNTLM. It lets you create a local proxy that authenticates with the corporate proxy. Then, by setting the http_proxy, https_proxy and no_proxy environment variables and properly configuring the specific tools that do not use these variables (apt, yum, git, docker, etc.), voilà, Internet for everyone! No problems so far, with two exceptions: pip and conda.
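
For readers unfamiliar with how Python tooling picks up these variables, here is a minimal sketch using only the standard library. The proxy address is a hypothetical local CNTLM endpoint (an assumption, not something stated in this issue); substitute whatever your CNTLM instance listens on.

```python
import os
import urllib.request

# Hypothetical local CNTLM endpoint (assumption); adjust to your own setup.
LOCAL_PROXY = "http://127.0.0.1:3128"

# The lowercase variable names mentioned above.
os.environ["http_proxy"] = LOCAL_PROXY
os.environ["https_proxy"] = LOCAL_PROXY
os.environ["no_proxy"] = "localhost,127.0.0.1"

# getproxies() maps each <scheme>_proxy variable to a dictionary entry.
proxies = urllib.request.getproxies()
for scheme in ("http", "https", "no"):
    print(scheme, "->", proxies.get(scheme))
```

Libraries that honor the environment read the same mapping; tools that ignore it need their own proxy configuration, as noted above.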

For those struggling with the same issue, please have a look at this open related issue: Doesn't work behind proxy in corporate Windows network (NTLM). I was able to use pip with Linux machines by setting up a local Nexus repository and adding pypi as a proxy repo. Yes, Nexus is able to authenticate with the NTLM proxy just fine.

But why do I think this is a urllib3 issue, too? Sure, allowing NTLM authentication would be an interesting feature to add (see #242), but that is not what I am asking for here (there are surely more interesting things to implement before an old authentication method). The point is that I noticed that pip and conda work just fine on Windows with CNTLM, but not on Linux, with the same CNTLM, Python and urllib3 versions. So the problem seems to be that urllib3 does not work properly when making HTTPS requests through a proxy on Linux. I will try to have a look at the code, but I am writing this issue in case someone more familiar with urllib3 can help 😃

How to replicate:

  1. Install and configure CNTLM
  2. Create a virtualenv or conda env with Python 3
  3. Install urllib3 with pip (I tried version 1.23)
  4. Execute the following in Windows and Linux:
import urllib3
proxy = urllib3.ProxyManager(<local CNTLM proxy URL>)
proxy.request('GET', <any http site>)
proxy.request('GET', <any https site>)

Note: A similar proxy can be simulated in GNU/Linux with Squid + Samba + NTLMv2 auth. Also, have a look at this comment: it is possible to set it up with Apache.

On Windows, both requests work well. I just get the (expected) warning:

InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings

On Linux, on the other hand, the HTTP request works but the HTTPS one fails:

HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 407 Proxy Authentication Required',)))

It looks like the packet is not properly handled by CNTLM. Of course, this is the same error I get when trying to use pip and conda, or when using the Python requests module.

Any ideas? Do you know any OS dependent feature that may be causing this?

Thanks!

@iyanmv changed the title from "Differences between Windows and Linux when handling HTTPS requests through proxy" to "Differences between Windows and Linux when handling HTTPS requests through HTTP proxy" Sep 5, 2018
@sethmlarson
Member

Thanks for filing this issue. Unfortunately, a majority of the maintainers of urllib3 do not use proxies in our workflow and thus aren't familiar with the ecosystem or issues that are commonly faced by proxy users.
It would be of great benefit to many if you could look into solving this issue in urllib3. :)

@YuMan-Tam

YuMan-Tam commented Oct 16, 2018

I had exactly the same issue and found a hack. It has nothing to do with the OS.

The source of this issue lies in http.client. When an HTTPS request is made via an NTLM proxy, the function _tunnel() in the HTTPConnection class is called. The original code proceeds only if the return code is 200. For an NTLM proxy, the return code is 407, so the following code is executed.

if code != http.HTTPStatus.OK:
    self.close()
    raise OSError("Tunnel connection failed: %d %s" % (code,message.strip()))

The key is that, whether it fails or not, the code sends only one request and then returns. Therefore, NTLM would work if you modify the code to perform the NTLM dance before the above block. This hack works on my work Windows machine. I don't think it's a "solution", but it works in (and only in) this particular case.
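
The single-request behaviour described here can be reproduced without a real NTLM proxy. The following is a minimal sketch, not part of the original hack: it starts a throwaway local server (a stand-in for the proxy, purely an assumption for illustration) that answers the CONNECT with 407, and then asks http.client to open a tunnel through it. The target host example.com is a placeholder and is never contacted.

```python
import http.client
import socket
import threading


def run_fake_proxy(server_sock, seen):
    """Accept one client, record its request head, and answer with 407."""
    conn, _ = server_sock.accept()
    with conn:
        buf = b""
        # Read until the blank line that ends the CONNECT request head.
        while b"\r\n\r\n" not in buf:
            chunk = conn.recv(4096)
            if not chunk:
                break
            buf += chunk
        seen.append(buf)
        conn.sendall(
            b"HTTP/1.1 407 Proxy Authentication Required\r\n"
            b"Proxy-Authenticate: NTLM\r\n"
            b"Content-Length: 0\r\n"
            b"\r\n"
        )


def try_tunnel():
    """Open a tunnel through the fake proxy; return (error text, requests seen)."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    seen = []
    worker = threading.Thread(
        target=run_fake_proxy, args=(server_sock, seen), daemon=True
    )
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    client.set_tunnel("example.com", 443)  # placeholder target
    error = None
    try:
        client.connect()  # sends a single CONNECT via _tunnel()
    except OSError as exc:
        error = str(exc)
    finally:
        client.close()
        worker.join(timeout=5)
        server_sock.close()
    return error, seen


error, seen = try_tunnel()
print(error)
print("CONNECT requests received by the proxy:", len(seen))
```

The client gives up after the first 407 instead of answering the challenge on the same connection, which is the step a multi-message scheme such as NTLM needs.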

@iyanmv
Author

iyanmv commented Nov 30, 2018

Thanks @YuMan-Tam! Unfortunately, I cannot test your solution, I'm not working with NTLM proxies anymore (thank god! 🙏)

@joshuacheong

Hi all, @YuMan-Tam could you share the code snippet for the workaround? I have been stuck on this problem for several weeks. Your help is deeply appreciated.

@YuMan-Tam

The relevant modification is marked with the comment “Experimentation for connections”, together with the three imported modules sspi, base64 and win32api. I think I only modified the function _tunnel.

It has been a while since I last worked on it, so I do not remember the details. Roughly, though, for HTTPS requests part of the NTLM dance is dropped. Hence, one needs to find a way to keep the connection alive and carry out the steps of the dance manually. I figured this out by using the Chrome/Firefox debug log to isolate all sent and received data up until the error occurs. This workaround is specific to Windows, and I only tested it on my work PC, which runs Windows 7. However, I believe the mechanism works in general.

Authentication information is abstracted away with the sspi and win32api module.

Relevant snippet for client.py:

import sspi
import base64
import win32api

    # Modified HTTPConnection._tunnel() (a method inside the class in client.py)
    def _tunnel(self):
        connect_str = "CONNECT %s:%d HTTP/1.1\r\n" % (self._tunnel_host,
            self._tunnel_port)
        connect_bytes = connect_str.encode("ascii")
        self.send(connect_bytes)
        
        # Experimentation for connections
        # Prepare the Proxy-Authorization header for the first request
        # (the initial NTLM token). scflags=0 is set manually.
        username = win32api.GetUserName()
        ca = sspi.ClientAuth("NTLM", auth_info=
                             (username, "", None), scflags=0)
        _, data = ca.authorize(None)
        auth_key = base64.b64encode(data[0].Buffer).decode("utf-8")
        
        self._tunnel_headers["Connection"] = "keep-alive"
        self._tunnel_headers["Proxy-Connection"] = "keep-alive"
        self._tunnel_headers["Proxy-Authorization"] = "NTLM %s" % auth_key
        for header, value in self._tunnel_headers.items():
            header_str = "%s: %s\r\n" % (header, value)
            header_bytes = header_str.encode("latin-1")
            self.send(header_bytes)
        self.send(b'\r\n')

        response = self.response_class(self.sock, method=self._method)
        (version, code, message) = response._read_status()
        while True:
            line = response.fp.readline(_MAXLINE + 1)
            if line.decode("utf-8").startswith("Proxy-Authenticate: NTLM "):
                challenge = line.decode("utf-8").replace("\r\n","")
                challenge = list(filter(lambda s: s.startswith("Proxy-Authenticate: NTLM "),challenge.split(",")))
                challenge = challenge[0].strip().split()[2]
                challenge = base64.b64decode(challenge)
                # Build response of challenge
                _, data = ca.authorize(challenge)
                auth_key = base64.b64encode(data[0].Buffer).decode("utf-8")
                self._tunnel_headers["Proxy-Authorization"] = "NTLM %s" % auth_key

            if len(line) > _MAXLINE:
                raise LineTooLong("header line")
            if not line:
                # for sites which EOF without sending a trailer
                break
            if line in (b'\r\n', b'\n', b''):
                break
            if self.debuglevel > 0:
                print('header:', line.decode())
                
        self.send(connect_bytes)
        for header, value in self._tunnel_headers.items():
            header_str = "%s: %s\r\n" % (header, value)
            header_bytes = header_str.encode("latin-1")
            self.send(header_bytes)
        self.send(b'\r\n')

        response = self.response_class(self.sock, method=self._method)
        (version, code, message) = response._read_status()

        if code != http.HTTPStatus.OK:
            self.close()
            raise OSError("Tunnel connection failed: %d %s" % (code,
                                                               message.strip()))
        while True:
            line = response.fp.readline(_MAXLINE + 1)
            if len(line) > _MAXLINE:
                raise LineTooLong("header line")
            if not line:
                # for sites which EOF without sending a trailer
                break
            if line in (b'\r\n', b'\n', b''):
                break
            if self.debuglevel > 0:
                print('header:', line.decode())  
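
The challenge-parsing step in the snippet above can also be written as a small standalone helper, which makes it easier to test without a proxy. This is a hedged sketch, not the code used in the hack: the function name extract_ntlm_challenge is hypothetical, and it relies only on the general NTLM message layout (an "NTLMSSP\0" signature followed by a 4-byte little-endian message type, where type 2 is the challenge).

```python
import base64
import struct

NTLM_SIGNATURE = b"NTLMSSP\x00"


def extract_ntlm_challenge(header_line):
    """Return the raw NTLM type 2 message from a Proxy-Authenticate
    header line, or None if the line does not carry one."""
    text = header_line.decode("latin-1").strip()
    name, sep, value = text.partition(":")
    if not sep or name.strip().lower() != "proxy-authenticate":
        return None
    parts = value.split()
    # Expect exactly "NTLM <base64 token>"; a bare "NTLM" has no challenge.
    if len(parts) != 2 or parts[0].upper() != "NTLM":
        return None
    try:
        message = base64.b64decode(parts[1], validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    if len(message) < 12 or not message.startswith(NTLM_SIGNATURE):
        return None
    (message_type,) = struct.unpack_from("<I", message, 8)
    return message if message_type == 2 else None


# Usage with a synthetic (not cryptographically meaningful) type 2 message.
fake_type2 = NTLM_SIGNATURE + struct.pack("<I", 2) + b"\x00" * 20
line = b"Proxy-Authenticate: NTLM " + base64.b64encode(fake_type2) + b"\r\n"
challenge = extract_ntlm_challenge(line)
print(challenge == fake_type2)
```

In the hack above, the bytes returned by such a helper would be what gets passed to ca.authorize() to build the final Proxy-Authorization header.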

@dopstar

dopstar commented Dec 30, 2019

@YuMan-Tam your snippet worked well. I did this:

  • copied and reworked the requests-ntlm library into a requests-ntlm2 library
    • when requests-ntlm and/or urllib3 finally address this, I can deprecate requests-ntlm2 and archive the repo
  • created requests_ntlm2.connection.VerifiedHTTPSConnection, which inherits from urllib3.connection.VerifiedHTTPSConnection, and overrode its _tunnel() method to be like your snippet
  • created requests_ntlm2.adapters.HttpNtlmAdapter, which is responsible for monkey-patching the pool classes in urllib3.poolmanager AND sending the NTLM credentials downstream.

The repo is here: https://github.com/dopstar/requests-ntlm2

@YuMan-Tam

> @YuMan-Tam your snippet worked well. I did this:
>
>   • copied and reworked the requests-ntlm library into a requests-ntlm2 library
>     • when requests-ntlm and/or urllib3 finally address this, I can deprecate requests-ntlm2 and archive the repo
>   • created requests_ntlm2.connection.VerifiedHTTPSConnection, which inherits from urllib3.connection.VerifiedHTTPSConnection, and overrode its _tunnel() method to be like your snippet
>   • created requests_ntlm2.adapters.HttpNtlmAdapter, which is responsible for monkey-patching the pool classes in urllib3.poolmanager AND sending the NTLM credentials downstream.
>
> The repo is here: https://github.com/dopstar/requests-ntlm2

Awesome. I hope I will have a chance to test this soon! Thank you again for your work!

@MAbdElRaouf

@YuMan-Tam Thanks for the excellent workaround. It worked perfectly in my case, where I needed to authenticate with a corporate NTLM proxy without exposing username:password in the code. I hope to see this addressed by urllib3 soon.

6 participants