
Missing request Headers and request.path #1436

Closed
geckoy opened this issue Jul 31, 2024 · 2 comments
Labels
Question Questions related to proxy server

Comments


geckoy commented Jul 31, 2024

Problem
Hi, I'm trying to use the ProxyPoolPlugin and I've run into an issue: the request param passed to the before_upstream_connection method doesn't contain all of the headers or the request path. When I try print(request.headers), only three headers are displayed:

$ python testproxy.py
2024-07-31 17:12:32,000 - pid:12005 [I] plugins.load:89 - Loaded plugin proxy.http.proxy.HttpProxyPlugin
2024-07-31 17:12:32,000 - pid:12005 [I] plugins.load:89 - Loaded plugin __main__.ProxyPoolPlugin
CONNECT httpbin.org:443 HTTP/1.1
Host: httpbin.org:443
Proxy-Connection: keep-alive
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36

By contrast, the host server receives all of the headers:

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7", 
    "Accept-Encoding": "gzip, deflate, br, zstd", 
    "Accept-Language": "en-US,en;q=0.9", 
    "Custom-Header": "CustomHeaderValue", 
    "Host": "httpbin.org", 
    "Priority": "u=0, i", 
    "Sec-Ch-Ua": "\"Not)A;Brand\";v=\"99\", \"Google Chrome\";v=\"127\", \"Chromium\";v=\"127\"", 
    "Sec-Ch-Ua-Mobile": "?0", 
    "Sec-Ch-Ua-Platform": "\"Linux\"", 
    "Sec-Fetch-Dest": "document", 
    "Sec-Fetch-Mode": "navigate", 
    "Sec-Fetch-Site": "none", 
    "Sec-Fetch-User": "?1", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-66aa648b-378abae576431222046136ea"
  }
}
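To make the gap concrete, here is the difference between the two dumps as a quick set comparison (plain Python; header names copied from the outputs above, origin list abbreviated):

```python
# Headers visible to the plugin in before_upstream_connection
seen_by_proxy = {"Host", "Proxy-Connection", "User-Agent"}

# Headers received by httpbin.org (abbreviated from the dump above)
seen_by_origin = {
    "Accept", "Accept-Encoding", "Accept-Language", "Custom-Header",
    "Host", "Priority", "Upgrade-Insecure-Requests", "User-Agent",
}

# Everything the origin saw but the proxy never did
missing_at_proxy = sorted(seen_by_origin - seen_by_proxy)
print(missing_at_proxy)
# ['Accept', 'Accept-Encoding', 'Accept-Language', 'Custom-Header',
#  'Priority', 'Upgrade-Insecure-Requests']
```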

The script I'm using:

# -*- coding: utf-8 -*-
"""
    proxy.py
    ~~~~~~~~
    ⚡⚡⚡ Fast, Lightweight, Pluggable, TLS interception capable proxy server focused on
    Network monitoring, controls & Application development, testing, debugging.

    :copyright: (c) 2013-present by Abhinav Singh and contributors.
    :license: BSD, see LICENSE for more details.
"""
import base64
import logging
import ipaddress
from typing import Any, Dict, List, Optional

from proxy.http import Url, httpHeaders, httpMethods
from proxy.core.base import TcpUpstreamConnectionHandler
from proxy.http.proxy import HttpProxyBasePlugin
from proxy.http.parser import HttpParser
from proxy.common.utils import text_
from proxy.http.exception import HttpProtocolException
from proxy.common.constants import (
    COLON, ANY_INTERFACE_HOSTNAMES, LOCAL_INTERFACE_HOSTNAMES,
)


logger = logging.getLogger(__name__)

DEFAULT_HTTP_ACCESS_LOG_FORMAT = '{client_ip}:{client_port} - ' + \
    '{request_method} {server_host}:{server_port}{request_path} -> ' + \
    '{upstream_proxy_host}:{upstream_proxy_port} - ' + \
    '{response_code} {response_reason} - {response_bytes} bytes - ' + \
    '{connection_time_ms} ms'

DEFAULT_HTTPS_ACCESS_LOG_FORMAT = '{client_ip}:{client_port} - ' + \
    '{request_method} {server_host}:{server_port} -> ' + \
    '{upstream_proxy_host}:{upstream_proxy_port} - ' + \
    '{response_bytes} bytes - {connection_time_ms} ms'



class ProxyPoolPlugin(TcpUpstreamConnectionHandler, HttpProxyBasePlugin):
    """Proxy pool plugin simply acts as a proxy adapter for proxy.py itself.

    Imagine this plugin as setting up proxy settings for proxy.py instance itself.
    All incoming client requests are proxied to configured upstream proxies."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self._endpoint: Optional[Url] = self._select_proxy()
        # Cached attributes to be used during access log override
        self._metadata: List[Any] = [
            None, None, None, None,
        ]

    def handle_upstream_data(self, raw: memoryview) -> None:
        self.client.queue(raw)

    def before_upstream_connection(
            self, request: HttpParser,
    ) -> Optional[HttpParser]:
        """Avoids establishing the default connection to upstream server
        by returning None.

        TODO(abhinavsingh): Ideally connection to upstream proxy endpoints
        must be bootstrapped within its own re-usable and garbage collected pool,
        to avoid establishing a new upstream proxy connection for each client request.

        See :class:`~proxy.core.connection.pool.UpstreamConnectionPool` which is a work
        in progress for SSL cache handling.
        """
        

        # We don't want to send private IP requests to remote proxies
        try:
            if ipaddress.ip_address(text_(request.host)).is_private:
                return request
        except ValueError:
            pass
        if self._endpoint == None:
            return request
        # If chosen proxy is the local instance, bypass upstream proxies
        assert self._endpoint.port and self._endpoint.hostname
        if self._endpoint.port == 8899 and \
                self._endpoint.hostname in LOCAL_INTERFACE_HOSTNAMES + ANY_INTERFACE_HOSTNAMES:
            return request
        # Establish connection to chosen upstream proxy
        endpoint_tuple = (text_(self._endpoint.hostname), self._endpoint.port)
        logger.debug('Using endpoint: {0}:{1}'.format(*endpoint_tuple))
        self.initialize_upstream(*endpoint_tuple)
        assert self.upstream
        try:
            self.upstream.connect()
        except TimeoutError:
            raise HttpProtocolException(
                'Timed out connecting to upstream proxy {0}:{1}'.format(
                    *endpoint_tuple,
                ),
            )
        except ConnectionRefusedError:
            # TODO(abhinavsingh): Try another choice, when all (or max configured) choices have
            # exhausted, retry for configured number of times before giving up.
            #
            # Failing upstream proxies, must be removed from the pool temporarily.
            # A periodic health check must put them back in the pool.  This can be achieved
            # using a data structure without having to spawn separate thread/process for health
            # check.
            raise HttpProtocolException(
                'Connection refused by upstream proxy {0}:{1}'.format(
                    *endpoint_tuple,
                ),
            )
        logger.debug(
            'Established connection to upstream proxy {0}:{1}'.format(
                *endpoint_tuple,
            ),
        )
        return None

    def handle_client_request(
            self, request: HttpParser,
    ) -> Optional[HttpParser]:
        """Only invoked once after client original proxy request has been received completely."""
        print(request.build(for_proxy=True).decode('utf-8'))
        if not self.upstream:
            return request
        assert self.upstream
        # For log sanity (i.e. to avoid None:None), expose upstream host:port from headers
        host, port = None, None
        # Browsers or applications may sometimes send
        #
        # "CONNECT / HTTP/1.0\r\n\r\n"
        #
        # for proxy keep alive checks.
        if request.has_header(b'host'):
            url = Url.from_bytes(request.header(b'host'))
            assert url.hostname
            host, port = url.hostname.decode('utf-8'), url.port
            port = port if port else (
                443 if request.is_https_tunnel else 80
            )
        path = None if not request.path else request.path.decode()
        self._metadata = [
            host, port, path, request.method,
        ]
        # Queue original request optionally with auth headers to upstream proxy
        assert self._endpoint is not None
        if self._endpoint.has_credentials:
            assert self._endpoint.username and self._endpoint.password
            request.add_header(
                httpHeaders.PROXY_AUTHORIZATION,
                b'Basic ' +
                base64.b64encode(
                    self._endpoint.username +
                    COLON +
                    self._endpoint.password,
                ),
            )
        self.upstream.queue(memoryview(request.build(for_proxy=True)))
        return request

    def handle_client_data(self, raw: memoryview) -> Optional[memoryview]:
        """Only invoked when before_upstream_connection returns None"""
        # Queue data to the proxy endpoint
        assert self.upstream
        self.upstream.queue(raw)
        return raw

    def handle_upstream_chunk(self, chunk: memoryview) -> Optional[memoryview]:
        """Will never be called since we didn't establish an upstream connection."""
        if not self.upstream:
            return chunk
        # pylint: disable=broad-exception-raised
        raise Exception("This should have never been called")

    def on_upstream_connection_close(self) -> None:
        """Called when client connection has been closed."""
        if self.upstream and not self.upstream.closed:
            logger.debug('Closing upstream proxy connection')
            self.upstream.close()
            self.upstream = None

    def on_access_log(self, context: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        if not self.upstream:
            return context
        context.update({
            'upstream_proxy_host': self.upstream.addr[0],
            'upstream_proxy_port': self.upstream.addr[1],
            'server_host': self._metadata[0],
            'server_port': self._metadata[1],
            'request_path': self._metadata[2],
            'response_bytes': self.total_size,
        })
        self.access_log(context)
        return None

    def access_log(self, log_attrs: Dict[str, Any]) -> None:
        access_log_format = DEFAULT_HTTPS_ACCESS_LOG_FORMAT
        request_method = self._metadata[3]
        if request_method and request_method != httpMethods.CONNECT:
            access_log_format = DEFAULT_HTTP_ACCESS_LOG_FORMAT
        logger.info(access_log_format.format_map(log_attrs))

    def _select_proxy(self) -> Optional[Url]:
        """Choose a proxy from the pool.

        TODO: Implement your own logic here e.g. random choice,
        round-robin, least connection etc.  Returning None here makes
        before_upstream_connection fall back to a direct connection.
        """
        return None


import proxy


if __name__ == '__main__':
    proxy.main(
        hostname=ipaddress.IPv4Address('0.0.0.0'),
        port=8899,
        plugins=[ProxyPoolPlugin],
    )
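The _select_proxy stub above always returns None, which bypasses the pool entirely. A minimal round-robin selection could look like this (sketch only; the endpoint URLs are hypothetical placeholders, and in the real plugin each entry would be a proxy.http.Url built via Url.from_bytes):

```python
import itertools
from typing import Iterator

# Hypothetical upstream proxy endpoints; replace with your own pool.
POOL = [
    'http://127.0.0.1:9000',
    'http://127.0.0.1:9001',
    'http://127.0.0.1:9002',
]

# itertools.cycle yields pool entries forever, wrapping around at the end.
_cycle: Iterator[str] = itertools.cycle(POOL)

def select_proxy() -> str:
    """Return the next endpoint in strict round-robin order."""
    return next(_cycle)

print([select_proxy() for _ in range(4)])
# ['http://127.0.0.1:9000', 'http://127.0.0.1:9001',
#  'http://127.0.0.1:9002', 'http://127.0.0.1:9000']
```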

thanks.

@geckoy geckoy added the Bug Bug report in proxy server label Jul 31, 2024
@abhinavsingh abhinavsingh added Question Questions related to proxy server and removed Bug Bug report in proxy server labels Aug 1, 2024
abhinavsingh (Owner)

@geckoy You don't see the headers because it's an HTTPS request. You will need TLS interception if you want to inspect the entire request; you can enable TLS interception on your remote proxies. Unfortunately, ProxyPool currently doesn't work well together with TLS interception enabled locally. See #1368 for background.
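To illustrate: for an HTTPS request, the CONNECT preamble is the only plaintext the proxy ever parses; the real GET request (with Accept, Custom-Header, etc.) travels inside the TLS tunnel that follows. A minimal sketch using the raw bytes from the question (User-Agent value abbreviated):

```python
# The plaintext preamble the browser sends to the proxy. Everything the
# client sends after the proxy replies "200 Connection established" is
# TLS ciphertext that the proxy merely relays.
connect_preamble = (
    b'CONNECT httpbin.org:443 HTTP/1.1\r\n'
    b'Host: httpbin.org:443\r\n'
    b'Proxy-Connection: keep-alive\r\n'
    b'User-Agent: Mozilla/5.0\r\n'
    b'\r\n'
)

# Drop the request line and the trailing blank line, keep header names.
lines = connect_preamble.split(b'\r\n')
header_names = [line.split(b':', 1)[0].decode() for line in lines[1:-2]]
print(header_names)  # ['Host', 'Proxy-Connection', 'User-Agent']
```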


geckoy commented Aug 6, 2024

@abhinavsingh thank you for your response.

@geckoy geckoy closed this as completed Aug 6, 2024