How to disable HAProxy authentication #6

Open
onurakman opened this issue Mar 31, 2017 · 3 comments

Comments

@onurakman

How can I disable HAProxy authentication (by manually editing haproxy.cfg)? HAProxy passes its authentication info on to the site being crawled, and the site returns 401.

@candale

candale commented Jun 14, 2017

What I did was comment out the following in the haproxy.cfg file:

...
# Splash Cluster configuration
frontend http-in
    bind *:8050
#
#    http basic auth
#    acl auth_ok http_auth
#    http-request auth realm Splash if !auth_ok
#    http-request allow if auth_ok
#    http-request deny
#
    # don't apply the same limits for non-render endpoints
    acl staticfiles path_beg /_harviewer/
    acl misc path / /info /_debug /debug

    # auth_ok is dropped from the conditions; the path conditions stay, because
    # an unconditional use_backend would match first and send every request
    # to splash-cluster
    use_backend splash-cluster if !staticfiles !misc
    use_backend splash-misc if staticfiles
    use_backend splash-misc if misc
...

@nirvana-msu

nirvana-msu commented Dec 10, 2017

This is incredibly annoying. These credentials are meant to be private, so that one can limit access to a Splash instance facing the internet. Instead, they get forwarded to every website you crawl. Unless you additionally use a proxy, you're effectively letting everyone know where your Splash instance is and what its credentials are. Why is it even done this way? It clearly looks like a bug to me.

P.S. To test this, you can just crawl https://httpbin.org/headers and check the response body (it simply mirrors the request headers). You'll see your Splash credentials in the Authorization header.
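
For anyone who wants to script that check, here is a rough sketch. It assumes the response body contains the JSON object that httpbin returns (a top-level "headers" mapping), possibly wrapped in HTML if the page was rendered through Splash. The helper name leaked_authorization is made up for this example and is not part of Splash or scrapy-splash.

```python
import json


def leaked_authorization(body):
    """Return the Authorization value echoed back by httpbin, or None.

    `body` is the response text. The JSON object is located by its outermost
    braces, which is an assumption that holds for a plain JSON body and for
    JSON wrapped in simple HTML such as a <pre> element.
    """
    start, end = body.find("{"), body.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response body")
    headers = json.loads(body[start:end + 1]).get("headers", {})
    # Header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "authorization":
            return value
    return None
```

If this returns anything other than None for a page crawled through Splash, the credentials reached the remote site.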

Is there a workaround that keeps HTTP Basic Auth for Splash but does not pass these credentials on to the website you crawl?

UPDATE: OK, I solved this by setting the Authorization header in request.meta['splash']['splash_headers'] instead of directly in the request headers, as HttpAuthMiddleware does. I believe the advice to use HttpAuthMiddleware is very dangerous and should be removed from the documentation / README. The correct way is clearly to set these credentials via splash_headers.
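
A minimal sketch of that workaround, using only the standard library to build the meta dict. The helper name splash_auth_meta and the example credentials are hypothetical; the only thing taken from the thread is that the Authorization header goes under meta['splash']['splash_headers'], so it is sent to Splash rather than set on the request itself.

```python
import base64


def splash_auth_meta(username, password, splash_args=None):
    """Build request.meta so the Basic Auth header is addressed to Splash only.

    The header is placed under meta['splash']['splash_headers'] instead of
    the request's own headers.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {
        "splash": {
            # Rendering arguments passed through to Splash (may be empty).
            "args": dict(splash_args or {}),
            "splash_headers": {"Authorization": f"Basic {token}"},
        }
    }


# Usage inside a spider (needs scrapy and scrapy-splash installed; not run here):
#   yield scrapy.Request(url, self.parse,
#                        meta=splash_auth_meta("user", "userpass", {"wait": 0.5}))
```

With this in place, the HttpAuthMiddleware attributes on the spider (http_user / http_pass) should not be set for the Splash credentials, otherwise the header would still be added to the request headers.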

@chipzzz

chipzzz commented Dec 15, 2020

@nirvana-msu setting those got me past the 401 code, THANK YOU. Although it looks like the request still does not go through the scraper api proxy :(
