
getting 401 #11

Open
zinyosrim opened this issue Jan 22, 2018 · 5 comments
zinyosrim commented Jan 22, 2018

I installed aquarium with the default options. It looks fine; I get lots of messages like:

splash0_1  | 2018-01-22 12:11:06.702841 [-] "172.21.0.6" - - [22/Jan/2018:12:11:05 +0000] "GET / HTTP/1.0" 200 7677 "-" "-"
splash1_1  | 2018-01-22 12:11:07.039685 [-] "172.21.0.6" - - [22/Jan/2018:12:11:06 +0000] "GET / HTTP/1.0" 200 7677 "-" "-"
splash2_1  | 2018-01-22 12:11:07.442394 [-] "172.21.0.6" - - [22/Jan/2018:12:11:06 +0000] "GET / HTTP/1.0" 200 7677 "-" "-"

I'm running locally on mac, therefore my settings.py has the line SPLASH_URL = 'http://localhost:8050'
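For context, the scrapy-splash README wires Splash into a project roughly like this in settings.py (the middleware order values are the ones the README suggests; the SPLASH_URL here assumes Aquarium's load balancer on localhost:8050, as in this setup):

```python
# settings.py -- scrapy-splash wiring per the scrapy-splash README.
# SPLASH_URL assumes Aquarium's balancer is listening on localhost:8050.
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
```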

When launching the crawler I'm getting

2018-01-22 12:51:45 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 http://www.example.com>: HTTP status code is not handled or not allowed

and I can't see the request in the splash terminal. How can I fix this?

Thanks
Zin

zinyosrim (Author) commented:

I fixed this by adding

http_user = 'user'
http_pass = 'userpass'

as class attributes of my spider.
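Why this works: Aquarium puts Splash behind HTTP Basic auth (user/userpass are its default credentials), so unauthenticated requests get a 401. Scrapy's built-in HttpAuthMiddleware reads the spider's http_user/http_pass and attaches an Authorization header. A stdlib-only sketch of the header it produces:

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the HTTP Basic Authorization header value, roughly as Scrapy's
    HttpAuthMiddleware does from the spider's http_user/http_pass."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Aquarium's default credentials
print(basic_auth_header("user", "userpass"))  # → Basic dXNlcjp1c2VycGFzcw==
```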


chipzzz commented Dec 15, 2020

@zinyosrim , where did you add this?

zinyosrim (Author) commented:

@chipzzz here:
[screenshot]


chipzzz commented Dec 15, 2020

@zinyosrim Thanks!

In another issue I found the following, which works for me (basic_auth_header comes from w3lib.http):

from w3lib.http import basic_auth_header
from scrapy_splash import SplashRequest

yield SplashRequest(url, callback=self.parse, endpoint='render.html',
                    meta={'start_url': url}, args={'wait': 3},
                    splash_headers={'Authorization': basic_auth_header('user', 'userpass')})

Still no luck on getting these requests to go through my 3rd party proxy though.

tayyab-elahi commented:

> I fixed this by adding
>
> http_user = 'user'
> http_pass = 'userpass'
>
> as class attributes of my spider

Not working for me though.
