latest rvest+httr combo fails to submit properly forms in some cases #133

Closed
etabeta78 opened this issue Jan 20, 2016 · 7 comments

@etabeta78

Using the latest rvest + httr from CRAN, I get an "Internal Server Error 500" when submitting some forms with submit_form.

For instance, if I run

session1 <- html_session("http://www.imdb.com/search/title")
my_form <- html_form(session1)[[2]]
my_form <- set_values(my_form, "title"="Frozen")
submit_form(session1, my_form)

I get this error.

The problem is still present if I upgrade httr to the latest GitHub source (via devtools).
OTOH, the problem disappears if I downgrade httr to 0.6.1, so I'm led to think it is related to one of the many changes made in the move to 1.0.0 (might it be some detail of the RCurl -> curl migration? or something completely different? So far I've not been able to narrow down the list of possible culprits :-( )

@alex23lemm

Hadley,
I stumbled over the same issue 2 days ago and could also narrow it down to httr 1.0.0.

For over a year I have used rvest to log in to several corporate systems which do not provide an API.
After updating from httr 0.6.1 to 1.0.0, all of the logins using submit_form stopped working.

I was able to get the logs from one of our servers, which show the following:

  • The login is successful
  • After that a redirect to '/' follows which is the normal and correct behavior
  • Then comes the error: Instead of making a GET on '/' rvest/httr makes a POST

I could verify this on the client side using with_verbose. Because everything works fine with rvest 0.3.1/httr 0.6.1, I am posting below only the relevant with_verbose(submit_form(...)) snippets for the rvest 0.3.1/httr 1.0.0 combo, which illustrate the issue. Even though curl's informational text says "Switch from POST to GET", the second request is a POST again:

Submitting with 'login'
*  Found bundle for host www.openair.com: 0x3a46a290
*  Re-using existing connection! (#0) with host www.openair.com
*  Connected to www.openair.com (64.89.44.170) port 443 (#0)
-> POST /index.pl HTTP/1.1
-> Host: www.openair.com
-> User-Agent: libcurl/7.43.0 r-curl/0.9.4 httr/1.0.0
-> Accept-Encoding: gzip, deflate
-> Accept: application/json, text/xml, application/xml, */*
-> Content-Type: application/x-www-form-urlencoded
-> Content-Length: 733
...
*  upload completely sent off: 733 out of 733 bytes
<- HTTP/1.1 302 Found
<- Date: Wed, 20 Jan 2016 18:04:01 GMT
<- Server: Apache
<- P3P: policyref="/w3c/p3p.xml", CP="CP="NOI DSP COR NID TAIa OUR NOR""
<-
...
<- 
*  Ignoring the response-body
*  Connection #0 to host www.openair.com left intact
*  Issue another request to this URL: 'https://www.openair.com/dashboard.pl?app=ma;_login=1;uid=WN4gL6C__5RGzyvkdrwHpmQ;r=prmPsGJ1048'
*  Switch from POST to GET
*  Found bundle for host www.openair.com: 0x3a46a290
*  Re-using existing connection! (#0) with host www.openair.com
*  Connected to www.openair.com (64.89.44.170) port 443 (#0)
-> POST /dashboard.pl?app=ma;_login=1;uid=WN4gL6C__5RGzyvkdrwHpmQ;r=prmPsGJ1048   HTTP/1.1

@etabeta78
Author

A quick note: the problem persists after updating httr to the latest release (1.1.0).

@hadley
Member

hadley commented May 20, 2016

Probably the same as r-lib/httr#368

@hadley
Member

hadley commented May 20, 2016

Or maybe actually r-lib/httr#356

@antoine-lizee

antoine-lizee commented May 21, 2016

Definitely r-lib/httr#356. At the bottom of your verbose output is the signature of the problem that I ran into too:

....
*  Switch from POST to GET
....
-> POST /dashboard.pl?app=ma;_login=1;uid=WN4gL6C__5RGzyvkdrwHpmQ;r=prmPsGJ1048   HTTP/1.1

curl is supposed to switch from POST to GET after the redirect, and its log claims it does, but because the request method is forced through the custom request option (CURLOPT_CUSTOMREQUEST), it still issues a POST. I found the answer buried in the cURL docs :-).
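
To make the mechanism concrete, here is a minimal sketch in plain R. It is a simplified model of the redirect behaviour described in this thread, not libcurl's or httr's actual code; the function `follow_redirect` and its arguments are hypothetical names chosen for illustration. It assumes the default libcurl behaviour, where a POST that receives a 301, 302 or 303 is followed up with a GET, while a forced custom request string only changes the verb written on the wire.

```r
# Simplified model (an assumption, not libcurl source): decide which verb is
# used for the follow-up request after a redirect.
#   status         : HTTP status code of the redirect response
#   method         : the method libcurl believes it is performing
#   custom_request : verb forced via CURLOPT_CUSTOMREQUEST, or NULL if unset
follow_redirect <- function(status, method, custom_request = NULL) {
  # Default behaviour: a POST answered by 301/302/303 becomes a GET.
  if (status %in% c(301, 302, 303) && method == "POST") {
    method <- "GET"
  }
  # A forced custom request overrides only the verb that is actually sent,
  # and it stays in place for every request made with the same handle.
  sent <- if (is.null(custom_request)) method else custom_request
  list(internal = method, sent = sent)
}

# No forced verb: the redirect is followed with a GET, as the server expects.
ok <- follow_redirect(302, "POST")

# Forced verb: internally "Switch from POST to GET", but POST goes on the wire.
bad <- follow_redirect(302, "POST", custom_request = "POST")
```

In this model `bad$internal` is "GET" while `bad$sent` is "POST", which matches the verbose trace above: the informational line reports the switch, yet the request line that follows still starts with POST.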

@hadley closed this as completed May 21, 2016

@etabeta78
Author

Chiming in just to confirm that it definitely is r-lib/httr#356.
Thank you all for narrowing down the exact problem!

@mikkelkrogsholm

What was the solution?
