Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cookies.py why replace escaped quote with nothing? #3759

Open
saft opened this issue Dec 8, 2016 · 21 comments
Open

cookies.py why replace escaped quote with nothing? #3759

saft opened this issue Dec 8, 2016 · 21 comments

Comments

@saft
Copy link

saft commented Dec 8, 2016

https://github.com/kennethreitz/requests/blob/master/requests/cookies.py#L346

Why do this? An escaped quote can be sent in the case of sending json in the cookie and this code mangles that json. I'm raising this because the http library seems to think that formatting the json this way on the server is the right way to do it. See http/cookies.py function _quote line 225 in my 3.4 version.

@sigmavirus24
Copy link
Contributor

Can you provide an example of a server sending JSON in the cookie header? The RFCs (6265) for Cookies are explicit and JSON really doesn't fit into the cookie header appropriately.

@saft
Copy link
Author

saft commented Dec 9, 2016

The following library does it quite explicitly. I'm not sure how stable it is but it is referenced in the aiohttp documentation.
https://github.com/aio-libs/aiohttp-session/blob/master/aiohttp_session/__init__.py#L219

@sigmavirus24
Copy link
Contributor

I'm raising this because the http library seems to think that formatting the json this way on the server is the right way to do it.

So you seemed to mention that there was a server sending JSON in the set-cookie header. Now you're pointing at another library? I don't see how that library is storing JSON in a cookie at all. Do you mean that it allows for it?

@saft
Copy link
Author

saft commented Dec 9, 2016

Maybe this line helps to illustrate where aiohttp_session puts json into the cookie:
https://github.com/aio-libs/aiohttp-session/blob/master/aiohttp_session/__init__.py#L245

aiohttp_session uses aiohttp which uses http/cookies.py. The server is my own.

To look at it from the other perspective, why remove all escaped quotes from cookie data - 6265 doesn't disallow that?

@Lukasa
Copy link
Member

Lukasa commented Dec 9, 2016

As always, the easiest thing to do when you're trying to work out the answer to a problem like this is to just look at the Git logs. If you do, they direct you to #1440.

The TL;DR is that unescaping all quotes was probably unnecessary. The goal of the original patch was simply to remove enclosing quote marks, but we took a (probably a bit too aggressive) shortcut. I'm open to a patch that changes that behaviour.

@saft
Copy link
Author

saft commented Dec 9, 2016

From #1440 I can't figure out what the end intent of it was. Perhaps to be out-of-the-box compatible with Django? Certainly now set_cookie actually keeps the enclosing quote marks.

On further analysis, RFC6265 does seem to say that backslash and double quotes are invalid characters. However what set_cookie does is to just remove all occurrences of \".

I would love to try to improve that bit of code but I'm not sure what I should be trying to improve it towards.

@Lukasa
Copy link
Member

Lukasa commented Dec 9, 2016

@saft The end intent was to unescape escaped quote marks that surround a cookie value. These are almost certainly in error. However, we seem to have done that by hitting the problem with a sledgehammer by just unescaping all quotes, which was probably unnecessary.

So the goal would be to craft a change here that uses a regular expression to only unquote quote marks at the start and end of the cookie value.

@saft
Copy link
Author

saft commented Dec 9, 2016

Are you saying the the intent was to remove parts 2 and 4 of cookie value below if they exist?

 1   2    3    4   5
["][\\"][...][\\"]["]

If so why do you want to do that as a special case?

@piotr-dobrogost
Copy link

I think the intent was to change this
[\"]...[\"]
to this
["]...["]

@Lukasa
Copy link
Member

Lukasa commented Dec 9, 2016

@piotr-dobrogost is correct.

@saft
Copy link
Author

saft commented Dec 9, 2016

OK, I think I'm out of place to continue to question (or maybe not in this forum) so please suggest we discontinue the conversation if you think so too.

My curiosity isn't quite satisfied though. The intent is not in line with RFC6265 so why is it desirable?

@Lukasa
Copy link
Member

Lukasa commented Dec 9, 2016

@saft I'm not sure what you mean: we're agreeing that the current behaviour is wrong. We're just trying to discuss what the fix should be.

As to "the intent is not in line with RFC 6265": sure it is. RFC 6265 allows surrounding DQUOTE characters. This is to allow the possibility that someone has mistakenly escaped the DQUOTE values, which is forbidden by RFC 6265.

However, it does end up seeming like this might be superfluous to needs. Right now I'm not able to reproduce a situation in which that behaviour is required.

@piotr-dobrogost
Copy link

piotr-dobrogost commented Dec 9, 2016

What is allowed in a cookie value has been traditionally very tricky question. I think the intent was to tolerate reasonably looking cookies. However now that there's good answer to the question what is allowed (in the form of RFC 6265) there's no reason to tolerate such invalid cookies anymore. If so then there's no need to unescape anything as any cookie's value including backspace or double quote is invalid. The only problem is backwards compatibility which prevents library from breaking some behavior even though it might have been wrong in the first place. This could be the reason to fix this bug by bringing behavior to what was intended (to tolerate escaped double quotes when they start and end the cookie's value) and not to what is allowed by current standard (RFC 6265).

RFC 6265 allows surrounding DQUOTE characters.

You mean RFC 6265 allows the cookie's value to be surrounded I guess :)

@piotr-dobrogost
Copy link

piotr-dobrogost commented Dec 9, 2016

Yes, they are missing a comma (and the last comma in this line is superfluous :)).

According to RFC 2965:

cookie-value    =  NAME "=" VALUE [";" path] [";" domain] [";" port]
VALUE           =  value
value           =  token | quoted-string

and RFC 2616 defines:

quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
qdtext         = <any TEXT except <">>
quoted-pair    = "\" CHAR

token          = 1*<any CHAR except CTLs or separators>
separators     = "(" | ")" | "<" | ">" | "@"
               | "," | ";" | ":" | "\" | <">
               | "/" | "[" | "]" | "?" | "="
               | "{" | "}" | SP | HT

You say that library does not allow \\" substring. The question is if it's allowed by cited grammar rules. If it's not allowed then I think the library is right to disallow this.

@saft
Copy link
Author

saft commented Dec 9, 2016

I think the intent was to change this
["]...["]

On second thought I don't agree with this because of:
if hasattr(cookie.value, 'startswith') and cookie.value.startswith('"') and cookie.value.endswith('"'):

What is allowed in a cookie value has been traditionally very tricky question.

+1

However now that there's good answer to the question what is allowed (in the form of RFC 6265) there's no reason to tolerate such invalid cookies anymore.

Perhaps. I think this set_cookie is arbitrary. I'm still unconvinced of any justification for any stated intent so far.

@Lukasa
Copy link
Member

Lukasa commented Dec 9, 2016

Perhaps. I think this set_cookie is arbitrary. I'm still unconvinced of any justification for any stated intent so far.

And?

No-one is disagreeing with the idea that this method is not achieving what it set out to. I am explaining the train of thought that led us here. I am not asserting that it was correct or useful, only that it's the reasoning. I am explaining this because your original issue was a question: "why do this?". I have pointed you to the why.

However, both myself and @piotr-dobrogost have already said that we are open to changes in behaviour here. I don't know who you're trying to convince: there is no-one here arguing that the status quo is good. I feel like you're arguing against a non-existent opponent.

The TL;DR is this: this method seems like it's a waste of everyone's time, and probably should be removed. However, as the Requests project has learned all too clearly, actually removing it is likely to be a backward incompatible change. There are some hopes that we will remove or drastically rewrite this entire module, though, so I suspect this will go away.

@saft
Copy link
Author

saft commented Dec 9, 2016

OK, my apologies. I see that I'm coming across argumentative to you. My intent was purely to educate myself and to help if I could at all. However, I hear you and am ready to drop this. You guys have been really really helpful. 👍

To be totally open and honest I don't feel like there was a coherent intent or 'why' explained. That is why I pursued it.

But, I did hear you say that you were open to changes. I doubly hear you that it will probably be a backward incompatible change for some. If we really understood the 'why' of why it was added and who would be burned we might be able to mitigate that.

@Lukasa
Copy link
Member

Lukasa commented Dec 9, 2016

@saft Seems like a traditional case of crossed wires due to text-based communication. Thankyou for apologising: I'm sorry as well, I clearly misread your tone.

Back to being productive. 👍

So, I think this is wrong. I am absolutely not an expert on cookies: they are the part of HTTP I know least about. However, as I understand from some of my collaborators on this project, our cookie stack is ultimately not very good. Part of this is because of the involvement of the standard library, whose cookie stack is also not very good.

So right now the cookie stack was worked on to the point where all the complaints went away. And they have! The cookie functionality is one of the least-complained-about parts of Requests. So that leads to a strong inclination to avoid making incremental changes to it, and to instead more-or-less leave it alone until such time as we can totally replace it.

Ultimately, I think the reason this patch was added was a sense of defensiveness against a problem that didn't fully exist. An enormous part of that is my responsibility: I reviewed and merged #1440, and ultimately probably should have done more investigating to prove that the problem existed. Certainly, trying today, I can't find any immediate adverse affects from removing that block of code.

However, now that that has happened, there is a problem of inertia. Ultimately, it doesn't seem to be hurting many people. It's entirely circumventable, because we allow plugging in other cookiejar objects, so anyone it does inconvenience can nonetheless work around it. And it has been there for three years without so much as a whimper.

My strong inclination, then, is to leave it alone until such time as we consider a substantial change to the cookie stack. At that time, we can also include substantial testing of the cookie functionality so that we can do a much better job of indicating why certain bits of functionality are present.

@piotr-dobrogost
Copy link

piotr-dobrogost commented Dec 9, 2016

I see you're trying to get to the truth (even if it's only the historic truth :)) and not only I understand this but I applaud you for doing this. I'm with you because I have exactly the same attitude :) Once upon a time I was curious what can be escaped (if I remember well) in cookie-value and I raised issue at https://github.com/sashahart/cookies. However as you can see this repo has since moved and the Issues section is no more available :( That's a pity because I think you might find this issue and discussion there interesting :) I emailed Sasha asking if he could show Issues section again.

@PedanticHacker

This comment has been minimized.

@GoldenPalazzo
Copy link

just add

del requests.cookies.RequestsCookieJar.set_cookie

if you want to use the cookie as you get from the request

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants