Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC Violation - Broken POST content [rt.cpan.org #75592] #69

Closed
oalders opened this issue Mar 31, 2017 · 3 comments
Closed

RFC Violation - Broken POST content [rt.cpan.org #75592] #69

oalders opened this issue Mar 31, 2017 · 3 comments

Comments

@oalders
Copy link
Member

oalders commented Mar 31, 2017

Migrated from rt.cpan.org#75592 (status was 'open')

Requestors:

Attachments:

From bbb@cpan.org on 2012-03-07 00:11:51:

Hello Gisle,

I accidentally upgraded to HTTP::Message 6.03 and now the urlencoding is
suddenly broken for special binary uploads. Rolling back to version 6.02
repairs all the problems. But the new version is corrupting too much!

From bbb@cpan.org on 2012-03-07 00:18:32:

I found the problem.

HTTP/Request/Common.pm:85:     # HTML/4.01 says that line breaks are
represented as "CR LF" pairs (i.e., `%0D%0A')
HTTP/Request/Common.pm:86:     $content =~ s/(?<!%0D)%0A/%0D%0A/g;

That's a nifty little negative lookbehind substitution, but "%0A" is not
a newline. In fact, it all three characters fit on one line. And
modifying already-encoded binary data isn't being helpful at all.

From bbb@cpan.org on 2012-03-07 00:26:45:

Here is the patch we are using in production that corrects the problem.

From gaas@cpan.org on 2012-03-11 11:23:03:

Since the new behaviour is to encode "\n" as "\r\n" for application/x�www�form�urlencoded parts 
this is clearly not transparant if you try to pass through binary data.  I can see that.  The new behaviour 
only make sense to text content (where you emulate an HTML form).

I don't know what you mean by RFC Violation.  The patch that introduced this behaviour claimed the 
the old behaviour conflicted with the HTTP RFC.  What RFC do you claim say otherwise?

Binary parts would normally be handled by using the multipart/form�data format instead.

Your patch is broken.  It breaks the test suite and works for you simply be effectively disabling the 
statement it affects.  Thus reverting to the old behaviour.

I think it ought to be possible use this interface and get transparent binary encoding as well.  Not quite 
sure how yet.

From bbb@cpan.org on 2012-03-12 02:48:58:

Gisle,

Thank you for your response!

I understand that HTTP RFC 2068 suggests that CRLF be used across the
wire to fully comply. And I believe it might even be appropriate to
convert multipart/form�data fields (that happen to be text/*) to CRLF
format so that everything across the raw transport is standard. But
there is nothing in RFC 1738 that suggests you can inject extra
characters during the encoding process for
application/x�www�form�urlencoded data. And "%0A" is just three digits
in plain text without any special characters so should be safe across
the wire as is, thus the violation referred to in my report.

However, I have seen SOME real browsers such as IE encode textarea
inputs with forced CRLF prior to encoding, which I believe your 6.03
will help emulate this behavior. I'm not sure what any RFC states about
that or where that idea even comes from or if that is even proper or
not. And if this behavior I've seen is proper, then yes, my patch would
break that.

Unfortunately, I'm not sure how to tell LWP::UserAgent or HTTP::Request
when I know that my content is not text and should not be corrupted in
this case. I just know that everything used to work fine for years
posting binary inputs, and suddenly after upgrading to 6.03 it is
broken. I guess I can double URL encode it on the client end and then
double decode it on the server end to workaround any encoding issues
with HTTP::Request, but that would require changes on both ends and
doesn't provide a slow migration path very easily. I'm just not sure
what the best solution is.

And which RFC are you claiming that the OLD behavior does not comply?

Or do you have any other suggestions?

From doherty@cpan.org on 2012-03-31 04:52:51:

This has caused a module of mine to malfunction under some circumstances. 
We calculate an md5 of the text we're sending to the server to ensure it 
isn't harmed in transit. Well, with 6.03 it is - the text the server 
receives no longer matches the md5.

I've attached a transcript of a shell session showing how I determined 
that this release is responsible. I don't know whether an RFC is 
violated, but I do know this change is causing problems.

From doherty@cpan.org on 2012-04-24 20:22:30:

Looking at http://www.w3.org/Protocols/rfc2616/rfc2616-
sec14.html#sec14.15 I see: "Conversion of all line breaks to CRLF MUST 
NOT be done" in reference to calculating a Content-MD5 header, which 
suggested to me that messing with my line endings was probably a bad 
idea.

Then, I saw: "HTTP allows transmission of text types with any of several 
line break conventions and not just the canonical form using CRLF." which 
prompted me to look at http://www.w3.org/Protocols/rfc2616/rfc2616-
sec3.html#sec3.7.1 which is more explicit.

I'm certainly an amateur at reading RFCs, but it looks like you're not 
required to convert text types in the body to CRLF. If that's the case, 
please consider reverting this change.

From eugenek@cpan.org on 2013-08-13 15:20:13:

Plus one to reverting conversion.

Not only prepending every 0A with 0D completely breaks binary data, it is also not clear that "x-www-form-urlencoded" content-type is not suited for it, as many cache servers and existing communications depend on it. 

Actually, any silent alteration (and this is alteration, not some sort of transfer-encoding - server gets data different from what was sent by client) of user data seems very unnatural to me, not to mention breaking compatibility with previous versions.

Issue warning (it is at most; I'm not sure that you got whole idea right).

From gfilatov@cpan.org on 2014-01-13 08:27:50:

Any progress here?
@gnatyna
Copy link

gnatyna commented Mar 6, 2024

Hi. Is there any chance to remove this content alteration?
line https://metacpan.org/release/OALDERS/HTTP-Message-6.45/source/lib/HTTP/Request/Common.pm#L92

            # HTML/4.01 says that line breaks are represented as "CR LF" pairs (i.e., `%0D%0A')
           $content =~ s/(?<!%0D)%0A/%0D%0A/g if defined($content);

seems to be unexpected.
imho HTML processor(browser, or some tools like browser) should prepare text content itself.

@oalders
Copy link
Member Author

oalders commented Mar 6, 2024

It looks like this change was introduced via 0045480.

oalders added a commit that referenced this issue Mar 6, 2024
@bbkr
Copy link

bbkr commented Oct 3, 2024

+1 for removing this weird and completly unexpected behavior of LF->CRLF autocorrection.

In my case it had financial consequences - when using this library for sending SMSes for client he got charged twice the expected amount. Because estimated cost was based on actual message length and this stupid replacement changed newlines which caused SMS to be too long to fit in single part message. Not to mention spending almost 2 hours for "WTF debugging".

Please NEVER modify user data implicitly.

oalders added a commit that referenced this issue Oct 3, 2024
@oalders oalders closed this as completed in 968789d Oct 4, 2024
oalders added a commit that referenced this issue Oct 7, 2024
    - Stop transforming LF into CRLF. Fixes #69 (GH#196) (Olaf Alders)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants