Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding messages introduces LF characters. #1190

Closed
paho-outreach opened this issue Dec 22, 2017 · 20 comments
Closed

Encoding messages introduces LF characters. #1190

paho-outreach opened this issue Dec 22, 2017 · 20 comments

Comments

@paho-outreach
Copy link

In #1113, QuotedPrintable.encode was changed from

::Mail::Utilities.to_crlf([::Mail::Utilities.to_lf(str)].pack("M"))

to

[str].pack("M")

The problem I'm running into is that pack introduces LF characters, so encoding a message > 998 characters creates something that is not valid per rfc2822, even if you only had CRLF line-endings before.

I see that #1157 prevents attachments from being forced into quoted-printable, and attachment corruption was the motivation for this change in #1113. Would you be open to restoring the old behavior, or does more need to be done in attachment land first?

@Fjan
Copy link

Fjan commented Dec 29, 2017

I just ran into another side effect of #1113: The LF line endings are now encoded as =0D in quoted printable messages and I'm seeing a lot of additional empty lines in plain text email messages. I think what happens is that email clients will add a new line for each encoded \r (perhaps it only does this if there is a mix of both \n and \r\n line endings?).

Browsers submit text area fields with \r\n endings in Rails and Ruby uses \n endings by default so there's a going to be a lot of effort to ensure we don't get any mixed line endings in there, I would be in favour of having a way to get the old behaviour back.

@jeremy
Copy link
Collaborator

jeremy commented Mar 2, 2018

The problem I'm running into is that pack introduces LF characters, so encoding a message > 998 characters creates something that is not valid per rfc2822, even if you only had CRLF line-endings before.

@paho-outreach Could you show an example message or code snippet that demonstrates this?

The LF line endings are now encoded as =0D in quoted printable messages and I'm seeing a lot of additional empty lines in plain text email messages.

@Fjan Could you provide an example?

I've added a pull request which restores \n normalization at #1210. Please give it a shot as well.

@vihai
Copy link

vihai commented Mar 5, 2018

This is what is affecting me, it seems to be related to this issue:

> puts ::Mail.new(body: "€\n1234567890123456789012345678901234567890123456789012345678901234567890123\n", charset: 'UTF-8').encoded
Date: Tue, 06 Mar 2018 00:10:47 +0100
Message-ID: <5a9dce77d36b8_17a2ad9b5be30d4315fa@linobis.mail>
Mime-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8
Content-Transfer-Encoding: quoted-printable

=E2=82=AC
1234567890123456789012345678901234567890123456789012345678901234567890123=

> puts ::Mail.new(::Mail.new(body: "€\n123456789012345678901234567890123456789012345678901234567890123456789012\n", charset: 'UTF-8').encoded).encoded
Date: Tue, 06 Mar 2018 00:12:15 +0100
Message-ID: <5a9dcecfb601c_17a2ad9b5be30d432019@linobis.mail>
Mime-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8
Content-Transfer-Encoding: quoted-printable

=E2=82=AC=0D
123456789012345678901234567890123456789012345678901234567890123456789012=0D=

> puts ::Mail.new(::Mail.new(::Mail.new(body: "€\n123456789012345678901234567890123456789012345678901234567890123456789012\n", charset: 'UTF-8').encoded).encoded)
Date: Tue, 06 Mar 2018 00:12:37 +0100
Message-ID: <5a9dcee594ee_17a2ad9b5be30d4321a0@linobis.mail>
Mime-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8
Content-Transfer-Encoding: quoted-printable

=E2=82=AC=0D
123456789012345678901234567890123456789012345678901234567890123456789012=0D=
=0D

After parsing and encoding this specific message two times it ends up with two \r\r that the browser inteprets as two lines.

Also DKIM and S/MIME signatures get invalidated as the message is effectively changed.

@Fjan
Copy link

Fjan commented Mar 6, 2018

Hi Jeremy,
This is an excerpt of a plain text email generated by mail 2.6.6:

Details of the new appointment:
When           	: Fri 9/3/2018 15:00 to 16:00
Status         	: Approved (unlimited credit) 
ID             	: 37211227

This is an excerpt of an email generated by mail 2.7.0:

When           	: Fri 9/3/2018 16:00 to 17:00 =0D
Status         	: Approved (unlimited credit)  =0D
ID             	: 37211228=0D

We can remove the extra \r from the source code, however, some of the fields are provided by the user and browsers submit text area fields with \r\n line endings so it would require filtering all text area fields in every rails app. Note that it is not a problem in HTML messages as those are not white space sensitive. It looks like your fix should do the job, thnx.

@paho-outreach
Copy link
Author

Here's a simple example:

require 'mail'

mail = Mail.new do
  # need > 998 characters to trigger line-wrapping
  body 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
end

p mail.encoded

Outputs:

"Date: Tue, 06 Mar 2018 08:09:03 -0800\r\nMessage-ID: <5a9ebd1feab47_135e2b181ec490d025463@worktop.mail>\r\nMime-Version: 1.0\r\nContent-Type: text/plain;\r\n charset=UTF-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\naaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa=\n"

Pre 2.7.0, the line-wrapping characters would be =\r\n.

@uhrohraggy
Copy link

Hi, I had opened this within Rails at rails/rails#32836 but was helped to realize this was a separate gem. I used the default mail version 2.7 that comes with rails new, but am still finding this is occurring after upgrading to 2.7.1.rc1. The sample forkable app is at https://github.com/uhrohraggy/email-bug - that app still needs either action_mailer config or a SendGrid account, but in case it's helpful.

@uhrohraggy
Copy link

FYI I opened bug in Ruby https://bugs.ruby-lang.org/issues/14741 - correct me if I'm wrong, but [].pack('M') should return CRLF based on the spec, not just LF

@nicosticht
Copy link

Updated by shyouhei (Shyouhei Urabe) 9 months ago

See also issue #14352. We have decided to reject this change. The behaviour is now documented.

To those who don't read Japanese let me summarize: Ruby's pack behabiour was from Perl's and Perl also don't emit CRLF. This is supposedly because in UNIX variants, emails are written in LF only and, to convert them to CRLF is done in sendmail(1). We don't want to break things here.

So what's the best solution for mail gem?

@Fjan
Copy link

Fjan commented Jan 25, 2019

@nicosticht I think we have to wait for #1210 to get merged, but until then downgrading the mail gem to version 2.6.6 works for me

schleuderrr pushed a commit to schleuder/schleuder that referenced this issue Feb 8, 2019
It is an unresolved issue with 'mail'
<mikel/mail#1190> that it started to contain
carriage returns. It's a little unclear why they don't fix it, but it's
a minor problem which we can work around.
schleuderrr pushed a commit to schleuder/schleuder that referenced this issue Feb 10, 2019
It is an unresolved issue with 'mail'
<mikel/mail#1190> that it started to contain
carriage returns. It's a little unclear why they don't fix it, but it's
a minor problem which we can work around.
kfrz added a commit to department-of-veterans-affairs/vets-api that referenced this issue Feb 27, 2019
* Mailer specs were failing, due to \r being added to lines.
  See: mikel/mail#1190
kfrz added a commit to department-of-veterans-affairs/vets-api that referenced this issue Apr 9, 2019
* Update rails to 5.0.7.2

* Update locale modules/**/*.gemspec to point to rails 5.0.7.1

* Update modules meta information to point to va.gov

* Update activerecord-postgis-adapter version to >= 4.0.x

* Update nokogiri version to >= 1.10.1

* Add rails 5.0 initializers

* Disable config.action_controller.raise_on_unfiltered_parameters setting

* Use relative requires in rails 5 way

* Add generated binstubs for update and setup

* Disable raise on missing callback to fix specs not running

* Update db/schema.rb with rails 5 style

* Add ApplicationRecord model

* Update models to inherit from ApplicationRecord

* fix in progress form cast

* update database type cast for clean uuid

* dont send strings for hash attributes

* add deep transform parameters method in app controller

* fix supporting evidence upload swagger spec

* Add utf8-cleaner gem

* Rails 5.0.0 introduced code in ActionDispatch to catch invalid
  UTF-8 querystring params, and raise a ActionController::BadRequest
  error when present. For more information, see the commit at:
  rails/rails@59ab2d1#diff-a2247cc200a3f9cd3d0e5ef8d76c4df8

* utf8-cleaner gem introduces a middleware layer to rack requests
  that will sanitize rack input, avoiding UTF8 errors altogether

* Remove #safe_encoded_params from GI::InstitutionsController

* Update specs to expect a successful request if sent invalid UTF8
string, as it should be sanitized by rack

* Update module callbacks to declare raise: false

* Render header only responses

* Refactor UsersController test to use consistent matcher

* Assign id param in V0::Facilities::CcpController#validate_id instead of mutating it

* Pin mail gem to 2.6.6 to avoid generating LF characters

* Mailer specs were failing, due to \r being added to lines.
  See: mikel/mail#1190

* Remove unneeded assets initializer

* Uptick rubocop target rails version

* DRY up appeals_api request spec

* move deep transform parameters out of app controller (#2842)

* Update migrations to specify from which Rails version they were spawned

* turn on belongs_to_required_by_default

* fix async stale scope spec

* fix rails 5.0 deprecation warnings  (#2856)

* move deep transform parameters out of app controller

* remove raise in transactional callbacks

* dont use strings for middleware

* permit all on MessagingPreferencesController

* dont need to return false if error is added

* permit params in pciu controller

* permit in health records ctrl

* permit params for military ranks ctrl

* fix async base spec

* use `#to_hash` in prescriptions finder

* permit in prescrip pref ctrl

* permit pciu address fields

* dont permit all in health records ctrl

* dont permit all in messaging pref ctrl

* send array to health records ctrl

* dont permit all in filterable

* dont raise on deprecation

* dont halt callbacks on return false

* turn on raise_on_unfiltered_parameters

* remove raise_on_unfiltered_parameters config as it's causing console errors and will have not effect once we're >= rails 5.1 (#2926)

* Match SessionsController spec to master

* Filter HashHelpers coverage
schleuderrr pushed a commit to schleuder/schleuder that referenced this issue Apr 15, 2019
It is an unresolved issue with 'mail'
<mikel/mail#1190> that it started to contain
carriage returns. It's a little unclear why they don't fix it, but it's
a minor problem which we can work around.
@Fjan
Copy link

Fjan commented Jul 27, 2019

As far as I can tell the encoded LF produced by the Quoted Printable output of [str].pack("M") does not comply with RFC2045 which defines quoted printable attachments:

A line break in a text body, represented
as a CRLF sequence in the text canonical form, must be
represented by a (RFC 822) line break, which is also a
CRLF sequence, in the Quoted-Printable encoding.

and

Any octet, except a CR or LF that is part of a CRLF line break of the canonical
(standard) form of the data being encoded, may be represented by an "=" followed by a two digit hexadecimal representation

The original implementation did this correctly, so I would suggest either reverting or doing [str].pack("M").gsub('=0D',"\r") to be compliant with the standard.

@dlouzan
Copy link

dlouzan commented Jan 19, 2020

@jeremy We have been confronted with exactly this issue when implementing S/MIME signatures for Gitlab. When Unicode characters and quoted-printable is involved, there's a mismatch between the encoded values used for generating the PKCS signature and what is actually generated in the email, that makes the signatures invalid. All of this works when using just ASCII and 7bit.

Our basic approach is an interceptor that does something like below (the incoming message is multipart with text and html parts, file samples available in a link at the end):

def delivering_email(message)
  signed_message = Gitlab::Email::Smime::Signer.sign(
    cert: certificate.cert,
    key: certificate.key,
    data: message.encoded)

  signed_email = Mail.new(signed_message)

  overwrite_body(message, signed_email)
  overwrite_headers(message, signed_email)
end

def overwrite_body(message, signed_email)
  # since this is a multipart email, assignment to nil is important,
  # otherwise Message#body will add a new mail part
  message.body = nil
  message.body = signed_email.body.encoded
end

def overwrite_headers(message, signed_email)
  message.content_disposition = signed_email.content_disposition if signed_email.content_disposition
  message.content_transfer_encoding = signed_email.content_transfer_encoding
  message.content_type = signed_email.content_type
end

I've traced down this to exactly the extra =0D entries incorrectly converted in the decode/encode steps. I was actually able to edit some of our emails and remove those values, and the signatures become valid. But the only solution that is working consistently is downgrading to 2.6.6, which we'd like to avoid.

I also tried some of the patches listed in this ticket, #1325, and your PR #1210. It improves things but does not solve all cases; some of those patches solve the text/html part but I still have issues in the text/plain part. What looks like to be the main problem is that the charset included in the Content-Type seems to be ignored in the crlf conversions by Mail::Encodings::QuotedPrintable class.

For the record, I've checked older email notifications from gitlab.com, and this issue with =0D extra characters has been around for a long time, I think nobody noticed until now because no signatures where involved. It might also be my misunderstanding of the RFC ofc.

In the corresponding Gitlab issue there's also sample message files pre- and post 2.7: https://gitlab.com/gitlab-org/gitlab/issues/197386#note_273117603

Would be really grateful if you could give us your assessment.

Thanks!

@dlouzan
Copy link

dlouzan commented Jan 27, 2020

A sample of the failing behaviour:

require 'mail'

email = Mail.new do
  to      'nicolas@test.lindsaar.net.au'
  from    'Mikel Lindsaar <mikel@test.lindsaar.net.au>'
  subject 'This will break encoded'
  content_type 'text/plain; charset=UTF-8'
  content_transfer_encoding 'quoted-printable'
  body "0123456789012345678901234567890123456789012345678901234567890123456789\n"
end

email.body.raw_source
=> "0123456789012345678901234567890123456789012345678901234567890123456789\r\n"

email.body.encoded
=> "0123456789012345678901234567890123456789012345678901234567890123456789=0D=\n\n"

Mail.new(email).body.encoded
=> "0123456789012345678901234567890123456789012345678901234567890123456789=0D=\n=0D\n"

This would be encoded to 71 chars, though with quoted-printable that becomes 70 + 5:

  • 70 chars for the number chars
  • the \n at the end is translated to \r\n in raw form, then to =0D\n= because of the conversion of \r to encoded =0D, plus the = for the soft wrap
  • then the extra \n at the newline soft-wrapped

The endlines for each that should be \r\n are all over broken after that. When afterwards creating a new Mail object with the input of the original object, the same broken conversion is applied a second time. This is what breaks our signatures.

@mbedarff
Copy link

mbedarff commented Sep 7, 2020

I ran into the same issue sending mails with rails and the mail gem in version 2.7.1. Mails containing special chars are sent as quoted printable and all \n new lines are outputted as:
...=0D\r\n

This additional =0D leeds to ugly extra new line in Outlook.

By debugging I found out that the raw souce of the body changed between versions 2.7.0 and 2.7.1:
In version 2.7.0 the new lines stay \n, in version 2.7.1 the new lines are converted to \r\n before.

This happens because of calling ::Mail::Utilities.to_crlf on the input string in the initialize function of the Mail::Body class. This was changed with this commit: Restore LF line ending parsing

May be this conversion needs to be undone in the Mail::Encondings::QuotedPrintable.encode function by calling ::Mail::Utilities.to_crlf on str before using pack("M") to encode the raw source.

@Fjan
Copy link

Fjan commented Sep 7, 2020

This issue is fairly simple: this gem switched to using pack("M") for encoding, but unfortunately there is a bug in Ruby in that means it's not compliant with the RFC with regard to encoding new lines. The bug is marked "won't fix" in ruby because it would be too disruptive to change it after all these years, which I can understand. The only way forward is to revert the change in the gem to work around the new line encoding so it's compliant with the RFC again.
This monkey patch does that, you can simply drop it in your initialisers directory to get the old behaviour back.

module Mail
  module Encodings
    class QuotedPrintable < SevenBit
      def self.encode(str)
        ::Mail::Utilities.to_crlf([::Mail::Utilities.to_lf(str)].pack("M"))
      end
    end
  end
end

@mbedarff
Copy link

mbedarff commented Sep 7, 2020

@Fjan Thank you for sharing your monkey patch and explanations.

I converted it to an module that can be prepended to QuotedPrintable and reuses the super method of the original encode method:

module Mail
  module Encodings
    module EncodeQuotedPrintableWithoutCR
      def encode(str)
        ::Mail::Utilities.to_crlf(super(::Mail::Utilities.to_lf(str)))
      end
    end
  end
end

Mail::Encodings::QuotedPrintable.singleton_class.prepend(Mail::Encodings::EncodeQuotedPrintableWithoutCR)

This module is applied as an core extension during initialization of my app.

@jeremy
Copy link
Collaborator

jeremy commented Mar 10, 2021

Fixed by #1210.

I also tried some of the patches listed in this ticket, #1325, and your PR #1210. It improves things but does not solve all cases; some of those patches solve the text/html part but I still have issues in the text/plain part. What looks like to be the main problem is that the charset included in the Content-Type seems to be ignored in the crlf conversions by Mail::Encodings::QuotedPrintable class.

@dlouzan Could you share the cases that #1210 does not solve for you? Perhaps as separate new issues.

@jeremy jeremy closed this as completed Mar 10, 2021
@Nitrodist
Copy link

@jeremy can we see a release of 2.7.2, please? We are looking at using the monkey patch above in the mean time.

@elalemanyo
Copy link

Hi,
I am using the monkey patch and ugly extra new line in Outlook are gone 🙏.
But I have noticed some encoding issues, for example ü becomes =C3=BC. Is this normal? Any way to fix this?
Thanks

@Fjan
Copy link

Fjan commented Mar 3, 2022

@elalemanyo That's normal for 7bit encoding, it can't be changed without breaking decoding at the recipient side

@elalemanyo
Copy link

elalemanyo commented Mar 3, 2022

@elalemanyo That's normal for 7bit encoding, it can't be changed without breaking decoding at the recipient side

@Fjan ok, but was not like this without the monkey patch, and now I have some failing tests 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants