7bit encoded email with line break #151

htulipe · 2022-11-22T09:26:07Z

Hello. Maybe a dumb question, but I can't see how the lib can successfully parse a 7bit encoded email that contains line breaks. By successfully I mean without losing line breaks.

Version

mail 0.2.3
Erlang/OTP 24 [erts-12.3.2.6] [source] [64-bit] [smp:5:5] [ds:5:5:10] [async-threads:1]
Elixir 1.14.0 (compiled with Erlang/OTP 24)

Test Case

Using the parse_email function defined in parser test:

parse_email("""
    To: user@example.com
    From: me@example.com
    Subject: Test Email
    Content-Transfer-Encoding: 7bit

    This is the body!
    It has more than one line
    """)

Steps to reproduce

Run the above code

Expected Behavior

The returned body should have some sort of line breaks:
This is the body!\nIt has more than one line

Actual Behavior

The returned body no longer has line breaks:
This is the body!It has more than one line

Reading the code, I see that the lib joins the body lines using \r\n, but the SevenBit parser called just after drops them. Am I missing something?

Joining the body lines with \n instead of \r\n seems to fix the issue.
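
The interaction described above can be reproduced with the standard library alone. This is a minimal sketch, assuming (as stated above, and not checked against the library source) that the body lines are joined with \r\n and that the decoder then strips every \r\n. The module and function names are hypothetical and are not part of the mail library.

```elixir
defmodule LineBreakSketch do
  # Hypothetical stand-in for the behaviour described in this issue:
  # join the body lines with a separator, then strip every CRLF.
  def join_then_strip(lines, separator) do
    lines
    |> Enum.join(separator)
    |> String.replace("\r\n", "")
  end
end

lines = ["This is the body!", "It has more than one line"]

# Joined with CRLF: the separator is stripped again, so the break is lost.
IO.inspect(LineBreakSketch.join_then_strip(lines, "\r\n"))

# Joined with a bare LF: nothing matches "\r\n", so the break survives.
IO.inspect(LineBreakSketch.join_then_strip(lines, "\n"))
```

Under that assumption, the first call yields the body without a break and the second keeps it, which matches the observation that joining with \n avoids the problem.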

Thanks in advance

PS: I saw the previous issue on this topic but could not find an answer there, so I took the liberty of opening a new one.

bcardarella · 2022-11-22T13:34:13Z

Is @andrewtimberlake's answer in #138 not sufficient?

htulipe · 2022-11-22T16:49:43Z

I agree with Andrew's reading of the RFC, but the direct conclusion is that we can't send multi-line emails with this encoding. That can't be right; I must be missing something.

May I add that the Python email module parses the same email without losing line breaks.

bcardarella · 2022-11-22T18:31:20Z

@htulipe so your issue is not with the parsing but the compilation from the data structure into an email?

htulipe · 2022-11-23T08:04:51Z

My goal is to read an EML file and transform it into some data structure that my frontend can then display.

SergeyMosin · 2024-08-22T02:44:23Z

Is @andrewtimberlake's answer in #138 not sufficient? ( #138 (comment) )

First of all, thank you for your work on this module. However, I have the following question...

What should be the expected parsed message body for the following code according to RFC 2045 §2.7?

IO.inspect(
  Mail.parse([
    "From: a@b.tld",
    "To: c@d.tld",
    "Subject: test",
    "Content-Transfer-Encoding: 7bit", # or 8bit
    "",
    "line1",
    "line2"
  ])
)

Option A: line1\r\nline2

✔️ Data that is all represented as relatively short lines with 998 octets or less between CRLF line separation sequences.
✔️ No octets with decimal values greater than 127 are allowed and neither are NULs (octets with decimal value 0).
✔️ CR (decimal value 13) and LF (decimal value 10) octets only occur as part of CRLF line separation sequences.

Option B: line1line2

❌ Data that is all represented as relatively short lines with 998 octets or less ~~between CRLF line separation sequences~~.
✔️ No octets with decimal values greater than 127 are allowed and neither are NULs (octets with decimal value 0).
❌ ~~CR (decimal value 13) and LF (decimal value 10) octets only occur as part of CRLF line separation sequences~~.

I personally lean towards Option A, but the Mail.parse function currently outputs Option B, which seems to diverge from the RFC in points 1 and 3 because the "CRLF line separation sequence" is missing.
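
The three quoted conditions can be written as a small check, which also makes the difference between the two options concrete. This is a rough sketch only: SevenBitSketch, valid?/1 and lines/1 are hypothetical names and not part of the mail library, and the 998 octet limit is taken from the RFC text quoted above.

```elixir
defmodule SevenBitSketch do
  # Hypothetical helper, not part of the mail library.
  # Returns true when `data` satisfies the three quoted conditions.
  def valid?(data) when is_binary(data) do
    data
    |> lines()
    |> Enum.all?(fn line -> byte_size(line) <= 998 and clean_line?(line) end)
  end

  # Splits the data on CRLF line separation sequences.
  def lines(data) when is_binary(data) do
    :binary.split(data, "\r\n", [:global])
  end

  # After splitting on CRLF, any remaining CR or LF is a stray one.
  # NUL and octets above 127 are rejected as well.
  defp clean_line?(line) do
    line
    |> :binary.bin_to_list()
    |> Enum.all?(fn octet -> octet in 1..127 and octet not in [?\r, ?\n] end)
  end
end

IO.inspect(SevenBitSketch.valid?("line1\r\nline2"))  # CRLF separated lines
IO.inspect(SevenBitSketch.valid?("line1\nline2"))    # bare LF is rejected
IO.inspect(SevenBitSketch.lines("line1\r\nline2"))   # Option A keeps two lines
IO.inspect(SevenBitSketch.lines("line1line2"))       # Option B collapses to one
```

Note that "line1line2" on its own would pass valid?/1 as a single short line. The problem with Option B is that the two input lines can no longer be recovered from it, which is what lines/1 shows.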

andrewtimberlake · 2024-08-23T04:26:02Z

I subsequently found out that 7bit decoding was removing line breaks indiscriminately, when it should only remove those used to wrap lines exceeding the maximum length of 1000 chars.
I have merged a fix in #164.

SergeyMosin · 2024-08-23T13:36:45Z

Thank you for the quick fix. I think the same problem affects the 8bit encoding as well.
Example:

IO.inspect(
  Mail.parse([
    "From: a@b.tld",
    "To: c@d.tld",
    "Subject: test",
    "Content-Type: text/plain; charset=UTF-8",
    "Content-Transfer-Encoding: 8bit",
    "",
    "lïne1",
    "lïne2"
  ])
)

outputs this:

%Mail.Message{
  headers: %{
    "content-transfer-encoding" => "8bit",
    "content-type" => ["text/plain", {"charset", "UTF-8"}],
    "from" => "a@b.tld",
    "subject" => "test",
    "to" => ["c@d.tld"]
  },
  body: "lïne1lïne2",
  parts: [],
  multipart: false
}

There is no \r\n in the body.

andrewtimberlake · 2024-08-23T13:55:37Z

Thanks, great catch. Fixed in #166

andrewtimberlake closed this as completed Oct 9, 2024
