Fixes #283 #284

Closed

RichardSteele
Contributor

Prevents the loss of a space when the previous word is 8-bit but the current one is not. This mirrors the existing handling of the opposite case, where the current word is 8-bit but the previous one is not.

@vincent-richard
Member

Hello!

Unfortunately, the patch you proposed introduces a regression and breaks the "parser_textTest" unit test in three places:

  1. test: textTest::testNewFromString (F) line: 182
  2. test: textTest::testBugFix20110511 (F) line: 593
  3. test: textTest::testInternationalizedEmail_whitespace (F) line: 711

@RichardSteele
Contributor Author

Right, I forgot about the tests.

I think this bug can't be resolved without breaking other parts. Per RFC 2047, spaces between adjacent encoded words are only separators and are not meant to be displayed. A space between an encoded word and regular ASCII text, however, is not just a separator: it is meant to be displayed.

The problem is that vmime::text::createFromString() doesn't know whether the text will be force-encoded later on, which creates encoded words even for ASCII text, as in a mailbox field. Likewise, when a text is stringified, it's unclear whether it was created manually or by createFromString(). Handling the bug at that point could end up adding redundant spaces.

As RFC 2047 says:

Use of 'encoded-word's to represent strings of purely ASCII characters is allowed, but discouraged.

In conclusion, it might be better to put a warning in the comments of vmime::text.

vincent-richard added a commit that referenced this pull request May 21, 2024
```
mailbox(text("Test München West", charsets::UTF_8), "a@b.de").generate();
```

produces

```
=?us-ascii?Q?Test_?= =?utf-8?Q?M=C3=BCnchen?= =?us-ascii?Q?West?= <a@b.de>
```

The first space between ``Test`` and ``München`` is encoded as an
underscore along with the first word: ``Test_``. The second space
between ``München`` and ``West`` is encoded with neither of the two
words and thus lost. Decoding the text results in ``Test
MünchenWest`` instead of ``Test München West``.
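
The decoding result can be reproduced with a small stand-alone sketch
of the RFC 2047 rule that whitespace between two adjacent encoded
words is dropped. It does not use VMime: ``decodeQSketch``,
``decodeEncodedWordSketch`` and ``decodeHeaderSketch`` are
hypothetical helper names, only Q-encoding is handled, input is
assumed to be well-formed, and charset conversion is skipped (the
decoded bytes are returned as-is):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: decode a Q-encoded payload.
// '_' stands for a space, "=XX" for the byte with hex value XX.
std::string decodeQSketch(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '_') {
            out += ' ';
        } else if (in[i] == '=' && i + 2 < in.size()) {
            // Assumes two valid hex digits follow (no error handling).
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}

// Hypothetical helper: if tok looks like "=?charset?Q?payload?=",
// store the decoded payload in out and return true.
bool decodeEncodedWordSketch(const std::string& tok, std::string& out) {
    if (tok.size() < 8 || tok.compare(0, 2, "=?") != 0 ||
        tok.compare(tok.size() - 2, 2, "?=") != 0) return false;
    const std::size_t q1 = tok.find('?', 2);  // end of the charset name
    if (q1 == std::string::npos || q1 + 2 >= tok.size() - 2) return false;
    if (std::toupper(static_cast<unsigned char>(tok[q1 + 1])) != 'Q' ||
        tok[q1 + 2] != '?') return false;
    out = decodeQSketch(tok.substr(q1 + 3, tok.size() - 2 - (q1 + 3)));
    return true;
}

// Decode a header value token by token. Whitespace is kept unless it
// sits between two adjacent encoded words.
std::string decodeHeaderSketch(const std::string& in) {
    std::string out;
    bool prevEncoded = false;
    std::size_t i = 0;
    while (i < in.size()) {
        std::size_t j = i;  // [i, j) is the whitespace before the token
        while (j < in.size() && std::isspace(static_cast<unsigned char>(in[j]))) ++j;
        std::size_t k = j;  // [j, k) is the token itself
        while (k < in.size() && !std::isspace(static_cast<unsigned char>(in[k]))) ++k;
        std::string decoded;
        const bool enc = decodeEncodedWordSketch(in.substr(j, k - j), decoded);
        if (!(enc && prevEncoded)) out += in.substr(i, j - i);
        out += enc ? decoded : in.substr(j, k - j);
        prevEncoded = enc;
        i = k;
    }
    return out;
}

// Usage: the display name from the example above loses its second space.
void demoDecodeSketch() {
    assert(decodeHeaderSketch(
        "=?us-ascii?Q?Test_?= =?utf-8?Q?M=C3=BCnchen?= =?us-ascii?Q?West?=")
        == "Test M\xC3\xBCnchenWest");
}
```

In this sketch, the space before an unencoded ``West`` would survive,
which is why the loss only shows up when all words are encoded.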

This is caused by how ``vmime::text::createFromString()`` handles
transitions between 7-bit and 8-bit words: if an 8-bit word follows a
7-bit word, a space is appended to the previous word. The opposite
case, a 7-bit word following an 8-bit word, lacks this handling.

When one fixes this problem, a follow-up issue appears:

``text::createFromString("a b\xFFc d")`` tokenizes the input into
``m_words={word("a "), word("b\xFFc ", utf8), word("d")}``. Attaching
each space to the end of the preceding word ("right-side alignment")
is a problem for ``word::generate()``:

As per RFC 2047, spaces between adjacent encoded words are just
separators but not meant to be displayed. A space between an encoded
word and a regular ASCII text is not just a separator but also meant
to be displayed.

When ``word::generate()`` outputs the ``b`` word, it would have to
strip one space, but only on a transition from an encoded word to an
unencoded word. ``word::generate()`` does not know whether ``d`` will
be encoded or unencoded.

The idea now is that we could change the tokenization of
``text::createFromString`` such that whitespace is at the *start* of
words rather than at the end. With that, word::generate() need not
know anything about the next word, but rather only the *previous*
one.
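
The difference between the two tokenizations can be illustrated with
a simplified, stand-alone sketch. It is not VMime's implementation:
``Token`` and ``tokenizeSketch`` are hypothetical names, the input is
assumed to use single spaces as separators, and consecutive chunks of
the same class (7-bit or 8-bit) are merged into one word:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a word: its text and whether it has 8-bit bytes.
struct Token {
    std::string text;
    bool eightBit;
};

bool hasEightBitSketch(const std::string& s) {
    for (unsigned char c : s)
        if (c & 0x80) return true;
    return false;
}

// Split on single spaces and merge neighbours of the same class.
// On a 7-bit/8-bit transition the separating space goes either to the
// end of the previous word (leadingSpace == false, "right-aligned")
// or to the start of the next word (leadingSpace == true, "left-aligned").
std::vector<Token> tokenizeSketch(const std::string& in, bool leadingSpace) {
    std::vector<Token> words;
    std::size_t pos = 0;
    while (pos <= in.size()) {
        std::size_t end = in.find(' ', pos);
        if (end == std::string::npos) end = in.size();
        const std::string chunk = in.substr(pos, end - pos);
        const bool eb = hasEightBitSketch(chunk);
        if (words.empty()) {
            words.push_back({chunk, eb});
        } else if (words.back().eightBit == eb) {
            words.back().text += ' ' + chunk;      // same class: extend the word
        } else if (leadingSpace) {
            words.push_back({' ' + chunk, eb});    // space starts the new word
        } else {
            words.back().text += ' ';              // space ends the previous word
            words.push_back({chunk, eb});
        }
        pos = end + 1;
    }
    return words;
}

// Usage. The literal is split so that 'c' is not read as part of \xFF.
void demoTokenizeSketch() {
    const std::string in = "a b\xFF" "c d";
    const std::vector<Token> right = tokenizeSketch(in, false);
    assert(right[0].text == "a " && right[1].text == "b\xFF" "c " && right[2].text == "d");
    const std::vector<Token> left = tokenizeSketch(in, true);
    assert(left[0].text == "a" && left[1].text == " b\xFF" "c" && left[2].text == " d");
}
```

With leading spaces, every word after the first carries the space that
separates it from its predecessor, so the decision about that space can
be made when the word itself is generated.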

Thus, in this patch,

1. The tokenization of ``text::createFromString`` is changed to
   left-align spaces and the function is fixed to account for
   the missing space on transition.
2. ``word::generate`` learns how to steal a space character.
3. Testcases are adjusted to account for the shifted
   position of the space.
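
One possible reading of "stealing a space" is sketched below. It is an
assumption about the idea, not the code of this patch: ``Piece``,
``encodeQSketch`` and ``generateSketch`` are hypothetical names, line
folding is ignored, and the input is assumed to be left-aligned (every
word after the first starts with its separating space). An encoded
word that follows an unencoded word moves its leading space out of the
payload and emits it as the literal separator; after another encoded
word the space stays in the payload as ``_``:

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a word with left-aligned whitespace.
struct Piece {
    std::string text;
    std::string charset;
    bool eightBit;
};

// Minimal Q-encoding: space -> '_', ASCII alphanumerics as-is, rest "=XX".
std::string encodeQSketch(const std::string& text, const std::string& charset) {
    std::string out = "=?" + charset + "?Q?";
    for (unsigned char c : text) {
        if (c == ' ') {
            out += '_';
        } else if (c < 0x80 && std::isalnum(c)) {
            out += static_cast<char>(c);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof buf, "=%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out + "?=";
}

// Generate all words, looking only at how the PREVIOUS word was emitted.
std::string generateSketch(const std::vector<Piece>& words, bool forceEncoding) {
    std::string out;
    bool prevEncoded = false;
    for (std::size_t i = 0; i < words.size(); ++i) {
        const bool encode = forceEncoding || words[i].eightBit;
        if (!encode) {
            out += words[i].text;  // leading space is literal, hence displayed
        } else {
            std::string text = words[i].text;
            if (i > 0) {
                // After an unencoded word the separator is displayed, so
                // the leading space is "stolen" from the payload.
                if (!prevEncoded && !text.empty() && text[0] == ' ')
                    text.erase(0, 1);
                out += ' ';        // separator before the encoded word
            }
            out += encodeQSketch(text, words[i].charset);
        }
        prevEncoded = encode;
    }
    return out;
}

// Usage with the left-aligned words of "Test München West".
void demoGenerateSketch() {
    const std::vector<Piece> words = {
        {"Test", "us-ascii", false},
        {" M\xC3\xBCnchen", "utf-8", true},
        {" West", "us-ascii", false}};
    assert(generateSketch(words, true) ==
           "=?us-ascii?Q?Test?= =?utf-8?Q?_M=C3=BCnchen?= =?us-ascii?Q?_West?=");
    assert(generateSketch(words, false) == "Test =?utf-8?Q?M=C3=BCnchen?= West");
}
```

Both outputs of the sketch decode back to ``Test München West`` under
the RFC 2047 whitespace rule, and neither decision needs to look at
the following word.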

Fixes: #283, #284

Co-authored-by: Vincent Richard <vincent@vincent-richard.net>