vmime: prevent loss of a space during text::createFromString #306
produces
The first space between Test and München is encoded as an underscore along with the first word: Test_. The second space between München and West is encoded with neither of the two words and thus lost. Decoding the text results in "Test MünchenWest" instead of "Test München West".

This is caused by how vmime::text::createFromString() handles transitions between 7-bit and 8-bit words: if an 8-bit word follows a 7-bit word, a space is appended to the previous word. The opposite case of a 7-bit word following an 8-bit word misses this behaviour.

When one fixes this problem, a follow-up issue appears:
text::createFromString("a b\xFFc d") tokenizes the input into m_words={word("a "), word("b\xFFc ", utf8), word("d")}. This "right-side alignment" of the whitespace is a problem for word::generate():

As per RFC 2047, spaces between adjacent encoded words are just separators and are not meant to be displayed. A space between an encoded word and regular ASCII text is not just a separator but is also meant to be displayed.

When word::generate() outputs the b-word, it would have to strip one space, but only when there is a transition from an encoded word to an unencoded word. word::generate() does not know whether d will be encoded or unencoded.
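
To make the RFC 2047 whitespace rule concrete, here is a minimal standalone sketch of a decoder that applies it. It is not vmime code: decodeQ, tryDecodeWord and decodeHeader are hypothetical names, only the Q encoding is handled, only plain spaces count as whitespace, and no charset conversion is done (the decoded bytes are returned as-is).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Decode a Q-encoded payload: '_' is a space, "=XX" is one hex-encoded byte.
static std::string decodeQ(const std::string &in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '_') {
            out += ' ';
        } else if (in[i] == '=' && i + 2 < in.size() &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}

// Returns true and fills `decoded` if `token` has the shape =?charset?Q?payload?=
static bool tryDecodeWord(const std::string &token, std::string &decoded)
{
    if (token.size() < 8 || token.compare(0, 2, "=?") != 0 ||
        token.compare(token.size() - 2, 2, "?=") != 0)
        return false;
    const std::size_t p1 = token.find('?', 2);          // end of charset
    if (p1 == std::string::npos) return false;
    const std::size_t p2 = token.find('?', p1 + 1);     // end of encoding
    if (p2 == std::string::npos || p2 + 1 > token.size() - 2) return false;
    const std::string enc = token.substr(p1 + 1, p2 - p1 - 1);
    if (enc != "Q" && enc != "q") return false;         // sketch: Q encoding only
    decoded = decodeQ(token.substr(p2 + 1, token.size() - 2 - (p2 + 1)));
    return true;
}

// Decode a header value. Whitespace between two adjacent encoded words is
// dropped; whitespace next to unencoded text is kept and displayed.
std::string decodeHeader(const std::string &in)
{
    std::string out;
    bool prevEncoded = false;
    std::size_t i = 0;
    while (i < in.size()) {
        std::size_t j = i;
        while (j < in.size() && in[j] == ' ') ++j;      // run of spaces
        const std::string gap = in.substr(i, j - i);
        std::size_t k = j;
        while (k < in.size() && in[k] != ' ') ++k;      // next token
        const std::string token = in.substr(j, k - j);
        std::string decoded;
        const bool encoded = !token.empty() && tryDecodeWord(token, decoded);
        if (!(encoded && prevEncoded)) out += gap;      // the RFC 2047 rule
        out += encoded ? decoded : token;
        prevEncoded = encoded;
        i = k;
    }
    return out;
}
```

With this sketch, two adjacent encoded words such as "=?UTF-8?Q?a?= =?UTF-8?Q?b?=" decode to "ab", while "=?UTF-8?Q?a?= b" decodes to "a b". This asymmetry is why a generator that emits trailing spaces needs to know whether the next word will be encoded.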
The idea now is that we could change the tokenization of text::createFromString such that whitespace is at the start of words rather than at the end. With that, word::generate() need not know anything about the next word, but rather only the previous one.

Thus, in this patch, text::createFromString is changed to left-align spaces and the function is fixed to account for the missing space on transition. word::generate learns how to steal a space character.

Fixes: #283, #284
Cc @RichardSteele
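
For illustration, the left-aligned tokenization described in this PR can be sketched outside of vmime as follows. This is a simplified model under stated assumptions, not the code in the patch: Token, tokenizeLeftAligned, encodeQ and generate are hypothetical names, input is split on plain spaces only, 8-bit tokens are always emitted as UTF-8 Q-encoded words, and line folding is ignored. The point it shows is that generate() decides what to do with a leading space by looking only at the previous token.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Token {
    std::string text;  // may start with the space that separated it from the previous token
    bool is8bit;       // true if the token contains a byte >= 0x80
};

static bool has8bit(const std::string &s)
{
    for (unsigned char c : s)
        if (c >= 0x80) return true;
    return false;
}

// Split on spaces; each separating space stays at the START of the next token.
std::vector<Token> tokenizeLeftAligned(const std::string &in)
{
    std::vector<Token> out;
    std::size_t start = 0;
    while (start < in.size()) {
        std::size_t end = in.find(' ', start + 1);
        if (end == std::string::npos) end = in.size();
        const std::string piece = in.substr(start, end - start);
        out.push_back({piece, has8bit(piece)});
        start = end;
    }
    return out;
}

// Q-encode: space becomes '_', ASCII alphanumerics stay, everything else is =XX.
static std::string encodeQ(const std::string &in)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (c == ' ') {
            out += '_';
        } else if (c < 0x80 && std::isalnum(c)) {
            out += static_cast<char>(c);
        } else {
            out += '=';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

std::string generate(const std::vector<Token> &tokens)
{
    std::string out;
    bool prevEncoded = false;
    for (const Token &t : tokens) {
        if (!t.is8bit) {
            out += t.text;  // plain text: the leading space is displayed as-is
        } else {
            std::string body = t.text;
            if (!body.empty() && body[0] == ' ') {
                out += ' ';  // separator on the wire
                // After plain text the separator is displayed, so the space
                // must not be encoded again. After an encoded word the
                // separator is ignored, so the space stays inside the word.
                if (!prevEncoded) body.erase(0, 1);
            }
            out += "=?UTF-8?Q?" + encodeQ(body) + "?=";
        }
        prevEncoded = t.is8bit;
    }
    return out;
}
```

In this model, the input "Test München West" (UTF-8) becomes the tokens "Test", " München" and " West", and generate() has enough information at each step without looking ahead.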