`vmime::text::createFromString()` drops spaces if an 8-bit word is followed by a 7-bit word #283

RichardSteele · 2023-11-08T08:50:22Z

Following example

vmime::mailbox mailbox(vmime::text("Test München West", vmime::charsets::WINDOWS_1252), "test@example.com");
vmime::utility::outputStreamAdapter adapter(std::cout);
mailbox.generate(adapter);

gives as output

=?us-ascii?Q?Test_?= =?windows-1252?Q?M=FCnchen?= =?us-ascii?Q?West?=
  <test@example.com>

The first space between Test and München is encoded as an underscore along with the first word: Test_. The second space between München and West is encoded with neither of the two words and thus lost. Decoding the text results in Test MünchenWest instead of Test München West.

This is caused by the way vmime::text::createFromString() handles transitions between 7-bit and 8-bit words:

vmime/src/vmime/text.cpp

Lines 310 to 346 in 1a35bb6

    
           if (is8bit) { 
        
           	if (count && prevIs8bit) { 
        
           		// No need to create a new encoded word, just append 
        
           		// the current word to the previous one. 
        
           		shared_ptr <word> w = getWordAt(getWordCount() - 1); 
        
           		w->getBuffer() += " " + chunk; 
        
           	} else { 
        
           		if (count) { 
        
           			shared_ptr <word> w = getWordAt(getWordCount() - 1); 
        
           			w->getBuffer() += ' '; 
        
           		} 
        
           		appendWord(make_shared <word>(chunk, ch)); 
        
           		prevIs8bit = true; 
        
           		++count; 
        
           	} 
        
           } else { 
        
           	if (count && !prevIs8bit) { 
        
           		shared_ptr <word> w = getWordAt(getWordCount() - 1); 
        
           		w->getBuffer() += " " + chunk; 
        
           	} else { 
        
           		appendWord(make_shared <word>(chunk, charset(charsets::US_ASCII))); 
        
           		prevIs8bit = false; 
        
           		++count; 
        
           	} 
        
           }

If an 8-bit word follows a 7-bit word, a space is appended to the previous word (lines 321 - 324). The opposite case of a 7-bit word following an 8-bit word misses this behaviour (lines 339- 345). Pull request #284 fixes this.

The text was updated successfully, but these errors were encountered:

jengelh · 2024-04-25T22:49:38Z

A trivial fix is to just generate a single vmime::word — which is what most MUA (e.g. Alpine, Thunderbird) do anyway.

+++ b/src/vmime/text.cpp
@@ -273,6 +273,7 @@ void text::createFromString(const string& in, const charset& ch) {
 
        removeAllWords();
 
+#if 0
        // Check whether there is a recommended encoding for this charset.
        // If so, the whole buffer will be encoded. Else, the number of
        // 7-bit (ASCII) bytes in the input will be used to determine if
@@ -288,7 +289,9 @@ void text::createFromString(const string& in, const charset& ch) {
        // If there are "too much" non-ASCII chars, encode everything
        if (alwaysEncode || asciiPercent < 60) {  // less than 60% ASCII chars
 
+#endif
                appendWord(make_shared <word>(in, ch));
+#if 0
 
        // Else, only encode words which need it
        } else {
@@ -363,6 +366,7 @@ void text::createFromString(const string& in, const charset& ch) {
                        }
                }
        }
+#endif
 }

``` mailbox(text("Test München West", charsets::UTF_8), "a@b.de").generate(); ``` produces ``` =?us-ascii?Q?Test_?= =?utf-8?Q?M=C3=BCnchen?= =?us-ascii?Q?West?= <test@example.com> ``` The first space between ``Test`` and ``München`` is encoded as an underscore along with the first word: ``Test_``. The second space between ``München`` and ``West`` is encoded with neither of the two words and thus lost. Decoding the text results in ``Test MünchenWest`` instead of ``Test München West``. This is caused by how ``vmime::text::createFromString()`` handles transitions between 7-bit and 8-bit words: If an 8-bit word follows a 7-bit word, a space is appended to the previous word. The opposite case of a 7-bit word following an 8-bit word *misses* this behaviour. When one fixes this problem, a follow-up issue appears: ``text::createFromString("a b\xFFc d")`` tokenizes the input into ``m_words={word("a "), word("b\xFFc ", utf8), word("d")}``. This "right-side alignment" nature of the whitespace is a problem for word::generate(): As per RFC 2047, spaces between adjacent encoded words are just separators but not meant to be displayed. A space between an encoded word and a regular ASCII text is not just a separator but also meant to be displayed. When word::generate() outputs the b-word, it would have to strip one space, but only when there is a transition from encoded-word to unencoded word. word::generate() does not know whether d will be encoded or unencoded. The idea now is that we could change the tokenization of ``text::createFromString`` such that whitespace is at the *start* of words rather than at the end. With that, word::generate() need not know anything about the next word, but rather only the *previous* one. Thus, in this patch, 1. The tokenization of ``text::createFromString`` is changed to left-align spaces and the function is fixed to account for the missing space on transition. 2. ``word::generate`` learns how to steal a space character. 3. Testcases are adjusted to account for the shifted position of the space. Fixes: #283, #284 Co-authored-by: Vincent Richard <vincent@vincent-richard.net>

RichardSteele changed the title ~~Some spaces are lost in texts of multipled words with different charsets~~ Some spaces are lost in texts of multiple words with different charsets Nov 8, 2023

RichardSteele changed the title ~~Some spaces are lost in texts of multiple words with different charsets~~ vmime::text::createFromString() drops spaces if an 8-bit word is followed by a 7-bit word Nov 12, 2023

jengelh mentioned this issue Apr 26, 2024

vmime: prevent loss of a space during text::createFromString #306

Merged

vincent-richard closed this as completed in #306 May 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`vmime::text::createFromString()` drops spaces if an 8-bit word is followed by a 7-bit word #283

`vmime::text::createFromString()` drops spaces if an 8-bit word is followed by a 7-bit word #283

RichardSteele commented Nov 8, 2023 •

edited

Loading

jengelh commented Apr 25, 2024

vmime::text::createFromString() drops spaces if an 8-bit word is followed by a 7-bit word #283

vmime::text::createFromString() drops spaces if an 8-bit word is followed by a 7-bit word #283

Comments

RichardSteele commented Nov 8, 2023 • edited Loading

jengelh commented Apr 25, 2024

`vmime::text::createFromString()` drops spaces if an 8-bit word is followed by a 7-bit word #283

`vmime::text::createFromString()` drops spaces if an 8-bit word is followed by a 7-bit word #283

RichardSteele commented Nov 8, 2023 •

edited

Loading