Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change how charsets are detected for emails #5882

Merged
merged 1 commit into from
Feb 28, 2024

Conversation

miaulalala
Copy link
Contributor

Fixes #5845
Fixes #5834

@miaulalala miaulalala changed the title Checnge how charsets are detected for emails Change how charsets are detected for emails Dec 20, 2021
@GretaD
Copy link
Contributor

GretaD commented Dec 22, 2021

this looks good for me too

@miaulalala miaulalala marked this pull request as ready for review January 3, 2022 11:58
@xinstein

This comment has been minimized.

@miaulalala miaulalala force-pushed the fix/detect-char-encoding branch from 4bda549 to fac906a Compare January 11, 2022 16:42
@miaulalala miaulalala requested review from GretaD and st3iny January 11, 2022 18:35
@xinstein

This comment has been minimized.

@miaulalala

This comment has been minimized.

@xinstein

This comment has been minimized.

Copy link
Member

@st3iny st3iny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throws an error Could not detect charset for message on a harmless message that worked fined before.

Test email as sent from Thunderbird. It contains an image from Unsplash. I could reproduce it using other images too so this seems to be an issue with attached images.

Error from the console
{
  "app": "mail",
  "uid": "admin",
  "error": {
    "isError": true,
    "debug": true,
    "type": "OCA\\Mail\\Exception\\ServiceException",
    "code": 0,
    "message": "Could not detect charset for message",
    "trace": [
      {
        "file": "/srv/http/dev_apps/mail/lib/Model/IMAPMessage.php",
        "line": 566,
        "function": "loadBodyData",
        "class": "OCA\\Mail\\Model\\IMAPMessage"
      },
      {
        "file": "/srv/http/dev_apps/mail/lib/Model/IMAPMessage.php",
        "line": 437,
        "function": "handleTextMessage",
        "class": "OCA\\Mail\\Model\\IMAPMessage"
      },
      {
        "file": "/srv/http/dev_apps/mail/lib/Model/IMAPMessage.php",
        "line": 377,
        "function": "getPart",
        "class": "OCA\\Mail\\Model\\IMAPMessage"
      },
      {
        "file": "/srv/http/dev_apps/mail/lib/Model/IMAPMessage.php",
        "line": 102,
        "function": "loadMessageBodies",
        "class": "OCA\\Mail\\Model\\IMAPMessage"
      },
      {
        "file": "/srv/http/dev_apps/mail/lib/IMAP/MessageMapper.php",
        "line": 242,
        "function": "__construct",
        "class": "OCA\\Mail\\Model\\IMAPMessage"
      },
      {
        "function": "OCA\\Mail\\IMAP\\{closure}",
        "class": "OCA\\Mail\\IMAP\\MessageMapper"
      },
      {
        "file": "/srv/http/dev_apps/mail/lib/IMAP/MessageMapper.php",
        "line": 256,
        "function": "array_map"
      },
      {
        "file": "/srv/http/dev_apps/mail/lib/IMAP/MessageMapper.php",
        "line": 69,
        "function": "findByIds",
        "class": "OCA\\Mail\\IMAP\\MessageMapper"
      },
      {
        "file": "/srv/http/dev_apps/mail/lib/Service/MailManager.php",
        "line": 180,
        "function": "find",
        "class": "OCA\\Mail\\IMAP\\MessageMapper"
      },
      {
        "file": "/srv/http/dev_apps/mail/lib/Controller/MessagesController.php",
        "line": 249,
        "function": "getImapMessage",
        "class": "OCA\\Mail\\Service\\MailManager"
      },
      {
        "file": "/srv/http/lib/private/AppFramework/Http/Dispatcher.php",
        "line": 217,
        "function": "getBody",
        "class": "OCA\\Mail\\Controller\\MessagesController"
      },
      {
        "file": "/srv/http/lib/private/AppFramework/Http/Dispatcher.php",
        "line": 126,
        "function": "executeController",
        "class": "OC\\AppFramework\\Http\\Dispatcher"
      },
      {
        "file": "/srv/http/lib/private/AppFramework/App.php",
        "line": 157,
        "function": "dispatch",
        "class": "OC\\AppFramework\\Http\\Dispatcher"
      },
      {
        "file": "/srv/http/lib/private/Route/Router.php",
        "line": 302,
        "function": "main",
        "class": "OC\\AppFramework\\App"
      },
      {
        "file": "/srv/http/lib/base.php",
        "line": 1006,
        "function": "match",
        "class": "OC\\Route\\Router"
      },
      {
        "file": "/srv/http/index.php",
        "line": 36,
        "function": "handleRequest",
        "class": "OC"
      }
    ]
  }
}

@miaulalala

This comment has been minimized.

@miaulalala
Copy link
Contributor Author

Throws an error Could not detect charset for message on a harmless message that worked fined before.

Test email as sent from Thunderbird. It contains an image from Unsplash. I could reproduce it using other images too so this seems to be an issue with attached images.

Error from the console

Ah interesting. Attached, not embedded as base64?

@st3iny
Copy link
Member

st3iny commented Jan 18, 2022

Throws an error Could not detect charset for message on a harmless message that worked fined before.
Test email as sent from Thunderbird. It contains an image from Unsplash. I could reproduce it using other images too so this seems to be an issue with attached images.
Error from the console

Ah interesting. Attached, not embedded as base64?

I added the image as a regular attachment. At least in Thunderbird it is detected as an attachment and is not shown inline.

Here is the relevant excerpt from the .eml:

--------------LrBOmGA0xhgKeGGopkJKdw1w
Content-Type: image/jpeg; name="ilya-bronskiy-__Z8VvhbIZ4-unsplash.jpg"
Content-Disposition: attachment;
 filename="ilya-bronskiy-__Z8VvhbIZ4-unsplash.jpg"
Content-Transfer-Encoding: base64

@miaulalala
Copy link
Contributor Author

#1037 is also related

@miaulalala
Copy link
Contributor Author

Throws an error Could not detect charset for message on a harmless message that worked fined before.
Test email as sent from Thunderbird. It contains an image from Unsplash. I could reproduce it using other images too so this seems to be an issue with attached images.
Error from the console

Ah interesting. Attached, not embedded as base64?

I added the image as a regular attachment. At least in Thunderbird it is detected as an attachment and is not shown inline.

Here is the relevant excerpt from the .eml:

--------------LrBOmGA0xhgKeGGopkJKdw1w
Content-Type: image/jpeg; name="ilya-bronskiy-__Z8VvhbIZ4-unsplash.jpg"
Content-Disposition: attachment;
 filename="ilya-bronskiy-__Z8VvhbIZ4-unsplash.jpg"
Content-Transfer-Encoding: base64

Figured it out. PHP implicit bools were fooling me. As the content is $data = "" - i. e. an empty string, my boolean check on !$data returned a false value even though $data is correct as "". Early returns FTW here.

@miaulalala miaulalala requested a review from st3iny March 18, 2022 00:19
@miaulalala miaulalala added this to the v1.11.8 milestone Mar 18, 2022
@ChristophWurst ChristophWurst removed this from the v1.11.8 milestone Apr 27, 2022
@roukmoute
Copy link

Would it be possible to use symfony/unicode or hoa/ustring to avoid encoding problems?

Copy link
Member

@ChristophWurst ChristophWurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit scary. Let's add tests to cover the edge cases.

👍 otherwise

@miaulalala
Copy link
Contributor Author

This is a bit scary. Let's add tests to cover the edge cases.

👍 otherwise

Not quite sure how to test this as the complete function chain is private.

@miaulalala miaulalala force-pushed the fix/detect-char-encoding branch from 368320d to b557369 Compare October 9, 2023 08:52
@ChristophWurst
Copy link
Member

Not quite sure how to test

Move the charset logic into a new class, add unit tests for it and use it in the other complex class.


// Still UTF8, no need to convert
if($charset !== null && strtoupper($charset) === 'UTF-8') {
return $data;
Copy link
Member

@ChristophWurst ChristophWurst Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of the tests is able to reach a line below this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still the case. either the charset parameter is set and used, or mb detects the string as UTF-8

lib/IMAP/Charset/Converter.php Outdated Show resolved Hide resolved
@ChristophWurst ChristophWurst mentioned this pull request Nov 29, 2023
24 tasks
$iso2022jpMimePart_noCharset = new Horde_Mime_Part();
$iso2022jpMimePart_noCharset->setContents('外せ園査リツハワ題');
// Korean - not in mb nor iconv
// $iso106461MimePart = new Horde_Mime_Part();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISO 10646-1 is impossible to get working for me. The coding is not mentioned in any report. Can we skip this?

Copy link
Member

@ChristophWurst ChristophWurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 we can get this in

lib/IMAP/Charset/Converter.php Outdated Show resolved Hide resolved
Signed-off-by: Anna Larch <anna@nextcloud.com>
@ChristophWurst ChristophWurst force-pushed the fix/detect-char-encoding branch from 0cbdb53 to 176e074 Compare February 27, 2024 18:32
@ChristophWurst ChristophWurst merged commit ce0e4bd into main Feb 28, 2024
37 checks passed
@ChristophWurst ChristophWurst deleted the fix/detect-char-encoding branch February 28, 2024 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Loading messages fails on some folders with mb_convert_encoding() error Hebrew (all RTL?) Language Error
6 participants