Drag and drop of .eml with extended characters #645

chocolatechipcats · 2021-01-29T20:11:56Z

I discovered this and posted about it on MobileRead. Jim said it was a corner case and low priority, but I'm posting it here mostly to document what I've figured out so far, so it doesn't get lost in the MR thread.

Situation: I drag and drop my AO3 and FFNet notification emails into FanFicFare instead of using the IMAP fetch. One specific email (an FFNet notification with `Content-Type: text/plain; charset="utf-8"`) kept failing to parse the URL and pasted the file path of the .eml instead.

I traced it to the presence of two characters from the Unicode Latin-1 Supplement block (U+00E9 é and U+00E0 à) in the message body; in this case, the chapter title was "Déjà-vu".
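To illustrate why these two characters matter: in UTF-8, é and à each take two bytes, so "Déjà-vu" is no longer plain ASCII. The sketch below is plain Python 3 standard library, not FanFicFare code. It shows the raw bytes and what happens if they are decoded with the wrong codec. cp1252 is used only as an assumed example, because it is a common default on Western-locale Windows; it is not confirmed to be the cause here.

```python
title = "Déjà-vu"

# U+00E9 (é) and U+00E0 (à) are the only non-ASCII code points in the title.
print([f"U+{ord(ch):04X}" for ch in title if ord(ch) > 0x7F])  # ['U+00E9', 'U+00E0']

# In UTF-8 each of them becomes a two-byte sequence.
raw = title.encode("utf-8")
print(raw)  # b'D\xc3\xa9j\xc3\xa0-vu'

# Decoding those bytes as ASCII fails outright...
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii:", exc.reason)  # ordinal not in range(128)

# ...while decoding them as cp1252 "succeeds" but produces mojibake.
print(repr(raw.decode("cp1252")))  # 'DÃ©jÃ\xa0-vu'
```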

A few rough tests (basically emailing myself URLs and random Unicode characters) confirm that this also happens with characters from other Unicode blocks, but only when the content type is plain text. I haven't tested whether it also happens with IMAP fetch.
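Since the problem only shows up with text/plain bodies, one way to check what such an email actually contains is to let Python's standard-library `email` package decode the part using its declared charset. This is a sketch for inspection only, assuming the .eml is available as raw bytes. It is not how FanFicFare reads dropped files, `plain_text_bodies` is a hypothetical helper, and the sample message and story URL are made up.

```python
from email import policy
from email.parser import BytesParser


def plain_text_bodies(eml_bytes):
    """Hypothetical helper: return every text/plain part of an .eml as a str."""
    # policy.default yields EmailMessage objects whose get_content() undoes the
    # transfer encoding and decodes the text using the part's declared charset.
    msg = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    bodies = []
    for part in msg.walk():  # for a non-multipart message this yields msg itself
        if part.get_content_type() == "text/plain":
            bodies.append(part.get_content())
    return bodies


# Made-up sample mirroring the failing case: UTF-8 text/plain with é and à.
sample = (
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"Subject: test\r\n"
    b"\r\n"
    b"Chapter: D\xc3\xa9j\xc3\xa0-vu https://www.fanfiction.net/s/123/2/\r\n"
)
print(plain_text_bodies(sample))
```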

This bit of code in geturls.py seems related:

```python
def get_urls_from_text(data,configuration=None,normalize=False,email=False):
    urls = collections.OrderedDict()
    try:
        # py3 can have issues with extended chars in txt emails
        data = ensure_str(data,errors='replace')
    except UnicodeDecodeError:
        data = data.decode('utf8') ## for when called outside calibre.
```

`ensure_str` is part of `six`, which seems to be a library for Python 2/3 compatibility. I tried a few random things based on what I found by googling: `errors='strict'` didn't work, nor did adding `encoding='utf-8'`. Commenting out the line entirely (admittedly mostly to see what would happen) caused FFF to fail to load. This is why I am not a programmer.
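For reference, on Python 3 `six.ensure_str(s, encoding='utf-8', errors='strict')` decodes bytes and returns a `str` unchanged. The stand-in below mirrors that behavior so it runs without `six` installed; `ensure_str_py3` is a hypothetical name. It may explain why those experiments changed nothing: UTF-8 is already the default encoding, the `errors` setting only matters for bytes that are invalid in that encoding, and a `str` input ignores both arguments entirely.

```python
def ensure_str_py3(s, encoding="utf-8", errors="strict"):
    """Hypothetical stand-in mirroring six.ensure_str on Python 3."""
    if isinstance(s, bytes):
        # Bytes are decoded with the given codec and error handler.
        return s.decode(encoding, errors)
    if not isinstance(s, str):
        raise TypeError(f"not expecting type '{type(s)}'")
    # str input is returned untouched; encoding and errors are ignored.
    return s


utf8_bytes = "Déjà-vu".encode("utf-8")

# Valid UTF-8: 'strict' and 'replace' give the same result, and passing
# encoding="utf-8" explicitly changes nothing because it is the default.
print(ensure_str_py3(utf8_bytes, errors="strict"))
print(ensure_str_py3(utf8_bytes, encoding="utf-8", errors="replace"))

# An already-decoded str (correct or not) passes straight through.
already_str = "D\u00c3\u00a9j\u00c3\u00a0-vu"
print(ensure_str_py3(already_str, errors="replace") is already_str)  # True
```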

Jim mentioned that it might also be OS-dependent; in case it is, I use Windows 10 Home (build 19041).
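If the behavior is OS-dependent, one quick thing to compare between machines is Python's default text encoding. The check below uses only the standard library; whether calibre's drag-and-drop path actually relies on this default is an assumption, not something established in this thread.

```python
import locale
import sys

# The locale's preferred encoding normally matches what open() uses for text
# files when no encoding is given (often 'cp1252' on Western-locale Windows,
# usually 'UTF-8' on Linux and macOS).
print("preferred encoding:", locale.getpreferredencoding(False))
print("filesystem encoding:", sys.getfilesystemencoding())
print("platform:", sys.platform)
```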

Thankfully I was taking notes on this, so unless I forgot to write something down, that's everything I have so far.


chocolatechipcats · 2021-01-31T21:20:03Z

Additional thought: might the issue be with `data.decode`? I'm not sure whether .eml drag-and-drop counts as being "called outside Calibre", but Python's docs seem to indicate that you can also pass an error handler to that.
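On the `data.decode` question: `bytes.decode()` does accept an `errors` argument, just as `ensure_str` does. The sketch below compares the built-in handlers on bytes that are *not* valid UTF-8 (here "Déjà-vu" encoded as Latin-1, an assumed example). It also shows that the fallback branch only makes sense for bytes, since on Python 3 a `str` has no `.decode()` method.

```python
latin1_bytes = "Déjà-vu".encode("latin-1")  # b'D\xe9j\xe0-vu', invalid as UTF-8

# With the default errors="strict", invalid bytes raise UnicodeDecodeError.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# The other built-in handlers never raise; they differ in what they substitute.
for handler in ("replace", "ignore", "backslashreplace"):
    print(handler, repr(latin1_bytes.decode("utf-8", errors=handler)))

# The except-branch call only works on bytes: str has no decode() in Python 3.
print(hasattr("Déjà-vu", "decode"))  # False
```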

chocolatechipcats · 2021-02-07T23:46:29Z

thank you!

chocolatechipcats changed the title ~~Drag and drop of .eml with extended characters~~ [Low-priority] Drag and drop of .eml with extended characters Jan 29, 2021

JimmXinu added the Priority-Low label Jan 29, 2021

JimmXinu changed the title ~~[Low-priority] Drag and drop of .eml with extended characters~~ Drag and drop of .eml with extended characters Jan 29, 2021

JimmXinu closed this as completed in f33a5de Feb 7, 2021
