I discovered and posted about this on MobileRead. Jim said it was a corner case and low-priority, but I'm posting it here mostly to document what I've figured out so far and so it doesn't get lost in the MR thread.
Situation: I drag and drop my AO3 and FFNet notification emails into FanFicFare rather than using the IMAP fetch. One specific email (an ffnet notification with Content-Type: text/plain; charset="utf-8") kept failing to parse the URL and pasted the file path of the .eml instead.
I traced it to the presence of two Unicode Latin-1 Supplement characters (U+00E9 and U+00E0) in the message body; in this case, the chapter title was "Déjà-vu".
A few rough tests (basically emailing myself urls and random unicode characters) confirm that this also occurs with other Unicode blocks, but only when the content-type is plain text. I've not tested whether it also occurs with IMAP fetch.
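For anyone who wants to reproduce this without emailing themselves, here is a minimal sketch that builds a text/plain, charset="utf-8" message containing "Déjà-vu" and a URL, then reads it back with Python's standard `email` package. The helper names (`build_sample_eml`, `plain_text_body`, `find_urls`), the addresses, and the example.com URL are all made up for illustration; this is not how FanFicFare itself reads .eml files, only a way to see what the decoded body should look like.

```python
import re
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample_eml(body):
    """Return the raw bytes of a text/plain; charset="utf-8" message (hypothetical sample)."""
    msg = EmailMessage()
    msg["Subject"] = "Chapter notification (sample)"
    msg["From"] = "bot@example.com"
    msg["To"] = "reader@example.com"
    msg.set_content(body, subtype="plain", charset="utf-8")
    return msg.as_bytes()


def plain_text_body(eml_bytes):
    """Parse raw .eml bytes and return the decoded text/plain body as str."""
    msg = message_from_bytes(eml_bytes, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


def find_urls(text):
    """Very rough URL matcher, for illustration only."""
    return re.findall(r"https?://[^\s<>\"']+", text)


sample = build_sample_eml("New chapter: Déjà-vu\nhttps://www.example.com/s/123/4/\n")
body = plain_text_body(sample)
print(find_urls(body))
```

Saving `sample` to a file with a .eml extension should give a drag-and-drop test case, although the transfer encoding chosen by `set_content` may differ from what a real notification uses.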
This bit of code in geturls.py seems related:
def get_urls_from_text(data,configuration=None,normalize=False,email=False):
    urls = collections.OrderedDict()
    try:
        # py3 can have issues with extended chars in txt emails
        data = ensure_str(data,errors='replace')
    except UnicodeDecodeError:
        data = data.decode('utf8') ## for when called outside calibre.
ensure_str is part of six, which seems to be a library for py2/py3 compatibility. I did try a few random things based on what I googled. errors='strict' didn't work, nor did adding encoding='utf-8'. Commenting out the line entirely (admittedly mostly to see what would happen) caused FFF to fail to load. This is why I am not a programmer.
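As a rough illustration of what that call does on Python 3, here is a paraphrase of `ensure_str`. The function `approx_ensure_str` is a hypothetical stand-in written for this note, not the code from six or from FanFicFare: it passes `str` through unchanged and decodes `bytes` with the given encoding and error handler.

```python
def approx_ensure_str(s, encoding="utf-8", errors="strict"):
    """Approximate Python 3 behaviour of six.ensure_str (illustrative paraphrase)."""
    if isinstance(s, str):
        # Already text: returned as-is, so `errors` is never consulted.
        return s
    if isinstance(s, bytes):
        return s.decode(encoding, errors)
    raise TypeError("not expecting type '%s'" % type(s))


raw = "Chapter 4: Déjà-vu".encode("utf-8")  # valid UTF-8 bytes

strict = approx_ensure_str(raw, errors="strict")
replaced = approx_ensure_str(raw, errors="replace")
already_text = approx_ensure_str("Chapter 4: Déjà-vu", errors="replace")

print(strict == replaced == already_text)
```

If the data reaching `get_urls_from_text` is already `str` or valid UTF-8, the `errors` value makes no difference in this sketch. That would be consistent with the changes above having no effect, but it is only a guess about where the problem lies.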
Jim mentioned that it might also be OS-dependent; if that is the case, I use Windows 10 Home (19041).
Thankfully I was taking notes on this, so unless I forgot to write something down, that's everything I have so far.
chocolatechipcats changed the title from "Drag and drop of .eml with extended characters" to "[Low-priority] Drag and drop of .eml with extended characters" on Jan 29, 2021
Additional thought: might it be an issue with data.decode? I'm not sure whether eml drag-and-drop counts as "called outside Calibre", but Python's docs seem to indicate you can also pass an error handler to that.
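For reference, a small sketch of how `bytes.decode` behaves with different error handlers. The sample bytes are "Déjà-vu" encoded as Latin-1, chosen here only because they are not valid UTF-8; whether the real .eml data ever looks like this is an assumption, not something I've confirmed.

```python
raw = "Déjà-vu".encode("latin-1")  # b'D\xe9j\xe0-vu', which is not valid UTF-8

# Default handler is 'strict': invalid bytes raise UnicodeDecodeError.
try:
    raw.decode("utf8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD for each undecodable byte sequence.
replaced = raw.decode("utf8", errors="replace")

# 'ignore' silently drops the undecodable bytes.
ignored = raw.decode("utf8", errors="ignore")

print(strict_failed, repr(replaced), repr(ignored))
```

So the fallback `data.decode('utf8')` in the except branch could itself raise on bytes like these, whereas passing `errors='replace'` there would not.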