Drag and drop of .eml with extended characters #645

Closed
chocolatechipcats opened this issue Jan 29, 2021 · 2 comments

Comments

@chocolatechipcats
Contributor

chocolatechipcats commented Jan 29, 2021

I discovered this and posted about it on MobileRead. Jim said it was a corner case and low-priority, but I'm posting it here mostly to document what I've figured out so far, so it doesn't get lost in the MR thread.

Situation: I drag and drop my AO3 and FFNet notification emails into FanFicFare instead of using the IMAP fetch. One specific email (an FFNet notification with Content-Type: text/plain; charset="utf-8") kept failing to parse the URL and pasted the file path of the .eml instead.
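For reference, here is a minimal sketch of how a .eml file's plain-text body can be decoded with Python's standard-library `email` package, which honours the `charset="utf-8"` parameter from the Content-Type header. This is an illustration under assumptions, not how FanFicFare actually reads dropped files; `plain_text_body`, the sample message, its `8bit` transfer encoding and its placeholder story URL are all made up.

```python
from email import policy
from email.parser import BytesParser


def plain_text_body(raw_eml: bytes) -> str:
    """Return the text/plain body of an .eml file, decoded with its declared charset.

    Hypothetical helper: a sketch using the stdlib email package, not FanFicFare code.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw_eml)
    # get_body() returns the text/plain part, or the message itself if it is
    # a single-part text/plain message.
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return ""
    # get_content() undoes the Content-Transfer-Encoding, then decodes the
    # bytes using the charset parameter (here "utf-8").
    return part.get_content()


# Made-up sample message; the non-ASCII bytes are the UTF-8 encoding of "Déjà-vu".
sample = (
    b'Content-Type: text/plain; charset="utf-8"\n'
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Chapter: D\xc3\xa9j\xc3\xa0-vu\n"
    b"https://www.fanfiction.net/s/0000000/2/\n"
)
body = plain_text_body(sample)
```

Reading a real file would just be `plain_text_body(open(path, "rb").read())`; opening in binary mode matters, since it leaves decoding to the charset declared in the email rather than to the platform's default text encoding.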

I traced it to the presence of two Unicode Latin-1 Supplement characters (U+00E9 é and U+00E0 à) in the message body; in this case, the chapter title was "Déjà-vu".
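Why these two letters stand out: in UTF-8 every character above U+007F takes more than one byte, while ASCII characters (like everything in a typical story URL) take exactly one. So "Déjà-vu" is 7 characters but 9 bytes. The helper name `non_ascii_utf8_bytes` is purely illustrative and not part of FanFicFare.

```python
from typing import List, Tuple


def non_ascii_utf8_bytes(text: str) -> List[Tuple[str, bytes]]:
    """List each non-ASCII character in text as (code point, UTF-8 bytes). Illustrative helper."""
    return [
        (f"U+{ord(ch):04X}", ch.encode("utf-8"))
        for ch in text
        if ord(ch) > 0x7F  # anything above 0x7F needs more than one byte in UTF-8
    ]


title = "D\u00e9j\u00e0-vu"  # "Déjà-vu", written with escapes so the source stays ASCII
pairs = non_ascii_utf8_bytes(title)
# [('U+00E9', b'\xc3\xa9'), ('U+00E0', b'\xc3\xa0')]
sizes = (len(title), len(title.encode("utf-8")))
# (7, 9): seven characters, nine bytes
```

Any code path that treats the body as ASCII, or decodes it with a codec other than UTF-8, will trip over exactly those four bytes.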

A few rough tests (basically emailing myself URLs and random Unicode characters) confirm that this also happens with characters from other Unicode blocks, but only when the Content-Type is plain text. I haven't tested whether it also occurs with the IMAP fetch.

This bit of code in geturls.py seems related:

```python
def get_urls_from_text(data,configuration=None,normalize=False,email=False):
    urls = collections.OrderedDict()
    try:
        # py3 can have issues with extended chars in txt emails
        data = ensure_str(data,errors='replace')
    except UnicodeDecodeError:
        data = data.decode('utf8') ## for when called outside calibre.
    # ... (imports and the rest of the function not quoted)
```
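The quoted function is cut off right after the decoding step. As a rough, hypothetical sketch of the general idea (not FanFicFare's actual logic, which takes `configuration`, `normalize` and `email` arguments that are ignored here), URL extraction from already-decoded text could look like this, using an ordered dict as the snippet does to keep first-seen order and drop duplicates. The regex is a deliberately crude assumption; the point is that once the body is a correctly decoded `str`, accented characters elsewhere in it don't affect the match.

```python
import re
from collections import OrderedDict

# Crude, assumed pattern for illustration only; real URL matching is fussier.
URL_RE = re.compile(r"https?://\S+")


def urls_from_text_sketch(data):
    """Hypothetical, simplified URL extractor; not FanFicFare's get_urls_from_text."""
    if isinstance(data, bytes):
        # Assumption: the caller hands us UTF-8 bytes.
        data = data.decode("utf-8", errors="replace")
    urls = OrderedDict()
    for match in URL_RE.finditer(data):
        urls[match.group(0)] = True  # keys de-duplicate, order is preserved
    return list(urls)


text = (
    "New chapter: D\u00e9j\u00e0-vu\n"
    "https://www.fanfiction.net/s/0000000/2/\n"
    "https://www.fanfiction.net/s/0000000/2/\n"
)
found = urls_from_text_sketch(text)  # one URL; the duplicate is dropped
```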

ensure_str comes from six, a library for Python 2/3 compatibility. I did try a few random things based on what I googled: errors='strict' didn't work, nor did adding encoding='utf-8'. Commenting out the line entirely (admittedly mostly to see what would happen) caused FFF to fail to load. This is why I am not a programmer.
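For anyone following along: on Python 3, six's `ensure_str` returns a `str` unchanged and decodes `bytes` with `bytes.decode(encoding, errors)`, defaulting to UTF-8. The sketch below mirrors that behaviour so it runs without six installed; `ensure_str_py3` and `to_text_like_geturls` are hypothetical stand-ins, not FanFicFare code. It shows that valid UTF-8 such as "Déjà-vu" decodes cleanly; that with `errors='replace'` the `except UnicodeDecodeError` branch can't be reached for bytes input, because bad bytes become U+FFFD instead of raising; and that with `errors='strict'` the fallback uses the same codec and fails the same way. If `data` is already a `str`, neither branch decodes anything, which would suggest any mangling happens earlier.

```python
def ensure_str_py3(s, encoding="utf-8", errors="strict"):
    """Hypothetical mirror of six.ensure_str's Python 3 behaviour."""
    if isinstance(s, str):
        return s  # text passes through untouched
    if isinstance(s, bytes):
        return s.decode(encoding, errors)
    raise TypeError(f"not expecting type {type(s)!r}")


def to_text_like_geturls(data, errors="replace"):
    """Hypothetical re-creation of the try/except at the top of get_urls_from_text."""
    try:
        return ensure_str_py3(data, errors=errors)
    except UnicodeDecodeError:
        # Same codec as above with the default strict handler, so bytes that
        # failed above fail again here.
        return data.decode("utf8")


good = "D\u00e9j\u00e0-vu".encode("utf-8")   # valid UTF-8
bad = "D\u00e9j\u00e0-vu".encode("cp1252")   # assumption: non-UTF-8 input, b'D\xe9j\xe0-vu'
decoded_good = to_text_like_geturls(good)    # 'Déjà-vu'
decoded_bad = to_text_like_geturls(bad)      # 'D\ufffdj\ufffd-vu' (replacement chars)
```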

Jim mentioned that it might also be OS-dependent; in case it is, I'm on Windows 10 Home (build 19041).
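If the OS does matter, one plausible but unconfirmed mechanism is a locale-dependent default codec. The same UTF-8 bytes read back as cp1252, a common default on Western-European Windows installs, turn into mojibake instead of raising an error. This is an assumption for illustration only: nothing in this thread shows that FanFicFare or calibre uses the locale codec on the drag-and-drop path, and `decode_as` is a hypothetical helper.

```python
import locale


def decode_as(raw: bytes, codec: str) -> str:
    """Hypothetical helper: decode raw bytes with codec, replacing anything undecodable."""
    return raw.decode(codec, errors="replace")


utf8_bytes = "D\u00e9j\u00e0-vu".encode("utf-8")  # b'D\xc3\xa9j\xc3\xa0-vu'

# The default text encoding depends on platform and locale; Western-European
# Windows commonly reports cp1252, most Linux/macOS setups report UTF-8.
print("preferred encoding:", locale.getpreferredencoding(False))

as_utf8 = decode_as(utf8_bytes, "utf-8")     # 'Déjà-vu'
as_cp1252 = decode_as(utf8_bytes, "cp1252")  # 'DÃ©jÃ\xa0-vu': mojibake, no exception
```

Note that cp1252 maps every one of these bytes to some character, so this kind of mismatch fails silently rather than raising.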

Thankfully I was taking notes on this, so unless I forgot to write something down, that's everything I have so far.

chocolatechipcats changed the title from "Drag and drop of .eml with extended characters" to "[Low-priority] Drag and drop of .eml with extended characters" on Jan 29, 2021
JimmXinu changed the title from "[Low-priority] Drag and drop of .eml with extended characters" to "Drag and drop of .eml with extended characters" on Jan 29, 2021
@chocolatechipcats
Contributor Author

Additional thought: might it be an issue with data.decode? I'm not sure whether .eml drag-and-drop counts as "called outside calibre", though. Python's docs seem to indicate you can also pass an error handler to that.
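On that point: `bytes.decode()` does accept an `errors` argument, the same idea as the `errors='replace'` passed to ensure_str, and the `data.decode('utf8')` fallback in the quoted snippet uses the default, `'strict'`. The comparison below runs several of Python's built-in handlers over the same bytes. The input is deliberately not UTF-8 (the title encoded as Windows-1252), an assumption chosen so the handlers have something to disagree about; `decode_with_each_handler` is a hypothetical name.

```python
def decode_with_each_handler(raw: bytes, encoding: str = "utf-8") -> dict:
    """Hypothetical helper: decode raw with several built-in error handlers."""
    results = {}
    for handler in ("strict", "replace", "ignore", "backslashreplace"):
        try:
            results[handler] = raw.decode(encoding, errors=handler)
        except UnicodeDecodeError as exc:
            # Only "strict" raises; record the reason instead of crashing.
            results[handler] = f"UnicodeDecodeError: {exc.reason}"
    return results


# Assumption for illustration: the title arrives as Windows-1252 bytes, not UTF-8.
sample = "D\u00e9j\u00e0-vu".encode("cp1252")  # b'D\xe9j\xe0-vu'
outcomes = decode_with_each_handler(sample)
# strict           -> 'UnicodeDecodeError: invalid continuation byte'
# replace          -> 'D\ufffdj\ufffd-vu'
# ignore           -> 'Dj-vu'
# backslashreplace -> 'D\\xe9j\\xe0-vu'
```

Only `strict` surfaces the problem; the others hide it in different ways, which matters when the goal is to find a URL elsewhere in the text rather than to preserve the title exactly.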

@chocolatechipcats
Contributor Author

thank you!
