Error when attempting to run clippy.py #1

Open
SitrucL opened this issue Dec 10, 2020 · 7 comments
Labels
bug Something isn't working

Comments

@SitrucL

SitrucL commented Dec 10, 2020

I tried running the Python script and passed in the .txt file as instructed on the GitHub page and in the YouTube video. However, I keep encountering this error:

[image: screenshot of the error]

Please let me know if there's anything I can do to resolve the issue.

Additional info:
Python 3.9.1
Kindle PW
Windows 10

@dangbert
Owner

dangbert commented Dec 10, 2020

Hi, I'd like to help get this fixed for you. Would you mind sharing your "My Clippings.txt" file to help me replicate the issue?

Based on this I'm guessing your file is encoded in a weird format the program doesn't expect.

One simple test could be to copy and paste the contents of your file (open it in Notepad) into a new file, which you then provide to clippy.py.
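The same test can also be done from Python instead of by hand. This is only a rough sketch, not part of clippy.py: `resave_as_utf8` is a hypothetical helper, and it assumes the file is UTF-8 (possibly with a leading byte-order mark), which may not hold for every Kindle.

```python
from pathlib import Path


def resave_as_utf8(src, dst):
    """Re-write src as plain UTF-8 without a byte-order mark.

    Assumption: src is UTF-8, optionally starting with a BOM. Bytes that
    cannot be decoded are replaced with U+FFFD instead of raising.
    """
    text = Path(src).read_bytes().decode("utf-8-sig", errors="replace")
    # newline="" keeps the original line endings untouched on every OS.
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return text


# Example (paths are placeholders):
# resave_as_utf8("My Clippings.txt", "clippings_utf8.txt")
```

If the re-saved file still triggers the error, the problem is more likely in how the script opens or prints the file than in the file itself.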


Let me know how it goes.

@SitrucL
Author

SitrucL commented Dec 11, 2020

Hi dangbert,

I tried pasting the contents into a new .txt file, but that didn't seem to resolve the issue. Here is a copy of my clippings.txt; perhaps some characters in the file are causing the issue. I wasn't sure from the error message which particular characters or lines to look at, though.

Thanks in advance

My Clippings.txt

@dangbert
Owner

Interestingly, when I downloaded your file I didn't run into that encoding issue (although I'm using Ubuntu 20 and Python 3.8.5). Either uploading the file to GitHub changed its encoding, or this particular issue has something to do with our Python versions differing or with running on Windows vs. Ubuntu.

However, when I ran clippy.py on your file, I ran into another issue which stems from something I've already suspected: some different versions of Kindle format "My Clippings.txt" in slightly different ways.

Here you can see the difference between a highlight in my file (top) and yours (bottom):

[image: comparison of the two highlight formats]

Specifically, the problem is that your file lists both a page number and a location range for every highlight. My program currently fails to parse it because it expects either a page number or a location range, but not both.


I built this program originally because the existing alternatives I tried didn't parse my Kindle's "My Clippings.txt" file correctly. The "My Clippings.txt" format is not great, and the fact that it seems to vary across different Kindle versions and language settings is a pain when it comes to parsing it.

But it's my goal to make clippy-kindle robust to different versions of Kindles (and eventually to the language setting used as well). I can try to spend some time this week reworking the parser to support your file format, and then we can further investigate the encoding issue. Sorry for the trouble :/

@AmmarShaqeel
Contributor

AmmarShaqeel commented Apr 1, 2021

I've had issues with this as well.
This is because "My Clippings.txt" seems to use 4 different formats for highlights.

Normal:

- Your Highlight at location 1177-1178 | Added on Monday, 19 June 2017 02:21:10

Very rare, short or blank highlights:

- Your Highlight at location 1177 | Added on Monday, 19 June 2017 02:21:10

Rare:

- Your Highlight on page 7 | Added on Monday, 19 June 2017 02:21:10

If the book has pages (not all publishers seem to have gone to the trouble of adding pages):

- Your Highlight on page 22 | location 325-325 | Added on Thursday, 15 June 2017 18:23:21

Something else to note is that it seems to vary between "Your highlight at location" and "Your highlight on page".
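For illustration, here is a rough sketch of one regular expression that accepts all four variants listed above. It is not the parser that clippy-kindle actually uses; `META_RE` and `parse_meta` are hypothetical names, and the pattern only assumes the English wording shown in these examples.

```python
import re

# Hypothetical pattern for the metadata line of a highlight. Page and
# location are both optional, so any of the four variants can match.
META_RE = re.compile(
    r"^- Your Highlight (?:at|on) "
    r"(?:page (?P<page>\d+))?"
    r"(?: \| )?"
    r"(?:location (?P<loc_start>\d+)(?:-(?P<loc_end>\d+))?)?"
    r" \| Added on (?P<added>.+)$",
    re.IGNORECASE,
)


def parse_meta(line):
    """Return page/location/date fields for a metadata line, or None."""
    m = META_RE.match(line.strip())
    if m is None:
        return None

    def to_int(value):
        return int(value) if value is not None else None

    start = to_int(m["loc_start"])
    return {
        "page": to_int(m["page"]),
        "loc_start": start,
        # A single location such as "1177" is treated as a one-point range.
        "loc_end": to_int(m["loc_end"]) if m["loc_end"] else start,
        "added": m["added"],
    }
```

Calling `parse_meta` on the "page 22 | location 325-325" line should give a page of 22 and a location range of 325 to 325, while the location-only lines leave `page` as `None`.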

@dangbert
Owner

dangbert commented Apr 4, 2021

The parsing issues related to this issue are addressed and fixed now after #5 and c53fec7, thank you @AmmarShaqeel for the contribution!

However this doesn't address the original encoding problem in the OP. I haven't experienced this issue on Ubuntu 20 with this "My Clippings.txt" file, but I can try to replicate by running this in Windows...

Edit: I'm also experiencing the same "charmap" codec error when running on Windows (Windows 8, Python 3.6.5).
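One possible cause, which is not confirmed in this thread: on Windows, `open()` without an `encoding` argument typically falls back to a legacy code page such as cp1252, and Python reports failures in those code pages as "charmap" errors. The sketch below reproduces that kind of failure on any OS and shows an explicit-encoding read. `read_text` and `demo` are hypothetical names, and the assumption that the clippings file is UTF-8 with a BOM is mine.

```python
import os
import tempfile


def read_text(path, encoding="utf-8-sig"):
    """Read a text file with an explicit encoding ("utf-8-sig" drops a BOM)."""
    with open(path, encoding=encoding) as f:
        return f.read()


def demo():
    """Write a small UTF-8 file, then read it as cp1252 and as UTF-8."""
    sample = "\ufeffShe said \u201chello\u201d\n"  # BOM plus curly quotes
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(sample)
    try:
        try:
            read_text(path, encoding="cp1252")
            failure = None
        except UnicodeDecodeError as exc:
            # The closing curly quote ends in byte 0x9d, which cp1252
            # does not define, so decoding fails here.
            failure = str(exc)
        return failure, read_text(path)
    finally:
        os.remove(path)
```

If this is what is happening, passing `encoding="utf-8-sig"` wherever the script opens the clippings file would be the first thing to try. A similar "charmap" error can also come from printing non-ASCII text to a Windows console, so the traceback is worth checking before changing anything.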

Update: Nov 10, 2022:
The repo needs to be refactored and tested to work on Windows. I will use GitHub Actions on Windows to troubleshoot this (as I don't have a Windows machine)...

  • If I remember correctly, the issue had something to do with the datetime library...
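The thread does not say what exactly goes wrong with datetime, so the following is only a starting point for troubleshooting, not a diagnosis. It is a minimal sketch of parsing the "Added on" timestamps shown earlier with `datetime.strptime`; `ADDED_FORMAT` and `parse_added` are hypothetical names. Note that `%A` and `%B` match day and month names in the current locale, so this format assumes an English locale.

```python
from datetime import datetime

# Matches timestamps such as "Monday, 19 June 2017 02:21:10".
# %A (weekday name) and %B (month name) are locale-dependent.
ADDED_FORMAT = "%A, %d %B %Y %H:%M:%S"


def parse_added(text):
    """Parse an "Added on" timestamp, returning None if it does not match."""
    try:
        return datetime.strptime(text.strip(), ADDED_FORMAT)
    except ValueError:
        return None
```

Returning `None` instead of raising makes it easier to log which lines fail on a given platform when running the parser in CI.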

dangbert added a commit that referenced this issue Apr 7, 2021
TODO: add chardet to requirements.txt, test in linux
@dangbert dangbert added the bug Something isn't working label Sep 12, 2022
Repository owner deleted a comment from DaDaQingFeng Jan 31, 2024