sqlite3.OperationalError: Could not decode to UTF-8 column 'data' with text #44
Comments
You can place the line as follows:
    if os.path.isfile(contact_db):
        with sqlite3.connect(contact_db) as db:
            db.text_factory = lambda b: b.decode(errors='ignore')  # here
            contacts(db, data)
    if os.path.isfile(msg_db):
        with sqlite3.connect(msg_db) as db:
            db.text_factory = lambda b: b.decode(errors='ignore')  # and here
            messages(db, data)
            media(db, data, media_folder)
            vcard(db, data)
    create_html(data, output_folder)
Also, could you provide a stack trace for this error and the version of the exporter you are using? |
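For anyone who wants to see what `text_factory` changes in isolation, here is a minimal sketch against an in-memory database. The table layout and the sample bytes are made up for illustration and are not taken from a real msgstore.db; it only shows the mechanism, not whether it fixes this particular database.

```python
import sqlite3

# Hypothetical sample: "Hi" followed by a byte that is never valid in UTF-8.
BAD_TEXT = b"Hi\xff"

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE message (_id INTEGER PRIMARY KEY, data TEXT)")
# CAST(... AS TEXT) stores the raw bytes as TEXT without validating them.
db.execute("INSERT INTO message (data) VALUES (CAST(? AS TEXT))", (BAD_TEXT,))


def read_all(conn):
    """Read every value of the data column using the connection's text_factory."""
    return [row[0] for row in conn.execute("SELECT data FROM message")]


# With the default text_factory (str), sqlite3 refuses to decode the value.
try:
    read_all(db)
    strict_error = None
except sqlite3.OperationalError as exc:
    strict_error = str(exc)
print("default text_factory:", ascii(strict_error))

# With a lenient factory, undecodable bytes are silently dropped instead.
db.text_factory = lambda b: b.decode(errors="ignore")
lenient_rows = read_all(db)
print("lenient text_factory:", lenient_rows)
```

Note that `errors='ignore'` discards the bad bytes, so the exported message may be missing characters.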
Oh thanks, sorry for not providing it earlier.
|
No problem. Do you know what language the message is in and what encoding it uses? |
I am using the latest one. |
I just ran grep over all of the messages, and after looking at the results I think the culprit is most likely this emoji, "💪", repeated four times. |
Adding this line didn't help.
|
I tried searching for it with sqlitebrowser in the data column of the message table, but I couldn't find it. |
Wait, I think I have found it! |
Replace
    i += 1  # around line 377 of extract.py
    if i % 1000 == 0:
        print(f"Gathering messages...({i}/{total_row_number})", end="\r")
    content = c.fetchone()
with
    i += 1
    print(content["_id"])
    content = c.fetchone()
When you run that, the row after the last printed ID should be the one that triggers the error. |
This is the HEX and ASCII: |
Also, can you figure out the actual message shown in WhatsApp? Btw, just a reminder, do not post it publicly if it is a sensitive message. |
You see, I have lost the WhatsApp encryption key; I was only able to get the key file, and that was before I deleted all of the WhatsApp backups. |
Hey, I am worried now, what is that HEX? 😰 |
Hey so I did this: echo 4b656570206974207570f09f9290f09f9290f09f9290f09f8cb9f09f8cb9f2a3b0bdf0b7a0bdf0b7a0bdedb29e | xxd -r -p And got this: Keep it up💐💐💐🌹🌹 |
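The same hex can also be checked in Python, which reports where strict UTF-8 decoding stops. This is only a sketch run against the hex string posted in this comment. It suggests the last three bytes (`ed b2 9e`) are the part that is not valid UTF-8; they look like a lone UTF-16 surrogate written out as bytes, which `xxd` passes through without complaint but a strict decoder rejects.

```python
# Hex copied from the comment above.
HEX = (
    "4b656570206974207570"          # "Keep it up"
    "f09f9290f09f9290f09f9290"      # three bouquet emoji
    "f09f8cb9f09f8cb9"              # two rose emoji
    "f2a3b0bdf0b7a0bdf0b7a0bd"      # three 4-byte sequences
    "edb29e"                        # trailing 3 bytes
)
data = bytes.fromhex(HEX)

# Strict decoding is what sqlite3 attempts by default.
try:
    data.decode("utf-8")
    bad_offset = None
except UnicodeDecodeError as exc:
    bad_offset = exc.start
print("strict decoding fails at byte offset:", bad_offset)
print("undecodable tail:", data[bad_offset:].hex())

# errors="ignore" drops the bytes that cannot be decoded.
cleaned = data.decode("utf-8", errors="ignore")
print("with errors='ignore':", ascii(cleaned))

# errors="surrogatepass" keeps the tail as a lone surrogate code point.
with_surrogate = data.decode("utf-8", errors="surrogatepass")
print("last code point:", ascii(with_surrogate[-1]))
```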
Is there a way to ignore this error and let it run? |
I didn't mean to scare you. To be honest, I hadn't looked into the hex before you posted the result.
The easiest way is to use a try/except block and skip the message, but that is not an ideal solution. I am still trying to reproduce it: the exporter did fail after I inserted a row with your binary content, but I got a different error than yours.
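As a rough illustration of the "skip it" idea, here is a sketch that is not the exporter's actual code. Instead of catching `sqlite3.OperationalError` around `fetchone()`, it swaps in a different technique: it reads the column as raw bytes and decodes each row itself, so a bad row can be skipped without relying on how the cursor behaves after an error. The helper name `iter_decodable_rows` and the two-column table are assumptions made for the example.

```python
import sqlite3


def iter_decodable_rows(conn, query):
    """Yield (row_id, text) pairs, skipping rows whose text is not valid UTF-8.

    Hypothetical helper. It temporarily switches the connection to return
    TEXT values as bytes, which also affects other queries on the same
    connection while the generator is running.
    """
    previous_factory = conn.text_factory
    conn.text_factory = bytes
    try:
        for row_id, raw in conn.execute(query):
            if raw is None:
                yield row_id, None
                continue
            try:
                yield row_id, raw.decode("utf-8")
            except UnicodeDecodeError:
                print(f"Skipping message {row_id}: text is not valid UTF-8")
    finally:
        conn.text_factory = previous_factory


# Demo data: the middle row contains a byte that is invalid in UTF-8.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE message (_id INTEGER PRIMARY KEY, text_data TEXT)")
db.executemany(
    "INSERT INTO message (_id, text_data) VALUES (?, CAST(? AS TEXT))",
    [(1, b"hello"), (2, b"bad\xff"), (3, b"world")],
)

results = list(
    iter_decodable_rows(db, "SELECT _id, text_data FROM message ORDER BY _id")
)
print(results)
```

The trade-off is the same as with any skip: the affected message is missing from the export.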
|
Yes there is! |
Hah, there is no difference then. |
They can't be displayed in text or RTL mode. Try this and post the output: SELECT quote(text_data) from message WHERE _id=<the message id>; |
If you are using Linux/WSL: $ sqlite3 msgstore.db
> SELECT quote(text_data) from message WHERE _id=<the message id>; |
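A note on what to expect from that query: `quote()` returns a quoted string literal for a TEXT value and only produces an `X'...'` hex literal for a BLOB, whereas `hex()` always returns hex. The sketch below shows the three variants from Python on a made-up row; the table here is a stand-in with only the two columns mentioned in this thread.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE message (_id INTEGER PRIMARY KEY, text_data TEXT)")
db.execute("INSERT INTO message (_id, text_data) VALUES (1, 'Keep it up')")

as_text, as_blob, as_hex = db.execute(
    """
    SELECT quote(text_data),                -- TEXT in, quoted text out
           quote(CAST(text_data AS BLOB)),  -- BLOB in, X'..' literal out
           hex(text_data)                   -- always plain hex
    FROM message
    WHERE _id = ?
    """,
    (1,),
).fetchone()

print(as_text)
print(as_blob)
print(as_hex)
```

So if the goal is to compare raw bytes, `hex(text_data)` or `quote(CAST(text_data AS BLOB))` is probably the safer choice.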
'Keep it up💐💐💐🌹🌹���' |
I didn't get a hex. What version of sqlite3 command line are you using? |
Why is there no id in your command? |
3.40.1
Look closer and you will find that it is just redacted😂. It should be an integer anyway. |
Oh. Why would you redact that?! |
I guess that means we have different data in that cell. |
Why are we getting hex anyway? The cell should contain the message, not the hex, right? |
Can you put the raw data into the cell? |
Yeah, I am trying to understand what raw data you got and how I can put that raw data into the cell. Hex is just one possible way, but by comparing the hex we have already shown that we have different data inside the cell. The quote() function does not necessarily output hex; it can also return plain quoted text. What about this: |
But the quote(text_data) = part is also going to be in there, so the hex won't match. Would it? |
And can you please write that Python try/except block for now? |
It will also have the quotes in the hex, right? Wouldn't that modify the hex? |
Implemented the workaround for the problem in 9ac8839. |
It doesn't necessarily have to be hex. |
Does the workaround skip it or does it try to extract it?
Did you put the hex or the text into the cell? |
The workaround skips the message.
I put the hex in the hex area of the cell. |
Could it be that it didn't get translated? Why don't you translate it externally and put it in as text?
|
This is the output: |
This is the plain hex, |
If you don't mind, you can send me your database. It would be the easiest way to debug. |
No can do. |
You can say Hi to me in the Matrix room and send the database to me through Matrix or send me an email with the link to the database at hello [at] knugi.com. You can find my PGP key at https://keyserver1.pgp.com. |
Isn't there a way to extract a specific row from the database? |
Can you send me an INSERT statement for that row? |
Pardon my ignorance but how do I do that? |
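One possible way to do that from Python, sketched under the assumption that only the `_id` and `text_data` columns matter (the real message table has more columns, and `insert_statement_for` is a made-up helper name). It reads the cell as hex so the exact bytes survive, and emits an INSERT that casts them back to TEXT. Keep in mind that the statement contains the message content, so treat it as privately as the database itself.

```python
import sqlite3


def insert_statement_for(conn, row_id):
    """Build an INSERT that reproduces the exact bytes of one message's text."""
    row = conn.execute(
        "SELECT hex(text_data) FROM message WHERE _id = ?", (row_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no message with _id={row_id}")
    # hex() only yields 0-9 and A-F, so it is safe to embed in the statement.
    return (
        "INSERT INTO message (_id, text_data) "
        f"VALUES ({int(row_id)}, CAST(X'{row[0]}' AS TEXT));"
    )


SCHEMA = "CREATE TABLE message (_id INTEGER PRIMARY KEY, text_data TEXT)"

# Source database with one row whose text is not valid UTF-8 (sample data).
source = sqlite3.connect(":memory:")
source.execute(SCHEMA)
source.execute(
    "INSERT INTO message (_id, text_data) VALUES (7, CAST(? AS TEXT))",
    (b"hi\xff",),
)

statement = insert_statement_for(source, 7)
print(statement)

# Replaying the statement elsewhere recreates the same bytes.
copy = sqlite3.connect(":memory:")
copy.execute(SCHEMA)
copy.execute(statement)
copied_hex = copy.execute(
    "SELECT hex(text_data) FROM message WHERE _id = 7"
).fetchone()[0]
print(copied_hex)
```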
Mailed you. |
The workaround is released in 0.9.5. |
Hello,
I suspect that this is because some of the data in the database is not valid UTF-8.
I got around this in another program that also exports WhatsApp messages by adding the following code:
db.text_factory = lambda b: b.decode(errors = 'ignore')
after this code which was already there,
db = sqlite3.connect(file_path)
But when I tried to do the same in this program,
I couldn't find that line in the extract Python files.
I instead found,
this....
Which is beyond my understanding.