Emojis in MySQL/MariaDB #20
Comments
I think so? I did the same collation on my MySQL db, but I haven't noticed any messed up emojis. Can you give me an example of one of the tweets you noticed was broken? I'd like to check mine.
Sure, here are a couple of examples: author: 4EVER_SUSAN, author: 6DRUZ. Really wish these tweets had IDs.
Ah yeah, I've got the same problem... It might be an issue with the original data. Looks like it's the same in the CSV? The first tweet is line 84788 in the first CSV. It's just the Unicode replacement character there too.
This isn't a MySQL or MariaDB issue; I'm facing the same problem with PostgreSQL. The characters are malformed in the source data.
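One way to confirm that the damage is already in the source data, before any database is involved, is to scan the CSV text for the Unicode replacement character (U+FFFD). The following is a minimal sketch, not something taken from this repository: the helper name `find_replacement_lines` and the file name in the usage note are hypothetical.

```python
# Hypothetical helper: report which lines of a text source already contain
# the Unicode replacement character (U+FFFD), i.e. data that was damaged
# before it ever reached MySQL, MariaDB, or PostgreSQL.
REPLACEMENT_CHAR = "\ufffd"


def find_replacement_lines(lines):
    """Return (line_number, count) pairs for lines containing U+FFFD.

    `lines` is any iterable of strings, for example an open text file.
    Line numbers are 1-based, matching what a text editor would show.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        count = line.count(REPLACEMENT_CHAR)
        if count:
            hits.append((number, count))
    return hits


if __name__ == "__main__":
    sample = [
        "author,content\n",
        "someone,all good here\n",
        "someone_else,broken \ufffd\ufffd emoji\n",
    ]
    for number, count in find_replacement_lines(sample):
        print(f"line {number}: {count} replacement character(s)")
```

To run it against a real file, open the file with an explicit encoding, for example `open("tweets.csv", encoding="utf-8", newline="")` (the file name here is a placeholder), and pass the file object to `find_replacement_lines`. If the hits appear in the raw CSV, no database charset or collation setting will bring the original emoji back.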
Thanks for the help tracking the issue down, @chrisgherbert. The emoji in question is 👏🏻 (Clapping Hands: Light Skin Tone). You can figure that out with a hexdump on the chars from the tweet. That actually byte code Looking at the bytes in the stream I see,
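The hexdump approach mentioned above can be sketched in Python. This is an illustrative helper, not the commenter's actual command or output: the function name `utf8_hexdump` is hypothetical, and the sample string is just the intact 👏🏻 sequence (U+1F44F followed by the skin-tone modifier U+1F3FB) next to a replacement character for comparison.

```python
# Hypothetical helper: show each character's code point and UTF-8 bytes,
# similar in spirit to running a hexdump over the tweet text.
def utf8_hexdump(text):
    """Return a list of (code_point, utf8_hex) strings, one per character."""
    rows = []
    for char in text:
        code_point = f"U+{ord(char):04X}"
        utf8_hex = char.encode("utf-8").hex(" ").upper()
        rows.append((code_point, utf8_hex))
    return rows


if __name__ == "__main__":
    # Clapping Hands + Light Skin Tone modifier: two code points, 4 bytes each.
    clapping_light = "\U0001F44F\U0001F3FB"
    for code_point, utf8_hex in utf8_hexdump(clapping_light):
        print(code_point, utf8_hex)

    # A replacement character, for comparison, is a single 3-byte sequence.
    for code_point, utf8_hex in utf8_hexdump("\ufffd"):
        print(code_point, utf8_hex)
```

An intact emoji outside the Basic Multilingual Plane always takes four bytes per code point in UTF-8, which is why MySQL and MariaDB need `utf8mb4` for it. A run of `EF BF BD` bytes where an emoji should be indicates the original bytes were already lost.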
An important thing if you're struggling with parsing these Unicode characters: in my fork of the repository they're encoded as
Has anyone gotten the emojis to work when the data is loaded into a MySQL or MariaDB database? I'm using utf8mb4 encoding and utf8mb4_unicode_ci collation, but only a small portion of the emojis are displaying properly for me.