Encoding Challenges with wordcloud #687

Open
MarcViebahn opened this issue Aug 25, 2022 · 1 comment

Comments

@MarcViebahn

Hi there,

Being pretty new to programming and Python, I am currently wrestling with encodings in order to get the German umlauts to display correctly within a WordCloud.

Description

If I feed wordcloud an example text like text = "Wir mögen Möglichkeiten.", the "ö" characters are shown correctly within the word cloud.
I have a SQLite database (UTF-8) with the text of 20,000 articles. When I read all the articles and save them in one text file with UTF-8 encoding, I can print the correct text at the prompt and open the correct text in Notepad++ or Word with UTF-8 encoding.
When I use the same text file for the word cloud, all German umlauts are lost, and every word containing an umlaut has a blank in place of the umlaut.
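
For reference, the setup described above might look roughly like the following. This is only a sketch and not necessarily the code used in the report: the file name `articles.txt`, the output name, and the helpers `load_articles` and `count_umlauts` are assumptions made for illustration. It reads the file with an explicit UTF-8 encoding and counts the umlauts before the text reaches wordcloud, which helps to check whether they are still present at that point.

```python
from pathlib import Path


def load_articles(path: Path) -> str:
    """Read the combined article text, decoding it explicitly as UTF-8."""
    # Passing the encoding avoids relying on the platform default,
    # which is not always UTF-8 (for example on some Windows setups).
    return path.read_text(encoding="utf-8")


def count_umlauts(text: str) -> int:
    """Count precomposed German umlauts and ß as a quick sanity check."""
    return sum(text.count(ch) for ch in "äöüÄÖÜß")


if __name__ == "__main__":
    # Third-party package (pip install wordcloud); imported here so the
    # helpers above can be used without it.
    from wordcloud import WordCloud

    text = load_articles(Path("articles.txt"))  # hypothetical file name
    print("umlauts found:", count_umlauts(text))
    WordCloud().generate(text).to_file("articles_cloud.png")
```

If the count is zero although the text looks right in an editor, the characters are probably not stored the way the check expects, and the problem lies before the rendering step.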

Expected Results

Actual Results

Versions

I am using Python 3.8.10 and tested the behaviour on Linux Mint, macOS and Windows.

I guess there will be an easy explanation, but unfortunately I am totally lost.
Thank you very much for any hint in the right direction!

Marc

@xiaoyingv

I suspect the problem is the font. Either you are passing a font file (a file with the extension ".ttc" or ".ttf"), or you have not specified one, in which case the default font is probably being used. The default font may not contain the German diacritical marks you need, which would cause the program to encounter symbols it does not recognize and output "None".

My suggestion: find a font file that contains the diacritical marks you need, that is, one that can display your text correctly. Such a file is usually a ".ttc" or ".ttf" file, which you can find with a search engine. On Windows, you can find font files in the "C:\Windows\Fonts" directory.

So, how do you use the chosen font in Python? First, place the font file in your project folder. Then, when you create a WordCloud object, pass the path to the font file as an argument, for example: wordcloud.WordCloud(font_path=font_path), where font_path is the path to the font file you want to use.
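
A minimal sketch of that suggestion could look like this. The font locations and the helper names `first_existing_font` and `build_cloud` are hypothetical and only serve as illustration; `font_path` is the `WordCloud` parameter mentioned above. Whether a particular font actually covers the umlauts still has to be checked by looking at the result.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_existing_font(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate path that exists on disk, else None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


def build_cloud(text: str, font_path: Optional[str]):
    """Generate a word cloud, using font_path when one was found."""
    # Third-party package (pip install wordcloud); imported here so the
    # helper above stays usable without it.
    from wordcloud import WordCloud

    # font_path=None lets the library fall back to its default font.
    return WordCloud(font_path=font_path).generate(text)


if __name__ == "__main__":
    # Hypothetical locations; replace them with a font that is present.
    font = first_existing_font([
        "fonts/DejaVuSans.ttf",         # copied into the project folder
        r"C:\Windows\Fonts\arial.ttf",  # typical Windows location
    ])
    cloud = build_cloud("Wir mögen Möglichkeiten.", font)
    cloud.to_file("wordcloud.png")
```

Checking that the file exists first makes a wrong path show up as a plain `None` instead of an error raised from inside the font loading code.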

That's my suggestion. If you have already solved this problem, congratulations!

xiaoyingv
