Word cloud for Tamil and English language #688

Open
HitheshSankararaman opened this issue Sep 13, 2022 · 5 comments

Comments

@HitheshSankararaman

HitheshSankararaman commented Sep 13, 2022


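from nltk import FreqDist
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# allwords: list of word tokens (Tamil and English) built earlier, not shown
mostcommon = FreqDist(allwords).most_common(100)
# str(mostcommon) turns the (word, count) pairs into one plain string, so
# WordCloud re-tokenises and re-counts it instead of using these counts
wordcloud = WordCloud(width=1600, height=800, background_color='white', font_path='Nirmala.ttf').generate(str(mostcommon))
fig = plt.figure(figsize=(30, 10), facecolor='white')
plt.imshow(wordcloud)  # , interpolation="bilinear")
plt.axis('off')
plt.title('Top 100 Most Common Words', fontsize=100)
# plt.tight_layout(pad=0)
plt.show()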

The above code generates the word cloud.

English words are rendered, but not at the correct size (higher-frequency words should get a bigger font size).
For example, "sir" should have the biggest font size, since it is repeated the most (see output.csv).
Tamil words are not rendered; only random letters appear.
How can I solve this?
The words and their frequencies are in output.csv.
output.csv

@amueller
Owner

If you want to give it a frequency distribution instead of the raw text, you need to use the generate_from_frequencies method.
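A minimal sketch of that suggestion, assuming `allwords` is a plain list of tokens. `top_frequencies` and `render_word_cloud` are hypothetical helper names; `WordCloud.generate_from_frequencies` takes a `{word: count}` mapping. nltk's `FreqDist` is a `Counter` subclass, so `dict(FreqDist(allwords).most_common(100))` should give the same mapping. Passing counts directly also skips WordCloud's own regex tokenisation, which relies on `\w`; Python's `\w` does not match Tamil vowel signs or the virama, which is a plausible reason only fragments of the Tamil words appeared in the first attempt.

```python
from collections import Counter


def top_frequencies(words, n=100):
    """Return the n most common words as a {word: count} dict.

    generate_from_frequencies expects a mapping like this, so the counts
    are used as-is instead of the text being re-tokenised and re-counted.
    """
    return dict(Counter(words).most_common(n))


def render_word_cloud(frequencies, font_path="Nirmala.ttf"):
    """Build a word cloud whose font sizes follow the given frequencies."""
    # Imported here so top_frequencies() works even without wordcloud installed.
    from wordcloud import WordCloud

    wc = WordCloud(width=1600, height=800, background_color="white",
                   font_path=font_path)
    return wc.generate_from_frequencies(frequencies)
```

Usage would then look like `plt.imshow(render_word_cloud(top_frequencies(allwords)))`, with the rest of the plotting code unchanged.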

@HitheshSankararaman
Author


I used generate_from_frequencies and got the following:

(Screenshot from 2022-09-19 14-28-12)

As you can see in the image, some Tamil words are not rendered properly, e.g. சொல்லுங்க.
How can I fix that?

@amueller
Owner

I'm sorry, I can't read Tamil, so I don't know what's wrong with the processing. The most likely cause is that the font does not support some characters.
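One way to test the font hypothesis is to compare the word's code points with the font's character map. This sketch assumes the fontTools package is installed (`TTFont.getBestCmap()` returns a code point to glyph-name mapping); the helper names are hypothetical. If nothing is reported missing for a word like சொல்லுங்க, the font covers every code point, and the fault more likely lies in text shaping (placing Tamil vowel signs correctly), which is the job of a layout engine such as raqm.

```python
import unicodedata


def missing_characters(text, supported_codepoints):
    """Return (char, Unicode name) pairs for characters the font lacks."""
    missing = sorted({ch for ch in text
                      if not ch.isspace() and ord(ch) not in supported_codepoints})
    return [(ch, unicodedata.name(ch, "<unnamed>")) for ch in missing]


def font_codepoints(font_path):
    """Return the set of code points mapped in the font's cmap table."""
    from fontTools.ttLib import TTFont  # requires the fontTools package

    font = TTFont(font_path)
    try:
        # getBestCmap() returns None if the font has no Unicode cmap.
        return set(font.getBestCmap() or {})
    finally:
        font.close()
```

For example, `missing_characters("சொல்லுங்க", font_codepoints("Nirmala.ttf"))` would list any uncovered characters by name.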

@HitheshSankararaman
Author

HitheshSankararaman commented Sep 20, 2022


I am using the Nirmala.ttf font. When plotting in matplotlib with Nirmala.ttf, all the words are rendered correctly with the help of mplcairo, cairo and raqm.

I only get this error when using wordcloud.

@amueller
Owner

matplotlib uses PIL/Pillow under the hood. Can you try reproducing with PIL?
