In this project, we focus mainly on processing Cantonese corpora. We explore topic modeling and sentiment analysis of Cantonese text using real datasets from Hong Kong. We also apply several methods to predict sentiment, then compare and analyze their results.
The folder '/Data' contains the training and testing datasets, as well as the insurance reviews.
All the code is in the folder '/Code', which is split into a Topic Modeling part and a Sentiment Analysis part.
The results are in the folder '/Result'. This folder also contains the sentiment predictions for the insurance reviews, produced with the VADER and TextBlob methods.