twitter30k_cleaned.csv is a cleaned version of the Twitter data. The following preprocessing steps have already been applied to it:
- lowercase all text
- expand contractions (e.g. don't -> do not)
- remove email addresses
- remove URLs
- remove special characters
- remove @mentions and #hashtags
- remove HTML tags
- reduce characters repeated more than twice (e.g. youuuuu -> you)
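The steps above can be approximated with regular expressions. The sketch below is not the script used to produce twitter30k_cleaned.csv. It relies on several assumptions: the small CONTRACTIONS map is illustrative only, a #hashtag is removed as a whole token rather than just its "#", and the steps run in a different order from the list. HTML, emails, URLs, mentions and hashtags are stripped before special characters so their markers (<, @, #) are still present when matched.

```python
import re

# Illustrative contraction map (an assumption); the real mapping is undocumented.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "i'm": "i am",
    "it's": "it is",
    "won't": "will not",
}

def clean_tweet(text: str) -> str:
    # Lowercase everything first so later patterns only need lowercase letters.
    text = text.lower()
    # Remove HTML tags before URLs, so URLs inside tag attributes go with the tag.
    text = re.sub(r"<[^>]+>", " ", text)
    # Remove email addresses.
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", " ", text)
    # Remove URLs (http://, https:// and www. forms).
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Remove @mentions and #hashtags as whole tokens (assumption).
    text = re.sub(r"[@#]\w+", " ", text)
    # Expand contractions, matching whole words only.
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # Collapse a letter repeated 3+ times to one: "youuuuu" -> "you".
    # Restricted to letters so numbers like "1000" are left alone.
    text = re.sub(r"([a-z])\1{2,}", r"\1", text)
    # Remove special characters: keep only letters, digits and whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())

print(clean_tweet("I'm SOOOO happy!!! <b>Check</b> https://t.co/abc @bob #fun"))
# -> i am so happy check
```

Note that "good" is unchanged because only runs of three or more identical letters are collapsed.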
To clean the data further, you could add a spelling-correction step.
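One way to add spelling correction is the edit-distance approach popularized by Peter Norvig's spelling corrector. A word that is not in a known vocabulary is replaced by the most frequent known word one edit (delete, transpose, replace, insert) away. This is a minimal sketch. WORD_COUNTS is a hypothetical toy frequency table, and in practice you would build it from a large English corpus.

```python
from collections import Counter

# Hypothetical toy word frequencies; build this from a real corpus in practice.
WORD_COUNTS = Counter({"happy": 50, "hello": 40, "world": 30, "you": 80, "the": 100})

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word: str) -> set[str]:
    """All strings exactly one edit away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct_word(word: str) -> str:
    """Keep known words; otherwise pick the most frequent known word one edit away."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return word  # no known neighbour: leave the word unchanged
    # Break frequency ties alphabetically so the result is deterministic.
    return max(candidates, key=lambda w: (WORD_COUNTS[w], w))

def correct_text(text: str) -> str:
    """Correct each whitespace-separated token of already-cleaned text."""
    return " ".join(correct_word(w) for w in text.split())

print(correct_text("helo wrold happy"))  # -> hello world happy
```

Because the input is already lowercased and stripped of special characters, a simple whitespace split is enough to tokenize it here.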