v1.1.0
Release Description: Word Filtering and Frequency Processing
This release adds filtering and cleanup of the words and frequencies stored in the input CSV file. It focuses on two steps applied to a large dataset: filtering out non-Tamil words and removing trailing punctuation.
Key Features:
Word Filtering: The system filters out approximately 10,000 non-Tamil words from the input CSV file. By leveraging language-specific characteristics, the algorithm accurately identifies and removes words that do not belong to the Tamil language.
Trailing Punctuation Removal: Around 25,000 words in the dataset have trailing punctuation marks, which can interfere with subsequent analysis and natural language processing tasks. The system strips this trailing punctuation, leaving cleaner word forms.
Output Format: The processed words and their corresponding frequencies are saved in a CSV file, so the results can be loaded directly for further analysis or used in downstream applications.
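
The three features above can be sketched roughly as follows. This is a minimal illustration, not the code shipped in this release. It assumes each CSV row is a `word,frequency` pair with no header row, that a word counts as Tamil when every character falls in the Unicode Tamil block (U+0B80 to U+0BFF), and that frequencies are summed when two entries become identical after punctuation is stripped. All function names are hypothetical.

```python
import csv
import unicodedata
from collections import Counter

# Assumption: "Tamil" means every character lies in the Unicode Tamil block.
TAMIL_START, TAMIL_END = 0x0B80, 0x0BFF


def strip_trailing_punctuation(word: str) -> str:
    """Drop characters in a Unicode punctuation category (P*) from the end of a word."""
    end = len(word)
    while end > 0 and unicodedata.category(word[end - 1]).startswith("P"):
        end -= 1
    return word[:end]


def is_tamil(word: str) -> bool:
    """Return True if the word is non-empty and made only of Tamil-block characters."""
    return bool(word) and all(TAMIL_START <= ord(ch) <= TAMIL_END for ch in word)


def clean_rows(rows) -> Counter:
    """Take (word, frequency) pairs and return cleaned Tamil words with summed counts."""
    counts = Counter()
    for word, freq in rows:
        cleaned = strip_trailing_punctuation(word.strip())
        if is_tamil(cleaned):
            # Assumption: entries that collapse to the same word are merged.
            counts[cleaned] += int(freq)
    return counts


def process_csv(src, dst) -> None:
    """Read word,frequency rows from src and write cleaned rows to dst, most frequent first."""
    writer = csv.writer(dst)
    for word, count in clean_rows(csv.reader(src)).most_common():
        writer.writerow([word, count])
```

To run it on files, open both with `encoding="utf-8"` and `newline=""`, then call `process_csv(infile, outfile)`. The strict block check also rejects words that mix Tamil with digits or Latin letters, which may or may not match the behaviour of this release.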
Dataset:
No changes; same as https://github.com/aviiciii/tamil-word-frequency/releases/tag/v1.0.0