
v1.1.0

@aviiciii aviiciii released this 27 Jun 09:45
a2e65d9

Release Description: Word Filtering and Frequency Processing

This release adds filtering and cleanup for the words and frequencies stored in a CSV file. It removes non-Tamil words and strips trailing punctuation across a large dataset.

Key Features:

Word Filtering: The system filters out approximately 10,000 non-Tamil words from the input CSV file. By leveraging language-specific characteristics, the algorithm accurately identifies and removes words that do not belong to the Tamil language.

Trailing Punctuation Removal: Around 25,000 words in the dataset have trailing punctuation marks, which can distort subsequent analysis and natural language processing tasks. The system strips this trailing punctuation, leaving cleaner and more meaningful word representations.

Output Format: The processed words and their corresponding frequencies are saved in a CSV file, enhancing compatibility and ease of use for further analysis and integration into downstream applications.
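
The three features above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. It assumes the CSV holds two columns (word, frequency), that "Tamil" means every character falls in the Tamil Unicode block (U+0B80 to U+0BFF), and that words which become identical after cleaning should have their frequencies summed. The function names are hypothetical.

```python
import csv
import unicodedata
from collections import Counter

# Assumption: a word counts as Tamil if all its characters are in U+0B80..U+0BFF.
TAMIL_START, TAMIL_END = 0x0B80, 0x0BFF


def strip_trailing_punctuation(word):
    """Drop Unicode punctuation characters (category P*) from the end of a word."""
    end = len(word)
    while end > 0 and unicodedata.category(word[end - 1]).startswith("P"):
        end -= 1
    return word[:end]


def is_tamil(word):
    """True if the word is non-empty and every character is in the Tamil block."""
    return bool(word) and all(TAMIL_START <= ord(ch) <= TAMIL_END for ch in word)


def clean_frequencies(rows):
    """Strip trailing punctuation, drop non-Tamil words, and sum duplicate counts."""
    counts = Counter()
    for word, freq in rows:
        freq = str(freq).strip()
        if not freq.isdigit():
            continue  # skips a header row or a malformed count
        cleaned = strip_trailing_punctuation(word.strip())
        if is_tamil(cleaned):
            counts[cleaned] += int(freq)
    return counts


def process_csv(src, dst):
    """Read (word, frequency) rows from src and write cleaned rows to dst."""
    reader = csv.reader(src)
    counts = clean_frequencies(row[:2] for row in reader if len(row) >= 2)
    writer = csv.writer(dst)
    for word, freq in counts.most_common():
        writer.writerow([word, freq])
```

To run it on files, open both with encoding="utf-8" and newline="" and pass the file objects to process_csv. Note that this strict block check also rejects words containing digits, Latin letters, or zero-width joiners, which may or may not match the filter used in this release.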

Dataset:
No changes; same as https://github.com/aviiciii/tamil-word-frequency/releases/tag/v1.0.0