Release v1.1.0 · aviiciii/tamil-word-frequency

Release Description: Word Filtering and Frequency Processing

This release adds filtering and cleanup of the words and frequencies stored in the CSV file. It removes non-Tamil words and strips trailing punctuation from words across the large dataset.

Key Features:

Word Filtering: Approximately 10,000 non-Tamil words are filtered out of the input CSV file. Words are identified as non-Tamil from language-specific characteristics and then removed.
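The release notes do not say which language-specific characteristics are used. One common approach, shown here only as a sketch and not as this project's actual method, is to accept a word only if every character falls in the Unicode Tamil block (U+0B80 to U+0BFF). The function names `is_tamil_word` and `filter_tamil` are hypothetical.

```python
# Assumption: a word counts as Tamil when every character lies in the
# Unicode Tamil block. The project's real rule may differ.
TAMIL_START, TAMIL_END = 0x0B80, 0x0BFF


def is_tamil_word(word: str) -> bool:
    """Return True if word is non-empty and all characters are in the Tamil block."""
    return bool(word) and all(TAMIL_START <= ord(ch) <= TAMIL_END for ch in word)


def filter_tamil(words):
    """Keep only the words that pass is_tamil_word, preserving order."""
    return [w for w in words if is_tamil_word(w)]
```

Under this rule a mixed-script token such as a Tamil word with a Latin suffix is rejected entirely, which is a strict choice; a looser rule could accept words whose characters are mostly Tamil.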

Trailing Punctuation Removal: Around 25,000 words in the dataset have trailing punctuation marks, which can interfere with subsequent analysis and natural language processing tasks. This release strips the trailing punctuation, leaving cleaner word forms.
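As an illustration of trailing-punctuation stripping, the sketch below removes characters from the end of a word for as long as their Unicode general category is a punctuation category (one starting with "P"). This is an assumed definition of "punctuation"; the release does not state which characters it treats as such, and `strip_trailing_punctuation` is a hypothetical name.

```python
import unicodedata


def strip_trailing_punctuation(word: str) -> str:
    """Drop punctuation characters from the end of word, leaving the rest intact."""
    end = len(word)
    # Unicode punctuation categories all start with "P" (Po, Pd, Ps, Pe, ...).
    # Tamil vowel signs and the pulli are marks (Mc/Mn), so they are kept.
    while end > 0 and unicodedata.category(word[end - 1]).startswith("P"):
        end -= 1
    return word[:end]
```

Punctuation inside a word is left alone, and a token made only of punctuation becomes an empty string, which a caller would probably want to discard.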

Output Format: The processed words and their frequencies are saved to a CSV file, which keeps the output easy to load for further analysis and downstream applications.
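A minimal sketch of writing such a file with Python's standard `csv` module follows. The column names `word` and `frequency`, the ordering by descending count, and the merging of counts for identical words (which can arise once punctuation is stripped) are all assumptions made for illustration; the release does not document the exact layout.

```python
import csv
import io
from collections import Counter


def merge_counts(rows):
    """Sum the counts of identical words from (word, count) pairs."""
    totals = Counter()
    for word, count in rows:
        totals[word] += int(count)
    return totals


def write_frequencies(totals, out):
    """Write a header row, then one word,frequency row per word, highest count first."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["word", "frequency"])  # hypothetical column names
    for word, count in totals.most_common():
        writer.writerow([word, count])
```

To write to disk, pass a file opened with `open(path, "w", encoding="utf-8", newline="")` as `out`; an `io.StringIO()` buffer works the same way for quick checks.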

Dataset:
No changes; same as https://github.com/aviiciii/tamil-word-frequency/releases/tag/v1.0.0
