Hi! Kudos to the author for an end-to-end pipeline for cleaning and filtering a large corpus. I was working with main_filtering.py and tried changing the parameter values in parameters_filtering.py, hoping to increase or decrease the number of documents being removed. However, I observe no changes.
My dataset is in English, so I used parameters_filtering_en. I experimented with the given values and with modifications to one or more conditions and cutoffs.
I also tried parameters_filtering_default, where I do observe a change in the documents being filtered out: the number differed from the one I got with parameters_filtering_en.
parameters_filtering_default produces an error. To work around it, I modified languages_id.py to account for "default" as a language, but used the flagged_words/stop_words of the English language.
Within either parameters_filtering_default or parameters_filtering_en, however, changing the parameter values changes neither the number of documents removed nor which documents are removed.
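For reference, here is a minimal sketch of the kind of check that shows whether a changed cutoff has any effect: run the filter twice with different values and diff the IDs of the documents kept. Everything in it is a hypothetical stand-in (`toy_filter`, `min_words`, `diff_filter_runs`, and the toy documents are not the repository's code or parameters); in practice the two ID lists would come from two runs of main_filtering.py.

```python
def diff_filter_runs(kept_a, kept_b):
    """Compare the document IDs kept by two filtering runs."""
    a, b = set(kept_a), set(kept_b)
    return {
        "only_in_a": sorted(a - b),  # kept by run A but removed by run B
        "only_in_b": sorted(b - a),  # kept by run B but removed by run A
        "identical": a == b,         # True means the parameter change had no effect
    }


def toy_filter(docs, min_words):
    """Hypothetical stand-in filter: keep documents with at least min_words words."""
    return [doc_id for doc_id, text in docs.items() if len(text.split()) >= min_words]


# Toy corpus, for illustration only.
docs = {
    "d1": "short text",
    "d2": "a somewhat longer piece of text",
    "d3": "one",
}

run_a = toy_filter(docs, min_words=1)  # permissive cutoff
run_b = toy_filter(docs, min_words=3)  # stricter cutoff
report = diff_filter_runs(run_a, run_b)
print(report)
```

If `identical` comes back `True` for two runs with clearly different cutoffs, the edited values are probably not the ones the filtering code actually reads.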
Kindly review the code and let me know of any solutions, or whether I'm missing something.
Thank You!