I think it would be sensible to identify languages throughout the package using ISO 639-1 two-letter codes (e.g. en, fr, de, ...).
In particular, we should implement this for the Snowball stemmer in Python, which currently uses the full language names.
I am also wondering whether, in Rust, we should use a String for the language parameter or define an enum, e.g.:
use vtext::lang;
let stemmer = SnowballStemmerParams::default().lang(lang::en).build();
The latter is probably simpler, but it is a bit harder to extend: if someone designs a custom estimator for a language not in the list (e.g. some ancient, infrequently used language), they would have to create a new enum.
Also, just to be consistent, the parameter name would be "lang", not "language", right?
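One possible middle ground between the two options is an enum with a catch-all variant. The sketch below is only an illustration of that idea: the `Lang` type, its variants and the `code` method are hypothetical names and not part of the current vtext API.

```rust
use std::str::FromStr;

/// Hypothetical language identifier (assumption, not the vtext API).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Lang {
    En,
    Fr,
    De,
    /// Escape hatch for languages that are not in the built-in list.
    Other(String),
}

impl Lang {
    /// Returns the language code as a string slice.
    pub fn code(&self) -> &str {
        match self {
            Lang::En => "en",
            Lang::Fr => "fr",
            Lang::De => "de",
            Lang::Other(code) => code.as_str(),
        }
    }
}

impl FromStr for Lang {
    type Err = String;

    /// Parses a code case-insensitively, ignoring surrounding whitespace.
    /// Unknown codes are kept as `Lang::Other` rather than rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let code = s.trim().to_ascii_lowercase();
        match code.as_str() {
            "" => Err("empty language code".to_string()),
            "en" => Ok(Lang::En),
            "fr" => Ok(Lang::Fr),
            "de" => Ok(Lang::De),
            other => Ok(Lang::Other(other.to_string())),
        }
    }
}
```

With this shape, built-in languages get compile-time checking, while a custom estimator could still pass something like `Lang::Other("la".to_string())` without defining a new enum. Whether the extra variant is worth the added complexity is a separate question.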
From #78 (comment) by @joshlk