Considerations for language model inclusion in default package or download them later #298
Comments
Where are these models hosted?
@heytitle The models are self-hosted. See https://github.com/PyThaiNLP/pythainlp-corpus/blob/master/db.json
Model files are currently hosted on either Dropbox or GitHub.
Today, the Thai Named-Entity Recognition model and the Thai Romanization model are hosted in pythainlp-corpus on GitHub.
Hi, this would be quite important for me, as I work on secure servers with no Internet access and no permission to write to arbitrary paths. I have also noticed a weird behavior: even if I do not need to download any model or corpus, the library currently requires writing an empty In the short term, could this be removed? Addition: my issue is also related to #475. Thanks, Alex
I added a customizable path. You can customize it by downloading the source code, changing the path, and then installing the package from source. 82b8df9
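For read-only environments like the one described above, a rough sketch of an alternative that avoids editing the source: read the data directory from an environment variable and create it only when something is about to be written. The variable name `EXAMPLE_DATA_DIR`, the default `~/example-data`, and both function names are hypothetical, chosen for illustration, and are not PyThaiNLP's actual API.

```python
import os
from pathlib import Path

ENV_VAR = "EXAMPLE_DATA_DIR"  # hypothetical variable name, for illustration only


def get_data_dir() -> Path:
    """Resolve the data directory without touching the filesystem."""
    custom = os.environ.get(ENV_VAR)
    if custom:
        return Path(custom).expanduser()
    # Hypothetical default location under the user's home directory.
    return Path.home() / "example-data"


def ensure_data_dir() -> Path:
    """Create the directory lazily, only when a file is about to be written."""
    path = get_data_dir()
    path.mkdir(parents=True, exist_ok=True)
    return path
```

With this split, importing the library only calls `get_data_dir()` and writes nothing; `ensure_data_dir()` runs only on the download path.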
Today, many of our models come from the Hugging Face Hub, which uses the same approach as ours. I think it is still the best method for downloading models.
This issue works as a note on the sizes of the different language models PyThaiNLP currently uses.
To include or not include: Pros and cons
Use pip to download language models
Optionally, we can also consider creating new packages, uploading them to PyPI, and using pip to facilitate downloads.
Users can do something like
pip install pythainlp-models-pos
or
pip install pythainlp-models[ner]
or
pip install pythainlp-models[all]
during their environment setup, and then never have to worry about models being downloaded at runtime. This way, we can use PyPI as our data host and also benefit from any proxies and caches that CI platforms or ISPs may have for PyPI. This can also be more secure than our self-managed system.
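On the library side, this could look roughly like the following minimal sketch: check whether an optional model package is installed, and fall back to a runtime download only when it is not. The package name `pythainlp_models_pos` and the functions `is_installed` and `model_source` are hypothetical, invented for illustration; they are not part of PyThaiNLP.

```python
from importlib.util import find_spec


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    return find_spec(package) is not None


def model_source(package: str) -> str:
    """Decide where a model should come from.

    Returns "package" when the pip-installed model package is present,
    otherwise "download" to signal that a runtime download would be needed.
    """
    return "package" if is_installed(package) else "download"


if __name__ == "__main__":
    # "pythainlp_models_pos" is a hypothetical name used only for illustration.
    print(model_source("pythainlp_models_pos"))
```

Because the check uses `find_spec` rather than an actual import, it stays cheap and does not load the model package until it is needed.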
PyPI's standard package size limit is 60MB, but a higher limit can be requested.
Size and Hosting
(clearly, we need some standard naming convention here as well)
Training data and training scripts
See #344
Model card
Related to this, in terms of model description, see #471
Model auto-download
See the discussion about pythainlp.corpus.get_corpus_path() at #385