spaCy-models: Please Consider Distributing via PyPi #5967
Comments
Unfortunately we've discussed this at length and there's just no way to make it happen. Some packages are quite large, up to 1 GB of data, and these cannot be served over PyPi. We don't want to have two mechanisms for distributing the models. We thought about running our own PyPi server, but this didn't work either, because it introduces security problems if someone nips in and registers the package name on the main PyPi index. If we instead preregistered those packages, users would get confusing errors if they forgot to specify our index server. So we think the current solution is the best we can do. The models actually are pip packages: they're just served from GitHub release assets. So you can totally add them to your internal PyPi server and use pip from there.
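A minimal sketch of the suggestion above: fetch a model sdist from its GitHub release so it can be pushed to an internal index. The URL layout, the helper names, and the model name and version in the usage note are assumptions for illustration and are not stated in this thread; check the actual asset links on the spacy-models releases page before relying on them.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumed URL layout for spacy-models release assets; verify on the releases page.
RELEASE_URL = (
    "https://github.com/explosion/spacy-models/releases/download/"
    "{name}-{version}/{name}-{version}.tar.gz"
)


def release_asset_url(name: str, version: str) -> str:
    """Build the (assumed) download URL for one model sdist."""
    return RELEASE_URL.format(name=name, version=version)


def download_sdist(name: str, version: str, dest_dir: str = "dist") -> Path:
    """Download the sdist into dest_dir and return the local file path."""
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{name}-{version}.tar.gz"
    # Stream the response to disk so large models are not held in memory.
    with urllib.request.urlopen(release_asset_url(name, version)) as response:
        with open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target
```

For example, `download_sdist("en_core_web_sm", "2.3.1")` (a hypothetical name and version) would leave a file in `dist/` that could then be uploaded with something like `twine upload --repository-url <internal index URL> dist/*` and installed on the build hosts with `pip install --index-url <internal index URL> <package name>`.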
Hi @honnibal, what if you only had wrappers on PyPI that download the models from GitHub on setup?
Having the packages download the models from github wouldn't help with the security restrictions mentioned above. The model packages are standard pip packages with longer names like In contrast, We've realized that the symlinks cause a number of headaches, so we don't recommend them anymore and are planning to remove them in spacy v3. Then you will only be able to use the full package names like |
Understood--thanks for the explanation of some of the alternatives considered.
This is likely what we'll investigate, given PyPi isn't an option. I may have missed this in the docs, but is there a way to programmatically generate the sdists bundled in the GitHub releases, such that we could dynamically build and upload the package to our internal PyPi server when a new tag is published? Or would we need to download the release and upload the artifact directly?
@jamescurtin I'm not sure I understand the question well. I mean, you'll always be able to automate these things, regardless of which convenience scripts spaCy provides? And I'm sure you know that, so I must be missing what you're actually asking. Maybe the only thing you might not be aware of is the |
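One way the automation asked about above could look, as a sketch only: poll the GitHub REST API for the repository's releases and collect the `.tar.gz` assets that the internal index does not have yet. The `tag_name`, `assets`, `name`, and `browser_download_url` fields come from GitHub's releases payload; the function names and the idea of tracking mirrored filenames in a set are assumptions for illustration, not something spaCy provides.

```python
import json
import urllib.request

RELEASES_API = "https://api.github.com/repos/explosion/spacy-models/releases"


def fetch_releases(url: str = RELEASES_API) -> list:
    """Fetch one page of releases from the GitHub REST API."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def new_sdists(releases: list, already_mirrored: set) -> list:
    """Return (tag, filename, download_url) for sdists not mirrored yet."""
    found = []
    for release in releases:
        for asset in release.get("assets", []):
            filename = asset["name"]
            # Keep only source distributions that the internal index lacks.
            if filename.endswith(".tar.gz") and filename not in already_mirrored:
                found.append(
                    (release["tag_name"], filename, asset["browser_download_url"])
                )
    return found
```

A scheduled job could call `new_sdists(fetch_releases(), mirrored)`, download each returned URL, and upload the files to the internal index. Only the first page of results is fetched here; pagination and authentication are left out to keep the sketch short.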
This issue has been automatically closed because it was answered and there was no follow-up discussion.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Feature Summary
Release spaCy models via PyPi
Feature Description
We use spaCy in an enterprise setting. For security, the hosts that build production docker images cannot connect to the external internet. This introduces complexity when trying to install packages like spacy-models, where the recommended installation method is to either install from a Github release (requiring a connection to github.com) or to vendor the package (avoids networking issues, but bloats individual repos).
Publishing the models through PyPi would be beneficial in that spacy-models would no longer be installed differently than other packages & would also allow us to benefit from the security that PyPi provides (e.g. ability to mirror the package index on our internal network, assurance that package versions are immutable, etc.).
Perhaps you could start with adding the small models to PyPi, as they would not run into default package size restrictions. PyPi allows package authors to file a request increasing the maximum allowable size of the package: the increased limits would easily support the medium models. There is also precedent for setting size limits that would allow for distributing the large models as well.