Numpy v2.0.0 breaks the ability to download models using spaCy #13528

Open
afogel opened this issue Jun 16, 2024 · 16 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

@afogel

afogel commented Jun 16, 2024

How to reproduce the behaviour

In my dockerfile, I run these commands:

FROM --platform=linux/amd64 python:3.12.4

RUN pip install --upgrade pip

RUN pip install torch --index-url https://download.pytorch.org/whl/cpu
RUN pip install spacy

RUN python -m spacy download en_core_web_lg

It returns the following error (and stacktrace):

2.519 Traceback (most recent call last):
2.519   File "<frozen runpy>", line 189, in _run_module_as_main
2.519   File "<frozen runpy>", line 148, in _get_module_details
2.519   File "<frozen runpy>", line 112, in _get_module_details
2.519   File "/usr/local/lib/python3.12/site-packages/spacy/__init__.py", line 6, in <module>
2.521     from .errors import setup_default_warnings
2.522   File "/usr/local/lib/python3.12/site-packages/spacy/errors.py", line 3, in <module>
2.522     from .compat import Literal
2.522   File "/usr/local/lib/python3.12/site-packages/spacy/compat.py", line 39, in <module>
2.522     from thinc.api import Optimizer  # noqa: F401
2.522     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2.522   File "/usr/local/lib/python3.12/site-packages/thinc/api.py", line 1, in <module>
2.522     from .backends import (
2.522   File "/usr/local/lib/python3.12/site-packages/thinc/backends/__init__.py", line 17, in <module>
2.522     from .cupy_ops import CupyOps
2.522   File "/usr/local/lib/python3.12/site-packages/thinc/backends/cupy_ops.py", line 16, in <module>
2.522     from .numpy_ops import NumpyOps
2.522   File "thinc/backends/numpy_ops.pyx", line 1, in init thinc.backends.numpy_ops
2.524 ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

Locking to the previous version of numpy will resolve this issue:

FROM --platform=linux/amd64 python:3.12.4

RUN pip install --upgrade pip

RUN pip install torch --index-url https://download.pytorch.org/whl/cpu
RUN pip install numpy==1.26.4 spacy

RUN python -m spacy download en_core_web_lg
@gborodin

+1

@svlandeg svlandeg added the bug Bugs and behaviour differing from documentation label Jun 17, 2024
@rustammdev

This solution helped, thank you.

@supert56

+1 I also had this problem. Thanks for posting the solution 👍

@nachthammer

nachthammer commented Jun 18, 2024

That solution does work, but I would still like to see a fix in the codebase itself. The issue is that the project's requirements.txt (just an assumption after a short look at the codebase) specifies the numpy version like this:

numpy>=1.15.0; python_version < "3.9"
numpy>=1.19.0; python_version >= "3.9"

I am a huge fan, in all of my projects, of always pinning dependencies even up to the patch version.

I would suggest a PR that looks like this:

numpy>=1.15.0,<2.0.0; python_version < "3.9"
numpy>=1.19.0,<2.0.0; python_version >= "3.9"

This at least caps the version below the next major release, which should always be the case anyway, since major versions can (and most likely will) contain breaking changes.
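To illustrate what the proposed upper bound changes, here is a minimal sketch of the range check that `numpy>=1.19.0,<2.0.0` expresses. The helper names `parse_version` and `allowed` are hypothetical, and the parsing only handles plain `X.Y.Z` release strings; it is a simplification, not pip's actual resolver logic.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain release string like '1.26.4' into (1, 26, 4).

    Simplified on purpose: pre-releases and other suffixes are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def allowed(version: str, lower: str = "1.19.0", upper: str = "2.0.0") -> bool:
    """Return True if lower <= version < upper, mirroring 'numpy>=1.19.0,<2.0.0'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Versions mentioned in this thread, checked against the proposed range.
for candidate in ("1.26.4", "2.0.0", "2.1.0"):
    print(candidate, "allowed" if allowed(candidate) else "excluded")
```

With only the lower bound from the current requirements, all three versions would be accepted, which is how a fresh install ended up with numpy 2.0.0.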

@afogel
Author

afogel commented Jun 18, 2024

@DoctorManhattan123 To clarify, the solution I posted is only meant to be a stopgap.

Ideally, all downstream consumers of numpy (including library maintainers) should complete the migration to leverage numpy 2.0.0. I imagine, given the size of the release, that this will take time.

The pinned version is to tide over people seeking to quickly fix their CI/CD or whatever impacted process is broken until a more robust solution is implemented in the affected codebases.

@bendennescma

This issue with thinc has been noted in explosion/thinc#939

mortii added a commit to mortii/anki-morphs that referenced this issue Jun 19, 2024
there is a spaCy bug that hopefully will be fixed soon: explosion/spaCy#13528
SoulHarsh007 added a commit to SoulHarsh007/gitAPy that referenced this issue Jun 19, 2024
spacy is not compatible with numpy 2.x, see:
explosion/spaCy#13528 and thus the CI fails
locking numpy to latest 1.x release fixes this problem

Signed-off-by: SoulHarsh007 <harsh.peshwani@outlook.com>
@lucas-mdsena

It helped. Thanks!

@cyriaka90

The new release 3.7.6 should resolve this :)

@ddayan

ddayan commented Sep 1, 2024

I'm still experiencing the same error on 3.7.6 with both numpy 2.1 and 2.0.0. As a sanity check, it works after downgrading to 1.26.4.

@CptCaptain

The issue still persists with the 3.7.6-release as it still depends on thinc<8.3, which is incompatible with numpy>=2.0

@bendennescma

bendennescma commented Sep 4, 2024

The issue still persists with the 3.7.6-release as it still depends on thinc<8.3, which is incompatible with numpy>=2.0

Yes it appears thinc v8.3.0 itself is the first release that is compatible with numpy>=2.0

The latest release before that (v8.2.5) explicitly pins numpy to <2.0.0
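The rule described in this comment can be written down as a small check. This is only a sketch based on the versions mentioned here (thinc 8.3.0 as the first release compatible with numpy>=2.0); the function names are hypothetical and the parsing assumes plain `X.Y.Z` version strings.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain release string like '8.2.5' into (8, 2, 5)."""
    return tuple(int(part) for part in text.split("."))


def numpy2_mismatch(numpy_version: str, thinc_version: str) -> bool:
    """Return True when numpy is 2.x or newer but thinc predates 8.3.0.

    Assumption taken from this thread: thinc 8.3.0 is the first release
    that works with numpy>=2.0.
    """
    numpy_is_v2 = parse_version(numpy_version) >= (2,)
    thinc_too_old = parse_version(thinc_version) < (8, 3, 0)
    return numpy_is_v2 and thinc_too_old


# Combinations discussed in this thread.
print(numpy2_mismatch("2.0.0", "8.2.5"))   # numpy 2 with an older thinc
print(numpy2_mismatch("1.26.4", "8.2.5"))  # the pinned workaround
print(numpy2_mismatch("2.1.0", "8.3.0"))   # numpy 2 with thinc 8.3.0
```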

@filbranden

See also #13607

@honnibal
Member

Sorry for the delay on this.

I want to release the upgraded numpy pin as version 3.8, because I don't want to drop support for Python 3.8 in a patch release. Upgrading to numpy v2 in a patch release is also questionable.

However, the model artifacts bake in the version of spaCy into the package. This means I need to retrain the models to do the v3.8 release, and the retraining is taking some time.

@afogel
Author

afogel commented Oct 29, 2024

@honnibal I think this was resolved by release 3.8.2, right? If so, can we close?

@yovelcohen

@afogel still happens to me on 3.8.2

@afogel
Author

afogel commented Dec 29, 2024

@yovelcohen so it looks like you need to explicitly lock to the latest thinc version in order to resolve the dependencies using poetry lock.

Right now, my pyproject.toml looks like this:

[tool.poetry.dependencies]
python = "3.12.5"
...
spacy = "3.8.2"
thinc = "8.3.3"
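To confirm which versions actually got installed after locking, one option is to read the package metadata rather than importing spaCy, since the import itself is what fails in the traceback above. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent.

    Reads distribution metadata only, so it does not trigger the failing
    'import spacy' / 'import thinc' code path.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the three packages involved in this issue.
for name in ("numpy", "thinc", "spacy"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this inside the affected environment (for example in the Docker build, before the `spacy download` step) shows whether numpy 2.x was installed alongside a thinc release older than 8.3.0.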
