
Dependency version check fails for tokenizers #11107

Closed
1 of 2 tasks
guyrosin opened this issue Apr 7, 2021 · 6 comments

guyrosin commented Apr 7, 2021

Environment info

  • transformers version: 4.5.0
  • Platform: Linux-4.15.0-134-generic-x86_64-with-glibc2.10
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.8.1 (False)
  • Tensorflow version (GPU?): 2.4.0 (False)
  • Using GPU in script?: N/A
  • Using distributed or parallel set-up in script?: N/A
  • tokenizers version: 0.10.2 (checked also 0.10.1)

Who can help

@stas00, @sgugger

Information

When importing transformers, the new dependency version check code (#11061) seems to fail for the tokenizers library:
importlib.metadata.version('tokenizers') returns None instead of the version string.

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. import transformers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/guyrosin/miniconda3/envs/pt/lib/python3.8/site-packages/transformers/__init__.py", line 43, in <module>
    from . import dependency_versions_check
  File "/home/guyrosin/miniconda3/envs/pt/lib/python3.8/site-packages/transformers/dependency_versions_check.py", line 41, in <module>
    require_version_core(deps[pkg])
  File "/home/guyrosin/miniconda3/envs/pt/lib/python3.8/site-packages/transformers/utils/versions.py", line 101, in require_version_core
    return require_version(requirement, hint)
  File "/home/guyrosin/miniconda3/envs/pt/lib/python3.8/site-packages/transformers/utils/versions.py", line 92, in require_version
    if want_ver is not None and not ops[op](version.parse(got_ver), version.parse(want_ver)):
  File "/home/guyrosin/miniconda3/envs/pt/lib/python3.8/site-packages/packaging/version.py", line 57, in parse
    return Version(version)
  File "/home/guyrosin/miniconda3/envs/pt/lib/python3.8/site-packages/packaging/version.py", line 296, in __init__
    match = self._regex.search(version)
TypeError: expected string or bytes-like object

The root problem is this:

from importlib.metadata import version
version('tokenizers') # returns None

Expected behavior

importlib.metadata.version('tokenizers') should return its version string.
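In the reporter's environment, importlib.metadata.version('tokenizers') returned None rather than raising, so callers that feed the result straight into version.parse hit a TypeError. A defensive lookup might look like this (a minimal sketch; safe_version is a hypothetical helper, not transformers code):

```python
import importlib.metadata


def safe_version(pkg):
    """Return the installed version string for pkg, or None when the
    package's metadata is missing or unreadable."""
    try:
        ver = importlib.metadata.version(pkg)
    except importlib.metadata.PackageNotFoundError:
        return None
    # ver may still be None if the dist-info folder on disk is corrupt,
    # which is exactly the situation described in this issue.
    return ver
```

Callers should treat None as "metadata unavailable" and raise an informative error, instead of passing None on to packaging's version.parse.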


stas00 commented Apr 7, 2021

Thank you for this report, @guyrosin

Any idea how we could reproduce this problem? This works for me:

$ python -c "from importlib.metadata import version; print(version('tokenizers'))"
0.10.1

I do see a different problem though. I see we now have: "tokenizers>=0.10.1,<0.11"

I didn't expect a range definition when I wrote this code, so currently it tries to do:

version.parse('0.10.0') > version.parse('0.10.1,<0.11')

which is wrong. I'm surprised version.parse doesn't raise; it just quietly returns the unparsed string 0.10.1,<0.11.
So this definitely needs to be fixed to split on ',' and test each condition separately.
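The split-on-comma fix described above could be sketched like this (a simplified stand-in: the actual transformers code uses packaging.version.parse, while here a plain integer-tuple comparison takes its place):

```python
import operator
import re

# Map requirement operators to comparison functions.
OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">=": operator.ge, ">": operator.gt}


def parse_ver(v):
    """Very simplified version parsing: '0.10.1' -> (0, 10, 1)."""
    return tuple(int(x) for x in v.split("."))


def meets(got_ver, constraints):
    """Check got_ver against a multi-constraint spec like '>=0.10.1,<0.11'
    by splitting on ',' and testing each condition separately."""
    for cond in constraints.split(","):
        m = re.match(r"(<=|>=|==|!=|<|>)(.+)", cond.strip())
        op, want_ver = m.group(1), m.group(2)
        if not OPS[op](parse_ver(got_ver), parse_ver(want_ver)):
            return False
    return True
```

With this shape, "tokenizers>=0.10.1,<0.11" becomes two independent checks instead of one garbled version.parse call.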

Actually, does the problem go away if you edit transformers/dependency_versions_table.py to use just "tokenizers>=0.10.1"?

Also, could you add a debug print to see what's in got_ver and want_ver just before version.parse fails? According to the trace, that is the real culprit: it's not that it can't find tokenizers, it's that it fails to parse one of the two version inputs.

Thanks.


stas00 commented Apr 7, 2021

Could you please give a try to this PR: #11110 and let me know if the problem goes away? Thank you.


guyrosin commented Apr 7, 2021

Thanks for the fast response @stas00! I'm glad this helped you find another bug :)

I guess the problem in my case is with the tokenizers distribution. I'm getting:

$ python -c "from importlib.metadata import version; print(version('tokenizers'))"
None

Even after reinstalling tokenizers.
So trying your PR results in "got_ver is None" (want_ver is 0.10.1).
No idea how to reproduce it, though :\

Edit: to be clear, using pkg_resources instead of importlib.metadata works:

$ python -c "import pkg_resources; print(pkg_resources.get_distribution('tokenizers').version)"
0.10.1
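The disagreement between the two backends can be checked in one place (a diagnostic sketch; compare_versions is a hypothetical helper, and pkg_resources is assumed available via setuptools):

```python
import importlib.metadata

import pkg_resources


def compare_versions(pkg):
    """Return (importlib_version, pkg_resources_version) for pkg, with None
    where a backend cannot find the package. A mismatch suggests corrupt
    metadata in site-packages, as in this issue."""
    try:
        im_ver = importlib.metadata.version(pkg)
    except importlib.metadata.PackageNotFoundError:
        im_ver = None
    try:
        pr_ver = pkg_resources.get_distribution(pkg).version
    except pkg_resources.DistributionNotFound:
        pr_ver = None
    return im_ver, pr_ver
```

In the reporter's environment this would have returned (None, '0.10.1') for tokenizers, pointing straight at broken dist-info metadata.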


stas00 commented Apr 7, 2021

Does it help if you update/explicitly install this library:

pip install importlib_metadata -U

and then retry?

It's good to know that pkg_resources does report the right thing. But it also has a cache which apparently reports the past state and not the current, which is the main reason it was replaced.

But sometimes the site-packages folder gets messed up.

Does it make any difference if you first uninstall tokenizers twice in a row and then install it? (I know you said you reinstalled it, but this is slightly different.)

What if you create a new environment and try there?


guyrosin commented Apr 7, 2021

OK, it seems my environment was indeed corrupted: there was a stale tokenizers-0.9.4.dist-info folder inside my env's site-packages folder. After deleting it manually and reinstalling tokenizers, everything works!
Thanks a lot for your help @stas00!
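The stale-metadata situation found above can be detected with a small scan of site-packages for duplicate dist-info folders (a hypothetical diagnostic helper, not part of transformers):

```python
import collections
import re
from pathlib import Path


def find_duplicate_dist_info(site_packages):
    """Group *.dist-info folder names under site_packages by package name and
    return only the packages that have more than one folder. Leftover folders
    like tokenizers-0.9.4.dist-info next to tokenizers-0.10.1.dist-info can
    make importlib.metadata report stale or missing versions."""
    dists = collections.defaultdict(list)
    for info in Path(site_packages).glob("*.dist-info"):
        # Split "tokenizers-0.9.4.dist-info" into package name and the rest.
        m = re.match(r"(.+?)-\d", info.name)
        if m:
            dists[m.group(1).lower()].append(info.name)
    return {pkg: names for pkg, names in dists.items() if len(names) > 1}
```

Running it over each entry of site.getsitepackages() would have flagged the duplicate tokenizers metadata immediately.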

@guyrosin guyrosin closed this as completed Apr 7, 2021

stas00 commented Apr 7, 2021

Yay! Glad it worked, @guyrosin!
