Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dataclasses library is NOT backward compatible #99401

Closed
yonikremer opened this issue Nov 11, 2022 · 8 comments
Closed

dataclasses library is NOT backward compatible #99401

yonikremer opened this issue Nov 11, 2022 · 8 comments
Labels
pending The issue will be closed if no feedback is provided stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@yonikremer
Copy link

One of the changes in the dataclasses library in Python 3.11 (Python 3.10 works fine) is causing backward compatibility issues with the datasets library by hugging face.

When you import the datasets library using python 3.11 you get the following error:

Traceback (most recent call last):
  File "/kaggle/working/import_datasets.py", line 1, in <module>
    import datasets
  File "/opt/conda/envs/myenv/lib/python3.11/site-packages/datasets/__init__.py", line 45, in <module>
    from .builder import ArrowBasedBuilder, BeamBasedBuilder, BuilderConfig, DatasetBuilder, GeneratorBasedBuilder
  File "/opt/conda/envs/myenv/lib/python3.11/site-packages/datasets/builder.py", line 91, in <module>
    @dataclass
     ^^^^^^^^^
  File "/opt/conda/envs/myenv/lib/python3.11/dataclasses.py", line 1221, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "/opt/conda/envs/myenv/lib/python3.11/dataclasses.py", line 1211, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/myenv/lib/python3.11/dataclasses.py", line 959, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/myenv/lib/python3.11/dataclasses.py", line 816, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'datasets.utils.version.Version'> for field version is not allowed: use default_factory

To reproduce the error, run my kaggle kernels notebook

@yonikremer yonikremer added the type-bug An unexpected behavior, bug, or error label Nov 11, 2022
@JelleZijlstra
Copy link
Member

This is called out in https://docs.python.org/3.11/whatsnew/3.11.html#dataclasses and it feels unlikely we'll go back on this change. cc @ericvsmith though.

@AlexWaygood AlexWaygood added the stdlib Python modules in the Lib dir label Nov 11, 2022
@yonikremer
Copy link
Author

@JelleZijlstra @ericvsmith So if I add a __hash__ method to datasets.utils.version.Version it should work?

@ericvsmith
Copy link
Member

@yonikremer : Yes, I think so (although I can't test it just now). Your other option is to use a default_factory. Sorry for the hassle.

@hauntsaninja hauntsaninja added the pending The issue will be closed if no feedback is provided label Nov 14, 2022
@visko-sc
Copy link

visko-sc commented Nov 29, 2022

Could another option (just workaround) be using @dataclass(unsafe_hash=True) ?

@kinnala
Copy link

kinnala commented Dec 15, 2022

Doesn't it affect all dataclasses that have numpy arrays as default values?

@hauntsaninja
Copy link
Contributor

As Eric mentions, the fix you probably want is to use a default_factory.

Closing as per #99401 (comment)

@ASEM000
Copy link

ASEM000 commented Feb 4, 2023

Hello,

For jax, JAX arrays are immutable but not hashable. So this goes against the proxy method used.

Moreover, JAX Arrays can not be subclassed to add a hash to count as non-mutable by dataclasses.

This change will break the JAX array annotation for dataclasses.

jax-ml/jax#14295

@jakevdp
Copy link

jakevdp commented Feb 4, 2023

I would suggest that e029c53 has a logical flaw, and should be considered a bug.

It's true that hashability implies immutability; but it's not true that immutability implies hashability. The commit in question assumes the latter, and thus the logic is flawed: there are valid, immutable values that incorrectly lead to an error because they happen to not have a hash function.

Is there any possibility that the team would consider rolling back e029c53 due to this flaw in its logic?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending The issue will be closed if no feedback is provided stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

9 participants