Describe the bug
The Hugging Face datasets package is imported by nemo.collections.llm.gpt.data, and the import chain fails with an AttributeError:
/opt/NeMo/nemo/collections/llm/gpt/data/__init__.py:15
from nemo.collections.llm.gpt.data.dolly import DollyDataModule
/opt/NeMo/nemo/collections/llm/gpt/data/dolly.py:20
from datasets import load_dataset
/usr/local/lib/python3.10/dist-packages/datasets/__init__.py:17
from .arrow_dataset import Dataset
...
Typo in Python 3.10? Should it be threading.Condition instead of threading._Condition? cf. huggingface/datasets#5613, apache-beam#24458
Typo in the source code? Should it subclass threading.Condition instead of threading._Condition?
~/.local/lib/python3.10/site-packages/multiprocess/dummy/__init__.py:87
87 class Condition(threading._Condition):
88     # XXX
89     if sys.version_info < (3, 0):
90         notify_all = threading._Condition.notify_all.__func__
AttributeError: module 'threading' has no attribute '_Condition'
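For reference, here is a minimal sketch of why that subclassing line fails; it assumes nothing beyond a stock Python 3.10 interpreter. threading._Condition was renamed to the public threading.Condition class in Python 3.3, so code that still references the private name raises exactly this AttributeError:

import threading

print(hasattr(threading, "Condition"))   # True: the public class exists and can be subclassed
print(hasattr(threading, "_Condition"))  # False on Python >= 3.3: the private name is gone

# Mirror of what the stale multiprocess.dummy module attempts at import time
try:
    class Condition(threading._Condition):
        pass
except AttributeError as exc:
    print(exc)  # module 'threading' has no attribute '_Condition'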
Steps/Code to reproduce bug
from nemo.collections import llm
AttributeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 from nemo.collections import llm
File /opt/NeMo/nemo/collections/llm/gpt/data/__init__.py:15
1 # Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
2 #
3 # Licensed under the Apache License, Version 2.0 (the "License");
(...)
12 # See the License for the specific language governing permissions and
13 # limitations under the License.
---> 15 from nemo.collections.llm.gpt.data.dolly import DollyDataModule
16 from nemo.collections.llm.gpt.data.fine_tuning import FineTuningDataModule
17 from nemo.collections.llm.gpt.data.mock import MockDataModule
File /opt/NeMo/nemo/collections/llm/gpt/data/dolly.py:20
17 from typing import TYPE_CHECKING, List, Optional
19 import numpy as np
---> 20 from datasets import load_dataset
22 from nemo.collections.llm.gpt.data.core import get_dataset_root
23 from nemo.collections.llm.gpt.data.fine_tuning import FineTuningDataModule
File /usr/local/lib/python3.10/dist-packages/datasets/__init__.py:17
1 # Copyright 2020 The HuggingFace Datasets Authors and the TensorFlow Datasets Authors.
2 #
3 # Licensed under the Apache License, Version 2.0 (the "License");
(...)
12 # See the License for the specific language governing permissions and
13 # limitations under the License.
15 __version__ = "3.1.0"
---> 17 from .arrow_dataset import Dataset
18 from .arrow_reader import ReadInstruction
19 from .builder import ArrowBasedBuilder, BuilderConfig, DatasetBuilder, GeneratorBasedBuilder
File /usr/local/lib/python3.10/dist-packages/datasets/arrow_dataset.py:77
75 from . import config
76 from .arrow_reader import ArrowReader
---> 77 from .arrow_writer import ArrowWriter, OptimizedTypedSequence
78 from .data_files import sanitize_patterns
79 from .download.streaming_download_manager import xgetsize
File /usr/local/lib/python3.10/dist-packages/datasets/arrow_writer.py:27
24 from fsspec.core import url_to_fs
26 from . import config
---> 27 from .features import Audio, Features, Image, Value, Video
28 from .features.features import (
29 FeatureType,
30 _ArrayXDExtensionType,
(...)
37 to_pyarrow_listarray,
38 )
39 from .filesystems import is_remote_filesystem
File /usr/local/lib/python3.10/dist-packages/datasets/features/__init__.py:17
1 __all__ = [
2 "Audio",
3 "Array2D",
(...)
15 "Video",
16 ]
---> 17 from .audio import Audio
18 from .features import Array2D, Array3D, Array4D, Array5D, ClassLabel, Features, LargeList, Sequence, Value
19 from .image import Image
File /usr/local/lib/python3.10/dist-packages/datasets/features/audio.py:13
11 from ..table import array_cast
12 from ..utils.file_utils import xopen, xsplitext
---> 13 from ..utils.py_utils import no_op_if_value_is_null, string_to_dict
16 if TYPE_CHECKING:
17 from .features import FeatureType
File /usr/local/lib/python3.10/dist-packages/datasets/utils/py_utils.py:37
34 from urllib.parse import urlparse
36 import multiprocess
---> 37 import multiprocess.pool
38 import numpy as np
39 from tqdm.auto import tqdm
File ~/.local/lib/python3.10/site-packages/multiprocess/pool.py:609
603 self._cond.release()
605 #
606 #
607 #
--> 609 class ThreadPool(Pool):
611 from .dummy import Process
613 def __init__(self, processes=None, initializer=None, initargs=()):
File ~/.local/lib/python3.10/site-packages/multiprocess/pool.py:611, in ThreadPool()
609 class ThreadPool(Pool):
--> 611 from .dummy import Process
613 def __init__(self, processes=None, initializer=None, initargs=()):
614 Pool.__init__(self, processes, initializer, initargs)
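The paths in the traceback show multiprocess being imported from ~/.local/lib/python3.10/site-packages rather than from the container's /usr/local/lib/python3.10/dist-packages, which suggests a stale user-site install is shadowing the copy shipped with the image. A quick diagnostic sketch (using only the standard library and the already-installed packages) to confirm which copy and which dill version actually resolve:

import importlib.metadata as md
import multiprocess  # the top-level import succeeds in the traceback above; only multiprocess.pool fails

print(multiprocess.__file__)        # shows whether the ~/.local copy shadows /usr/local
print(md.version("multiprocess"))   # version of the multiprocess distribution that gets picked up
print(md.version("dill"))           # installed dill version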
OK, the Docker build had quietened the following error log from pip. This is an apache-beam issue; closing the issue.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 24.4.0 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 16.1.0 which is incompatible.
cugraph 24.4.0 requires dask-cuda==24.4.*, but you have dask-cuda 24.8.2 which is incompatible.
cugraph 24.4.0 requires rapids-dask-dependency==24.4.*, but you have rapids-dask-dependency 24.8.0 which is incompatible.
cugraph-service-server 24.4.0 requires dask-cuda==24.4.*, but you have dask-cuda 24.8.2 which is incompatible.
cugraph-service-server 24.4.0 requires rapids-dask-dependency==24.4.*, but you have rapids-dask-dependency 24.8.0 which is incompatible.
cuml 24.4.0 requires dask-cuda==24.4.*, but you have dask-cuda 24.8.2 which is incompatible.
cuml 24.4.0 requires rapids-dask-dependency==24.4.*, but you have rapids-dask-dependency 24.8.0 which is incompatible.
cuml 24.4.0 requires treelite==4.1.2, but you have treelite 4.3.0 which is incompatible.
dask-cudf 24.4.0 requires rapids-dask-dependency==24.4.*, but you have rapids-dask-dependency 24.8.0 which is incompatible.
multiprocess 0.70.16 requires dill>=0.3.8, but you have dill 0.3.1.1 which is incompatible.
tensorflow-metadata 1.16.1 requires protobuf<4.21,>=3.20.3; python_version < "3.11", but you have protobuf 4.25.5 which is incompatible.
tensorrt-llm 0.12.0 requires nvidia-modelopt~=0.15.0, but you have nvidia-modelopt 0.0.0 which is incompatible.
tensorrt-llm 0.12.0 requires pynvml>=11.5.0, but you have pynvml 11.4.1 which is incompatible.
tensorrt-llm 0.12.0 requires transformers<=4.42.4,>=4.38.2, but you have transformers 4.46.2 which is incompatible.
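The multiprocess/dill line in that log matches the import failure above: multiprocess 0.70.16 requires dill>=0.3.8, but dill 0.3.1.1 is installed (the apache-beam pin the author points to). A small sketch to re-check that requirement programmatically; it assumes the packaging library is importable, which it is in most pip-managed environments, and the distribution names are taken from the log above:

from importlib.metadata import requires, version
from packaging.requirements import Requirement
from packaging.version import Version

installed = Version(version("dill"))  # the dill actually present in the environment
for spec in requires("multiprocess") or []:
    req = Requirement(spec)
    if req.name == "dill":
        ok = installed in req.specifier  # e.g. 0.3.1.1 in ">=0.3.8" -> False
        print(f"{req} vs installed dill {installed}: {'OK' if ok else 'CONFLICT'}")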
Expected behavior
The import should complete without triggering an error.
Environment overview (please complete the following information)
Additional context
Add any other context about the problem here.
Typo in Python 3.10? Should it subclass threading.Condition instead of threading._Condition?