[data] deflake test_huggingface (ray-project#42271)
The `datasets>=2.11` package requires `pyarrow>=8.0.0`, as introduced here: https://github.com/huggingface/datasets/blob/main/setup.py#L116C2-L117

We skip the streaming read test for lower pyarrow versions, since the underlying `pyarrow.Table.to_reader()` method, which is used by the HF iterable dataset path, will not be available.
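
The version gate described above can be sketched with the standard library alone. This is a minimal illustration, not the code in this commit (which compares with `datasets.Version`); the helper names `version_tuple` and `is_older_than` are hypothetical, and pre-release suffixes such as `.dev0` are simply ignored here.

```python
import re


def version_tuple(version: str) -> tuple:
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version prefix in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_older_than(installed: str, minimum: str) -> bool:
    """Return True when `installed` is strictly older than `minimum`."""
    a, b = version_tuple(installed), version_tuple(minimum)
    # Pad to equal length so "8.0" and "8.0.0" compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    # Tuples of ints compare numerically, so "14.0.2" is not older than "8.0.0",
    # which a plain string comparison would get wrong.
    return a < b
```

Under these assumptions, `is_older_than(pyarrow.__version__, "8.0.0")` could serve as the condition passed to `pytest.mark.skipif`, with the same `reason` string the commit uses.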

Signed-off-by: Scott Lee <sjl@anyscale.com>
scottjlee authored and vickytsang committed Jan 12, 2024
1 parent 6ca559e commit 9282726
Showing 2 changed files with 6 additions and 3 deletions.
2 changes: 0 additions & 2 deletions ci/ray_ci/data.tests.yml
@@ -1,4 +1,2 @@
flaky_tests:
- //python/ray/data:test_streaming_integration
- //python/ray/data:test_split
- //python/ray/data:test_huggingface
7 changes: 6 additions & 1 deletion python/ray/data/tests/test_huggingface.py
@@ -1,11 +1,12 @@
 import datasets
+import pyarrow
 import pytest
 
 import ray
 from ray.tests.conftest import *  # noqa
 
 
-def test_huggingface(ray_start_regular_shared):
+def test_from_huggingface(ray_start_regular_shared):
     data = datasets.load_dataset("tweet_eval", "emotion")
 
     # Check that DatasetDict is not directly supported.
@@ -45,6 +46,10 @@ def test_huggingface(ray_start_regular_shared):
     datasets.Version(datasets.__version__) < datasets.Version("2.8.0"),
     reason="IterableDataset.iter() added in 2.8.0",
 )
+@pytest.mark.skipif(
+    datasets.Version(pyarrow.__version__) < datasets.Version("8.0.0"),
+    reason="pyarrow.Table.to_reader() added in 8.0.0",
+)
 # Note, pandas is excluded here because IterableDatasets do not support pandas format.
 @pytest.mark.parametrize(
     "batch_format",