
Update versioneer from 0.21 to 0.23 #114

Merged
merged 2 commits into NVIDIA-Merlin:main from versioneer-update-0.23 on Aug 15, 2022

Conversation

oliverholworthy
Member

Goals ⚽

Restore editable install support with the latest version of setuptools (without requiring any environment variables to be set to activate legacy support).
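(For context: the environment-variable workaround this avoids is presumably setuptools' legacy-editable opt-out, along the lines of

SETUPTOOLS_ENABLE_FEATURES="legacy-editable" pip install -e .

which forces the pre-PEP 660 editable-install code path instead of the new one.)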

Implementation Details 🚧

Ran the following:

pip install --upgrade versioneer
versioneer install

There are a few changes since we last configured versioneer (0.21).

The relevant change for fixing the editable install is in release 0.23, which added a patch for compatibility with the new setuptools release.
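For reference, versioneer install re-vendors versioneer.py and _version.py based on the [versioneer] section of setup.cfg. A minimal sketch of that section (field values here are illustrative assumptions, not copied from this repo's config):

[versioneer]
VCS = git
style = pep440
# assumed paths; the real config may differ
versionfile_source = merlin/core/_version.py
versionfile_build = merlin/core/_version.py
# tag_prefix must match the repo's git tag scheme (e.g. "v" for tags like v0.5.0);
# as the follow-up commits below show, getting this wrong breaks version detection
tag_prefix = v
parentdir_prefix = merlin-core-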

Testing Details 🔍

Checked that the following works locally:

pip install -e .
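As an extra sanity check, the resolved version can be queried directly from the repository root, since the vendored versioneer.py exposes get_version():

python -c "import versioneer; print(versioneer.get_version())"

This should print a PEP 440 version derived from the latest git tag, rather than versioneer's 0+unknown fallback.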

@oliverholworthy oliverholworthy added the chore Maintenance for the repository label Aug 12, 2022
@oliverholworthy oliverholworthy self-assigned this Aug 12, 2022
@nvidia-merlin-bot

CI Results
GitHub pull request #114 of commit d000560b0578ef8dcdcc1dc9c6463d5a91164d0d, no merge conflicts.
Running as SYSTEM
Setting status of d000560b0578ef8dcdcc1dc9c6463d5a91164d0d to PENDING with url https://10.20.13.93:8080/job/merlin_core/95/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/114/*:refs/remotes/origin/pr/114/* # timeout=10
 > git rev-parse d000560b0578ef8dcdcc1dc9c6463d5a91164d0d^{commit} # timeout=10
Checking out Revision d000560b0578ef8dcdcc1dc9c6463d5a91164d0d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d000560b0578ef8dcdcc1dc9c6463d5a91164d0d # timeout=10
Commit message: "Update `versioneer` from 0.21 to 0.23"
 > git rev-list --no-walk 9408224520d731c51b7952a43def675b76e81756 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins17413768858752881116.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: dask>=2021.11.2 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 1)) (2022.1.1)
Requirement already satisfied: distributed>=2021.11.2 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 2)) (2022.3.0)
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 3)) (1.3.5)
Requirement already satisfied: numba>=0.54 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 4)) (0.55.2)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 5)) (6.0.0)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 6)) (3.19.4)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 7)) (4.64.0)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 8)) (1.9.0)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 9)) (1.2.5)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 10)) (21.3)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->-r requirements.txt (line 1)) (2.1.0)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->-r requirements.txt (line 1)) (2022.7.1)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->-r requirements.txt (line 1)) (1.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->-r requirements.txt (line 1)) (5.4.1)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->-r requirements.txt (line 1)) (0.11.2)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (8.0.4)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (3.0.3)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (1.0.4)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (5.9.1)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (2.4.0)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (1.7.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (6.1)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (2.2.0)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->-r requirements.txt (line 3)) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->-r requirements.txt (line 3)) (2022.1)
Requirement already satisfied: numpy>=1.17.3; platform_machine != "aarch64" and platform_machine != "arm64" and python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->-r requirements.txt (line 3)) (1.21.5)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->-r requirements.txt (line 4)) (0.38.1)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->-r requirements.txt (line 4)) (62.4.0)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->-r requirements.txt (line 8)) (1.1.0)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->-r requirements.txt (line 8)) (1.52.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->-r requirements.txt (line 9)) (0.4.2)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->-r requirements.txt (line 9)) (1.2.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->-r requirements.txt (line 10)) (3.0.9)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask>=2021.11.2->-r requirements.txt (line 1)) (1.0.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2021.11.2->-r requirements.txt (line 2)) (2.0.1)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed>=2021.11.2->-r requirements.txt (line 2)) (1.0.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->-r requirements.txt (line 3)) (1.15.0)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->-r requirements.txt (line 9)) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->-r requirements.txt (line 9)) (6.0.2)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->-r requirements.txt (line 9)) (6.0.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->-r requirements.txt (line 9)) (4.0.0)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 343 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ..................................FFFF......... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 74%]
....................................................................... [ 95%]
tests/unit/schema/test_schema.py ...... [ 97%]
tests/unit/schema/test_schema_io.py .. [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_________________ test_dask_dataset_from_dataframe[True-cudf] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra4')
origin = 'cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
>       ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra4/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra4/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
_______________ test_dask_dataset_from_dataframe[True-dask_cudf] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra5')
origin = 'dask_cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
>       ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra5/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra5/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-pd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra6')
origin = 'pd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
>       ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra6/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra6/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-dd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra7')
origin = 'dd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
>       ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra7/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra7/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_serial_context[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44201 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40821 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36525 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39131 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42153 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43263 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36319 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dask_cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-pd] - ...
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dd] - ...
============ 4 failed, 339 passed, 1 skipped, 83 warnings in 52.66s ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins7569333044506419160.sh

@github-actions

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-114

@viswa-nvidia viswa-nvidia added this to the Merlin 22.08 milestone Aug 12, 2022
@oliverholworthy
Member Author

rerun tests

@nvidia-merlin-bot

CI Results
GitHub pull request #114 of commit d000560b0578ef8dcdcc1dc9c6463d5a91164d0d, no merge conflicts.
Running as SYSTEM
Setting status of d000560b0578ef8dcdcc1dc9c6463d5a91164d0d to PENDING with url https://10.20.13.93:8080/job/merlin_core/99/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/114/*:refs/remotes/origin/pr/114/* # timeout=10
 > git rev-parse d000560b0578ef8dcdcc1dc9c6463d5a91164d0d^{commit} # timeout=10
Checking out Revision d000560b0578ef8dcdcc1dc9c6463d5a91164d0d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d000560b0578ef8dcdcc1dc9c6463d5a91164d0d # timeout=10
Commit message: "Update `versioneer` from 0.21 to 0.23"
 > git rev-list --no-walk 9408224520d731c51b7952a43def675b76e81756 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins8069296641562108724.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 343 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ............................................... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 74%]
....................................................................... [ 95%]
tests/unit/schema/test_schema.py ...... [ 97%]
tests/unit/schema/test_schema_io.py .. [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_serial_context[True]
/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first
self.make_current()

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40689 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39843 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39223 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35425 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33133 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46403 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================= 343 passed, 1 skipped, 83 warnings in 53.54s =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins17882304365434260564.sh

@nvidia-merlin-bot

CI Results
GitHub pull request #114 of commit 5c431330c7a782eae5e4bcef3a628ce76281825c, no merge conflicts.
Running as SYSTEM
Setting status of 5c431330c7a782eae5e4bcef3a628ce76281825c to PENDING with url https://10.20.13.93:8080/job/merlin_core/101/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/114/*:refs/remotes/origin/pr/114/* # timeout=10
 > git rev-parse 5c431330c7a782eae5e4bcef3a628ce76281825c^{commit} # timeout=10
Checking out Revision 5c431330c7a782eae5e4bcef3a628ce76281825c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5c431330c7a782eae5e4bcef3a628ce76281825c # timeout=10
Commit message: "Merge branch 'main' into versioneer-update-0.23"
 > git rev-list --no-walk 9408224520d731c51b7952a43def675b76e81756 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins17532358910714575438.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 343 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ............................................... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 74%]
....................................................................... [ 95%]
tests/unit/schema/test_schema.py ...... [ 97%]
tests/unit/schema/test_schema_io.py .. [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_serial_context[True]
/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first
self.make_current()

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38351 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39151 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46211 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46179 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41203 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37073 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================= 343 passed, 1 skipped, 83 warnings in 50.65s =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins6852723447775174483.sh

@karlhigley karlhigley merged commit 919c312 into NVIDIA-Merlin:main Aug 15, 2022
benfred added a commit that referenced this pull request Sep 6, 2022
The update to versioneer in #114 resulted in us not getting versions from git. This is because we weren't specifying the tag_prefix appropriately, which broke with newer versions of versioneer.

Fix, and add a basic unittest that would catch issues like this in the future.
karlhigley added a commit that referenced this pull request Sep 7, 2022
* Fix versioneer to get accurate version numbers

The update to versioneer in #114 resulted in us not getting versions from git. This is because we weren't specifying the tag_prefix appropriately, which broke with newer versions of versioneer.

Fix, and add a basic unittest that would catch issues like this in the future.

* flake8

Co-authored-by: Karl Higley <kmhigley@gmail.com>
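The unittest added by that follow-up is not shown in this thread; a minimal sketch of a check in that spirit (the import path and assertion are assumptions, not the actual test):

# test_version.py -- illustrative sketch, not the actual follow-up test
import merlin.core  # assumes the package re-exports versioneer's computed version

def test_version_resolved_from_git():
    # "0+unknown" is versioneer's fallback when it cannot parse a git tag,
    # e.g. when tag_prefix does not match the repository's tag scheme
    assert not merlin.core.__version__.startswith("0+unknown")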