
Update versioneer from 0.21 to 0.23 #114

Merged
merged 2 commits into NVIDIA-Merlin:main from versioneer-update-0.23 on Aug 15, 2022

Conversation

oliverholworthy
Member

Goals ⚽

Restore editable install support with the latest version of setuptools (without requiring any environment variables to be set to activate legacy support).
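(For context: the environment-variable workaround this avoids is presumably setuptools' legacy-editable opt-out, along the lines of

SETUPTOOLS_ENABLE_FEATURES="legacy-editable" pip install -e .

which forces the pre-PEP 660 editable-install code path instead of the new one.)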

Implementation Details 🚧

Ran the following:

pip install --upgrade versioneer
versioneer install

There are a few changes since we last configured versioneer (0.21).

The relevant change for fixing the editable install is in release 0.23, which added a patch for compatibility with the new setuptools release.
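For reference, versioneer install re-vendors versioneer.py and _version.py based on the [versioneer] section of setup.cfg. A minimal sketch of that section (field values here are illustrative assumptions, not copied from this repo's config):

[versioneer]
VCS = git
style = pep440
# assumed paths; the real config may differ
versionfile_source = merlin/core/_version.py
versionfile_build = merlin/core/_version.py
# tag_prefix must match the repo's git tag scheme (e.g. "v" for tags like v0.5.0);
# as the follow-up commits below show, getting this wrong breaks version detection
tag_prefix = v
parentdir_prefix = merlin-core-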

Testing Details 🔍

Checked that the following works locally:

pip install -e .
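As an extra sanity check, the resolved version can be queried directly from the repository root, since the vendored versioneer.py exposes get_version():

python -c "import versioneer; print(versioneer.get_version())"

This should print a PEP 440 version derived from the latest git tag, rather than versioneer's 0+unknown fallback.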

@oliverholworthy oliverholworthy added the chore Maintenance for the repository label Aug 12, 2022
@oliverholworthy oliverholworthy self-assigned this Aug 12, 2022
@nvidia-merlin-bot

CI Results
GitHub pull request #114 of commit d000560b0578ef8dcdcc1dc9c6463d5a91164d0d, no merge conflicts.
Running as SYSTEM
Setting status of d000560b0578ef8dcdcc1dc9c6463d5a91164d0d to PENDING with url https://10.20.13.93:8080/job/merlin_core/95/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/114/*:refs/remotes/origin/pr/114/* # timeout=10
 > git rev-parse d000560b0578ef8dcdcc1dc9c6463d5a91164d0d^{commit} # timeout=10
Checking out Revision d000560b0578ef8dcdcc1dc9c6463d5a91164d0d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d000560b0578ef8dcdcc1dc9c6463d5a91164d0d # timeout=10
Commit message: "Update `versioneer` from 0.21 to 0.23"
 > git rev-list --no-walk 9408224520d731c51b7952a43def675b76e81756 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins17413768858752881116.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: dask>=2021.11.2 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 1)) (2022.1.1)
Requirement already satisfied: distributed>=2021.11.2 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 2)) (2022.3.0)
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 3)) (1.3.5)
Requirement already satisfied: numba>=0.54 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 4)) (0.55.2)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 5)) (6.0.0)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 6)) (3.19.4)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 7)) (4.64.0)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 8)) (1.9.0)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 9)) (1.2.5)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 10)) (21.3)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->-r requirements.txt (line 1)) (2.1.0)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->-r requirements.txt (line 1)) (2022.7.1)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->-r requirements.txt (line 1)) (1.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->-r requirements.txt (line 1)) (5.4.1)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->-r requirements.txt (line 1)) (0.11.2)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (8.0.4)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (3.0.3)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (1.0.4)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (5.9.1)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (2.4.0)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (1.7.0)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (6.1)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->-r requirements.txt (line 2)) (2.2.0)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->-r requirements.txt (line 3)) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->-r requirements.txt (line 3)) (2022.1)
Requirement already satisfied: numpy>=1.17.3; platform_machine != "aarch64" and platform_machine != "arm64" and python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->-r requirements.txt (line 3)) (1.21.5)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->-r requirements.txt (line 4)) (0.38.1)
Requirement already satisfied: setuptools in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->-r requirements.txt (line 4)) (62.4.0)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->-r requirements.txt (line 8)) (1.1.0)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->-r requirements.txt (line 8)) (1.52.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->-r requirements.txt (line 9)) (0.4.2)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->-r requirements.txt (line 9)) (1.2.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->-r requirements.txt (line 10)) (3.0.9)
Requirement already satisfied: locket in /usr/local/lib/python3.8/dist-packages (from partd>=0.3.10->dask>=2021.11.2->-r requirements.txt (line 1)) (1.0.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2021.11.2->-r requirements.txt (line 2)) (2.0.1)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed>=2021.11.2->-r requirements.txt (line 2)) (1.0.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->-r requirements.txt (line 3)) (1.15.0)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->-r requirements.txt (line 9)) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->-r requirements.txt (line 9)) (6.0.2)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->-r requirements.txt (line 9)) (6.0.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->-r requirements.txt (line 9)) (4.0.0)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 343 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ..................................FFFF......... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 74%]
....................................................................... [ 95%]
tests/unit/schema/test_schema.py ...... [ 97%]
tests/unit/schema/test_schema_io.py .. [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_________________ test_dask_dataset_from_dataframe[True-cudf] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra4')
origin = 'cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
>       ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra4/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra4/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
_______________ test_dask_dataset_from_dataframe[True-dask_cudf] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra5')
origin = 'dask_cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
>       ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra5/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra5/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-pd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra6')
origin = 'pd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
>       ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra6/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra6/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-dd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra7')
origin = 'dd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
>       ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra7/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-8/test_dask_dataset_from_datafra7/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_serial_context[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44201 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40821 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36525 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39131 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42153 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43263 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36319 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dask_cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-pd] - ...
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dd] - ...
============ 4 failed, 339 passed, 1 skipped, 83 warnings in 52.66s ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins7569333044506419160.sh

@github-actions

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-114

@viswa-nvidia viswa-nvidia added this to the Merlin 22.08 milestone Aug 12, 2022
@oliverholworthy
Member Author

rerun tests

@nvidia-merlin-bot

CI Results
GitHub pull request #114 of commit d000560b0578ef8dcdcc1dc9c6463d5a91164d0d, no merge conflicts.
Running as SYSTEM
Setting status of d000560b0578ef8dcdcc1dc9c6463d5a91164d0d to PENDING with url https://10.20.13.93:8080/job/merlin_core/99/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/114/*:refs/remotes/origin/pr/114/* # timeout=10
 > git rev-parse d000560b0578ef8dcdcc1dc9c6463d5a91164d0d^{commit} # timeout=10
Checking out Revision d000560b0578ef8dcdcc1dc9c6463d5a91164d0d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d000560b0578ef8dcdcc1dc9c6463d5a91164d0d # timeout=10
Commit message: "Update `versioneer` from 0.21 to 0.23"
 > git rev-list --no-walk 9408224520d731c51b7952a43def675b76e81756 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins8069296641562108724.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 343 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ............................................... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 74%]
....................................................................... [ 95%]
tests/unit/schema/test_schema.py ...... [ 97%]
tests/unit/schema/test_schema_io.py .. [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_serial_context[True]
/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first
self.make_current()

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40689 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39843 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39223 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35425 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33133 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46403 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================= 343 passed, 1 skipped, 83 warnings in 53.54s =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins17882304365434260564.sh

@nvidia-merlin-bot

CI Results
GitHub pull request #114 of commit 5c431330c7a782eae5e4bcef3a628ce76281825c, no merge conflicts.
Running as SYSTEM
Setting status of 5c431330c7a782eae5e4bcef3a628ce76281825c to PENDING with url https://10.20.13.93:8080/job/merlin_core/101/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/114/*:refs/remotes/origin/pr/114/* # timeout=10
 > git rev-parse 5c431330c7a782eae5e4bcef3a628ce76281825c^{commit} # timeout=10
Checking out Revision 5c431330c7a782eae5e4bcef3a628ce76281825c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5c431330c7a782eae5e4bcef3a628ce76281825c # timeout=10
Commit message: "Merge branch 'main' into versioneer-update-0.23"
 > git rev-list --no-walk 9408224520d731c51b7952a43def675b76e81756 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins17532358910714575438.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 343 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ............................................... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 74%]
....................................................................... [ 95%]
tests/unit/schema/test_schema.py ...... [ 97%]
tests/unit/schema/test_schema_io.py .. [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_serial_context[True]
/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first
self.make_current()

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38351 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39151 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46211 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46179 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41203 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37073 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================= 343 passed, 1 skipped, 83 warnings in 50.65s =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins6852723447775174483.sh

@karlhigley karlhigley merged commit 919c312 into NVIDIA-Merlin:main Aug 15, 2022
benfred added a commit that referenced this pull request Sep 6, 2022
The update to versioneer in #114 resulted in us not getting versions from git. This is because we weren't specifying the tag_prefix appropriately, which broke with newer versions of versioneer.

Fix, and add a basic unittest that would catch issues like this in the future.
karlhigley added a commit that referenced this pull request Sep 7, 2022
* Fix versioneer to get accurate version numbers

The update to versioneer in #114 resulted in us not getting versions from git. This is because we weren't specifying the tag_prefix appropriately, which broke with newer versions of versioneer.

Fix, and add a basic unittest that would catch issues like this in the future.

* flake8

Co-authored-by: Karl Higley <kmhigley@gmail.com>
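The unittest added by that follow-up is not shown in this thread; a minimal sketch of a check in that spirit (the import path and assertion are assumptions, not the actual test):

# test_version.py -- illustrative sketch, not the actual follow-up test
import merlin.core  # assumes the package re-exports versioneer's computed version

def test_version_resolved_from_git():
    # "0+unknown" is versioneer's fallback when it cannot parse a git tag,
    # e.g. when tag_prefix does not match the repository's tag scheme
    assert not merlin.core.__version__.startswith("0+unknown")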