
Update to changed rmm::device_scalar API #1637

Merged: 5 commits into rapidsai:branch-21.08 on Jun 14, 2021

Conversation

harrism
Member

@harrism commented Jun 1, 2021

rapidsai/rmm#789 is a breaking API change for rmm::device_scalar. This PR fixes a couple of uses of rmm::device_scalar so that cuGraph continues to build, and should be merged immediately after rapidsai/rmm#789.

Also fixes an unrelated narrowing conversion warning.
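
A minimal sketch of the kind of call-site update described above (illustrative only; `read_counter` is a hypothetical helper, not code from this PR), assuming the post-change `rmm::device_scalar` API takes an explicit stream for its stream-ordered methods:

```cpp
// Illustrative sketch only: shows the call-site style described above, where
// device_scalar's stream-ordered methods take an explicit stream argument,
// matching device_uvector and device_buffer. read_counter() is a hypothetical
// helper, not code from this PR.
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_scalar.hpp>

int read_counter(rmm::cuda_stream_view stream)
{
  // Construction is stream-ordered and takes an explicit stream.
  rmm::device_scalar<int> counter{0, stream};

  // Copying the value back to the host also requires an explicit stream;
  // callers can no longer rely on a default stream argument.
  return counter.value(stream);
}
```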

@BradReesWork added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Jun 1, 2021
@BradReesWork added this to the 21.06 milestone Jun 1, 2021
@BradReesWork
Member

rerun tests

@BradReesWork modified the milestones: 21.06 → 21.08 Jun 1, 2021
@seunghwak mentioned this pull request Jun 1, 2021
@harrism
Member Author

harrism commented Jun 2, 2021

CI won't pass until rapidsai/rmm#789 is merged.

@BradReesWork
Member

rerun tests

rapids-bot bot pushed a commit to rapidsai/rmm that referenced this pull request Jun 8, 2021
This PR refactors `device_scalar` to use a single-element `device_uvector` for its storage. This simplifies the implementation of `device_scalar`. It also changes the API of `device_scalar` so that its asynchronous / stream-ordered methods use the same API style (with an explicit stream parameter) as `device_uvector` and `device_buffer`.

Closes #570

This is a breaking change. When it is merged, follow-on PRs in other libraries will likely need to be merged immediately to account for the API changes.

 - [x] cuDF: rapidsai/cudf#8411
 - [x] cuGraph: rapidsai/cugraph#1637
 - [x] RAFT: rapidsai/raft#259  
 - [x] ~cuML~ (unused)
 - [x] ~cuSpatial~ (unused)

Authors:
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Rong Ou (https://github.com/rongou)
  - Jake Hemstad (https://github.com/jrhemstad)

URL: #789
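
The storage refactor described in this commit message can be pictured roughly as follows. This is a minimal sketch under the description above, not RMM's actual `device_scalar` implementation:

```cpp
// Minimal sketch (not RMM's actual implementation): a scalar backed by a
// single-element device_uvector, with every access stream-ordered and taking
// an explicit stream, mirroring device_uvector and device_buffer.
// (Assumes T is trivially copyable, as device_uvector requires.)
#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>

template <typename T>
class scalar_sketch {
 public:
  explicit scalar_sketch(rmm::cuda_stream_view stream) : storage_{1, stream} {}

  // Set the value; the operation is ordered on the given stream.
  void set_value(T const& v, rmm::cuda_stream_view stream)
  {
    storage_.set_element(0, v, stream);
  }

  // Copy the value back to the host, synchronizing on the given stream.
  T value(rmm::cuda_stream_view stream) const { return storage_.element(0, stream); }

  // Raw device pointer to the scalar.
  T* data() noexcept { return storage_.data(); }

 private:
  rmm::device_uvector<T> storage_;  // one element of device memory
};
```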
@harrism
Member Author

harrism commented Jun 8, 2021

rerun tests

@harrism
Member Author

harrism commented Jun 8, 2021

rerun tests

@harrism
Member Author

harrism commented Jun 8, 2021

This PR also depends on rapidsai/raft#259. Once that is merged, hopefully this one will pass CI.

@ChuckHastings
Collaborator

rerun tests

1 similar comment
@ChuckHastings
Collaborator

rerun tests

@harrism
Member Author

harrism commented Jun 8, 2021

I suggest issuing the @gpucibot merge command now so it auto-merges once CI passes...

@ajschmidt8
Member

@gpucibot merge

@codecov-commenter

codecov-commenter commented Jun 8, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@6aab21f).
The diff coverage is n/a.

❗ Current head 155c27f differs from pull request most recent head fa9b367. Consider uploading reports for the commit fa9b367 to get more accurate results.

@@               Coverage Diff               @@
##             branch-21.08    #1637   +/-   ##
===============================================
  Coverage                ?   59.96%           
===============================================
  Files                   ?       80           
  Lines                   ?     3542           
  Branches                ?        0           
===============================================
  Hits                    ?     2124           
  Misses                  ?     1418           
  Partials                ?        0           

Continue to review full report at Codecov.

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6aab21f...fa9b367.

@harrism
Member Author

harrism commented Jun 9, 2021

I spoke too soon. This doesn't look related to this PR though:

```
09:54:54     from dask.dataframe.core import (
09:54:54 E   ImportError: cannot import name 'make_meta' from 'dask.dataframe.core' (/opt/conda/envs/rapids/lib/python3.8/site-packages/dask/dataframe/core.py)
```

cpp/src/traversal/tsp.cu: review comment (outdated, resolved)
@ajschmidt8
Member

I think the failures here could be related to this PR: rapidsai/cudf#8458

Co-authored-by: Paul Taylor <paul.e.taylor@me.com>
@ajschmidt8
Member

rerun tests

@ChuckHastings
Collaborator

I think we might need to update the cugraph conda dependencies as well.

The linked cudf update corrects the cudf dependencies, but our environment specifically downloads older versions.

@ajschmidt8
Member

I found an issue with the rapids-build-env conda package that could be contributing to the failures here. I kicked off the Jenkins run below which should publish some corrected packages. Let's hold off on rerunning the tests until that job is complete.

https://gpuci.gpuopenanalytics.com/job/rapidsai/job/integration/job/branches/job/integration-branch-pipeline/567/

@ajschmidt8
Member

rerun tests 🤞

@pentschev
Member

I don't know whether the updated rapids-build-env package was simply not available yet, but the wrong one was picked again in the last build:

05:00:30   - rapidsai-nightly/linux-64::rapids-build-env==21.08.00a210608=cuda11.0_py37_g58360b3_408

It should have been build 425 rather than 408, but rapids-notebook-env picked the correct build 425. Rerunning tests to check.

@pentschev
Member

rerun tests

@pentschev
Member

Even on the latest run (which hasn't completed yet), the incorrect version of rapids-build-env is being picked:

15:15:07   - rapidsai-nightly/linux-64::rapids-build-env==21.08.00a210608=cuda11.0_py37_g58360b3_408

Just as in my previous comment, it should get build 425, which depends on the correct versions of Dask/Distributed.

@ajschmidt8 any ideas what could still be going on here?

@jakirkham
Member

Wonder if we need to rebuild Docker images as well

@ajschmidt8
Member

It looks like the correct version of dask is used all the way up until this point where libcugraph is installed. Installing libcugraph downgrades dask back to 2021.5.1 for some reason.

I think the best way to fix this would be to follow in cuml's footsteps here and install the latest version of dask after libcugraph has been installed.

@pentschev
Member

It looks like the correct version of dask is used all the way up until this point where libcugraph is installed. Installing libcugraph downgrades dask back to 2021.5.1 for some reason.

I think the best way to fix this would be to follow in cuml's footsteps here and install the latest version of dask after libcugraph has been installed.

This may fix the issue now, but this doesn't explain why an older rapids-build-env is being picked, which seems to be the ultimate reason why Dask is getting downgraded. The workaround being suggested can blow up again in the future if another conda install is included after the point at which Dask is installed from source.

@jakirkham
Member

Would also add cugraph already seems to be installing dask + distributed from main. Had checked this previously when looking at Peter's Dask update PR ( #1666 )

@ajschmidt8
Member

It looks like the correct version of dask is used all the way up until this point where libcugraph is installed. Installing libcugraph downgrades dask back to 2021.5.1 for some reason.
I think the best way to fix this would be to follow in cuml's footsteps here and install the latest version of dask after libcugraph has been installed.

This may fix the issue now, but this doesn't explain why an older rapids-build-env is being picked, which seems to be the ultimate reason why Dask is getting downgraded. The workaround being suggested can blow up again in the future if another conda install is included after the point at which Dask is installed from source.

You are correct. We have an internal issue open to investigate a larger issue that is likely contributing to this problem: the wrong gpuci/rapidsai Docker image is being used for CI here. If you scroll to the top of the failed build logs, you will see it says gpuci/rapidsai:21.06-cuda11.2-devel-centos7-py3.8 instead of gpuci/rapidsai:21.08-cuda11.2-devel-centos7-py3.8. One of the differences between those two images is the dask version, which is likely contributing to the problem here. For now, the workaround I described above should suffice until we correct the images used in our CI system during release/code-freeze periods like the one we're in right now.

@ajschmidt8
Member

Would also add cugraph already seems to be installing dask + distributed from main. Had checked this previously when looking at Peter's Dask update PR ( #1666 )

I saw that. I think the order of operations is what we'd need to change. We'd want to move the pip install to happen after the conda install of libcugraph similar to what cuml does.

@pentschev
Member

pentschev commented Jun 11, 2021

Note that per the logs, we will need to install Dask twice, where it is now and after libcugraph. The first install upgrades from 2021.5.1, which is downgraded during libcugraph install and will need to be upgraded again.

EDIT: Never mind, I think installing Dask only after libcugraph should suffice. I thought there were some tests running before libcugraph was installed.

@pentschev
Member

The build is already past the failure point, hopefully this will be green soon!

@BradReesWork
Member

@gpucibot merge

@pentschev
Member

Just like in rapidsai/cuml#3978 (comment), the failures appear to be genuinely caused by the changes introduced in Distributed 2021.6.0, for example:

TypeError: No dispatch for <class 'cudf.core.index.RangeIndex'>
x = <class 'int'>, index = RangeIndex(start=0, stop=0, step=1)
parent_meta = Series([], Name: source, dtype: int32)

    def make_meta(x, index=None, parent_meta=None):
        """
        This method creates meta-data based on the type of ``x``,
        and ``parent_meta`` if supplied.
    
        Parameters
        ----------
        x : Object of any type.
            Object to construct meta-data from.
        index :  Index, optional
            Any index to use in the metadata. This is a pass-through
            parameter to dispatches registered.
        parent_meta : Object, default None
            If ``x`` is of arbitrary types and thus Dask cannot determine
            which back-end to be used to generate the meta-data for this
            object type, in which case ``parent_meta`` will be used to
            determine which back-end to select and dispatch to. To use
            utilize this parameter ``make_meta_obj`` has be dispatched.
            If ``parent_meta`` is ``None``, a pandas DataFrame is used for
            ``parent_meta`` thats chooses pandas as the backend.
    
        Returns
        -------
        A valid meta-data
        """
    
        if isinstance(
            x,
            (
                dd._Frame,
                dd.core.Scalar,
                dd.groupby._GroupBy,
                dd.accessor.Accessor,
                da.Array,
            ),
        ):
            return x._meta
    
        try:
>           return make_meta_dispatch(x, index=index)

I'm guessing cuGraph will require an update to adapt to the changes above, similar to what has been done in rapidsai/cudf#8426. Also cc'ing @galipremsagar in case he has thoughts or could work on addressing those.

@galipremsagar
Contributor

rerun tests

3 similar comments
@jakirkham
Member

rerun tests

@jakirkham
Member

rerun tests

@BradReesWork
Member

rerun tests

@ajschmidt8
Member

rerun tests

The previous error log link & screenshot are below. More than likely, we'll need to update the Docker version on the GCP node here, but I'm retrying the tests one more time to see if it happened to be a transient error.
(log link)
[screenshot of the error]

@ajschmidt8
Member

As it turns out, A100 testing shouldn't have been enabled for this repo. It's been disabled. Since that was the only failure in the previous run, I will admin-merge this PR since it's holding things up.

@rapids-bot bot merged commit 6b5079c into rapidsai:branch-21.08 Jun 14, 2021
@jakirkham
Member

Thanks AJ! 😀
