fix cuspatial.haversine_distance() example in multi-tenant notebook #336

Merged (2 commits) on Feb 5, 2024


jameslamb
Member

Fixes the cuspatial.haversine_distance() example using the public NYC taxi data hosted on GCS (GCS console link).

Running through the notebook code for that example (the same one that ends up at https://docs.rapids.ai/deployment/nightly/examples/rapids-autoscaling-multi-tenant-kubernetes/notebook/), I encountered 3 issues:

  1. needed to authenticate with GCP
     ValueError: An error occurred while calling the read_parquet method registered to the cudf backend.
     Original Message: An error occurred while calling the read_parquet method registered to the pandas backend.
     Original Message: Invalid gcloud credentials
  2. that GCS bucket contains some files at path gcs://anaconda-public-data/nyc-taxi/2015.parquet that are not actually parquet files
     Screenshot 2024-02-05 at 1 20 07 PM
  3. cuspatial.haversine_distance() expects to receive 2 cuspatial.GeoSeries objects
     TypeError('haversine_distance() takes 2 positional arguments but 4 were given')

Looks like that changed here: rapidsai/cuspatial#924.
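For readers unfamiliar with the new call shape, here is a minimal sketch of it, not cuspatial's implementation. `haversine_km` is a hypothetical pure-Python helper that takes two sequences of (lon, lat) points, mirroring the two-argument signature. The Earth radius constant and the taxi column names used in `trip_distance_cuspatial` are assumptions, and that second function needs a GPU environment with cudf and cuspatial installed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch


def haversine_km(p1, p2):
    """Pairwise great-circle distance (km) between two equal-length
    sequences of (lon, lat) points given in degrees."""
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must have the same length")
    distances = []
    for (lon1, lat1), (lon2, lat2) in zip(p1, p2):
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = phi2 - phi1
        dlam = math.radians(lon2 - lon1)
        # Haversine formula: a is the squared half-chord length.
        a = (math.sin(dphi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
        distances.append(2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a)))
    return distances


def trip_distance_cuspatial(df):
    """Same idea on the GPU. `df` is assumed to be a cudf.DataFrame with
    the (assumed) NYC taxi coordinate columns named below."""
    import cuspatial  # imported lazily: requires a RAPIDS environment

    # from_points_xy expects interleaved x, y values: lon0, lat0, lon1, ...
    pickup = cuspatial.GeoSeries.from_points_xy(
        df[["pickup_longitude", "pickup_latitude"]].interleave_columns()
    )
    dropoff = cuspatial.GeoSeries.from_points_xy(
        df[["dropoff_longitude", "dropoff_latitude"]].interleave_columns()
    )
    # Two GeoSeries arguments, not four coordinate arrays.
    return cuspatial.haversine_distance(pickup, dropoff)
```

Passing four separate coordinate arrays, as the notebook did before this change, is what triggers the TypeError quoted above.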

This resolves those issues.

How I tested this

Following the instructions from https://docs.rapids.ai/install#install-rapids, ran jupyter lab in a RAPIDS container like this:

docker run \
    --gpus all \
    --pull always \
    --rm \
    -it \
    --shm-size=1g \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    -p 8888:8888 \
    -p 8787:8787 \
    -p 8786:8786 \
    rapidsai/notebooks:24.02a-cuda12.0-py3.10

Then ran this notebook code (just the LocalCUDACluster parts and below), on a machine with a few 80GB H100s. Confirmed that data was pulled successfully without needing to authenticate with GCP, and that cuspatial.haversine_distance() ran without error and produced plausible-looking results.


- " \"gcs://anaconda-public-data/nyc-taxi/2015.parquet\",\n",
- " storage_options={\"token\": \"cloud\"},\n",
+ " \"gcs://anaconda-public-data/nyc-taxi/2015.parquet/part.1*\",\n",
+ " storage_options={\"token\": \"anon\"},\n",
Member Author

@jameslamb jameslamb Feb 5, 2024

Since that bucket is public, authenticating with GCP isn't necessary. For reference: https://gcsfs.readthedocs.io/en/stable/#credentials
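As a rough sketch of how the anonymous read might look outside the notebook: the path and `storage_options` value are taken from the changed lines in this PR, while the helper names, the example file names, and the choice of `dask.dataframe` with the cudf backend are assumptions made for illustration. `read_taxi_data` needs dask, gcsfs, a RAPIDS environment, and network access.

```python
from fnmatch import fnmatch

# Values taken from the changed notebook lines.
TAXI_GLOB = "gcs://anaconda-public-data/nyc-taxi/2015.parquet/part.1*"
STORAGE_OPTIONS = {"token": "anon"}  # gcsfs: anonymous access, no GCP login


def matches_taxi_glob(path):
    """Approximate check of which object names the glob selects.
    fnmatch is only a stand-in for the filesystem's own glob rules."""
    return fnmatch(path, TAXI_GLOB)


def read_taxi_data():
    """Lazily read the selected parquet parts into a cudf-backed collection."""
    import dask
    import dask.dataframe as dd

    dask.config.set({"dataframe.backend": "cudf"})
    return dd.read_parquet(TAXI_GLOB, storage_options=STORAGE_OPTIONS)


if __name__ == "__main__":
    base = "gcs://anaconda-public-data/nyc-taxi/2015.parquet/"
    # Hypothetical object names, for illustration only.
    for name in ("part.10.parquet", "not-a-parquet-file.txt"):
        print(name, matches_taxi_glob(base + name))
```

Narrowing the path to `part.1*` keeps the reader from picking up the non-parquet objects in that directory, and `"anon"` avoids the credentials lookup that produced the "Invalid gcloud credentials" error.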

Member

@jacobtomlinson jacobtomlinson left a comment

This looks great, thanks @jameslamb

@jacobtomlinson jacobtomlinson merged commit b84111f into rapidsai:main Feb 5, 2024
3 checks passed
@jameslamb jameslamb deleted the fix/multi-tenant-notebook branch February 5, 2024 19:37