Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Improved Array Conversion with CumlArrayDescriptor and Decorators [skip-ci] #3040

Merged
Merged
Show file tree
Hide file tree
Changes from 43 commits
Commits
Show all changes
81 commits
Select commit Hold shift + click to select a range
4524b7b
Adding additional checking for incorrect use cases. Added CumlArrayDe…
mdemoret-nv Jul 28, 2020
b849dc5
Cleaning up more use cases
mdemoret-nv Jul 30, 2020
af5c5ee
Merge branch 'branch-0.15' into bug-audit-cumlarray-use-improved
mdemoret-nv Aug 10, 2020
924e717
Initial commit of CumlArrayDescriptor in PCA
mdemoret-nv Aug 11, 2020
55badb8
Incrementally updating CumlArray uses
mdemoret-nv Aug 26, 2020
03f5097
Merge branch 'branch-0.16' into bug-audit-cumlarray-use-improved
mdemoret-nv Aug 27, 2020
9afc431
Adding some improvements to decorators to auto detect certain scenari…
mdemoret-nv Aug 27, 2020
cb7fae4
Adding internals.func_utils to test wrapping all functions and checki…
mdemoret-nv Sep 9, 2020
64399fb
Merge remote-tracking branch 'upstream/branch-0.16' into bug-audit-cu…
mdemoret-nv Sep 9, 2020
dcb6162
Commit before merging upstream
mdemoret-nv Sep 21, 2020
c1751a9
Merge branch 'branch-0.16' into bug-audit-cumlarray-use-improved
mdemoret-nv Sep 21, 2020
0c83f91
Updating native_bayes
mdemoret-nv Sep 22, 2020
043110d
Partial working state
mdemoret-nv Sep 23, 2020
090060b
Updating KMeans
mdemoret-nv Sep 24, 2020
8bef065
Partial pass over all Base subclasses
mdemoret-nv Sep 25, 2020
f0c2f59
Mostly complete pass of removing to_output
mdemoret-nv Sep 26, 2020
1e4d94f
Completed cleanup of Base method removal
mdemoret-nv Sep 28, 2020
7d90619
Cleaning up more to_output uses. Fixing test errors
mdemoret-nv Oct 1, 2020
60398d0
Adding tartet_arg property and fixing tests that can use it
mdemoret-nv Oct 2, 2020
100a352
More cleanup and test fixing
mdemoret-nv Oct 2, 2020
5501289
Updating types derived from Base to properly use get_param_names and …
mdemoret-nv Oct 5, 2020
94ed367
Merge remote-tracking branch 'upstream/branch-0.16' into bug-audit-cu…
mdemoret-nv Oct 7, 2020
e1d15f2
Fixing import order. Adding support for sparse arrays
mdemoret-nv Oct 19, 2020
104faf7
Attempting to fix nearest neighbors
mdemoret-nv Oct 19, 2020
bbf9128
Merge remote-tracking branch 'upstream/branch-0.16' into bug-audit-cu…
mdemoret-nv Oct 19, 2020
f07e2e2
Merge remote-tracking branch 'upstream/branch-0.17' into bug-audit-cu…
mdemoret-nv Oct 20, 2020
9b93816
Removing commented code
mdemoret-nv Oct 20, 2020
52be0b3
Fixing failing tests
mdemoret-nv Oct 21, 2020
f2ec80c
Merge remote-tracking branch 'upstream/branch-0.17' into bug-audit-cu…
mdemoret-nv Oct 21, 2020
574ec67
Fixing more tests
mdemoret-nv Oct 22, 2020
b064363
Adding PR to CHANGELOG and style fixes
mdemoret-nv Oct 22, 2020
1da1c70
Fixing missing import
mdemoret-nv Oct 22, 2020
ca08886
Removing protocol interface for python 3.7
mdemoret-nv Oct 22, 2020
246cc12
Fixing ARIMA. Required including changes from PR#2956
mdemoret-nv Oct 23, 2020
580f6bf
Fixing labelbinarizer and KNN failing tests
mdemoret-nv Oct 23, 2020
23233ca
Removing "invalid syntax" so flake8 can run
mdemoret-nv Oct 23, 2020
82804c9
Adding more wrappers to ARIMA so tests pass.
mdemoret-nv Oct 26, 2020
3b5da28
Committing CI change to allow tests to run.
mdemoret-nv Oct 29, 2020
638f19f
Moving memory check to plugin
mdemoret-nv Oct 29, 2020
e12f8be
Adding ability to load SPD environment variables to the logger
mdemoret-nv Oct 30, 2020
b658ed1
Changing pytest import-mode to better support development
mdemoret-nv Oct 30, 2020
cda7d82
Changing relative imports to absolute
mdemoret-nv Oct 31, 2020
ad93ee7
Adding first iteration of dev guide to see how it looks
mdemoret-nv Oct 31, 2020
7c6be98
Improving the quick_run plugin
mdemoret-nv Nov 2, 2020
e3997f9
Merge remote-tracking branch 'upstream/branch-0.17' into bug-audit-cu…
mdemoret-nv Nov 2, 2020
c63368a
Removing skip_* from cuml decorators
mdemoret-nv Nov 3, 2020
13d2c51
Fixing cuml_decorators test.
mdemoret-nv Nov 3, 2020
3f1c6ab
Removing the logger environment addition
mdemoret-nv Nov 4, 2020
d1783ca
Updating non-Base methods to use decorators
mdemoret-nv Nov 5, 2020
801c2a6
Large cleanup of remaining to_output, with_cupy_rmm and input_to_dev_ptr
mdemoret-nv Nov 5, 2020
90579e3
Style cleanup
mdemoret-nv Nov 5, 2020
8db2596
Apply John's suggestions from code review on Dev Guide
mdemoret-nv Nov 5, 2020
ac3e513
Merge remote-tracking branch 'upstream/branch-0.17' into bug-audit-cu…
mdemoret-nv Nov 5, 2020
8658d56
Large update to Estimator Guide incorporating feedback from JohnZ
mdemoret-nv Nov 5, 2020
c9a91c3
Removing array tracking and putting in plugin
mdemoret-nv Nov 6, 2020
ba2d6e2
Removing PR Description file
mdemoret-nv Nov 6, 2020
fc8d03e
Removing ArrayOutputable
mdemoret-nv Nov 6, 2020
71dd6ee
Removing test plugins
mdemoret-nv Nov 6, 2020
74cfed0
Cleaning up code to remove unnecessary diffs
mdemoret-nv Nov 6, 2020
c7e957e
Style cleanup
mdemoret-nv Nov 6, 2020
f338ce3
Defaulting to cp array instead of np, per feedback
mdemoret-nv Nov 6, 2020
fa7da5b
Adding additional tests
mdemoret-nv Nov 6, 2020
cd301d6
Merge remote-tracking branch 'upstream/branch-0.17' into bug-audit-cu…
mdemoret-nv Nov 6, 2020
3c89920
Separating func_tools into separate files
mdemoret-nv Nov 7, 2020
753a1b0
Removing extra changes to conftest.py which should not have been comm…
mdemoret-nv Nov 10, 2020
478e03c
Renaming base.py back to base.pyx
mdemoret-nv Nov 10, 2020
652b721
Apply suggestions from code review
mdemoret-nv Nov 10, 2020
dc60ce6
Incorporating feedback from Dante's code review
mdemoret-nv Nov 11, 2020
f92ce57
Merge branch 'bug-audit-cumlarray-use-improved' of github.com:mdemore…
mdemoret-nv Nov 11, 2020
8299d0a
Removing straggling TODO
mdemoret-nv Nov 11, 2020
15de0d8
Applying Dante's Revisions to ESTIMATOR_GUIDE
mdemoret-nv Nov 11, 2020
8a8998d
Updateing ESTIMATOR_GUIDE from feedback from Dante
mdemoret-nv Nov 11, 2020
d758628
Merge remote-tracking branch 'upstream/branch-0.17' into bug-audit-cu…
mdemoret-nv Nov 11, 2020
be51cfa
Cleaning up straggling to_output
mdemoret-nv Nov 11, 2020
a80ab6b
Another iteration on code review feedback
mdemoret-nv Nov 12, 2020
e2ba884
Style cleanup
mdemoret-nv Nov 12, 2020
eda86ac
More small items from code review
mdemoret-nv Nov 12, 2020
20887d0
Merge remote-tracking branch 'upstream/branch-0.17' into bug-audit-cu…
mdemoret-nv Nov 13, 2020
3ea2490
One final change to ESTIMATOR_GUIDE
mdemoret-nv Nov 13, 2020
cca0cea
Updaing all *_mg.pyx files to use the new naming conventions and Cuml…
mdemoret-nv Nov 13, 2020
ef0be19
Merge branch 'branch-0.17' into bug-audit-cumlarray-use-improved
JohnZed Nov 13, 2020
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
- PR #2906: Moving `linalg` decomp to RAFT namespaces
- PR #2996: Removing the max_depth restriction for switching to the batched backend
- PR #3004: Remove Single Process Multi GPU (SPMG) code
- PR #3040: Improved Array Conversion with CumlArrayDescriptor and Decorators

## Bug Fixes
- PR #2983: Fix seeding of KISS99 RNG
Expand Down
7 changes: 7 additions & 0 deletions PR_description.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
[WIP] Addition of CumlArray Decorators and Descriptors to Ensure Consistent Usage In Library
mdemoret-nv marked this conversation as resolved.
Show resolved Hide resolved

Due to the large number of input and output array types that cuml supports, the library needs to be flexible in how array data is handed when entering and exiting the API. This has led to some inconsistent usage of `CumlArray`, unnecessary conversions and memcpys, and adds repeated work for the developers.

This PR helps with these issues by adding a new object `CumlArrayDescriptor` and a set of decorators in `cuml.internals` that make working with array data easier for developers and more consistent for users while still being flexible enough to handle any scenario. Finally, this PR updates the majority of the code base to be consistent and use the new features.

A future developer guide will do a deep dive into how the new features should be used, but for now,
2 changes: 1 addition & 1 deletion ci/gpu/build.sh
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ gpuci_conda_retry install -c conda-forge -c rapidsai -c rapidsai-nightly -c nvid
"dask-cudf=${MINOR_VERSION}" \
"dask-cuda=${MINOR_VERSION}" \
"ucx-py=${MINOR_VERSION}" \
"xgboost=1.2.0dev.rapidsai0.16" \
"xgboost=1.2.0dev.rapidsai${MINOR_VERSION}" \
"rapids-build-env=${MINOR_VERSION}.*" \
"rapids-notebook-env=${MINOR_VERSION}.*" \
"rapids-doc-env=${MINOR_VERSION}.*"
Expand Down
24 changes: 24 additions & 0 deletions cpp/include/cuml/tsa/batched_arima.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,30 @@ namespace ML {

enum LoglikeMethod { CSS, MLE };

/**
* Pack separate parameter arrays into a compact array
*
* @param[in] handle cuML handle
* @param[in] params Parameter structure
* @param[in] order ARIMA order
* @param[in] batch_size Batch size
* @param[out] param_vec Compact parameter array
*/
void pack(raft::handle_t& handle, const ARIMAParams<double>& params,
const ARIMAOrder& order, int batch_size, double* param_vec);

/**
* Unpack a compact array into separate parameter arrays
*
* @param[in] handle cuML handle
* @param[out] params Parameter structure
* @param[in] order ARIMA order
* @param[in] batch_size Batch size
* @param[in] param_vec Compact parameter array
*/
void unpack(raft::handle_t& handle, ARIMAParams<double>& params,
const ARIMAOrder& order, int batch_size, const double* param_vec);

/**
* Compute the differenced series (seasonal and/or non-seasonal differences)
*
Expand Down
12 changes: 12 additions & 0 deletions cpp/src/arima/batched_arima.cu
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,18 @@

namespace ML {

void pack(raft::handle_t& handle, const ARIMAParams<double>& params,
const ARIMAOrder& order, int batch_size, double* param_vec) {
const auto stream = handle.get_stream();
params.pack(order, batch_size, param_vec, stream);
}

void unpack(raft::handle_t& handle, ARIMAParams<double>& params,
const ARIMAOrder& order, int batch_size, const double* param_vec) {
const auto stream = handle.get_stream();
params.unpack(order, batch_size, param_vec, stream);
}

void batched_diff(raft::handle_t& handle, double* d_y_diff, const double* d_y,
int batch_size, int n_obs, const ARIMAOrder& order) {
const auto stream = handle.get_stream();
Expand Down
7 changes: 7 additions & 0 deletions cpp/src/common/logger.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
#define SPDLOG_HEADER_ONLY
#include <spdlog/sinks/stdout_color_sinks.h> // NOLINT
#include <spdlog/spdlog.h> // NOLINT
#include <spdlog/cfg/env.h> // NOLINT

#include <algorithm>
#include <cuml/common/callbackSink.hpp>
Expand Down Expand Up @@ -56,6 +57,12 @@ Logger::Logger()
currPattern() {
setPattern(DefaultPattern);
setLevel(CUML_LEVEL_INFO);

// Init the logger so its in the registry
spdlog::initialize_logger(logger);

// Load environment variables
spdlog::cfg::load_env_levels();
}

void Logger::setLevel(int level) {
Expand Down
21 changes: 2 additions & 19 deletions notebooks/arima_demo.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -200,7 +200,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If we want to get the parameters that were fitted to the model, we can use `get_fit_params` or the corresponding properties:"
"If we want to get the parameters that were fitted to the model, we can use `get_fit_params` or the corresponding properties. The parameters are organized in 2D arrays: one row represents one parameter and the columns are different batch members."
]
},
{
Expand All @@ -211,7 +211,7 @@
"source": [
"param_mig = model_mig.get_fit_params()\n",
"print(param_mig[\"ma\"])\n",
"print(model_mig.ma)"
"print(model_mig.ma_)"
]
},
{
Expand All @@ -230,23 +230,6 @@
"print(model_mig.pack())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameters are organized in 2D arrays: one row represents one parameter and the columns are different batch members."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print the ma.L1 and ma.L2 parameters for each of 4 batch members\n",
"print(param_mig[\"ma\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
Expand Down
2 changes: 1 addition & 1 deletion python/cuml/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@

# Output type configuration

global_output_type = 'input'
global_output_type = None

from cuml.common.memory_utils import set_global_output_type, using_output_type

Expand Down
53 changes: 27 additions & 26 deletions python/cuml/cluster/dbscan.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -25,11 +25,13 @@ from libcpp cimport bool
from libc.stdint cimport uintptr_t, int64_t
from libc.stdlib cimport calloc, malloc, free

import cuml
mdemoret-nv marked this conversation as resolved.
Show resolved Hide resolved
from cuml.common.array import CumlArray
from cuml.common.base import Base
from cuml.common.doc_utils import generate_docstring
from cuml.raft.common.handle cimport handle_t
from cuml.common import input_to_cuml_array
from cuml.common.array_descriptor import CumlArrayDescriptor

from collections import defaultdict

Expand Down Expand Up @@ -186,6 +188,9 @@ class DBSCAN(Base):
<http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html>`_.
"""

labels_ = CumlArrayDescriptor()
core_sample_indices_ = CumlArrayDescriptor()

def __init__(self, eps=0.5, handle=None, min_samples=5,
verbose=False, max_mbytes_per_batch=None,
output_type=None, calc_core_sample_indices=True):
Expand All @@ -196,18 +201,18 @@ class DBSCAN(Base):
self.calc_core_sample_indices = calc_core_sample_indices

# internal array attributes
self._labels_ = None # accessed via estimator.labels_
self.labels_ = None

# accessed via estimator._core_sample_indices_ when
# self.calc_core_sample_indices == True
self._core_sample_indices_ = None
# `self.calc_core_sample_indices == True`
mdemoret-nv marked this conversation as resolved.
Show resolved Hide resolved
self.core_sample_indices_ = None

# C++ API expects this to be numeric.
if self.max_mbytes_per_batch is None:
self.max_mbytes_per_batch = 0

@generate_docstring(skip_parameters_heading=True)
def fit(self, X, out_dtype="int32"):
def fit(self, X, out_dtype="int32") -> "DBSCAN":
"""
Perform DBSCAN clustering from features.

Expand All @@ -218,11 +223,6 @@ class DBSCAN(Base):
"int64", np.int64}.

"""
self._set_base_attributes(output_type=X, n_features=X)

if self._labels_ is not None:
del self._labels_

if out_dtype not in ["int32", np.int32, "int64", np.int64]:
raise ValueError("Invalid value for out_dtype. "
"Valid values are {'int32', 'int64', "
Expand All @@ -236,16 +236,16 @@ class DBSCAN(Base):

cdef handle_t* handle_ = <handle_t*><size_t>self.handle.getHandle()

self._labels_ = CumlArray.empty(n_rows, dtype=out_dtype)
cdef uintptr_t labels_ptr = self._labels_.ptr
self.labels_ = CumlArray.empty(n_rows, dtype=out_dtype)
cdef uintptr_t labels_ptr = self.labels_.ptr

cdef uintptr_t core_sample_indices_ptr = <uintptr_t> NULL

# Create the output core_sample_indices only if needed
if self.calc_core_sample_indices:
self._core_sample_indices_ = \
self.core_sample_indices_ = \
CumlArray.empty(n_rows, dtype=out_dtype)
core_sample_indices_ptr = self._core_sample_indices_.ptr
core_sample_indices_ptr = self.core_sample_indices_.ptr

if self.dtype == np.float32:
if out_dtype is "int32" or out_dtype is np.int32:
Expand Down Expand Up @@ -303,20 +303,21 @@ class DBSCAN(Base):
# Finally, resize the core_sample_indices array if necessary
if self.calc_core_sample_indices:

# Temp convert to cupy array only once
core_samples_cupy = self._core_sample_indices_.to_output("cupy")
# Temp convert to cupy array (better than using `cp.asarray`)
with cuml.using_output_type("cupy"):

# First get the min index. These have to monotonically increasing,
# so the min index should be the first returned -1
min_index = cp.argmin(core_samples_cupy).item()
# First get the min index. These have to monotonically
# increasing, so the min index should be the first returned -1
min_index = cp.argmin(self.core_sample_indices_).item()

# Check for the case where there are no -1's
if (min_index == 0 and core_samples_cupy[min_index].item() != -1):
# Nothing to delete. The array has no -1's
pass
else:
self._core_sample_indices_ = \
self._core_sample_indices_[:min_index]
# Check for the case where there are no -1's
if ((min_index == 0 and
self.core_sample_indices_[min_index].item() != -1)):
# Nothing to delete. The array has no -1's
pass
else:
self.core_sample_indices_ = \
self.core_sample_indices_[:min_index]

return self

Expand All @@ -325,7 +326,7 @@ class DBSCAN(Base):
'type': 'dense',
'description': 'Cluster labels',
'shape': '(n_samples, 1)'})
def fit_predict(self, X, out_dtype="int32"):
def fit_predict(self, X, out_dtype="int32") -> CumlArray:
"""
Performs clustering on X and returns cluster labels.

Expand Down
Loading