Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RELEASE] raft v23.12 #2020

Merged
merged 90 commits into from
Dec 6, 2023
Merged

[RELEASE] raft v23.12 #2020

merged 90 commits into from
Dec 6, 2023

Conversation

raydouglass
Copy link
Member

❄️ Code freeze for branch-23.12 and v23.12 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-23.12 until release (merging of this PR).

What is the purpose of this PR?

  • Update documentation
  • Allow testing for the new release
  • Enable a means to merge branch-23.12 into main for the release

raydouglass and others added 30 commits September 22, 2023 10:14
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
This PR builds conda packages using CUDA 12 on ARM.

Closes #1834.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)

URL: #1853
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
Opening a fresh PR in hopes to reset gh-actions runners to v3.

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Paul Taylor (https://github.com/trxcllnt)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Ray Douglass (https://github.com/raydouglass)
  - Ben Frederickson (https://github.com/benfred)

URL: #1868
Building on cuda 12.2 shows errors like

```
/code/raft/cpp/include/raft/spatial/knn/detail/ball_cover/registers-inl.cuh(177): error #20054-D: dynamic initialization is not supported for a function-scope static __shared__ variable within a __device__/__global__ function
              KeyValuePair<value_t, value_idx> shared_memV[kNumWarps * warp_q];
```

Fix by using default constructors for structures in shared memory, even trivial constructors will cause this issue

Authors:
  - Ben Frederickson (https://github.com/benfred)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1870
I got errors when compiling a program using raft NN-descent. This PR fixes the bug.

## Error
e.g.
```
/home/.../include/raft/neighbors/detail/nn_descent.cuh(1158): error: invalid narrowing conversion from "unsigned long" to "int"
      h_rev_graph_old_{static_cast<size_t>(nrow_ * NUM_SAMPLES)},
```

- nvcc

```
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
```

- gcc
```
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
```

- thrust : 2.2 (This is the cause of this error [[detail](#1869 (comment))])

# Change
Use `Class(...)` instead of `Class{...}`.

# Cause
The NN-descent code calls constructors of `thrust::host_vector` as shown below:
```cpp
graph_host_buffer_{static_cast<size_t>(nrow_ * DEGREE_ON_DEVICE)},
```
However, this constructor is regarded as a list initialization.
This is the same as the following code outputting 1 instead of 2.
```cpp
#include <iostream>
#include <vector>

int main() {
  std::vector<float> list{2};

  std::cout << list.size() << std::endl;
}
```

[detail](https://en.cppreference.com/w/cpp/language/list_initialization)

Authors:
  - tsuki (https://github.com/enp1s0)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Divye Gala (https://github.com/divyegala)

URL: #1869
NN-Descent was using `int` type for indexing in `mdarray`, however this was causing an overflow when the product of all extents was greater than `int`.

This PR also adds/fixes:

- Missing dependencies for `raft-ann-bench` development environment
- Exposes NN Descent iterations to use in CAGRA benchmarks

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Ray Douglass (https://github.com/raydouglass)

URL: #1875
# Purpose
This PR provides a utility for copying between generic mdspans. This includes between host and device, between mdspans of different layouts, and between mdspans of different (convertible) data types

## API
`raft::copy(raft_resources, dest_mdspan, src_mdspan);`

# Limitations

- Currently does not support copies between mdspans on two different GPUs
- Currently not performant for generic host-to-host copies (would be much easier to optimize with submdspan for padded layouts)
- Submdspan with padded layouts would also make it easier to improve perf of some device-to-device copies, though perf should already be quite good for most device-to-device copies.

# Design

- Includes optional `RAFT_DISABLE_CUDA` build definition in order to use this utility in CUDA-free builds (important for use in the FIL backend for Triton)
- Includes a new `raft::stream_view` object which is a thin wrapper around `rmm::stream_view`. Its purpose is solely to provide a symbol that will be defined in CUDA-free builds and which will throw exceptions or log error messages if someone tries to use a CUDA stream in a CUDA-free build. This avoids a whole bunch of ifdefs that would otherwise infect the whole codebase.
- Uses (roughly in order of preference): `cudaMemcpyAsync, std::copy, cublas, custom device kernel, custom host-to-host transfer logic` for the underlying copy
- Provides two different headers: `raft/core/copy.hpp` and `raft/core/copy.cuh`. This is to accommodate the custom kernel necessary for handling completely generic device-to-device copies. See below for more details.

## Details on the header split
For many instantiations, even those which involve the device, we do not require nvcc compilation. If, however, we determine at compilation time that we must use a custom kernel for the copy, then we must invoke nvcc. We do not wish to indicate that a public header file is a C++ header when it is a CUDA header or vice versa, so we split the definitions into separate `hpp` and `cuh` files, with all template instantiations requiring the custom kernel enable-if'd out of the hpp file.

Thus, the cuh header can be used for _any_ mdspan-to-mdspan copy, but the hpp file will not compile for those specific instantiations that require a custom kernel. The recommended workflow is that if a `cpp` file requires an mdspan-to-mdspan copy, first try the `hpp` header. If that fails, the `cpp` file must be converted to a `cu` file, and the `cuh` header should be used. For source files that are already being compiled with nvcc (i.e. `.cu` files), the `cuh` header might as well be used and will not result in any additional compile time penalty.

# Remaining tasks to leave WIP status

- [x] Add benchmarks for copies
- [x] Ensure that new function is correctly added to docs

# Follow-up items

- Optimize host-to-host transfers using a cache-oblivious approach with SIMD-accelerated transposes for contiguous memory
- Test cache-oblivious device-to-device transfers and compare performance
- Provide transparent support for copies between devices.

## Relationship to mdbuffer
This utility encapsulates a substantial chunk of the core logic required for the mdbuffer implementation. It is being split into its own PR both because it is useful on its own and because the mdbuffer work has been delayed by higher priority tasks.

Close #1779

Authors:
  - William Hicks (https://github.com/wphicks)
  - Tarang Jain (https://github.com/tarang-jain)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: #1818
Adding instructions to how to use the ANN benchmark containers, as well as general benchmark doc improvements.

Authors:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1833
Update to use non deprecated signatures for `rapids_export` functions

Authors:
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1884
Closes #1813

This PR adds FAISS CPU indexes to the raft-Ann-bench benchmarks, and adjusts the build accordingly. In addition, this PR also updated the FAISS version to include the RAFT-enabled version, which required removing the FAISS conda packages and building FAISS from source.

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Robert Maynard (https://github.com/robertmaynard)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Ray Douglass (https://github.com/raydouglass)

URL: #1814
Forward-merge branch-23.10 to branch-23.12
This PR switches back to using `branch-23.12` for CI workflows because the CUDA 12 ARM conda migration is complete.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Jake Awe (https://github.com/AyodeAwe)

URL: #1886
- Use strides to determine whether we can use a contiguous layout.

Useful for accepting inputs with stride information like numpy array or legate stores. The code is mostly copied and pasted from mdspan layout_left/right  constructors. These constructors can optionally accept a stride layout if the underlying memory is contiguous, otherwise throw an exception. We change the exception into a boolean return value.

Authors:
  - Jiaming Yuan (https://github.com/trivialfis)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: #1861
The python raft-ann-bench code launches subprocesses to execute C++ code. These scripts weren't checking the return values of the C++ programs though, and just waiting for them to finish.  Fix this to propogate the failures up as exceptions by checking the subprocess return value.

After this change, having a failing subprocess looks something like:

```
2023-10-12 10:54:55 [info] Using the dataset file '/home/ben/code/raft/python/raft-ann-bench/datasets/glove-100-inner/base.fbin'
terminate called after throwing an instance of 'std::runtime_error'
  what():  read header of BinFile failed: /home/ben/code/raft/python/raft-ann-bench/datasets/glove-100-inner/base.fbin
Traceback (most recent call last):
  File "/home/ben/code/raft/python/raft-ann-bench/src/raft-ann-bench/run/__main__.py", line 324, in <module>
    main()
  File "/home/ben/code/raft/python/raft-ann-bench/src/raft-ann-bench/run/__main__.py", line 309, in main
    run_build_and_search(
  File "/home/ben/code/raft/python/raft-ann-bench/src/raft-ann-bench/run/__main__.py", line 113, in run_build_and_search
    subprocess.run(cmd, check=True)
  File "/home/ben/miniconda3/envs/raft/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/home/ben/miniconda3/envs/raft/bin/ann/RAFT_CAGRA_ANN_BENCH', '--build', '--data_prefix=/home/ben/code/raft/python/raft-ann-bench/datasets/', '--benchmark_out_format=json', '--benchmark_out=/home/ben/code/raft/python/raft-ann-bench/datasets/glove-100-inner/result/build/raft_cagra-10-10000.json', '/home/ben/code/raft/python/raft-ann-bench/src/raft-ann-bench/run/conf/temporary_glove-100-inner.json']' died with <Signals.SIGABRT: 6>.
```

Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: #1897
tfeher and others added 18 commits November 9, 2023 14:46
In ANN benchmark, when throughput mode is enabled, it is important that each thread works with a separate set of queries, otherwise cache effects can distort the results. This PR sets a different starting offset for each thread.

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1981
Adds a CI check to verify the the libraft.so doesn't have any public weak kernel symbols.

Authors:
  - Robert Maynard (https://github.com/robertmaynard)
  - AJ Schmidt (https://github.com/ajschmidt8)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)

URL: #1935
- fix matrix::detail::select::radix::calc_chunk_size() for one-block kernel
- use `calc_buf_len()` rather than `len` as the buffer size of one-block kernel
- reduce register footprint of one-block kernel by reducing the number of buffer pointers
- reduce the buffer size by 1/8 for all radix select functions


Resolve #1823

Authors:
  - Yong Wang (https://github.com/yong-wang)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Ben Frederickson (https://github.com/benfred)
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Artem M. Chirkin (https://github.com/achirkin)
  - Ben Frederickson (https://github.com/benfred)

URL: #1878
…ns (#1968)

Currently dask versions are pinned as part of every release cycle and then unpinned for the next development cycle across all of RAPIDS. This introduces a great deal of churn. To centralize the dependency, we have created a metapackage to manage the required dask version and this PR introduces that metapackage as a dependency of raft_dask.

xref: rapidsai/cudf#14364

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Jake Awe (https://github.com/AyodeAwe)

URL: #1968
This PR fixes a potential out of bounds access. 

While building the reverse graph, we count the number of potential edges in `rev_graph_count`

https://github.com/rapidsai/raft/blob/047bfb2a7a24a97a4bece0b553521533868e7889/cpp/include/raft/neighbors/detail/cagra/graph_core.cuh#L204-L205

While we store only max `output_graph_degree` neighbors, the `rev_graph_count` can be larger than `output_graph_degree`.  The loop that uses `rev_graph` is updated to take this limit into account.

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - tsuki (https://github.com/enp1s0)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1987
… time (#1989)

We can safely assume all cuda streams have been created before we run benchmark experiments. With the sensitivity to latencies for smaller batch sizes, this has an impact on overall throughput.

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Ben Frederickson (https://github.com/benfred)

URL: #1989
A few updates to the `update-version.sh` script

Authors:
   - Ray Douglass (https://github.com/raydouglass)

Approvers:
   - Jake Awe (https://github.com/AyodeAwe)
With #1878 merged, the performance of the radix select algorithms is much improved and we no longer need to incorporate the faiss block select algorithm. With #1878 merged, faiss block select goes from being the 3rd ranked selection algorithm, to the 5th.

This regenerates the heuristic function with the latest benchmark times, and removes the faiss block select in favour of kWarpImmediate and kRadix11bitsExtraPass.

Authors:
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #1985
This PR removes static checks for serialization size. Upstream changes like rapidsai/rmm#1370 have altered these sizes and break RAFT CI. An alternative approach to verifying serialization will be developed.

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Mark Harris (https://github.com/harrism)

URL: #1997
CAGRA index can be constructed using existing device_mdarrays, in which case just reference to these arrays are stored. 
This way allocations are managed outside the index, and we can customize how these allocations are made.

This PR
- modifies the CAGRA ANN bench wrapper to manage the allocations locally,
- add options for the json file to specify whether the graph / dataset is allocated in device / host_pinned / host_huge_page memory.

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Tamas Bela Feher (https://github.com/tfeher)
  - Robert Maynard (https://github.com/robertmaynard)

Approvers:
  - Micka (https://github.com/lowener)
  - Divye Gala (https://github.com/divyegala)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: #1896
This adds support for iterating over batches of nearest neighbors in the brute force knn. This lets you query for the nearest neighbors, and then filter down the results - and if you have filtered out too many results, get the next batch of nearest neighbors.

The challenge here is balancing memory consumption versus efficieny: we could store the results of the full gemm in the distance calculation - but for large indices this discards the benefits of using the tiling strategy and risks running OOM. Instead we exponentially grow the number of neighbors being returned, and also cache both the query norms and index norms between calls.

Authors:
  - Ben Frederickson (https://github.com/benfred)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1947
- [x] Codepacking for compressed on-device (flat) PQ codes
- [x] Testing

Authors:
  - Tarang Jain (https://github.com/tarang-jain)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #1826
pylibraft specifies its numpy pinning incorrectly in conda packages. It is only a runtime dependency, not a build dependency. Because of run-exports from numpy, the resulting pinning on pylibraft requires a very new numpy version, and is thus causing cuml to fall back to old raft builds.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Mark Harris (https://github.com/harrism)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Ray Douglass (https://github.com/raydouglass)

URL: #2013
We were relying on spdlog/fmt to be brought in transitively before, which was compatible with FAISS. It looks like they are no longer being brought in transitively and so we are bringing in a newer version at the cpm/cmake level which is breaking our raft-ann-bench-cpu build.

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Jake Awe (https://github.com/AyodeAwe)

URL: #2018
@raydouglass raydouglass requested review from a team as code owners November 21, 2023 21:51
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

raydouglass and others added 2 commits December 4, 2023 14:11
RAPIDS repos are using the `main` branch of https://github.com/actions/labeler which recently introduced [breaking changes](https://github.com/actions/labeler/releases/tag/v5.0.0).

This PR pins to the latest v4 release of the labeler action until we can evaluate the changes required for v5.

Authors:
   - Ray Douglass (https://github.com/raydouglass)

Approvers:
   - AJ Schmidt (https://github.com/ajschmidt8)
@raydouglass raydouglass merged commit e9f9aa8 into main Dec 6, 2023
3 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.