
support heterogeneous fanout type #4608

Open
wants to merge 70 commits into base: branch-24.10

Conversation

Contributor

@jnke2016 jnke2016 commented Aug 13, 2024

closes #4589
closes #4591

Collaborator

@ChuckHastings ChuckHastings left a comment

Some thoughts on changing the API a bit.

raft::random::RngState& rng_state,
bool return_hops,
bool with_replacement = true,
prior_sources_behavior_t prior_sources_behavior = prior_sources_behavior_t::DEFAULT,
bool dedupe_sources = false,
bool do_expensive_check = false);

#if 0
/* FIXME:
There are two options to support heterogeneous fanout
Collaborator

Here's another option to explore.

Create a new function called neighbor_sample. Create it off of the biased sampling API, but with the following changes:

  1. the biases become optional instead of required. Then the same call can do either uniform or biased sampling, just based on whether the biases are included or not
  2. the fanout and heterogeneous fanout as you have defined them. Or we might explore using std::variant, where it would take either a host_span or a tuple of host_spans and make the right choice internally (a rough sketch of this is below)
  3. Move the rng_state parameter to be right after the handle (before the graph_view). This feels like a better standard place for the parameter.

We can then mark the existing uniform_neighbor_sample and biased_neighbor_sample as deprecated. When we implement this, the internal C++ implementation can just call the new neighbor_sample with the parameters properly configured. This makes it a non-breaking change (eventually we'll drop the old functions) while still increasing code reuse.
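
For illustration only, here is a minimal, self-contained sketch of the std::variant idea in point 2, using plain std:: types rather than the actual cugraph/raft types and a made-up function name. The fan_out parameter is either a flat per-hop list (homogeneous) or a (values, hops-per-type) pair (heterogeneous), and the choice is made internally:

#include <cstdint>
#include <cstdio>
#include <tuple>
#include <variant>
#include <vector>

using homogeneous_fanout_t   = std::vector<int32_t>;             // one fanout value per hop
using heterogeneous_fanout_t = std::tuple<std::vector<int32_t>,  // flattened per-(type, hop) fanout values
                                          std::vector<int32_t>>; // number of hops per edge type

void neighbor_sample_sketch(
  std::variant<homogeneous_fanout_t, heterogeneous_fanout_t> const& fan_out)
{
  if (std::holds_alternative<homogeneous_fanout_t>(fan_out)) {
    auto const& per_hop = std::get<homogeneous_fanout_t>(fan_out);
    std::printf("homogeneous fanout over %zu hops\n", per_hop.size());
  } else {
    auto const& [values, hops_per_type] = std::get<heterogeneous_fanout_t>(fan_out);
    std::printf("heterogeneous fanout over %zu edge types\n", hops_per_type.size());
  }
}

int main()
{
  neighbor_sample_sketch(homogeneous_fanout_t{10, 10});                      // same fanout for all edge types
  neighbor_sample_sketch(heterogeneous_fanout_t{{10, 5, 25, 10}, {2, 2}});   // 2 edge types, 2 hops each
  return 0;
}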

Thoughts @seunghwak ?

Contributor

@seunghwak seunghwak Aug 14, 2024

  1. the biases become optional instead of required. Then it can do either uniform or biased in the same call just by whether the biases are included or not

=> In this case, we may update the existing non-heterogeneous-fanout sampling functions as well, i.e. combine the uniform & biased sampling functions. I'm not sure about the optimal balance between creating too many functions vs. creating a function with too many input parameters.

Contributor

Yeah... I guess we should avoid creating an overly busy function (one function that handles all the different types of sampling by excessively using std::variant & std::optional in its input arguments), but we should also avoid creating too many functions... I'm not sure where the optimal balancing point is...

Contributor

In theory, adding new parameters increases code complexity exponentially (to handle all possible combinations of optional parameters), so we'd be better off creating separate functions. If supporting an additional optional parameter requires only a minor change in the API and implementation, we may create one generic function (or we may create one complex function in the detail namespace that handles all the different options, with multiple public functions calling it, if that helps reduce code duplication).
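
A minimal sketch of that last pattern, with purely illustrative names and types (not the actual cugraph functions or signatures): one option-heavy implementation in the detail namespace, with thin public entry points forwarding to it so the uniform and biased paths are not duplicated.

#include <optional>
#include <vector>

namespace detail {

// One implementation covering all options; the public wrappers pick the options.
std::vector<int> neighbor_sample_impl(std::vector<int> const& starting_vertices,
                                      std::optional<std::vector<float>> const& edge_biases,
                                      std::vector<int> const& fan_out)
{
  // ... shared sampling logic; behaves as uniform sampling when edge_biases == std::nullopt ...
  return starting_vertices;
}

}  // namespace detail

// Thin public entry points keep the user-facing API simple.
std::vector<int> uniform_neighbor_sample(std::vector<int> const& starting_vertices,
                                         std::vector<int> const& fan_out)
{
  return detail::neighbor_sample_impl(starting_vertices, std::nullopt, fan_out);
}

std::vector<int> biased_neighbor_sample(std::vector<int> const& starting_vertices,
                                        std::vector<float> const& edge_biases,
                                        std::vector<int> const& fan_out)
{
  return detail::neighbor_sample_impl(starting_vertices, edge_biases, fan_out);
}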

@@ -368,6 +410,7 @@ cugraph_error_code_t cugraph_uniform_neighbor_sample(
const cugraph_type_erased_device_array_view_t* label_to_comm_rank,
const cugraph_type_erased_device_array_view_t* label_offsets,
const cugraph_type_erased_host_array_view_t* fan_out,
const cugraph_sample_heterogeneous_fanout_t* heterogeneous_fanout,
Collaborator

Perhaps we take the same approach here. Create a new C API function called neighbor_sample, following the biased function definition. Add this parameter. Deprecate the other functions. In the implementation we can just check for nullptr (NULL).
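
For illustration, a bare-bones sketch of that nullptr dispatch. The struct is only forward-declared here and the function name and signature are hypothetical, not the real C API entry point:

#include <cstdio>

struct cugraph_sample_heterogeneous_fanout_t;  // opaque type, as exposed by the C API header

// Hypothetical unified entry point; the real signature carries many more parameters.
void neighbor_sample_dispatch(int const* fan_out,
                              cugraph_sample_heterogeneous_fanout_t const* heterogeneous_fanout)
{
  if (heterogeneous_fanout == nullptr) {
    std::printf("homogeneous fanout: follow the existing uniform/biased path\n");
  } else {
    std::printf("heterogeneous fanout: use the per-edge-type fanout path\n");
  }
  (void)fan_out;  // unused in this sketch
}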

@@ -150,7 +173,7 @@ neighbor_sample_impl(

std::vector<size_t> level_sizes{};
int32_t hop{0};
for (auto&& k_level : fan_out) {
for (auto&& k_level : (*fan_out)) {
Collaborator

This isn't actually sufficient yet... but I'm more worried about the API right now.

This loop will need, in the case of heterogeneous sampling, to have 2 levels of for loop. An outer loop iterating by hop and an inner loop iterating by type.

I'd be inclined to add a setup loop that iterates over the types and generates the masks, and perhaps identifies the maximum number of hops to drive the outer loop. You'll need to get k_level from the right type/hop combination... so this for construct won't work at all; it will need to look different.
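
A rough sketch of the loop shape being suggested, with illustrative names only (not the real neighbor_sample_impl) and assuming the heterogeneous fanout has been densified to max_hops entries per edge type:

#include <cstdint>
#include <vector>

void heterogeneous_fanout_loops(std::vector<int32_t> const& fan_out,  // flattened: [type][hop]
                                int32_t num_edge_types,
                                int32_t max_hops)
{
  for (int32_t hop = 0; hop < max_hops; ++hop) {                  // outer loop: iterate by hop
    for (int32_t etype = 0; etype < num_edge_types; ++etype) {    // inner loop: iterate by edge type
      auto k_level = fan_out[etype * max_hops + hop];             // fanout for this (type, hop) pair
      // ... apply the pre-built per-type edge mask and sample k_level neighbors for this hop ...
      (void)k_level;
    }
  }
}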

Contributor Author

@jnke2016 jnke2016 Aug 21, 2024

Right, I only added this so it compiles. I will revisit this approach once we lock the API's interface. It only supports the non-heterogeneous type for now.

@@ -192,7 +215,7 @@ neighbor_sample_impl(
if (labels) { (*level_result_label_vectors).push_back(std::move(*labels)); }

++hop;
if (hop < fan_out.size()) {
if (hop < (*fan_out).size()) {
Collaborator

fan_out size will (potentially) vary by type.

Contributor Author

Right, I only added this so it compiles. I will revisit this approach once we lock the API's interface. It only supports the non-heterogeneous type for now.

# FIXME: Add expensive check to ensure all dict values are lists
# Convert to a tuple of sequence (edge type size and fanout values)
edge_type_size = []
[edge_type_size.append(len(s)) for s in list(fanout_vals.values())]
Collaborator

Does this iterate over the edge types in the dictionary in order? We need to make sure that this is constructed with edge type 0 first, followed by edge type 1, etc.

Contributor Author

Right. I converted the heterogeneous fanout type to a sorted ordered dictionary.

edge_type_size = []
[edge_type_size.append(len(s)) for s in list(fanout_vals.values())]
edge_type_fanout_vals = list(chain.from_iterable(list(fanout_vals.values())))
fanout_vals = (
Collaborator

Per my earlier suggestions, I think we want this to be a CSR structure, so converting from a list of sizes to a list of offsets is perhaps best done here.

Collaborator

We changed this back to a dense structure... so I think this code isn't right.

@@ -314,8 +316,21 @@ def uniform_neighbor_sample(
fanout_vals = fanout_vals.get().astype("int32")
elif isinstance(fanout_vals, cudf.Series):
fanout_vals = fanout_vals.values_host.astype("int32")
elif isinstance(fanout_vals, dict):
Collaborator

Same comments as above

@github-actions github-actions bot added the CMake label Aug 20, 2024
handle_.get_stream());
}

if constexpr (multi_gpu) {
Collaborator

If this was directly from my PR, I'm sorry I introduced this problem.

This shuffle won't work. start_vertex_offsets_ partitions these vertices into groups based on the label. Shuffling start_vertices alone will lose the appropriate label information. I believe the logic would need to be:

  1. Convert start_vertex_offsets_ to start_vertex_labels (see the sketch below)
  2. Shuffle the pair (start_vertex, start_vertex_label) to the proper GPU
  3. Renumber the starting vertices as below
  4. Sort the pairs by start_vertex_label
  5. Reconstitute the starting vertex offsets based on how the new labels are organized onto the proper GPUs

Then this will be ready for calling the sampling functions.
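
Here is a self-contained sketch of step 1, with host std:: code standing in for the device/thrust calls the real implementation would use and made-up offset values: expanding start_vertex_offsets_ into a per-vertex start_vertex_labels array so each (start_vertex, label) pair travels through the shuffle together.

#include <cstdint>
#include <cstdio>
#include <vector>

int main()
{
  // Offsets delimiting which starting vertices belong to which label (seed batch).
  std::vector<std::size_t> start_vertex_offsets{0, 3, 5, 9};

  std::vector<int32_t> start_vertex_labels;
  start_vertex_labels.reserve(start_vertex_offsets.back());
  for (std::size_t label = 0; label + 1 < start_vertex_offsets.size(); ++label) {
    auto count = start_vertex_offsets[label + 1] - start_vertex_offsets[label];
    start_vertex_labels.insert(start_vertex_labels.end(), count, static_cast<int32_t>(label));
  }

  // start_vertex_labels is now {0,0,0, 1,1, 2,2,2,2}; each entry pairs with the
  // corresponding start_vertex and survives the shuffle to the owning GPU.
  for (auto l : start_vertex_labels) { std::printf("%d ", static_cast<int>(l)); }
  std::printf("\n");
  return 0;
}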

Collaborator

With the latest change to the C++ API, we don't need to reconstitute the offsets, we'll be passing in labels to the C++ call. So skip step 5 above.

However, because the start_vertex_offsets_ will be local to this GPU, we will need to compute a global label id to perform step 1 above properly. You can compute the number of local labels (start_vertex_offsets_->size_ - 1). If we use host_scalar_allgatherv we can get the number of labels on each GPU. Then we can do thrust::exclusive_scan to compute the base label for each GPU. The global label ids can be constructed from that.

Finally, we'll need to construct the global label_to_comm_rank list. This should be constructible by using the output from the thrust::exclusive_scan to compute the mapping of labels to output GPUs.
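
A self-contained sketch of that computation, using plain std:: code in place of host_scalar_allgatherv and thrust::exclusive_scan and made-up counts: the per-GPU local label counts are scanned to get each GPU's base global label id, and the same counts yield the global label_to_comm_rank mapping.

#include <cstdint>
#include <cstdio>
#include <numeric>
#include <vector>

int main()
{
  // Pretend result of host_scalar_allgatherv: the number of local labels on each GPU,
  // i.e. (start_vertex_offsets_->size_ - 1) gathered from every rank.
  std::vector<std::size_t> num_labels_per_gpu{3, 2, 4};

  // Exclusive scan gives the base (first) global label id owned by each GPU.
  std::vector<std::size_t> label_base_per_gpu(num_labels_per_gpu.size());
  std::exclusive_scan(num_labels_per_gpu.begin(), num_labels_per_gpu.end(),
                      label_base_per_gpu.begin(), std::size_t{0});
  // label_base_per_gpu == {0, 3, 5}: GPU 1's local label 0 becomes global label 3, etc.

  // The global label_to_comm_rank list maps every global label back to its owning GPU.
  std::vector<int> label_to_comm_rank;
  for (std::size_t rank = 0; rank < num_labels_per_gpu.size(); ++rank) {
    label_to_comm_rank.insert(label_to_comm_rank.end(), num_labels_per_gpu[rank],
                              static_cast<int>(rank));
  }
  // label_to_comm_rank == {0,0,0, 1,1, 2,2,2,2}

  for (auto r : label_to_comm_rank) { std::printf("%d ", r); }
  std::printf("\n");
  return 0;
}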

std::optional<rmm::device_uvector<label_t>> edge_label{std::nullopt};
std::optional<rmm::device_uvector<size_t>> offsets{std::nullopt};

rmm::device_uvector<vertex_t> vertex_type_offsets(graph_view.local_vertex_partition_range_size(), handle_.get_stream());
Collaborator

This looks like it's only used in the heterogeneous renumbering code and is redefined in that block. I'd be inclined to delete the definition and sequence_fill here.

std::optional<edge_property_view_t<edge_t, edge_t const*>> edge_id_view,
std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<size_t const>> starting_vertex_offsets,
Collaborator

Based on latest slack conversation... we should change this back to starting_vertex_labels. This should be an easy change by backing out a few things from the implementation.

std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
edge_property_view_t<edge_t, bias_t const*> edge_bias_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<size_t const>> starting_vertex_offsets,
Collaborator

Same as above, change back to starting_vertex_labels

std::optional<edge_property_view_t<edge_t, edge_t const*>> edge_id_view,
std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<size_t const>> starting_vertex_offsets,
Collaborator

Same

std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
edge_property_view_t<edge_t, bias_t const*> edge_bias_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<size_t const>> starting_vertex_offsets,
Collaborator

Same

const cugraph_edge_property_view_t* edge_biases,
const cugraph_type_erased_device_array_view_t* start_vertices,
const cugraph_type_erased_device_array_view_t* start_vertex_offsets,
const cugraph_type_erased_device_array_view_t* label_to_comm_rank,
Collaborator

Let's drop this from the C API (per the latest slack conversation). We will internally compute the label to comm rank for C++ based on which GPU the seeds are sent from.

cugraph_edge_property_view_t const* edge_biases,
cugraph_type_erased_device_array_view_t const* start_vertices,
cugraph_type_erased_device_array_view_t const* start_vertex_offsets,
cugraph_type_erased_device_array_view_t const* label_to_comm_rank,
Collaborator

This gets removed

handle_.get_stream());
}

if constexpr (multi_gpu) {
Collaborator

With the latest change to the C++ API, we don't need to reconstitute the offsets, we'll be passing in labels to the C++ call. So skip step 5 above.

However, because the start_vertex_offsets_ will be local to this GPU, we will need to compute a global label id to perform step 1 above properly. You can compute the number of local labels (start_vertex_offsets_->size_ - 1). If we use host_scalar_allgatherv we can get the number of labels on each GPU. Then we can do thrust::exclusive_scan to compute the base label for each GPU. The global label ids can be constructed from that.

Finally, we'll need to construct the global label_to_comm_rank list. This should be constructible by using the output from the thrust::exclusive_scan to compute the mapping of labels to output GPUs.

@jnke2016 jnke2016 marked this pull request as ready for review September 25, 2024 18:05
@jnke2016 jnke2016 requested review from a team as code owners September 25, 2024 18:05
Comment on lines 57 to 58
#neighbor_sample.pyx // FIXME: break the API into homogeneous and heterogeneous neighbor sample
#biased_neighbor_sample.pyx
Contributor

Is this going to be part of this PR, or a future PR?

Contributor Author

@jnke2016 jnke2016 Sep 25, 2024

This FIXME is already addressed in my local branch. It will be part of my next commit.

const cugraph_type_erased_device_array_view_t* label_offsets,
const cugraph_type_erased_host_array_view_t* fan_out,
const cugraph_sampling_options_t* options,
bool_t is_biased,
Collaborator

Parameter is obsolete.

Contributor Author

Right. I didn't update the PLC and Python APIs as I was mostly focused on the C++ and C APIs. But these should be addressed in my next commits.

ctypedef struct cugraph_sample_heterogeneous_fan_out_t:
pass

cdef cugraph_error_code_t \
Collaborator

These next two functions are also no longer necessary.

element corresponds to the fan_out values.
The sampling method can use different fan_out values for each edge type.

is_biased: bool
Collaborator

We made the C++ parameter obsolete by separating the uniform and biased methods. We should mirror this in PLC.
