support heterogenous fanout type #4608

Open: wants to merge 70 commits into base: branch-24.10
Changes from 24 commits
70 commits
0adb2fd
support heterogenous fanout type
jnke2016 Aug 13, 2024
bb5a3e2
remove unused code
jnke2016 Aug 13, 2024
10fa86d
fix style
jnke2016 Aug 13, 2024
f904350
create one API for both uniform and biased neighborhood sampling
jnke2016 Aug 20, 2024
1fc32c3
use the same function for both uniform and biased neighborhood sampling
jnke2016 Aug 20, 2024
8fc21f8
add support for heterogenous fanout support at the plc layer and cons…
jnke2016 Aug 20, 2024
01a57f3
remove outdated codes
jnke2016 Aug 20, 2024
3a6aeb2
add flag differentiating between biased and uniform sampling
jnke2016 Aug 21, 2024
d2f6467
update docstrings and rename variable
jnke2016 Aug 21, 2024
5d25155
rename variable
jnke2016 Aug 21, 2024
80f8b86
create new tuple type
jnke2016 Aug 21, 2024
50e0fc5
remove unnecessary check
jnke2016 Aug 21, 2024
9f455bf
add constructor converting from array_view_t to array_t
jnke2016 Aug 21, 2024
d114534
leverage new constructor and remove unnecessary code
jnke2016 Aug 21, 2024
cf4a3ae
ensure edge types are ordered in increasing order
jnke2016 Aug 21, 2024
bc87b50
update docstrings
jnke2016 Aug 21, 2024
3013684
update docstrings
jnke2016 Aug 21, 2024
d6b6234
undo changes to uniform neighbor sample
jnke2016 Aug 22, 2024
068b0a3
undo changes to uniform neighbor sample
jnke2016 Aug 22, 2024
6920f65
update docstrings
jnke2016 Aug 22, 2024
760c5cd
re-order arguments
jnke2016 Aug 22, 2024
1e0ef27
remove outdated comments
jnke2016 Aug 22, 2024
de79620
add arguments and type check
jnke2016 Aug 23, 2024
8c17009
rename variable for consistency
jnke2016 Aug 23, 2024
7b95c5e
update neighbor sample API
jnke2016 Aug 30, 2024
19fc765
remove outdated code
jnke2016 Aug 30, 2024
e30766c
remove outdated comment
jnke2016 Aug 30, 2024
5dd66f2
first cut at new sampling function definition to clean up things befo…
ChuckHastings Sep 4, 2024
4b2764c
updates to remove builder pattern, also rename functions and mark old…
ChuckHastings Sep 5, 2024
4c1c610
add implementation of heterogeneous neighborhood sampling
jnke2016 Sep 9, 2024
fe35c80
add exit condition
jnke2016 Sep 9, 2024
a658b29
remove comments
jnke2016 Sep 10, 2024
e52a38a
Add Implementation
ChuckHastings Sep 11, 2024
c416439
call heterogeneous renumbering
jnke2016 Sep 13, 2024
98d6c57
update branch and call heterogeneous renumbering
jnke2016 Sep 13, 2024
d7165af
update heterogeneous renumbering call
jnke2016 Sep 17, 2024
579fd0a
create a csr data structure to efficiently store vertex and label
jnke2016 Sep 17, 2024
5cdf40a
update API and docstring
jnke2016 Sep 17, 2024
a8fbd9d
remove unused variable
jnke2016 Sep 17, 2024
9d5b3dd
update C++ API for neighbor sampling
jnke2016 Sep 20, 2024
0358c6e
add fixme for deprecated flags
jnke2016 Sep 20, 2024
799c35d
update CAPI
jnke2016 Sep 20, 2024
ab8aa72
undo changes to k-truss
jnke2016 Sep 21, 2024
7d8b5ad
undo changes to tests
jnke2016 Sep 21, 2024
f2190ba
clean up code
jnke2016 Sep 21, 2024
1e96dcf
update docs
jnke2016 Sep 23, 2024
36c25ad
fix typo
jnke2016 Sep 23, 2024
4857b36
call scatter instead of gather and fix type bug
jnke2016 Sep 23, 2024
263b6ac
fix typo
jnke2016 Sep 23, 2024
9dff3ab
update neighbor sample API
jnke2016 Sep 24, 2024
33c8b3d
update CAPI
jnke2016 Sep 25, 2024
e357f42
remove unused code
jnke2016 Sep 25, 2024
6081978
remove outdated comment
jnke2016 Sep 25, 2024
73b3ffe
remove unnecessary copy
jnke2016 Sep 25, 2024
ea972f3
remove outdated arguments
jnke2016 Sep 26, 2024
8822192
fix typo
jnke2016 Sep 27, 2024
e02a513
update plc API of heterogeneous neighbor sample
jnke2016 Sep 27, 2024
d6cb1d5
fix typo
jnke2016 Sep 27, 2024
54fa155
change back the fanout type from a sparse to a dense structure
jnke2016 Sep 27, 2024
499e041
fix typo
jnke2016 Sep 27, 2024
b571deb
add implementation of heterogeneous/homogeneous biased/uniform neighb…
jnke2016 Sep 27, 2024
f6c4ce3
properly handle edge types
jnke2016 Sep 27, 2024
e71660d
add tests for 'homogeneous_uniform_neighbor_sampling'
jnke2016 Sep 27, 2024
4e2c8cf
add tests for homogeneous_biased_neighbor_sampling.cpp
jnke2016 Sep 27, 2024
2458149
update type combination
jnke2016 Sep 27, 2024
df3e4ff
add tests for heterogeneous uniform/biased neighborhood sampling
jnke2016 Sep 28, 2024
d4847e4
properly sample with edge types
jnke2016 Sep 28, 2024
dc2c9ba
remove outdated tests
jnke2016 Sep 28, 2024
c01f4e4
add SG python implementation of neighborhood sampling both homogeneou…
jnke2016 Sep 30, 2024
dabd0c8
remove unused argument
jnke2016 Sep 30, 2024
119 changes: 119 additions & 0 deletions cpp/include/cugraph/sampling_functions.hpp
@@ -42,6 +42,8 @@ enum class prior_sources_behavior_t { DEFAULT = 0, CARRY_OVER, EXCLUDE };

/**
* @brief Uniform Neighborhood Sampling.
*
* @deprecated This API will be deleted; use neighbor_sample instead
*
* This function traverses outgoing edges from a set of starting vertices and randomly
* selects from these outgoing neighbors to extract a subgraph.
@@ -139,6 +141,8 @@ uniform_neighbor_sample(

/**
* @brief Biased Neighborhood Sampling.
*
* @deprecated This API will be deleted; use neighbor_sample instead
*
* This function traverses outgoing edges from a set of starting vertices and randomly
* selects (with edge biases) from these outgoing neighbors to extract a subgraph.
@@ -240,6 +244,121 @@ biased_neighbor_sample(
bool dedupe_sources = false,
bool do_expensive_check = false);
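The biased-sampling rules described for these APIs (biases are non-negative, a bias of 0 means the edge can never be selected, and no biases means uniform sampling) can be illustrated with a small host-side sketch for a single vertex. It assumes sampling with replacement and uses `std::discrete_distribution` as a stand-in for cuGraph's GPU kernels; `sample_neighbors_with_replacement` is a hypothetical helper, not a cuGraph function.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

// Hypothetical helper (not part of cuGraph): draws `fan_out` neighbor indices, with
// replacement, from the outgoing edges of a single vertex that has `degree` edges.
// - biases == std::nullopt: every edge is equally likely (uniform sampling).
// - otherwise: edge i is chosen with probability biases[i] / sum(biases), so an edge
//   whose bias is 0 is never selected.
std::vector<std::size_t> sample_neighbors_with_replacement(
  std::size_t degree,
  std::optional<std::vector<double>> const& biases,
  int fan_out,
  std::mt19937& rng)
{
  std::vector<std::size_t> picks;
  if (degree == 0 || fan_out <= 0) { return picks; }

  if (!biases) {
    std::uniform_int_distribution<std::size_t> uniform(0, degree - 1);
    for (int i = 0; i < fan_out; ++i) { picks.push_back(uniform(rng)); }
    return picks;
  }

  assert(biases->size() == degree);
  double total = 0.0;
  for (double b : *biases) {
    assert(b >= 0.0);  // biases must be non-negative
    total += b;
  }
  if (total == 0.0) { return picks; }  // all biases are 0: no edge can be selected

  // discrete_distribution returns index i with probability biases[i] / total.
  std::discrete_distribution<std::size_t> biased(biases->begin(), biases->end());
  for (int i = 0; i < fan_out; ++i) { picks.push_back(biased(rng)); }
  return picks;
}
```

Sampling without replacement (`with_replacement = false`) would additionally have to exclude edges that were already picked; this sketch does not model that case.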


/**
* @brief Neighborhood Sampling.
*
* This function traverses outgoing edges from a set of starting vertices and randomly
* selects (with or without edge biases) from these outgoing neighbors to extract a subgraph.
* When branching out to select outgoing neighbors, exactly one of @p fan_out or
* @p heterogeneous_fan_out must be provided.
*
* Output from this function is a tuple of vectors (src, dst, weight, edge_id, edge_type, hop,
* label, offsets), identifying the randomly selected edges. src is the source vertex, dst is the
* destination vertex, weight (optional) is the edge weight, edge_id (optional) identifies the edge
* id, edge_type (optional) identifies the edge type, and hop (optional) identifies the hop in which
* the edge was encountered. The label output (optional) identifies the vertex label. The offsets
* array (optional) is described below and depends on the input parameters.
*
* If @p starting_vertex_labels is not specified, then no organization is applied to the output,
* and the label and offsets values in the return set will be std::nullopt.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is not specified then
* the label output has values. This will also result in the output being sorted by vertex label.
* The offsets array in the return will be a CSR-style offsets array to identify the beginning of
* each label range in the data. `labels.size() == (offsets.size() - 1)`.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is specified then the
* label output has values. This will also result in the output being sorted by vertex label. The
* offsets array in the return will be a CSR-style offsets array to identify the beginning of each
* label range in the data. `labels.size() == (offsets.size() - 1)`. Additionally, the data will
* be shuffled so that all data with a particular label will be on the specified rank.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam edge_type_t Type of edge type. Needs to be an integral type.
* @tparam bias_t Type of edge bias values.
* @tparam label_t Type of label. Needs to be an integral type.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true)
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param rng_state A pre-initialized raft::random::RngState object for generating random numbers
* @param graph_view Graph view object on which to run neighborhood sampling.
* @param edge_weight_view Optional view object holding edge weights for @p graph_view.
* @param edge_id_view Optional view object holding edge ids for @p graph_view.
* @param edge_type_view Optional view object holding edge types for @p graph_view.
* @param edge_bias_view Optional view object holding edge biases (to be used in biased sampling) for @p
* graph_view. Bias values should be non-negative, and the sum of edge bias values from any vertex
* should not exceed std::numeric_limits<bias_t>::max(). A bias value of 0 indicates that the
* corresponding edge can never be selected. Passing std::nullopt as the edge biases results in
* uniform sampling.
* @param starting_vertices Device span of starting vertex IDs for the sampling.
* In a multi-gpu context the starting vertices should be local to this GPU.
* @param starting_vertex_labels Optional device span of labels associated with each starting vertex
* for the sampling.
* @param label_to_output_comm_rank Optional tuple of device spans mapping label to a particular
Collaborator review comment: I have some further API changes I want to propose here. I'll point you to a PR when they're ready.

* output rank. Element 0 of the tuple identifies the label; element 1 of the tuple identifies the
* output rank. The label span must be sorted in ascending order.
* @param fan_out Host span defining branching out (fan-out) degree per source vertex for each
* level. When @p fan_out is provided, the same fan-out value is used for every edge type.
* @param heterogeneous_fan_out Tuple of host spans defining the branching out (fan-out) degree per
* source vertex for each level and edge type, in CSR-style format. The first element of the tuple
* is the offsets array indexed by edge type id and the second element holds the fan-out values.
* When @p heterogeneous_fan_out is provided, a different fan-out can be used for each edge type.
* As in any CSR layout, the offsets array has one entry per edge type plus one, and its last entry
* equals the number of fan-out values.
* @param return_hops Boolean flag specifying if the hop information should be returned.
* @param with_replacement Boolean flag specifying if random sampling is done with replacement
* (true) or without replacement (false); defaults to true.
* @param prior_sources_behavior Enum type defining how to handle prior sources (defaults to
* DEFAULT).
* @param dedupe_sources Boolean flag; if true, a vertex v that appears as a destination in hop X
* multiple times with the same label is passed only once (for each label) as a source for the
* next hop. Defaults to false.
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
* @return Tuple of device vectors (vertex_t source_vertex, vertex_t destination_vertex,
* optional weight_t weight, optional edge_t edge id, optional edge_type_t edge type,
* optional int32_t hop, optional label_t label, optional size_t offsets)
*/
// FIXME: Add flag for bias=True/False
template <typename vertex_t,
typename edge_t,
typename weight_t,
typename edge_type_t,
typename bias_t,
typename label_t,
bool store_transposed,
bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>,
rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<weight_t>>,
std::optional<rmm::device_uvector<edge_t>>,
std::optional<rmm::device_uvector<edge_type_t>>,
std::optional<rmm::device_uvector<int32_t>>,
std::optional<rmm::device_uvector<label_t>>,
std::optional<rmm::device_uvector<size_t>>>
neighbor_sample(
raft::handle_t const& handle,
raft::random::RngState& rng_state,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<edge_property_view_t<edge_t, edge_t const*>> edge_id_view,
std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
std::optional<edge_property_view_t<edge_t, bias_t const*>> edge_bias_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<label_t const>> starting_vertex_labels,
std::optional<std::tuple<raft::device_span<label_t const>, raft::device_span<int32_t const>>>
label_to_output_comm_rank,
std::optional<raft::host_span<int32_t const>> fan_out,
std::optional<std::tuple<raft::host_span<int32_t const>, raft::host_span<int32_t const>>>
heterogeneous_fan_out,
bool return_hops,
bool with_replacement = true,
prior_sources_behavior_t prior_sources_behavior = prior_sources_behavior_t::DEFAULT,
bool dedupe_sources = false,
bool do_expensive_check = false);
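When @p starting_vertex_labels is given, the documentation for neighbor_sample says the output is sorted by label and comes with CSR-style offsets satisfying `labels.size() == (offsets.size() - 1)`. The host-side sketch that follows shows how such offsets relate to label-sorted per-edge data. `label_offsets_from_sorted` is a hypothetical helper, and reading `labels` as the distinct labels present in the output is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of cuGraph): given per-edge labels already sorted in
// ascending order, returns the distinct labels and CSR-style offsets such that the edges
// carrying labels[i] occupy the half-open range [offsets[i], offsets[i + 1]).
// By construction, labels.size() == offsets.size() - 1.
std::pair<std::vector<std::int32_t>, std::vector<std::size_t>> label_offsets_from_sorted(
  std::vector<std::int32_t> const& sorted_edge_labels)
{
  std::vector<std::int32_t> labels;
  std::vector<std::size_t> offsets{0};  // the first range always starts at 0
  for (std::size_t i = 0; i < sorted_edge_labels.size(); ++i) {
    assert(i == 0 || sorted_edge_labels[i - 1] <= sorted_edge_labels[i]);
    if (i == 0 || sorted_edge_labels[i] != sorted_edge_labels[i - 1]) {
      if (i != 0) { offsets.push_back(i); }  // close the previous label's range
      labels.push_back(sorted_edge_labels[i]);
    }
  }
  if (!sorted_edge_labels.empty()) { offsets.push_back(sorted_edge_labels.size()); }
  return {labels, offsets};
}
```

For per-edge labels `{3, 3, 5, 7, 7, 7}` this yields labels `{3, 5, 7}` and offsets `{0, 2, 3, 6}`.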

/*
* @brief renumber sampled edge list and compress to the (D)CSR|(D)CSC format.
*
122 changes: 122 additions & 0 deletions cpp/include/cugraph_c/sampling_algorithms.h
@@ -319,8 +319,59 @@ void cugraph_sampling_set_dedupe_sources(cugraph_sampling_options_t* options, bo
*/
void cugraph_sampling_options_free(cugraph_sampling_options_t* options);
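The dedupe_sources behavior documented for neighbor_sample (and configured in the C API through cugraph_sampling_set_dedupe_sources) keeps a vertex only once per label when it becomes a source for the next hop. The following is a minimal host-side sketch, assuming the hop-X destinations are available as (vertex, label) pairs; `next_hop_sources` is a hypothetical helper, not cuGraph code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of cuGraph): models dedupe_sources = true. Given the
// (vertex, label) pairs reached as destinations in hop X, returns the sources for hop X + 1
// with each (vertex, label) pair kept only once. A vertex can still appear once per
// distinct label.
std::vector<std::pair<std::int32_t, std::int32_t>> next_hop_sources(
  std::vector<std::pair<std::int32_t, std::int32_t>> destinations_with_labels)
{
  // Sort by (label, vertex) so duplicate pairs become adjacent and the frontier is
  // grouped by label, then drop the duplicates.
  std::sort(destinations_with_labels.begin(),
            destinations_with_labels.end(),
            [](auto const& a, auto const& b) {
              return a.second != b.second ? a.second < b.second : a.first < b.first;
            });
  destinations_with_labels.erase(
    std::unique(destinations_with_labels.begin(), destinations_with_labels.end()),
    destinations_with_labels.end());
  return destinations_with_labels;
}
```

With dedupe_sources left at false, the duplicates would instead be carried forward, so vertex 7 reached twice under label 0 would be expanded twice in the next hop.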

/**
* @brief Opaque neighborhood sampling heterogeneous fanout type
*/
// FIXME: The internal representation should be a tuple instead of pairs, to make it more generic
// (cugraph_device_tuple_t, host_device_tuple_t; dictionary, key and array).
// Translate the dictionary to a tuple, and add the PLC layer to the draft PR.
// Concatenate to build the 3 arrays from the PLC layer.
/// mimic
typedef struct {
int32_t align_;
} cugraph_sample_heterogeneous_fan_out_t;

/**
* @brief Create heterogeneous fanout
*
* Input data will be stored in the heterogeneous_fanout.
*
* The fanout is a CSR structure: edge_type_offsets defines which range of the fanout array is
* associated with each edge type, and fanout holds the fan-out values for each hop/type. For edge
* type k, fanout[edge_type_offsets[k]] is the fan-out for hop 0, fanout[edge_type_offsets[k] + 1]
* is the fan-out for hop 1, and so on. edge_type_offsets[k+1] marks the beginning of the fanout
* array for type k+1 (and the end of the fanout array for type k).
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph
* @param [in] edge_type_offsets Type erased array of edge type offsets
* @param [in] fanout Type erased array of fanout values
* @param [out] heterogeneous_fanout Opaque pointer to the created heterogeneous fan-out
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_create_heterogeneous_fan_out(
const cugraph_resource_handle_t* handle,
cugraph_graph_t* graph,
const cugraph_type_erased_host_array_view_t* edge_type_offsets,
const cugraph_type_erased_host_array_view_t* fanout,
cugraph_sample_heterogeneous_fan_out_t** heterogeneous_fanout,
cugraph_error_t** error);
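The CSR layout described for cugraph_create_heterogeneous_fan_out can be read with a simple lookup. This is a host-side sketch; `fanout_for` is a hypothetical helper, and returning 0 for a hop beyond the values stored for an edge type is an assumption of the sketch, not documented cuGraph behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the cuGraph C API): returns the fan-out for `edge_type`
// at `hop`. The values for edge type k live in
// fanout[edge_type_offsets[k] .. edge_type_offsets[k + 1]), one value per hop.
// Assumption for this sketch: a hop past the stored values has fan-out 0.
std::int32_t fanout_for(std::vector<std::int32_t> const& edge_type_offsets,
                        std::vector<std::int32_t> const& fanout,
                        std::size_t edge_type,
                        std::size_t hop)
{
  assert(edge_type + 1 < edge_type_offsets.size());
  auto begin = static_cast<std::size_t>(edge_type_offsets[edge_type]);
  auto end   = static_cast<std::size_t>(edge_type_offsets[edge_type + 1]);
  assert(begin <= end && end <= fanout.size());
  return (begin + hop < end) ? fanout[begin + hop] : 0;
}
```

For example, with `edge_type_offsets = {0, 2, 5}` and `fanout = {10, 5, 4, 4, 2}`, edge type 0 samples 10 then 5 neighbors over two hops, while edge type 1 samples 4, 4, then 2 over three hops.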

/**
* @brief Free a heterogeneous fan-out object
*
* @param [in] heterogeneous_fanout The heterogeneous fan-out (edge type offsets and fan-out
* values) to free
*/
void cugraph_heterogeneous_fanout_free(cugraph_sample_heterogeneous_fan_out_t* heterogeneous_fanout);

/**
* @brief Uniform Neighborhood Sampling
*
* @deprecated This API will be deleted; use cugraph_neighbor_sample instead
*
* Returns a sample of the neighborhood around specified start vertices. Optionally, each
* start vertex can be associated with a label, allowing the caller to specify multiple batches
@@ -376,6 +427,8 @@ cugraph_error_code_t cugraph_uniform_neighbor_sample(

/**
* @brief Biased Neighborhood Sampling
*
* @deprecated This API will be deleted; use cugraph_neighbor_sample instead
*
* Returns a sample of the neighborhood around specified start vertices. Optionally, each
* start vertex can be associated with a label, allowing the caller to specify multiple batches
@@ -433,6 +486,74 @@ cugraph_error_code_t cugraph_biased_neighbor_sample(
cugraph_sample_result_t** result,
cugraph_error_t** error);

/**
* @brief Neighborhood Sampling
*
* Returns a sample of the neighborhood around specified start vertices, with or without edge biases.
* Optionally, each start vertex can be associated with a label, allowing the caller to specify
* multiple batches of sampling requests in the same function call - which should improve GPU
* utilization.
*
* If label is NULL then all start vertices will be considered part of the same batch and the
* return value will not have a label column.
*
* @param [in] handle Handle for accessing resources
* @param [in,out] rng_state State of the random number generator, updated with each call
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
* needs to be transposed
* @param [in] edge_biases Edge property view of edge biases to use for sampling. If NULL and
* @p is_biased is true, the edge weights are used as the biases; if @p is_biased is false,
* edges are sampled uniformly.
* @param [in] start_vertices Device array of start vertices for the sampling
* @param [in] start_vertex_labels Device array of start vertex labels for the sampling. The
* labels associated with each start vertex will be included in the output associated with results
* that were derived from that start vertex. We only support label of type INT32. If label is
* NULL, the return data will not be labeled.
* @param [in] label_list Device array of the labels included in @p start_vertex_labels. If
* @p label_to_comm_rank is not specified this parameter is ignored. If specified, label_list
* must be sorted in ascending order.
* @param [in] label_to_comm_rank Device array identifying the comm rank to which the output for
* each label should be shuffled. If specified, all data for @p label_list[i] will be shuffled to
* rank @p label_to_comm_rank[i]. This cannot be specified unless @p start_vertex_labels is also
* specified. If not specified, the output data will not be organized or shuffled between ranks.
* @param [in] label_offsets Device array of the offsets for each label in the seed list. This
* parameter is only used with the retain_seeds option.
* @param [in] fanout Host array defining the fan out at each step in the sampling algorithm.
* We only support fanout values of type INT32
* @param [in] heterogeneous_fanout Opaque pointer (created by cugraph_create_heterogeneous_fan_out)
* defining the fan-out at each step of the sampling algorithm per edge type, in CSR-style format:
* an offsets array indexed by edge type id and an array of fan-out values. We only support type
* INT32 for both the offsets and the fan-out values. Only one of @p fan_out and
* @p heterogeneous_fanout should be specified.
* @param [in] options
* Opaque pointer defining the sampling options.
* @param [in] is_biased
* A flag specifying whether to run biased neighborhood sampling
* (if set to true) or uniform neighborhood sampling.
* @param [in] do_expensive_check
* A flag to run expensive checks for input arguments (if set to true)
* @param [out] result Output from the cugraph_neighbor_sample call
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_neighbor_sample(
const cugraph_resource_handle_t* handle,
cugraph_rng_state_t* rng_state,
cugraph_graph_t* graph,
const cugraph_edge_property_view_t* edge_biases,
const cugraph_type_erased_device_array_view_t* start_vertices,
const cugraph_type_erased_device_array_view_t* start_vertex_labels,
const cugraph_type_erased_device_array_view_t* label_list,
const cugraph_type_erased_device_array_view_t* label_to_comm_rank,
Collaborator review comment: Drop this one also.

const cugraph_type_erased_device_array_view_t* label_offsets,
const cugraph_type_erased_host_array_view_t* fan_out,
const cugraph_sample_heterogeneous_fan_out_t* heterogeneous_fanout,
const cugraph_sampling_options_t* options,
bool_t is_biased,
bool_t do_expensive_check,
cugraph_sample_result_t** result,
cugraph_error_t** error);
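label_list and label_to_comm_rank form a parallel-array mapping, and requiring label_list to be sorted in ascending order lets a label's destination rank be found by binary search. A host-side sketch follows; `output_rank_for` is a hypothetical helper, and returning `std::nullopt` for a label that is not in label_list is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not part of cuGraph): finds the output comm rank for `label`, given
// label_list (sorted ascending, as the API requires) and the parallel label_to_comm_rank
// array, so that data for label_list[i] goes to rank label_to_comm_rank[i].
std::optional<std::int32_t> output_rank_for(std::vector<std::int32_t> const& label_list,
                                            std::vector<std::int32_t> const& label_to_comm_rank,
                                            std::int32_t label)
{
  assert(label_list.size() == label_to_comm_rank.size());
  assert(std::is_sorted(label_list.begin(), label_list.end()));
  // Sorted input makes this O(log n) lookup valid.
  auto it = std::lower_bound(label_list.begin(), label_list.end(), label);
  if (it == label_list.end() || *it != label) { return std::nullopt; }
  return label_to_comm_rank[static_cast<std::size_t>(it - label_list.begin())];
}
```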

/**
* @deprecated This call should be replaced with cugraph_sample_result_get_majors
* @brief Get the source vertices from the sampling algorithm result
@@ -667,6 +788,7 @@ cugraph_error_code_t cugraph_test_uniform_neighborhood_sample_result_create(
* not CUGRAPH_SUCCESS
* @return error code
*/

cugraph_error_code_t cugraph_select_random_vertices(const cugraph_resource_handle_t* handle,
const cugraph_graph_t* graph,
cugraph_rng_state_t* rng_state,
21 changes: 21 additions & 0 deletions cpp/src/c_api/array.hpp
@@ -125,6 +125,27 @@ struct cugraph_type_erased_host_array_t {
std::copy(vec.begin(), vec.end(), reinterpret_cast<T*>(data_.get()));
}

cugraph_type_erased_host_array_t(cugraph_type_erased_host_array_view_t const* view_p)
: data_(std::make_unique<std::byte[]>(view_p->num_bytes_)),
size_(view_p->size_),
num_bytes_(view_p->num_bytes_),
type_(view_p->type_)
{
std::copy(view_p->data_, view_p->data_ + num_bytes_, data_.get());
}

template <typename T>
T* as_type()
{
return reinterpret_cast<T*>(data_.get());
}

template <typename T>
T const* as_type() const
{
return reinterpret_cast<T const*>(data_.get());
}

auto view()
{
return new cugraph_type_erased_host_array_view_t{data_.get(), size_, num_bytes_, type_};
2 changes: 1 addition & 1 deletion cpp/src/c_api/graph_functions.cpp
@@ -84,7 +84,7 @@ struct create_vertex_pairs_functor : public cugraph::c_api::abstract_functor {
std::nullopt,
std::nullopt);
}

// FIXME: use std::tuple (template) instead.
result_ = new cugraph::c_api::cugraph_vertex_pairs_t{
new cugraph::c_api::cugraph_type_erased_device_array_t(first_copy, graph_->vertex_type_),
new cugraph::c_api::cugraph_type_erased_device_array_t(second_copy, graph_->vertex_type_)};