-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Replace edge_t with size_t #1613
Comments
Thinking about the timing of this. The new C API for graph construction hides this detail. If the number of edges is referenced it is referenced as a size_t in the API. The intention is to migrate all of the python bindings to use this new feature. It feels like there would be less potential breakage by waiting until this work is complete. |
Yeah... and I think we need to think little more about whether this is possible or not. Say, to compute degrees, the maximum possible degree can be larger than the total number of vertices if there are multi-edges. With this possibility, we cannot return a vector of vertex_t for degrees and always returning a vector of size_t can be wasteful. Need to resolve how to handle this case first before jumping to implementation. |
I feel like this is not really viable in the general case. We now use the Suggesting we close this. Any objections @seunghwak ? |
Yes, edge ID is one thing (for this, we can introduce edge_id_t) and returning degrees is another. This being said, we still don't need to use edge_t for the offsets array especially for multi-GPU use cases.
|
Describe the solution you'd like
We currently allow edge_t to be either 32 bit integer or 64 bit integer.
In the CSR representation of graphs, we use edge_t for the offsets array (array size # vertices + 1). Using 32 bit integer saves (# vertices + 1) * 4 bytes if the # edges is within the 32 bit limit.
However, this makes less sense in multi-GPU cases. The graph adjacency matrix is divided to multiple partitions.
std::numeric_limits<edge_t>::max() >= # edges within a partition
is sufficient, and we may mix CSR and DCSR as well (#1477).I think we'd better treat the offset array data type as implementation details (and dynamically sets to either 32 bit or 64 bit integers based on # edges within a partition), and replace edge_t to size_t in the public C++ interface.
This will cut both compile time and binary sizes as we can explicitly instantiate templates for fewer cases.
The text was updated successfully, but these errors were encountered: