Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Replace edge_t with size_t #1613

Open
Tracked by #3280
seunghwak opened this issue May 24, 2021 · 4 comments
Open
Tracked by #3280

[ENH] Replace edge_t with size_t #1613

seunghwak opened this issue May 24, 2021 · 4 comments
Assignees
Labels
improvement Improvement / enhancement to an existing function

Comments

@seunghwak
Copy link
Contributor

Describe the solution you'd like
We currently allow edge_t to be either 32 bit integer or 64 bit integer.

In the CSR representation of graphs, we use edge_t for the offsets array (array size # vertices + 1). Using 32 bit integer saves (# vertices + 1) * 4 bytes if the # edges is within the 32 bit limit.

However, this makes less sense in multi-GPU cases. The graph adjacency matrix is divided to multiple partitions. std::numeric_limits<edge_t>::max() >= # edges within a partition is sufficient, and we may mix CSR and DCSR as well (#1477).

I think we'd better treat the offset array data type as implementation details (and dynamically sets to either 32 bit or 64 bit integers based on # edges within a partition), and replace edge_t to size_t in the public C++ interface.

This will cut both compile time and binary sizes as we can explicitly instantiate templates for fewer cases.

@seunghwak seunghwak added the ? - Needs Triage Need team to review and classify label May 24, 2021
@BradReesWork BradReesWork added improvement Improvement / enhancement to an existing function and removed ? - Needs Triage Need team to review and classify labels Aug 5, 2021
@BradReesWork BradReesWork added this to the 21.10 milestone Aug 5, 2021
@ChuckHastings ChuckHastings modified the milestones: 21.10, 21.12 Sep 22, 2021
@ChuckHastings ChuckHastings modified the milestones: 21.12, 22.02 Nov 10, 2021
@ChuckHastings
Copy link
Collaborator

Thinking about the timing of this.

The new C API for graph construction hides this detail. If the number of edges is referenced it is referenced as a size_t in the API. The intention is to migrate all of the python bindings to use this new feature. It feels like there would be less potential breakage by waiting until this work is complete.

@seunghwak
Copy link
Contributor Author

Yeah... and I think we need to think little more about whether this is possible or not.

Say, to compute degrees, the maximum possible degree can be larger than the total number of vertices if there are multi-edges. With this possibility, we cannot return a vector of vertex_t for degrees and always returning a vector of size_t can be wasteful.

Need to resolve how to handle this case first before jumping to implementation.

@ChuckHastings
Copy link
Collaborator

I feel like this is not really viable in the general case.

We now use the edge_t also to represent edge ids in the graph. While it's true that the offsets on each GPU's subset of edges could be an implementation detail, we still expose the edge_t for edge ids, and we don't want to force 64-bit edge ids on all graphs.

Suggesting we close this. Any objections @seunghwak ?

@seunghwak
Copy link
Contributor Author

Yes, edge ID is one thing (for this, we can introduce edge_id_t) and returning degrees is another. This being said, we still don't need to use edge_t for the offsets array especially for multi-GPU use cases.

  1. We still need to ask users to provide data types for edge IDs and degrees (which is edge_t as of now).
  2. We can update the offsets array to use either edge_t or smaller (if edge_t is 64 bit integer, and # edges in local partitions fits into 32 bit integer), but this may not be super high priority at this moment (this may lead to some cut in peak memory usage for certain algorithm & input graph & # GPUs combinations).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function
Projects
None yet
Development

No branches or pull requests

4 participants