-
Notifications
You must be signed in to change notification settings - Fork 300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PropertyGraph set index to vertex and edge ids #2523
Conversation
Currently, this only does SG version for rapidsai#2401. MG is still TODO. This also doesn't change anything user-facing (yet).
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## branch-22.10 #2523 +/- ##
===============================================
Coverage ? 60.13%
===============================================
Files ? 111
Lines ? 6184
Branches ? 0
===============================================
Hits ? 3719
Misses ? 2465
Partials ? 0 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
@eriknw is this ready for review? |
Yeah, for SG. I'm going to put MG into a different WIP branch, because of issues I'm having with I still want to add one thing: the ability to add vertex data that already has the vertex id as the index. |
ok, I'll wait until you've made that addition, I wasn't sure if that was going into this branch or not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I had just one optional minor question/suggestion which need not hold up approval.
if self.__series_type is cudf.Series: | ||
latest = {n: self.__vertex_prop_dataframe[n] | ||
for n in self.__vertex_prop_dataframe.columns} | ||
else: | ||
latest = self.__vertex_prop_dataframe.to_dict('series') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious how much faster/more efficient it is to create latest
by separately using .to_dict()
only for pandas vs. just having one way that works with both. Granted it's not many more lines of code to support both, but still might be worth only doing it one way if the speedup is negligible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt that it's any faster, but I aspire to write canonical pandas, and, who knows, maybe pandas can do something smarter and/or maybe cudf will add this API in the future.
rerun tests |
This includes a slow workaround for rapidsai/cudf#11550
rerun tests |
1 similar comment
rerun tests |
Some updates to "renumber by type"
rerun tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a check that we fetch ids in correct order and behave correctly with repeated ids ? (for both PropertyGraph
and MGProperty Graph
and for both get_edge_data
and get_node_data
).
See example:
import cudf
import cugraph
from cugraph.experimental import PropertyGraph
import numpy as np
src_ser = cudf.Series([1, 1, 1, 2])
dst_ser = cudf.Series([2, 3, 4, 1])
df = cudf.DataFrame(
{"src": src_ser, "dst": dst_ser, "edge_feat": np.arange(len(src_ser))}
)
pg = PropertyGraph()
pg.add_edge_data(df, vertex_col_names=['src', 'dst'])
Current Main Behavior:
pg.get_edge_data(edge_ids = [2,1,3,1], columns=['edge_feat'])
_SRC_ _DST_ _EDGE_ID_ _TYPE_ edge_feat
1 1 3 1 1
2 1 4 2 2
3 2 1 3 3
Expected behavior on this PR.
_SRC_ _DST_ _EDGE_ID_ _TYPE_ edge_feat
0 1 4 2 2
1 1 3 1 1
2 2 1 3 3
3 1 3 1 1
rerun tests |
This PR fixes rapidsai/graph_dl#27 This PR fixes rapidsai/graph_dl#43 This PR fixes rapidsai/graph_dl#39 **Tests Added:** Single GPU: - [x] APIs like num_nodes, num_edges - [x] test_sampling_basic - [x] test_sampling_homogeneous_gs_in_dir - [x] test_sampling_homogeneous_gs_out_dir - [x] test_sampling_gs_homogeneous_neg_one_fanout - [x] test_sampling_gs_heterogeneous_in_dir - [x] test_sampling_gs_heterogeneous_out_dir - [x] test_sampling_gs_heterogeneous_neg_one_fanout Multi GPU: - [x] APIs like num_nodes, num_edges - [x] test_sampling_basic - [x] test_sampling_homogeneous_gs_in_dir - [x] test_sampling_homogeneous_gs_out_dir - [x] test_sampling_gs_homogeneous_neg_one_fanout - [x] test_sampling_gs_heterogeneous_in_dir - [x] test_sampling_gs_heterogeneous_out_dir - [x] test_sampling_gs_heterogeneous_neg_one_fanout Bugs to reproduce: - [ ] Repro heterogeneous single gpu hang outside pytest - [x] Repro hetrogenous multi gpu incorrect results for with_replace=False #2760 - [x] Repro hetrogenous incorrect results for different amount of GPUs #2761 Tests that depend upon #2523 - [x] Add minimal example to PR to ensure it gets fixed Added comment here: #2523 (review) - [x] test_get_node_storage_gs (Failing cause of a PG bug) - [x] test_get_edge_storage_gs (Failing cause of a PG bug) - [x] test_get_node_storage_gs (Failing cause of a PG bug) - [x] test_get_edge_storage_gs (Failing cause of a PG bug) Authors: - Vibhu Jawa (https://github.com/VibhuJawa) Approvers: - Rick Ratzel (https://github.com/rlratzel) - Brad Rees (https://github.com/BradReesWork) URL: #2592
@gpucibot merge |
Closes rapidsai/graph_dl#40 ~Blocked by #2523~ ~Blocked by #2510~ Authors: - Alex Barghi (https://github.com/alexbarghi-nv) - Vibhu Jawa (https://github.com/VibhuJawa) - Rick Ratzel (https://github.com/rlratzel) - Chuck Hastings (https://github.com/ChuckHastings) - Erik Welch (https://github.com/eriknw) Approvers: - Joseph Nke (https://github.com/jnke2016) - Brad Rees (https://github.com/BradReesWork) - Vibhu Jawa (https://github.com/VibhuJawa) - Rick Ratzel (https://github.com/rlratzel) URL: #2653
This PR fixes the below errors that have popped up in MNMG testing. - [x] fix_out of index keys on MNMG graphs - [x] fix loc/get_node_storage error on MNMG graphs (Work around rapidsai/cudf#11877) - [x] Clear Cached Properties when they become invalid - [x] Remove 6 pytest skipping as both these PRs have landed - #2751 - #2523 - [x] Change `vertex_col_names` to `node_col_names` to match DGL - [x] Ensure MNMG tests pass - [x] Work around the PG bug and also prevent redundant conversion to lists Authors: - Vibhu Jawa (https://github.com/VibhuJawa) Approvers: - Rick Ratzel (https://github.com/rlratzel) - Erik Welch (https://github.com/eriknw) URL: #2786
Currently, this only does SG version for #2401. MG is still TODO.Closes #2401
This also doesn't change anything user-facing (yet).