-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] OPG pagerank (Py) #944
Conversation
Please update the changelog in order to start CI tests. View the gpuCI docs here. |
Codecov Report
@@ Coverage Diff @@
## branch-0.15 #944 +/- ##
===============================================
- Coverage 49.09% 47.86% -1.23%
===============================================
Files 49 52 +3
Lines 1485 1548 +63
===============================================
+ Hits 729 741 +12
- Misses 756 807 +51
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few more comments, but I'm ok merging as is to unblock #947 if we address these questions in a separate PR.
from libc.stdint cimport uintptr_t | ||
from cython.operator cimport dereference as deref | ||
|
||
def mg_pagerank(input_df, local_data, handle, alpha=0.85, max_iter=100, tol=1.0e-5): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't support tol
currently, we should throw an error if users set it (or add it later to the API).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could add it to the final PR after #947 is merged, along with opg pagerank tests.
@@ -162,7 +162,6 @@ def pagerank(edge_list, alpha=0.85, max_iter=30): | |||
>>> dtype=['int32', 'int32']) | |||
>>> pr = dcg.pagerank(ddf_edge_list, alpha=0.85, max_iter=50) | |||
""" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some dead code from SNMG and/or things that should be in RAFT in this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is entirely snmg code. It simply connects to cugraph/python/cugraph/snmg/link_analysis/mg_pagerank.py which throws an exception saying it is disabled. We could update it to throw exception at python/cugraph/dask/pagerank/pagerank.py itself or just remove it completely for 0.15 release once we have OPG pagerank.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see some prints in the PR which look like are debugging artifacts.
There is a single commented code that can be removed.
|
||
# Move to conftest | ||
from dask_cuda import LocalCUDACluster | ||
# cluster = LocalCUDACluster(protocol="tcp", scheduler_port=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this comment be removed from here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a temporary test (which is not run by CI) This is to check the validity of the PR if someone wants to test the pipeline. This is going to be replaced by a pagerank test after #947 is merged and e2e runs successfully.
|
||
|
||
def common_func(sID, data, local_data): | ||
print("Dataframe: ", data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all of these debugging artifacts? Is it required to print all of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This going to be updated/removed in the final PR. Currently these prints exist for checking input validity as we can't verify results pagerank without Opg cuda code.
No description provided.