[SPARK-27299][GRAPH][WIP] Spark Graph API design proposal #24297
|
ok to test |
|
Test build #104302 has finished for PR 24297 at commit
|
|
Although this is WIP, could you add LICENSE? |
|
so this is adding a |
|
@felixcheung The SPIP involves Cypher querying and graph algorithms. The |
|
Test build #104353 has finished for PR 24297 at commit
|
|
@s1ck . How do you build this PR in your environment? Jenkins seems to complain about [error] Cyclic reference involving
[error] Project(id cypher, base: /home/jenkins/workspace/SparkPullRequestBuilder/graph/cypher, dependencies: List(ResolvedClasspathDependency(ProjectRef(file:/home/jenkins/workspace/SparkPullRequestBuilder/,core),None), |
|
@dongjoon-hyun I'm using Maven/Intellij to build and run. I used the same |
|
Test build #104374 has finished for PR 24297 at commit
|
|
@dongjoon-hyun The cyclic reference is fixed. I won't dive into fixing Scala style tests for now since this is WIP and open for discussion. Btw, do you know if there is a Spark code style xml for IntelliJ? |
|
IntelliJ already understands The current warnings look like two types mostly. You can fix them easily.
|
|
Test build #104592 has finished for PR 24297 at commit
|
|
Hi, @mengxr . Could you help this PR to pass the Jenkins, please? |
|
@dongjoon-hyun This PR is a prototype for API and design discussions. We should break it down into smaller ones after we reach an agreement on the API and design. I don't think we can merge this one directly. |
|
Thank you, @mengxr . I see. |
|
Test build #104719 has finished for PR 24297 at commit
|
|
Test build #105643 has finished for PR 24297 at commit
|
|
Test build #106419 has finished for PR 24297 at commit
|
|
Test build #106420 has finished for PR 24297 at commit
|
|
Test build #106421 has finished for PR 24297 at commit
|
## What changes were proposed in this pull request?

This PR introduces the necessary Maven modules for the new [Spark Graph](https://issues.apache.org/jira/browse/SPARK-25994) feature for Spark 3.0.

* `spark-graph` is a parent module that users depend on to get all graph functionalities (Cypher and Graph Algorithms)
* `spark-graph-api` defines the [Property Graph API](https://docs.google.com/document/d/1Wxzghj0PvpOVu7XD1iA8uonRYhexwn18utdcTxtkxlI) that is shared between Cypher and Algorithms
* `spark-cypher` contains a Cypher query engine implementation

Both `spark-graph-api` and `spark-cypher` depend on Spark SQL.

Note that the Maven module for Graph Algorithms is not part of this PR and will be introduced in https://issues.apache.org/jira/browse/SPARK-27302

A PoC for a running Cypher implementation can be found in this WIP PR apache#24297

## How was this patch tested?

Pass the Jenkins with all profiles, then manually build and check the following.

```
$ ls assembly/target/scala-2.12/jars/spark-cypher*
assembly/target/scala-2.12/jars/spark-cypher_2.12-3.0.0-SNAPSHOT.jar
$ ls assembly/target/scala-2.12/jars/spark-graph* | grep -v graphx
assembly/target/scala-2.12/jars/spark-graph-api_2.12-3.0.0-SNAPSHOT.jar
assembly/target/scala-2.12/jars/spark-graph_2.12-3.0.0-SNAPSHOT.jar
```

Closes apache#24490 from s1ck/SPARK-27300.

Lead-authored-by: Martin Junghanns <martin.junghanns@neotechnology.com>
Co-authored-by: Max Kießling <max@kopfueber.org>
Co-authored-by: Martin Junghanns <martin.junghanns@neo4j.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
|
|
||
| override def toString: String = { | ||
| if (header.isEmpty) { | ||
| s"CAPSRecords.empty" |
nit: s is not necessary
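To illustrate the nit: a minimal sketch, assuming nothing beyond the Scala standard library. The object and value names are hypothetical and not part of the PR. The `s` prefix only matters when the literal contains `$` placeholders, so both values below hold the same string.

```scala
object InterpolatorNit {
  // With the s-interpolator but no $placeholders: it compiles, the prefix does nothing useful.
  val withInterpolator: String = s"CAPSRecords.empty"

  // Plain literal, as the review comment suggests.
  val plainLiteral: String = "CAPSRecords.empty"

  def main(args: Array[String]): Unit = {
    // Both forms produce the same string.
    println(withInterpolator == plainLiteral)
  }
}
```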
| noLabelNodeDirectoryName | ||
| } else { | ||
| // TODO: Find more elegant solution for encoding underline characters | ||
| seq.map(_.replace("_", "--UNDERLINE--")).mkString("_").encodeSpecialCharacters |
nit: Would it be better to define a variable to hold "--UNDERLINE--" and use it at all of the references for consistency?
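A minimal sketch of what this could look like, assuming the placeholder is only needed for encoding and decoding label combinations. `LabelEncoding`, `UnderscorePlaceholder`, `encodeLabels` and `decodeLabels` are hypothetical names, and the PR's `encodeSpecialCharacters` step is left out because its implementation is not shown here.

```scala
object LabelEncoding {
  // Single definition of the placeholder, so every reference stays consistent.
  // The placeholder itself must not contain the separator character "_".
  val UnderscorePlaceholder: String = "--UNDERLINE--"

  // Escape underscores inside each label, then join the labels with "_".
  def encodeLabels(labels: Seq[String]): String =
    labels.map(_.replace("_", UnderscorePlaceholder)).mkString("_")

  // Reverse step: split on the separator, then restore the escaped underscores.
  def decodeLabels(encoded: String): Seq[String] =
    encoded.split("_").toSeq.map(_.replace(UnderscorePlaceholder, "_"))

  def main(args: Array[String]): Unit = {
    val encoded = encodeLabels(Seq("Person", "big_city"))
    println(encoded)
    println(decodeLabels(encoded))
  }
}
```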
|
|
||
| implicit class RichRelationshipDataFrame(val relDf: RelationshipFrame) extends AnyVal { | ||
| def toRelationshipMapping: ElementMapping = RelationshipMappingBuilder | ||
| .on(relDf.idColumn) |
2-indent?
| * values from the evaluated children. | ||
| */ | ||
| def nullSafeConversion(expr: Expr)(withConvertedChildren: Seq[Column] => Column) | ||
| (implicit header: RecordHeader, df: DataFrame, parameters: CypherMap): Column = { |
4-indent?
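For reference, a sketch of the indentation the two comments above appear to ask for, assuming they refer to the usual Spark Scala style (2 spaces for continued call chains, 4 spaces for wrapped parameter lists). `MappingBuilder`, `toMapping` and `nullSafe` are hypothetical stand-ins, not the PR's actual classes.

```scala
object IndentationSketch {
  // Hypothetical stand-in for a mapping builder.
  final case class MappingBuilder(id: String = "", source: String = "", target: String = "") {
    def on(column: String): MappingBuilder = copy(id = column)
    def from(column: String): MappingBuilder = copy(source = column)
    def to(column: String): MappingBuilder = copy(target = column)
  }

  // Chained calls continued on new lines: 2-space indent relative to the receiver.
  def toMapping(idColumn: String, sourceColumn: String, targetColumn: String): MappingBuilder =
    MappingBuilder()
      .on(idColumn)
      .from(sourceColumn)
      .to(targetColumn)

  // Wrapped parameter list: 4-space indent, so it stands apart from the 2-space body.
  def nullSafe(value: Option[Int])(convert: Int => String)
      (implicit default: String): String = {
    value.map(convert).getOrElse(default)
  }

  def main(args: Array[String]): Unit = {
    implicit val fallback: String = "n/a"
    println(toMapping("id", "source", "target"))
    println(nullSafe(Some(1))(_.toString))
  }
}
```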
|
Test build #112050 has finished for PR 24297 at commit
|
|
Test build #112072 has finished for PR 24297 at commit
|
4669b16 to c337f6c
|
Test build #112098 has finished for PR 24297 at commit
|
Make example work when started from the IDE
Co-Authored-By: Sören Reichardt <soren.reichardt@neotechnology.com>
Graph examples
|
Test build #112105 has finished for PR 24297 at commit
|
Co-authored-by: Max Kießling <max.kiessling@neotechnology.com>
|
Test build #112106 has finished for PR 24297 at commit
|
|
Test build #112116 has finished for PR 24297 at commit
|
Co-authored-by: Max Kießling <max.kiessling@neotechnology.com>
|
Test build #112157 has finished for PR 24297 at commit
|
Co-authored-by: Max Kießling <max.kiessling@neotechnology.com>
|
Test build #112211 has finished for PR 24297 at commit
|
What changes were proposed in this pull request?
This PR demonstrates a prototypical implementation of the new Spark Graph API. The PR should mainly be used to discuss the API proposed in this GoogleDoc. This PR is not intended to be merged.
The PR introduces two modules:
Please use the PR and/or the GoogleDoc to comment on the content of spark-graph-api. There will be follow-up PRs for spark-cypher.
How was this patch tested?
spark-cypher has been tested using the openCypher Technology Compatibility Kit
Contributors
Design, documentation and implementation have been a collaborative effort:
Co-Authored-By: Xiangrui Meng <meng@databricks.com>
Co-Authored-By: Max Kießling <max.kiessling@neotechnology.com>
Co-Authored-By: Mats Rydberg <mats@neotechnology.com>
Co-Authored-By: Philip Stutz <philip.stutz@gmail.com>
Co-Authored-By: Sören Reichardt <soren.reichardt@neotechnology.com>
Co-Authored-By: Jonatan Jäderberg <jonatan.jaderberg@gmail.com>
Co-Authored-By: Tobias Johansson <tobias.johansson@neotechnology.com>
Co-Authored-By: Alastair Green <alastair.green@neo4j.com>