@s1ck
Contributor

@s1ck s1ck commented Apr 4, 2019

What changes were proposed in this pull request?

This PR demonstrates a prototypical implementation of the new Spark Graph API. The PR should mainly be used to discuss the API proposed in this GoogleDoc. This PR is not intended to be merged.

The PR introduces two modules:

  • spark-graph-api (containing the API to be discussed)
  • spark-cypher (a prototypical implementation of spark-graph-api)

Please use the PR and/or the GoogleDoc to comment on the content of spark-graph-api. There will be follow-up PRs for spark-cypher.

How was this patch tested?

spark-cypher has been tested using the openCypher Technology Compatibility Kit (TCK).

Contributors

Design, documentation and implementation have been a collaborative effort:

Co-Authored-By: Xiangrui Meng meng@databricks.com
Co-Authored-By: Max Kießling max.kiessling@neotechnology.com
Co-Authored-By: Mats Rydberg mats@neotechnology.com
Co-Authored-By: Philip Stutz philip.stutz@gmail.com
Co-Authored-By: Sören Reichardt soren.reichardt@neotechnology.com
Co-Authored-By: Jonatan Jäderberg jonatan.jaderberg@gmail.com
Co-Authored-By: Tobias Johansson tobias.johansson@neotechnology.com
Co-Authored-By: Alastair Green alastair.green@neo4j.com

@dongjoon-hyun
Member

ok to test

@SparkQA

SparkQA commented Apr 4, 2019

Test build #104302 has finished for PR 24297 at commit d678814.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Although this is WIP, could you add the Apache license headers? The RAT check reports the following:

========================================================================
Running Apache RAT checks
========================================================================
Attempting to fetch rat
Could not find Apache license headers in the following files:
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/api/pom.xml
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/api/src/main/scala/org/apache/spark/graph/api/CypherResult.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/api/src/main/scala/org/apache/spark/graph/api/CypherSession.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/api/src/main/scala/org/apache/spark/graph/api/GraphElementFrame.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/api/src/main/scala/org/apache/spark/graph/api/PropertyGraph.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/api/src/main/scala/org/apache/spark/graph/api/PropertyGraphType.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/pom.xml
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/SparkCypherEntity.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/SparkCypherFunctions.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/SparkCypherRecords.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/SparkCypherResult.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/SparkCypherSession.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/SparkEntityTable.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/SparkGraphDirectoryStructure.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/SparkTable.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/adapters/MappingAdapter.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/adapters/RelationalGraphAdapter.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/adapters/SchemaAdapter.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/conversions/CypherValueEncoders.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/conversions/ExprConversions.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/conversions/RowConversion.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/conversions/StringEncodingUtilities.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/conversions/TemporalConversions.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/conversions/TypeConversions.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/io/ReadWriteGraph.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/udfs/TemporalUdfs.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/main/scala/org/apache/spark/cypher/util/HadoopFSUtils.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/test/resources/tck/failing_blacklist
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/test/resources/tck/failure_reporting_blacklist
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/test/resources/tck/temporal_blacklist
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/test/resources/tck/wont_fix_blacklist
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/test/scala/org/apache/spark/cypher/GraphExamplesSuite.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/test/scala/org/apache/spark/cypher/PropertyGraphReadWrite.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/test/scala/org/apache/spark/cypher/SharedCypherContext.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/test/scala/org/apache/spark/cypher/construction/ScanGraphFactory.scala
 !????? /home/jenkins/workspace/SparkPullRequestBuilder@2/graph/cypher/src/test/scala/org/apache/spark/cypher/tck/SparkCypherTckSuite.scala
[error] running /home/jenkins/workspace/SparkPullRequestBuilder@2/dev/check-license ; received return code 1
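
For reference, the header that the RAT check looks for is the standard ASF source header. The sketch below shows it as a Scala block comment, together with a deliberately simplified stand-in for the check. The names `LicenseHeaderCheck`, `hasAsfHeader` and `withHeader` are hypothetical and only illustrate the idea; the real check is done by Apache RAT via dev/check-license, and non-Scala files such as pom.xml need the same text in their own comment syntax.

```scala
object LicenseHeaderCheck {
  // Standard ASF source header, written as a Scala/Java block comment.
  val AsfHeader: String =
    """/*
      | * Licensed to the Apache Software Foundation (ASF) under one or more
      | * contributor license agreements.  See the NOTICE file distributed with
      | * this work for additional information regarding copyright ownership.
      | * The ASF licenses this file to You under the Apache License, Version 2.0
      | * (the "License"); you may not use this file except in compliance with
      | * the License.  You may obtain a copy of the License at
      | *
      | *    http://www.apache.org/licenses/LICENSE-2.0
      | *
      | * Unless required by applicable law or agreed to in writing, software
      | * distributed under the License is distributed on an "AS IS" BASIS,
      | * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      | * See the License for the specific language governing permissions and
      | * limitations under the License.
      | */""".stripMargin

  // Simplified assumption: a file passes if it starts with the exact header.
  // Apache RAT itself is more lenient about how it matches license text.
  def hasAsfHeader(source: String): Boolean =
    source.trim.startsWith(AsfHeader)

  // Prepend the header only when it is missing, so the call is idempotent.
  def withHeader(source: String): String =
    if (hasAsfHeader(source)) source else AsfHeader + "\n\n" + source
}
```

Applying `withHeader` to the contents of each reported Scala file would be one way to clear this particular failure.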

@felixcheung
Member

so this is adding a graph directory next to graphx?

@s1ck
Contributor Author

s1ck commented Apr 6, 2019

@felixcheung The SPIP involves Cypher querying and graph algorithms. The graphx directory will eventually be removed.

@SparkQA

SparkQA commented Apr 7, 2019

Test build #104353 has finished for PR 24297 at commit a779702.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

dongjoon-hyun commented Apr 8, 2019

@s1ck, how do you build this PR in your environment? Jenkins seems to complain about a cyclic reference issue.

[error] Cyclic reference involving 
[error]    Project(id cypher, base: /home/jenkins/workspace/SparkPullRequestBuilder/graph/cypher, dependencies: List(ResolvedClasspathDependency(ProjectRef(file:/home/jenkins/workspace/SparkPullRequestBuilder/,core),None), 

@s1ck
Contributor Author

s1ck commented Apr 8, 2019

@dongjoon-hyun I'm using Maven/IntelliJ to build and run. I used the same sbt.project.name for the two modules; I guess this led to the cyclic reference error.

@SparkQA

SparkQA commented Apr 8, 2019

Test build #104374 has finished for PR 24297 at commit 451e279.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@s1ck
Contributor Author

s1ck commented Apr 8, 2019

@dongjoon-hyun The cyclic reference is fixed. I won't dive into fixing the Scala style tests for now, since this is WIP and open for discussion. Btw, do you know if there is a Spark code style XML for IntelliJ?

@dongjoon-hyun
Member

dongjoon-hyun commented Apr 8, 2019

IntelliJ already understands checkstyle.xml. If you open the Apache Spark project in IntelliJ, you should see the red-line warnings.

The current warnings are mostly of two types, and both are easy to fix:

  • Max line length (100).
  • Import ordering and grouping (java -> scala -> 3rd party -> org.apache.spark)
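
A rough sketch of these two rules, for illustration only. The object `ImportOrder` and its methods are hypothetical helpers, not part of Spark or scalastyle; treating javax as part of the java group and sorting alphabetically within a group are assumptions, and the third-party import in the usage note is just a placeholder string.

```scala
object ImportOrder {
  // Rank of the import group: java -> scala -> 3rd party -> org.apache.spark.
  def group(importLine: String): Int = {
    val name = importLine.stripPrefix("import ").trim
    if (name.startsWith("java.") || name.startsWith("javax.")) 0
    else if (name.startsWith("scala.")) 1
    else if (name.startsWith("org.apache.spark.")) 3
    else 2 // everything else counts as third party
  }

  // Order imports by group first, then alphabetically within a group.
  def sorted(imports: Seq[String]): Seq[String] =
    imports.sortBy(line => (group(line), line))

  // The style check rejects lines longer than `max` characters (100 here).
  def exceedsMaxLength(line: String, max: Int = 100): Boolean =
    line.length > max
}
```

For example, `ImportOrder.sorted` applied to imports of `org.apache.spark.sql.DataFrame`, `scala.collection.mutable`, `org.opencypher.okapi.api.graph.PropertyGraph` and `java.util.Locale` puts the `java.util` import first and the `org.apache.spark` import last.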

@SparkQA

SparkQA commented Apr 15, 2019

Test build #104592 has finished for PR 24297 at commit b6f26aa.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • implicit class RichExpression(expr: Expr)
      • implicit class TemporalExpression(val expr: Expr) extends AnyVal

@dongjoon-hyun
Member

Hi, @mengxr. Could you help this PR pass Jenkins, please?

@mengxr
Contributor

mengxr commented Apr 15, 2019

@dongjoon-hyun This PR is a prototype for API and design discussions. We should break it down into smaller ones after we reach an agreement on the API and design. I don't think we can merge this one directly.

@dongjoon-hyun
Member

Thank you, @mengxr. I see.

@SparkQA

SparkQA commented Apr 18, 2019

Test build #104719 has finished for PR 24297 at commit bfe66f8.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 21, 2019

Test build #105643 has finished for PR 24297 at commit 9dde5f9.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
      • case class SchemaAdapter(schema: PropertyGraphSchema) extends PropertyGraphType

@SparkQA

SparkQA commented Jun 12, 2019

Test build #106419 has finished for PR 24297 at commit 94b01fd.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 12, 2019

Test build #106420 has finished for PR 24297 at commit d74df52.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 12, 2019

Test build #106421 has finished for PR 24297 at commit 4b6935b.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
      • abstract class GraphElementFrame

emanuelebardelli pushed a commit to emanuelebardelli/spark that referenced this pull request Jun 15, 2019
## What changes were proposed in this pull request?

This PR introduces the necessary Maven modules for the new [Spark Graph](https://issues.apache.org/jira/browse/SPARK-25994) feature for Spark 3.0.

* `spark-graph` is a parent module that users depend on to get all graph functionalities (Cypher and Graph Algorithms)
* `spark-graph-api` defines the [Property Graph API](https://docs.google.com/document/d/1Wxzghj0PvpOVu7XD1iA8uonRYhexwn18utdcTxtkxlI) that is being shared between Cypher and Algorithms
* `spark-cypher` contains a Cypher query engine implementation

Both `spark-graph-api` and `spark-cypher` depend on Spark SQL.

Note that the Maven module for Graph Algorithms is not part of this PR and will be introduced in https://issues.apache.org/jira/browse/SPARK-27302.

A PoC for a running Cypher implementation can be found in this WIP PR: apache#24297

## How was this patch tested?

Passed Jenkins with all profiles, then manually built and checked the following:
```
$ ls assembly/target/scala-2.12/jars/spark-cypher*
assembly/target/scala-2.12/jars/spark-cypher_2.12-3.0.0-SNAPSHOT.jar

$ ls assembly/target/scala-2.12/jars/spark-graph* | grep -v graphx
assembly/target/scala-2.12/jars/spark-graph-api_2.12-3.0.0-SNAPSHOT.jar
assembly/target/scala-2.12/jars/spark-graph_2.12-3.0.0-SNAPSHOT.jar
```

Closes apache#24490 from s1ck/SPARK-27300.

Lead-authored-by: Martin Junghanns <martin.junghanns@neotechnology.com>
Co-authored-by: Max Kießling <max@kopfueber.org>
Co-authored-by: Martin Junghanns <martin.junghanns@neo4j.com>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

override def toString: String = {
if (header.isEmpty) {
s"CAPSRecords.empty"
Member

nit: the s interpolator prefix is not necessary here
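
A small illustration of this nit (a sketch, not code from the PR): the s prefix only matters when the literal contains a $ placeholder, so the first two values below are the same string. The object name `InterpolatorNit` and the `describe` method with its `rows` parameter are hypothetical.

```scala
object InterpolatorNit {
  // No $ placeholders in the literal, so the s prefix has no effect.
  val withPrefix: String = s"CAPSRecords.empty"
  val plain: String = "CAPSRecords.empty"

  // The prefix is needed only when a value is spliced into the string.
  def describe(rows: Int): String = s"CAPSRecords(rows=$rows)"
}
```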

noLabelNodeDirectoryName
} else {
// TODO: Find more elegant solution for encoding underline characters
seq.map(_.replace("_", "--UNDERLINE--")).mkString("_").encodeSpecialCharacters
Member

nit: Would it be better to define a variable that holds "--UNDERLINE--" and use it at every reference, for consistency?
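
One way to act on this suggestion, sketched under assumptions: the object name `DirectoryNames`, the constant name `UnderscoreToken` and the `decodeLabels` helper are hypothetical, and the PR's encodeSpecialCharacters step is left out because its implementation is not shown here.

```scala
object DirectoryNames {
  // Hypothetical constant: the single place where the token is defined.
  val UnderscoreToken: String = "--UNDERLINE--"

  // Join labels with "_" after escaping any "_" inside a label.
  def encodeLabels(labels: Seq[String]): String =
    labels.map(_.replace("_", UnderscoreToken)).mkString("_")

  // Inverse of encodeLabels for non-empty label sequences.
  def decodeLabels(directoryName: String): Seq[String] =
    directoryName.split("_").toList.map(_.replace(UnderscoreToken, "_"))
}
```

With a shared constant, the encoding and decoding sides cannot drift apart if the token is ever changed.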


implicit class RichRelationshipDataFrame(val relDf: RelationshipFrame) extends AnyVal {
def toRelationshipMapping: ElementMapping = RelationshipMappingBuilder
.on(relDf.idColumn)
Member

2-indent?

* values from the evaluated children.
*/
def nullSafeConversion(expr: Expr)(withConvertedChildren: Seq[Column] => Column)
(implicit header: RecordHeader, df: DataFrame, parameters: CypherMap): Column = {
Member

4-indent?

@SparkQA

SparkQA commented Oct 14, 2019

Test build #112050 has finished for PR 24297 at commit 9114585.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 14, 2019

Test build #112072 has finished for PR 24297 at commit 4669b16.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@s1ck s1ck force-pushed the 3.0-spark-graph-poc branch from 4669b16 to c337f6c Compare October 15, 2019 08:15
@SparkQA

SparkQA commented Oct 15, 2019

Test build #112098 has finished for PR 24297 at commit c337f6c.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 15, 2019

Test build #112105 has finished for PR 24297 at commit f656a08.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Co-authored-by: Max Kießling <max.kiessling@neotechnology.com>
@SparkQA

SparkQA commented Oct 15, 2019

Test build #112106 has finished for PR 24297 at commit 31ce35d.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 15, 2019

Test build #112116 has finished for PR 24297 at commit 314ea23.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Co-authored-by: Max Kießling <max.kiessling@neotechnology.com>
@SparkQA

SparkQA commented Oct 16, 2019

Test build #112157 has finished for PR 24297 at commit 3420b6c.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Co-authored-by: Max Kießling <max.kiessling@neotechnology.com>
@SparkQA

SparkQA commented Oct 17, 2019

Test build #112211 has finished for PR 24297 at commit 3786c73.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@s1ck s1ck closed this Dec 18, 2019
@s1ck s1ck deleted the 3.0-spark-graph-poc branch December 18, 2019 09:16
@s1ck s1ck restored the 3.0-spark-graph-poc branch December 18, 2019 09:16
@s1ck s1ck deleted the 3.0-spark-graph-poc branch December 18, 2019 09:16
@s1ck s1ck restored the 3.0-spark-graph-poc branch December 18, 2019 09:16