From 28bce1fe34e839ca0975cd7604979975ec7adcbe Mon Sep 17 00:00:00 2001
From: Nicola Vitucci
Date: Tue, 16 Jan 2024 14:23:12 +0000
Subject: [PATCH 01/13] Update gitignore

---
.gitignore | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 40e4a19..0311c21 100644
--- a/.gitignore
+++ b/.gitignore
@@ -35,4 +35,7 @@ target/
# node modules
node_modules/
-.env
\ No newline at end of file
+.env
+
+# IDE
+.vscode/
\ No newline at end of file

From bc5c9e1d20c64193df9137b538b76055d14abfb9 Mon Sep 17 00:00:00 2001
From: Nicola Vitucci
Date: Tue, 16 Jan 2024 14:37:18 +0000
Subject: [PATCH 02/13] Add considerations to write query

---
modules/ROOT/pages/writing.adoc | 66 +++++++++++++++++++++++++++++++--
1 file changed, 63 insertions(+), 3 deletions(-)

diff --git a/modules/ROOT/pages/writing.adoc b/modules/ROOT/pages/writing.adoc
index 2a9464b..60cb6a2 100644
--- a/modules/ROOT/pages/writing.adoc
+++ b/modules/ROOT/pages/writing.adoc
@@ -189,6 +189,7 @@ Writing data to a Neo4j database can be done in three ways:
In case you use the option `query`, the Spark Connector persists the entire Dataset by using the provided query.
The nodes are sent to Neo4j in a batch of rows defined in the `batch.size` property, and your query is wrapped up in an `UNWIND $events AS event` statement.
+The `query` option supports both `CREATE` and `MERGE` clauses.

Let's look at the following simple Spark program:

[source,scala]
----
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

-val df = (1 to 10)/*...*/.toDF()
+case class Person(name: String, surname: String, age: Int)
+
+// Create an example DataFrame
+val df = Seq(
+  Person("John", "Doe", 42),
+  Person("Jane", "Doe", 40)
+).toDF()
+
+// Define the Cypher query to use in the write
+val query = "CREATE (n:Person {fullName: event.name + ' ' + event.surname})"
+
 df.write
   .format("org.neo4j.spark.DataSource")
   .option("url", "bolt://localhost:7687")
-  .option("query", "CREATE (n:Person {fullName: event.name + event.surname})")
+  .option("authentication.basic.username", USERNAME)
+  .option("authentication.basic.password", PASSWORD)
+  .option("query", query)
+  .mode(SaveMode.Overwrite)
   .save()
----

This generates the following query:

[source,cypher]
----
UNWIND $events AS event
-CREATE (n:Person {fullName: event.name + event.surname})
+CREATE (n:Person {fullName: event.name + ' ' + event.surname})
----

Thus `events` is the batch created from your dataset.

+==== Considerations
+
+* You must specify the write mode (from `SaveMode`):
+** `Append`: uses CREATE (Spark 3.x)
+** `Overwrite`: uses MERGE
+** `ErrorIfExists`: uses CREATE (Spark 2.x)
+
+* You can use the `events` list in `WITH` statements as well.
For example, you can replace the query in the previous example with the following:

[source,scala]
----
val query = """
  |WITH event.name + ' ' + toUpper(event.surname) AS fullName
  |CREATE (n:Person {fullName: fullName})
""".stripMargin
----

* Subqueries that reference the `events` list in ``CALL``s are supported:

[source,scala]
----
val query = """
  |CALL {
  |  WITH event
  |  RETURN event.name + ' ' + toUpper(event.surname) AS fullName
  |}
  |CREATE (n:Person {fullName: fullName})
""".stripMargin
----

* If APOC is installed, APOC procedures and functions can be used:

[source,scala]
----
val query = """
  |CALL {
  |  WITH event
  |  RETURN event.name + ' ' + apoc.text.toUpperCase(event.surname) AS fullName
  |}
  |CREATE (n:Person {fullName: fullName})
""".stripMargin
----

* Although a `RETURN` clause is not forbidden, adding one does not have any effect on the query result.

[[write-node]]
=== Node

From c1a01eeecbd28c670eb1b09870fa81e18f2437a1 Mon Sep 17 00:00:00 2001
From: Nicola Vitucci
Date: Tue, 16 Jan 2024 15:13:51 +0000
Subject: [PATCH 03/13] Add example of NoClassDefFoundError to FAQ

---
modules/ROOT/pages/faq.adoc | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/modules/ROOT/pages/faq.adoc b/modules/ROOT/pages/faq.adoc
index 92dcf8e..1c41312 100644
--- a/modules/ROOT/pages/faq.adoc
+++ b/modules/ROOT/pages/faq.adoc
@@ -88,6 +88,13 @@ NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
Caused by: ClassNotFoundException: org.apache.spark.sql.sources.v2.ReadSupport
----

+Or the following:
+
+----
+java.lang.NoClassDefFoundError: scala/collection/IterableOnce
+Caused by: java.lang.ClassNotFoundException: scala.collection.IterableOnce
+----
+
This means that your Spark version doesn't match the Spark version on the connector.
Refer to xref:overview.adoc#_spark_and_scala_compatibility[this page] to know which version you need.

From 85ae314073f0c368bfd108dc39f47da28762393a Mon Sep 17 00:00:00 2001
From: Nicola Vitucci
Date: Tue, 16 Jan 2024 15:21:50 +0000
Subject: [PATCH 04/13] Add reference to Save mode section

---
modules/ROOT/pages/writing.adoc | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/modules/ROOT/pages/writing.adoc b/modules/ROOT/pages/writing.adoc
index 60cb6a2..f7c9414 100644
--- a/modules/ROOT/pages/writing.adoc
+++ b/modules/ROOT/pages/writing.adoc
@@ -233,10 +233,7 @@ Thus `events` is the batch created from your dataset.
==== Considerations

-* You must specify the write mode (from `SaveMode`):
-** `Append`: uses CREATE (Spark 3.x)
-** `Overwrite`: uses MERGE
-** `ErrorIfExists`: uses CREATE (Spark 2.x)
+* You must always specify the <>.

* You can use the `events` list in `WITH` statements as well.
For example, you can replace the query in the previous example with the following:

From 61500195781bfda8c65adf4466ec2477ca23daed Mon Sep 17 00:00:00 2001
From: Nicola Vitucci
Date: Tue, 16 Jan 2024 16:13:24 +0000
Subject: [PATCH 05/13] Add details on indexes and constraints

---
modules/ROOT/pages/writing.adoc | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/modules/ROOT/pages/writing.adoc b/modules/ROOT/pages/writing.adoc
index f7c9414..bcb53d6 100644
--- a/modules/ROOT/pages/writing.adoc
+++ b/modules/ROOT/pages/writing.adoc
@@ -703,7 +703,11 @@ Before the import starts, the following schema query is being created:
CREATE INDEX ON :Person(surname)
----
-*Take into consideration that the first label is used for the index creation.*
+The name of the created index is `spark_INDEX_