From b5dddda0d3dbeba0f12b2baee8f39c9f4fe73f82 Mon Sep 17 00:00:00 2001
From: Wil Roberts <47739563+robertswh@users.noreply.github.com>
Date: Wed, 13 Dec 2023 10:51:44 +0000
Subject: [PATCH] new local session section in Spark session guide (#106)

* new local session section, first draft

* taking on review comments

* fixed link
---
 .../spark-overview/example-spark-sessions.md | 52 ++++++++++++++++---
 1 file changed, 46 insertions(+), 6 deletions(-)

diff --git a/ons-spark/spark-overview/example-spark-sessions.md b/ons-spark/spark-overview/example-spark-sessions.md
index d78c4c7e..58cc9dda 100644
--- a/ons-spark/spark-overview/example-spark-sessions.md
+++ b/ons-spark/spark-overview/example-spark-sessions.md
@@ -1,20 +1,59 @@
 ## Example Spark Sessions
 
-This document gives some example Spark sessions. For more information on Spark sessions and why you need to be careful with memory usage, please consult the [Guidance on Spark Sessions](../spark-overview/spark-session-guidance) and [Configuration Hierarchy and `spark-defaults.conf`](../spark-overview/spark-defaults).
+This article gives some example Spark sessions, also referred to as Spark applications. For more information on Spark sessions and why you need to be careful with memory usage, please consult the [Guidance on Spark Sessions](../spark-overview/spark-session-guidance) and [Configuration Hierarchy and `spark-defaults.conf`](../spark-overview/spark-defaults).
 
-Remember to only use a Spark session for as long as you need. It's good etiquette to use [`spark.stop()`](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.stop.html) (for PySpark) or [`spark_disconnect(sc)`](https://spark.rstudio.com/packages/sparklyr/latest/reference/spark-connections.html) (for sparklyr) in your scripts. Stopping the CDSW or Jupyter Notebook session will also close the Spark session if one is running.
+Remember to only use a Spark session for as long as you need. It's good etiquette to use [`spark.stop()`](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.stop.html) (for PySpark) or [`spark_disconnect(sc)`](https://spark.rstudio.com/packages/sparklyr/latest/reference/spark-connections.html) (for sparklyr) in your scripts. Stopping the container or Jupyter Notebook session will also close the Spark session if one is running.
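+
+For example, at the end of a script (a minimal sketch, assuming a PySpark session named `spark` or a sparklyr connection named `sc`, as created in the examples below):
+
+````{tabs}
+```{code-tab} py
+# stop the Spark session once the work is finished
+spark.stop()
+```
+
+```{code-tab} r R
+# disconnect from Spark once the work is finished
+sparklyr::spark_disconnect(sc)
+```
+````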
 
-### Default/Blank Session
+### Local mode
 
-As a starting point you can create a Spark session with all the default options. This is the bare minimum you need to create a Spark session and will work fine for many DAP users.
+As mentioned at the top of the [Guidance on Spark Sessions](../spark-overview/spark-session-guidance) article, there are two modes of running a Spark application: local mode (this example) and cluster mode (all the other examples below). Local mode can be used when running a Spark application on a single computer or node.
 
-Please use this session by default or if unsure in any way about your resource requirements. 
+Details:
+- Utilises the resources of a single node or machine
+- This example uses 2 cores
+- Amount of memory available depends on the node or machine
+
+Use case:
+- Developing code using dummy or synthetic data, or a small sample of data
+- Writing unit tests
+
+Example of actual usage:
+- Pipeline development using dummy data
+- Throughout this book
+
+````{tabs}
+```{code-tab} py
+from pyspark.sql import SparkSession
 
-Note that for PySpark, `.config("spark.ui.showConsoleProgress", "false")` is still recommended for use with this session; this will stop the console progress in Spark, which sometimes obscures results from displaying properly.
+# local[2] runs Spark locally on 2 cores
+spark = (
+    SparkSession.builder.master("local[2]")
+    .appName("local_session")
+    .getOrCreate()
+)
+```
+
+```{code-tab} r R
+library(sparklyr)
+
+# master = "local[2]" runs Spark locally on 2 cores
+sc <- sparklyr::spark_connect(
+    master = "local[2]",
+    app_name = "local-session",
+    config = sparklyr::spark_config())
+```
+````
+
+Note that all dependencies must also be in place to run a Spark application on your laptop; see the Setting up resources section of the [Getting Started with Spark](../spark-overview/spark-start) article for further information.
+
+### Default Session
+
+As a starting point you can create a Spark session with all the default options. This is the bare minimum you need to create a Spark session and will work fine in most cases.
+
+Please use this session by default or if unsure in any way about your resource requirements.
 
 Details:
 - Will give you the default config options
+- Amount of resource depends on your specific platform
 
 Use case:
 - When unsure of your requirements
@@ -33,6 +72,7 @@ spark = (
     .getOrCreate()
 )
 ```
+
 ```{code-tab} r R
 library(sparklyr)