
Commit

Minor update to the notebook examples
LucaCanali committed Mar 25, 2024
1 parent ca25de7 commit d4193bf
Showing 4 changed files with 79 additions and 72 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -40,6 +40,7 @@ and spark-shell/pyspark environments.
- [TPCDS PySpark](https://github.com/LucaCanali/Miscellaneous/tree/master/Performance_Testing/TPCDS_PySpark) - A tool you can use to run TPCDS with PySpark, instrumented with sparkMeasure
- [Spark monitoring dashboard](https://github.com/cerndb/spark-dashboard) - A custom monitoring pipeline and dashboard for Spark
- [Introductory course on Apache Spark](https://sparktraining.web.cern.ch/)
- [Flamegraphs for profiling Spark jobs](https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Tools_Spark_Pyroscope_FlameGraph.md)
- [Notes on Apache Spark](https://github.com/LucaCanali/Miscellaneous/tree/master/Spark_Notes)

Main author and contact: Luca.Canali@cern.ch
@@ -86,9 +87,9 @@ Examples:

- [<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Jupyter_logo.svg/250px-Jupyter_logo.svg.png" height="50"> Local Python/Jupyter Notebook](examples/SparkMeasure_Jupyter_Python_getting_started.ipynb)

- [<img src="https://upload.wikimedia.org/wikipedia/commons/6/63/Databricks_Logo.png" height="40"> Scala notebook on Databricks](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2061385495597958/2729765977711377/442806354506758/latest.html)
- [<img src="https://upload.wikimedia.org/wikipedia/commons/6/63/Databricks_Logo.png" height="40"> Scala notebook on Databricks](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2061385495597958/2910895789597333/442806354506758/latest.html)

- [<img src="https://upload.wikimedia.org/wikipedia/commons/6/63/Databricks_Logo.png" height="40"> Python notebook on Databricks](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2061385495597958/3856830937265976/442806354506758/latest.html)
- [<img src="https://upload.wikimedia.org/wikipedia/commons/6/63/Databricks_Logo.png" height="40"> Python notebook on Databricks](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2061385495597958/2910895789597316/442806354506758/latest.html)


- Stage-level metrics from the command line:
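The README hunk above points to the example notebooks and to collecting stage-level metrics from the command line. As a minimal, illustrative sketch (not part of this commit), the interactive workflow with the sparkmeasure Python package could look as follows; it assumes `pip install sparkmeasure` and a session started with the spark-measure_2.12:0.24 jar, e.g. `pyspark --packages ch.cern.sparkmeasure:spark-measure_2.12:0.24`, and the SQL statement is just a placeholder workload.

```python
# Illustrative sketch: stage-level metrics from an interactive PySpark session
# (assumes the sparkmeasure Python package and the spark-measure 0.24 jar are available)
from sparkmeasure import StageMetrics

stagemetrics = StageMetrics(spark)

# Measure the code executed between begin() and end()
stagemetrics.begin()
spark.sql("select count(*) from range(1000) cross join range(1000)").show()
stagemetrics.end()

# Print aggregated stage-level metrics, similar to the notebook output in this commit
stagemetrics.print_report()
```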
6 changes: 3 additions & 3 deletions examples/Example_Notebooks_Databricks.md
@@ -2,7 +2,7 @@

This page links to two example notebooks that you can run with the Databricks Community Edition:

- [example Scala notebook on Databricks](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2061385495597958/2729765977711377/442806354506758/latest.html),
- [example Python notebook on Databricks](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2061385495597958/3856830937265976/442806354506758/latest.html)
- [example Scala notebook on Databricks](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2061385495597958/2910895789597333/442806354506758/latest.html)
- [example Python notebook on Databricks](https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/2061385495597958/2910895789597316/442806354506758/latest.html)



8 changes: 4 additions & 4 deletions examples/SparkMeasure_Jupyter_Colab_Example.ipynb
@@ -29,7 +29,7 @@
"![sparkMeasure architecture diagram](https://github.com/LucaCanali/sparkMeasure/raw/master/docs/sparkMeasure_architecture_diagram.png)\n",
"\n",
"Author and contact: Luca.Canali@cern.ch \n",
"Last updated: April 2023"
"Last updated: March 2024"
]
},
{
@@ -60,13 +60,13 @@
"# Start the Spark Session\n",
"# This example uses Spark in local mode for simplicity.\n",
"# You can modify master to use YARN or K8S if available \n",
"# This example uses sparkMeasure 0.23 for scala 2.12, taken from maven central\n",
"# This example uses sparkMeasure 0.24 for scala 2.12, taken from maven central\n",
"\n",
"spark = SparkSession \\\n",
" .builder \\\n",
" .master(\"local[*]\") \\\n",
" .appName(\"Test sparkmeasure instrumentation of Python/PySpark code\") \\\n",
" .config(\"spark.jars.packages\",\"ch.cern.sparkmeasure:spark-measure_2.12:0.23\") \\\n",
" .config(\"spark.jars.packages\",\"ch.cern.sparkmeasure:spark-measure_2.12:0.24\") \\\n",
" .getOrCreate()"
]
},
@@ -542,7 +542,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
"version": "3.11.5"
}
},
"nbformat": 4,
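Both notebook updates in this commit bump the dependency from spark-measure 0.23 to 0.24 and refresh the Spark and Python versions. A hedged sketch of the updated session setup, assembled from the diff above (the local master and the app name are illustrative choices, not requirements):

```python
from pyspark.sql import SparkSession

# Pull sparkMeasure 0.24 for Scala 2.12 from Maven Central when the session starts
spark = (SparkSession.builder
         .master("local[*]")  # local mode for simplicity
         .appName("Test sparkMeasure instrumentation of Python/PySpark code")
         .config("spark.jars.packages", "ch.cern.sparkmeasure:spark-measure_2.12:0.24")
         .getOrCreate())
```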
132 changes: 69 additions & 63 deletions examples/SparkMeasure_Jupyter_Python_getting_started.ipynb
@@ -17,7 +17,7 @@
"![sparkMeasure architecture diagram](https://github.com/LucaCanali/sparkMeasure/raw/master/docs/sparkMeasure_architecture_diagram.png)\n",
"\n",
"Author and contact: Luca.Canali@cern.ch \n",
"Last updated: April 2023"
"Last updated: March 2024"
]
},
{
@@ -40,7 +40,7 @@
"# It can be handy if you have to choose among multiple Spark homes\n",
"# !pip install findspark\n",
"# import findspark\n",
"# findspark.init(\"/home/luca/Spark/spark-3.4.0-bin-hadoop3\")"
"# findspark.init(\"/home/luca/Spark/spark-3.5.1-bin-hadoop3\")"
]
},
{
@@ -52,14 +52,14 @@
"# Start the Spark Session\n",
"# This example uses Spark in local mode for simplicity.\n",
"# You can modify master to use YARN or K8S if available \n",
"# This example uses sparkMeasure 0.23 for scala 2.12, taken from maven central\n",
"# This example uses sparkMeasure 0.24 for scala 2.12, taken from maven central\n",
"\n",
"\n",
"from pyspark.sql import SparkSession\n",
"spark = (SparkSession.builder\n",
" .appName(\"Test sparkmeasure instrumentation of Python/PySpark code\")\n",
" .master(\"local[*]\")\n",
" .config(\"spark.jars.packages\",\"ch.cern.sparkmeasure:spark-measure_2.12:0.23\")\n",
" .config(\"spark.jars.packages\",\"ch.cern.sparkmeasure:spark-measure_2.12:0.24\")\n",
" .getOrCreate()\n",
" ) \n"
]
@@ -100,17 +100,17 @@
"Aggregated Spark stage metrics:\n",
"numStages => 3\n",
"numTasks => 17\n",
"elapsedTime => 1372 (1 s)\n",
"stageDuration => 1047 (1 s)\n",
"executorRunTime => 2753 (3 s)\n",
"executorCpuTime => 2311 (2 s)\n",
"executorDeserializeTime => 3440 (3 s)\n",
"executorDeserializeCpuTime => 1321 (1 s)\n",
"resultSerializationTime => 4 (4 ms)\n",
"jvmGCTime => 192 (0.2 s)\n",
"elapsedTime => 1151 (1 s)\n",
"stageDuration => 936 (0.9 s)\n",
"executorRunTime => 3255 (3 s)\n",
"executorCpuTime => 2116 (2 s)\n",
"executorDeserializeTime => 909 (0.9 s)\n",
"executorDeserializeCpuTime => 228 (0.2 s)\n",
"resultSerializationTime => 36 (36 ms)\n",
"jvmGCTime => 0 (0 ms)\n",
"shuffleFetchWaitTime => 0 (0 ms)\n",
"shuffleWriteTime => 29 (29 ms)\n",
"resultSize => 16134 (15.8 KB)\n",
"shuffleWriteTime => 11 (11 ms)\n",
"resultSize => 16295 (15.9 KB)\n",
"diskBytesSpilled => 0 (0 Bytes)\n",
"memoryBytesSpilled => 0 (0 Bytes)\n",
"peakExecutionMemory => 0\n",
@@ -129,10 +129,12 @@
"shuffleBytesWritten => 472 (472 Bytes)\n",
"shuffleRecordsWritten => 8\n",
"\n",
"Average number of active tasks => 2.8\n",
"\n",
"Stages and their duration:\n",
"Stage 0 duration => 632 (0.6 s)\n",
"Stage 1 duration => 362 (0.4 s)\n",
"Stage 3 duration => 53 (53 ms)\n"
"Stage 0 duration => 394 (0.4 s)\n",
"Stage 1 duration => 458 (0.5 s)\n",
"Stage 3 duration => 84 (84 ms)\n"
]
}
],
@@ -157,11 +159,11 @@
"\n",
"Additional stage-level executor metrics (memory usage info):\n",
"\n",
"Stage 0 JVMHeapMemory maxVal bytes => 105968664 (101.1 MB)\n",
"Stage 0 JVMHeapMemory maxVal bytes => 337096704 (321.5 MB)\n",
"Stage 0 OnHeapExecutionMemory maxVal bytes => 0 (0 Bytes)\n",
"Stage 1 JVMHeapMemory maxVal bytes => 105968664 (101.1 MB)\n",
"Stage 1 JVMHeapMemory maxVal bytes => 337096704 (321.5 MB)\n",
"Stage 1 OnHeapExecutionMemory maxVal bytes => 0 (0 Bytes)\n",
"Stage 3 JVMHeapMemory maxVal bytes => 105968664 (101.1 MB)\n",
"Stage 3 JVMHeapMemory maxVal bytes => 337096704 (321.5 MB)\n",
"Stage 3 OnHeapExecutionMemory maxVal bytes => 0 (0 Bytes)\n"
]
}
@@ -197,17 +199,17 @@
"Aggregated Spark stage metrics:\n",
"numStages => 3\n",
"numTasks => 17\n",
"elapsedTime => 427 (0.4 s)\n",
"stageDuration => 350 (0.4 s)\n",
"executorRunTime => 2151 (2 s)\n",
"executorCpuTime => 1986 (2 s)\n",
"executorDeserializeTime => 55 (55 ms)\n",
"executorDeserializeCpuTime => 36 (36 ms)\n",
"resultSerializationTime => 0 (0 ms)\n",
"elapsedTime => 537 (0.5 s)\n",
"stageDuration => 448 (0.4 s)\n",
"executorRunTime => 2331 (2 s)\n",
"executorCpuTime => 1742 (2 s)\n",
"executorDeserializeTime => 125 (0.1 s)\n",
"executorDeserializeCpuTime => 64 (64 ms)\n",
"resultSerializationTime => 8 (8 ms)\n",
"jvmGCTime => 0 (0 ms)\n",
"shuffleFetchWaitTime => 0 (0 ms)\n",
"shuffleWriteTime => 19 (19 ms)\n",
"resultSize => 16048 (15.7 KB)\n",
"shuffleWriteTime => 10 (10 ms)\n",
"resultSize => 16080 (15.7 KB)\n",
"diskBytesSpilled => 0 (0 Bytes)\n",
"memoryBytesSpilled => 0 (0 Bytes)\n",
"peakExecutionMemory => 0\n",
@@ -226,10 +228,12 @@
"shuffleBytesWritten => 472 (472 Bytes)\n",
"shuffleRecordsWritten => 8\n",
"\n",
"Average number of active tasks => 4.3\n",
"\n",
"Stages and their duration:\n",
"Stage 4 duration => 28 (28 ms)\n",
"Stage 5 duration => 303 (0.3 s)\n",
"Stage 7 duration => 19 (19 ms)\n"
"Stage 4 duration => 39 (39 ms)\n",
"Stage 5 duration => 389 (0.4 s)\n",
"Stage 7 duration => 20 (20 ms)\n"
]
}
],
@@ -291,17 +295,17 @@
"Aggregated Spark stage metrics:\n",
"numStages => 3\n",
"numTasks => 17\n",
"elapsedTime => 473 (0.5 s)\n",
"stageDuration => 388 (0.4 s)\n",
"executorRunTime => 2365 (2 s)\n",
"executorCpuTime => 1860 (2 s)\n",
"executorDeserializeTime => 80 (80 ms)\n",
"executorDeserializeCpuTime => 39 (39 ms)\n",
"resultSerializationTime => 0 (0 ms)\n",
"jvmGCTime => 0 (0 ms)\n",
"elapsedTime => 478 (0.5 s)\n",
"stageDuration => 398 (0.4 s)\n",
"executorRunTime => 2259 (2 s)\n",
"executorCpuTime => 1752 (2 s)\n",
"executorDeserializeTime => 78 (78 ms)\n",
"executorDeserializeCpuTime => 48 (48 ms)\n",
"resultSerializationTime => 31 (31 ms)\n",
"jvmGCTime => 105 (0.1 s)\n",
"shuffleFetchWaitTime => 0 (0 ms)\n",
"shuffleWriteTime => 8 (8 ms)\n",
"resultSize => 16048 (15.7 KB)\n",
"shuffleWriteTime => 48 (48 ms)\n",
"resultSize => 16467 (16.1 KB)\n",
"diskBytesSpilled => 0 (0 Bytes)\n",
"memoryBytesSpilled => 0 (0 Bytes)\n",
"peakExecutionMemory => 0\n",
@@ -320,10 +324,12 @@
"shuffleBytesWritten => 472 (472 Bytes)\n",
"shuffleRecordsWritten => 8\n",
"\n",
"Average number of active tasks => 4.7\n",
"\n",
"Stages and their duration:\n",
"Stage 8 duration => 25 (25 ms)\n",
"Stage 9 duration => 350 (0.4 s)\n",
"Stage 11 duration => 13 (13 ms)\n"
"Stage 8 duration => 38 (38 ms)\n",
"Stage 9 duration => 339 (0.3 s)\n",
"Stage 11 duration => 21 (21 ms)\n"
]
}
],
@@ -366,18 +372,18 @@
"numTasks => 17\n",
"successful tasks => 17\n",
"speculative tasks => 0\n",
"taskDuration => 2415 (2 s)\n",
"schedulerDelayTime => 97 (97 ms)\n",
"executorRunTime => 2269 (2 s)\n",
"executorCpuTime => 1906 (2 s)\n",
"executorDeserializeTime => 49 (49 ms)\n",
"executorDeserializeCpuTime => 26 (26 ms)\n",
"resultSerializationTime => 0 (0 ms)\n",
"taskDuration => 2268 (2 s)\n",
"schedulerDelayTime => 112 (0.1 s)\n",
"executorRunTime => 2084 (2 s)\n",
"executorCpuTime => 1721 (2 s)\n",
"executorDeserializeTime => 70 (70 ms)\n",
"executorDeserializeCpuTime => 34 (34 ms)\n",
"resultSerializationTime => 2 (2 ms)\n",
"jvmGCTime => 0 (0 ms)\n",
"shuffleFetchWaitTime => 0 (0 ms)\n",
"shuffleWriteTime => 0 (0 ms)\n",
"gettingResultTime => 0 (0 ms)\n",
"resultSize => 2667 (2.6 KB)\n",
"resultSize => 4006 (3.9 KB)\n",
"diskBytesSpilled => 0 (0 Bytes)\n",
"memoryBytesSpilled => 0 (0 Bytes)\n",
"peakExecutionMemory => 0\n",
@@ -432,18 +438,18 @@
"numTasks => 17\n",
"successful tasks => 17\n",
"speculative tasks => 0\n",
"taskDuration => 2368 (2 s)\n",
"schedulerDelayTime => 100 (0.1 s)\n",
"executorRunTime => 2203 (2 s)\n",
"executorCpuTime => 1939 (2 s)\n",
"executorDeserializeTime => 65 (65 ms)\n",
"executorDeserializeCpuTime => 30 (30 ms)\n",
"resultSerializationTime => 0 (0 ms)\n",
"taskDuration => 2320 (2 s)\n",
"schedulerDelayTime => 108 (0.1 s)\n",
"executorRunTime => 2128 (2 s)\n",
"executorCpuTime => 1738 (2 s)\n",
"executorDeserializeTime => 76 (76 ms)\n",
"executorDeserializeCpuTime => 38 (38 ms)\n",
"resultSerializationTime => 8 (8 ms)\n",
"jvmGCTime => 0 (0 ms)\n",
"shuffleFetchWaitTime => 0 (0 ms)\n",
"shuffleWriteTime => 7 (7 ms)\n",
"shuffleWriteTime => 1 (1 ms)\n",
"gettingResultTime => 0 (0 ms)\n",
"resultSize => 2667 (2.6 KB)\n",
"resultSize => 4006 (3.9 KB)\n",
"diskBytesSpilled => 0 (0 Bytes)\n",
"memoryBytesSpilled => 0 (0 Bytes)\n",
"peakExecutionMemory => 0\n",
@@ -503,7 +509,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
"version": "3.11.5"
}
},
"nbformat": 4,
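The refreshed getting-started notebook output now also reports the average number of active tasks, alongside the stage- and task-level metrics shown above. A rough sketch of how such reports are typically produced with the sparkmeasure package (the `runandmeasure` wrapper, the memory report, and the placeholder SQL workload are illustrative; metric values will of course differ from run to run):

```python
from sparkmeasure import StageMetrics, TaskMetrics

# Stage-level metrics, with the workload passed as a code string;
# runandmeasure() runs the code and prints the aggregated report
stagemetrics = StageMetrics(spark)
stagemetrics.runandmeasure(globals(),
    'spark.sql("select count(*) from range(1000) cross join range(1000)").show()')

# Additional per-stage executor memory metrics (JVMHeapMemory, OnHeapExecutionMemory, ...)
stagemetrics.print_memory_report()

# Task-level metrics: finer granularity than stage-level, at a higher collection overhead
taskmetrics = TaskMetrics(spark)
taskmetrics.begin()
spark.sql("select count(*) from range(1000) cross join range(1000)").show()
taskmetrics.end()
taskmetrics.print_report()
```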
