Course website with videos and slides: https://sparktraining.web.cern.ch/
See also the notebooks on display in the CERN SWAN Gallery
Contact: Luca.Canali@cern.ch
Tutorial-DataFrame.ipynb
Solutions-DataFrame.ipynb
Examples-Pandas on Spark
Tutorial-SparkSQL.ipynb
HandsOn-SparkSQL_exercises.ipynb
HandsOn-SparkSQL_with_solutions.ipynb
Tutorial-SparkStreaming.ipynb
ML_Demo1_Classifier.ipynb
ML_Demo2_Regression.ipynb
Spark_JDBC_Oracle.ipynb
Demo_Spark_on_Hadoop.ipynb
Demo_Dimuon_mass_spectrum.ipynb
NXCals-example.ipynb
NXCals-example_bis.ipynb
TPCDS_PySpark_CERN_SWAN_getstarted.ipynb
LHCb_OpenData_Spark.ipynb
Dimuon_Spark_ROOT_RDataFrame.ipynb
- Open SWAN and clone the repo https://github.com/cerndb/SparkTraining.git
  - note: this can take a couple of minutes
- As an alternative, clone the repo from the SWAN GUI https://swan.web.cern.ch
  - find and click the button "Download project from git"
  - when prompted, enter the repo URL https://github.com/cerndb/SparkTraining.git
- Open the tutorial notebooks at SparkTraining -> notebooks
How to run the notebooks from a private Jupyter installation or other notebook services (Colab, Binder, etc.)
- Install PySpark and clone the repo:
  pip install pyspark
  git clone https://github.com/cerndb/SparkTraining
  - alternatively, clone from the GitLab mirror at https://gitlab.cern.ch/db/SparkTraining
- Start Jupyter:
  jupyter-notebook
- Run the notebooks on Colab:
  - with this option you will also need to download the data folder and pip install pyspark
- Run on Binder: