
Training material for the course "Introduction to Spark APIs for Data Processing"


Course website with videos and slides: https://sparktraining.web.cern.ch/

Contents

See also the notebooks on display in the CERN SWAN Gallery

Contact: Luca.Canali@cern.ch


Notebooks

Session 1

Tutorial-DataFrame.ipynb
Solutions-DataFrame.ipynb
Examples-Pandas on Spark
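
Session 1 introduces the DataFrame API (and the pandas-on-Spark API). As a quick taste of the style of the tutorial, here is a minimal sketch with made-up data, not taken from the notebooks themselves:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session; on SWAN the session is provided by the service
spark = SparkSession.builder.master("local[*]").appName("dataframe-demo").getOrCreate()

# A small DataFrame with made-up rows, then a filter and an aggregation
df = spark.createDataFrame(
    [(1, "a", 10.0), (2, "b", 20.0), (3, "a", 30.0)],
    ["id", "key", "value"])
df.filter(F.col("value") > 10).groupBy("key").agg(F.avg("value").alias("avg_value")).show()
```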

Session 2

Tutorial-SparkSQL.ipynb
HandsOn-SparkSQL_exercises.ipynb
HandsOn-SparkSQL_with_solutions.ipynb
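
Session 2 is about Spark SQL. The general pattern, sketched here with a made-up view name, is to register a DataFrame as a temporary view and query it with SQL:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("sql-demo").getOrCreate()

# Register a DataFrame as a temporary view, then query it with SQL
df = spark.createDataFrame([(1, "apple"), (2, "banana")], ["id", "fruit"])
df.createOrReplaceTempView("fruits")  # "fruits" is a made-up view name
spark.sql("SELECT fruit, count(*) AS n FROM fruits GROUP BY fruit").show()
```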

Session 3

Tutorial-SparkStreaming.ipynb
ML_Demo1_Classifier.ipynb
ML_Demo2_Regression.ipynb
Spark_JDBC_Oracle.ipynb
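
Session 3 touches Structured Streaming, Spark ML, and JDBC data sources. As one illustrative sketch, here is a Structured Streaming query against the built-in rate source, which needs no external systems:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("streaming-demo").getOrCreate()

# The "rate" source generates rows continuously and is handy for demos
stream = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Print the incoming micro-batches to the console for a few seconds
query = stream.writeStream.format("console").outputMode("append").start()
query.awaitTermination(10)  # wait up to ~10 seconds
query.stop()
```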

Session 4

Demo_Spark_on_Hadoop.ipynb
Demo_Dimuon_mass_spectrum.ipynb
NXCals-example.ipynb
NXCals-example_bis.ipynb
TPCDS_PySpark_CERN_SWAN_getstarted.ipynb
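
Session 4 demonstrates Spark running against CERN Hadoop and NXCALS resources, which require CERN credentials and cluster configuration. Outside CERN, the general shape of connecting to a YARN cluster and reading from HDFS looks roughly like this sketch (the path and cluster settings are placeholders):

```python
from pyspark.sql import SparkSession

# Connecting to a YARN-managed Hadoop cluster; requires HADOOP_CONF_DIR to
# point at the cluster configuration (cluster details are placeholders)
spark = SparkSession.builder.master("yarn").appName("hadoop-demo").getOrCreate()

# Read a Parquet dataset from HDFS; the path is hypothetical
df = spark.read.parquet("hdfs:///path/to/dataset.parquet")
df.printSchema()
```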

Additional SWAN gallery notebooks

LHCb_OpenData_Spark.ipynb
Dimuon_Spark_ROOT_RDataFrame.ipynb


How to run the notebooks from CERN SWAN Notebook Service

  • Open SWAN and clone the repo via the SWAN badge link
    • note that cloning can take a couple of minutes
    • alternatively, you can clone the repo from the SWAN GUI https://swan.web.cern.ch
      • find and click the button "Download project from git"
      • when prompted, clone the repo https://github.com/cerndb/SparkTraining.git
  • Open the tutorial notebooks at SparkTraining -> notebooks

How to run the notebooks from private Jupyter installations or other notebook services (Colab, Binder, etc.)

  • pip install pyspark
  • git clone https://github.com/cerndb/SparkTraining
    • or clone the mirror at https://gitlab.cern.ch/db/SparkTraining
  • Start Jupyter: jupyter-notebook
  • Run the notebooks on Colab:
    • with this option you will also need to download the data folder and pip install pyspark
  • Run on Binder via the Binder badge link
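
After installing PySpark, a quick sanity check is to start a local session and run a trivial query (a sketch; the printed version depends on what pip installed):

```python
from pyspark.sql import SparkSession

# Local session using all available cores
spark = SparkSession.builder.master("local[*]").appName("setup-check").getOrCreate()
print(spark.version)   # the installed Spark version
spark.range(5).show()  # tiny DataFrame: confirms the session works
spark.stop()
```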
