Are you a data science practitioner who primarily uses Python and/or R? Have you found yourself in situations where your data grew too big and your code failed with an out-of-memory error, or your data processing pipeline brought your machine to its limit? Have you attempted to scale up, only to eventually hit the same problems or run into new ones? If so, this workshop might be for you. We'll talk about scaling your analytics, and specifically about how to leverage Apache Spark to scale out beyond a single machine. We'll start with an overview of scaling options and the fundamentals of Apache Spark. After that, we'll explore a simple data processing pipeline in Spark and see how it compares to equivalent implementations in Python and R.
The workshop will focus on Spark's DataFrame API and primarily provide examples using Python/pyspark, but the concepts and considerations conveyed apply equally to R/SparkR/sparklyr. The workshop will not go into the specifics of Spark Structured Streaming, Spark's machine learning library (MLlib), or graph processing (GraphX).
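To give a flavour of what working with the DataFrame API looks like, here is a minimal, illustrative sketch (not taken from the workshop materials). It assumes a local Spark installation and a hypothetical CSV file `events.csv` with columns `user_id` and `amount`:

```python
# Minimal sketch, assuming a local Spark installation and a hypothetical
# CSV file "events.csv" with columns "user_id" and "amount".
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("workshop-teaser").getOrCreate()

# Read the CSV into a distributed DataFrame, inferring column types.
df = spark.read.csv("events.csv", header=True, inferSchema=True)

# A simple aggregation: total amount per user, largest first.
(df.groupBy("user_id")
   .agg(F.sum("amount").alias("total_amount"))
   .orderBy(F.desc("total_amount"))
   .show(10))

spark.stop()
```

The workshop will walk through pipelines of this kind and discuss when scaling out with Spark is worth the added complexity compared with staying in plain Python or R.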
Presentation: ~ 2.5h
ImpactHub Viadukt - Viaduktstrasse 93, 8005 Zürich
- 4:45 pm - Doors open
- 5:15 pm - Welcome / Start of workshop
- 7:45 pm - End of workshop / closing remarks
- 7:45 - 9:00 pm - Apéro at the bar
Basic knowledge of Python and/or R is highly recommended. No prior knowledge of Apache Spark or its language APIs is needed.
Workshop participants are not required to bring their laptops.