With the new exercises, we're not covering some of the more interesting Spark functions.
We'll create a new exercise for "Additional Spark Functions" (in the small-exercises repo) to cover the following:
DataFrame Cleaning
na.drop
na.fill
replace
coalesce
DataFrame Queries
select + array_contains
Aggregations
stddev
variance
mean
String Operations
regexp_replace
regexp_extract
For everything else
UDF
CFRs
All functions should have a solution (in a separate Solutions notebook)
All functions should have a link to the documentation for PySpark
Notes
We might be able to reuse some of the examples from the Wrangling with Spark exercise, but improve on them. Using our domain data would be best, though we might need to dirty up some data and save it in the repo as a CSV (or similar) so the exercise can load it.
Open Questions
Are these valuable?
cache
unpersist
createOrReplaceGlobalTempView
createOrReplaceTempView
Should every function have a test? Perhaps we can add the tests later.