Spark configuration and tests from parsnip #2

Merged
merged 23 commits into from
Jul 29, 2020
@topepo (Member) commented Jul 8, 2020

This moves the Spark tests out of parsnip.

The tests currently run only on macOS, not on Windows. On Windows, the Spark connection fails with the following error:

> ## -----------------------------------------------------------------------------
> 
> library(sparklyr)

Attaching package: 'sparklyr'

The following object is masked from 'package:purrr':

    invoke

> 
> if (.Platform$OS.type == "windows") {
+   spark_install_winutils("2.4")
+   sparklyr::spark_install(verbose = TRUE, version = "2.4", hadoop_version = "2.7")
+ } else {
+   sparklyr::spark_install(verbose = TRUE, version = "2.4")
+ }
Installing winutils...
trying URL 'https://github.com/steveloughran/winutils/raw/master/hadoop-2.6.0/bin/winutils.exe'
Content type 'application/octet-stream' length 108032 bytes (105 KB)
==================================================
downloaded 105 KB

Installed winutils in C:\Users\runneradmin\AppData\Local\spark\spark-2.4-bin-hadoop2.7\tmp\hadoop\bin\winutils.exe
Installing Spark 2.4.5 for Hadoop 2.7 or later.
Downloading from:
- 'https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz'
Installing to:
- 'C:\Users\runneradmin\AppData\Local/spark/spark-2.4.5-bin-hadoop2.7'
trying URL 'https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz'
Content type 'application/x-gzip' length 232530699 bytes (221.8 MB)
==================================================
downloaded 221.8 MB

Installation complete.
> 
> sc <- try(sparklyr::spark_connect(master = "local"), silent = TRUE)
> 
> if(inherits(sc, "try-error")) {
+   print(sc)
+ }
[1] "Error : \n\nTo run Spark on Windows you need a copy of Hadoop winutils.exe:\n\n1. Download Hadoop winutils.exe from:\n\n   https://github.com/steveloughran/winutils/raw/master/hadoop-2.6.0/bin/\n\n2. Copy winutils.exe to C:\\Users\\runneradmin\\AppData\\Local\\spark\\spark-2.4.5-bin-hadoop2.7\\tmp\\hadoop\\bin\n\nAlternatively, if you are using RStudio you can install the RStudio Preview Release,\nwhich includes an embedded copy of Hadoop winutils.exe:\n\n  https://www.rstudio.com/products/rstudio/download/preview/\n\n\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError: 

To run Spark on Windows you need a copy of Hadoop winutils.exe:

1. Download Hadoop winutils.exe from:

   https://github.com/steveloughran/winutils/raw/master/hadoop-2.6.0/bin/

2. Copy winutils.exe to C:\Users\runneradmin\AppData\Local\spark\spark-2.4.5-bin-hadoop2.7\tmp\hadoop\bin

Alternatively, if you are using RStudio you can install the RStudio Preview Release,
which includes an embedded copy of Hadoop winutils.exe:

  https://www.rstudio.com/products/rstudio/download/preview/

>

I'll continue to work on it. Perhaps @javierluraschi could take another look.
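
One possible reading of the log above: winutils was installed under `spark-2.4-bin-hadoop2.7`, while Spark itself was installed under `spark-2.4.5-bin-hadoop2.7`, which is the directory the error message asks for. The following is a minimal sketch of a workaround, assuming that path mismatch is the cause. `copy_winutils()` is a hypothetical helper, not part of sparklyr, and the use of the `LOCALAPPDATA` environment variable to locate the Spark directory is also an assumption based on the paths printed in the log.

```r
# Hypothetical helper (not a sparklyr function): copy winutils.exe from the
# Spark home where it was installed to the Spark home that is actually used.
# Returns TRUE on success and FALSE (with a warning) if the source is missing.
copy_winutils <- function(installed_home, expected_home) {
  rel_dir  <- file.path("tmp", "hadoop", "bin")
  src      <- file.path(installed_home, rel_dir, "winutils.exe")
  dest_dir <- file.path(expected_home, rel_dir)

  if (!file.exists(src)) {
    warning("winutils.exe not found at: ", src)
    return(FALSE)
  }

  # Create the target directory tree if it does not exist yet.
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  file.copy(src, file.path(dest_dir, "winutils.exe"), overwrite = TRUE)
}

# Only relevant on Windows; the directory names are taken from the log above.
if (.Platform$OS.type == "windows") {
  spark_root <- file.path(Sys.getenv("LOCALAPPDATA"), "spark")
  copy_winutils(
    installed_home = file.path(spark_root, "spark-2.4-bin-hadoop2.7"),
    expected_home  = file.path(spark_root, "spark-2.4.5-bin-hadoop2.7")
  )
}
```

If this reading is right, running the snippet between the install step and `spark_connect()` would put `winutils.exe` where the error message expects it. It has not been verified on the Windows runner.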

topepo added a commit to tidymodels/parsnip that referenced this pull request Jul 29, 2020
@topepo merged commit bc5e0ca into master Jul 29, 2020
@topepo deleted the spark branch July 29, 2020 20:32