Commit 814385f

fix broken link + minor fixes (#17154)
1 parent 9a3bd1f commit 814385f

File tree

1 file changed: +18 −19 lines


docs/spark/how-to-guides/windows-instructions.md

Lines changed: 18 additions & 19 deletions
@@ -3,10 +3,9 @@ title: Build a .NET for Apache Spark application on Windows
 description: Learn how to build your .NET for Apache Spark application on Windows.
 ms.date: 01/29/2020
 ms.topic: conceptual
-ms.custom: mvc,how-to
+ms.custom: how-to
 ---

-
 # Learn how to build your .NET for Apache Spark application on Windows

 This article teaches you how to build your .NET for Apache Spark applications on Windows.
@@ -23,22 +22,22 @@ If you already have all of the following prerequisites, skip to the [build](#bui
 * .NET Core cross-platform development
 * All Required Components
 3. Install **[Java 1.8](https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)**.
-   - Select the appropriate version for your operating system e.g., jdk-8u201-windows-x64.exe for Win x64 machine.
-   - Install using the installer and verify you are able to run `java` from your command-line.
+   - Select the appropriate version for your operating system. For example, *jdk-8u201-windows-x64.exe* for a Windows x64 machine.
+   - Install using the installer and verify you are able to run `java` from your command line.
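To confirm the Java setup from a fresh command prompt, a minimal check is the following (the exact version string printed depends on the update you installed):

```powershell
java -version
```

If the JDK is installed correctly and on PATH, this should report a 1.8.x version rather than a "not recognized" error.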
 4. Install **[Apache Maven 3.6.0+](https://maven.apache.org/download.cgi)**.
-   - Download [Apache Maven 3.6.0](http://mirror.metrocast.net/apache/maven/maven-3/3.6.0/binaries/apache-maven-3.6.0-bin.zip).
-   - Extract to a local directory e.g., `C:\bin\apache-maven-3.6.0\`.
-   - Add Apache Maven to your [PATH environment variable](https://www.java.com/en/download/help/path.xml) e.g., `C:\bin\apache-maven-3.6.0\bin`.
+   - Download [Apache Maven 3.6.3](http://mirror.metrocast.net/apache/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.zip).
+   - Extract to a local directory. For example, *C:\bin\apache-maven-3.6.3\*.
+   - Add Apache Maven to your [PATH environment variable](https://www.java.com/en/download/help/path.xml). For example, *C:\bin\apache-maven-3.6.3\bin*.
    - Verify you are able to run `mvn` from your command line.
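Once Maven is on PATH, the same kind of check works here; `mvn -version` also reports which Java installation Maven found, which is a useful cross-check of the previous step:

```powershell
mvn -version
```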
 5. Install **[Apache Spark 2.3+](https://spark.apache.org/downloads.html)**.
-   - Download [Apache Spark 2.3+](https://spark.apache.org/downloads.html) and extract it into a local folder (e.g., `C:\bin\spark-2.3.2-bin-hadoop2.7\`) using [7-zip](https://www.7-zip.org/). (The supported spark versions are 2.3.*, 2.4.0, 2.4.1, 2.4.3 and 2.4.4)
-   - Add a [new environment variable](https://www.java.com/en/download/help/path.xml) `SPARK_HOME` e.g., `C:\bin\spark-2.3.2-bin-hadoop2.7\`.
+   - Download [Apache Spark 2.3+](https://spark.apache.org/downloads.html) and extract it into a local folder (for example, *C:\bin\spark-2.3.2-bin-hadoop2.7\*) using [7-zip](https://www.7-zip.org/). (The supported Spark versions are 2.3.*, 2.4.0, 2.4.1, 2.4.3, and 2.4.4.)
+   - Add a [new environment variable](https://www.java.com/en/download/help/path.xml) `SPARK_HOME`. For example, *C:\bin\spark-2.3.2-bin-hadoop2.7\*.

      ```powershell
      set SPARK_HOME=C:\bin\spark-2.3.2-bin-hadoop2.7\
      ```

-   - Add Apache Spark to your [PATH environment variable](https://www.java.com/en/download/help/path.xml) e.g., `C:\bin\spark-2.3.2-bin-hadoop2.7\bin`.
+   - Add Apache Spark to your [PATH environment variable](https://www.java.com/en/download/help/path.xml). For example, *C:\bin\spark-2.3.2-bin-hadoop2.7\bin*.

      ```powershell
      set PATH=%SPARK_HOME%\bin;%PATH%
      ```
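After setting the variables above, a quick sanity check from the same command prompt is the following sketch (it assumes the download and extraction steps succeeded):

```powershell
echo %SPARK_HOME%
spark-shell --version
```

`spark-shell --version` prints the Spark version banner and exits without starting an interactive session.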
@@ -66,36 +65,36 @@ If you already have all of the following prerequisites, skip to the [build](#bui
 </details>

 6. Install **[WinUtils](https://github.com/steveloughran/winutils)**.
-   - Download `winutils.exe` binary from [WinUtils repository](https://github.com/steveloughran/winutils). You should select the version of Hadoop the Spark distribution was compiled with, e.g. use hadoop-2.7.1 for Spark 2.3.2.
-   - Save `winutils.exe` binary to a directory of your choice e.g., `C:\hadoop\bin`.
+   - Download the `winutils.exe` binary from the [WinUtils repository](https://github.com/steveloughran/winutils). Select the version of Hadoop the Spark distribution was compiled with. For example, use hadoop-2.7.1 for Spark 2.3.2.
+   - Save the `winutils.exe` binary to a directory of your choice. For example, *C:\hadoop\bin*.
    - Set `HADOOP_HOME` to reflect the directory with winutils.exe (without bin). For instance, using the command line:

      ```powershell
      set HADOOP_HOME=C:\hadoop
      ```

-   - Set PATH environment variable to include `%HADOOP_HOME%\bin`. For instance, using command-line:
+   - Set the PATH environment variable to include `%HADOOP_HOME%\bin`. For instance, using the command line:

      ```powershell
      set PATH=%HADOOP_HOME%\bin;%PATH%
      ```
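Keep in mind that `set` only affects the current command-prompt session. To persist `HADOOP_HOME` across sessions, one option is `setx`, sketched below (`setx` writes to the user environment and only takes effect in newly opened windows):

```powershell
setx HADOOP_HOME "C:\hadoop"
```

Avoid persisting PATH with `setx`, since `setx` truncates values longer than 1024 characters; extend PATH through the System Properties dialog instead.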
-Make sure you are able to run `dotnet`, `java`, `mvn`, `spark-shell` from your command-line before you move to the next section. Feel there is a better way? Please [open an issue](https://github.com/dotnet/spark/issues) and feel free to contribute.
+Make sure you are able to run `dotnet`, `java`, `mvn`, and `spark-shell` from your command line before you move to the next section. Feel there is a better way? [Open an issue](https://github.com/dotnet/spark/issues) and feel free to contribute.

 > [!NOTE]
-> A new instance of the command-line may be required if any environment variables were updated.
+> A new instance of the command line may be required if any environment variables were updated.
## Build
-For the remainder of this guide, you will need to have cloned the .NET for Apache Spark repository into your machine. You can choose any location for the cloned repository, e.g., `C:\github\dotnet-spark\`.
+For the remainder of this guide, you will need to have cloned the .NET for Apache Spark repository onto your machine. You can choose any location for the cloned repository. For example, *C:\github\dotnet-spark\*.

 ```bash
 git clone https://github.com/dotnet/spark.git C:\github\dotnet-spark
 ```

 ### Build .NET for Apache Spark Scala extensions layer

-When you submit a .NET application, .NET for Apache Spark has the necessary logic written in Scala that informs Apache Spark how to handle your requests (e.g., request to create a new Spark Session, request to transfer data from .NET side to JVM side etc.). This logic can be found in the [.NET for Spark Scala Source Code](https://github.com/dotnet/spark/tree/master/src/scala).
+When you submit a .NET application, .NET for Apache Spark has the necessary logic written in Scala that informs Apache Spark how to handle your requests (for example, a request to create a new Spark session, or a request to transfer data from the .NET side to the JVM side). This logic can be found in the [.NET for Spark Scala source code](https://github.com/dotnet/spark/tree/master/src/scala).

 Regardless of whether you are using .NET Framework or .NET Core, you will need to build the .NET for Apache Spark Scala extension layer:
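The extension layer build is Maven-based. As a sketch only (assuming the clone location used above; the guide's following steps give the exact commands), it looks like:

```powershell
cd C:\github\dotnet-spark\src\scala
mvn clean package
```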

@@ -208,13 +207,13 @@ This section explains how to build the [sample applications](https://github.com/
 Once you build the samples, running them will be through `spark-submit` regardless of whether you are targeting .NET Framework or .NET Core. Make sure you have followed the [prerequisites](#prerequisites) section and installed Apache Spark.

-1. Set the `DOTNET_WORKER_DIR` or `PATH` environment variable to include the path where the `Microsoft.Spark.Worker` binary has been generated (e.g., `C:\github\dotnet\spark\artifacts\bin\Microsoft.Spark.Worker\Debug\net461` for .NET Framework, `C:\github\dotnet-spark\artifacts\bin\Microsoft.Spark.Worker\Debug\netcoreapp2.1\win10-x64\publish` for .NET Core):
+1. Set the `DOTNET_WORKER_DIR` or `PATH` environment variable to include the path where the `Microsoft.Spark.Worker` binary has been generated (for example, *C:\github\dotnet-spark\artifacts\bin\Microsoft.Spark.Worker\Debug\net461* for .NET Framework, *C:\github\dotnet-spark\artifacts\bin\Microsoft.Spark.Worker\Debug\netcoreapp2.1\win10-x64\publish* for .NET Core):

    ```powershell
    set DOTNET_WORKER_DIR=C:\github\dotnet-spark\artifacts\bin\Microsoft.Spark.Worker\Debug\netcoreapp2.1\win10-x64\publish
    ```

-2. Open Powershell and go to the directory where your app binary has been generated (e.g., `C:\github\dotnet\spark\artifacts\bin\Microsoft.Spark.CSharp.Examples\Debug\net461` for .NET Framework, `C:\github\dotnet-spark\artifacts\bin\Microsoft.Spark.CSharp.Examples\Debug\netcoreapp2.1\win10-x64\publish` for .NET Core):
+2. Open PowerShell and go to the directory where your app binary has been generated (for example, *C:\github\dotnet-spark\artifacts\bin\Microsoft.Spark.CSharp.Examples\Debug\net461* for .NET Framework, *C:\github\dotnet-spark\artifacts\bin\Microsoft.Spark.CSharp.Examples\Debug\netcoreapp2.1\win10-x64\publish* for .NET Core):

    ```powershell
    cd C:\github\dotnet-spark\artifacts\bin\Microsoft.Spark.CSharp.Examples\Debug\netcoreapp2.1\win10-x64\publish
    ```
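From that directory, a sample is then launched through `spark-submit`. The invocation below is a hypothetical sketch: the jar file name, sample class, and arguments depend on your build output and the sample you choose to run:

```powershell
# Hypothetical example: substitute the actual microsoft-spark jar produced by your build
spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local microsoft-spark-2.4.x-<version>.jar Microsoft.Spark.CSharp.Examples.exe Sql.Basic %SPARK_HOME%\examples\src\main\resources\people.json
```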
