How to build machine image (AMI) to run test pipeline on Windows #7

hcho3 · 2019-11-29T01:03:11Z

The testing pipeline consists of two stages:

1. Build
2. Test

We require separate machine images for Steps 1 and 2.

Build machine: Windows Server 2012 R2, CUDA 9.0, Visual Studio 2017

Go to the EC2 console and launch a new P2.xlarge instance. Choose Windows Server 2012 R2 as the OS. Make sure to allocate 70 GB of disk space, since Visual Studio takes up a lot of disk space. Set a security group policy that allows you to connect via port 3389 (Remote Desktop). Select the xgboost-ci key pair when prompted for a key pair.
Wait a few minutes and then connect to the instance via Microsoft Remote Desktop. Then follow the instructions in Install Chrome in Windows Server using Powershell to install Google Chrome. (Internet Explorer is so locked down that we won't be able to do anything.)
Using Chrome, navigate to this page to download Visual Studio 2017 Community. To support CUDA 9.0, make sure to choose version 15.0, not 15.9. The Community Edition is available free of charge for all open-source developers. You may need to create a new Microsoft account if you don't have one yet. When running the Visual Studio installer, make sure to install the component Desktop Development with C++.
Navigate to the NVIDIA website to download CUDA Toolkit 9.0. On the download page, make sure to select Windows Server 2012 R2 as the OS. You should download the base installer as well as all 4 patches.
Install Notepad++ so that we can edit any text files.
Install Ninja. Place ninja.exe in the C:\Ninja directory and add C:\Ninja to the system PATH environment variable.
Install Miniconda. Choose the Python 3.7 + Windows 64-bit variant. Choose the "Install for All Users" option. In the "Advanced Options" dialog, check "Register Anaconda as the system Python 3.7" and un-check "Add Anaconda to the system PATH variable."

After Miniconda is fully installed, go to the Start menu and run "Anaconda Prompt (Miniconda)." This is a special terminal we use to configure Conda. Run conda init in the terminal. Now we can use Conda environments in an ordinary Command Prompt or PowerShell, just by running conda activate.
8. Open Command Prompt and activate the Conda environment with the command conda activate. Then install the essential Python packages:

conda install numpy scipy matplotlib scikit-learn pandas pytest

Install Git for Windows from the official git website. Leave all installation options as default, except for the one question about the PATH environment variable. Choose the last option "Use Git and optional Unix tools from the Command Prompt".

Obtain the latest CMake from the Kitware website. In the installer wizard dialog, select "Add CMake to the system PATH for all users."
Install Java Runtime Environment (JRE) from Oracle.
Now stop the instance and change the instance type to C5.4xlarge. This is so that we won't need to use an expensive GPU instance type when compiling XGBoost. To check this is okay, turn the instance back on, connect to it, and then run the following commands in the special terminal "x64 Native Tools Command Prompt for VS 2017":

git clone --recursive https://github.com/dmlc/xgboost
cd xgboost
mkdir build
cd build
cmake .. -GNinja -DUSE_CUDA=ON -DCMAKE_BUILD_TYPE=Release
ninja

You will see that it is possible to compile CUDA programs on a machine without an NVIDIA GPU. Running the compiled programs, however, does require a GPU.
12. Install OpenSSH for Windows by following the instructions on this page. Set up public key authentication, with the private key xgboost-ci. (Mess this up, and the Jenkins master won't be able to SSH into the Windows workers.) Make sure to follow the advice in PowerShell/Win32-OpenSSH#1306 (comment). Note: changes in the SSH configuration take effect only after you restart the two Windows services associated with OpenSSH.
13. Set a password you can remember. This is important because the EC2 console cannot retrieve the Windows password from a custom AMI.
14. Stop the instance again and create an AMI from the stopped instance.
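
Before creating the AMI, it can be worth confirming that the command-line tools installed above are actually reachable from PATH. Below is a minimal sketch in Python (available once Miniconda is set up). The names BUILD_TOOLS and find_missing_tools are made up for this illustration and are not part of the XGBoost CI scripts. It also assumes the CUDA and JRE installers added nvcc and java to PATH, which may not hold on every setup.

```python
import shutil

# Executables that the build machine steps are expected to put on PATH.
# Assumption: the CUDA and JRE installers register nvcc and java on PATH.
BUILD_TOOLS = ["git", "cmake", "ninja", "java", "nvcc"]


def find_missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be located on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(BUILD_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All build tools found on PATH.")
```

Run it from a freshly opened Command Prompt, since changes to the system PATH are not picked up by terminals that were already open.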

Testing machine: Windows Server 2008 R2, no GPU

Note that you don't need Visual Studio to run tests.

Go to the EC2 console and launch a new C5.4xlarge instance. Choose Windows Server 2008 R2 as the OS. Allocate about 40 GB of disk space. Set a security group policy that allows you to connect via port 3389 (Remote Desktop). Select the xgboost-ci key pair when prompted for a key pair.
Wait a few minutes and then connect to the instance via Microsoft Remote Desktop. Then follow the instructions in Install Chrome in Windows Server using Powershell to install Google Chrome. (Internet Explorer is so locked down that we won't be able to do anything.)
Install Notepad++ so that we can edit any text files.
Install Miniconda. Refer to the Miniconda step of the first section ("Build machine").
Open Command Prompt and activate the Conda environment with the command conda activate. Then install the essential Python packages:

conda install numpy scipy matplotlib scikit-learn pandas pytest

Install Git for Windows from the official Git website. Refer to the Git step of the first section ("Build machine").
Install Java Runtime Environment (JRE) from Oracle.
Install GraphViz. Then add the directory C:\Program Files (x86)\Graphviz2.38\bin to the system PATH. (Search "Edit the system environment variables" from the Start menu.) To check, open a new Command Prompt and run the dot -V command. Then install the graphviz Python package by running conda activate && pip install graphviz.
Install OpenSSH for Windows. Refer to the OpenSSH step of the first section ("Build machine").
Set a password you can remember. This is important because the EC2 console cannot retrieve Windows password from a custom AMI.
Stop the instance and create an AMI from the stopped instance.
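
The launch step can also be scripted instead of clicking through the EC2 console. Below is a rough sketch using boto3, assuming AWS credentials are already configured. The helper names build_launch_params and launch are made up for this illustration. The AMI ID and security group ID are placeholders, and /dev/sda1 as the root device name is an assumption you should check against the AMI you pick.

```python
def build_launch_params(ami_id, instance_type, disk_gb, security_group_id,
                        key_name="xgboost-ci"):
    """Build keyword arguments for the EC2 RunInstances API call."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        # The security group should allow inbound connections on port 3389.
        "SecurityGroupIds": [security_group_id],
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            {
                # Assumption: the root volume of the Windows AMI is /dev/sda1.
                "DeviceName": "/dev/sda1",
                "Ebs": {"VolumeSize": disk_gb},
            }
        ],
    }


def launch(params):
    """Launch one instance and return its ID (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]


# Testing machine without a GPU: C5.4xlarge with about 40 GB of disk.
# Both IDs below are placeholders, not real resources.
params = build_launch_params("ami-PLACEHOLDER", "c5.4xlarge", 40, "sg-PLACEHOLDER")
```

Calling launch(params) would start the instance. The remaining steps (Remote Desktop, installers) still have to be done by hand.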

Testing machine: Windows Server 2012 R2, CUDA 9.0

Go to the EC2 console and launch a new P2.xlarge instance. Choose Windows Server 2012 R2 as the OS. Allocate about 40 GB of disk space. Set a security group policy that allows you to connect via port 3389 (Remote Desktop). Select the xgboost-ci key pair when prompted for a key pair.
Wait a few minutes and then connect to the instance via Microsoft Remote Desktop. Then follow the instructions in Install Chrome in Windows Server using Powershell to install Google Chrome. (Internet Explorer is so locked down that we won't be able to do anything.)
Navigate to the NVIDIA website to download CUDA Toolkit 9.0. On the download page, make sure to select Windows Server 2012 R2 as the OS. You should download the base installer as well as all 4 patches.
Install Notepad++ so that we can edit any text files.
Install Miniconda. Refer to the Miniconda step of the first section ("Build machine").
Open Command Prompt and activate the Conda environment with the command conda activate. Then install the essential Python packages:

conda install numpy scipy matplotlib scikit-learn pandas pytest

Install Git for Windows from the official Git website. Refer to the Git step of the first section ("Build machine").
Install Java Runtime Environment (JRE) from Oracle.
Install GraphViz. Then add the directory C:\Program Files (x86)\Graphviz2.38\bin to the system PATH. (Search "Edit the system environment variables" from the Start menu.) To check, open a new Command Prompt and run the dot -V command. Then install the graphviz Python package by running conda activate && pip install graphviz.
Install OpenSSH for Windows. Refer to the OpenSSH step of the first section ("Build machine").
Install Microsoft Visual C++ Redistributable for Visual Studio 2015, 2017, and 2019. This ensures that the C++ tests (testxgboost.exe) can run without crashing.
Set a password you can remember. This is important because the EC2 console cannot retrieve Windows password from a custom AMI.
Stop the instance and create an AMI from the stopped instance.
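
To double-check the Python side of a testing machine before creating the AMI, something like the following sketch can be run inside the activated Conda environment. The names REQUIRED_MODULES and find_missing_modules are made up for this illustration. The module list mirrors the conda install and pip install commands above, with scikit-learn appearing under its import name sklearn.

```python
import importlib.util

# Import names for the packages installed via conda and pip above.
# scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "numpy", "scipy", "matplotlib", "sklearn", "pandas", "pytest", "graphviz",
]


def find_missing_modules(modules):
    """Return the modules that cannot be found in the active environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All required Python packages are installed.")
```

This only checks that the packages can be located. It does not replace the dot -V check, since the graphviz Python package also needs the GraphViz binaries on PATH.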

Testing machine: Windows Server 2016, CUDA 10.0

Carry out all steps in the previous section, with the following changes:

Use the P2.xlarge instance type. (G4 is not compatible with CUDA 10.0.)
Install CUDA Toolkit 10.0.

Testing machine: Windows Server 2019, CUDA 10.1

Carry out all steps in the previous section, with the following changes:

Use the G4dn.xlarge instance type.
Install CUDA Toolkit 10.1.
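
For reference, the five machine images described in this issue can be summarized as data. This is only a sketch for keeping track of the combinations. The MachineImage type and the gpu_test_images helper are made up for this illustration and do not exist in the CI code.

```python
from typing import NamedTuple, Optional


class MachineImage(NamedTuple):
    role: str              # "build" or "test"
    windows: str           # Windows Server release
    cuda: Optional[str]    # CUDA Toolkit version, or None if no GPU
    instance_type: str     # EC2 instance type the image is meant to run on


IMAGES = [
    # The build machine is set up on P2.xlarge, then switched to C5.4xlarge.
    MachineImage("build", "Windows Server 2012 R2", "9.0", "c5.4xlarge"),
    MachineImage("test", "Windows Server 2008 R2", None, "c5.4xlarge"),
    MachineImage("test", "Windows Server 2012 R2", "9.0", "p2.xlarge"),
    MachineImage("test", "Windows Server 2016", "10.0", "p2.xlarge"),
    MachineImage("test", "Windows Server 2019", "10.1", "g4dn.xlarge"),
]


def gpu_test_images(images):
    """Return the testing images that have a CUDA Toolkit installed."""
    return [img for img in images if img.role == "test" and img.cuda is not None]


for image in gpu_test_images(IMAGES):
    print(f"{image.windows}: CUDA {image.cuda} on {image.instance_type}")
```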


hcho3 · 2019-11-29T01:03:51Z

@trivialfis It looks like I have to re-build Windows machines from scratch. So I'll go ahead and document all steps as I go.

hcho3 · 2019-11-29T01:43:52Z

Note to myself: CUDA 9.0 is quite old and doesn't work with Visual Studio 15.9 (latest update of VS 2017). For now, stick with Visual Studio 15.0. We should consider upgrading CUDA eventually, as there are some C++11 features that Visual Studio 15.0 does not support but 15.9 does. See https://docs.microsoft.com/en-us/cpp/overview/visual-cpp-language-conformance?view=vs-2019#compiler-features

hcho3 · 2019-12-02T09:38:55Z

Note: CUDA 10.0 is not compatible with Tesla T4, at least on Windows.

Note: CUDA 10.2 is now out. We should probably support it.

hcho3 · 2022-09-29T05:17:52Z

We now have an automated build pipeline for the Windows worker image. See dmlc/xgboost#8142

hcho3 pinned this issue Nov 29, 2019

This was referenced Nov 29, 2019

How to set up a Jenkins master node from scratch. #6

Open

How to reproduce a crash in Windows pipeline: an example #8

Closed

hcho3 mentioned this issue Jun 5, 2020

Set up CI for Windows neo-ai/neo-ai-dlr#104

Open

hcho3 closed this as completed Sep 29, 2022

hcho3 unpinned this issue Sep 29, 2022
