Setting up Google Cloud Platform on macOS

Credits:

  • These notes are modified from Alvaro Navas's notes on Environment Set up in GCP.
  • I also found this YouTube video helpful; it has someone going through this in real time. The order is a little different to these notes, but the initial set up is the same up to ~2:37.

All I've done is add some screenshots for macOS and spend a bit of time explaining things that I thought might be helpful for similar learners.

Install and set up the gcloud SDK

1. Download Gcloud SDK from this link and install it according to the instructions for your OS.

  • NB I gunzipped it first (see screenshot)
  • Then I had to double-click the .tar file to create the folder google-cloud-sdk (see screenshot)

2. Initialize the SDK following these instructions.

Before you do this, make sure you create a Google Cloud account. NB This is not the same as Colab etc. There are links in the guide above, and as of writing there is still $300 USD worth of free credits available.

i. In your terminal, navigate to the google-cloud-sdk folder and run ./install.sh to begin this process (a consolidated terminal sketch of steps i-ix is given at the end of this section).

ii. Do you want to help improve the Google Cloud CLI (y/N)? --> I chose y but it doesn't matter.

iii. Modify profile to update your $PATH and enable shell command completion?... Do you want to continue (Y/n)? --> y, this way you don't need to type the full path to gcloud every time.

iv. Enter a path to an rc file to update, or leave blank to use [/Users/marcusleiwe/.zshrc]: --> left blank.

v. Google Cloud CLI works best with Python 3.11 and certain modules. Download and run Python 3.11 installer? (Y/n)? --> I picked no because I already have 3.11.7. You can see what version you have by typing python -V into your terminal command line (see the screenshot below).

vi. Reload the .zshrc file so the changes take effect, by typing the command below:

`source ~/.zshrc`

vii. Type gclou and press Tab. If it auto-completes to gcloud then gcloud is installed on your system.

viii. Run gcloud init from a terminal and follow the instructions.

  • There's a login section, where it will launch a page and allow you to log in to the Google account associated with your cloud account. (NB This is the part which Tony forgot initially in the video)

  • Then select the project you want to use or create a new project (in this case I picked mlops-zoomcamp which I created beforehand) (see screenshot)

ix. Make sure that your project is selected with the command gcloud config list. This should produce the following output:

[core]
account = xxxxxxxx@gmail.com #Your e-mail should go here
disable_usage_reporting = False
project = mlops-zoomcamp-xxxxx

Your active configuration is: [default]
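For reference, here is a rough terminal-only sketch of the install and init steps above, assuming an Apple Silicon Mac and the archive name shown on the download page (your filename and paths will differ):

    # Extract the SDK archive (the exact name depends on your OS/architecture)
    tar -xzf google-cloud-cli-darwin-arm.tar.gz

    # Run the installer, then reload your shell config so gcloud is on your PATH
    ./google-cloud-sdk/install.sh
    source ~/.zshrc

    # Log in, pick a project, and confirm the active config
    gcloud init
    gcloud config list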

Creating a VM instance

From the project dashboard, we need to create a virtual machine instance. If it is not visible as a card in the products section, click on the View all products button (see screenshot).

From here, select Compute Engine to set it up (see screenshot).

If not already enabled, you may need to enable the Compute Engine API (see screenshot).

This should take you to the VM instances page, from where we can create our VM instance (click on the Create Instance button in the main pane) (see screenshot).

VM configs for the MLOps course

This should take you to the config settings for your VM. I will be following Alvaro's suggestions and also the suggestions from Alexey on how he set up his EC2 instance in AWS.

Manual Installation (recommended if you want to learn about the set up)

  • Name: mlops-course-vm You can choose anything, but pick something that isn't too long to type

  • Region: asia-east1 There are lots of options here, and you can check them out on the link here. I'm based in Hong Kong at the moment so I've picked Taiwan (asia-east1) because it offers more services than the current Hong Kong region while still minimising latency. Helpful links to make your decision

  • Zone: asia-east1-b From what I understand, in general it is helpful to try and store all the data within the same zone as it is faster and potentially cheaper. Once again there is a helpful Google guide here to explain the differences between regions, zones, and clusters.

  • Machine Configuration: E2 series instance (see screenshot)

An e2-standard-4 instance is recommended (4 vCPUs, 16GB RAM). To do this select

  • General purpose: E2

    • Machine Type: e2-standard-4 (4 vCPU, 2 core, 16 GB memory) NB this is not the default option; the screenshot above shows roughly how to navigate to it.
      • vCPUs to core ratio and visible core count left blank
  • Availability policies: Standard There is also the Spot option. Essentially, this is cheaper but uses spare capacity, so your processes could terminate at any given time. For safety's sake I've stuck to standard, but seeing as this is just a training course I probably could get away with running spot instances.

  • Display device: Not selected

  • Boot disk: Recommended to change to Ubuntu 20.04 LTS, and pick at least 30GB of storage (see screenshot).

  • Leave all other settings on their default values and click create

You should then be directed back to the VM Instances page, where you should see that the instance is running (see screenshot). NB When you are finished, remember to switch it off; otherwise you will pay for it.

CLI based Installation

This is much easier. Just type the following command:

gcloud compute instances create mlops-course-vm \
    --zone=asia-east1-b \
    --image-family=ubuntu-2004-lts \
    --image-project=ubuntu-os-cloud \
    --machine-type=e2-standard-4 \
    --boot-disk-size=30GB

When you create an instance, it will be started automatically. You can skip to step 3 of the next section.
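As a reminder of the note above about switching the VM off when you are finished, the terminal equivalent (using the instance name and zone from this guide) is:

    # Stop the VM so you are not billed for idle compute time
    gcloud compute instances stop mlops-course-vm --zone=asia-east1-b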

Set up SSH access

  1. Start your instance from the VM instances dashboard.

  2. In your local terminal, make sure that gcloud SDK is configured for your project. Use gcloud config list to list your current config's details. The output should be the same as your initialisation

     [core]
     account = xxxxxxxx@gmail.com #Your e-mail should go here
     disable_usage_reporting = False
     project = mlops-zoomcamp-423810
    
     Your active configuration is: [default]
    

    Troubleshooting...

    If you have multiple google accounts but the current config does not match the account you want:

    i. Use gcloud config configurations list to see all of the available configs and their associated accounts.

    ii. Change to the config you want with gcloud config configurations activate <config_name> --> use one of the config names listed by the previous command (unless you have created extra configs, this is simply called default).

    If the config matches your account but points to a different project:

    i. Use gcloud projects list to list the projects available to your account (it can take a while to load).

    ii. Use gcloud config set project my-project to change your current config to your project. --> In this case it would be gcloud config set project mlops-zoomcamp-423810.

  3. Set up the SSH connection to your VM instances with gcloud compute config-ssh.

    • Inside ~/.ssh/ a new config file should appear with the necessary info to connect (a sketch of the generated entry is shown at the end of this section).

    • If you did not have an SSH key, a pair of public and private SSH keys will be generated for you (see screenshot). After the fingerprint and random art you should receive this message:

        Updating project ssh metadata...⠹Updated [https://www.googleapis.com/compute/v1/projects/mlops-zoomcamp-xxxx].                                                             
        Updating project ssh metadata...done.                                                                                                                                        
        You should now be able to use ssh/scp with your instances.
        For example, try running:
      
        $ ssh mlops-course-vm.asia-east1-b.mlops-zoomcamp-xxxxx
      
    • The output of this command will give you the host name of your instance in this format: instance.zone.project; write it down. NB You can find it out later if you forget, but it seems like a lot of hassle.

  4. You should now be able to open a terminal and SSH to your VM instance like this:

    ssh instance.zone.project

    I received a warning stating

    The authenticity of host 'compute.xxxxxxxxx (xx.xxx.xxx.xx)' can't be established.
    ED25519 key fingerprint is SHA256:xxxxxxx.
    This key is not known by any other names
    Are you sure you want to continue connecting (yes/no/[fingerprint])?
    

    Here I typed in the fingerprint obtained from above. This will close the connection and you will have to restart by typing gcloud compute config-ssh again. You will then be prompted to enter your passphrase, and once done you're in (see screenshot).

  5. In VSCode, with the Remote SSH extension, if you run the command palette (press cmd/ctrl + shift + p) and look for Remote-SSH: Connect to Host (or alternatively you click on the Remote SSH icon in the bottom left corner and click on Connect to Host), your instance should now be listed. Select it to connect to it and work remotely.

This can also be found via the Remote Explorer icon on the left-hand ribbon; from there you can either run in the current window or in a new window (see screenshot).
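For reference, the entry that gcloud compute config-ssh writes into ~/.ssh/config looks roughly like the sketch below; the host name, IP and paths here are placeholders and will differ on your machine:

    # Entry added by gcloud compute config-ssh (values are placeholders)
    Host mlops-course-vm.asia-east1-b.mlops-zoomcamp-xxxxx
        HostName 34.80.xxx.xxx
        IdentityFile /Users/<you>/.ssh/google_compute_engine
        UserKnownHostsFile /Users/<you>/.ssh/google_compute_known_hosts
        IdentitiesOnly yes
        CheckHostIP no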

(Optional) Starting your instance with gcloud sdk after you shut it down.

  • List your available instances: gcloud compute instances list
  • Start your instance: gcloud compute instances start <instance_name>
  • Set up SSH so that you don't have to manually change the IP in your config files: gcloud compute config-ssh

Install Tools

Run this first in your SSH session: sudo apt update && sudo apt -y upgrade. I did this through the terminal (see screenshot).

Alvaro recommends running this command often, once per day or every few days, to keep your VM up to date.

Anaconda:

In your local browser, go to the Anaconda download page, scroll to the bottom, right click on the 64 bit x86 installer link under Linux and copy the URL. At the time of writing this gist, the URL is https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh

  1. In your SSH session, type wget <anaconda_url> to download the installer.
  2. Find the filename of the installer with ls, then run the installer with bash <filename> (you can start typing the name and then press the Tab key to autocomplete; see screenshot).

  3. Follow the on-screen instructions. Answer yes to all yes/no questions and leave all other default values.
  4. Log out of your current SSH session with exit and log back in. You should now see a (base) at the beginning of your command prompt, which indicates a successful install.
  5. You may now remove the Anaconda installer with rm
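Putting these steps together, the whole Anaconda install from the SSH session looks roughly like this (the installer filename matches the URL quoted above and will change with newer releases):

    # Download and run the installer, accepting the prompts
    wget https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh
    bash Anaconda3-2021.11-Linux-x86_64.sh

    # Log out and back in; the prompt should now start with (base)
    exit

    # After reconnecting, remove the installer
    rm Anaconda3-2021.11-Linux-x86_64.sh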

Docker:

  1. Run sudo apt install docker.io to install it.
  • After this operation, 267 MB of additional disk space will be used. Do you want to continue? [Y/n] : y
  2. Change your settings so that you can run Docker without sudo:

    i. Run sudo groupadd docker. The docker group might already exist. This creates a group called docker which you can use to manage permissions etc. A helpful guide explains it in more detail.

    ii. Run sudo gpasswd -a $USER docker. This adds your user to the group that can use Docker. GeeksforGeeks offers a reasonable description if you find the --help output a bit vague.

    iii. Log out of your SSH session and log back in.

    iv. Run sudo service docker restart

    v. Test that Docker can run successfully with docker run hello-world. NB If the image can't be found locally it will be downloaded. Regardless, you should obtain the output below (see screenshot).
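Condensed, the Docker setup inside the SSH session looks like this (remember to log out and back in after the group change):

    # Install Docker and allow your user to run it without sudo
    sudo apt install docker.io
    sudo groupadd docker            # the group may already exist
    sudo gpasswd -a $USER docker

    # Log out of the SSH session and log back in here, then:
    sudo service docker restart
    docker run hello-world          # should print the hello-world message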

Docker compose:

  1. Go to https://github.com/docker/compose/releases and copy the URL for the docker-compose-linux-x86_64 binary for its latest version.

    i. At the time of writing, the last available version is v2.27.0 and the URL for it is https://github.com/docker/compose/releases/download/v2.27.0/docker-compose-linux-x86_64

  2. Create a folder for binary files for your Linux user:

    1. Create a subfolder bin in your home account with mkdir ~/bin.

    2. Go to the folder with cd ~/bin.

    3. Download the binary file with wget <compose_url> -O docker-compose. In this case I typed...

      wget https://github.com/docker/compose/releases/download/v2.27.0/docker-compose-linux-x86_64 -O docker-compose

    If you forget to add the -O option, you can rename the file with mv <long_filename> docker-compose

    4. Make sure that the docker-compose file is in the folder with ls.

    5. Make the binary executable with chmod +x docker-compose.

    6. Check the file with ls again; it should now be colored green. You should now be able to run it with ./docker-compose version (see screenshot).

    7. Go back to the home folder with cd ~.

    8. Run nano .bashrc to modify your path environment variable:

      1. Scroll to the end of the file.

      2. Add this line at the end: export PATH="${HOME}/bin:${PATH}". It should look like this (see screenshot).

      3. Press CTRL + o on your keyboard and press Enter afterwards to save the file.

      4. Press CTRL + x on your keyboard to exit the Nano editor.

  3. Reload the path environment variable with source .bashrc.

  4. You should now be able to run Docker compose from anywhere; test it with docker-compose version
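For reference, here are the Docker Compose steps above as one sequence; appending the export line with echo is equivalent to editing .bashrc in nano, and the release URL is the v2.27.0 one quoted earlier (use whichever release is current when you run this):

    # Download the binary into ~/bin and make it executable
    mkdir -p ~/bin && cd ~/bin
    wget https://github.com/docker/compose/releases/download/v2.27.0/docker-compose-linux-x86_64 -O docker-compose
    chmod +x docker-compose

    # Put ~/bin on the PATH (the same line added to .bashrc above) and reload
    echo 'export PATH="${HOME}/bin:${PATH}"' >> ~/.bashrc
    source ~/.bashrc

    docker-compose version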

Terraform:

Before we start on the instructions here, if you don't know what Terraform is you can read about it here. Or you can watch the Data Talks Data Engineering video on Terraform.

But essentially it allows us to store infrastructure as code, which gives us:

  • Simplicity in keeping track of infrastructure; you can view a file to see how the infrastructure is set up
  • Collaboration; you can upload the file to a repository and get others to proofread your set up
  • Reproducibility; you can build it in a dev environment and tweak it, then push it to prod
  • Ensuring resources are removed; you can quickly and easily determine which resources are still needed and tear down everything else

NB It does not

  • Manage code and update code on infrastructure
  • Change immutable resources, i.e. you cannot change the provisions of your virtual machine here. That will need to be done elsewhere.
  • Manage resources that are not in your terraform file
  1. Run curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
  2. Run sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
  3. Run sudo apt-get update && sudo apt-get install terraform

(see screenshot)
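Once the install finishes, you can confirm Terraform is on your PATH:

    terraform -version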

Upload/download files to/from your instance

Upload a file to the instance.

Type this in a terminal on your local machine:

scp path/to/local/file <instance_name>:path/to/remote/file

NB You will also be prompted to enter your passphrase if you have set one up.

e.g.

scp ./Desktop/Ch1_Screenshots/Terraforming.png mlops-course-vm.asia-east1-b.mlops-zoomcamp-xxxx:./test/test.png

You can also drag & drop stuff in VSCode with the remote extension.

Download a file.

# To your local machine
scp <instance_name>:path/to/remote/file path/to/local/file

e.g. scp mlops-course-vm.asia-east1-b.mlops-zoomcamp-xxxx:./test/test.png ./Desktop/Ch1_Screenshots/FromGoogle.png
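If you need to copy a whole folder rather than a single file, scp's -r flag works the same way; the paths below are just examples:

    scp -r ./Desktop/Ch1_Screenshots mlops-course-vm.asia-east1-b.mlops-zoomcamp-xxxx:./screenshots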

If you use a client like Cyberduck, you can connect to your instance with SFTP, using the instance.zone.project name as the server and adding the generated private SSH key.