Kubernetes Guide

A guide covering Kubernetes including the applications and tools that will make you a better and more efficient Kubernetes developer.

Note: You can easily convert this markdown file to a PDF in VSCode using this handy extension Markdown PDF.

Getting Started with Kubernetes

Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications.

Building Highly-Availability(HA) Clusters with kubeadm. Source: Kubernetes.io

Red Hat CodeReady Containers (CRC) is a tool that provides a minimal, preconfigured OpenShift 4 cluster on a laptop or desktop machine for development and testing purposes. CRC is delivered as a platform inside of the VM.

odo (OpenShift Do), a CLI tool for developers, to manage application components on the OpenShift Container Platform.

System Requirements:

OS: CentOS Stream 8/RHEL 8/Fedora or later (the latest 2 releases).
Download: pull-secret
Login: Red Hat account

Other physical requirements include:

Four virtual CPUs (4 vCPUs)
10GB of memory (RAM)
40GB of storage space

To set up CodeReady Containers, start by creating the crc directory, and then download and extract the crc package:

mkdir /home/<user>/crc

wget https://mirror.openshift.com/pub/openshift-v4/clients/crc/latest/crc-linux-amd64.tar.xz

tar -xvf crc-linux-amd64.tar.xz

Next, move the files to the crc directory and remove the downloaded package(s):

mv /home/<user>/crc-linux-<version>-amd64/* /home/<user>/crc

rm /home/<user>/crc-linux-amd64.tar.xz

rm -r /home/<user>/crc-linux-<version>-amd64

Change to the crc directory, make crc executable, and export your PATH like this:

cd /home/<user>/crc

chmod +x crc

export PATH=$PATH:/home/<user>/crc

Set up and start the cluster:

crc setup

crc start -p /<path-to-the-pull-secret-file>/pull-secret.txt

Set up the OC environment:

crc oc-env

eval $(crc oc-env)

Log in as the developer user:

oc login -u developer -p developer https://api.crc.testing:6443

oc logout

And then, log in as the platform’s admin:

oc login -u kubeadmin -p password https://api.crc.testing:6443

oc logout

Interacting with the cluster. The most common ways include:

Starting the graphical web console:

crc console

Display the cluster’s status:

crc status

Shut down the OpenShift cluster:

crc stop

Delete or kill the OpenShift cluster:

crc delete

Setting up Podman on WSL

Back to the Top

Podman (the POD manager) is an open source tool for developing, managing, and running containers on your Linux® systems. It also manages the entire container ecosystem using the libpod library. Podman’s daemonless and inclusive architecture makes it a more secure and accessible option for container management, and its accompanying tools and features, such as Buildah and Skopeo, allow developers to customize their container environments to best suit their needs.

Fedora: sudo dnf install podman
CentOS: sudo yum --enablerepo=extras install podman
Ubuntu 20.04 or later: sudo apt install podman
Debian 11 (bullseye) or later, or sid/unstable: sudo apt install podman
ArchLinux: sudo pacman -S podman and then tweaks for rootless

Podman

Setting up Buildah on WSL

Back to the Top

Buildah is an open source, Linux-based tool that can build Docker- and Kubernetes-compatible images, and is easy to incorporate into scripts and build pipelines. In addition, Buildah has overlap functionality with Podman, Skopeo, and CRI-O.

Fedora: sudo dnf -y install buildah
CentOS: sudo yum --enablerepo=extras install buildah
Ubuntu 20.04 or later: sudo apt install buildah
Debian 11 (bullseye) or later, or sid/unstable: sudo apt install -y buildah
ArchLinux: sudo pacman -S buildah and then tweaks for rootless

Buildah

Installing Kubernetes on WSL with Rancher Desktop

Back to the Top

Rancher Desktop is an open-source desktop application for Mac, Windows and Linux. Rancher Desktop runs Kubernetes and container management on your desktop letting you choose the version of Kubernetes you want to run. It can also build, push, pull, and run container images using either the Docker CLI (with Moby/dockerd) or nerdctl (with containerd).

Features:

Installs a new Linux VM in WSL2 that has a Kubernetes cluster based on k3s as well as installs various components in it such as KIM (for building docker images on the cluster) and the Traefik Ingress Controller.
It installs the kubectl and Helm CLIs on the Windows side linked to them.
A nice Windows app to manage its settings and help facilitate its upgrades.

Rancher Desktop

Rancher Desktop Kubernetes Settings

.deb Dev Repository

curl -s https://download.opensuse.org/repositories/isv:/Rancher:/dev/deb/Release.key | gpg --dearmor | sudo dd status=none of=/usr/share/keyrings/isv-rancher-dev-archive-keyring.gpg

echo 'deb [signed-by=/usr/share/keyrings/isv-rancher-dev-archive-keyring.gpg] https://download.opensuse.org/repositories/isv:/Rancher:/dev/deb/ ./' | sudo dd status=none of=/etc/apt/sources.list.d/isv-rancher-dev.list

sudo apt update

See available versions

apt list -a rancher-desktop

sudo apt install rancher-desktop=<version>

.rpm Dev Repository

sudo zypper addrepo https://download.opensuse.org/repositories/isv:/Rancher:/dev/rpm/isv:Rancher:dev.repo

sudo zypper refresh

See available versions

zypper search -s rancher-desktop

zypper install --oldpackage rancher-desktop=<version>

Rancher Desktop Architecture Overview

Installing Kubernetes on WSL with Docker Desktop

Back to the Top

Enable the WSL 2 base engine in Docker Desktop

We also need to set in Resources which WSL2 distribution we want to access Docker from, as shown below using Ubuntu 20.04. Then remember to restart Docker for Windows, and once the restart is complete we can use the docker command from within WSL:

Make sure to use kind as a simple way to run Kubernetes in a container. Here we will install the instructions from the official Kind website.

curl -Lo ./kind https://github.com/kubernetes-sigs/kind/releases/download/v0.16.0/kind-$(uname)-amd64

chmod +x ./kind

mv ./kind /usr/local/bin/

Now that kind is installed, we can create the Kubernetes cluster

echo $KUBECONFIG

ls $HOME/.kube

kind create cluster --name wslkube

ls $HOME/.kube

We have successfully created a single-node Kubernetes cluster.

kubectl get nodes

kubectl get all --all-namespaces

Installing Kubernetes on WSL with Microk8s

Back to the Top

Note: This install option requires systemd to be running on WSL
WSL Systemd requirements: Windows 11 and a version of WSL 0.67.6 or above.

MicroK8s is the simplest production-grade upstream Kubernets setup to get up and running.

Installing Microk8s

sudo snap install microk8s --classic

Checking the status while Kubernetes starts

microk8s status --wait-ready

Turning on the services you want

microk8s enable dashboard dns registry istio

Try microk8s enable --help for a list of available services and optional features. microk8s disable turns off a service.

Start using Kubernetes

microk8s kubectl get all --all-namespaces

If you mainly use MicroK8s you can make our kubectl the default one on your command-line with alias mkctl="microk8s kubectl".

Access the Kubernetes dashboard

microk8s dashboard-proxy

Kubernetes Tools, Frameworks, and Projects

Back to the Top

Open Container Initiative is an open governance structure for the express purpose of creating open industry standards around container formats and runtimes.

Buildah is a command line tool to build Open Container Initiative (OCI) images. It can be used with Docker, Podman, Kubernetes.

Podman is a daemonless, open source, Linux native tool designed to make it easy to find, run, build, share and deploy applications using Open Containers Initiative (OCI) Containers and Container Images. Podman provides a command line interface (CLI) familiar to anyone who has used the Docker Container Engine.

Containerd is a daemon that manages the complete container lifecycle of its host system, from image transfer and storage to container execution and supervision to low-level storage to network attachments and beyond. It is available for Linux and Windows.

Google Kubernetes Engine (GKE) is a managed, production-ready environment for running containerized applications.

Azure Kubernetes Service (AKS) is serverless Kubernetes, with a integrated continuous integration and continuous delivery (CI/CD) experience, and enterprise-grade security and governance. Unite your development and operations teams on a single platform to rapidly build, deliver, and scale applications with confidence.

Amazon EKS is a tool that runs Kubernetes control plane instances across multiple Availability Zones to ensure high availability.

AWS Controllers for Kubernetes (ACK) is a new tool that lets you directly manage AWS services from Kubernetes. ACK makes it simple to build scalable and highly-available Kubernetes applications that utilize AWS services.

Container Engine for Kubernetes (OKE) is an Oracle-managed container orchestration service that can reduce the time and cost to build modern cloud native applications. Unlike most other vendors, Oracle Cloud Infrastructure provides Container Engine for Kubernetes as a free service that runs on higher-performance, lower-cost compute.

Anthos is a modern application management platform that provides a consistent development and operations experience for cloud and on-premises environments.

Red Hat Openshift is a fully managed Kubernetes platform that provides a foundation for on-premises, hybrid, and multicloud deployments.

OKD is a community distribution of Kubernetes optimized for continuous application development and multi-tenant deployment. OKD adds developer and operations-centric tools on top of Kubernetes to enable rapid application development, easy deployment and scaling, and long-term lifecycle maintenance for small and large teams.

Odo is a fast, iterative, and straightforward CLI tool for developers who write, build, and deploy applications on Kubernetes and OpenShift.

Kata Operator is an operator to perform lifecycle management (install/upgrade/uninstall) of Kata Runtime on Openshift as well as Kubernetes cluster.

Thanos is a set of components that can be composed into a highly available metric system with unlimited storage capacity, which can be added seamlessly on top of existing Prometheus deployments.

OpenShift Hive is an operator which runs as a service on top of Kubernetes/OpenShift. The Hive service can be used to provision and perform initial configuration of OpenShift 4 clusters.

Rook is a tool that turns distributed storage systems into self-managing, self-scaling, self-healing storage services. It automates the tasks of a storage administrator: deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management.

VMware Tanzu is a centralized management platform for consistently operating and securing your Kubernetes infrastructure and modern applications across multiple teams and private/public clouds.

Kubespray is a tool that combines Kubernetes and Ansible to easily install Kubernetes clusters that can be deployed on AWS, GCE, Azure, OpenStack, vSphere, Packet (bare metal), Oracle Cloud Infrastructure (Experimental), or Baremetal.

KubeInit provides Ansible playbooks and roles for the deployment and configuration of multiple Kubernetes distributions.

Rancher is a complete software stack for teams adopting containers. It addresses the operational and security challenges of managing multiple Kubernetes clusters, while providing DevOps teams with integrated tools for running containerized workloads.

K3s is a highly available, certified Kubernetes distribution designed for production workloads in unattended, resource-constrained, remote locations or inside IoT appliances.

Helm is a Kubernetes Package Manager tool that makes it easier to install and manage Kubernetes applications.

Knative is a Kubernetes-based platform to build, deploy, and manage modern serverless workloads. Knative takes care of the operational overhead details of networking, autoscaling (even to zero), and revision tracking.

KubeFlow is a tool dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.

Kubebox is a Terminal and Web console for Kubernetes.

Kubsec is a Security risk analysis for Kubernetes resources.

Replex is a Kubernetes Governance and Cost Management for the Cloud-Native Enterprise.

Virtual Kubelet is an open-source Kubernetes kubelet implementation that masquerades as a kubelet.

Telepresence is a fast, local development for Kubernetes and OpenShift microservices.

Weave Scope is a tool that automatically detects processes, containers, hosts. No kernel modules, no agents, no special libraries, no coding. It seamless integration with Docker, Kubernetes, DCOS and AWS ECS.

Nuclio is a high-performance "serverless" framework focused on data, I/O, and compute intensive workloads. It is well integrated with popular data science tools, such as Jupyter and Kubeflow; supports a variety of data and streaming sources; and supports execution over CPUs and GPUs.

Supergiant Control is a tool that manages the lifecycle of clusters on your infrastructure and allows deployment of applications via HELM. Its deployment and configuration workflows will help you to get up and running with Kubernetes faster.

Supergiant Capacity - Beta is a tool that ensures that the right hardware is available for the required resource load of your Kubernetes cluster at any given time. This helps prevent over-provisioning of your container environment and overspending on your hardware budget.

Test suite for Kubernetes is a test suite consists of two Helm charts for network bandwith testing and load testing a Kuberntes cluster.

Keel is a Kubernetes Operator to automate Helm, DaemonSet, StatefulSet & Deployment updates.

Kube Monkey is an implementation of Netflix's Chaos Monkey for Kubernetes clusters. It randomly deletes Kubernetes (k8s) pods in the cluster encouraging and validating the development of failure-resilient services.

Kube State Metrics (KSM) is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects. It's not focused on the health of the individual Kubernetes components, but rather on the health of the various objects inside, such as deployments, nodes and pods.

Sonobuoy is a diagnostic tool that makes it easier to understand the state of a Kubernetes cluster by running a choice of configuration tests in an accessible and non-destructive manner.

PowerfulSeal is a powerful testing tool for your Kubernetes clusters, so that you can detect problems as early as possible.

Test Infra is a repository contains tools and configuration files for the testing and automation needs of the Kubernetes project.

cAdvisor (Container Advisor) is a tool that provides container users an understanding of the resource usage and performance characteristics of their running containers. It's a running daemon that collects, aggregates, processes, and exports information about running containers. Specifically, for each container it keeps resource isolation parameters, historical resource usage, histograms of complete historical resource usage and network statistics.

Etcd is a distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. Etcd is used as the backend for service discovery and stores cluster state and configuration for Kubernetes.

nacos is an easy-to-use dynamic service discovery, configuration and service management platform for building cloud native applications.

Kuma is a modern Envoy-based service mesh that can run on every cloud, in a single or multi-zone capacity, across both Kubernetes and VMs. Thanks to its broad universal workload support, combined with native support for Envoy as its data plane proxy technology (but with no Envoy expertise required), Kuma provides modern L4-L7 service connectivity, discovery, security, observability, routing and more across any service on any platform, databases included.

Open Service Mesh (OSM) is a lightweight, extensible, cloud native service mesh that allows users to uniformly manage, secure, and get out-of-the-box observability features for highly dynamic microservice environments.

kserve is a Standardized Serverless ML Inference Platform on Kubernetes.

naftis is an awesome dashboard for Istio built with love.

Traefik Mesh is a simple, yet full-featured service mesh. It is container-native and fits as your de-facto service mesh in your Kubernetes cluster. It supports the latest Service Mesh Interface specification SMI that facilitates integration with pre-existing solution.

Meshery is the cloud native management plane offering lifecycle, configuration, and performance management of Kubernetes, service meshes, and your workloads.

kubectx is a tool to switch between contexts (clusters) on kubectl faster.

Dapr is a portable, event-driven, runtime for building distributed applications across cloud and edge.

OpenEBS is a Kubernetes-based tool to create stateful applications using Container Attached Storage.

Container Storage Interface (CSI) is an API that lets container orchestration platforms like Kubernetes seamlessly communicate with stored data via a plug-in.

MicroK8s is a tool that delivers the full Kubernetes experience. In a Fully containerized deployment with compressed over-the-air updates for ultra-reliable operations. It is supported on Linux, Windows, and MacOS.

Charmed Kubernetes is a well integrated, turn-key, conformant Kubernetes platform, optimized for your multi-cloud environments developed by Canonical.

Grafana Kubernetes App is a toll that allows you to monitor your Kubernetes cluster's performance. It includes 4 dashboards, Cluster, Node, Pod/Container and Deployment. It allows for the automatic deployment of the required Prometheus exporters and a default scrape config to use with your in cluster Prometheus deployment.

KubeEdge is an open source system for extending native containerized application orchestration capabilities to hosts at Edge.It is built upon kubernetes and provides fundamental infrastructure support for network, app. deployment and metadata synchronization between cloud and edge.

Lens is the most powerful IDE for people who need to deal with Kubernetes clusters on a daily basis. It has support for MacOS, Windows and Linux operating systems.

kind is a tool for running local Kubernetes clusters using Docker container “nodes”. It was primarily designed for testing Kubernetes itself, but may be used for local development or CI.

Flux CD is a tool that automatically ensures that the state of your Kubernetes cluster matches the configuration you've supplied in Git. It uses an operator in the cluster to trigger deployments inside Kubernetes, which means that you don't need a separate continuous delivery tool.

Platform9 Managed Kubernetes (PMK) is a Kubernetes as a service that ensures fully automated Day-2 operations with 99.9% SLA on any environment, whether in data-centers, public clouds, or at the edge.

Getting Started with OpenShift

Back to the Top

What is OpenShift?

Red Hat OpenShift is an open source container application platform based on the Kubernetes container orchestrator for enterprise app development and deployment in the hybrid cloud Red Hat OpenShift, the open hybrid cloud platform built on Kubernetes. OpenShift can manage applications written in different languages and frameworks, such as Ruby, Node.js, Java, Perl, and Python.

Red Hat OpenShift Development Architecture. Source: Red Hat

OpenShift Developer Resources

Back to the Top

Certifications & Courses

Back to the Top

Books

Back to the Top

Source-to-Image (S2I) images for programming/buildng your Apps

Back to the Top

Java

Java - Source-to-Image (S2I) Builder Images for OpenShift.

Python

Python - Source-to-Image (S2I) Builder Images for OpenShift.

Golang

Golang- Source-to-Image (S2I) Builder Images for OpenShift.

Ruby

Ruby - Source-to-Image (S2I) Builder Images for OpenShift.

.NET Core

[.NET Core - Source-to-Image (S2I) Builder Images for OpenShift(https://docs.openshift.com/online/pro/using_images/s2i_images/dot_net_core.html).

Node.js

Node.js - Source-to-Image (S2I) Builder Images for OpenShift.

Perl

Perl - Source-to-Image (S2I) Builder Images for OpenShift.

PHP

PHP - Source-to-Image (S2I) Builder Images for OpenShift.

Builder Images for setting up Databases

Back to the Top

MySQL

MySQL - Database Images for OpenShift

PostgreSQL

PostgreSQL - Database Images for OpenShift

MongoDB

MongoDB - Database Images for OpenShift

MariaDB

MariaDB - Database Images for OpenShift

Redis

Redis - Database Images for OpenShift

Setting up on Microsoft Azure

Back to the Top

Microsoft Azure Red Hat OpenShift is a fully managed offering of OpenShift running in Azure. This service is jointly managed and supported by Microsoft and Red Hat.

Requirements:

Azure CLI version 2.6.0 or later.
56 vCPUs, so you must increase the account limit.

By default, each cluster creates the following instances:

One bootstrap machine, which is removed after installation
Three control plane machines
Three compute machines

Because the bootstrap, control plane, and worker machines use Standard_DS4_v2 virtual machines, which use 8 vCPUs, a default cluster requires 56 vCPUs. The bootstrap node VM is used only during installation. To deploy more worker nodes, enable autoscaling, deploy large workloads, or use a different instance type, you must further increase the vCPU limit for your account to ensure that your cluster can deploy the machines that you require.

1 VNet. Each default cluster requires one Virtual Network (VNet), which contains two subnets.
7 Network interfaces. Each default cluster requires seven network interfaces. If you create more machines or your deployed workloads create load balancers, your cluster uses more network interfaces.
2 Network security groups. Each cluster creates network security groups for each subnet in the VNet. The default cluster creates network security groups for the control plane and for the compute node subnets: controlplane
- Allows the control plane machines to be reached on port 6443 from anywhere. node
- Allows worker nodes to be reached from the internet on ports 80 and 443.
3 Network load balancers. Each cluster creates the following load balancers: default
- Public IP address that load balances requests to ports 80 and 443 across worker machines internal
- Private IP address that load balances requests to ports 6443 and 22623 across control plane machines external
- Public IP address that load balances requests to port 6443 across control plane machines
Note: If your applications create more Kubernetes LoadBalancer service objects, your cluster uses more load balancers.
2 Public IP addresses. The public load balancer uses a public IP address. The bootstrap machine also uses a public IP address so that you can SSH into the machine to troubleshoot issues during installation. The IP address for the bootstrap node is used only during installation.
7 Private IP addresses. The internal load balancer, each of the three control plane machines, and each of the three worker machines each use a private IP address.

Ingress traffic to an Azure Red Hat OpenShift cluster. Image Credit: Red Hat

Egress traffic from an Azure Red Hat OpenShift cluster and connection to the cluster. Image Credit: Red Hat

Register the Resource Providers

If you have multiple Azure subscriptions, specify the relevant subscription ID:

az account set --subscription <SUBSCRIPTION ID>

Register the Microsoft.RedHatOpenShift resource provider:

az provider register -n Microsoft.RedHatOpenShift --wait

Register the Microsoft.Compute resource provider:

az provider register -n Microsoft.Compute --wait

Register the Microsoft.Storage resource provider:

az provider register -n Microsoft.Storage --wait

Register the Microsoft.Authorization resource provider:

az provider register -n Microsoft.Authorization --wait

Create a Resource Group:

az group create \
  --name $RESOURCEGROUP \
  --location $LOCATION

Creating a Virtual Network:

az network vnet create \
   --resource-group $RESOURCEGROUP \
   --name aro-vnet \
   --address-prefixes 10.0.0.0/22

Adding empty subnet for the master nodes.

az network vnet subnet create \
  --resource-group $RESOURCEGROUP \
  --vnet-name aro-vnet \
  --name master-subnet \
  --address-prefixes 10.0.0.0/23 \
  --service-endpoints Microsoft.ContainerRegistry

Adding empty subnet for the worker nodes.

az network vnet subnet create \
  --resource-group $RESOURCEGROUP \
  --vnet-name aro-vnet \
  --name worker-subnet \
  --address-prefixes 10.0.2.0/23 \
  --service-endpoints Microsoft.ContainerRegistry

Disable subnet private endpoint policies on the master subnet.

az network vnet subnet update \
  --name master-subnet \
  --resource-group $RESOURCEGROUP \
  --vnet-name aro-vnet \
  --disable-private-link-service-network-policies true

Creating a Cluster

az aro create \
  --resource-group $RESOURCEGROUP \
  --name $CLUSTER \
  --vnet aro-vnet \
  --master-subnet master-subnet \
  --worker-subnet worker-subnet

Setting up on Google Cloud (GCP)

Back to the Top

Minimum Requirements:

gcloud CLI or OpenShift CLI (oc).

Master Nodes:

Minimum 4 vCPU (additional are strongly recommended).
Minimum 16 GB RAM (additional memory is strongly recommended, especially if etcd is co-located on masters).
Minimum 40 GB hard disk space for the file system .

Worker Nodes:

1 vCPU.
Minimum 8 GB RAM.
Minimum 15 GB hard disk space for the file system.
If you don’t have a GCP account already, sign-up for Cloud Platform, setup billing and activate APIs.
Setup a service account. A service account is a way to interact with your GCP resources by using a different identity than your primary login and is generally intended for server-to-server interaction. From the GCP Navigation Menu, click on "Permissions."
- Click on "Service accounts."

Click on "Create service account," which will prompt you to enter a service account name. Provide a name for your project and click on "Furnish a new private key." The default "JSON" Key type should be left selected.

Once you click "Create," a service account “.json” will be downloaded to your browser’s downloads location.

Important: Like any credential, this represents an access mechanism to authenticate and use resources in your GCP account. Never place this file in a publicly accessible source repo (Public GitHub or GitLab).

using the JSON credential via a Kubernetes secret deployed to your OpenShift cluster. To do so, first perform a base64 encoding of your JSON credential file:

base64 -i ~/path/to/downloads/credentials.json

Keep the output (a very long string) ready for use in the next step, where you’ll replace ‘BASE64_CREDENTIAL_STRING’ in the pod example (below) with the output just captured from base64 encoding.

Note: base64 is encoded (not encrypted) and can be readily reversed, so this file (with the base64 string) is just as confidential as the credential file above.

Create the Kubernetes secret inside your OpenShift Cluster. A secret is the proper place to make sensitive information available to pods running in your cluster (like passwords or the credentials downloaded in the previous step).

apiVersion: v1
kind: Secret
metadata:
 name: google-services-secret
type: Opaque
data:
 google-services.json: BASE64_CREDENTIAL_STRING

Note: Replace ‘BASE64_CREDENTIAL_STRING’ with the base64 output from the prior step.

Deploy the secret to the cluster:

oc create -f google-secret.yaml

Setting up Red Hat OpenShift Data Science

Back to the Top

Red Hat® OpenShift® Data Science is a fully managed cloud service for data scientists and developers of intelligent applications on Red Hat OpenShift Dedicated or Red Hat OpenShift Service on AWS. It provides a fully supported sandbox in which to rapidly develop, train, and test machine learning (ML) models in the public cloud before deploying in production.

Red Hat OpenShift Data Science learning tutorials

Installing Red Hat OpenShift Data Science

Opening Red Hat OpenShift Data Science

JuypterHub on Red Hat OpenShift Data Science

Exploring Tools on Red Hat OpenShift Data Science

Setting up JupyterHub Notebook Server

Creating a new Python 3 Notebook

Python 3 JupyterHub Notebook

JupyterHub Notebook Sample Demo

OpenShift Project Models

How OpenShift integrates with JupyterHub using Python - Source-to-Image (S2I)

Red Hat CodeReady Containers (CRC)

Back to the Top

Red Hat CodeReady Containers (CRC) is a tool that provides a minimal, preconfigured OpenShift 4 cluster on a laptop or desktop machine for development and testing purposes. CRC is delivered as a platform inside of the VM.

odo (OpenShift Do), a CLI tool for developers, to manage application components on the OpenShift Container Platform.

System Requirements:

OS: CentOS Stream 8/RHEL 8/Fedora or later (the latest 2 releases).
Download: pull-secret
Login: Red Hat account

Other physical requirements include:

Four virtual CPUs (4 vCPUs)
10GB of memory (RAM)
40GB of storage space

To set up CodeReady Containers, start by creating the crc directory, and then download and extract the crc package:

mkdir /home/<user>/crc

wget https://mirror.openshift.com/pub/openshift-v4/clients/crc/latest/crc-linux-amd64.tar.xz

tar -xvf crc-linux-amd64.tar.xz

Next, move the files to the crc directory and remove the downloaded package(s):

mv /home/<user>/crc-linux-<version>-amd64/* /home/<user>/crc

rm /home/<user>/crc-linux-amd64.tar.xz

rm -r /home/<user>/crc-linux-<version>-amd64

Change to the crc directory, make crc executable, and export your PATH like this:

cd /home/<user>/crc

chmod +x crc

export PATH=$PATH:/home/<user>/crc

Set up and start the cluster:

crc setup

crc start -p /<path-to-the-pull-secret-file>/pull-secret.txt

Set up the OC environment:

crc oc-env

eval $(crc oc-env)

Log in as the developer user:

oc login -u developer -p developer https://api.crc.testing:6443

oc logout

And then, log in as the platform’s admin:

oc login -u kubeadmin -p password https://api.crc.testing:6443

oc logout

Interacting with the cluster. The most common ways include:

Starting the graphical web console:

crc console

Display the cluster’s status:

crc status

Shut down the OpenShift cluster:

crc stop

Delete or kill the OpenShift cluster:

crc delete

Setting up Podman

Back to the Top

Podman (the POD manager) is an open source tool for developing, managing, and running containers on your Linux systems. It also manages the entire container ecosystem using the libpod library. Podman’s daemonless and inclusive architecture makes it a more secure and accessible option for container management, and its accompanying tools and features, such as Buildah and Skopeo, allow developers to customize their container environments to best suit their needs.

Libpod provides a library for applications looking to use the Container Pod concept made popular by Kubernetes.

Installing Podman:

Fedora: sudo dnf install podman
CentOS Stream: sudo dnf install buildah
Ubuntu 20.04 or later: sudo apt install podman
Debian 11 (bullseye) or later, or sid/unstable: sudo apt install podman
openSUSE: sudo zypper install podman
ArchLinux: sudo pacman -S podman and then tweaks for rootless

Podman Desktop is a tool to manage Podman and other container engines from a single UI and tray local environment.

Podman Desktop

Podman

Setting up Buildah

Back to the Top

Buildah is an open source, Linux-based tool that can build Docker- and Kubernetes-compatible images, and is easy to incorporate into scripts and build pipelines. In addition, Buildah has overlap functionality with Podman, Skopeo, and CRI-O.

Fedora: sudo dnf -y install buildah
CentOS Stream: sudo dnf -y install buildah
Ubuntu 20.04 or later: sudo apt install buildah
Debian 11 (bullseye) or later, or sid/unstable: sudo apt install -y buildah
openSUSE: sudo zypper install buildah
ArchLinux: sudo pacman -S buildah and then tweaks for rootless

Buildah

Setting up Skopeo

Back to the Top

Skopeo is a tool for manipulating, inspecting, signing, and transferring container images and image repositories on Linux systems, Windows and MacOS. In addition, Skopeo has overlap functionality with Podman, Buildah, and CRI-O.

Installing Skopeo:

Fedora: sudo dnf install skopeo
CentOS Stream: sudo dnf -y install skopeo
Ubuntu 20.04 or later: sudo apt install skopeo
Debian 11 (bullseye) or later, or sid/unstable: sudo apt install skopeo
openSUSE: sudo zypper install skopeo
Alpine Linux: sudo apk add skopeo
ArchLinux: sudo pacman -S skopeo and then tweaks for rootless
Nix/NixOS: $ nix-env -i skopeo
MacOS: brew install skopeo

Skopeo Usage:

$ skopeo --help

Various operations with container images and container image registries

Usage:
  skopeo [command]

Available Commands:
  copy                                       Copy an IMAGE-NAME from one location to another
  delete                                     Delete image IMAGE-NAME
  help                                       Help about any command
  inspect                                    Inspect image IMAGE-NAME
  list-tags                                  List tags in the transport/repository specified by the REPOSITORY-NAME
  login                                      Login to a container registry
  logout                                     Logout of a container registry
  manifest-digest                            Compute a manifest digest of a file
  standalone-sign                            Create a signature using local files
  standalone-verify                          Verify a signature using local files
  sync                                       Synchronize one or more images from one location to another

File systems

Back to the Top

CIFS (Common Internet File System) is a network filesystem protocol used for providing shared access to files and printers between machines on the network. The client application can read, write, edit and even remove files on the remote server.

Network File System (NFS) is a protocol that provides a file sharing solution for enterprises that have heterogeneous environments that include both Windows and non-Windows computers. It's most notable for its host authentication, it’s simple to setup, and makes it possible to connect to another service using an IP address only.

Additional benefits of NFS file share include:

NFS provides a central management.
NFS allows for a user to log into any server and have access to their files transparently.
It’s been around for a long time, so it comes with familiarity in terms of applications.
No manual refresh needed for new files.
It Can be secured with firewalls and Kerberos.

GlusterFS is a free and open source scalable network filesystem. Gluster is a scalable network filesystem. Using common off-the-shelf hardware, you can create large, distributed storage solutions for media streaming, data analysis, and other data- and bandwidth-intensive tasks.

Ceph is a software-defined storage solution designed to address the object, block, and file storage needs of data centers adopting open source as the new norm for high-growth block storage, object stores and data lakes. Ceph provides enterprise scalable storage while keeping CAPEX and OPEX costs in line with underlying bulk commodity disk prices.

Hadoop Distributed File System (HDFS) is a distributed file system that handles large data sets running on commodity hardware. It is used to scale a single Apache Hadoop cluster to hundreds (and even thousands) of nodes. HDFS is one of the major components of Apache Hadoop, the others being MapReduce and YARN.

ZFS is an enterprise-ready open source file system and volume manager with unprecedented flexibility and an uncompromising commitment to data integrity.

OpenZFS is an open-source storage platform. It includes the functionality of both traditional file systems and volume manager. It has many advanced features including:

Protection against data corruption.
Integrity checking for both data and metadata.
Continuous integrity verification and automatic "self-healing" repair.

Btrfs is a modern copy on write (CoW) filesystem for Linux aimed at implementing advanced features while also focusing on fault tolerance, repair and easy administration. Its main features and benefits are:

Snapshots which do not make the full copy of files
RAID - support for software-based RAID 0, RAID 1, RAID 10
Self-healing - checksums for data and metadata, automatic detection of silent data corruptions

Bcachefs is an advanced new filesystem for Linux, with an emphasis on reliability and robustness and the complete set of features one would expect from a modern filesystem. Scalability has been tested to 50+ TB, will eventually scale far higher.

Ext4 is a journaling file system for Linux, developed as the successor to ext3

Squashfs is a compressed read-only filesystem for Linux. It uses zlib, lz4, lzo, or xz compression to compress files, inodes and directories. Inodes in the system are very small and all blocks are packed to minimize data overhead.

NTFS(New Technology File System) is the primary file system for recent versions of Windows and Windows Server—provides a full set of features including security descriptors, encryption, disk quotas, and rich metadata, and can be used with Cluster Shared Volumes (CSV) to provide continuously available volumes that can be accessed simultaneously from multiple nodes of a failover cluster.

OpenShift Tools

Back to the Top

OpenShift CLI (oc) is a command line interface tool that extends the capabilities of kubectl with many convenience functions that make interacting with both Kubernetes and OpenShift clusters easier.

OpenShift Serverless CLI (kn) is a command line interface tool to deploy serverless applications, then you’ll want access and control via the kn command.

OpenShift Pipelines CLI (tkn) is a command line interface tool for using Tekton to provide cloud-native CI/CD functionality within the cluster. The tkn command is used to manage the functionality from the CLI.

Red Hat CodeReady Containers is an option to host a local, all-in-one OpenShift 4 cluster on your workstation. CodeReady Containers replaces minishift, used to run OpenShift 3 clusters on your workstation, as a quick and easy method of creating test and development clusters.

Helm CLI is a command line interface tool for deploying and managing Kubernetes applications to your clusters.

OpenShift Hive is an operator which runs as a service on top of Kubernetes/OpenShift. The Hive service can be used to provision and perform initial configuration of OpenShift 4 clusters.

OpenShift Service Mesh is a tool that provides a layer on top of OpenShift for securely connecting services in a consistent manner. This provides centralized control, security and observability across your services without having to modify your applications.

Azure Red Hat OpenShift is a flexible, self-service deployment of fully managed OpenShift clusters. Maintain regulatory compliance and focus on your application development, while your master, infrastructure, and application nodes are patched, updated, and monitored by both Microsoft and Red Hat.

Red Hat OpenShift Service on AWS (ROSA) is a fully-managed and jointly supported Red Hat OpenShift offering that combines the power of Red Hat OpenShift, the industry's most comprehensive enterprise Kubernetes platform, and the AWS public cloud.

Red Hat OpenShift on Google Cloud is a fully-managed and jointly supported Red Hat OpenShift offering that enables you to deploy stateful and stateless apps with nearly any language, framework, database, or service. It gives you a hosted environment entirely on Google Cloud. A hybrid environment where you maintain part of your workload on-premises or in a private hosting environment and migrate the rest to Google Cloud.

Red Hat® Quay is a secure, private container registry that builds, analyzes and distributes container images. It provides a high level of automation and customization.

Kata Operator is an operator to perform lifecycle management (install/upgrade/uninstall) of Kata Runtime on Openshift as well as Kubernetes cluster.

Open Container Initiative is an open governance structure for the express purpose of creating open industry standards around container formats and runtimes.

Buildah is a command line tool to build Open Container Initiative (OCI) images. It can be used with Docker, Podman, Kubernetes.

Podman is a daemonless, open source, Linux native tool designed to make it easy to find, run, build, share and deploy applications using Open Containers Initiative (OCI) Containers and Container Images. Podman provides a command line interface (CLI) familiar to anyone who has used the Docker Container Engine.

Containerdis a daemon that manages the complete container lifecycle of its host system, from image transfer and storage to container execution and supervision to low-level storage to network attachments and beyond. It is available for Linux and Windows.

OKD is a community distribution of Kubernetes optimized for continuous application development and multi-tenant deployment. OKD adds developer and operations-centric tools on top of Kubernetes to enable rapid application development, easy deployment and scaling, and long-term lifecycle maintenance for small and large teams.

Go Development

Back to the Top

Go Learning Resources

Go is an open source programming language that makes it easy to build simple, reliable, and efficient software.

Golang Contribution Guide

Google Developers Training

Google Developers Certification

Uber's Go Style Guide

GitLab's Go standards and style guidelines

Effective Go

Go: The Complete Developer's Guide (Golang) on Udemy

Getting Started with Go on Coursera

Programming with Google Go on Coursera

Learning Go Fundamentals on Pluralsight

Learning Go on Codecademy

Go Tools

golang tools holds the source for various packages and tools that support the Go programming language.

Go in Visual Studio Code is an extension that gives you language features like IntelliSense, code navigation, symbol search, bracket matching, snippets, and many more that will help you in Golang development.

Traefik is a modern HTTP reverse proxy and load balancer that makes deploying microservices easy. Traefik integrates with your existing infrastructure components (Docker, Swarm mode, Kubernetes, Marathon, Consul, Etcd, Rancher, Amazon ECS, etc.) and configures itself automatically and dynamically. Pointing Traefik at your orchestrator should be the only configuration step you need.

Gitea is Git with a cup of tea, painless self-hosted git service. Using Go, this can be done with an independent binary distribution across all platforms which Go supports, including Linux, macOS, and Windows on x86, amd64, ARM and PowerPC architectures.

OpenFaaS is Serverless Functions Made Simple. It makes it easy for developers to deploy event-driven functions and microservices to Kubernetes without repetitive, boiler-plate coding. Package your code or an existing binary in a Docker image to get a highly scalable endpoint with auto-scaling and metrics.

micro is a terminal-based text editor that aims to be easy to use and intuitive, while also taking advantage of the capabilities of modern terminals. As its name indicates, micro aims to be somewhat of a successor to the nano editor by being easy to install and use. It strives to be enjoyable as a full-time editor for people who prefer to work in a terminal, or those who regularly edit files over SSH.

Gravitational Teleport is a modern security gateway for remotely accessing into Clusters of Linux servers via SSH or SSH-over-HTTPS in a browser or Kubernetes clusters.

NATS is a simple, secure and performant communications system for digital systems, services and devices. NATS is part of the Cloud Native Computing Foundation (CNCF). NATS has over 30 client language implementations, and its server can run on-premise, in the cloud, at the edge, and even on a Raspberry Pi. NATS can secure and simplify design and operation of modern distributed systems.

Act is a GO program that allows you to run our GitHub Actions locally.

Fiber is an Express inspired web framework built on top of Fasthttp, the fastest HTTP engine for Go. Designed to ease things up for fast development with zero memory allocation and performance in mind.

Glide is a vendor Package Management for Golang.

BadgerDB is an embeddable, persistent and fast key-value (KV) database written in pure Go. It is the underlying database for Dgraph, a fast, distributed graph database. It's meant to be a performant alternative to non-Go-based key-value stores like RocksDB.

Go kit is a programming toolkit for building microservices (or elegant monoliths) in Go. We solve common problems in distributed systems and application architecture so you can focus on delivering business value.

Codis is a proxy based high performance Redis cluster solution written in Go.

zap is a blazing fast, structured, leveled logging in Go.

HttpRouter is a lightweight high performance HTTP request router (also called multiplexer or just mux for short) for Go.

Gorilla WebSocket is a Go implementation of the WebSocket protocol.

Delve is a debugger for the Go programming language.

GORM is a fantastic ORM library for Golang, aims to be developer friendly.

Go Patterns is a curated collection of idiomatic design & application patterns for Go language.

Python Development

Back to the Top

Python Learning Resources

Python is an interpreted, high-level programming language. Python is used heavily in the fields of Data Science and Machine Learning.

Python Developer’s Guide is a comprehensive resource for contributing to Python – for both new and experienced contributors. It is maintained by the same community that maintains Python.

Get started with Kubernetes using Python

Data Science with Python & JupyterHub on Kubernetes

Azure Functions Python developer guide is an introduction to developing Azure Functions using Python. The content below assumes that you've already read the Azure Functions developers guide.

CheckiO is a programming learning platform and a gamified website that teaches Python through solving code challenges and competing for the most elegant and creative solutions.

Python Institute

PCEP – Certified Entry-Level Python Programmer certification

PCAP – Certified Associate in Python Programming certification

PCPP – Certified Professional in Python Programming 1 certification

PCPP – Certified Professional in Python Programming 2

MTA: Introduction to Programming Using Python Certification

Getting Started with Python in Visual Studio Code

Google's Python Style Guide

Google's Python Education Class

Real Python

The Python Open Source Computer Science Degree by Forrest Knight

Intro to Python for Data Science

Intro to Python by W3schools

Codecademy's Python 3 course

Learn Python with Online Courses and Classes from edX

Python Courses Online from Coursera

Python Frameworks and Tools

Python Package Index (PyPI) is a repository of software for the Python programming language. PyPI helps you find and install software developed and shared by the Python community.

PyCharm is the best IDE I've ever used. With PyCharm, you can access the command line, connect to a database, create a virtual environment, and manage your version control system all in one place, saving time by avoiding constantly switching between windows.

Python Tools for Visual Studio(PTVS) is a free, open source plugin that turns Visual Studio into a Python IDE. It supports editing, browsing, IntelliSense, mixed Python/C++ debugging, remote Linux/MacOS debugging, profiling, IPython, and web development with Django and other frameworks.

Pylance is an extension that works alongside Python in Visual Studio Code to provide performant language support. Under the hood, Pylance is powered by Pyright, Microsoft's static type checking tool.

Pyright is a fast type checker meant for large Python source bases. It can run in a “watch” mode and performs fast incremental updates when files are modified.

Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.

Flask is a micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries.

Web2py is an open-source web application framework written in Python allowing allows web developers to program dynamic web content. One web2py instance can run multiple web sites using different databases.

AWS Chalice is a framework for writing serverless apps in python. It allows you to quickly create and deploy applications that use AWS Lambda.

Tornado is a Python web framework and asynchronous networking library. Tornado uses a non-blocking network I/O, which can scale to tens of thousands of open connections.

HTTPie is a command line HTTP client that makes CLI interaction with web services as easy as possible. HTTPie is designed for testing, debugging, and generally interacting with APIs & HTTP servers.

Scrapy is a fast high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Sentry is a service that helps you monitor and fix crashes in realtime. The server is in Python, but it contains a full API for sending events from any language, in any application.

Pipenv is a tool that aims to bring the best of all packaging worlds (bundler, composer, npm, cargo, yarn, etc.) to the Python world.

Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.

Bottle is a fast, simple and lightweight WSGI micro web-framework for Python. It is distributed as a single file module and has no dependencies other than the Python Standard Library.

CherryPy is a minimalist Python object-oriented HTTP web framework.

Sanic is a Python 3.6+ web server and web framework that's written to go fast.

Pyramid is a small and fast open source Python web framework. It makes real-world web application development and deployment more fun and more productive.

TurboGears is a hybrid web framework able to act both as a Full Stack framework or as a Microframework.

Falcon is a reliable, high-performance Python web framework for building large-scale app backends and microservices with support for MongoDB, Pluggable Applications and autogenerated Admin.

Neural Network Intelligence(NNI) is an open source AutoML toolkit for automate machine learning lifecycle, including Feature Engineering, Neural Architecture Search, Model Compression and Hyperparameter Tuning.

Dash is a popular Python framework for building ML & data science web apps for Python, R, Julia, and Jupyter.

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built-in.

Locust is an easy to use, scriptable and scalable performance testing tool.

spaCy is a library for advanced Natural Language Processing in Python and Cython.

NumPy is the fundamental package needed for scientific computing with Python.

Pillow is a friendly PIL(Python Imaging Library) fork.

IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, that offers enhanced introspection, rich media, additional shell syntax, tab completion, and rich history.

GraphLab Create is a Python library, backed by a C++ engine, for quickly building large-scale, high-performance machine learning models.

Pandas is a fast, powerful, and easy to use open source data structrures, data analysis and manipulation tool, built on top of the Python programming language.

PuLP is an Linear Programming modeler written in python. PuLP can generate LP files and call on use highly optimized solvers, GLPK, COIN CLP/CBC, CPLEX, and GUROBI, to solve these linear problems.

Matplotlib is a 2D plotting library for creating static, animated, and interactive visualizations in Python. Matplotlib produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.

Scikit-Learn is a simple and efficient tool for data mining and data analysis. It is built on NumPy,SciPy, and mathplotlib.

Bash/PowerShell Development

Back to the Top

Bash/PowerShell Learning Resources

Introduction to Bash Shell Scripting by Coursera

Bash: Shell Script Basics by Pluralsight

Bash/Shell by Codecademy

Getting Started with PowerShell

Deploy an Azure Kubernetes Service cluster using PowerShell

PowerShell in Azure Cloud Shell

Azure Functions using PowerShell

Azure Automation runbooks

Using Visual Studio Code for PowerShell Development

Integrated Terminal in Visual Studio Code

AWS Tools for Windows PowerShell

PowerShell Best Practices and Style Guide

AWS Command Line Interface and aws-shell Sample for AWS Cloud9

Configuring Cloud Shell on Google Cloud

Google's Shell Style Guide

Bash/ PowerShell Tools

Bash is the GNU Project's shell(Bourne Again SHell), which is an sh-compatible shell that integrates together useful features from the Korn shell (ksh) and the C shell (csh).

PowerShell Core is a cross-platform (Windows, Linux, and macOS) automation and configuration tool/framework that works well with your existing tools and is optimized for dealing with structured data (JSON, CSV, XML, etc.), REST APIs, and object models. It also includes a command-line shell, an associated scripting language and a framework for processing cmdlets.

Azure PowerShell is a set of cmdlets for managing Microsoft Azure resources directly from the PowerShell command line.

AWS Shell is a command-line shell program that provides convenience and productivity features to help both new and advanced users of the AWS Command Line Interface.

Google Cloud Shell is a free admin machine with browser-based command-line access for managing your infrastructure and applications on Google Cloud Platform.

VS Code Bash Debug is a bash debugger GUI frontend based on awesome bashdb scripts (bashdb now included in package).

VS Code Bash IDE is a Visual Studio Code extension utilizing the bash language server, that is based on Tree Sitter and its grammar for Bash and supports explainshell integration.

Machine Learning

Back to the Top

ML Learning Resources

Kubernetes for Machine Learning on Platform9

Introducing Amazon SageMaker Operators for Kubernetes

Deploying machine learning models on Kubernetes with Google Cloud

Create and attach Azure Kubernetes Service with Azure Machine Learning

Kubernetes for MLOps: Scaling Enterprise Machine Learning, Deep Learning, AI with HPE

Machine Learning by Stanford University from Coursera

Machine Learning Courses Online from Coursera

Machine Learning Courses Online from Udemy

Learn Machine Learning with Online Courses and Classes from edX

ML frameworks & applications

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.

Tensorman is a utility for easy management of Tensorflow containers by developed by System76.Tensorman allows Tensorflow to operate in an isolated environment that is contained from the rest of the system. This virtual environment can operate independent of the base system, allowing you to use any version of Tensorflow on any version of a Linux distribution that supports the Docker runtime.

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.It was developed with a focus on enabling fast experimentation. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, R, Theano, or PlaidML.

PyTorch is a library for deep learning on irregular input data such as graphs, point clouds, and manifolds. Primarily developed by Facebook's AI Research lab.

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.

Azure Databricks is a fast and collaborative Apache Spark-based big data analytics service designed for data science and data engineering. Azure Databricks, sets up your Apache Spark environment in minutes, autoscale, and collaborate on shared projects in an interactive workspace. Azure Databricks supports Python, Scala, R, Java, and SQL, as well as data science frameworks and libraries including TensorFlow, PyTorch, and scikit-learn.

Microsoft Cognitive Toolkit (CNTK) is an open-source toolkit for commercial-grade distributed deep learning. It describes neural networks as a series of computational steps via a directed graph. CNTK allows the user to easily realize and combine popular model types such as feed-forward DNNs, convolutional neural networks (CNNs) and recurrent neural networks (RNNs/LSTMs). CNTK implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers.

Apache Airflow is an open-source workflow management platform created by the community to programmatically author, schedule and monitor workflows. Install. Principles. Scalable. Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers. Airflow is ready to scale to infinity.

Open Neural Network Exchange(ONNX) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves. ONNX provides an open source format for AI models, both deep learning and traditional ML. It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types.

Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scaling effectively to multiple GPUs and multiple machines. Support for Python, R, Julia, Scala, Go, Javascript and more.

AutoGluon is toolkit for Deep learning that automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy deep learning models on tabular, image, and text data.

Anaconda is a very popular Data Science platform for machine learning and deep learning that enables users to develop models, train them, and deploy them.

PlaidML is an advanced and portable tensor compiler for enabling deep learning on laptops, embedded devices, or other devices where the available computing hardware is not well supported or the available software stack contains unpalatable license restrictions.

OpenCV is a highly optimized library with focus on real-time computer vision applications. The C++, Python, and Java interfaces support Linux, MacOS, Windows, iOS, and Android.

Scikit-Learn is a Python module for machine learning built on top of SciPy, NumPy, and matplotlib, making it easier to apply robust and simple implementations of many popular machine learning algorithms.

Weka is an open source machine learning software that can be accessed through a graphical user interface, standard terminal applications, or a Java API. It is widely used for teaching, research, and industrial applications, contains a plethora of built-in tools for standard machine learning tasks, and additionally gives transparent access to well-known toolboxes such as scikit-learn, R, and Deeplearning4j.

Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by Berkeley AI Research (BAIR)/The Berkeley Vision and Learning Center (BVLC) and community contributors.

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently including tight integration with NumPy.

nGraph is an open source C++ library, compiler and runtime for Deep Learning. The nGraph Compiler aims to accelerate developing AI workloads using any deep learning framework and deploying to a variety of hardware targets.It provides the freedom, performance, and ease-of-use to AI developers.

NVIDIA cuDNN is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN accelerates widely used deep learning frameworks, including Caffe2, Chainer, Keras, MATLAB, MxNet, PyTorch, and TensorFlow.

Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Jupyter is used widely in industries that do data cleaning and transformation, numerical simulation, statistical modeling, data visualization, data science, and machine learning.

Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

Apache Spark Connector for SQL Server and Azure SQL is a high-performance connector that enables you to use transactional data in big data analytics and persists results for ad-hoc queries or reporting. The connector allows you to use any SQL database, on-premises or in the cloud, as an input data source or output data sink for Spark jobs.

Apache PredictionIO is an open source machine learning framework for developers, data scientists, and end users. It supports event collection, deployment of algorithms, evaluation, querying predictive results via REST APIs. It is based on scalable open source services like Hadoop, HBase (and other DBs), Elasticsearch, Spark and implements what is called a Lambda Architecture.

Cluster Manager for Apache Kafka(CMAK) is a tool for managing Apache Kafka clusters.

BigDL is a distributed deep learning library for Apache Spark. With BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters.

Koalas is project makes data scientists more productive when interacting with big data, by implementing the pandas DataFrame API on top of Apache Spark.

Apache Spark™ MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow currently offers four components:

MLflow Tracking: Record and query experiments: code, data, config, and results.

MLflow Projects: Package data science code in a format to reproduce runs on any platform.

MLflow Models: Deploy machine learning models in diverse serving environments.

Model Registry: Store, annotate, discover, and manage models in a central repository.

Eclipse Deeplearning4J (DL4J) is a set of projects intended to support all the needs of a JVM-based(Scala, Kotlin, Clojure, and Groovy) deep learning application. This means starting with the raw data, loading and preprocessing it from wherever and whatever format it is in to building and tuning a wide variety of simple and complex deep learning networks.

Numba is an open source, NumPy-aware optimizing compiler for Python sponsored by Anaconda, Inc. It uses the LLVM compiler project to generate machine code from Python syntax. Numba can compile a large subset of numerically-focused Python, including many NumPy functions. Additionally, Numba has support for automatic parallelization of loops, generation of GPU-accelerated code, and creation of ufuncs and C callbacks.

Chainer is a Python-based deep learning framework aiming at flexibility. It provides automatic differentiation APIs based on the define-by-run approach (dynamic computational graphs) as well as object-oriented high-level APIs to build and train neural networks. It also supports CUDA/cuDNN using CuPy for high performance training and inference.

cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions that share compatible APIs with other RAPIDS projects. cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn.

Networking

Back to the Top

Networking Learning Resources

AWS Certified Security - Specialty Certification

Microsoft Certified: Azure Security Engineer Associate

Google Cloud Certified Professional Cloud Security Engineer

Cisco Security Certifications

The Red Hat Certified Specialist in Security: Linux

Linux Professional Institute LPIC-3 Enterprise Security Certification

Cybersecurity Training and Courses from IBM Skills

Cybersecurity Courses and Certifications by Offensive Security

Citrix Certified Associate – Networking(CCA-N)

Citrix Certified Professional – Virtualization(CCP-V)

CCNP Routing and Switching

Certified Information Security Manager(CISM)

Wireshark Certified Network Analyst (WCNA)

Juniper Networks Certification Program Enterprise (JNCP)

Networking courses and specializations from Coursera

Network & Security Courses from Udemy

Network & Security Courses from edX

Networking Tools & Concepts

Qt Network Authorization is a tool that provides a set of APIs that enable Qt applications to obtain limited access to online accounts and HTTP services without exposing users' passwords.

cURL is a computer software project providing a library and command-line tool for transferring data using various network protocols(HTTP, HTTPS, FTP, FTPS, SCP, SFTP, TFTP, DICT, TELNET, LDAP LDAPS, MQTT, POP3, POP3S, RTMP, RTMPS, RTSP, SCP, SFTP, SMB, SMBS, SMTP or SMTPS). cURL is also used in cars, television sets, routers, printers, audio equipment, mobile phones, tablets, settop boxes, media players and is the Internet transfer engine for thousands of software applications in over ten billion installations.

cURL Fuzzer is a quality assurance testing for the curl project.

DoH is a stand-alone application for DoH (DNS-over-HTTPS) name resolves and lookups.

Authelia is an open-source highly-available authentication server providing single sign-on capability and two-factor authentication to applications running behind NGINX.

nginx(engine x) is an HTTP and reverse proxy server, a mail proxy server, and a generic TCP/UDP proxy server, originally written by Igor Sysoev.

Proxmox Virtual Environment(VE) is a complete open-source platform for enterprise virtualization. It inlcudes a built-in web interface that you can easily manage VMs and containers, software-defined storage and networking, high-availability clustering, and multiple out-of-the-box tools on a single solution.

Wireshark is a very popular network protocol analyzer that is commonly used for network troubleshooting, analysis, and communications protocol development. Learn more about the other useful Wireshark Tools available.

HTTPie is a command-line HTTP client. Its goal is to make CLI interaction with web services as human-friendly as possible. HTTPie is designed for testing, debugging, and generally interacting with APIs & HTTP servers.

HTTPStat is a tool that visualizes curl statistics in a simple layout.

Wuzz is an interactive cli tool for HTTP inspection. It can be used to inspect/modify requests copied from the browser's network inspector with the "copy as cURL" feature.

Websocat is a ommand-line client for WebSockets, like netcat (or curl) for ws:// with advanced socat-like functions.

Connection: In networking, a connection refers to pieces of related information that are transferred through a network. This generally infers that a connection is built before the data transfer (by following the procedures laid out in a protocol) and then is deconstructed at the at the end of the data transfer.
Packet: A packet is, generally speaking, the most basic unit that is transferred over a network. When communicating over a network, packets are the envelopes that carry your data (in pieces) from one end point to the other.

Packets have a header portion that contains information about the packet including the source and destination, timestamps, network hops. The main portion of a packet contains the actual data being transferred. It is sometimes called the body or the payload.

Network Interface: A network interface can refer to any kind of software interface to networking hardware. For instance, if you have two network cards in your computer, you can control and configure each network interface associated with them individually.

A network interface may be associated with a physical device, or it may be a representation of a virtual interface. The "loop-back" device, which is a virtual interface to the local machine, is an example of this.

LAN: LAN stands for "local area network". It refers to a network or a portion of a network that is not publicly accessible to the greater internet. A home or office network is an example of a LAN.
WAN: WAN stands for "wide area network". It means a network that is much more extensive than a LAN. While WAN is the relevant term to use to describe large, dispersed networks in general, it is usually meant to mean the internet, as a whole. If an interface is connected to the WAN, it is generally assumed that it is reachable through the internet.
Protocol: A protocol is a set of rules and standards that basically define a language that devices can use to communicate. There are a great number of protocols in use extensively in networking, and they are often implemented in different layers.

Some low level protocols are TCP, UDP, IP, and ICMP. Some familiar examples of application layer protocols, built on these lower protocols, are HTTP (for accessing web content), SSH, TLS/SSL, and FTP.

Port: A port is an address on a single machine that can be tied to a specific piece of software. It is not a physical interface or location, but it allows your server to be able to communicate using more than one application.
Firewall: A firewall is a program that decides whether traffic coming into a server or going out should be allowed. A firewall usually works by creating rules for which type of traffic is acceptable on which ports. Generally, firewalls block ports that are not used by a specific application on a server.
NAT: Network address translation is a way to translate requests that are incoming into a routing server to the relevant devices or servers that it knows about in the LAN. This is usually implemented in physical LANs as a way to route requests through one IP address to the necessary backend servers.
VPN: Virtual private network is a means of connecting separate LANs through the internet, while maintaining privacy. This is used as a means of connecting remote systems as if they were on a local network, often for security reasons.

Network Layers

While networking is often discussed in terms of topology in a horizontal way, between hosts, its implementation is layered in a vertical fashion throughout a computer or network. This means is that there are multiple technologies and protocols that are built on top of each other in order for communication to function more easily. Each successive, higher layer abstracts the raw data a little bit more, and makes it simpler to use for applications and users. It also allows you to leverage lower layers in new ways without having to invest the time and energy to develop the protocols and applications that handle those types of traffic.

As data is sent out of one machine, it begins at the top of the stack and filters downwards. At the lowest level, actual transmission to another machine takes place. At this point, the data travels back up through the layers of the other computer. Each layer has the ability to add its own "wrapper" around the data that it receives from the adjacent layer, which will help the layers that come after decide what to do with the data when it is passed off.

One method of talking about the different layers of network communication is the OSI model. OSI stands for Open Systems Interconnect.This model defines seven separate layers. The layers in this model are:

Application: The application layer is the layer that the users and user-applications most often interact with. Network communication is discussed in terms of availability of resources, partners to communicate with, and data synchronization.
Presentation: The presentation layer is responsible for mapping resources and creating context. It is used to translate lower level networking data into data that applications expect to see.
Session: The session layer is a connection handler. It creates, maintains, and destroys connections between nodes in a persistent way.
Transport: The transport layer is responsible for handing the layers above it a reliable connection. In this context, reliable refers to the ability to verify that a piece of data was received intact at the other end of the connection. This layer can resend information that has been dropped or corrupted and can acknowledge the receipt of data to remote computers.
Network: The network layer is used to route data between different nodes on the network. It uses addresses to be able to tell which computer to send information to. This layer can also break apart larger messages into smaller chunks to be reassembled on the opposite end.
Data Link: This layer is implemented as a method of establishing and maintaining reliable links between different nodes or devices on a network using existing physical connections.
Physical: The physical layer is responsible for handling the actual physical devices that are used to make a connection. This layer involves the bare software that manages physical connections as well as the hardware itself (like Ethernet).

The TCP/IP model, more commonly known as the Internet protocol suite, is another layering model that is simpler and has been widely adopted.It defines the four separate layers, some of which overlap with the OSI model:

Application: In this model, the application layer is responsible for creating and transmitting user data between applications. The applications can be on remote systems, and should appear to operate as if locally to the end user. The communication takes place between peers network.
Transport: The transport layer is responsible for communication between processes. This level of networking utilizes ports to address different services. It can build up unreliable or reliable connections depending on the type of protocol used.
Internet: The internet layer is used to transport data from node to node in a network. This layer is aware of the endpoints of the connections, but does not worry about the actual connection needed to get from one place to another. IP addresses are defined in this layer as a way of reaching remote systems in an addressable manner.
Link: The link layer implements the actual topology of the local network that allows the internet layer to present an addressable interface. It establishes connections between neighboring nodes to send data.

Interfaces

Interfaces are networking communication points for your computer. Each interface is associated with a physical or virtual networking device. Typically, your server will have one configurable network interface for each Ethernet or wireless internet card you have. In addition, it will define a virtual network interface called the "loopback" or localhost interface. This is used as an interface to connect applications and processes on a single computer to other applications and processes. You can see this referenced as the "lo" interface in many tools.

Network Protocols

Networking works by piggybacks on a number of different protocols on top of each other. In this way, one piece of data can be transmitted using multiple protocols encapsulated within one another.

Media Access Control(MAC) is a communications protocol that is used to distinguish specific devices. Each device is supposed to get a unique MAC address during the manufacturing process that differentiates it from every other device on the internet. Addressing hardware by the MAC address allows you to reference a device by a unique value even when the software on top may change the name for that specific device during operation. Media access control is one of the only protocols from the link layer that you are likely to interact with on a regular basis.

The IP protocol is one of the fundamental protocols that allow the internet to work. IP addresses are unique on each network and they allow machines to address each other across a network. It is implemented on the internet layer in the IP/TCP model. Networks can be linked together, but traffic must be routed when crossing network boundaries. This protocol assumes an unreliable network and multiple paths to the same destination that it can dynamically change between. There are a number of different implementations of the protocol. The most common implementation today is IPv4, although IPv6 is growing in popularity as an alternative due to the scarcity of IPv4 addresses available and improvements in the protocols capabilities.

ICMP: internet control message protocol is used to send messages between devices to indicate the availability or error conditions. These packets are used in a variety of network diagnostic tools, such as ping and traceroute. Usually ICMP packets are transmitted when a packet of a different kind meets some kind of a problem. Basically, they are used as a feedback mechanism for network communications.

TCP: Transmission control protocol is implemented in the transport layer of the IP/TCP model and is used to establish reliable connections. TCP is one of the protocols that encapsulates data into packets. It then transfers these to the remote end of the connection using the methods available on the lower layers. On the other end, it can check for errors, request certain pieces to be resent, and reassemble the information into one logical piece to send to the application layer. The protocol builds up a connection prior to data transfer using a system called a three-way handshake. This is a way for the two ends of the communication to acknowledge the request and agree upon a method of ensuring data reliability. After the data has been sent, the connection is torn down using a similar four-way handshake. TCP is the protocol of choice for many of the most popular uses for the internet, including WWW, FTP, SSH, and email. It is safe to say that the internet we know today would not be here without TCP.

UDP: User datagram protocol is a popular companion protocol to TCP and is also implemented in the transport layer. The fundamental difference between UDP and TCP is that UDP offers unreliable data transfer. It does not verify that data has been received on the other end of the connection. This might sound like a bad thing, and for many purposes, it is. However, it is also extremely important for some functions. It’s not required to wait for confirmation that the data was received and forced to resend data, UDP is much faster than TCP. It does not establish a connection with the remote host, it simply fires off the data to that host and doesn't care if it is accepted or not. Since UDP is a simple transaction, it is useful for simple communications like querying for network resources. It also doesn't maintain a state, which makes it great for transmitting data from one machine to many real-time clients. This makes it ideal for VOIP, games, and other applications that cannot afford delays.

HTTP: Hypertext transfer protocol is a protocol defined in the application layer that forms the basis for communication on the web. HTTP defines a number of functions that tell the remote system what you are requesting. For instance, GET, POST, and DELETE all interact with the requested data in a different way.

FTP: File transfer protocol is in the application layer and provides a way of transferring complete files from one host to another. It is inherently insecure, so it is not recommended for any externally facing network unless it is implemented as a public, download-only resource.

DNS: Domain name system is an application layer protocol used to provide a human-friendly naming mechanism for internet resources. It is what ties a domain name to an IP address and allows you to access sites by name in your browser.

SSH: Secure shell is an encrypted protocol implemented in the application layer that can be used to communicate with a remote server in a secure way. Many additional technologies are built around this protocol because of its end-to-end encryption and ubiquity. There are many other protocols that we haven't covered that are equally important. However, this should give you a good overview of some of the fundamental technologies that make the internet and networking possible.

REST(REpresentational State Transfer) is an architectural style for providing standards between computer systems on the web, making it easier for systems to communicate with each other.

JSON Web Token (JWT) is a compact URL-safe means of representing claims to be transferred between two parties. The claims in a JWT are encoded as a JSON object that is digitally signed using JSON Web Signature (JWS).

OAuth 2.0 is an open source authorization framework that enables applications to obtain limited access to user accounts on an HTTP service, such as Amazon, Google, Facebook, Microsoft, Twitter GitHub, and DigitalOcean. It works by delegating user authentication to the service that hosts the user account, and authorizing third-party applications to access the user account.

Virtualization

KVM (for Kernel-based Virtual Machine) is a full virtualization solution for Linux on x86 hardware containing virtualization extensions (Intel VT or AMD-V). It consists of a loadable kernel module, kvm.ko, that provides the core virtualization infrastructure and a processor specific module, kvm-intel.ko or kvm-amd.ko.

QEMU is a fast processor emulator using a portable dynamic translator. QEMU emulates a full system, including a processor and various peripherals. It can be used to launch a different Operating System without rebooting the PC or to debug system code.

Hyper-V enables running virtualized computer systems on top of a physical host. These virtualized systems can be used and managed just as if they were physical computer systems, however they exist in virtualized and isolated environment. Special software called a hypervisor manages access between the virtual systems and the physical hardware resources. Virtualization enables quick deployment of computer systems, a way to quickly restore systems to a previously known good state, and the ability to migrate systems between physical hosts.

VirtManager is a graphical tool for managing virtual machines via libvirt. Most usage is with QEMU/KVM virtual machines, but Xen and libvirt LXC containers are well supported. Common operations for any libvirt driver should work.

oVirt is an open-source distributed virtualization solution, designed to manage your entire enterprise infrastructure. oVirt uses the trusted KVM hypervisor and is built upon several other community projects, including libvirt, Gluster, PatternFly, and Ansible.Founded by Red Hat as a community project on which Red Hat Enterprise Virtualization is based allowing for centralized management of virtual machines, compute, storage and networking resources, from an easy-to-use web-based front-end with platform independent access.

Xen is focused on advancing virtualization in a number of different commercial and open source applications, including server virtualization, Infrastructure as a Services (IaaS), desktop virtualization, security applications, embedded and hardware appliances, and automotive/aviation.

Ganeti is a virtual machine cluster management tool built on top of existing virtualization technologies such as Xen or KVM and other open source software. Once installed, the tool assumes management of the virtual instances (Xen DomU).

Packer is an open source tool for creating identical machine images for multiple platforms from a single source configuration. Packer is lightweight, runs on every major operating system, and is highly performant, creating machine images for multiple platforms in parallel. Packer does not replace configuration management like Chef or Puppet. In fact, when building images, Packer is able to use tools like Chef or Puppet to install software onto the image.

Vagrant is a tool for building and managing virtual machine environments in a single workflow. With an easy-to-use workflow and focus on automation, Vagrant lowers development environment setup time, increases production parity, and makes the "works on my machine" excuse a relic of the past. It provides easy to configure, reproducible, and portable work environments built on top of industry-standard technology and controlled by a single consistent workflow to help maximize the productivity and flexibility of you and your team.

VMware Workstation is a hosted hypervisor that runs on x64 versions of Windows and Linux operating systems; it enables users to set up virtual machines on a single physical machine, and use them simultaneously along with the actual machine.

VirtualBox is a powerful x86 and AMD64/Intel64 virtualization product for enterprise as well as home use. Not only is VirtualBox an extremely feature rich, high performance product for enterprise customers.

Databases

Back to the Top

Database Learning Resources

SQL is a standard language for storing, manipulating and retrieving data in relational databases.

SQL Tutorial by W3Schools

Learn SQL Skills Online from Coursera

SQL Courses Online from Udemy

SQL Online Training Courses from LinkedIn Learning

Learn SQL For Free from Codecademy

GitLab's SQL Style Guide

OracleDB SQL Style Guide Basics

Tableau CRM: BI Software and Tools

Databases on AWS

Best Practices and Recommendations for SQL Server Clustering in AWS EC2.

Connecting from Google Kubernetes Engine to a Cloud SQL instance.

Educational Microsoft Azure SQL resources

MySQL Certifications

SQL vs. NoSQL Databases: What's the Difference?

What is NoSQL?

Databases and Tools

Azure Data Studio is an open source data management tool that enables working with SQL Server, Azure SQL DB and SQL DW from Windows, macOS and Linux.

Azure SQL Database is the intelligent, scalable, relational database service built for the cloud. It’s evergreen and always up to date, with AI-powered and automated features that optimize performance and durability for you. Serverless compute and Hyperscale storage options automatically scale resources on demand, so you can focus on building new applications without worrying about storage size or resource management.

Azure SQL Managed Instance is a fully managed SQL Server Database engine instance that's hosted in Azure and placed in your network. This deployment model makes it easy to lift and shift your on-premises applications to the cloud with very few application and database changes. Managed instance has split compute and storage components.

Azure Synapse Analytics is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless or provisioned resources at scale. It brings together the best of the SQL technologies used in enterprise data warehousing, Spark technologies used in big data analytics, and Pipelines for data integration and ETL/ELT.

MSSQL for Visual Studio Code is an extension for developing Microsoft SQL Server, Azure SQL Database and SQL Data Warehouse everywhere with a rich set of functionalities.

SQL Server Data Tools (SSDT) is a development tool for building SQL Server relational databases, Azure SQL Databases, Analysis Services (AS) data models, Integration Services (IS) packages, and Reporting Services (RS) reports. With SSDT, a developer can design and deploy any SQL Server content type with the same ease as they would develop an application in Visual Studio or Visual Studio Code.

Bulk Copy Program is a command-line tool that comes with Microsoft SQL Server. BCP, allows you to import and export large amounts of data in and out of SQL Server databases quickly snd efficeiently.

SQL Server Migration Assistant is a tool from Microsoft that simplifies database migration process from Oracle to SQL Server, Azure SQL Database, Azure SQL Database Managed Instance and Azure SQL Data Warehouse.

SQL Server Integration Services is a development platform for building enterprise-level data integration and data transformations solutions. Use Integration Services to solve complex business problems by copying or downloading files, loading data warehouses, cleansing and mining data, and managing SQL Server objects and data.

SQL Server Business Intelligence(BI) is a collection of tools in Microsoft's SQL Server for transforming raw data into information businesses can use to make decisions.

Tableau is a Data Visualization software used in relational databases, cloud databases, and spreadsheets. Tableau was acquired by Salesforce in August 2019.

DataGrip is a professional DataBase IDE developed by Jet Brains that provides context-sensitive code completion, helping you to write SQL code faster. Completion is aware of the tables structure, foreign keys, and even database objects created in code you're editing.

RStudio is an integrated development environment for R and Python, with a console, syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging and workspace management.

MySQL is a fully managed database service to deploy cloud-native applications using the world's most popular open source database.

PostgreSQL is a powerful, open source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It is a fully managed, multiregion, multimaster, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications.

FoundationDB is an open source distributed database designed to handle large volumes of structured data across clusters of commodity servers. It organizes data as an ordered key-value store and employs ACID transactions for all operations. It is especially well-suited for read/write workloads but also has excellent performance for write-intensive workloads. FoundationDB was acquired by Apple in 2015.

CouchbaseDB is an open source distributed multi-model NoSQL document-oriented database. It creates a key-value store with managed cache for sub-millisecond data operations, with purpose-built indexers for efficient queries and a powerful query engine for executing SQL queries.

IBM DB2 is a collection of hybrid data management products offering a complete suite of AI-empowered capabilities designed to help you manage both structured and unstructured data on premises as well as in private and public cloud environments. Db2 is built on an intelligent common SQL engine designed for scalability and flexibility.

MongoDB is a document database meaning it stores data in JSON-like documents.

OracleDB is a powerful fully managed database helps developers manage business-critical data with the highest availability, reliability, and security.

MariaDB is an enterprise open source database solution for modern, mission-critical applications.

SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine.SQLite is the most used database engine in the world. SQLite is built into all mobile phones and most computers and comes bundled inside countless other applications that people use every day.

SQLite Database Browser is an open source SQL tool that allows users to create, design and edits SQLite database files. It lets users show a log of all the SQL commands that have been issued by them and by the application itself.

dbWatch is a complete database monitoring/management solution for SQL Server, Oracle, PostgreSQL, Sybase, MySQL and Azure. Designed for proactive management and automation of routine maintenance in large scale on-premise, hybrid/cloud database environments.

Cosmos DB Profiler is a real-time visual debugger allowing a development team to gain valuable insight and perspective into their usage of Cosmos DB database. It identifies over a dozen suspicious behaviors from your application’s interaction with Cosmos DB.

Adminer is an SQL management client tool for managing databases, tables, relations, indexes, users. Adminer has support for all the popular database management systems such as MySQL, MariaDB, PostgreSQL, SQLite, MS SQL, Oracle, Firebird, SimpleDB, Elasticsearch and MongoDB.

DBeaver is an open source database tool for developers and database administrators. It offers supports for JDBC compliant databases such as MySQL, Oracle, IBM DB2, SQL Server, Firebird, SQLite, Sybase, Teradata, Firebird, Apache Hive, Phoenix, and Presto.

DbVisualizer is a SQL management tool that allows users to manage a wide range of databases such as Oracle, Sybase, SQL Server, MySQL, H3, and SQLite.

AppDynamics Database is a management product for Microsoft SQL Server. With AppDynamics you can monitor and trend key performance metrics such as resource consumption, database objects, schema statistics and more, allowing you to proactively tune and fix issues in a High-Volume Production Environment.

Toad is a SQL Server DBMS toolset developed by Quest. It increases productivity by using extensive automation, intuitive workflows, and built-in expertise. This SQL management tool resolve issues, manage change and promote the highest levels of code quality for both relational and non-relational databases.

Lepide SQL Server is an open source storage manager utility to analyse the performance of SQL Servers. It provides a complete overview of all configuration and permission changes being made to your SQL Server environment through an easy-to-use, graphical user interface.

Sequel Pro is a fast MacOS database management tool for working with MySQL. This SQL management tool helpful for interacting with your database by easily to adding new databases, new tables, and new rows.

Telco 5G

Back to the Top

VMware Cloud First Approach. Source: VMware.

VMware Telco Cloud Automation Components. Source: VMware.

Telco Learning Resources

HPE(Hewlett Packard Enterprise) Telco Blueprints overview

Network Functions Virtualization Infrastructure (NFVI) by Cisco

Introduction to vCloud NFV Telco Edge from VMware

VMware Telco Cloud Automation(TCA) Architecture Overview

5G Telco Cloud from VMware

Maturing OpenStack Together To Solve Telco Needs from Red Hat

Red Hat telco ecosystem program

OpenStack for Telcos by Canonical

Open source NFV platform for 5G from Ubuntu

Understanding 5G Technology from Verizon

Verizon and Unity partner to enable 5G & MEC gaming and enterprise applications

Understanding 5G Technology from Intel

Understanding 5G Technology from Qualcomm

Telco Acceleration with Xilinx

VIMs on OSM Public Wiki

Amazon EC2 Overview and Networking Introduction for Telecom Companies

Citrix Certified Associate – Networking(CCA-N)

Citrix Certified Professional – Virtualization(CCP-V)

CCNP Routing and Switching

Certified Information Security Manager(CISM)

Wireshark Certified Network Analyst (WCNA)

Juniper Networks Certification Program Enterprise (JNCP)

Cloud Native Computing Foundation Training and Certification Program

Tools

Open Stack is an open source cloud platform, deployed as infrastructure-as-a-service (IaaS) to orchestrate data center operations on bare metal, private cloud hardware, public cloud resources, or both (hybrid/multi-cloud architecture). OpenStack includes advance use of virtualization & SDN for network traffic optimization to handle the core cloud-computing services of compute, networking, storage, identity, and image services.

StarlingX is a complete cloud infrastructure software stack for the edge used by the most demanding applications in industrial IOT, telecom, video delivery and other ultra-low latency use cases.

Airship is a collection of open source tools for automating cloud provisioning and management. Airship provides a declarative framework for defining and managing the life cycle of open infrastructure tools and the underlying hardware.

Network functions virtualization (NFV) is the replacement of network appliance hardware with virtual machines. The virtual machines use a hypervisor to run networking software and processes such as routing and load balancing. NFV allows for the separation of communication services from dedicated hardware, such as routers and firewalls. This separation means network operations can provide new services dynamically and without installing new hardware. Deploying network components with network functions virtualization only takes hours compared to months like with traditional networking solutions.

Software Defined Networking (SDN) is an approach to networking that uses software-based controllers or application programming interfaces (APIs) to communicate with underlying hardware infrastructure and direct traffic on a network. This model differs from that of traditional networks, which use dedicated hardware devices (routers and switches) to control network traffic.

Virtualized Infrastructure Manager (VIM) is a service delivery and reduce costs with high performance lifecycle management Manage the full lifecycle of the software and hardware comprising your NFV infrastructure (NFVI), and maintaining a live inventory and allocation plan of both physical and virtual resources.

Management and Orchestration(MANO) is an ETSI-hosted initiative to develop an Open Source NFV Management and Orchestration (MANO) software stack aligned with ETSI NFV. Two of the key components of the ETSI NFV architectural framework are the NFV Orchestrator and VNF Manager, known as NFV MANO.

Magma is an open source software platform that gives network operators an open, flexible and extendable mobile core network solution. Their mission is to connect the world to a faster network by enabling service providers to build cost-effective and extensible carrier-grade networks. Magma is 3GPP generation (2G, 3G, 4G or upcoming 5G networks) and access network agnostic (cellular or WiFi). It can flexibly support a radio access network with minimal development and deployment effort.

OpenRAN is an intelligent Radio Access Network(RAN) integrated on general purpose platforms with open interface between software defined functions. Open RANecosystem enables enormous flexibility and interoperability with a complete openess to multi-vendor deployments.

Open vSwitch(OVS)is an open source production quality, multilayer virtual switch licensed under the open source Apache 2.0 license. It is designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (NetFlow, sFlow, IPFIX, RSPAN, CLI, LACP, 802.1ag).

Edge is a distributed computing framework that brings enterprise applications closer to data sources such as IoT devices or local edge servers. This proximity to data at its source can deliver strong business benefits, including faster insights, improved response times and better bandwidth availability.

Multi-access edge computing (MEC) is an Industry Specification Group (ISG) within ETSI to create a standardized, open environment which will allow the efficient and seamless integration of applications from vendors, service providers, and third-parties across multi-vendor Multi-access Edge Computing platforms.

Virtualized network functions(VNFs) is a software application used in a Network Functions Virtualization (NFV) implementation that has well defined interfaces, and provides one or more component networking functions in a defined way. For example, a security VNF provides Network Address Translation (NAT) and firewall component functions.

Cloud-Native Network Functions(CNF) is a network function designed and implemented to run inside containers. CNFs inherit all the cloud native architectural and operational principles including Kubernetes(K8s) lifecycle management, agility, resilience, and observability.

Physical Network Function(PNF) is a physical network node which has not undergone virtualization. Both PNFs and VNFs (Virtualized Network Functions) can be used to form an overall Network Service.

Network functions virtualization infrastructure(NFVI) is the foundation of the overall NFV architecture. It provides the physical compute, storage, and networking hardware that hosts the VNFs. Each NFVI block can be thought of as an NFVI node and many nodes can be deployed and controlled geographically.

Open Source Security

Back to the Top

Open Source Security Foundation (OpenSSF) is a cross-industry collaboration that brings together leaders to improve the security of open source software by building a broader community, targeted initiatives, and best practices. The OpenSSF brings together open source security initiatives under one foundation to accelerate work through cross-industry support. Along with the Core Infrastructure Initiative and the Open Source Security Coalition, and will include new working groups that address vulnerability disclosures, security tooling and more.

Security Standards, Frameworks and Benchmarks

STIGs Benchmarks - Security Technical Implementation Guides

CIS Benchmarks - CIS Center for Internet Security

CIS Top 18 Critical Security Controls

OSSTMM (Open Source Security Testing Methodology Manual) PDF

NIST Technical Guide to Information Security Testing and Assessment (PDF)

NIST - Current FIPS

ISO Standards Catalogue

Common Criteria for Information Technology Security Evaluation (CC) is an international standard (ISO / IEC 15408) for computer security. It allows an objective evaluation to validate that a particular product satisfies a defined set of security requirements.

ISO 22301 is the international standard that provides a best-practice framework for implementing an optimised BCMS (business continuity management system).

ISO27001 is the international standard that describes the requirements for an ISMS (information security management system). The framework is designed to help organizations manage their security practices in one place, consistently and cost-effectively.

ISO 27701 specifies the requirements for a PIMS (privacy information management system) based on the requirements of ISO 27001. It is extended by a set of privacy-specific requirements, control objectives and controls. Companies that have implemented ISO 27001 will be able to use ISO 27701 to extend their security efforts to cover privacy management.

EU GDPR (General Data Protection Regulation) is a privacy and data protection law that supersedes existing national data protection laws across the EU, bringing uniformity by introducing just one main data protection law for companies/organizations to comply with.

CCPA (California Consumer Privacy Act) is a data privacy law that took effect on January 1, 2020 in the State of California. It applies to businesses that collect California residents’ personal information, and its privacy requirements are similar to those of the EU’s GDPR (General Data Protection Regulation).

Payment Card Industry (PCI) Data Security Standards (DSS) is a global information security standard designed to prevent fraud through increased control of credit card data.

SOC 2 is an auditing procedure that ensures your service providers securely manage your data to protect the interests of your comapny/organization and the privacy of their clients.

NIST CSF is a voluntary framework primarily intended for critical infrastructure organizations to manage and mitigate cybersecurity risk based on existing best practice.

Security Tools

AppArmor is an effective and easy-to-use Linux application security system. AppArmor proactively protects the operating system and applications from external or internal threats, even zero-day attacks, by enforcing good behavior and preventing both known and unknown application flaws from being exploited. AppArmor supplements the traditional Unix discretionary access control (DAC) model by providing mandatory access control (MAC). It has been included in the mainline Linux kernel since version 2.6.36 and its development has been supported by Canonical since 2009.

SELinux is a security enhancement to Linux which allows users and administrators more control over access control. Access can be constrained on such variables as which users and applications can access which resources. These resources may take the form of files. Standard Linux access controls, such as file modes (-rwxr-xr-x) are modifiable by the user and the applications which the user runs. Conversely, SELinux access controls are determined by a policy loaded on the system which may not be changed by careless users or misbehaving applications.

Control Groups(Cgroups) is a Linux kernel feature that allows you to allocate resources such as CPU time, system memory, network bandwidth, or any combination of these resources for user-defined groups of tasks (processes) running on a system.

EarlyOOM is a daemon for Linux that enables users to more quickly recover and regain control over their system in low-memory situations with heavy swap usage.

Libgcrypt is a general purpose cryptographic library originally based on code from GnuPG.

Pi-hole is a DNS sinkhole that protects your devices from unwanted content, without installing any client-side software, intended for use on a private network. It is designed for use on embedded devices with network capability, such as the Raspberry Pi, but it can be used on other machines running Linux and cloud implementations.

Aircrack-ng is a network software suite consisting of a detector, packet sniffer, WEP and WPA/WPA2-PSK cracker and analysis tool for 802.11 wireless LANs. It works with any wireless network interface controller whose driver supports raw monitoring mode and can sniff 802.11a, 802.11b and 802.11g traffic.

Acra is a single database security suite with 9 strong security controls: application level encryption, searchable encryption, data masking, data tokenization, secure authentication, data leakage prevention, database request firewall, cryptographically signed audit logging, security events automation. It is designed to cover the most important data security requirements with SQL and NoSQL databases and distributed apps in a fast, convenient, and reliable way.

Netdata is high-fidelity infrastructure monitoring and troubleshooting, real-time monitoring Agent collects thousands of metrics from systems, hardware, containers, and applications with zero configuration. It runs permanently on all your physical/virtual servers, containers, cloud deployments, and edge/IoT devices, and is perfectly safe to install on your systems mid-incident without any preparation.

Trivy is a comprehensive security scanner for vulnerabilities in container images, file systems, and Git repositories, as well as for configuration issues and hard-coded secrets.

Lynis is a security auditing tool for Linux, macOS, and UNIX-based systems. Assists with compliance testing (HIPAA/ISO27001/PCI DSS) and system hardening. Agentless, and installation optional.

OWASP Nettacker is a project created to automate information gathering, vulnerability scanning and eventually generating a report for networks, including services, bugs, vulnerabilities, misconfigurations, and other information. This software will utilize TCP SYN, ACK, ICMP, and many other protocols in order to detect and bypass Firewall/IDS/IPS devices.

Terrascan is a static code analyzer for Infrastructure as Code to mitigate risk before provisioning cloud native infrastructure.

Sliver is an open source cross-platform adversary emulation/red team framework, it can be used by organizations of all sizes to perform security testing. Sliver's implants support C2 over Mutual TLS (mTLS), WireGuard, HTTP(S), and DNS and are dynamically compiled with per-binary asymmetric encryption keys.

Attack Surface Analyzer is a Microsoft developed open source security tool that analyzes the attack surface of a target system and reports on potential security vulnerabilities introduced during the installation of software or system misconfiguration.

Intel Owl is an Open Source Intelligence, or OSINT solution to get threat intelligence data about a specific file, an IP or a domain from a single API at scale. It integrates a number of analyzers available online and a lot of cutting-edge malware analysis tools.

Deepfence ThreatMapper is a runtime tool that hunts for vulnerabilities in your cloud native production platforms(Linux, K8s, AWS Fargate and more.), and ranks these vulnerabilities based on their risk-of-exploit.

Dockle is a Container Image Linter for Security and helping build the Best-Practice Docker Image.

RustScan is a Modern Port Scanner.

gosec is a Golang Security Checker that inspects source code for security problems by scanning the Go AST.

Prowler is an Open Source security tool to perform AWS security best practices assessments, audits, incident response, continuous monitoring, hardening and forensics readiness. It contains more than 240 controls covering CIS, PCI-DSS, ISO27001, GDPR, HIPAA, FFIEC, SOC2, AWS FTR, ENS and custom security frameworks.

Burp Suite is a leading range of cybersecurity tools.

KernelCI is a community-based open source distributed test automation system focused on upstream kernel development. The primary goal of KernelCI is to use an open testing philosophy to ensure the quality, stability and long-term maintenance of the Linux kernel.

Continuous Kernel Integration project helps find bugs in kernel patches before they are commited to an upstram kernel tree. We are team of kernel developers, kernel testers, and automation engineers.

eBPF is a revolutionary technology that can run sandboxed programs in the Linux kernel without changing kernel source code or loading kernel modules. By making the Linux kernel programmable, infrastructure software can leverage existing layers, making them more intelligent and feature-rich without continuing to add additional layers of complexity to the system.

Cilium uses eBPF to accelerate getting data in and out of L7 proxies such as Envoy, enabling efficient visibility into API protocols like HTTP, gRPC, and Kafka.

Hubble is a Network, Service & Security Observability for Kubernetes using eBPF.

Istio is an open platform to connect, manage, and secure microservices. Istio's control plane provides an abstraction layer over the underlying cluster management platform, such as Kubernetes and Mesos.

Certgen is a convenience tool to generate and store certificates for Hubble Relay mTLS.

Scapy is a python-based interactive packet manipulation program & library.

syzkaller is an unsupervised, coverage-guided kernel fuzzer.

SchedViz is a tool for gathering and visualizing kernel scheduling traces on Linux machines.

oss-fuzz aims to make common open source software more secure and stable by combining modern fuzzing techniques with scalable, distributed execution.

OSSEC is a free, open-source host-based intrusion detection system. It performs log analysis, integrity checking, Windows registry monitoring, rootkit detection, time-based alerting, and active response.

Metasploit Project is a computer security project that provides information about security vulnerabilities and aids in penetration testing and IDS signature development.

Wfuzz was created to facilitate the task in web applications assessments and it is based on a simple concept: it replaces any reference to the FUZZ keyword by the value of a given payload.

Nmap is a security scanner used to discover hosts and services on a computer network, thus building a "map" of the network.

Patchwork is a web-based patch tracking system designed to facilitate the contribution and management of contributions to an open-source project.

pfSense is a free and open source firewall and router that also features unified threat management, load balancing, multi WAN, and more.

Snowpatch is a continuous integration tool for projects using a patch-based, mailing-list-centric git workflow. This workflow is used by a number of well-known open source projects such as the Linux kernel.

Snort is an open-source, free and lightweight network intrusion detection system (NIDS) software for Linux and Windows to detect emerging threats.

Wireshark is a free and open-source packet analyzer. It is used for network troubleshooting, analysis, software and communications protocol development, and education.

OpenSCAP is U.S. standard maintained by National Institute of Standards and Technology (NIST). It provides multiple tools to assist administrators and auditors with assessment, measurement, and enforcement of security baselines. OpenSCAP maintains great flexibility and interoperability by reducing the costs of performing security audits. Whether you want to evaluate DISA STIGs, NIST‘s USGCB, or Red Hat’s Security Response Team’s content, all are supported by OpenSCAP.

Tink is a multi-language, cross-platform, open source library that provides cryptographic APIs that are secure, easy to use correctly, and harder to misuse.

OWASP is an online community, produces freely-available articles, methodologies, documentation, tools, and technologies in the field of web application security.

Open Vulnerability and Assessment Language is a community effort to standardize how to assess and report upon the machine state of computer systems. OVAL includes a language to encode system details, and community repositories of content. Tools and services that use OVAL provide enterprises with accurate, consistent, and actionable information to improve their security.

ClamAV is an open source antivirus engine for detecting trojans, viruses, malware & other malicious threats.

Security Tutorials & Resources

Security Certifications

Contribute

If would you like to contribute to this guide simply make a Pull Request.

License

Back to the Top

Distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) Public License.

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
Contributing.md		Contributing.md
Getting Started with Kubernetes.go		Getting Started with Kubernetes.go
README.md		README.md
Security Glossary.md		Security Glossary.md

mikeroyal/Kubernetes-Guide

Folders and files

Latest commit

History

Repository files navigation

Kubernetes Guide

A guide covering Kubernetes including the applications and tools that will make you a better and more efficient Kubernetes developer.

Table of Contents

Getting Started with Kubernetes

Developer Resources

Kubernetes Courses & Certifications

Kubernetes Books

YouTube Tutorials

Red Hat CodeReady Containers (CRC) on WSL

Interacting with the cluster. The most common ways include:

Setting up Podman on WSL

Setting up Buildah on WSL

Installing Kubernetes on WSL with Rancher Desktop

.deb Dev Repository

.rpm Dev Repository

Installing Kubernetes on WSL with Docker Desktop

Installing Kubernetes on WSL with Microk8s

Kubernetes Tools, Frameworks, and Projects

Getting Started with OpenShift

What is OpenShift?

OpenShift Developer Resources

Certifications & Courses

Books

Source-to-Image (S2I) images for programming/buildng your Apps

Java

Python

Golang

Ruby

.NET Core

Node.js

Perl

PHP

Builder Images for setting up Databases

MySQL

PostgreSQL

MongoDB

MariaDB

Redis

Setting up on Microsoft Azure

Register the Resource Providers

Setting up on Google Cloud (GCP)

Minimum Requirements:

Setting up Red Hat OpenShift Data Science

Red Hat CodeReady Containers (CRC)

Interacting with the cluster. The most common ways include:

Setting up Podman

Setting up Buildah

Setting up Skopeo

File systems

OpenShift Tools

Go Development

Go Learning Resources

Go Tools

Python Development

Python Learning Resources

Python Frameworks and Tools

Bash/PowerShell Development

Bash/PowerShell Learning Resources

Bash/ PowerShell Tools

Machine Learning

ML Learning Resources

ML frameworks & applications

Networking

Networking Learning Resources

Networking Tools & Concepts

Network Layers

Interfaces

Network Protocols

Virtualization

Databases

Database Learning Resources

Databases and Tools

Telco 5G

Telco Learning Resources

Tools

Packages