High-Performance Compute (HPC)

Typically describes the aggregation of complex processes across many different machines thereby maximizing the computing power of all of the machines.
Used for massively scalable parallel processing of memory-intensive workloads
- e.g. 3D rendering and tasks that process, transform, and analyze large volumes of data.
For other cloud-based and even hybrid solutions, Azure provides access to both Windows and Linux HPC clusters.
Through HPC in the cloud, one could create enough compute instances to create a model or perform a calculation and then destroy the instances immediately afterward.
Advancements in the HPC allows machines to share memory or communicate with each other in a low latency manner.
Features of Azure that support HPC
- HPC Pack
- Azure Batch for short-term massively scalable infrastructure
- Stateless Component Workloads.

Remote Direct Memory Access

Remote Direct Memory Access, or RDMA, is a technology that provides a low-latency network connection between processing running on two servers, or virtual machines in Azure.
From a developer perspective, RDMA is implemented in a way to make it seem that the machines are "sharing memory."
RDMA is efficient because it copies data from the network adapter directly to memory and avoids wasting CPU cycles.
VM sizes A8 to A19 => Most efficient way to run

Microsoft's HPC cluster and job management solution for Windows.
Can be installed on a server that functions as the head node
- The server can be used to manage compute nodes in an HPC cluster.
Can be in hybrid scenarios where you want to "burst to Azure" with A8 or A9 instances to obtain more processing power.
Supports several Linux distributions to run on compute nodes deployed in Azure VMs, managed by a Windows Server head node.

Managed HPC pack offering on Azure.
Provides
- Auto-scaling
- Job scheduling
- compute resource management
  - VMs of compute nodes
Includes end-of-cycle processing such as a bank's daily risk reporting or a payroll that must be done on schedule.
Includes large-scale business, science, and engineering applications that typically need the tools and resources of a compute cluster or grid.
Batch works well with intrinsically parallel (sometimes called "embarrassingly parallel") applications or workloads, which lend themselves to running as parallel tasks on multiple computers.

Pool
- Number and size of machine
- E.g. 100 machines, n1 series.
Job
- Includes one or more code packages.
Task
- Task to execute code package.
Execution:
1. Starts all VMs (takes time)
2. Executes code
3. Removes all VMs

Through the Batch API
Can be managed as part of a larger workflow managed by tools such as Azure Data Factory.
You can wrap an existing application, so it runs as a service on a pool of compute nodes that Batch manages in the background.
- You can develop the service to let users offload peak work to the cloud, or run their work entirely in the cloud.
- The Batch Apps framework handles the movement of input and output files, the splitting of jobs into tasks, job and task processing, and data persistence.

You do not have to rely on the pre-defined HPC services.
Azure provides non-managed solutions and related services to provide a degree of HPC using IaaS.
- These include: • Virtual Machines • VM scale Sets • Azure Container Services • HDInsight • Machine Learning and more.
- Azure fabric is a good platform to deploy parallel compute tasks on several services.
💡 Best practices:
- Azure Resource manager to automate deployment.
- Auto scale based on various criteria.
  - E.g. deploying a scale set of VMs with auto-scale based on a custom metric with HPC Pack.