- Typically describes the aggregation of complex processes across many different machines thereby maximizing the computing power of all of the machines.
- Used for massively scalable parallel processing of memory-intensive workloads
- e.g. 3D rendering and tasks that process, transform, and analyze large volumes of data.
- For other cloud-based and even hybrid solutions, Azure provides access to both Windows and Linux HPC clusters.
- Through HPC in the cloud, one could create enough compute instances to create a model or perform a calculation and then destroy the instances immediately afterward.
- Advancements in the HPC allows machines to share memory or communicate with each other in a low latency manner.
- Features of Azure that support HPC
- HPC Pack
- Azure Batch for short-term massively scalable infrastructure
- Stateless Component Workloads.
- Remote Direct Memory Access, or RDMA, is a technology that provides a low-latency network connection between processing running on two servers, or virtual machines in Azure.
- From a developer perspective, RDMA is implemented in a way to make it seem that the machines are "sharing memory."
- RDMA is efficient because it copies data from the network adapter directly to memory and avoids wasting CPU cycles.
- VM sizes A8 to A19 => Most efficient way to run
- Microsoft's HPC cluster and job management solution for Windows.
- Can be installed on a server that functions as the head node
- The server can be used to manage compute nodes in an HPC cluster.
- Can be in hybrid scenarios where you want to "burst to Azure" with A8 or A9 instances to obtain more processing power.
- Supports several Linux distributions to run on compute nodes deployed in Azure VMs, managed by a Windows Server head node.
- Managed HPC pack offering on Azure.
- Provides
- Auto-scaling
- Job scheduling
- compute resource management
- VMs of compute nodes
- Includes end-of-cycle processing such as a bank's daily risk reporting or a payroll that must be done on schedule.
- Includes large-scale business, science, and engineering applications that typically need the tools and resources of a compute cluster or grid.
- Batch works well with intrinsically parallel (sometimes called "embarrassingly parallel") applications or workloads, which lend themselves to running as parallel tasks on multiple computers.
- Pool
- Number and size of machine
- E.g. 100 machines, n1 series.
- Job
- Includes one or more code packages.
- Task
- Task to execute code package.
- Execution:
- Starts all VMs (takes time)
- Executes code
- Removes all VMs
- Through the Batch API
- Can be managed as part of a larger workflow managed by tools such as Azure Data Factory.
- You can wrap an existing application, so it runs as a service on a pool of compute nodes that Batch manages in the background.
- You can develop the service to let users offload peak work to the cloud, or run their work entirely in the cloud.
- The Batch Apps framework handles the movement of input and output files, the splitting of jobs into tasks, job and task processing, and data persistence.
- You do not have to rely on the pre-defined HPC services.
- Azure provides non-managed solutions and related services to provide a degree of HPC using IaaS.
- These include: • Virtual Machines • VM scale Sets • Azure Container Services • HDInsight • Machine Learning and more.
- Azure fabric is a good platform to deploy parallel compute tasks on several services.
- 💡 Best practices:
- Azure Resource manager to automate deployment.
- Auto scale based on various criteria.
- E.g. deploying a scale set of VMs with auto-scale based on a custom metric with HPC Pack.