Skip to content

Latest commit

 

History

History
72 lines (59 loc) · 3.92 KB

4.2.1. High-Performance Compute (HPC).md

File metadata and controls

72 lines (59 loc) · 3.92 KB

High-Performance Compute (HPC)

  • Typically describes the aggregation of complex processes across many different machines thereby maximizing the computing power of all of the machines.
  • Used for massively scalable parallel processing of memory-intensive workloads
    • e.g. 3D rendering and tasks that process, transform, and analyze large volumes of data.
  • For other cloud-based and even hybrid solutions, Azure provides access to both Windows and Linux HPC clusters.
  • Through HPC in the cloud, one could create enough compute instances to create a model or perform a calculation and then destroy the instances immediately afterward.
  • Advancements in the HPC allows machines to share memory or communicate with each other in a low latency manner.
  • Features of Azure that support HPC

Remote Direct Memory Access

  • Remote Direct Memory Access, or RDMA, is a technology that provides a low-latency network connection between processing running on two servers, or virtual machines in Azure.
  • From a developer perspective, RDMA is implemented in a way to make it seem that the machines are "sharing memory."
  • RDMA is efficient because it copies data from the network adapter directly to memory and avoids wasting CPU cycles.
  • VM sizes A8 to A19 => Most efficient way to run

HPC Pack

  • Microsoft's HPC cluster and job management solution for Windows.
  • Can be installed on a server that functions as the head node
    • The server can be used to manage compute nodes in an HPC cluster.
  • Can be in hybrid scenarios where you want to "burst to Azure" with A8 or A9 instances to obtain more processing power.
  • Supports several Linux distributions to run on compute nodes deployed in Azure VMs, managed by a Windows Server head node.

Azure Batch

  • Managed HPC pack offering on Azure.
  • Provides
    • Auto-scaling
    • Job scheduling
    • compute resource management
      • VMs of compute nodes
  • Includes end-of-cycle processing such as a bank's daily risk reporting or a payroll that must be done on schedule.
  • Includes large-scale business, science, and engineering applications that typically need the tools and resources of a compute cluster or grid.
  • Batch works well with intrinsically parallel (sometimes called "embarrassingly parallel") applications or workloads, which lend themselves to running as parallel tasks on multiple computers.

Concepts

  • Pool
    • Number and size of machine
    • E.g. 100 machines, n1 series.
  • Job
    • Includes one or more code packages.
  • Task
    • Task to execute code package.
  • Execution:
    1. Starts all VMs (takes time)
    2. Executes code
    3. Removes all VMs

Scaling out Parallel Workloads

  • Through the Batch API
  • Can be managed as part of a larger workflow managed by tools such as Azure Data Factory.
  • You can wrap an existing application, so it runs as a service on a pool of compute nodes that Batch manages in the background.
    • You can develop the service to let users offload peak work to the cloud, or run their work entirely in the cloud.
    • The Batch Apps framework handles the movement of input and output files, the splitting of jobs into tasks, job and task processing, and data persistence.

Stateless component workloads

  • You do not have to rely on the pre-defined HPC services.
  • Azure provides non-managed solutions and related services to provide a degree of HPC using IaaS.
    • These include: • Virtual Machines • VM scale Sets • Azure Container Services • HDInsight • Machine Learning and more.
    • Azure fabric is a good platform to deploy parallel compute tasks on several services.
  • 💡 Best practices:
    • Azure Resource manager to automate deployment.
    • Auto scale based on various criteria.
      • E.g. deploying a scale set of VMs with auto-scale based on a custom metric with HPC Pack.