forked from E4S-Project/ECP-ST-CAR-PUBLIC
-
Notifications
You must be signed in to change notification settings - Fork 0
/
2.3.1.16-SICM.tex
18 lines (16 loc) · 5.52 KB
/
2.3.1.16-SICM.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
\subsubsection{\stid{1.16} SICM}
\paragraph{Overview} The goal of this project is to create a universal interface for discovering, managing and sharing within complex memory hierarchies. The result will be a memory API and a software library which implements it. These will allow operating system, runtime and application developers and vendors to access emerging memory technologies. The impact of the project will be immediate and potentially wide reaching, as developers in all areas are struggling to add support for the new memory technologies, each of which offers their own programming interface. The problem we are addressing is how to program the deluge of existing and emerging complex memory technologies on HPC systems. This includes the MCDRAM (on Intel Knights Landing), NV-DIMM, PCI-E NVM, SATA NVM, 3D stacked memory, PCM, memristor, and 3Dxpoint. Also, near node technologies, such as PCI-switch accessible memory or network attached memories, have been proposed in exascale memory designs. Current practice depends on ad hoc solutions rather than a uniform API that provides the needed specificity and portability. This approach is already insufficient and future memory technologies will only exacerbate the problem by adding additional proprietary APIs. Our solution is to provide a unified two-tier node-level complex memory API. The target for the low-level interface are system and runtime developers, as well as expert application developers that prefer full control of what memory types the application is using. The high-level interface is designed for application developers who would rather define coarser-level constraints on the types of memories the application needs and leave out the details of the memory management. The low-level interface is primarily an engineering and implementation project. The solution it provides is urgently needed by the HPC community; as developers work independently to support these novel memory technologies, time and effort is wasted on redundant solutions and overlapping implementations. Adoption of the software is focused on absorbtion into existing open source projects such as hwloc, KOKKOS, Umpire, CLANG/LLVM, OpenMP, and Jemalloc.
\begin{itemize}
\item Low-Level Interface: Finished refactor of low-level interface supporting memory arenas on different memory types. Added support for Umpire, OpenMP and KOKKOS. Investigating features need to fully support these runtimes, currently KOKKOS and the Nalu-wind ECP application. SICM now supports Intel Optane memory, the first NVM memory that can be used as an extension of traditional DRAM memory.
Pull requests have been developed for OpenMP/CLANG/LLVM, KOKKOS and Umpire. the patches to Clang/LLVM/OpenMP turn OpenMP memory spaces in OpenMP 5.x into SICM library calls in the LLVM/OpenMP runtime. The same codepath that supports memkind library was refactored to support multiple custom memory allocators – more general than just SICM support.
SICM currently supports ``pragma openmp allocate'' with memory types: omp\_ (default, large\_cap, const, high\_bw, low\_lat ) \_mem\_spaces and supports KNL, Optane, testing on Sierra/Summit.
\item High-Level Graph Interface: Metall is a persistent memory allocator designed to provide developers with an API to allocate custom C++ data structures in both block-storage and byte- addressable persistent memories (e.g., NVMe and Intel Optane DC Persistent Memory) beyond a single process lifetime. Metall relies on a file-backed mmap mechanism to map a file in a filesystem into the virtual memory of an application, allowing the application to access the mapped region as if it were regular memory which can be larger than the physical main-memory of the system. This is currently being investigated by leaveraging non-ECP funding at LANL.
\item Analysis:
SICM has employed application profiling and analysis to direct data management across complex memory hierarchy, the team extended the SICM high-level interface with application-directed data tiering based on the MemBrain approach which is more effective than an unguided first touch policy. The impact of using different data features to steer hot program data into capacity-constrained device tiers was modeled.
\end{itemize}
\paragraph{Next Steps}
\begin{itemize}
\item Low-Level Interface: Focus on performance of support for runtimes and adding feature requested to support Umpire, OpenMP KOKKOS, and MPI and address the slow move pages implementation in the Linux kernel. Initial efforts to accelerate move pages have not been sucessful and progress has been hampered by current work environment. The Linux kernel modifications for page migration are in collaboration with ECP project Argo 2.3.5.05 and RIKEN research center in Japan. Start collaborating with applications to enable use of heterogenous memory on ECP target platforms. Additionally, the team needs to recoonect with the hwloc team on memory topology discovery.
\item For the analysis work the team will extend and hardend the tools for guiding application memory management and investigate feature categories to classify objects associated with different features such as size, type, allocation time, etc to guide data placement.
\item For the Metall high-level interface focusing on graph applications, work has used miniVite, and ECP graph proxy application. miniVit has been modififed to store and reuse graph data using Metall. graph generation has been a bottleneck in this application. Metall can use mmap and UMap (user-level mmap library in Argo PowerSteering project) underneath to enhance its performance and capability.
\end{itemize}