Memory Management
All physical memory is mapped linearly starting from the address `0xFFFF800000000000` on boot - a physical address can be translated into a usable kernel virtual address simply by adding this offset. This simplifies many things, notably the mapping algorithm itself: if we need to make an allocation to create a new page table, we are guaranteed to be able to access the memory of that allocation. Otherwise we might need to create a new page table just to access the allocation we wanted to use for creating the other page table. Other things like hardware MMIO are also simpler with this method, and there's a potential performance boost from not needing to perform so many TLB invalidations/shootdowns caused by temporarily mapping things in and out.
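As a minimal sketch of the translation (the constant and function names here are illustrative, not the kernel's actual API):

```rust
/// Start of the linear mapping of all physical memory (name is hypothetical).
const PHYS_MAP_BASE: usize = 0xFFFF_8000_0000_0000;

/// Translate a physical address into a usable kernel virtual address.
/// Valid for any physical address, because all of physical memory is
/// mapped linearly starting at PHYS_MAP_BASE.
fn phys_to_virt(phys: usize) -> *mut u8 {
    (PHYS_MAP_BASE + phys) as *mut u8
}
```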
Additionally, the kernel code is mapped and linked at `0xFFFFFFFF80000000`. This allows us to use (the LLVM equivalent of) `-mcmodel=kernel`, which allows for better code generation. Any other memory in the system has the NX bit set in the paging structures, which prevents execution.
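To illustrate the NX bit (the flag names below are hypothetical, but the bit positions are the architectural x86-64 ones; NX also requires EFER.NXE to be enabled):

```rust
/// Selected x86-64 page-table entry flags (architectural bit positions).
const PTE_PRESENT: u64 = 1 << 0;
const PTE_WRITABLE: u64 = 1 << 1;
const PTE_NO_EXECUTE: u64 = 1 << 63; // NX: instruction fetches from this page fault

/// Build a data-only entry: readable and writable, but never executable.
fn data_pte(phys_frame: u64) -> u64 {
    phys_frame | PTE_PRESENT | PTE_WRITABLE | PTE_NO_EXECUTE
}
```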
The physical memory manager (PMM) is responsible for allocating physical frames. We're using a buddy allocation system, which requires some data structures to be set up. The size of the buddy allocator's memory overhead is proportional to the amount of actual physical memory in the system, so we can't just hardcode a block of memory and hope it's enough - we need a simple allocator to bootstrap the PMM's data structures.
This is the purpose of the Bump Allocator - it holds a fixed array of usable memory regions (each being a start address and a size), lifted straight from the memory map provided to us by the `bootloader` crate. When we need a page, we ask the bump allocator and it simply bumps the start address of one of the regions forward by a page, consuming it. It does not support deallocation because everything that is allocated is assumed to be needed until the OS is shut down.
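A minimal sketch of the idea (the type names and the fixed region capacity are assumptions for illustration):

```rust
const PAGE_SIZE: usize = 4096;

/// A usable physical memory region, taken from the bootloader's memory map.
#[derive(Clone, Copy)]
struct Region {
    start: usize, // physical address, page-aligned
    size: usize,  // bytes remaining in this region
}

/// Fixed-capacity bump allocator; deallocation is intentionally unsupported.
struct BumpAllocator {
    regions: [Region; 32], // capacity chosen arbitrarily for the sketch
    count: usize,          // how many regions are actually in use
}

impl BumpAllocator {
    /// Hand out one physical page by bumping a region's start address.
    fn alloc_page(&mut self) -> Option<usize> {
        for region in self.regions[..self.count].iter_mut() {
            if region.size >= PAGE_SIZE {
                let page = region.start;
                region.start += PAGE_SIZE;
                region.size -= PAGE_SIZE;
                return Some(page);
            }
        }
        None // all regions exhausted
    }
}
```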
Chunks of contiguous physical memory have their own buddy allocator structures. The main thing these structures contain is an array of linked lists. Each list holds sub-chunks of a specific order. So one list might contain chunks of order 0 (i.e. individual 4 KiB pages); another list might contain chunks of order 2 (i.e. 16 KiB blocks). This means that when an allocation is requested, say an order 0 allocation, we just go to the corresponding linked list and pop an order 0 chunk off the head.
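A sketch of the pop operation (the node type here is a stand-in for the `PageInfo` struct described below, and the exact list layout is an assumption):

```rust
const MAX_ORDER: usize = 11; // illustrative: orders 0..=10

/// Stand-in for the PageInfo described below; links are page indices.
struct PageInfo {
    next: Option<usize>,
    prev: Option<usize>,
}

/// One buddy area: a free list per order. Order n holds 2^n-page chunks.
struct BuddyArea {
    free_lists: [Option<usize>; MAX_ORDER], // head page index per order
}

impl BuddyArea {
    /// Pop a free chunk of exactly `order` off the head of its list.
    /// A full allocator would split a larger chunk if this list is empty.
    fn pop(&mut self, order: usize, pages: &mut [PageInfo]) -> Option<usize> {
        let head = self.free_lists[order]?;
        self.free_lists[order] = pages[head].next;
        if let Some(next) = pages[head].next {
            pages[next].prev = None;
        }
        pages[head].next = None;
        Some(head)
    }
}
```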
The nodes of each list have type `PageInfo`. This stores info about a single page, so we allocate one `PageInfo` struct per page of usable physical memory in the system. It is important to keep this struct as small as possible; on Linux it is 64 bytes, for example. `PageInfo` contains next and prev links to other pages, which the buddy allocator manipulates to make allocations as described before. The `PageInfo` structs themselves are allocated in a giant array of virtually (but not necessarily physically) contiguous memory at a fixed address. This allows us to access the `PageInfo` corresponding to a given physical address just by doing some simple arithmetic.
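A sketch of that arithmetic (the array's base address and the exact fields are assumptions; only the next/prev links come from the description above):

```rust
const PAGE_SIZE: usize = 4096;

/// Base of the virtually contiguous PageInfo array (address is hypothetical).
const PAGE_INFO_BASE: usize = 0xFFFF_9000_0000_0000;

struct PageInfo {
    next: Option<usize>, // next page in a buddy free list
    prev: Option<usize>, // previous page in a buddy free list
    // ...kept as small as possible; Linux's equivalent is 64 bytes
}

/// Find the PageInfo for a physical address: divide by the page size to get
/// the page frame number, then index into the fixed-address array.
unsafe fn page_info_for(phys: usize) -> *mut PageInfo {
    let pfn = phys / PAGE_SIZE;
    unsafe { (PAGE_INFO_BASE as *mut PageInfo).add(pfn) }
}
```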
In order to allocate this big array we need some pages for the structs themselves, but also likely some pages for page tables to map the array into place. We use the Bump Allocator for this purpose.
We use a `MmuInfo` struct to hold and synchronise accesses to page tables. Each `MmuInfo` owns a level 4 page table. Each user process owns a `MmuInfo`. Since we can think of the kernel itself as just one big process, the kernel has exactly one `MmuInfo` struct used by all kernel threads. Access to the page table itself is synchronised by a `RwLock`.
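A sketch of the shape of this (field and type names are assumptions; `spin::RwLock` stands in for whatever lock the kernel actually uses):

```rust
use spin::RwLock;

/// Physical address of a level 4 (PML4) page table.
struct Level4Table(usize);

/// Owns one address space: a user process's, or the kernel's single one.
struct MmuInfo {
    /// The level 4 page table; all accesses go through this lock.
    table: RwLock<Level4Table>,
}
```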
When a mapping needs to be made - e.g. a user process calls `malloc` requesting some memory - the kernel simply takes the `RwLock` and modifies the page table of the calling process's `MmuInfo`. We sometimes need to map things into the kernel's `MmuInfo` too, such as to create the array of `PageInfo` structs, or to boot the other CPU cores in SMP.
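Building on the `MmuInfo` sketch above, the mapping path looks roughly like this (the table walk itself is elided and the helper is hypothetical):

```rust
impl MmuInfo {
    /// Map one page: take the write lock, then edit the page tables.
    fn map_page(&self, virt: usize, phys: usize, flags: u64) {
        // Exclusive access to this address space's tables for the edit.
        let mut pml4 = self.table.write();
        // Walk (creating intermediate tables as needed, all reachable
        // through the linear physical mapping) and install the leaf entry.
        install_mapping(&mut pml4, virt, phys, flags);
    }
}

/// Hypothetical helper standing in for the actual page-table walk.
fn install_mapping(_pml4: &mut Level4Table, _virt: usize, _phys: usize, _flags: u64) {
    // elided in this sketch
}
```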
The kernel's mappings are copied into every new `MmuInfo` we create. The appropriate protections are applied so that user code can't inspect any kernel memory, but this means that when e.g. an interrupt occurs and we are switched into ring 0, we have full access to the rest of the kernel's memory. So each user process's `MmuInfo` will always contain more mappings than the kernel's - it contains the kernel's mappings, plus mappings specific to that user process.
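One common way to do this copy on x86-64, sketched here under the assumption that kernel mappings occupy only the higher half (both `0xFFFF800000000000` and `0xFFFFFFFF80000000` do), is to copy the top 256 entries of the level 4 table. The lower-level tables those entries point to are then shared, and since those entries lack the user-accessible bit, ring 3 still can't touch them:

```rust
const ENTRIES_PER_TABLE: usize = 512;

/// Share all kernel mappings with a new address space by copying the
/// higher-half level 4 entries (indices 256..512). The copied entries
/// point at the same lower-level tables, so the mappings stay in sync.
fn copy_kernel_mappings(
    kernel_pml4: &[u64; ENTRIES_PER_TABLE],
    new_pml4: &mut [u64; ENTRIES_PER_TABLE],
) {
    for i in ENTRIES_PER_TABLE / 2..ENTRIES_PER_TABLE {
        new_pml4[i] = kernel_pml4[i];
    }
}
```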