x86/sgx: Add a page reclaimer
Just like normal RAM, there is a limited amount of enclave memory available
and overcommitting it is a very valuable tool to reduce resource use.
Introduce a simple reclaim mechanism for enclave pages.

In contrast to normal page reclaim, the kernel cannot directly access
enclave memory.  To get around this, the SGX architecture provides a set of
functions to help.  Among other things, these functions copy enclave memory
to and from normal memory, encrypting it and protecting its integrity in
the process.

Implement a page reclaimer by using these functions.  Pick victim pages in
LRU fashion from all the enclaves running in the system.  A new kernel
thread (ksgxswapd) reclaims pages in the background based on watermarks,
similar to normal kswapd.
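
The victim selection and watermark behaviour described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel implementation: the names `page_pool`, `pool_alloc`, `pool_reclaim_one` and `pool_balance`, as well as both watermark values, are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thresholds; the commit message does not state real values. */
#define LOW_WATERMARK   4	/* background reclaim starts below this */
#define HIGH_WATERMARK  8	/* background reclaim stops at this */
#define MAX_PAGES      16

struct page_pool {
	int lru[MAX_PAGES];	/* ring buffer of page ids, oldest at 'head' */
	size_t head;
	size_t count;		/* pages currently on the LRU */
	size_t free_pages;
};

/* Allocate a page and put it at the young end of the LRU. */
static int pool_alloc(struct page_pool *p, int id)
{
	assert(id >= 0);	/* negative values are reserved for errors */

	if (p->free_pages == 0 || p->count == MAX_PAGES)
		return -1;

	p->lru[(p->head + p->count) % MAX_PAGES] = id;
	p->count++;
	p->free_pages--;
	return 0;
}

/* Evict the oldest page. Returns its id, or -1 if nothing is reclaimable. */
static int pool_reclaim_one(struct page_pool *p)
{
	int victim;

	if (p->count == 0)
		return -1;

	victim = p->lru[p->head];
	p->head = (p->head + 1) % MAX_PAGES;
	p->count--;
	p->free_pages++;
	return victim;
}

/*
 * One wakeup of the background thread: do nothing unless free pages have
 * dropped below the low watermark, then reclaim up to the high watermark.
 */
static size_t pool_balance(struct page_pool *p)
{
	size_t reclaimed = 0;

	if (p->free_pages >= LOW_WATERMARK)
		return 0;

	while (p->free_pages < HIGH_WATERMARK && pool_reclaim_one(p) >= 0)
		reclaimed++;

	return reclaimed;
}
```

Using two thresholds rather than one gives the background thread hysteresis: once woken, it frees a batch of pages instead of being woken again after every single allocation.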

Architecturally, all enclave pages can be reclaimed, but there are some
limits: the special SECS metadata page, for example, must be reclaimed
last.  The page version array (used to mitigate replaying old reclaimed
pages) is also architecturally reclaimable, but reclaiming it is not yet
implemented.  The end result is that the vast majority of enclave pages are
currently reclaimable.
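
The "SECS last" ordering can be modelled with a child counter, in the spirit of the `secs_child_cnt` field that appears in the diff. The following is a minimal sketch that assumes the counter tracks resident child pages; `toy_encl`, `toy_reclaim_child` and `toy_reclaim_secs` are hypothetical names, not functions from the patch.

```c
#include <stdbool.h>

/* Toy model of an enclave: one SECS page plus a count of resident children. */
struct toy_encl {
	unsigned int secs_child_cnt;	/* child pages still resident */
	bool secs_resident;		/* is the SECS page still resident? */
};

/* Reclaim one child page. Returns false if no child page is resident. */
static bool toy_reclaim_child(struct toy_encl *encl)
{
	if (encl->secs_child_cnt == 0)
		return false;

	encl->secs_child_cnt--;
	return true;
}

/* The SECS page may only go once every child page has been reclaimed. */
static bool toy_reclaim_secs(struct toy_encl *encl)
{
	if (!encl->secs_resident || encl->secs_child_cnt != 0)
		return false;

	encl->secs_resident = false;
	return true;
}
```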

Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jethro Beekman <jethro@fortanix.com>
Link: https://lkml.kernel.org/r/20201112220135.165028-22-jarkko@kernel.org
jarkkojs authored and suryasaimadhu committed Nov 18, 2020
1 parent 2adcba7 commit 1728ab5
Showing 6 changed files with 1,134 additions and 27 deletions.
59 changes: 40 additions & 19 deletions arch/x86/kernel/cpu/sgx/driver.c
@@ -17,13 +17,24 @@ u32 sgx_misc_reserved_mask;
 static int sgx_open(struct inode *inode, struct file *file)
 {
 	struct sgx_encl *encl;
+	int ret;

 	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
 	if (!encl)
 		return -ENOMEM;

+	kref_init(&encl->refcount);
 	xa_init(&encl->page_array);
 	mutex_init(&encl->lock);
+	INIT_LIST_HEAD(&encl->va_pages);
+	INIT_LIST_HEAD(&encl->mm_list);
+	spin_lock_init(&encl->mm_lock);
+
+	ret = init_srcu_struct(&encl->srcu);
+	if (ret) {
+		kfree(encl);
+		return ret;
+	}

 	file->private_data = encl;

@@ -33,31 +44,37 @@ static int sgx_open(struct inode *inode, struct file *file)
 static int sgx_release(struct inode *inode, struct file *file)
 {
 	struct sgx_encl *encl = file->private_data;
-	struct sgx_encl_page *entry;
-	unsigned long index;
-
-	xa_for_each(&encl->page_array, index, entry) {
-		if (entry->epc_page) {
-			sgx_free_epc_page(entry->epc_page);
-			encl->secs_child_cnt--;
-			entry->epc_page = NULL;
+	struct sgx_encl_mm *encl_mm;
+
+	/*
+	 * Drain the remaining mm_list entries. At this point the list contains
+	 * entries for processes, which have closed the enclave file but have
+	 * not exited yet. The processes, which have exited, are gone from the
+	 * list by sgx_mmu_notifier_release().
+	 */
+	for ( ; ; ) {
+		spin_lock(&encl->mm_lock);
+
+		if (list_empty(&encl->mm_list)) {
+			encl_mm = NULL;
+		} else {
+			encl_mm = list_first_entry(&encl->mm_list,
+						   struct sgx_encl_mm, list);
+			list_del_rcu(&encl_mm->list);
 		}

-		kfree(entry);
-	}
+		spin_unlock(&encl->mm_lock);

-	xa_destroy(&encl->page_array);
+		/* The enclave is no longer mapped by any mm. */
+		if (!encl_mm)
+			break;

-	if (!encl->secs_child_cnt && encl->secs.epc_page) {
-		sgx_free_epc_page(encl->secs.epc_page);
-		encl->secs.epc_page = NULL;
+		synchronize_srcu(&encl->srcu);
+		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
+		kfree(encl_mm);
 	}

-	/* Detect EPC page leaks. */
-	WARN_ON_ONCE(encl->secs_child_cnt);
-	WARN_ON_ONCE(encl->secs.epc_page);
-
-	kfree(encl);
+	kref_put(&encl->refcount, sgx_encl_release);
 	return 0;
 }

@@ -70,6 +87,10 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
 	if (ret)
 		return ret;

+	ret = sgx_encl_mm_add(encl, vma->vm_mm);
+	if (ret)
+		return ret;
+
 	vma->vm_ops = &sgx_vm_ops;
 	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
 	vma->vm_private_data = encl;
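
In the hunks above, sgx_open() initialises a reference count with kref_init() and sgx_release() drops it with kref_put() instead of freeing the enclave directly, presumably so that the enclave can outlive the file while something else still holds a reference. The general pattern can be sketched in userspace C11 as follows. This is a simplified model, not the kernel's kref implementation, and `toy_ref`, `toy_obj` and their helpers are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal reference counter modelled loosely on the kref pattern. */
struct toy_ref {
	atomic_uint count;
};

/* A new object starts with one reference, held by its creator. */
static void toy_ref_init(struct toy_ref *ref)
{
	atomic_init(&ref->count, 1);
}

static void toy_ref_get(struct toy_ref *ref)
{
	atomic_fetch_add(&ref->count, 1);
}

/*
 * Drop one reference. The release callback runs only when the last
 * reference goes away; returns true in that case.
 */
static bool toy_ref_put(struct toy_ref *ref,
			void (*release)(struct toy_ref *ref))
{
	if (atomic_fetch_sub(&ref->count, 1) == 1) {
		release(ref);
		return true;
	}
	return false;
}

/* Example object embedding the counter, as the enclave struct does. */
struct toy_obj {
	struct toy_ref refcount;
	bool released;		/* set by the release callback */
};

/* Recover the enclosing object from the embedded counter and tear it down. */
static void toy_obj_release(struct toy_ref *ref)
{
	struct toy_obj *obj = (struct toy_obj *)
		((char *)ref - offsetof(struct toy_obj, refcount));

	obj->released = true;
}
```

With this pattern the close path only drops its own reference, and the teardown code runs wherever the final reference happens to be dropped.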