RFC: SGX2 support in Gramine (Phase 1) #683
Comments
Thanks for the great write-up! Very easy to follow.
Please add a separate LibOS test (maybe several, but I would prefer just one test). This way we can just mark this particular test as "requires-EDMM" and skip it in our normal CI, and only run it in the EDMM-enlightened CI.
Sure will do, it makes sense to do it this way.
Please rebase to V5 before submitting. I don't understand what's the point of reviewing code based on outdated upstream, just to later have to review the rebase diff... What about the heap pool resizing? (as we discussed on the call - resizing it like |
OK will rebase to V5. Although things look good on the driver side, it is not yet confirmed if V5 will be the last version and so didn't want to keep moving unless there were user-space related changes.
Yes, looking into this. I will come up with an initial design and review it with maintainers.
Heap pool resizing is associated with how we free the heap, but hybrid optimization is to do with pre-allocating (using |
No, the idea is to also grow it in bigger chunks. Ofc. this assumes that user allocations are usually either next to each other or not |
There are two issues here:
@boryspoplawski: My assumption is that in practice most LibOS allocation requests are just trying to expand heap via mmap to handle a small allocation from app's
Oh, I forgot about this, it may actually be a huge obstacle for this idea :/
Summarizing opens that were discussed offline:
One more thing, I forgot to mention is the use of |
Thanks for the proposal and summary!
Quick Q: |
It is actually the number of memory ranges that are freed. How it works is that the percentage is converted to a threshold (in bytes) and whenever a memory range is freed by the application, the freed size is accumulated. When the accumulated free size grows above the threshold, the memory ranges are removed from the enclave. The reason I chose percentage is that it is easier for the end-user to tune.
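To make the accounting above concrete, here is a minimal Python sketch of the threshold logic. It is only a toy model under stated assumptions: the class, field, and method names are hypothetical and are not Gramine identifiers, and real code would operate on enclave pages rather than on a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class LazyFreeTracker:
    """Toy model of the lazy-free accounting (all names are hypothetical)."""
    heap_size: int                 # total enclave heap size in bytes
    lazyfree_percentage: int       # user-tunable percentage of the total heap
    pending: list = field(default_factory=list)  # (addr, size) ranges freed by the app
    pending_bytes: int = 0         # accumulated size of the pending ranges

    @property
    def threshold(self) -> int:
        # The percentage is converted to a threshold in bytes.
        return self.heap_size * self.lazyfree_percentage // 100

    def free(self, addr: int, size: int) -> list:
        """Record a freed range and return the ranges to remove from the enclave now."""
        self.pending.append((addr, size))
        self.pending_bytes += size
        if self.pending_bytes <= self.threshold:
            return []              # below the threshold: keep the pages in the enclave
        # Above the threshold: hand back every accumulated range and reset.
        flushed, self.pending, self.pending_bytes = self.pending, [], 0
        return flushed


tracker = LazyFreeTracker(heap_size=1 << 30, lazyfree_percentage=10)
first = tracker.free(0x10000000, 64 << 20)   # 64 MiB accumulated, nothing removed yet
second = tracker.free(0x20000000, 64 << 20)  # 128 MiB accumulated, both ranges removed
```

With a 1 GiB heap and 10 percent, the threshold is roughly 102 MiB, so the first 64 MiB free is only recorded and the second one pushes the accumulated size over the threshold and releases both ranges in one batch.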
@vijaydhanraj The initial EDMM support is now implemented in Gramine. We also have a separate issue on adding optimizations (like lazy allocation) to EDMM: #1099. Looks like the only thing left is adding a separate issue on dynamic thread creation with EDMM. Could you create such an issue? Let me close this issue, since it is basically completed.
@dimakuv there are two more tasks as part of this issue that are not complete yet: 1) hybrid optimization and 2) lazy free optimization. But I agree we can close this, and I can either create another issue for EDMM optimizations or reuse #1099, add these optimizations to it, and rename it from "lazy allocation" to "EDMM optimizations". Please let me know which is preferred.
I think it's better to create separate issues. I have a feeling only a subset of these three optimizations will be merged into Gramine (as the others may not yield sufficient perf gains). So we will maybe fix e.g. two of the three, but the third one we'll close as |
Thanks @vijaydhanraj. Marked all these new issues with respective labels.
SGX1 instruction set requires all enclave memory to be committed at enclave build time. It also requires the developer to predict and use maximum heap and stack sizes in the enclave build. Likewise, additional code modules cannot be dynamically loaded into the enclave environment after enclave build. This increases enclave build time and limits the enclave’s ability to adapt to changing workloads.
Additionally, page protections cannot be changed for enclave memory. Executable code containing relocations must be loaded as Read, Write, and Execute (RWX) and remain that way for the life of the enclave. This also limits the capabilities of garbage collectors and dynamic translators or just-in-time (JIT) compilers within the enclave.
SGX2 instruction set was designed to overcome these limitations. SGX2 Extensions give the software the ability to dynamically add and remove pages from an enclave and to manage the attributes of enclave pages.
This RFC focuses on adding support for 2 key features,
SGX2 instruction set:
SGX2 offers the below instructions to enable the aforementioned features. Please refer to Intel SDM, Chapter "INTRODUCTION TO INTEL SOFTWARE GUARD EXTENSIONS", for more details.
In-Kernel Driver Support:
The in-kernel driver changes for SGX2 support will probably be part of the 5.20 kernel, which is expected sometime in the first week of October 2022.
My current PoC is based on the V4 version of the submitted kernel patch series. V5 seems to be the final one and the maintainers are satisfied. Since V5 has only a naming change (see below), the plan is to continue with V4, and once the PR is reviewed and validated by other teams, I plan to move to V5.
User Level SGX2 IOCTLs exposed by in-kernel driver:
- SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS: With this IOCTL the user specifies a page range and the Enclave Page Cache Map (EPCM) permissions to be applied to all pages in the provided range. `ENCLS[EMODPR]` is run to restrict the EPCM permissions, followed by the `ENCLS[ETRACK]` flow that will ensure no cached linear-to-physical address mappings to the changed pages remain.
- SGX_IOC_ENCLAVE_MODIFY_TYPES: This IOCTL is used to change the type of an enclave page from a regular (`SGX_PAGE_TYPE_REG`) enclave page to a TCS (`SGX_PAGE_TYPE_TCS`) page, or to change the type from a regular (`SGX_PAGE_TYPE_REG`) or TCS (`SGX_PAGE_TYPE_TCS`) page to a trimmed (`SGX_PAGE_TYPE_TRIM`) page (setting it up for later removal).
- SGX_IOC_ENCLAVE_REMOVE_PAGES: With this IOCTL the user specifies a page range that should be removed. All pages in the provided range should have the `SGX_PAGE_TYPE_TRIM` page type or else the request will fail with `EPERM` (Operation not permitted). Page removal can fail on any page within the provided range. This IOCTL supports partial success by returning the number of pages that were successfully removed.
High-level Flow diagrams:
Page Allocation:
The page allocation sequence diagram shows how EPC pages within `ELRANGE` of the enclave are dynamically allocated. Below are the steps:
1. The enclave executes `ENCLU[EACCEPT]` on a new page request, which triggers a page fault (#PF) as the page is not available yet.
2. The driver handles the #PF and issues `ENCLS[EAUG]` for the page (at this point the page becomes VALID and may be used by the enclave).
3. After `EAUG`ing the page, the control returns back to the untrusted PAL.
4. The untrusted PAL executes `ENCLU[ERESUME]` to return control back to the enclave.
5. The enclave retries `ENCLU[EACCEPT]` and this time the instruction succeeds, and the page is dynamically allocated.
Page Deallocation (Removal):
The deallocation sequence removes an EPC page on the enclave’s request. Below are the steps:
1. The enclave asks the driver (via the `SGX_IOC_ENCLAVE_MODIFY_TYPES` IOCTL) to change the page's type to PT_TRIM.
2. The driver issues `ENCLS[ETRACK]` to track the page's address on all CPUs and issues IPI to flush stale TLB entries.
3. The enclave issues `ENCLU[EACCEPT]` to accept changes to each EPC page.
4. The enclave asks the driver to remove the pages (`SGX_IOC_ENCLAVE_REMOVE_PAGES` IOCTL).
5. The driver issues `ENCLS[EREMOVE]` to complete the request.
EPC page removal is expensive due to this 2-stage flow, so it needs some optimization.
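The allocation and removal sequences above can be traced with a small state model. This is a hedged sketch only: the `PageState` names and the `ToyEnclave` class are hypothetical illustrations, not Gramine or kernel driver code, and the model ignores concurrency, TLB state, and error paths.

```python
from enum import Enum, auto


class PageState(Enum):
    ABSENT = auto()     # no EPC page backs this address
    AUGMENTED = auto()  # driver ran EAUG; the enclave has not accepted yet
    ACCEPTED = auto()   # enclave ran EACCEPT; the page is usable
    TRIMMING = auto()   # page type changed to TRIM; waiting for EACCEPT
    TRIMMED = auto()    # enclave accepted the TRIM; ready for EREMOVE


class ToyEnclave:
    """Hypothetical model of the two flows; every name here is illustrative."""

    def __init__(self):
        self.pages = {}  # page address -> PageState
        self.log = []    # ordered trace of the steps taken

    def state(self, addr):
        return self.pages.get(addr, PageState.ABSENT)

    def allocate(self, addr):
        if self.state(addr) is not PageState.ABSENT:
            raise ValueError("page already present")
        # The first EACCEPT faults because the page is not there yet.
        self.log.append("EACCEPT -> #PF")
        # The driver handles the #PF with EAUG and returns to the untrusted PAL.
        self.pages[addr] = PageState.AUGMENTED
        self.log.append("EAUG")
        # The untrusted PAL resumes the enclave.
        self.log.append("ERESUME")
        # The retried EACCEPT succeeds and the page becomes usable.
        self.pages[addr] = PageState.ACCEPTED
        self.log.append("EACCEPT")

    def remove(self, addr):
        if self.state(addr) is not PageState.ACCEPTED:
            raise ValueError("page not allocated")
        # Stage 1: change the type to TRIM, flush TLBs, accept inside the enclave.
        self.pages[addr] = PageState.TRIMMING
        self.log += ["MODIFY_TYPES(TRIM)", "ETRACK + IPI"]
        self.pages[addr] = PageState.TRIMMED
        self.log.append("EACCEPT")
        # Stage 2: only now can the driver remove the page.
        self.log += ["REMOVE_PAGES", "EREMOVE"]
        del self.pages[addr]


enclave = ToyEnclave()
enclave.allocate(0x1000)
state_after_alloc = enclave.state(0x1000)
enclave.remove(0x1000)
print(enclave.log)
```

The trace shows why removal is the costly direction: it needs two separate IOCTL round trips with an in-enclave `EACCEPT` between them, whereas allocation is driven by a single page fault.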
Relaxing Page Permissions:
As the name indicates, relaxing page permissions extends them, for example from `R` to `RW`. Below are the steps involved:
1. The enclave extends the EPCM permissions of the page (`ENCLU[EMODPE]`).
2. The enclave issues the `mprotect` syscall to request the OS to update the page tables to match the new EPCM permissions.
Step 2 can be skipped if there is no cached linear-to-physical address mapping in the TLB, but if more restrictive permissions are present for a page, then it can lead to a #PF. To avoid this, it is better to proactively call `mprotect`, which will exit the enclave, clearing the TLB.
As an alternative to calling `mprotect`, there is an ongoing discussion with the SGX architectural team about implementing a `spurious` exception handler that can analyze and ignore such faults due to a stale TLB. Nothing conclusive yet.
Restricting Page Permissions:
As the name indicates, restricting page permissions limits them, for example from `RW` to `R`. Below are the steps involved:
1. The enclave asks the driver (via the `SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS` IOCTL) to restrict the EPCM permissions associated with an EPC page.
2. The driver issues `ENCLS[EMODPR]` and then `ENCLS[ETRACK]` to track removal of the TLB address on all CPUs and issues IPI to flush stale TLB entries.
3. The enclave issues `ENCLU[EACCEPT]` to accept the restricted page permission for each EPC page.
Optimizations:
Based on my tests with a few benchmarks, I observed that a naive implementation of SGX2 features hurts performance. To overcome this, I profiled the code and came up with the following optimizations.
Hybrid Allocation:
As the name indicates, users can precisely set the amount of heap to preheat by setting the `size`, and the remaining requests are allocated dynamically. For example, when the size is "64M", Gramine will pre-fault the top `64M` of heap pages and add them to the enclave. Any further requests are served dynamically. This balances the negative impact of EDMM on the total run time, since EDMM shifts the page-fault cost to the runtime phase.
Lazy Free:
Lazy free optimization introduces a manifest syntax that specifies the `percentage` of the total heap that can be freed in a lazy manner. Until this threshold is met, Gramine doesn't release any dynamically allocated memory. This optimization helps reduce the expensive enclave entries/exits associated with the dynamic freeing of EPC pages.
Implementation Steps:
1. Extend current code to store EPC page permission. (This will be a NOP but will help when enabling EDMM)
   - Use the `heap_vma` struct to store the page permission for each VMA region.
2. Introduce dynamic page permissions.
   - Add the `sgx.edmm_enable = true | false` manifest option to turn on SGX2 features.
   - Handle the `mprotect` syscall.
3. Introduce naïve dynamic memory allocation.
   - Enabled with `sgx.edmm_enable_heap = true`.
4. Introduce Hybrid optimization.
   - Add the `preheat_size = "size"` manifest option.
5. Introduce Lazy free optimization.
   - Add the `edmm_lazyfree_percentage = [NUM]` manifest option to turn on lazy free, where NUM is the percentage of the total heap that can be freed in a lazy manner.
   - Avoid issuing `ENCLU[EACCEPT]` on an already EACCEPT’ed page due to the following security issue.
NOTE
`ENCLU[EACCEPT]` on the already `EACCEPT`ed page is forbidden due to the following security issue:
Say page A is valid at a given VA. `ENCLU[EACCEPT]` on page A again will not be a problem. But with knowledge of the enclave issuing `ENCLU[EACCEPT]` on page A’s VA, an adversary could EAUG a new page B at the same VA. Then both pages A and B are now valid at the same VA. Hence the adversary can switch between pages A and B depending on what data it wants the enclave to see.
Testing Plans:
Should I add a LibOS unit test to dynamically mmap, unmmap, and change permissions for EPC memory or extend our current tests?
Since the in-kernel driver changes are not yet upstreamed, we will have to maintain the code and make sure it doesn’t break with any recent changes to the master. This will require us to set up a CI environment that would apply the EDMM changes and trigger our CI tests to ensure everything works. In case we see merge conflicts, I can resolve them and then push the latest changes. This cycle will continue until the kernel driver is released. Working with S3 team on this.
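As a rough illustration of what the LibOS test asked about above might exercise, here is a sketch that maps memory dynamically, restricts and then relaxes its permissions, and unmaps it. It is written in Python for brevity and assumes a Linux host whose libc exposes `mprotect`; an actual Gramine regression test would presumably be a C program, and the function name `run_edmm_style_test` is hypothetical.

```python
import ctypes
import mmap

PAGE = mmap.PAGESIZE
# Assumption: Linux (or another Unix) where the process's libc exports mprotect.
libc = ctypes.CDLL(None, use_errno=True)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int


def mprotect(addr, length, prot):
    """Thin wrapper that raises OSError when the syscall fails."""
    if libc.mprotect(addr, length, prot) != 0:
        raise OSError(ctypes.get_errno(), "mprotect failed")


def run_edmm_style_test(npages=4):
    size = npages * PAGE
    # Dynamic allocation: anonymous read-write mapping.
    region = mmap.mmap(-1, size, prot=mmap.PROT_READ | mmap.PROT_WRITE)
    handle = ctypes.c_char.from_buffer(region)  # used only to learn the base address
    addr = ctypes.addressof(handle)
    try:
        region[:5] = b"hello"
        mprotect(addr, size, mmap.PROT_READ)                    # restrict RW -> R
        readback = bytes(region[:5])                            # reads must still work
        mprotect(addr, size, mmap.PROT_READ | mmap.PROT_WRITE)  # relax R -> RW
        region[:5] = b"world"                                   # writes work again
        after_relax = bytes(region[:5])
    finally:
        del handle      # drop the buffer export so the mapping can be closed
        region.close()  # munmap
    return readback, after_relax


readback, after_relax = run_edmm_style_test()
```

Under EDMM each of these calls would go through the allocation, restriction, relaxation, and removal flows described earlier, which is why a single dedicated test can be marked as EDMM-only and skipped in the regular CI.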
Previous Attempt:
Based on the `OOT` driver I did have some initial support for EDMM in `Graphene`, but the `OOT` driver got deprecated and the effort was not pursued. But here is the GitHub link: gramineproject/graphene#2190
Next steps (Phase 2):