Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PAL/Linux-SGX] Add AEX-Notify support in exception handling flow #1531

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lzha101
Copy link
Contributor

@lzha101 lzha101 commented Sep 1, 2023

According to PR #1488, will construct three PRs to support AEX-Notify in Gramine. The first part could be found in #1530. This PR includes the second part.

Add support of adding sgx.enable_aex_notify = true to manifest. When AEX-Notify is enabled, will not exit the enclave during the stage-1 exception handler. Instead, a new instruct EDECCSSA will be triggered to switch to SSA0 and directly jump to stage-2 exception handler.

Description of the changes

How to test this PR?


This change is Reviewable

Add support of adding `sgx.enable_aex_notify = true` to manifest.
When AEX-Notify is enabled, will not exit the enclave during the
stage-1 exception handler. Instead, a new instruct EDECCSSA will
be triggered to switch to SSA0 and directly jump to stage-2 exception
handler.

Co-authored-by: Gu, Junjun <junjun.gu@intel.com>
Co-authored-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
Co-authored-by: Kailun Qin <kailun.qin@intel.com>
Signed-off-by: Gu, Junjun <junjun.gu@intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
Signed-off-by: Kailun Qin <kailun.qin@intel.com>
Signed-off-by: Zhang, Lili Z <lili.z.zhang@intel.com>
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @lzha101)


pal/src/host/linux-sgx/host_entry.S line 86 at r1 (raw file):

    movq %rsp, %rbp
    .cfi_def_cfa_register %rbp
    subq $RED_ZONE_SIZE, %rsp

Does this really work? When I played with similar modifications in #1944, I ended up adding the red zone calculation to "before any RSP-changing instruction", which is pushq %rax above.

That's because the red zone from the "normal execution" of the untrusted Gramine runtime is 128 bytes below the RSP at the point of async_exit_pointer. So the "normal execution" may use these bytes for its own local purposes. And without the immediate red zone subtraction, this code (these pushq %rax and co) overrides the local variables of "normal execution", and Gramine breaks.

See how I changed the code in #1944.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: all files reviewed, 23 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @lzha101)


-- commits line 8 at r1:
This commit message will need to be reworked. It should be much more verbose, explaining all the flows added in this PR.


pal/src/host/linux-sgx/enclave_entry.S line 229 at r1 (raw file):


    # definitely-malicious exception (correct untrusted runtime would never generate "0" event)
    FAIL_LOOP

This change is not relevant in the latest code on master, should be removed on rebase


pal/src/host/linux-sgx/enclave_entry.S line 268 at r1 (raw file):

    movq %rdx, %rbx    # upon EENTER the exit address was in RDX, mov it to RBX for EEXIT
    movq $EEXIT, %rax
    enclu

This is a simplification for this corner case, and is unrelated to this PR. Please extract to a separate PR.


pal/src/host/linux-sgx/enclave_entry.S line 549 at r1 (raw file):

    # copy the whole SSA[0].XSAVE region to the CPU context's XSAVE on stack;
    # __restore_xregs / __save_xregs clobber RDX so need to stash it in R10
    movq %rdx, %r10

If I understand this correctly, we replace RBX with R10 because we will still need RBX (the base to the context/registers of SSA0) further in AEX-Notify flows. So we replace RBX with some register that is unused, which happens to be R10.

I would extract this change also to a separate PR.


pal/src/host/linux-sgx/enclave_entry.S line 621 at r1 (raw file):

    # finally jump to the stage-2 C exception handler
.Lfinalize_context_after_deccssa_inst0:
    jmp *%r11

Even though an async signal can arrive at this exact jmp instruction, this should be safe. Because the SSA0 can be considered having the correct and consistent context for the "stage-2 exception handler", including this R11 register which points to the stage-2 exception handling routine in our C code.

That's just info for other reviewers and authors. I hope I understand this corner-case data race correctly.


pal/src/host/linux-sgx/enclave_entry.S line 658 at r1 (raw file):

    .cfi_offset %rbx, -24

    # Disable aex-notify before stack switch

Why do we need to do this? This part of code I don't understand.

What bad thing can happen during OCALL preparation?


pal/src/host/linux-sgx/enclave_entry.S line 823 at r1 (raw file):


1:
    # check if it is ready to enable AEX-Notify

Why do we need to do this? This part of code I don't understand.

What bad thing can happen during OCALL finalization?


pal/src/host/linux-sgx/enclave_entry.S line 844 at r1 (raw file):

    # there was some event, call _PalHandleExternalEvent(event, uc, xregs_state)
    # Note we don't check AEX-Notify here. If we disable it during ocall-exit and 
    # the execution comes below code, the aex-notify will be enabled after event handling

Again, I don't understand the comment. Why do we care about OCALLs?


pal/src/host/linux-sgx/host_entry.S line 86 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Does this really work? When I played with similar modifications in #1944, I ended up adding the red zone calculation to "before any RSP-changing instruction", which is pushq %rax above.

That's because the red zone from the "normal execution" of the untrusted Gramine runtime is 128 bytes below the RSP at the point of async_exit_pointer. So the "normal execution" may use these bytes for its own local purposes. And without the immediate red zone subtraction, this code (these pushq %rax and co) overrides the local variables of "normal execution", and Gramine breaks.

See how I changed the code in #1944.

Ok, I fixed a similar issue in #1944, and it was copy-pasted into #1857. So after one of those PRs is merged, we can rebase and get rid of these things here.


pal/src/host/linux-sgx/host_entry.S line 95 at r1 (raw file):

#endif

    # Check if there are sync/async signals pending and invoke in-enclave stage-1 handler if any

What this comment doesn't mention is:

  • If AEX Notify was enabled in the enclave thread, then the enclave never returns from the stage-1 handler invocation in untrusted runtime and the following code in this asm file is not executed.
  • If AEX Notify was disabled/not used in the enclave thread, then the enclave returns from the stage-1 handler invocation to this code in this asm file, so we need to prepare the ERESUME flow.

Please add these or similar comments here.


pal/src/host/linux-sgx/host_entry.S line 114 at r1 (raw file):

    # In case of normal ERESUME, RDI is not used;
    # In case of ERESUME-morphed-into-EENTER, RDI is external event in flow .Lcssa1_exception
    movq $PAL_EVENT_INTERRUPTED, %rdi

This looks correct but needs more comments.

IIUC, we get to this line of code only in following cases:

  1. No AEX-Notify: then ERESUME never morphs into EENTER, so the value in RDI is ignored. This code snippet becomes a no-op.
  2. AEX-Notify enabled on this enclave thread: previously we called sgx_handle_aex_signal() which would raise a signal (if any) into the enclave, and AEX-Notify would ensure that sgx_handle_aex_signal() does not return (but instead performs EDECCSSA inside the enclave and stays there). Therefore, the only way we could arrive to this line of code is that there was no pending signal, and sgx_handle_aex_signal() didn't do anything. This means that ERESUME will morth into EENTER, and we need to specify RDI (external event). And since there was no external event really, we put a kinda dummy PAL_EVENT_INTERRUPTED (aka SIGCONT). So we survive all the checks inside the SGX enclave, because SIGCONT can always arrive and is benign.

Well, actually, doesn't this introduce a heavy performance hit? On each AEX (without any real signals), we will call into the stage-1 handler, then stage-2 handler will handle dummy SIGCONT, and only then the execution will continue. Sounds like a lot of wasted CPU cycles.


pal/src/host/linux-sgx/host_entry.S line 124 at r1 (raw file):

eresume_pointer:
    # perform ERESUME (RAX already contains "ERESUME" because that's what AEX hardware flow does)
    enclu

This is also unrelated to this PR, so could be extracted in a separate PR (together with the other ones I proposed)


pal/src/host/linux-sgx/host_entry.S line 140 at r1 (raw file):

    # other arguments: RDI - event (sync or async signal)

    # below logic is the same as for sgx_ecall(), see comments for that function

Looks like the only point of this duplicated code is to have global sgx_raise_eenter_instr on ENCLU, so that we can use this global symbol in the handle_sync_signal() check.

This is a waste of assembly. Let's just add the global symbol like global sgx_eenter_instr on the .Ldo_ecall code path, and use that one. Since normal EENTER never fails, this should be sufficient. But if you want to double-check that it's exactly during the sgx_raise() flow, you can probably put some magic value in one of the GPRs (or in a new thread-local variable), and check it additionally.


pal/src/host/linux-sgx/host_exception.c line 96 at r1 (raw file):

}

static bool interrupted_in_aex(void) {

This is unrelated rename and can be extracted into a separate PR (together with the other ones I mentioned elsewhere)


pal/src/host/linux-sgx/host_exception.c line 121 at r1 (raw file):

    if (interrupted_in_enclave(uc)) {
        /* exception happened in app/LibOS/trusted PAL code, mark to handle signal inside enclave */
        assert(pal_get_host_tcb()->aex_sync_event == PAL_EVENT_NO_EVENT);

I would actually replace this assert() with an explicit if (...) { log_error("Nested sync signal, impossible!");BUG(); }

This way we'll also catch such bugs/corner cases in Release mode.


pal/src/host/linux-sgx/host_exception.c line 123 at r1 (raw file):

        assert(pal_get_host_tcb()->aex_sync_event == PAL_EVENT_NO_EVENT);
        pal_get_host_tcb()->aex_sync_event = event;
        pal_get_host_tcb()->sync_signal_cnt++;

This removal of sgx_raise(event) changes flows of Gramine significantly. Since this change also applies to non-AEX-Notify flows, I would really prefer to have it as a separate, preparation PR.

I will also describe these new flows here: #1948


pal/src/host/linux-sgx/host_exception.c line 171 at r1 (raw file):

            DO_SYSCALL(tkill, g_rpc_queue->rpc_threads[i], SIGUSR2);

    assert(event == PAL_EVENT_INTERRUPTED || event == PAL_EVENT_QUIT);

This is unrelated move-of-assert and can be extracted into a separate PR (together with the other ones I mentioned elsewhere)


pal/src/host/linux-sgx/host_exception.c line 209 at r1 (raw file):

 * -- now the regular context must inform the enclave about these events. This function is
 * potentially noreturn -- if there is at least one signal, and the enclave is ready to handle it,
 * then the call to sgx_raise() never returns. Only one of potentially two signals (one sync and

sgx_raise() doesn't return only if we take into account AEX Notify. But for old flows (without AEX Notify), sgx_raise() will always return. This should be reflected in this comment.


pal/src/host/linux-sgx/host_exception.c line 214 at r1 (raw file):

 * cannot occur while in this function, but new async signals can occur (since we are in regular
 * context and cannot block async signals), thus handling async signals must be aware of concurrent
 * signal handling code. */

We must somehow note that async signals are special in Gramine -- there is only SIGCONT which is kinda dummy (can be ignored) and SIGTERM (which is injected only once anyway).

So we don't need a queue of pending async signals, we can get away with just a single slot (which is the pal_get_host_tcb()->aex_async_event variable).


pal/src/host/linux-sgx/host_exception.c line 228 at r1 (raw file):

            PAL_EVENT_NO_EVENT, __ATOMIC_RELAXED);
    if (event != PAL_EVENT_NO_EVENT) {
        /* if async event is consumed by the enclave, then below sgx_raise() does not return;

Again, this misses the old, no-AEX-Notify flows. The comment must also explain that sgx_raise() will always return in old flows.


pal/src/host/linux-sgx/host_exception.c line 235 at r1 (raw file):

        enum pal_event no_event = PAL_EVENT_NO_EVENT;
        __atomic_compare_exchange_n(&pal_get_host_tcb()->aex_async_event, &no_event, event,
                                    /*weak=*/false, __ATOMIC_RELAXED, __ATOMIC_RELAXED);

What is this? Why do we re-add the async signal (event) again as a pending signal to this enclave thread? We already consumed this event, what's the point of re-adding it.

Update: ok, this is probably because you assume AEX-Notify flows only in this code snippet. In the AEX-Notify case, sgx_raise() returns and we end up here only if the enclave was in the middle of stage-1 handler. So this async signal (event) was not handled inside the enclave, so we re-instate it here. Ok, makes sense to me.


pal/src/host/linux-sgx/pal_exception.c line 22 at r1 (raw file):

__attribute_no_sanitize_address
noreturn static void apply_mitigation_handler_and_restore_sgx_context(sgx_cpu_context_t* uc, PAL_XREGS_STATE* xregs_state)

Fix code style and wrap to 100-chars-per-line


pal/src/host/linux-sgx/pal_exception.c line 47 at r1 (raw file):

        apply_mitigation_handler_and_restore_sgx_context(uc, xregs_state);
    else
        _restore_sgx_context(uc, xregs_state);

This looks ugly. Just do the check on AEX-Notify inside apply_mitigation_handler_and_restore_sgx_context(), and call only this function here, without if-else.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 24 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @lzha101)


pal/src/host/linux-sgx/host_exception.c line 222 at r1 (raw file):

        pal_get_host_tcb()->aex_sync_event = PAL_EVENT_NO_EVENT;
        sgx_raise(event);
        return;

We should not get to this return (if we run with AEX-Notify enabled), because sync signals must always be consumed and then EDECCSSA'ed. I suggest to add the check:

if (aex_notify_enabled) {
    log_error("Sync signal %d was not consumed by the enclave (in AEX-Notify mode)", event);
    BUG();
}

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 26 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @lzha101)


pal/src/host/linux-sgx/enclave_entry.S line 266 at r1 (raw file):

    # We are interrupted during the never-returning OCALL_EXIT. Because the thread is going to exit
    # anyway, we can ignore this exception.
    movq %rdx, %rbx    # upon EENTER the exit address was in RDX, mov it to RBX for EEXIT

I was surprised by this because Intel SDM says that EENTER puts the exit address into RCX. And this asm code (that leads to this code snippet) doesn't copy RCX into RDX. So who put the exit address into RDX?

It turns out the untrusted asm code does this before calling EENTER to raise an exception (in SSA=1):

leaq .Lafter_resume(%rip), %rdx

This seems redundant, as we could just consult the RCX register. Note that we cannot consult the %gs:SGX_ECALL_RETURN_ADDR variable because it is not be set in the exception-handling context (and it anyway pertains only to the normal context).

But need to be careful to make sure that RCX was not clobbered in the meantime.

Maybe a good candidate for a separate PR.


pal/src/host/linux-sgx/enclave_entry.S line 574 at r1 (raw file):


    # upon EENTER the exit address was in RDX, mov it to RBX for EEXIT
    movq %rdx, %rbx

ditto (seems redundant, as we could just consult the RCX register)

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 24 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @lzha101)


pal/src/host/linux-sgx/enclave_entry.S line 266 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I was surprised by this because Intel SDM says that EENTER puts the exit address into RCX. And this asm code (that leads to this code snippet) doesn't copy RCX into RDX. So who put the exit address into RDX?

It turns out the untrusted asm code does this before calling EENTER to raise an exception (in SSA=1):

leaq .Lafter_resume(%rip), %rdx

This seems redundant, as we could just consult the RCX register. Note that we cannot consult the %gs:SGX_ECALL_RETURN_ADDR variable because it is not be set in the exception-handling context (and it anyway pertains only to the normal context).

But need to be careful to make sure that RCX was not clobbered in the meantime.

Maybe a good candidate for a separate PR.

I'm taking it back. We can't simplify this to use RCX instead of RDX.

That's because the exit address is not always RCX ("Address of IP following EENTER"). In particular, this is not true for OCALLs: in their cases, EEXIT must jump to the OCALL untrusted code, and so EENTER must be instructed to jump to that OCALL untrusted code via RDX. Here's the corresponding code:

leaq .Lsgx_entry(%rip), %rdx


pal/src/host/linux-sgx/enclave_entry.S line 574 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto (seems redundant, as we could just consult the RCX register)

I'm taking it back, we can't simplify that. See other comment.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 24 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @lzha101)


pal/src/host/linux-sgx/host_exception.c line 235 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What is this? Why do we re-add the async signal (event) again as a pending signal to this enclave thread? We already consumed this event, what's the point of re-adding it.

Update: ok, this is probably because you assume AEX-Notify flows only in this code snippet. In the AEX-Notify case, sgx_raise() returns and we end up here only if the enclave was in the middle of stage-1 handler. So this async signal (event) was not handled inside the enclave, so we re-instate it here. Ok, makes sense to me.

Just to make me comment less confusing: I mean that in no-AEX-Notify flow, sgx_raise() always returns here, so there is no point in re-injecting the same async signal (it was consumed). Thus, what this flow misses is a check like this:

        sgx_raise(event);
        /* we end up here if there's no AEX-Notify (but the signal was consumed) or there is AEX-Notify
         * but couldn't EENTER because enclave executes stage-1 handler in SSA 1 (signal was not consumed) */
        if (g_aex_notify_enabled) {
             enum pal_event no_event = PAL_EVENT_NO_EVENT;
             __atomic_compare_exchange_n(&pal_get_host_tcb()->aex_async_event, &no_event, event,
                                         /*weak=*/false, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
        }

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 26 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @lzha101)

a discussion (no related file):
I prepared the series of posts with diagrams on current Gramine signal-handling flows and on AEX-Notify flows: #1948

My proposal follows the implementation in this PR (bar some comments and nitpicks).

We need to discuss these posts and diagrams, to agree on the design choices & implementation.


a discussion (no related file):
Based on my explanations in #1948, I actually believe it will make more sense to have a preliminary PR that modifies signal handling flows without AEX-Notify first. That's the crux of my middle post: #1948 (comment)


Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 26 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @lzha101)


pal/src/host/linux-sgx/host_exception.c line 235 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Just to make me comment less confusing: I mean that in no-AEX-Notify flow, sgx_raise() always returns here, so there is no point in re-injecting the same async signal (it was consumed). Thus, what this flow misses is a check like this:

        sgx_raise(event);
        /* we end up here if there's no AEX-Notify (but the signal was consumed) or there is AEX-Notify
         * but couldn't EENTER because enclave executes stage-1 handler in SSA 1 (signal was not consumed) */
        if (g_aex_notify_enabled) {
             enum pal_event no_event = PAL_EVENT_NO_EVENT;
             __atomic_compare_exchange_n(&pal_get_host_tcb()->aex_async_event, &no_event, event,
                                         /*weak=*/false, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
        }

For info: my proposed code snippet was also wrong, because in no-AEX-Notify flow, if sgx_raise() failed to inject the async signal (because CSSA=2 and EENTER faults with a #GP), then this signal would be lost. (That's because aex_async_event is reset to NO_EVENT in __atomic_exchange_n() above, and there is no code that would restore it back to event in case of failed signal injection).

I fix it in my new version of the code by setting aex_async_event to the value of RDI in the special case of handle_sync_signal() with PAL_EVENT_MEMFAULT (see code above in this file).

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 26 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @lzha101)


pal/src/host/linux-sgx/host_entry.S line 114 at r1 (raw file):

Well, actually, doesn't this introduce a heavy performance hit? On each AEX (without any real signals), we will call into the stage-1 handler, then stage-2 handler will handle dummy SIGCONT, and only then the execution will continue. Sounds like a lot of wasted CPU cycles.

See this answer from one of AEX Notify authors: #1948 (reply in thread)

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 26 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @lzha101)


pal/src/host/linux-sgx/enclave_entry.S line 621 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Even though an async signal can arrive at this exact jmp instruction, this should be safe. Because the SSA0 can be considered having the correct and consistent context for the "stage-2 exception handler", including this R11 register which points to the stage-2 exception handling routine in our C code.

That's just info for other reviewers and authors. I hope I understand this corner-case data race correctly.

I was wrong. If an async signal would happen here, then SSA[0].GPRSGX.RIP would point to this jmp instruction, and the new stage-1 singal handler will end up doing movq SGX_GPR_RIP(%rbx), %r11 (see above), which means that R11 will contain RIP of this jmp instruction, basically meaning jmp *itself, which results in an infinite loop of this instruction jumping onto itself. So this code is wrong and may lead to indeterministic hangs.

In my new PR, I modified this corner case to skip jmp when this is detected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants