EDMM: Add support for dynamic thread creation #1223

vijaydhanraj · 2023-03-09T18:53:15Z

Description of the feature

Each executing thread in the enclave is associated with a Thread Control Structure (TCS). The TCS contains metadata used by the hardware to save and restore thread-specific information when entering/exiting the enclave.

The SGX2 extensions allow TCS pages to be added to an enclave dynamically, thereby making it possible to add and remove threads at enclave runtime and removing the hard limit on the maximum number of threads that can be used in an enclave.

Why Gramine should implement it?

Gramine could do away with this hard limit (https://gramine.readthedocs.io/en/stable/manifest-syntax.html#number-of-threads) on the number of threads that can run within an enclave.

mkow · 2023-03-13T14:01:12Z

@vijaydhanraj: IMO it would be good to start the continuation of EDMM work with this feature, as it's the most user-facing one - a lot of users hit errors with their apps because of the thread limit, and the apps often handle the failure badly (i.e. it's not clear why they failed without debugging).

vijaydhanraj · 2023-03-13T16:48:58Z

Sure @mkow, currently working on a design. Once ready, will present it in our community meeting.

llly · 2023-04-12T14:12:32Z

I can work out the design and implement it if @vijaydhanraj doesn't have time for it.

llly · 2023-04-25T07:37:45Z

Design proposal

Currently, the Pal-SGX host prepares sgx.max_threads static sets of SSA, TCS, TLS, stack, and sig_stack before initializing the enclave. It stores the addresses of all static TCS pages in g_enclave_thread_map. When a new host thread is created, the Pal-SGX host looks up an available static TCS page in g_enclave_thread_map and fails if none is available. The new host thread then enters the enclave using this static TCS page.

The proposal is that the Pal-SGX enclave prepares and manages dynamic thread data: TCS, SSA, stack, sig_stack, and TCB (same as TLS). At thread creation, if no dynamic thread data is available, the Pal-SGX enclave allocates new dynamic thread data and passes the TCS address to the Pal-SGX host. The Pal-SGX host adds this dynamic TCS to g_enclave_thread_map. This way, the Pal-SGX host can always find an available TCS page for a new thread to enter the enclave.

Changes:

Add a new g_thread_meta_map in the Pal-SGX enclave to allocate and manage dynamic thread data. Each entry is a page-aligned struct of TCS, SSA, stack, sig_stack, and pal_enclave_tcb.
Add an enclave EDMM operation, sgx_edmm_convert_tcs_pages, to convert a regular page to a TCS page. It uses the existing ocall_edmm_modify_pages_type and sgx_eaccept, following the spec.
Change int ocall_clone_thread(void) to int ocall_clone_thread(void* tcs), and update the related interfaces accordingly, so that the address of a new dynamic TCS page can be passed from the enclave to the host.
_PalThreadCreate in the Pal-SGX enclave checks whether g_thread_meta_map has a free dynamic TCS.
If there is no free dynamic TCS, it allocates new dynamic thread data, initializes all values and offsets, and then converts the TCS page.
Extend g_enclave_thread_map in the Pal-SGX host to map both static and dynamic TCS page addresses.
Update the map_tcs and unmap_tcs functions, which manage g_enclave_thread_map. map_tcs adds a new dynamic TCS to the map and returns an available static or dynamic TCS.
Update pal_start_thread in the Pal-SGX enclave to check whether the new thread uses a dynamic TCS. If so, mark the TCS as used in g_thread_meta_map before calling the workload callback and as free after the callback returns.
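
The enclave-side bookkeeping in the changes above (reuse a free dynamic TCS, otherwise allocate a new one) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the eventual PR: struct thread_meta, get_dynamic_tcs, put_dynamic_tcs, SKETCH_PAGE_SIZE, and the tiny spinlock are hypothetical names, ordinary heap memory stands in for enclave pages, and the actual TCS initialization and page-type conversion are left as a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Hypothetical record; the proposal's entry also holds SSA, stack, sig_stack and TCB. */
struct thread_meta {
    void* tcs;  /* page-aligned memory standing in for the TCS page */
    bool  used; /* true while a thread is running on this TCS */
};

static struct thread_meta* g_thread_meta_map = NULL;
static size_t g_thread_meta_num  = 0; /* entries filled in */
static size_t g_thread_meta_size = 0; /* entries allocated */
static atomic_flag g_thread_meta_lock = ATOMIC_FLAG_INIT;

static void meta_lock(void) {
    while (atomic_flag_test_and_set_explicit(&g_thread_meta_lock, memory_order_acquire))
        ; /* spin */
}

static void meta_unlock(void) {
    atomic_flag_clear_explicit(&g_thread_meta_lock, memory_order_release);
}

/* Returns a dynamic TCS marked as used: a recycled one if possible, else a new one.
 * Returns NULL if memory allocation fails. */
static void* get_dynamic_tcs(void) {
    void* ret = NULL;
    meta_lock();
    for (size_t i = 0; i < g_thread_meta_num; i++) {
        if (!g_thread_meta_map[i].used) {
            g_thread_meta_map[i].used = true;
            ret = g_thread_meta_map[i].tcs;
            goto out;
        }
    }
    if (g_thread_meta_num == g_thread_meta_size) {
        /* grow the map; commit the new size only after realloc succeeded */
        size_t new_size = g_thread_meta_size + 8;
        struct thread_meta* tmp = realloc(g_thread_meta_map, new_size * sizeof(*tmp));
        if (!tmp)
            goto out;
        g_thread_meta_map  = tmp;
        g_thread_meta_size = new_size;
    }
    ret = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
    if (!ret)
        goto out;
    /* Real code would initialize the TCS values and offsets here and then convert
     * the page to a TCS page (sgx_edmm_convert_tcs_pages in the proposal). */
    g_thread_meta_map[g_thread_meta_num].tcs  = ret;
    g_thread_meta_map[g_thread_meta_num].used = true;
    g_thread_meta_num++;
out:
    meta_unlock();
    return ret;
}

/* Marks a dynamic TCS as free for reuse; returns false if it is unknown or not in use. */
static bool put_dynamic_tcs(void* tcs) {
    bool found = false;
    meta_lock();
    for (size_t i = 0; i < g_thread_meta_num; i++) {
        if (g_thread_meta_map[i].tcs == tcs && g_thread_meta_map[i].used) {
            g_thread_meta_map[i].used = false;
            found = true;
            break;
        }
    }
    meta_unlock();
    return found;
}
```

In this sketch, entries are never freed, only recycled, so the memory footprint is bounded by the peak number of simultaneously running dynamic threads.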

Note that there is a rare race condition:
An exiting thread has marked its dynamic TCS as unused in g_thread_meta_map but hasn't yet called unmap_tcs to mark it as unused in g_enclave_thread_map.
A new thread then calls map_tcs, which cannot pick the TCS of the exiting thread from g_enclave_thread_map. If there is no other unused TCS, the new thread needs to wait until the exiting thread calls unmap_tcs.
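
The race described above can be made concrete with a small host-side model. This is a sketch under assumptions, not Gramine's implementation: the signatures of map_tcs and unmap_tcs, the helper register_tcs, the struct tcs_entry, and the fixed capacity SKETCH_MAX_TCS are all invented for illustration, and locking is omitted for brevity (a real host would hold a lock around every access to the map).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_TCS 16 /* arbitrary capacity, for illustration only */

struct tcs_entry {
    void* tcs;    /* address of a static or dynamic TCS page */
    bool  in_use; /* set while a host thread occupies this TCS */
};

static struct tcs_entry g_enclave_thread_map[SKETCH_MAX_TCS];
static size_t g_tcs_num = 0;

/* Adds a TCS page to the map; succeeds without changes if it is already known. */
static bool register_tcs(void* tcs) {
    for (size_t i = 0; i < g_tcs_num; i++) {
        if (g_enclave_thread_map[i].tcs == tcs)
            return true;
    }
    if (g_tcs_num == SKETCH_MAX_TCS)
        return false;
    g_enclave_thread_map[g_tcs_num].tcs    = tcs;
    g_enclave_thread_map[g_tcs_num].in_use = false;
    g_tcs_num++;
    return true;
}

/* Registers the dynamic TCS passed from the enclave (if any) and returns any TCS that
 * is free on the host side. NULL means the caller must wait and retry, e.g. because
 * the only candidate still belongs to a thread that has not called unmap_tcs yet. */
static void* map_tcs(void* dynamic_tcs) {
    if (dynamic_tcs && !register_tcs(dynamic_tcs))
        return NULL;
    for (size_t i = 0; i < g_tcs_num; i++) {
        if (!g_enclave_thread_map[i].in_use) {
            g_enclave_thread_map[i].in_use = true;
            return g_enclave_thread_map[i].tcs;
        }
    }
    return NULL;
}

/* Called by the exiting host thread after it has left the enclave. */
static bool unmap_tcs(void* tcs) {
    for (size_t i = 0; i < g_tcs_num; i++) {
        if (g_enclave_thread_map[i].tcs == tcs && g_enclave_thread_map[i].in_use) {
            g_enclave_thread_map[i].in_use = false;
            return true;
        }
    }
    return false;
}
```

If the enclave hands out a dynamic TCS that the host still considers occupied, map_tcs in this model returns NULL until the exiting thread calls unmap_tcs, which corresponds to the waiting step in the note.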

This proposal alters neither the handling of static TCS pages nor the thread creation flow. If EDMM is not enabled, everything stays the same.

Impacts:

manifest: No changes. sgx.max_threads becomes the number of static thread data sets. When the number of threads exceeds sgx.max_threads, dynamic thread data is used instead of returning an error.

Debugger: It has a limit of MAX_DBG_THREADS (4096), which will not change. I assume users don't debug a workload with more than 4096 threads running at the same time.
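
The manifest impact can be summarized as a small decision function. This is a simplified sketch, not Gramine code: choose_tcs_source and enum tcs_source are hypothetical names, -EAGAIN is an arbitrary error code picked for the example, and the model ignores the reuse of previously created dynamic TCS pages.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum tcs_source {
    TCS_STATIC,  /* one of the sgx.max_threads pre-allocated thread data sets */
    TCS_DYNAMIC, /* thread data created at runtime via EDMM */
};

/* Decides where the thread data for a new thread comes from.
 * Returns 0 and sets *out on success, or a negative errno-style value when the
 * static pool is exhausted and EDMM is not enabled (the pre-proposal behavior). */
static int choose_tcs_source(size_t running_threads, size_t max_threads,
                             bool edmm_enabled, enum tcs_source* out) {
    assert(out != NULL);
    if (running_threads < max_threads) {
        *out = TCS_STATIC;
        return 0;
    }
    if (edmm_enabled) {
        *out = TCS_DYNAMIC;
        return 0;
    }
    return -EAGAIN; /* assumption: the real error code may differ */
}
```

With sgx.max_threads = 4, for example, a fifth thread fails in this model without EDMM and gets dynamic thread data with EDMM enabled.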

Attached is a flow chart of the proposal.

dimakuv · 2023-04-25T08:00:31Z

I'm unclear about the following aspects:

Why does _PalThreadCreate() always allocate a dynamic TCB? This in-enclave function also needs to decide whether to allocate the thread metadata (TCS, SSA, etc) from a static pool or from a dynamic pool (which in turn has two options: reuse the already-freed dynamic thread, or create a new thread using EDMM flows).
I don't understand what happens in the Host Child Thread -- the dynamic tcs unreleased box. What is it intended to do?

The rest looks good to me. If I understand correctly, we will never free dynamically created threads, but instead we will reuse them. This is the same as we do with in-enclave thread stacks on the Linux PAL, see:

gramine/pal/src/host/linux/pal_threading.c

Lines 23 to 73 in a01265b

    
    /* Linux PAL cannot use mmap/unmap to manage thread stacks because this may overlap with
     * g_pal_public_state.user_address_{start,end}. Linux PAL also cannot just use malloc/free because
     * PalThreadExit needs to use raw system calls and inline asm. Thus, we resort to recycling thread
     * stacks allocated by previous threads and not used anymore. This still leaks memory but at least
     * it is bounded by the maximum number of simultaneously executing threads. Note that main thread
     * is not a part of this mechanism (it only allocates a tiny altstack). */
    struct thread_stack_map_t {
        void* stack;
        bool used;
    };

    static struct thread_stack_map_t* g_thread_stack_map = NULL;
    static size_t g_thread_stack_num  = 0;
    static size_t g_thread_stack_size = 0;
    static spinlock_t g_thread_stack_lock = INIT_SPINLOCK_UNLOCKED;

    static void* get_thread_stack(void) {
        void* ret = NULL;
        spinlock_lock(&g_thread_stack_lock);
        for (size_t i = 0; i < g_thread_stack_num; i++) {
            if (!g_thread_stack_map[i].used) {
                /* found allocated and unused stack -- use it */
                g_thread_stack_map[i].used = true;
                ret = g_thread_stack_map[i].stack;
                goto out;
            }
        }

        if (g_thread_stack_num == g_thread_stack_size) {
            /* realloc g_thread_stack_map to accommodate more objects (includes the very first time) */
            g_thread_stack_size += 8;
            struct thread_stack_map_t* tmp = malloc(g_thread_stack_size * sizeof(*tmp));
            if (!tmp)
                goto out;

            memcpy(tmp, g_thread_stack_map, g_thread_stack_num * sizeof(*tmp));
            free(g_thread_stack_map);
            g_thread_stack_map = tmp;
        }

        ret = malloc(THREAD_STACK_SIZE + ALT_STACK_SIZE);
        if (!ret)
            goto out;

        g_thread_stack_map[g_thread_stack_num].stack = ret;
        g_thread_stack_map[g_thread_stack_num].used  = true;
        g_thread_stack_num++;
    out:
        spinlock_unlock(&g_thread_stack_lock);
        return ret;
    }

Feel free to reuse some parts of that logic in your implementation for Linux-SGX.

llly · 2023-04-25T08:58:09Z

I don't understand what happens in the Host Child Thread -- the dynamic tcs unreleased box. What is it intended to do?

A TCS is really released only after the host thread has left the enclave via EEXIT and then called unmap_tcs to clear it in g_enclave_thread_map. The enclave thread, however, releases the dynamic TCS metadata earlier, before that EEXIT. So there is a race condition: one thread is exiting and has released its TCS metadata but has not yet performed EEXIT, while another thread creation reuses the same dynamic TCS. The host's map_tcs needs to handle this case.

Why does _PalThreadCreate() always allocate a dynamic TCB?

The reason is similar to the answer to your second question: the enclave doesn't know exactly when a TCS is released.
The enclave threading functions manage only the dynamic TCS pool: they allocate a new dynamic TCS or reuse a free one. Having the enclave also track static TCS pages would change the current behavior (without EDMM), for example by adding a static TCS counter. I can change the design if that's OK.

dimakuv · 2023-06-13T14:59:14Z

@llly presented the diagram above during today's Gramine meeting, and (at least to me) the design sounded correct.

vijaydhanraj mentioned this issue Mar 9, 2023

RFC: SGX2 support in Gramine (Phase 1) #683

Closed

dimakuv added feature request P: 0 labels Mar 10, 2023

dimakuv assigned vijaydhanraj Mar 10, 2023

dimakuv assigned llly and unassigned vijaydhanraj Apr 13, 2023

dimakuv moved this to Working on it in Gramine Roadmap Apr 25, 2023

dimakuv added this to Gramine Roadmap Apr 25, 2023

llly mentioned this issue Jul 11, 2023

[PAL/Linux-SGX] Add EDMM support for dynamic thread creation #1451

Merged

dimakuv moved this from Working on it to Coming in next release (v1.5) in Gramine Roadmap Jul 18, 2023

mkow closed this as completed in #1451 Dec 2, 2023

github-project-automation bot moved this from Coming in next release (v1.6) to Backburner in Gramine Roadmap Dec 2, 2023

dimakuv moved this from Backburner to Done (only last two releases) in Gramine Roadmap Feb 8, 2024

kailun-qin mentioned this issue Aug 23, 2024

Issues preventing running Go applications in Gramine #702

Closed
