Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Pal,LibOS] Add checkpoint-and-restore support for sysfs #583

Merged

Conversation

vijaydhanraj
Copy link
Contributor

@vijaydhanraj vijaydhanraj commented May 13, 2022

Description of the changes

This PR adds checkpoint-and-restore support for sysfs. It addresses issue 3 listed as part of gramineproject/graphene#2105

How to test this PR?

Please run PAL/LibOS regression tests.


This change is Reviewable

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 9 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: )

a discussion (no related file):
Should we add a test case for this feature?

I did do a quick check by overriding the double_fork test as below and the test case passed.

diff --git a/LibOS/shim/test/regression/double_fork.c b/LibOS/shim/test/regression/double_fork.c
index e248f417..1abc41a9 100644
--- a/LibOS/shim/test/regression/double_fork.c
+++ b/LibOS/shim/test/regression/double_fork.c
@@ -22,12 +22,17 @@
 #include <sys/wait.h>
 #include <unistd.h>
 
+#include "dump.h"
 #include "futex.h"
 
 static pid_t main_pid;
 
 static void do_grandchild(int fd) {
     char c = 0;
+    if (dump_path("/sys") < 0) {
+        err(1, "grandchild sysfs");
+    }
+
     if (read(fd, &c, 1) != 1 || c != 'a') {
         err(1, "grandchild read");
     }

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: all files reviewed, 23 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @vijaydhanraj)

a discussion (no related file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Should we add a test case for this feature?

I did do a quick check by overriding the double_fork test as below and the test case passed.

diff --git a/LibOS/shim/test/regression/double_fork.c b/LibOS/shim/test/regression/double_fork.c
index e248f417..1abc41a9 100644
--- a/LibOS/shim/test/regression/double_fork.c
+++ b/LibOS/shim/test/regression/double_fork.c
@@ -22,12 +22,17 @@
 #include <sys/wait.h>
 #include <unistd.h>
 
+#include "dump.h"
 #include "futex.h"
 
 static pid_t main_pid;
 
 static void do_grandchild(int fd) {
     char c = 0;
+    if (dump_path("/sys") < 0) {
+        err(1, "grandchild sysfs");
+    }
+
     if (read(fd, &c, 1) != 1 || c != 'a') {
         err(1, "grandchild read");
     }

Did you also check that the dumped info is correct in the child/grandchild? That it's not all zeros?



LibOS/shim/src/fs/sys/fs.c line 286 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

This function is guaranteed to be called only once, so you can remove this GET_FROM_CP_MAP, this check, ADD_TO_CP_MAP, the else branch, and the two lines for objp.

See my other comment below.


LibOS/shim/src/fs/sys/fs.c line 313 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

ditto


LibOS/shim/src/fs/sys/fs.c line 340 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

ditto


LibOS/shim/src/fs/sys/fs.c line 367 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

ditto


LibOS/shim/src/fs/sys/fs.c line 394 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

ditto


LibOS/shim/src/fs/sys/fs.c line 415 at r1 (raw file):

BEGIN_CP_FUNC(numa_distances) {
    __UNUSED(size);
    assert(size == sizeof(size_t));

I think this assert is misleading. I would just remove it.


LibOS/shim/src/fs/sys/fs.c line 421 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

ditto


LibOS/shim/src/fs/sys/fs.c line 445 at r1 (raw file):

    assert(size == sizeof(struct pal_topo_info));

    struct pal_topo_info* topo_info = (struct pal_topo_info*)obj;

Please just use g_pal_public_state->topo_info explicitly in this function, not the obj argument. See my other comment below.


LibOS/shim/src/fs/sys/fs.c line 449 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

There is no need for this off check -- this function is guaranteed to be called only once (via DEFINE_MIGRATE). Just remove this and the previous line, and also remove the else branch.

For an example, see BEGIN_CP_FUNC(loaded_elf_objects) in shim_rtld.c


LibOS/shim/src/fs/sys/fs.c line 451 at r1 (raw file):

    if (!off) {
        off = ADD_CP_OFFSET(sizeof(struct pal_topo_info));
        ADD_TO_CP_MAP(obj, off);

This also won't be needed when you remove the off check.


LibOS/shim/src/fs/sys/fs.c line 460 at r1 (raw file):

        DO_CP(sockets, topo_info->sockets, &new_topo_info->sockets);
        DO_CP(numa_nodes, topo_info->numa_nodes, &new_topo_info->numa_nodes);
        DO_CP(numa_distances, topo_info->numa_distance_matrix, &new_topo_info->numa_distance_matrix);

Please wrap in 100 chars


LibOS/shim/src/fs/sys/fs.c line 468 at r1 (raw file):

    if (objp)
        *objp = (void*)new_topo_info;

These two lines are not needed, because we never call this function with objp.


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

        CP_REBASE(topo_info->numa_distance_matrix);
    } else {
        assert(!topo_info->numa_nodes);

Shouldn't you also assert !topo_info->numa_distance_matrix ?


LibOS/shim/src/fs/sys/fs.c line 508 at r1 (raw file):

    g_pal_public_state->topo_info.caches_cnt = topo_info->caches_cnt;
    g_pal_public_state->topo_info.caches= topo_info->caches;

caches= -> caches = (add space before =)

Here and everywhere below


LibOS/shim/src/fs/sys/fs.c line 517 at r1 (raw file):

    g_pal_public_state->topo_info.numa_nodes_cnt = topo_info->numa_nodes_cnt;
    g_pal_public_state->topo_info.numa_nodes= topo_info->numa_nodes;
    g_pal_public_state->topo_info.numa_distance_matrix= topo_info->numa_distance_matrix;

In the helper functions, you just do memcpy() of the whole object. You could do the same here.

On the other side, this set of assignments is explicit and readable. I guess let's keep it explicit like this, but I'd like to know @mkow opinion.


LibOS/shim/src/sys/shim_clone.c line 115 at r1 (raw file):

    DEFINE_MIGRATE(brk, NULL, 0);
    DEFINE_MIGRATE(loaded_elf_objects, NULL, 0);
    DEFINE_MIGRATE(topo_info, &g_pal_public_state->topo_info, sizeof(g_pal_public_state->topo_info));

Please use NULL, 0 here instead of a reference to g_pal_public_state.

This is because you have functions like BEGIN_CP_FUNC(caches) which explicitly use the global variable g_pal_public_state->topo_info.caches_cnt.

So the fact that you specify g_pal_public_state as an input argument here is meaningless -- the helper functions still directly use this global var (ideally these helper functions would use the input argument, but we don't have this in our checkpointing scheme, and there is no need for this anyway).


Pal/include/pal/pal.h line 131 at r1 (raw file):

};

const struct pal_public_state* DkGetPalPublicState(void);

I'm not happy with the removal of const, but I understand that now g_pal_public_state becomes a pointer to a non-constant object (because topo_info field can change now, during restore in the child process).

I'm not sure how to fix this. Maybe @mkow can suggest something?

Anyway, we need to add a comment about this pecularity here. I suggest:

/* we cannot mark this as returning a pointer to `const` object, because LibOS can
 * change `pal_public_state.topo_info` during  checkpoint restore in the child */
struct pal_public_state* DkGetPalPublicState(void);

Pal/src/host/Linux/db_main.c line 274 at r1 (raw file):

            INIT_FAIL(-ret, "get_topology_info() failed");
    }

Same point here about all-zeros value of g_pal_public_state.topo_info in the child process. See my comment below, for SGX PAL.


Pal/src/host/Linux-SGX/db_main.c line 524 at r1 (raw file):

    /* if there is a parent, create parent handle */
    PAL_HANDLE parent = NULL;
    bool is_first_process;

Why do you need this helper variable? You can still use parent_stream_fd != -1 in the check below. Please remove it.


Pal/src/host/Linux-SGX/db_main.c line 577 at r1 (raw file):

        }
    }

Now you will have g_pal_public_state.topo_info uninitialized at this point (in the child enclave). I guess this is fine because g_pal_public_state is a global variable and thus is initialized to all-zeros. So g_pal_public_state.topo_info will be all-zeros at this point, which is a sane value.

So unless Michal thinks that we need something explicit here, I'm fine. Just raising this point for discussion.


Pal/src/host/Linux-SGX/sgx_main.c line 945 at r1 (raw file):

    /* Get host topology information only for the first process. This information will be
     * checkpointed and restored during forking of the child process(es). */
    if (g_pal_enclave.is_first_process) {

We use parent_stream_fd < 0 in a similar check above. Please use that one instead of what you currently have.


Pal/src/host/Linux-SGX/sgx_main.c line 949 at r1 (raw file):

        if (ret < 0)
            return ret;
    }

Now it is possible that we continue with topo_info object having garbage (since it is allocated on the stack and not initialized).

Please add struct pal_topo_info topo_info = {0}; above, in the declaration of this variable.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: all files reviewed, 23 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @vijaydhanraj)

a discussion (no related file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Should we add a test case for this feature?

I did do a quick check by overriding the double_fork test as below and the test case passed.

diff --git a/LibOS/shim/test/regression/double_fork.c b/LibOS/shim/test/regression/double_fork.c
index e248f417..1abc41a9 100644
--- a/LibOS/shim/test/regression/double_fork.c
+++ b/LibOS/shim/test/regression/double_fork.c
@@ -22,12 +22,17 @@
 #include <sys/wait.h>
 #include <unistd.h>
 
+#include "dump.h"
 #include "futex.h"
 
 static pid_t main_pid;
 
 static void do_grandchild(int fd) {
     char c = 0;
+    if (dump_path("/sys") < 0) {
+        err(1, "grandchild sysfs");
+    }
+
     if (read(fd, &c, 1) != 1 || c != 'a') {
         err(1, "grandchild read");
     }

Did you also check that the dumped info is correct in the child/grandchild? That it's not all zeros?



LibOS/shim/src/fs/sys/fs.c line 286 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

This function is guaranteed to be called only once, so you can remove this GET_FROM_CP_MAP, this check, ADD_TO_CP_MAP, the else branch, and the two lines for objp.

See my other comment below.


LibOS/shim/src/fs/sys/fs.c line 313 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

ditto


LibOS/shim/src/fs/sys/fs.c line 340 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

ditto


LibOS/shim/src/fs/sys/fs.c line 367 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

ditto


LibOS/shim/src/fs/sys/fs.c line 394 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

ditto


LibOS/shim/src/fs/sys/fs.c line 415 at r1 (raw file):

BEGIN_CP_FUNC(numa_distances) {
    __UNUSED(size);
    assert(size == sizeof(size_t));

I think this assert is misleading. I would just remove it.


LibOS/shim/src/fs/sys/fs.c line 421 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

ditto


LibOS/shim/src/fs/sys/fs.c line 445 at r1 (raw file):

    assert(size == sizeof(struct pal_topo_info));

    struct pal_topo_info* topo_info = (struct pal_topo_info*)obj;

Please just use g_pal_public_state->topo_info explicitly in this function, not the obj argument. See my other comment below.


LibOS/shim/src/fs/sys/fs.c line 449 at r1 (raw file):

    size_t off = GET_FROM_CP_MAP(obj);
    if (!off) {

There is no need for this off check -- this function is guaranteed to be called only once (via DEFINE_MIGRATE). Just remove this and the previous line, and also remove the else branch.

For an example, see BEGIN_CP_FUNC(loaded_elf_objects) in shim_rtld.c


LibOS/shim/src/fs/sys/fs.c line 451 at r1 (raw file):

    if (!off) {
        off = ADD_CP_OFFSET(sizeof(struct pal_topo_info));
        ADD_TO_CP_MAP(obj, off);

This also won't be needed when you remove the off check.


LibOS/shim/src/fs/sys/fs.c line 460 at r1 (raw file):

        DO_CP(sockets, topo_info->sockets, &new_topo_info->sockets);
        DO_CP(numa_nodes, topo_info->numa_nodes, &new_topo_info->numa_nodes);
        DO_CP(numa_distances, topo_info->numa_distance_matrix, &new_topo_info->numa_distance_matrix);

Please wrap in 100 chars


LibOS/shim/src/fs/sys/fs.c line 468 at r1 (raw file):

    if (objp)
        *objp = (void*)new_topo_info;

These two lines are not needed, because we never call this function with objp.


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

        CP_REBASE(topo_info->numa_distance_matrix);
    } else {
        assert(!topo_info->numa_nodes);

Shouldn't you also assert !topo_info->numa_distance_matrix ?


LibOS/shim/src/fs/sys/fs.c line 508 at r1 (raw file):

    g_pal_public_state->topo_info.caches_cnt = topo_info->caches_cnt;
    g_pal_public_state->topo_info.caches= topo_info->caches;

caches= -> caches = (add space before =)

Here and everywhere below


LibOS/shim/src/fs/sys/fs.c line 517 at r1 (raw file):

    g_pal_public_state->topo_info.numa_nodes_cnt = topo_info->numa_nodes_cnt;
    g_pal_public_state->topo_info.numa_nodes= topo_info->numa_nodes;
    g_pal_public_state->topo_info.numa_distance_matrix= topo_info->numa_distance_matrix;

In the helper functions, you just do memcpy() of the whole object. You could do the same here.

On the other side, this set of assignments is explicit and readable. I guess let's keep it explicit like this, but I'd like to know @mkow opinion.


LibOS/shim/src/sys/shim_clone.c line 115 at r1 (raw file):

    DEFINE_MIGRATE(brk, NULL, 0);
    DEFINE_MIGRATE(loaded_elf_objects, NULL, 0);
    DEFINE_MIGRATE(topo_info, &g_pal_public_state->topo_info, sizeof(g_pal_public_state->topo_info));

Please use NULL, 0 here instead of a reference to g_pal_public_state.

This is because you have functions like BEGIN_CP_FUNC(caches) which explicitly use the global variable g_pal_public_state->topo_info.caches_cnt.

So the fact that you specify g_pal_public_state as an input argument here is meaningless -- the helper functions still directly use this global var (ideally these helper functions would use the input argument, but we don't have this in our checkpointing scheme, and there is no need for this anyway).


Pal/include/pal/pal.h line 131 at r1 (raw file):

};

const struct pal_public_state* DkGetPalPublicState(void);

I'm not happy with the removal of const, but I understand that now g_pal_public_state becomes a pointer to a non-constant object (because topo_info field can change now, during restore in the child process).

I'm not sure how to fix this. Maybe @mkow can suggest something?

Anyway, we need to add a comment about this pecularity here. I suggest:

/* we cannot mark this as returning a pointer to `const` object, because LibOS can
 * change `pal_public_state.topo_info` during  checkpoint restore in the child */
struct pal_public_state* DkGetPalPublicState(void);

Pal/src/host/Linux/db_main.c line 274 at r1 (raw file):

            INIT_FAIL(-ret, "get_topology_info() failed");
    }

Same point here about all-zeros value of g_pal_public_state.topo_info in the child process. See my comment below, for SGX PAL.


Pal/src/host/Linux-SGX/db_main.c line 524 at r1 (raw file):

    /* if there is a parent, create parent handle */
    PAL_HANDLE parent = NULL;
    bool is_first_process;

Why do you need this helper variable? You can still use parent_stream_fd != -1 in the check below. Please remove it.


Pal/src/host/Linux-SGX/db_main.c line 577 at r1 (raw file):

        }
    }

Now you will have g_pal_public_state.topo_info uninitialized at this point (in the child enclave). I guess this is fine because g_pal_public_state is a global variable and thus is initialized to all-zeros. So g_pal_public_state.topo_info will be all-zeros at this point, which is a sane value.

So unless Michal thinks that we need something explicit here, I'm fine. Just raising this point for discussion.


Pal/src/host/Linux-SGX/sgx_main.c line 945 at r1 (raw file):

    /* Get host topology information only for the first process. This information will be
     * checkpointed and restored during forking of the child process(es). */
    if (g_pal_enclave.is_first_process) {

We use parent_stream_fd < 0 in a similar check above. Please use that one instead of what you currently have.


Pal/src/host/Linux-SGX/sgx_main.c line 949 at r1 (raw file):

        if (ret < 0)
            return ret;
    }

Now it is possible that we continue with topo_info object having garbage (since it is allocated on the stack and not initialized).

Please add struct pal_topo_info topo_info = {0}; above, in the declaration of this variable.

@dimakuv
Copy link

dimakuv commented May 13, 2022

Jenkins, test this please

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 9 files reviewed, 23 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @vijaydhanraj)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Did you also check that the dumped info is correct in the child/grandchild? That it's not all zeros?

Yes, that was my first check. I didn't cross-check each and every sysfs path but randomly checked a few paths and they were correct.

I do see the error messages below but don't think they are related to sysfs checkpointing and restore changes.

[P3:T33:double_fork] error: Failed to send IPC msg to 2: -32
[P3:T33:double_fork] error: Sending IPC process-exit notification failed: -32


LibOS/shim/src/fs/sys/fs.c line 286 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This function is guaranteed to be called only once, so you can remove this GET_FROM_CP_MAP, this check, ADD_TO_CP_MAP, the else branch, and the two lines for objp.

See my other comment below.

Yes removed GET_FROM_CP_MAP and ADD_TO_CP_MAP and also the else part but we need to keep the objp so that the new caches is properly updated.


LibOS/shim/src/fs/sys/fs.c line 313 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto

Done.


LibOS/shim/src/fs/sys/fs.c line 340 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto

Done.


LibOS/shim/src/fs/sys/fs.c line 367 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto

Done.


LibOS/shim/src/fs/sys/fs.c line 394 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto

Done.


LibOS/shim/src/fs/sys/fs.c line 415 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I think this assert is misleading. I would just remove it.

I think having assert is good but agree the __UNUSED was misleading. I have removed it here and in all the other sub CP functions.


LibOS/shim/src/fs/sys/fs.c line 421 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto

Done.


LibOS/shim/src/fs/sys/fs.c line 445 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please just use g_pal_public_state->topo_info explicitly in this function, not the obj argument. See my other comment below.

Yes, done.


LibOS/shim/src/fs/sys/fs.c line 449 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

There is no need for this off check -- this function is guaranteed to be called only once (via DEFINE_MIGRATE). Just remove this and the previous line, and also remove the else branch.

For an example, see BEGIN_CP_FUNC(loaded_elf_objects) in shim_rtld.c

Thanks for the pointer, done.


LibOS/shim/src/fs/sys/fs.c line 451 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This also won't be needed when you remove the off check.

Yes, correct. Removed it.


LibOS/shim/src/fs/sys/fs.c line 460 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please wrap in 100 chars

For me, it shows 98 chars so felt wrapping was not needed here. Can you please check again?


LibOS/shim/src/fs/sys/fs.c line 468 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

These two lines are not needed, because we never call this function with objp.

Done.


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Shouldn't you also assert !topo_info->numa_distance_matrix ?

Since numa_nodes and numa_distance_matrix are dependent on numa_nodes_cnt both should be NULL if the cnt was zero and so added assert just for numa_node.

Also, if the first assert is true will the next line be executed?


LibOS/shim/src/fs/sys/fs.c line 508 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

caches= -> caches = (add space before =)

Here and everywhere below

Sorry copy-paste error :)


LibOS/shim/src/fs/sys/fs.c line 517 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

In the helper functions, you just do memcpy() of the whole object. You could do the same here.

On the other side, this set of assignments is explicit and readable. I guess let's keep it explicit like this, but I'd like to know @mkow opinion.

Yes, we can, and I am open to modifying it either way.


LibOS/shim/src/sys/shim_clone.c line 115 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please use NULL, 0 here instead of a reference to g_pal_public_state.

This is because you have functions like BEGIN_CP_FUNC(caches) which explicitly use the global variable g_pal_public_state->topo_info.caches_cnt.

So the fact that you specify g_pal_public_state as an input argument here is meaningless -- the helper functions still directly use this global var (ideally these helper functions would use the input argument, but we don't have this in our checkpointing scheme, and there is no need for this anyway).

Got it, changed to use NULL, 0


Pal/include/pal/pal.h line 131 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I'm not happy with the removal of const, but I understand that now g_pal_public_state becomes a pointer to a non-constant object (because topo_info field can change now, during restore in the child process).

I'm not sure how to fix this. Maybe @mkow can suggest something?

Anyway, we need to add a comment about this pecularity here. I suggest:

/* we cannot mark this as returning a pointer to `const` object, because LibOS can
 * change `pal_public_state.topo_info` during  checkpoint restore in the child */
struct pal_public_state* DkGetPalPublicState(void);

Yes, correct that is the reason I removed the const. I have added comments and open to suggestions.


Pal/src/host/Linux-SGX/db_main.c line 524 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why do you need this helper variable? You can still use parent_stream_fd != -1 in the check below. Please remove it.

It was more readable this way as parent_stream_fd == -1 doesn't explicitly say it is the first process but have removed it. I have added a comment to explain the logic.


Pal/src/host/Linux-SGX/sgx_main.c line 945 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We use parent_stream_fd < 0 in a similar check above. Please use that one instead of what you currently have.

Done.


Pal/src/host/Linux-SGX/sgx_main.c line 949 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Now it is possible that we continue with topo_info object having garbage (since it is allocated on the stack and not initialized).

Please add struct pal_topo_info topo_info = {0}; above, in the declaration of this variable.

Done.

@vijaydhanraj
Copy link
Contributor Author

Oh, why is GitHub rendering reviewable comments this way? Any thoughts?

@boryspoplawski
Copy link
Contributor

Oh, why is GitHub rendering reviewable comments this way? Any thoughts?

You have added tripple backtick at the end twice, which confused the syntax. I've taken the liberty of editing it (removed unnecessary ```

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I see, thanks @boryspoplawski. Appreciate the help!

Reviewable status: 4 of 9 files reviewed, 23 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @vijaydhanraj)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: 4 of 9 files reviewed, 23 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @vijaydhanraj)

a discussion (no related file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Yes, that was my first check. I didn't cross-check each and every sysfs path but randomly checked a few paths and they were correct.

I do see the error messages below but don't think they are related to sysfs checkpointing and restore changes.

[P3:T33:double_fork] error: Failed to send IPC msg to 2: -32
[P3:T33:double_fork] error: Sending IPC process-exit notification failed: -32

</blockquote></details>

I wouldn't add such a test - you don't even verify the output, so it's not too useful (same as the sysfs topology tests we already have).

___
___
*[`LibOS/shim/src/fs/sys/fs.c` line 504 at r1](https://reviewable.io/reviews/gramineproject/gramine/583#-N1wiLIxCznxWWe3DL-9:-N24GUxf3m8XDalBZ9fw:bwqtxpz) ([raw file](https://github.com/gramineproject/gramine/blob/7e8558650069d5b4dcfad8fb7b79818589e02c45/LibOS/shim/src/fs/sys/fs.c#L504)):*
> Since `numa_nodes` and `numa_distance_matrix` are dependent on `numa_nodes_cnt` both should be NULL if the `cnt` was zero

No, why would they _have_ to be `NULL`?


<!-- Sent from Reviewable.io -->

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 9 files at r1, 4 of 5 files at r2.
Reviewable status: 8 of 9 files reviewed, 23 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @vijaydhanraj)


Pal/include/pal/pal.h line 131 at r1 (raw file):

now g_pal_public_state becomes a pointer to a non-constant object

But that's not what const means in C.

int x = 5;
const int* ptr = &x;
x = 10;

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 9 files reviewed, 23 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 460 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

For me, it shows 98 chars so felt wrapping was not needed here. Can you please check again?

+1 to Vijay, this line doesn't need rewrapping (at least in r2 of this PR).


LibOS/shim/src/fs/sys/fs.c line 517 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Yes, we can, and I am open to modifying it either way.

I'd prefer memcpy, but see my other comment.


LibOS/shim/src/fs/sys/fs.c line 409 at r2 (raw file):

END_CP_FUNC(topo_info)

BEGIN_RS_FUNC(topo_info) {

Why is this change so complicated? Can't you just send one blob with all the arrays copied inside, point the array pointers to them and the to CP_REBASE() on retrieval?

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @mkow and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 286 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Yes removed GET_FROM_CP_MAP and ADD_TO_CP_MAP and also the else part but we need to keep the objp so that the new caches is properly updated.

Looks good. Yes, you are right, objp assignment is needed, I was wrong.


LibOS/shim/src/fs/sys/fs.c line 460 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

+1 to Vijay, this line doesn't need rewrapping (at least in r2 of this PR).

After the change, it is now 98, so it's good now.


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Since numa_nodes and numa_distance_matrix are dependent on numa_nodes_cnt both should be NULL if the cnt was zero

No, why would they have to be NULL?

I didn't understand Michal's point, but this code will be modified anyway, given our discussions above.


LibOS/shim/src/fs/sys/fs.c line 281 at r2 (raw file):

    assert(size == sizeof(struct pal_cache_info));

    struct pal_cache_info* caches = (struct pal_cache_info*)obj;

Actually, now that I read @mkow's comment and looking at the code, I wonder -- why do we split these "copy caches/threads/cores/etc" memory copies into separate BEGIN_CP_FUNC() functions?

We can simply inline all this code into the main function BEGIN_CP_FUNC(topo_info). Thus, we'll remove a lot of boilerplate code, and we won't have these 4-5 helper checkpoint functions.

So basically, you can simply copy this code from BEGIN_CP_FUNC(caches) into BEGIN_CP_FUNC(topo_info) and remove the now-redundant things like assignments to objp. You'll still need those ADD_CP_OFFSET(caches_size) and ADD_CP_FUNC_ENTRY(off) statements, but they will be all in the main function.


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

    struct pal_cache_info* new_caches = NULL;

    size_t caches_cnt = g_pal_public_state->topo_info.caches_cnt;

Actually, what happens when caches_cnt == 0? I think the rest of the code will break (I'm sure ADD_CP_OFFSET behaves incorrectly when it is given 0). The best would be to skip all this logic then, and simply do objp = NULL.

Same for all other helper functions (threads, cores, ...).


LibOS/shim/src/fs/sys/fs.c line 409 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why is this change so complicated? Can't you just send one blob with all the arrays copied inside, point the array pointers to them and the to CP_REBASE() on retrieval?

Hm, this is also possible. I suggested (comment above) to inline all these helper checkpoint functions into the main topo_info function.

But Michal suggests to go even further and do something like this:

BEGIN_CP_FUNC(topo_info) {
    ...
    size_t off = ADD_CP_OFFSET(sizeof(struct pal_topo_info) +
        topo_info->caches_cnt * sizeof(struct pal_cache_info) +
        topo_info->threads_cnt * sizeof(struct pal_cpu_thread_info) +
        ... and so on ...);

    new_topo_info = (struct pal_topo_info*)(base + off);
    *new_topo_info = *topo_info;

    struct pal_cache_info* caches = (struct pal_cache_info*)(base + off + sizeof(struct pal_topo_info));
    memcpy(caches, topo_info->caches, topo_info->caches_cnt * sizeof(struct pal_cache_info));
    new_topo_info->caches = caches;

    ... same for threads, for cores, for NUMA, ...
}

To be honest, I'm not sure that this code is easier to read than what I proposed (the inlined helper functions).

On the other hand, the restore function (BEGIN_RS_FUNC(topo_info)) will be indeed trivial. Something like this:

    struct pal_topo_info* topo_info = (void*)(base + GET_CP_FUNC_ENTRY());
    CP_REBASE(topo_info->caches);
    CP_REBASE(topo_info->threads);
    ...
    memcpy(&g_pal_public_state->topo_info, topo_info, sizeof(struct pal_topo_info) +
        topo_info->caches_cnt * sizeof(struct pal_cache_info) +
        topo_info->threads_cnt * sizeof(struct pal_cpu_thread_info) + ...);

(Note that if topo_info->caches == NULL, the CP_REBASE macro becomes a no-op, so everything works.)

Yeah, ok, @mkow's version is actually more concise and thus better.


Pal/include/pal/pal.h line 131 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

now g_pal_public_state becomes a pointer to a non-constant object

But that's not what const means in C.

int x = 5;
const int* ptr = &x;
x = 10;

@mkow Yeah, I wasn't precise in my comment. The object can change, but it is not allowed to change the object values through the const pointer. Which is exactly the case in this PR (and the reason for removal of const).


Pal/include/pal/pal.h line 132 at r2 (raw file):

/* We cannot mark this as returning a pointer to `const` object, because LibOS can
 * change `pal_public_state.topo_info` during  checkpoint restore in the child */

Please remove double space between during and checkpoint


Pal/src/host/Linux-SGX/db_main.c line 577 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Now you will have g_pal_public_state.topo_info uninitialized at this point (in the child enclave). I guess this is fine because g_pal_public_state is a global variable and thus is initialized to all-zeros. So g_pal_public_state.topo_info will be all-zeros at this point, which is a sane value.

So unless Michal thinks that we need something explicit here, I'm fine. Just raising this point for discussion.

Looks like Michal doesn't mind. I'm resolving this comment.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @mkow and @vijaydhanraj)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I didn't understand Michal's point, but this code will be modified anyway, given our discussions above.

My point is: Why a pointer to a zero-sized array must be NULL? What's wrong if it isn't?


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Actually, what happens when caches_cnt == 0? I think the rest of the code will break (I'm sure ADD_CP_OFFSET behaves incorrectly when it is given 0). The best would be to skip all this logic then, and simply do objp = NULL.

Same for all other helper functions (threads, cores, ...).

IMO the code should handle 0 counts correctly, if it doesn't then we need to fix it.


Pal/include/pal/pal.h line 131 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@mkow Yeah, I wasn't precise in my comment. The object can change, but it is not allowed to change the object values through the const pointer. Which is exactly the case in this PR (and the reason for removal of const).

Ah, I see that we actually use the ret val of this function for modifications (it wasn't clear from the diff, because the assignment to g_pal_public_state didn't change).

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 9 files reviewed, 7 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @vijaydhanraj)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

I wouldn't add such a test - you don't even verify the output, so it's not too useful (same as the sysfs topology tests we already have).

No, I wasn't going to add this diff as a test case. I just wanted to show how I validated the changes and wanted to get your thoughts if we needed to add a new test case.



LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):
Yes @mkow is correct both pointers need not be NULL. From the code, it looks like they will be at least base + sizeof(struct shim_cp_entry).

Shouldn't you also assert !topo_info->numa_distance_matrix

But we don't need an additional assert because if the count is zero then we fail and exit with assert(!topo_info->numa_nodes); The assert in the next line won't even be executed.


LibOS/shim/src/fs/sys/fs.c line 517 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'd prefer memcpy, but see my other comment.

Sure, used memcpy


LibOS/shim/src/fs/sys/fs.c line 281 at r2 (raw file):
I split it as this was more readable to me.

Thus, we'll remove a lot of boilerplate code, and we won't have these 4-5 helper checkpoint functions.

We will end up removing just ADD_CP_FUNC_ENTRY(off); and the objp and yes the cp functions. But feel inlining will just bloat one single function. I prefer to keep the function small when possible.


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

IMO the code should handle 0 counts correctly, if it doesn't then we need to fix it.

I did do a quick check and the code does seem to work fine. If the cnt is zero ADD_CP_OFFSET adds at a minimum sizeof(struct shim_cp_entry). So, we should be good here with no overwrites.


LibOS/shim/src/fs/sys/fs.c line 409 at r2 (raw file):
What is complicated? The suggested approach, at least to me, doesn't look either concise or readable as we must manually offset for each variable which can be prone to error if we change the pal_topo_info struct in the future. See the code below.

diff --git a/LibOS/shim/src/fs/sys/fs.c b/LibOS/shim/src/fs/sys/fs.c
index 4259470c..381784fd 100644
--- a/LibOS/shim/src/fs/sys/fs.c
+++ b/LibOS/shim/src/fs/sys/fs.c
@@ -391,16 +391,69 @@ BEGIN_CP_FUNC(topo_info) {
     struct pal_topo_info* topo_info = &g_pal_public_state->topo_info;
     struct pal_topo_info* new_topo_info = NULL;
 
-    size_t off = ADD_CP_OFFSET(sizeof(struct pal_topo_info));
+    size_t off = ADD_CP_OFFSET(sizeof(struct pal_topo_info) +
+                               topo_info->caches_cnt * sizeof(struct pal_cache_info) +
+                               topo_info->threads_cnt * sizeof(struct pal_cpu_thread_info) +
+                               topo_info->cores_cnt * sizeof(struct pal_cpu_core_info) +
+                               topo_info->sockets_cnt * sizeof(struct pal_socket_info) +
+                               topo_info->numa_nodes_cnt * sizeof(struct pal_numa_node_info) +
+                               topo_info->numa_nodes_cnt * topo_info->numa_nodes_cnt *
+                               sizeof(size_t));
+    //size_t off = ADD_CP_OFFSET(sizeof(struct pal_topo_info));
     new_topo_info = (struct pal_topo_info*)(base + off);
     *new_topo_info = *topo_info;
 
-    DO_CP(caches, topo_info->caches, &new_topo_info->caches);
-    DO_CP(threads, topo_info->threads, &new_topo_info->threads);
-    DO_CP(cores, topo_info->cores, &new_topo_info->cores);
-    DO_CP(sockets, topo_info->sockets, &new_topo_info->sockets);
-    DO_CP(numa_nodes, topo_info->numa_nodes, &new_topo_info->numa_nodes);
-    DO_CP(numa_distances, topo_info->numa_distance_matrix, &new_topo_info->numa_distance_matrix);
+    //DO_CP(caches, topo_info->caches, &new_topo_info->caches);
+    struct pal_cache_info* caches = (struct pal_cache_info*)(base + off + sizeof(struct pal_topo_info));
+    memcpy(caches, topo_info->caches, topo_info->caches_cnt * sizeof(struct pal_cache_info));
+    new_topo_info->caches = caches;
+
+    //DO_CP(threads, topo_info->threads, &new_topo_info->threads);
+    struct pal_cpu_thread_info* threads = (struct pal_cpu_thread_info*)(base + off +
+                                            sizeof(struct pal_topo_info) +
+                                            sizeof(struct pal_cache_info));
+    memcpy(threads, topo_info->threads, topo_info->threads_cnt * sizeof(struct pal_cpu_thread_info));
+    new_topo_info->threads = threads;
+
+    //DO_CP(cores, topo_info->cores, &new_topo_info->cores);
+    struct pal_cpu_core_info* cores = (struct pal_cpu_core_info*)(base + off +
+                                            sizeof(struct pal_topo_info) +
+                                            sizeof(struct pal_cache_info) +
+                                            sizeof(struct pal_cpu_thread_info));
+    memcpy(cores, topo_info->cores, topo_info->cores_cnt * sizeof(struct pal_cpu_core_info));
+    new_topo_info->cores = cores;
+
+    //DO_CP(sockets, topo_info->sockets, &new_topo_info->sockets);
+    struct pal_socket_info* sockets = (struct pal_socket_info*)(base + off +
+                                            sizeof(struct pal_topo_info) +
+                                            sizeof(struct pal_cache_info) +
+                                            sizeof(struct pal_cpu_thread_info) +
+                                            sizeof(struct pal_cpu_core_info));
+    memcpy(sockets, topo_info->sockets, topo_info->sockets_cnt * sizeof(struct pal_socket_info));
+    new_topo_info->sockets = sockets;
+
+    //DO_CP(numa_nodes, topo_info->numa_nodes, &new_topo_info->numa_nodes);
+    struct pal_numa_node_info* numa_nodes = (struct pal_numa_node_info*)(base + off +
+                                            sizeof(struct pal_topo_info) +
+                                            sizeof(struct pal_cache_info) +
+                                            sizeof(struct pal_cpu_thread_info) +
+                                            sizeof(struct pal_cpu_core_info) +
+                                            sizeof(struct pal_socket_info));
+    memcpy(numa_nodes, topo_info->numa_nodes,
+           topo_info->numa_nodes_cnt * sizeof(struct pal_numa_node_info));
+    new_topo_info->numa_nodes = numa_nodes;
+
+    //DO_CP(numa_distances, topo_info->numa_distance_matrix, &new_topo_info->numa_distance_matrix);
+    size_t* numa_distance_matrix = (size_t*)(base + off +
+                                            sizeof(struct pal_topo_info) +
+                                            sizeof(struct pal_cache_info) +
+                                            sizeof(struct pal_cpu_thread_info) +
+                                            sizeof(struct pal_cpu_core_info) +
+                                            sizeof(struct pal_socket_info) +
+                                            sizeof(struct pal_numa_node_info));
+    memcpy(numa_distance_matrix, topo_info->numa_distance_matrix,
+           topo_info->numa_nodes_cnt * topo_info->numa_nodes_cnt * sizeof(size_t));
+    new_topo_info->numa_distance_matrix = numa_distance_matrix;

     ADD_CP_FUNC_ENTRY(off);
 }

On the other hand, the restore function (BEGIN_RS_FUNC(topo_info)) will be indeed trivial

Isn't it the same? You seem to have removed the cnt check and used memcpy which we can do even with the current implementation. The reason for the cnt check is that the pointers will not be NULL as we do base + off. off will be at least the size of struct shim_cp_entry. So, if the cnt is zero and if we have a valid address, we need to stop right away.

The inline approach is better, but we will simply bloat a single function over a few functions. I prefer the current implementation over both suggestions but if you insist, I will change to using this blob approach.


Pal/include/pal/pal.h line 132 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please remove double space between during and checkpoint

Done.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @mkow and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Yes @mkow is correct both pointers need not be NULL. From the code, it looks like they will be at least base + sizeof(struct shim_cp_entry).

Shouldn't you also assert !topo_info->numa_distance_matrix

But we don't need an additional assert because if the count is zero then we fail and exit with assert(!topo_info->numa_nodes); The assert in the next line won't even be executed.

I actually very much prefer to have NULL pointers when cnt == 0. My proposed code in the above comment ensures this.

So, if we use my code, we can simplify this restore (RS) function to the trivial:

    struct pal_topo_info* topo_info = (void*)(base + GET_CP_FUNC_ENTRY());
    CP_REBASE(topo_info->caches);
    CP_REBASE(topo_info->threads);
    ...
    memcpy(&g_pal_public_state->topo_info, topo_info, sizeof(*topo_info));

In other words, we can skip the checks and asserts because we are guaranteed (by the construction of checkpoint functions) that when the corresponding cnt == 0, then the corresponding pointer is NULL.


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

I did do a quick check and the code does seem to work fine. If the cnt is zero ADD_CP_OFFSET adds at a minimum sizeof(struct shim_cp_entry). So, we should be good here with no overwrites.

Thanks for checking, but then it doesn't make much sense to have some dummy unused regions of checkpoint memory (of size sizeof(struct shim_cp_entry)). Why not just skip the step and return NULL when cnt == 0? This is much cleaner than adding some uninitialized memory blob into the checkpoint.

I suggest the refactoring like this:

BEGIN_CP_FUNC(caches) {
    assert(size == sizeof(struct pal_cache_info));

    size_t caches_cnt = g_pal_public_state->topo_info.caches_cnt;
    if (caches_cnt == 0) {
        *objp = NULL;
        return;
    }

    size_t caches_size = caches_cnt * sizeof(struct pal_cache_info);
    size_t off = ADD_CP_OFFSET(caches_size);
    struct pal_cache_info* new_caches = (struct pal_cache_info*)(base + off);
    memcpy(new_caches, g_pal_public_state->topo_info.caches, caches_size);

    ADD_CP_FUNC_ENTRY(off);
    *objp = (void*)new_caches;
}
END_CP_FUNC_NO_RS(caches)

This also has a side benefit that if cnt is zero, then the corresponding array is definitely NULL (in your current version, the array points to some uninitialized blob inside the checkpoint).


LibOS/shim/src/fs/sys/fs.c line 409 at r2 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

What is complicated? The suggested approach, at least to me, doesn't look either concise or readable as we must manually offset for each variable which can be prone to error if we change the pal_topo_info struct in the future. See the code below.

diff --git a/LibOS/shim/src/fs/sys/fs.c b/LibOS/shim/src/fs/sys/fs.c
index 4259470c..381784fd 100644
--- a/LibOS/shim/src/fs/sys/fs.c
+++ b/LibOS/shim/src/fs/sys/fs.c
@@ -391,16 +391,69 @@ BEGIN_CP_FUNC(topo_info) {
     struct pal_topo_info* topo_info = &g_pal_public_state->topo_info;
     struct pal_topo_info* new_topo_info = NULL;
 
-    size_t off = ADD_CP_OFFSET(sizeof(struct pal_topo_info));
+    size_t off = ADD_CP_OFFSET(sizeof(struct pal_topo_info) +
+                               topo_info->caches_cnt * sizeof(struct pal_cache_info) +
+                               topo_info->threads_cnt * sizeof(struct pal_cpu_thread_info) +
+                               topo_info->cores_cnt * sizeof(struct pal_cpu_core_info) +
+                               topo_info->sockets_cnt * sizeof(struct pal_socket_info) +
+                               topo_info->numa_nodes_cnt * sizeof(struct pal_numa_node_info) +
+                               topo_info->numa_nodes_cnt * topo_info->numa_nodes_cnt *
+                               sizeof(size_t));
+    //size_t off = ADD_CP_OFFSET(sizeof(struct pal_topo_info));
     new_topo_info = (struct pal_topo_info*)(base + off);
     *new_topo_info = *topo_info;
 
-    DO_CP(caches, topo_info->caches, &new_topo_info->caches);
-    DO_CP(threads, topo_info->threads, &new_topo_info->threads);
-    DO_CP(cores, topo_info->cores, &new_topo_info->cores);
-    DO_CP(sockets, topo_info->sockets, &new_topo_info->sockets);
-    DO_CP(numa_nodes, topo_info->numa_nodes, &new_topo_info->numa_nodes);
-    DO_CP(numa_distances, topo_info->numa_distance_matrix, &new_topo_info->numa_distance_matrix);
+    //DO_CP(caches, topo_info->caches, &new_topo_info->caches);
+    struct pal_cache_info* caches = (struct pal_cache_info*)(base + off + sizeof(struct pal_topo_info));
+    memcpy(caches, topo_info->caches, topo_info->caches_cnt * sizeof(struct pal_cache_info));
+    new_topo_info->caches = caches;
+
+    //DO_CP(threads, topo_info->threads, &new_topo_info->threads);
+    struct pal_cpu_thread_info* threads = (struct pal_cpu_thread_info*)(base + off +
+                                            sizeof(struct pal_topo_info) +
+                                            sizeof(struct pal_cache_info));
+    memcpy(threads, topo_info->threads, topo_info->threads_cnt * sizeof(struct pal_cpu_thread_info));
+    new_topo_info->threads = threads;
+
+    //DO_CP(cores, topo_info->cores, &new_topo_info->cores);
+    struct pal_cpu_core_info* cores = (struct pal_cpu_core_info*)(base + off +
+                                            sizeof(struct pal_topo_info) +
+                                            sizeof(struct pal_cache_info) +
+                                            sizeof(struct pal_cpu_thread_info));
+    memcpy(cores, topo_info->cores, topo_info->cores_cnt * sizeof(struct pal_cpu_core_info));
+    new_topo_info->cores = cores;
+
+    //DO_CP(sockets, topo_info->sockets, &new_topo_info->sockets);
+    struct pal_socket_info* sockets = (struct pal_socket_info*)(base + off +
+                                            sizeof(struct pal_topo_info) +
+                                            sizeof(struct pal_cache_info) +
+                                            sizeof(struct pal_cpu_thread_info) +
+                                            sizeof(struct pal_cpu_core_info));
+    memcpy(sockets, topo_info->sockets, topo_info->sockets_cnt * sizeof(struct pal_socket_info));
+    new_topo_info->sockets = sockets;
+
+    //DO_CP(numa_nodes, topo_info->numa_nodes, &new_topo_info->numa_nodes);
+    struct pal_numa_node_info* numa_nodes = (struct pal_numa_node_info*)(base + off +
+                                            sizeof(struct pal_topo_info) +
+                                            sizeof(struct pal_cache_info) +
+                                            sizeof(struct pal_cpu_thread_info) +
+                                            sizeof(struct pal_cpu_core_info) +
+                                            sizeof(struct pal_socket_info));
+    memcpy(numa_nodes, topo_info->numa_nodes,
+           topo_info->numa_nodes_cnt * sizeof(struct pal_numa_node_info));
+    new_topo_info->numa_nodes = numa_nodes;
+
+    //DO_CP(numa_distances, topo_info->numa_distance_matrix, &new_topo_info->numa_distance_matrix);
+    size_t* numa_distance_matrix = (size_t*)(base + off +
+                                            sizeof(struct pal_topo_info) +
+                                            sizeof(struct pal_cache_info) +
+                                            sizeof(struct pal_cpu_thread_info) +
+                                            sizeof(struct pal_cpu_core_info) +
+                                            sizeof(struct pal_socket_info) +
+                                            sizeof(struct pal_numa_node_info));
+    memcpy(numa_distance_matrix, topo_info->numa_distance_matrix,
+           topo_info->numa_nodes_cnt * topo_info->numa_nodes_cnt * sizeof(size_t));
+    new_topo_info->numa_distance_matrix = numa_distance_matrix;

     ADD_CP_FUNC_ENTRY(off);
 }

On the other hand, the restore function (BEGIN_RS_FUNC(topo_info)) will be indeed trivial

Isn't it the same? You seem to have removed the cnt check and used memcpy which we can do even with the current implementation. The reason for the cnt check is that the pointers will not be NULL as we do base + off. off will be at least the size of struct shim_cp_entry. So, if the cnt is zero and if we have a valid address, we need to stop right away.

The inline approach is better, but we will simply bloat a single function over a few functions. I prefer the current implementation over both suggestions but if you insist, I will change to using this blob approach.

OK, I agree that the "one big fat function" looks ugly. I'm fine with the current version.


LibOS/shim/src/fs/sys/fs.c line 281 at r3 (raw file):

    assert(size == sizeof(struct pal_cache_info));

    struct pal_cache_info* caches = (struct pal_cache_info*)obj;

Here I don't like that you take caches from the input argument obj, but then you take caches_cnt from the global variable g_pal_public_state. I know that both end up using the same object (the global g_pal_public_state), but this is not easy to see for future reader.

I suggest to explicitly use the global variable and mark obj as unused. See my proposed code in the comment below.


Pal/include/pal/pal.h line 131 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Ah, I see that we actually use the ret val of this function for modifications (it wasn't clear from the diff, because the assignment to g_pal_public_state didn't change).

@mkow You're still blocking on this discussion.

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 9 files at r1, 4 of 5 files at r2, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @mkow and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 409 at r2 (raw file):

What is complicated? The suggested approach, at least to me, doesn't look either concise or readable as we must manually offset for each variable which can be prone to error if we change the pal_topo_info struct in the future. See the code below.

The inline approach is better, but we will simply bloat a single function over a few functions. I prefer the current implementation over both suggestions but if you insist, I will change to using this blob approach.

Please take a look at boryspoplawski@34cf2f6#diff-9e25c4e2e0ea819880c207dd80fa7acbeec1f5d1d822502427abd1c2b70a71e4
It's one function approach, IMO quite readable and much, much shorter.

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 9 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I actually very much prefer to have NULL pointers when cnt == 0. My proposed code in the above comment ensures this.

So, if we use my code, we can simplify this restore (RS) function to the trivial:

    struct pal_topo_info* topo_info = (void*)(base + GET_CP_FUNC_ENTRY());
    CP_REBASE(topo_info->caches);
    CP_REBASE(topo_info->threads);
    ...
    memcpy(&g_pal_public_state->topo_info, topo_info, sizeof(*topo_info));

In other words, we can skip the checks and asserts because we are guaranteed (by the construction of checkpoint functions) that when the corresponding cnt == 0, then the corresponding pointer is NULL.

From the code, we can see that CP_REBASE doesn't fail if a checkpoint pointer is NULL. So, if cnt is zero we will end up with empty values. So, it is better to have a cnt == 0 check here and fail immediately.

Below is the output from such a scenario with double_fork test trying to print sysfs data.

/sys: directory
/sys/devices: directory
/sys/devices/system: directory
/sys/devices/system/cpu: directory
/sys/devices/system/cpu/online: file
[/sys/devices/system/cpu/online] 
/sys/devices/system/cpu/possible: file
[/sys/devices/system/cpu/possible] 
/sys/devices/system/node: directory
/sys/devices/system/node/online: file
[/sys/devices/system/node/online] 
/sys/devices/system/node/possible: file
[/sys/devices/system/node/possible] TEST OK

LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Thanks for checking, but then it doesn't make much sense to have some dummy unused regions of checkpoint memory (of size sizeof(struct shim_cp_entry)). Why not just skip the step and return NULL when cnt == 0? This is much cleaner than adding some uninitialized memory blob into the checkpoint.

I suggest the refactoring like this:

BEGIN_CP_FUNC(caches) {
    assert(size == sizeof(struct pal_cache_info));

    size_t caches_cnt = g_pal_public_state->topo_info.caches_cnt;
    if (caches_cnt == 0) {
        *objp = NULL;
        return;
    }

    size_t caches_size = caches_cnt * sizeof(struct pal_cache_info);
    size_t off = ADD_CP_OFFSET(caches_size);
    struct pal_cache_info* new_caches = (struct pal_cache_info*)(base + off);
    memcpy(new_caches, g_pal_public_state->topo_info.caches, caches_size);

    ADD_CP_FUNC_ENTRY(off);
    *objp = (void*)new_caches;
}
END_CP_FUNC_NO_RS(caches)

This also has a side benefit that if cnt is zero, then the corresponding array is definitely NULL (in your current version, the array points to some uninitialized blob inside the checkpoint).

Yes, we now set the pointer to NULL if cnt is zero. But the code is inlined now and removed this function.


LibOS/shim/src/fs/sys/fs.c line 409 at r2 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

What is complicated? The suggested approach, at least to me, doesn't look either concise or readable as we must manually offset for each variable which can be prone to error if we change the pal_topo_info struct in the future. See the code below.

The inline approach is better, but we will simply bloat a single function over a few functions. I prefer the current implementation over both suggestions but if you insist, I will change to using this blob approach.

Please take a look at boryspoplawski@34cf2f6#diff-9e25c4e2e0ea819880c207dd80fa7acbeec1f5d1d822502427abd1c2b70a71e4
It's one function approach, IMO quite readable and much, much shorter.

Thanks @boryspoplawski. Yes, I have used the inline approach now. There are 2 changes, one is we need to copy *_cnt variables. And second is I have kept the check for *_cnt > 0. Please see my other comment on this.


LibOS/shim/src/fs/sys/fs.c line 281 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Here I don't like that you take caches from the input argument obj, but then you take caches_cnt from the global variable g_pal_public_state. I know that both end up using the same object (the global g_pal_public_state), but this is not easy to see for future reader.

I suggest to explicitly use the global variable and mark obj as unused. See my proposed code in the comment below.

Inlined the code and removed this function.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r3, all commit messages.
Reviewable status: 8 of 9 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

This is much cleaner than adding some uninitialized memory blob into the checkpoint.

What uninitialized blob? If count is 0 then why anything would fail? I completely fail to see why we should have a special handling for 0 sizes anywhere.


Pal/include/pal/pal.h line 131 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@mkow You're still blocking on this discussion.

Oops, sorry.

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 9 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This is much cleaner than adding some uninitialized memory blob into the checkpoint.

What uninitialized blob? If count is 0 then why anything would fail? I completely fail to see why we should have a special handling for 0 sizes anywhere.

Is count == 0 a valid case for any of the topology objects?

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

From the code, we can see that CP_REBASE doesn't fail if a checkpoint pointer is NULL. So, if cnt is zero we will end up with empty values. So, it is better to have a cnt == 0 check here and fail immediately.

Below is the output from such a scenario with double_fork test trying to print sysfs data.

/sys: directory
/sys/devices: directory
/sys/devices/system: directory
/sys/devices/system/cpu: directory
/sys/devices/system/cpu/online: file
[/sys/devices/system/cpu/online] 
/sys/devices/system/cpu/possible: file
[/sys/devices/system/cpu/possible] 
/sys/devices/system/node: directory
/sys/devices/system/node/online: file
[/sys/devices/system/node/online] 
/sys/devices/system/node/possible: file
[/sys/devices/system/node/possible] TEST OK

Please remove all these checks, there is no point in having them. CP_REBASE works on NULL correctly.

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

Please remove all these checks, there is no point in having them. CP_REBASE works on NULL correctly.

Sorry if I wasn't clear earlier. I am not saying the count == 0 can cause an issue with CP_REBASE and yes it works fine with NULL. But count == 0 is invalid for any of the topology objects and our topo info could be empty as shown in the earlier comment. The check is to catch this case and fail right away.

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Sorry if I wasn't clear earlier. I am not saying the count == 0 can cause an issue with CP_REBASE and yes it works fine with NULL. But count == 0 is invalid for any of the topology objects and our topo info could be empty as shown in the earlier comment. The check is to catch this case and fail right away.

  1. You don't fail anywhere, you merely added an assert.
  2. If count == 0 is invalid, then how can it happen here? There is no such flow in the code, it's impossible.

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

  1. You don't fail anywhere, you merely added an assert

Yes correct we don't fail in a release build but in debug build it helps.

  1. If count == 0 is invalid, then how can it happen here? There is no such flow in the code, it's impossible.

If there was a bug in the checkpointing and restoring code, the count could be zero.
But if you think this is redundant and count will always be greater than zero, then shall we also remove the *_cnt check in the checkpointing flow as well?


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Is count == 0 a valid case for any of the topology objects?

Shall we remove these *_cnt checks as well?

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Yes correct we don't fail in a release build but in debug build it helps.

Helps in what way?

If there was a bug in the checkpointing and restoring code, the count could be zero.

And the pointer would be not NULL? Then these asserts will pass.

But if you think this is redundant and count will always be greater than zero, then shall we also remove the *_cnt check in the checkpointing flow as well?

Just assume that count can be 0 in a legitimate case, everything will be simpler. I see no point in special casing here. In the checkpointing flow above it's different, because it doesn't create the metadata, but if ADD_CP_OFFSET works well with 0 then yes, we can remove them.


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Shall we remove these *_cnt checks as well?

Not sure if ADD_CP_OFFSET works well with 0 and even if it does, this check gets rid of the unnecessary metadata in the checkpointing blob.

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Helps in what way?

Sorry, I don't understand. I added assert to catch this logically impossible condition at least in debug mode.

And the pointer would be not NULL? Then these asserts will pass.

No, if we maintain the code as it is now when count == 0 the corresponding pointer will be NULL as we don't add it to the checkpoint and the assert will fail.

In the checkpointing flow above it's different because it doesn't create the metadata, but if ADD_CP_OFFSET works well with 0 then yes, we can remove them.

We did discuss this in one of the earlier comments and I verified that ADD_CP_OFFSET works well with 0. I prefer to have uniform semantics between checkpointing and restoring flow.

Just assume that count can be 0 in a legitimate case, everything will be simpler. I see no point in special casing here

Not sure what will be simpler. But if count == 0 is a legitimate case and say for example threads_cnt is zero, then /proc/cpuinfo will show no entries. These cannot be empty as there should be at least core 0 online and we should see its information /proc entry. Also, many other things like set/get affinity and /sys/ will break too.

So, to wrap up the discussion,
My suggestion is if we conclude count == 0 is a legitimate case, then let us keep the checks and assert. If not, we can remove them from both checkpointing and restoring flow.

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

No, if we maintain the code as it is now when count == 0 the corresponding pointer will be NULL as we don't add it to the checkpoint and the assert will fail.

But you explicitly said this is a case of a bug when count is changed to 0.

Not sure what will be simpler. But if count == 0 is a legitimate case and say for example threads_cnt is zero, then /proc/cpuinfo will show no entries. These cannot be empty as there should be at least core 0 online and we should see its information /proc entry. Also, many other things like set/get affinity and /sys/ will break too.

We don't care for these here. The code will be simpler / less complicated, because it will contain less ifs and less lines.

My suggestion is if we conclude count == 0 is a legitimate case, then let us keep the checks and assert.

What? If it's legitimate case, then your code breaks it, because it asserts that count == 0, pointer == NULL is invalid.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, boryspoplawski (Borys Popławski) wrote…

No, if we maintain the code as it is now when count == 0 the corresponding pointer will be NULL as we don't add it to the checkpoint and the assert will fail.

But you explicitly said this is a case of a bug when count is changed to 0.

Not sure what will be simpler. But if count == 0 is a legitimate case and say for example threads_cnt is zero, then /proc/cpuinfo will show no entries. These cannot be empty as there should be at least core 0 online and we should see its information /proc entry. Also, many other things like set/get affinity and /sys/ will break too.

We don't care for these here. The code will be simpler / less complicated, because it will contain less ifs and less lines.

My suggestion is if we conclude count == 0 is a legitimate case, then let us keep the checks and assert.

What? If it's legitimate case, then your code breaks it, because it asserts that count == 0, pointer == NULL is invalid.

@vijaydhanraj: Think why you don't check here that numa_distance_matrix is reciprocal, that all the pointers are properly aligned, or that topo_info->caches_cnt * sizeof(*topo_info->caches) doesn't overflow, and 50 other technically valid checks which you could also do here.
With your logic adding all of them here would be a good idea, "to catch this logically impossible condition at least in debug mode".


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

Is count == 0 a valid case for any of the topology objects?

Then why don't you check here that numa_distance_matrix is reciprocal? It's also an invalid case. Or that all the pointers are properly aligned? Or that topo_info->caches_cnt * sizeof(*topo_info->caches) doesn't overflow?
Answering this question will help you understand why checking for count == 0 makes no sense here.

Not sure if ADD_CP_OFFSET works well with 0 and even if it does, this check gets rid of the unnecessary metadata in the checkpointing blob.

It should work, if not we should fix it because it may cause troubles in other places. Ad additional data: it's a rather negligible gain, I prefer less IFs in the code.

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

But you explicitly said this is a case of a bug when count is changed to 0.

I think there is some confusion in the way I described it. What I meant is if count==0, then the assert will be triggered as pointer will be NULL

What? If it's legitimate case, then your code breaks it, because it asserts that count == 0, pointer == NULL is invalid

Sorry for the confusion. By "legitimate" I meant that there can be a possibility of "count" being zero. Basically, we are not saying count == 0 is impossible and will never be true. In that case, we can have the check/assert.

@mkow, I didn't add this check from the perspective of sanitization/validation. I added the check for count from the perspective of checkpointing and restoring. If the count is zero, then everything will succeed but the user can end up with incorrect/empty entries. I wanted to catch this case.

But if count == 0 can never be true then we can remove these checks/asserts.


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

Answering this question will help you understand why checking for count == 0 makes no sense here.

Please see my other comment.

It should work, if not we should fix it because it may cause troubles in other places. Ad additional data: it's a rather negligible gain, I prefer less IFs in the code.

@mkow looks like you are saying count == 0 can never be true. And you suggest removing the if checks from both checkpointing/resorting flows. Can you please confirm?

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

@mkow, I didn't add this check from the perspective of sanitization/validation. I added the check for count from the perspective of checkpointing and restoring. If the count is zero, then everything will succeed but the user can end up with incorrect/empty entries. I wanted to catch this case.

In what sense they are "incorrect"? How this "incorrect" is different from e.g. non-reciprocality of numa_distance_matrix or other cases I mentioned?


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

@mkow looks like you are saying count == 0 can never be true. And you suggest removing the if checks from both checkpointing/resorting flows. Can you please confirm?

It can't be, but that's not my point. My point is that these checks make no sense and you can just remove them. This code should not rely/assume such things about the data, because why would it? It only makes it more complicated and mixes different abstraction layers.

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 9 files reviewed, 5 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @dimakuv, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

@mkow, I didn't add this check from the perspective of sanitization/validation. I added the check for count from the perspective of checkpointing and restoring. If the count is zero, then everything will succeed but the user can end up with incorrect/empty entries. I wanted to catch this case.

In what sense they are "incorrect"? How this "incorrect" is different from e.g. non-reciprocality of numa_distance_matrix or other cases I mentioned?

Take threads_cnt for example, we fail with -PAL_ERROR_INVAL if it was zero. But in this case, if there was an issue with checkpointing/restore logic and threads_cnt was zero we continue without any failure.


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

@mkow looks like you are saying count == 0 can never be true. And you suggest removing the if checks from both checkpointing/resorting flows. Can you please confirm?

It can't be, but that's not my point. My point is that these checks make no sense and you can just remove them. This code should not rely/assume such things about the data, because why would it? It only makes it more complicated and mixes different abstraction layers.

OK, I have removed the checks and asserts from both the checkpointing and restoring functions.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Take threads_cnt for example, we fail with -PAL_ERROR_INVAL if it was zero. But in this case, if there was an issue with checkpointing/restore logic and threads_cnt was zero we continue without any failure.

I can't follow this discussion any more. I think the discussion moved to a general "at which point there are too many assert statements in the code". I think it's a matter of taste, but I definitely don't want to drag this PR over this issue.

But looking at the current code, the logic raises questions for me.

  • During checkpoint we do this:
    new_topo_info->caches = (void*)(base + ADD_CP_OFFSET(caches_size));

If new_topo_info->caches_cnt is zero, then new_topo_info->caches is a pointer to some garbage.

  • During restore we do this:
    CP_REBASE(topo_info->caches);

So the topo_info->caches continues to be a pointer to some garbage.

I guess what @mkow is saying is: "Who cares what the value of caches pointer is when the caches_cnt == 0. All Gramine code is written in such a way as to ignore the caches pointer when the corresponding caches_cnt is zero. We can have garbage in that pointer, and we don't care."

If the above statement is fine with everyone, let's agree on this and stop the discussion.

I am fine with the above statement.


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

OK, I have removed the checks and asserts from both the checkpointing and restoring functions.

I guess we resolved this particular thread and decided to remove checks on count == 0.

But I would like to point out to @mkow that ADD_CP_OFFSET doesn't work correctly with 0 size, see

struct shim_cp_entry* oob = (void*)base + __ADD_CP_OFFSET(sizeof(struct shim_cp_entry)); \

When I say "doesn't work correctly", I mean that there will be a dummy object created inside the checkpoint blob, and this dummy object will never be used (so the "incorrectness" is that we lose memory on smth that is never used).

Of course, as @mkow mentions, we should actually fix this macro. But this is a story for another PR (actually, a huge change, because you can't just change one macro in the checkpointing subsystem, it needs to be completely rewritten).

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

we fail with -PAL_ERROR_INVAL if it was zero

Where?

I think it's a matter of taste

No, here it's a matter of logic and consistency in the codebase. Choosing a random subset of topology properties we want and checking them here makes no sense. Either we do full sanitization here (but what for?) or we don't do anything.

So the topo_info->caches continues to be a pointer to some garbage.

Not really, it's a pointer to 0 bytes of that garbage.

All Gramine code is written in such a way as to ignore the caches pointer when the corresponding caches_cnt is zero.

This isn't a property of our code specifically, that's how C works. If you have a pointer to a zero-sized allocation then you're allowed to touch exactly zero bytes through that pointer.


LibOS/shim/src/fs/sys/fs.c line 284 at r2 (raw file):

(so the "incorrectness" is that we lose memory on smth that is never used).

That's not incorrect, just suboptimal memory-wise, which seems negligible in this case. Especially that this case shouldn't even happen in practice, so there's no point in optimizing it right now.


LibOS/shim/src/fs/sys/fs.c line 314 at r5 (raw file):

    size_t numa_dm_size = topo_info->numa_nodes_cnt * topo_info->numa_nodes_cnt
                            * sizeof(*topo_info->numa_distance_matrix);

Please indent by 2 spaces less.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):
Thanks @mkow . I actually didn't know that the C standard has this to say:

If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned to indicate an error, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object. § 7.22.3

Anyway, I'm fine with the current implementation.

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @vijaydhanraj)

@boryspoplawski
Copy link
Contributor

Jenkins, test this please

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Thanks @mkow . I actually didn't know that the C standard has this to say:

If the size of the space requested is zero, the behavior is implementation-defined: either a null pointer is returned to indicate an error, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object. § 7.22.3

Anyway, I'm fine with the current implementation.

https://godbolt.org/z/nbz1hhc3s

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @vijaydhanraj)

dimakuv
dimakuv previously approved these changes May 18, 2022
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @boryspoplawski, @mkow, and @vijaydhanraj)


LibOS/shim/src/fs/sys/fs.c line 314 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Please indent by 2 spaces less.

I'm on it.

Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
@dimakuv dimakuv force-pushed the vdhanraj/sysfs_checkpoint_restore branch from 18672ec to e1d0f87 Compare May 18, 2022 13:23
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r6, all commit messages.
Dismissed @vijaydhanraj from a discussion.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @boryspoplawski and @mkow)


LibOS/shim/src/fs/sys/fs.c line 314 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I'm on it.

Done.

@dimakuv
Copy link

dimakuv commented May 18, 2022

Jenkins, test this plesae

@dimakuv
Copy link

dimakuv commented May 18, 2022

Jenkins, test this please (made a typo the last time)

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r6, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @mkow)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r6, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimakuv dimakuv merged commit e1d0f87 into gramineproject:master May 18, 2022
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved


LibOS/shim/src/fs/sys/fs.c line 504 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Where?

I was referring to this, https://github.com/gramineproject/gramine/blob/master/Pal/src/host/Linux-SGX/db_main.c#L141

But this is a part of sanitization? How does this differ from 30 other checks in that function, which you didn't include here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants