Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

[Pal/LibOS] Add physical id and fix cpu cores in /proc/cpuinfo #1910

Merged

Conversation

vijaydhanraj
Copy link
Contributor

@vijaydhanraj vijaydhanraj commented Oct 21, 2020

Description of the changes

Applications tend to use /proc/cpuinfo to get the cpu cores and physical id for computing the number of physical cores in a
socket. Currently cpu cores field is incorrectly implemented as it is set to the number of logical processors online and physical id isn't implemented. This patch addresses both of these issues.

For example, openVINO these fields in the following way to decide how many threads it should use for its inference engine.
https://github.com/openvinotoolkit/openvino/blob/master/inference-engine/src/inference_engine/os/lin/lin_system_conf.cpp#L24

How to test this PR?

We have proc_common.c regression test which already covers /proc/cpuinfo.


This change is Reviewable

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r1.
Reviewable status: all files reviewed, 17 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @vijaydhanraj)


Pal/include/arch/x86_64/pal-arch.h, line 27 at r1 (raw file):

#define PAL_LIBOS_TCB_SIZE 256
#define PAL_MAX_CPU        1024

This constant is not architecture-dependent. Let's move it to Pal/include/pal/pal.h


Pal/src/host/Linux/db_main.c, line 276 at r1 (raw file):

 * host without SMT support, /sys/devices/system/cpu/smt/active will return 0 (the actual value)
 */
int get_hw_res_count(const char *filename) {

The implementation of this function is wrong. Consider what happens if run on a single-cpu machine: your /sys/devices/system/cpu/online contains 0 (meaning there is one CPU with index 0). This will lead this function to return 0 (firstint). However, Graphene just wanted to know how many CPUs are there, so the correct return must be 1.

I suggest to change this function as follows:

/* Opens a pseudo-file describing HW resources such as online CPUs and counts the number of HW resources
 * present in the file (if count == true) or simply reads the integer stored in the file (if count == false).
 * Returns -errno on failure. For example, calling this function on `/sys/devices/system/cpu/online` (with
 * complex formats like "1,3-5,6") will return 5, and calling this function on `/sys/devices/system/cpu/smt/active`
 * will return 1 (host supports SMT) or 0 (host doesn't support SMT). */
int get_hw_resource(const char *filename, bool count) {
   ... current logic of open and read ...
   if (!count) {
      /* caller wants to read an int stored in the file */
     int ret = strtol(buf, &end, 10);
     ... add error checking and return ret ...
   } else {
      ... current logic of reading the list and counting ...
   }

Pal/src/host/Linux/db_main-x86_64.c, line 78 at r1 (raw file):

int _DkGetCPUInfo(PAL_CPU_INFO* ci) {
    char filename[128];

Please move this filename close to the actual place where you use it. (This is a preference in style, but I think it makes sense here: you only use filename once in that particular place.)


Pal/src/host/Linux/db_main-x86_64.c, line 83 at r1 (raw file):

    const size_t VENDOR_ID_SIZE = 13;
    char* vendor_id = malloc(VENDOR_ID_SIZE);

While we're here, can you add error checking if (!vendor_id) {return -PAL_ERROR_NOMEM}


Pal/src/host/Linux/db_main-x86_64.c, line 93 at r1 (raw file):

    const size_t BRAND_SIZE = 49;
    char* brand = malloc(BRAND_SIZE);

While we're here, can you add error checking if (!brand) {return -PAL_ERROR_NOMEM}


Pal/src/host/Linux/db_main-x86_64.c, line 107 at r1 (raw file):

    if (cpu_num <= 0) {
        rv = cpu_num;
        goto out;

This goes wrong if cpu_num == 0. In this case, rv == 0 and we jump to out, which returns rv == 0 -- but this indicates "success"!

You should keep the old logic of -PAL_ERROR_STREAMNOTEXIST in the get_hw_res_count() function and don't check for cpu_num == 0 here.


Pal/src/host/Linux/db_main-x86_64.c, line 112 at r1 (raw file):

    /* we cannot use CPUID(0xb) because it counts even disabled-by-BIOS cores (e.g. HT cores);
     * instead we extract info on number of cores by parsing sysfs pseudo-files */

Could you move this comment up to the first get_hw_res_count()? It doesn't make sense here, in the middle of using sysfs files.

Also, update extract info on number of cores ... -> extract info on total number of logical processors, number of physical cores, SMT support, etc. ...


Pal/src/host/Linux/db_main-x86_64.c, line 116 at r1 (raw file):

    if (cpu_cores <= 0) {
        rv = cpu_cores;
        goto out;

ditto (== 0 is wrong)


Pal/src/host/Linux/db_main-x86_64.c, line 124 at r1 (raw file):

        goto out;
    }
    /* This code assumes there are only 2 HTs per core but platforms like KNL can have more */

Please prepend withFIXME: . Also decipher what KNL is (Knights Landing).


Pal/src/host/Linux/db_main-x86_64.c, line 168 at r1 (raw file):

    /* Extract physical id for /proc/cpuinfo */
    int phy_id = 0;

Remove this declaration from here, it's just a temporary variable that you can declare inside the for loop directly.


Pal/src/host/Linux/db_main-x86_64.c, line 169 at r1 (raw file):

    /* Extract physical id for /proc/cpuinfo */
    int phy_id = 0;
    for (int idx =0; idx < cpu_num; idx++) {

idx =0 -> idx = 0


Pal/src/host/Linux/db_main-x86_64.c, line 171 at r1 (raw file):

    for (int idx =0; idx < cpu_num; idx++) {
        snprintf(filename, sizeof(filename),
                 "/sys/devices/system/cpu/cpu%d/topology/physical_package_id", idx);

This goes wrong if some CPUs are offline. Say the host has SMT, 4 CPUs (2 physical cores), and CPU1 is offline. In this case, you get incorrect package ID for CPU2 visible inside graphene.

I am not sure how to circumvent this. Even with this PR, we don't really support offline CPUs (not correctly, that is).

I suggest to add a check like this:

/* TODO: correctly support offline CPUs */
if (cpu_cores > cpu_num) {
  printf("Warning: some CPUs seem to be offline; Graphene doesn't take this into account which may lead to subpar performance\n");
}

Pal/src/host/Linux-SGX/db_main-x86_64.c, line 122 at r1 (raw file):

    const size_t VENDOR_ID_SIZE = 13;
    char* vendor_id = malloc(VENDOR_ID_SIZE);

While we're here, can you add error checking if (!vendor_id) {return -PAL_ERROR_NOMEM}


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 137 at r1 (raw file):

    const size_t BRAND_SIZE = 49;
    char* brand = malloc(BRAND_SIZE);

While we're here, can you add error checking if (!brand) {return -PAL_ERROR_NOMEM}


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 153 at r1 (raw file):

    /* number of physical cores in a package */
    ci->cpu_cores = g_pal_sec.cpu_cores;
    /* physical package to which the logical processor belongs. Array index represents LPs */

A better comment will be /* array of "logical processor -> physical package" mappings */


Pal/src/host/Linux-SGX/pal_security.h, line 45 at r1 (raw file):

    PAL_NUM cpu_cores;

    /* Physical package id */

A better comment will be /* array of "logical processor -> physical package" mappings */


Pal/src/host/Linux-SGX/sgx_main.c, line 687 at r1 (raw file):

 * /sys/devices/system/cpu/online,.with complex formats like "1,3-5,6 will return 5. While on a
 * host without SMT support, /sys/devices/system/cpu/smt/active will return 0, the actual value.
 */

See all my comments for the Linux PAL above.

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 19 unresolved discussions, not enough approvals from maintainers (3 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @vijaydhanraj)


Pal/include/arch/x86_64/pal-arch.h, line 27 at r1 (raw file):

1024

How did you come up with this value?
Based on Linux kconfig(arch/x86/Kconfig), it's 1, 2, 8, 32, 64, 512, 8192.


Pal/include/arch/x86_64/pal-arch.h, line 193 at r1 (raw file):

    PAL_NUM cpu_num;
    PAL_NUM cpu_cores;
    PAL_NUM phy_id[PAL_MAX_CPU];

This bumps the size of PAL_CPU_INFO. dynamic allocation?

}
g_pal_sec.cpu_cores = sec_info.cpu_cores;

COPY_ARRAY(g_pal_sec.phy_id, sec_info.phy_id);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of the subsequent patches, I plan to add the number of nodes present. We can cross-verify that each physical package ID should not be greater than the number of nodes. (Basically create a for loop and validate each physical package id). Other suggestions welcome.

return smt_active;

/* This code assumes there are only 2 HTs per core but platforms like KNL can have more */
pal_sec->cpu_cores = (smt_active == 0) ? cpu_cores : (cpu_cores / 2);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we intend to run on platforms like KNL which can have 4HTs per core, then I can extract the number of HTs (instead of hard coding to 2) simply by dividing the number of thread_siblings by core_siblings.

}
ci->cpu_num = cores;
/* This code assumes there are only 2 HTs per core but platforms like KNL can have more */
ci->cpu_cores = (smt_active == 0) ? cpu_cores : (cpu_cores / 2) ;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we intend to run on platforms like KNL which can have 4HTs per core, then I can extract the number of HTs (instead of hard coding to 2) simply by dividing the number of thread_siblings by core_siblings.

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 9 files reviewed, 22 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @yamahata)


Pal/include/arch/x86_64/pal-arch.h, line 27 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This constant is not architecture-dependent. Let's move it to Pal/include/pal/pal.h

Removed this #define and used dynamic allocation.


Pal/include/arch/x86_64/pal-arch.h, line 27 at r1 (raw file):

Previously, yamahata wrote…
1024

How did you come up with this value?
Based on Linux kconfig(arch/x86/Kconfig), it's 1, 2, 8, 32, 64, 512, 8192.

Just random, big enough to fit most common platforms. But have removed the static allocation and implemented dynamic allocation.


Pal/include/arch/x86_64/pal-arch.h, line 193 at r1 (raw file):

Previously, yamahata wrote…

This bumps the size of PAL_CPU_INFO. dynamic allocation?

yes, you are right @yamahata and even I am a little hesitant to use static allocation. Have updated the code to use dynamic allocation.


Pal/src/host/Linux/db_main.c, line 276 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

The implementation of this function is wrong. Consider what happens if run on a single-cpu machine: your /sys/devices/system/cpu/online contains 0 (meaning there is one CPU with index 0). This will lead this function to return 0 (firstint). However, Graphene just wanted to know how many CPUs are there, so the correct return must be 1.

I suggest to change this function as follows:

/* Opens a pseudo-file describing HW resources such as online CPUs and counts the number of HW resources
 * present in the file (if count == true) or simply reads the integer stored in the file (if count == false).
 * Returns -errno on failure. For example, calling this function on `/sys/devices/system/cpu/online` (with
 * complex formats like "1,3-5,6") will return 5, and calling this function on `/sys/devices/system/cpu/smt/active`
 * will return 1 (host supports SMT) or 0 (host doesn't support SMT). */
int get_hw_resource(const char *filename, bool count) {
   ... current logic of open and read ...
   if (!count) {
      /* caller wants to read an int stored in the file */
     int ret = strtol(buf, &end, 10);
     ... add error checking and return ret ...
   } else {
      ... current logic of reading the list and counting ...
   }

Good catch @dimakuv. I missed the single-cpu case. Modified the code along similar lines but not exactly as above. Also reworded the comment a little.


Pal/src/host/Linux/db_main-x86_64.c, line 78 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please move this filename close to the actual place where you use it. (This is a preference in style, but I think it makes sense here: you only use filename once in that particular place.)

Done.


Pal/src/host/Linux/db_main-x86_64.c, line 83 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

While we're here, can you add error checking if (!vendor_id) {return -PAL_ERROR_NOMEM}

Done.


Pal/src/host/Linux/db_main-x86_64.c, line 93 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

While we're here, can you add error checking if (!brand) {return -PAL_ERROR_NOMEM}

Done. Also added error checks for flags and newflags allocation failures.


Pal/src/host/Linux/db_main-x86_64.c, line 107 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This goes wrong if cpu_num == 0. In this case, rv == 0 and we jump to out, which returns rv == 0 -- but this indicates "success"!

You should keep the old logic of -PAL_ERROR_STREAMNOTEXIST in the get_hw_res_count() function and don't check for cpu_num == 0 here.

updated get_hw_res_count to fix this issue.


Pal/src/host/Linux/db_main-x86_64.c, line 112 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Could you move this comment up to the first get_hw_res_count()? It doesn't make sense here, in the middle of using sysfs files.

Also, update extract info on number of cores ... -> extract info on total number of logical processors, number of physical cores, SMT support, etc. ...

Done.


Pal/src/host/Linux/db_main-x86_64.c, line 116 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto (== 0 is wrong)

updated get_hw_res_count to fix this issue.


Pal/src/host/Linux/db_main-x86_64.c, line 124 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please prepend withFIXME: . Also decipher what KNL is (Knights Landing).

Done.


Pal/src/host/Linux/db_main-x86_64.c, line 168 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Remove this declaration from here, it's just a temporary variable that you can declare inside the for loop directly.

Changed declaration to allocate memory dynamically based on the number of logical processors.


Pal/src/host/Linux/db_main-x86_64.c, line 169 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

idx =0 -> idx = 0

Done.


Pal/src/host/Linux/db_main-x86_64.c, line 171 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This goes wrong if some CPUs are offline. Say the host has SMT, 4 CPUs (2 physical cores), and CPU1 is offline. In this case, you get incorrect package ID for CPU2 visible inside graphene.

I am not sure how to circumvent this. Even with this PR, we don't really support offline CPUs (not correctly, that is).

I suggest to add a check like this:

/* TODO: correctly support offline CPUs */
if (cpu_cores > cpu_num) {
  printf("Warning: some CPUs seem to be offline; Graphene doesn't take this into account which may lead to subpar performance\n");
}

Correct, this PR assumes all cores are online and doesn't take into account offlined processor cases.

Regarding the physical package id, the current design just runs a loop from 0 to the total number of online processors and reads the corresponding package id. So if CPU1 is offlined, it will fail when trying to read the physical_package_id file as CPU1 is offlined as the file is not present.

I don't think this check will help as well. In the above example that you have mentioned, cpu_cores which is /sys/devices/system/cpu/cpu0/topology/core_siblings_list will be 3 and if SMT is active, we further divide by 2.cpu_num which is /sys/devices/system/cpu/online will be 3. cpu_cores will either be less than or equal to cpu_num.

To correctly identify if the processors are offlined, we need to read /sys/devices/system/cpu/possible and /sys/devices/system/cpu/online. If the counts don't match, some processors are offlined. I have added a print based on this logic.

I would suggest, we treat offlined case as a separate PR as it brings its own complexities. We need to take into account things like to which node CPU belongs and SMT. This can be looked at as phase 2 of CPU topology changes.

In fact, even the current implementation of /proc/cpuinfo also has issues when cores are offlined. If we consider the above example of offlining CPU1, /proc/cpuinfo will show processor field as 0,1, and 2 instead of 0, 2, and 3 as we iterate from 0 to the total number of online processors.


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 122 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

While we're here, can you add error checking if (!vendor_id) {return -PAL_ERROR_NOMEM}

Done.


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 137 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

While we're here, can you add error checking if (!brand) {return -PAL_ERROR_NOMEM}

Done. Also added error checks for flags and newflags allocation failures.


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 153 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

A better comment will be /* array of "logical processor -> physical package" mappings */

Done.


Pal/src/host/Linux-SGX/pal_security.h, line 45 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

A better comment will be /* array of "logical processor -> physical package" mappings */

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 687 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

See all my comments for the Linux PAL above.

I think I have addressed all, but please flag if I missed any.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r2.
Reviewable status: all files reviewed, 26 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @vijaydhanraj and @yamahata)


Pal/include/arch/x86_64/pal-arch.h, line 27 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Removed this #define and used dynamic allocation.

Forgot to remove? I still see the define.


Pal/src/host/Linux/db_main.c, line 275 at r2 (raw file):

 * file (if count == false). For example on a single-cpu machine, calling this function on
 * `/sys/devices/system/cpu/online` with count == true will return 1 and 0 with count == false.
 * Returns PAL ERROR on failure.

PAL ERROR -> PAL error code


Pal/src/host/Linux/db_main.c, line 276 at r2 (raw file):

 * `/sys/devices/system/cpu/online` with count == true will return 1 and 0 with count == false.
 * Returns PAL ERROR on failure.
 * N.B: Understands complex formats like "1,3-5,6 when called with count == true.

Forgot closing "


Pal/src/host/Linux/db_main.c, line 278 at r2 (raw file):

 * N.B: Understands complex formats like "1,3-5,6 when called with count == true.
*/
int get_hw_resource(const char *filename, bool count) {

Just notice wrong asterisk alignment: change to ... char* filename, ...


Pal/src/host/Linux/db_main.c, line 307 at r2 (raw file):

            if (*end == '\n' || *end == '\0')
                retval = firstint;
            break;

Why not just return retval? Then we won't need the else if part and we won't need the 4-space indentation.


Pal/src/host/Linux/db_main.c, line 309 at r2 (raw file):

            break;
        } else if (*end == '\0' || *end == ',' || *end == '\n') {
            /* single CPU index, count as one more CPU */

Let's rename CPU -> HW resource


Pal/src/host/Linux/db_main.c, line 312 at r2 (raw file):

            resource_cnt++;
        } else if (*end == '-') {
            /* CPU range, count how many CPUs in range */

Let's rename CPU -> HW resource


Pal/src/host/Linux/db_main-x86_64.c, line 125 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

If we intend to run on platforms like KNL which can have 4HTs per core, then I can extract the number of HTs (instead of hard coding to 2) simply by dividing the number of thread_siblings by core_siblings.

How hard is it to do? If it's trivial, then yes, please add this logic. But not blocking on this.


Pal/src/host/Linux/db_main-x86_64.c, line 171 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Correct, this PR assumes all cores are online and doesn't take into account offlined processor cases.

Regarding the physical package id, the current design just runs a loop from 0 to the total number of online processors and reads the corresponding package id. So if CPU1 is offlined, it will fail when trying to read the physical_package_id file as CPU1 is offlined as the file is not present.

I don't think this check will help as well. In the above example that you have mentioned, cpu_cores which is /sys/devices/system/cpu/cpu0/topology/core_siblings_list will be 3 and if SMT is active, we further divide by 2.cpu_num which is /sys/devices/system/cpu/online will be 3. cpu_cores will either be less than or equal to cpu_num.

To correctly identify if the processors are offlined, we need to read /sys/devices/system/cpu/possible and /sys/devices/system/cpu/online. If the counts don't match, some processors are offlined. I have added a print based on this logic.

I would suggest, we treat offlined case as a separate PR as it brings its own complexities. We need to take into account things like to which node CPU belongs and SMT. This can be looked at as phase 2 of CPU topology changes.

In fact, even the current implementation of /proc/cpuinfo also has issues when cores are offlined. If we consider the above example of offlining CPU1, /proc/cpuinfo will show processor field as 0,1, and 2 instead of 0, 2, and 3 as we iterate from 0 to the total number of online processors.

OK. I'm happy with the current changes in this PR. The offline-cpus case can be postponed to another PR.


Pal/src/host/Linux/db_main-x86_64.c, line 134 at r2 (raw file):

    /* FIXME: This code assumes there are only 2 HTs per core but platforms like
     * KNL(Knights landing) can have more */

Slight fix: KNL(Knights landing) -> KNL (Knights Landing)


Pal/src/host/Linux/db_main-x86_64.c, line 143 at r2 (raw file):

    /* array of "logical processor -> physical package" mappings */
    int *phy_id = (int*)malloc(cpu_num * sizeof(int));

Wrong asterisk alignment: int* phy_id


Pal/src/host/Linux/db_main-x86_64.c, line 147 at r2 (raw file):

        rv = -PAL_ERROR_NOMEM;
        goto out_brand;
    }

Please add an empty line after this


Pal/src/host/Linux/db_main-x86_64.c, line 149 at r2 (raw file):

    }
    char filename[128];
    for (int idx =0; idx < cpu_num; idx++) {

Add space: idx = 0


Pal/src/host/Linux-SGX/db_main.c, line 301 at r2 (raw file):

    g_pal_sec.cpu_cores = sec_info.cpu_cores;

    g_pal_sec.phy_id = sec_info.phy_id;

This is wrong! You must dynamically allocate the array g_pal_sec.phy_id and then memcpy(g_pal_sec.phy_id, sec_info.phy_id, sec_info.cpu_cores * sizeof(int)) (something like this).

What happens now is that you assign g_pal_sec.phy_id to point to untrusted memory. SGX enclaves allow this, but we lost security: the malicious host OS can replace e.g. g_pal_sec.phy_id[0] = 123 in untrusted memory and in-enclave logic breaks.

For dynamic allocation to work, you should move this logic below init_enclave_pages() (otherwise you cannot call malloc). I suggest to move this step to after init_tsc().


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 122 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Done.

Not done?


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 158 at r2 (raw file):

    /* array of "logical processor -> physical package" mappings */
    ci->phy_id = g_pal_sec.phy_id;

This is fine and secure (because both arrays are inside enclave memory).


Pal/src/host/Linux-SGX/sgx_main.c, line 687 at r2 (raw file):

 * file (if count == false). For example on a single-cpu machine, calling this function on
 * `/sys/devices/system/cpu/online` with count == true will return 1 and 0 with count == false.
 * Returns -errno on failure.

-errno -> UNIX error code


Pal/src/host/Linux-SGX/sgx_main.c, line 688 at r2 (raw file):

 * `/sys/devices/system/cpu/online` with count == true will return 1 and 0 with count == false.
 * Returns -errno on failure.
 * N.B: Understands complex formats like "1,3-5,6 when called with count == true.

Forgot closing "


Pal/src/host/Linux-SGX/sgx_main.c, line 690 at r2 (raw file):

 * N.B: Understands complex formats like "1,3-5,6 when called with count == true.
*/
static int get_hw_resource(const char *filename, bool count) {

const char* filename


Pal/src/host/Linux-SGX/sgx_main.c, line 719 at r2 (raw file):

            if (*end == '\n' || *end == '\0')
                retval = firstint;
            break;

Use return retval


Pal/src/host/Linux-SGX/sgx_main.c, line 784 at r2 (raw file):

    /* FIXME: This code assumes there are only 2 HTs per core but platforms like
     * KNL(Knights landing) can have more */

see my comments for Linux PAL


Pal/src/host/Linux-SGX/sgx_main.c, line 787 at r2 (raw file):

    int smt_active = get_hw_resource("/sys/devices/system/cpu/smt/active", /*count=*/false);
    if (smt_active < 0)
        return smt_active;

Can you add a debug message like SGX_DBG(DBG_E, "Cannot read /sys/devices/system/cpu/smt/active\n"); before return?


Pal/src/host/Linux-SGX/sgx_main.c, line 793 at r2 (raw file):

    int *phy_id = (int*)malloc(num_cpus *sizeof(int));
    if (!phy_id)
        return -ENOMEM;

Insert empty line


Pal/src/host/Linux-SGX/sgx_main.c, line 800 at r2 (raw file):

        phy_id[idx] = get_hw_resource(filename, /*count=*/false);
        if (phy_id[idx] < 0) {
            return phy_id[idx];

Can you add a debug message like SGX_DBG(DBG_E, "Cannot read /sys/devices/system/cpu/cpu%d/topology/physical_package_id\n", idx); before return? Ideally we also would like to free(phy_id) here; even though memory-cleanup is completely absent from this function :)

So let's do this:

        if (phy_id[idx] < 0) {
            SGX_DBG(DBG_E, "Cannot read /sys/devices/system/cpu/cpu%d/topology/physical_package_id\n", idx);
            ret = phy_id[idx];
            free(phy_id);
            return ret;
        }

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 26 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @yamahata)


Pal/include/arch/x86_64/pal-arch.h, line 27 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Forgot to remove? I still see the define.

sorry, must have missed saving the file after the change :)


Pal/src/host/Linux/db_main.c, line 275 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

PAL ERROR -> PAL error code

Done.


Pal/src/host/Linux/db_main.c, line 276 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Forgot closing "

Fixed.


Pal/src/host/Linux/db_main.c, line 278 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Just notice wrong asterisk alignment: change to ... char* filename, ...

Done.


Pal/src/host/Linux/db_main.c, line 307 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why not just return retval? Then we won't need the else if part and we won't need the 4-space indentation.

yes, we can do that too. This was just my preference.


Pal/src/host/Linux/db_main.c, line 309 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Let's rename CPU -> HW resource

Done.


Pal/src/host/Linux/db_main.c, line 312 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Let's rename CPU -> HW resource

Done.


Pal/src/host/Linux/db_main-x86_64.c, line 125 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

How hard is it to do? If it's trivial, then yes, please add this logic. But not blocking on this.

very simple. Modified the code to do this.


Pal/src/host/Linux/db_main-x86_64.c, line 134 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Slight fix: KNL(Knights landing) -> KNL (Knights Landing)

Not applicable as we not correctly identify the number of siblings per core.


Pal/src/host/Linux/db_main-x86_64.c, line 143 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Wrong asterisk alignment: int* phy_id

sorry habits :). I have fixed it.


Pal/src/host/Linux/db_main-x86_64.c, line 147 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please add an empty line after this

Done.


Pal/src/host/Linux/db_main-x86_64.c, line 149 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Add space: idx = 0

Done.


Pal/src/host/Linux-SGX/db_main.c, line 301 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This is wrong! You must dynamically allocate the array g_pal_sec.phy_id and then memcpy(g_pal_sec.phy_id, sec_info.phy_id, sec_info.cpu_cores * sizeof(int)) (something like this).

What happens now is that you assign g_pal_sec.phy_id to point to untrusted memory. SGX enclaves allow this, but we lost security: the malicious host OS can replace e.g. g_pal_sec.phy_id[0] = 123 in untrusted memory and in-enclave logic breaks.

For dynamic allocation to work, you should move this logic below init_enclave_pages() (otherwise you cannot call malloc). I suggest to move this step to after init_tsc().

yeah wasn't sure if we needed two allocations, one outside and one inside enclave but this makes it clear. Have fixed the code accordingly.


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 122 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Not done?

yeah missed this. Fixed it now.


Pal/src/host/Linux-SGX/sgx_main.c, line 687 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

-errno -> UNIX error code

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 688 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Forgot closing "

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 690 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

const char* filename

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 719 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Use return retval

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 784 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

see my comments for Linux PAL

not applicable as we can now identify the right number of siblings.


Pal/src/host/Linux-SGX/sgx_main.c, line 787 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can you add a debug message like SGX_DBG(DBG_E, "Cannot read /sys/devices/system/cpu/smt/active\n"); before return?

I think you want to add debug logs for cases where the user just wants to read the contents of the file right? But since we no longer read this file, but actually count the number of thread_siblings_list to identify SMT, I have not included this print.


Pal/src/host/Linux-SGX/sgx_main.c, line 793 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Insert empty line

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 800 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can you add a debug message like SGX_DBG(DBG_E, "Cannot read /sys/devices/system/cpu/cpu%d/topology/physical_package_id\n", idx); before return? Ideally we also would like to free(phy_id) here; even though memory-cleanup is completely absent from this function :)

So let's do this:

        if (phy_id[idx] < 0) {
            SGX_DBG(DBG_E, "Cannot read /sys/devices/system/cpu/cpu%d/topology/physical_package_id\n", idx);
            ret = phy_id[idx];
            free(phy_id);
            return ret;
        }

Done. I have added a similar debug log even of Linux PAL.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r3.
Reviewable status: all files reviewed, 9 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @vijaydhanraj and @yamahata)


Pal/src/host/Linux/db_main-x86_64.c, line 125 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

very simple. Modified the code to do this.

Thanks. Please unblock your own comment (see on the right of this text window, change status from "-" to " ") to resolve this.


Pal/src/host/Linux/db_main-x86_64.c, line 139 at r3 (raw file):

        goto out_brand;
    }
    ci->cpu_cores = (smt_siblings == 1) ? cpu_cores : (cpu_cores / smt_siblings) ;

Why not just ci->cpu_cores = cpu_cores / smt_siblings? Even the case of 1 will be correctly handled.


Pal/src/host/Linux/db_main-x86_64.c, line 154 at r3 (raw file):

        phy_id[idx] = get_hw_resource(filename, /*count=*/false);
        if (phy_id[idx] < 0) {
            printf(" cannot read %s\n", filename);

Nitpick: printf("Cannot read ...")


Pal/src/host/Linux-SGX/db_main.c, line 301 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

As part of the subsequent patches, I plan to add the number of nodes present. We can cross-verify that each physical package ID should not be greater than the number of nodes. (Basically create a for loop and validate each physical package id). Other suggestions welcome.

OK. Please unblock your own comment (see on the right of this text window, change status from "-" to " ") to resolve this.


Pal/src/host/Linux-SGX/db_main.c, line 346 at r3 (raw file):

    SET_ENCLAVE_TLS(ready_for_exceptions, 1UL);

    /* Allocate enclave memory to store "logical processor -> physical package" mappings*/

Nitpick: ...mappings */ (add space)


Pal/src/host/Linux-SGX/sgx_main.c, line 773 at r1 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

If we intend to run on platforms like KNL which can have 4HTs per core, then I can extract the number of HTs (instead of hard coding to 2) simply by dividing the number of thread_siblings by core_siblings.

Please unblock your own comment (see on the right of this text window, change status from "-" to " ") to resolve this.


Pal/src/host/Linux-SGX/sgx_main.c, line 790 at r3 (raw file):

    if (smt_siblings < 0)
        return smt_siblings;
    pal_sec->cpu_cores = (smt_siblings == 1) ? cpu_cores : (cpu_cores / smt_siblings);

See my other comment

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 8 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @yamahata)


Pal/src/host/Linux/db_main-x86_64.c, line 125 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Thanks. Please unblock your own comment (see on the right of this text window, change status from "-" to " ") to resolve this.

yes, done.


Pal/src/host/Linux/db_main-x86_64.c, line 139 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Why not just ci->cpu_cores = cpu_cores / smt_siblings? Even the case of 1 will be correctly handled.

yes, done.


Pal/src/host/Linux/db_main-x86_64.c, line 154 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Nitpick: printf("Cannot read ...")

Done.


Pal/src/host/Linux-SGX/db_main.c, line 301 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

OK. Please unblock your own comment (see on the right of this text window, change status from "-" to " ") to resolve this.

Done.


Pal/src/host/Linux-SGX/db_main.c, line 346 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Nitpick: ...mappings */ (add space)

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 773 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Please unblock your own comment (see on the right of this text window, change status from "-" to " ") to resolve this.

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 790 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

See my other comment

Done.

yamahata
yamahata previously approved these changes Oct 28, 2020
Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 9 files at r2, 3 of 6 files at r3, 3 of 3 files at r4.
Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @vijaydhanraj)


Pal/src/host/Linux/db_main-x86_64.c, line 154 at r3 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Done.

You still have a leading space. Please fix.

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 8 of 9 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @yamahata)


Pal/src/host/Linux/db_main-x86_64.c, line 154 at r3 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

You still have a leading space. Please fix.

yes fixed this.

dimakuv
dimakuv previously approved these changes Oct 30, 2020
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 3 files at r4, 1 of 1 files at r5.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 9 files at r2, 3 of 6 files at r3, 2 of 3 files at r4, 1 of 1 files at r5.
Reviewable status: all files reviewed, 23 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @vijaydhanraj)


Pal/include/arch/x86_64/pal-arch.h, line 191 at r5 (raw file):

    PAL_NUM cpu_num;
    PAL_NUM cpu_cores;

How those two relate to each other? Please comment and explain their meaning or use better variable names (when I'm reading the current names I'm confused - at first it looks like the first one is CPU count (i.e. sockets) and the latter is cores count, but this doesn't make sense - can't both sockets have different core counts?).


Pal/include/arch/x86_64/pal-arch.h, line 192 at r5 (raw file):

    PAL_NUM cpu_num;
    PAL_NUM cpu_cores;
    PAL_PTR phy_id;

Please change to int* and drop the cast from info.c, we need to get rid of these PAL types, they don't make any sense.


Pal/include/arch/x86_64/pal-arch.h, line 192 at r5 (raw file):

    PAL_NUM cpu_num;
    PAL_NUM cpu_cores;
    PAL_PTR phy_id;

Please comment here how many elements has this array


Pal/src/host/Linux/db_main.c, line 274 at r5 (raw file):

and 0 with count == false

Why 0 in this case? Not an error? 0-7 and similar inputs are not numbers.


Pal/src/host/Linux/db_main.c, line 277 at r5 (raw file):

 * Returns PAL error code on failure.
 * N.B: Understands complex formats like "1,3-5,6" when called with count == true.
*/

broken alignment


Pal/src/host/Linux/db_main.c, line 278 at r5 (raw file):

get_hw_resource

Is this format used anywhere outside of /sys/devices/system/cpu/*? If not, I'd say "cpu" instead of "hw". Or even better, parse_sysfs_cpu_info (this is a global function, get_hw_resource could mean a lot of things in the global namespace).


Pal/src/host/Linux/db_main-x86_64.c, line 121 at r5 (raw file):

    int possible_cpus = get_hw_resource("/sys/devices/system/cpu/possible", /*count=*/true);
    /* TODO: correctly support offline CPUs */
    if ((possible_cpus > 0) && (possible_cpus > cpu_num)) {

no need for parentheses


Pal/src/host/Linux/db_main-x86_64.c, line 127 at r5 (raw file):

    int cpu_cores = get_hw_resource("/sys/devices/system/cpu/cpu0/topology/core_siblings_list",
                                     /*count=*/true);

please fix alignment of the wrapped line


Pal/src/host/Linux/pal_linux.h, line 129 at r5 (raw file):

                        PAL_HANDLE* manifest);

int get_hw_resource(const char *filename, bool count);

* should be on the left


Pal/src/host/Linux-SGX/db_main.c, line 352 at r5 (raw file):

        ocall_exit(1, /*is_exitgroup=*/true);
    }
    memcpy(phy_id, sec_info.phy_id, num_cpus * sizeof(int));

sec_info.phy_id resides in untrusted memory, you have to copy it using sgx_copy_to_enclave. Please also prefix this field in struct pal_sec with uptr_ to make it more visible that it's a pointer to untrusted memory.


Pal/src/host/Linux-SGX/db_main.c, line 353 at r5 (raw file):

    }
    memcpy(phy_id, sec_info.phy_id, num_cpus * sizeof(int));
    g_pal_sec.phy_id = (PAL_PTR)phy_id;

this cast could be removed after you resolve my previous comment


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 123 at r5 (raw file):

    const size_t VENDOR_ID_SIZE = 13;
    char* vendor_id = malloc(VENDOR_ID_SIZE);
    if(!vendor_id)

space after if


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 140 at r5 (raw file):

    const size_t BRAND_SIZE = 49;
    char* brand = malloc(BRAND_SIZE);
    if(!brand) {

ditto


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 171 at r5 (raw file):

    int flen = 0, fmax = 80;
    char* flags = malloc(fmax);
    if(!flags) {

ditto


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 184 at r5 (raw file):

            if (flen + len + 1 > fmax) {
                char* new_flags = malloc(fmax * 2);
                if(!new_flags) {

ditto


Pal/src/host/Linux-SGX/pal_security.h, line 43 at r5 (raw file):

    /* physical cores in a package */
    PAL_NUM cpu_cores;

Why isn't this an array? Can't packages have different cores counts?


Pal/src/host/Linux-SGX/pal_security.h, line 46 at r5 (raw file):

    /* array of "logical processor -> physical package" mappings */
    PAL_PTR phy_id;

ditto (type + comment about the count)


Pal/src/host/Linux-SGX/sgx_main.c, line 686 at r5 (raw file):

 * HW resources present in the file (if count == true) or simply reads the integer stored in the
 * file (if count == false). For example on a single-cpu machine, calling this function on
 * `/sys/devices/system/cpu/online` with count == true will return 1 and 0 with count == false.

ditto (0?)


Pal/src/host/Linux-SGX/sgx_main.c, line 689 at r5 (raw file):

 * Returns UNIX error code on failure.
 * N.B: Understands complex formats like "1,3-5,6" when called with count == true.
*/

broken alignment


Pal/src/host/Linux-SGX/sgx_main.c, line 690 at r5 (raw file):

 * N.B: Understands complex formats like "1,3-5,6" when called with count == true.
*/
static int get_hw_resource(const char* filename, bool count) {

ditto (func name)


Pal/src/host/Linux-SGX/sgx_main.c, line 775 at r5 (raw file):

Opens a

ditto (parentheses)


Pal/src/host/Linux-SGX/sgx_main.c, line 793 at r5 (raw file):

    /* array of "logical processor -> physical package" mappings */
    int *phy_id = (int*)malloc(num_cpus *sizeof(int));

wrong asterisk alignment x2


Pal/src/host/Linux-SGX/sgx_main.c, line 809 at r5 (raw file):

        }
    }
    pal_sec->phy_id = (PAL_PTR)phy_id;

this cast won't be needed after fixing the type

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 9 files reviewed, 23 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @yamahata)


Pal/include/arch/x86_64/pal-arch.h, line 191 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…
    PAL_NUM cpu_num;
    PAL_NUM cpu_cores;

How those two relate to each other? Please comment and explain their meaning or use better variable names (when I'm reading the current names I'm confused - at first it looks like the first one is CPU count (i.e. sockets) and the latter is cores count, but this doesn't make sense - can't both sockets have different core counts?).

cpu_num refers to the total number of logical processors available in the host (online processors). This includes all the sockets. This variable name already existed. cpu_cores refers to the total number of physical cores in a physical package (socket). This is the new added by this PR. Kept this name to match /proc/cpuinfo output.
note: A physical core can have 2 or more logical processors when HT is turned on.

I have added comments to clarify this. @mkow please let me know if you want me to rename cpu_num to online_procs and cpu_cores to cores_per_socket. Do let me know if you have any other suggestions as well.


Pal/include/arch/x86_64/pal-arch.h, line 192 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Please change to int* and drop the cast from info.c, we need to get rid of these PAL types, they don't make any sense.

sure, done.


Pal/include/arch/x86_64/pal-arch.h, line 192 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Please comment here how many elements has this array

This depends on the number of logical processors available in the host. I have added this comment /* array of "logical processor -> physical package" mappings */ to clarify.


Pal/src/host/Linux/db_main.c, line 274 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…
and 0 with count == false

Why 0 in this case? Not an error? 0-7 and similar inputs are not numbers.

when count == false we simply read the contents of the file. So on a single-cpu machine, contents of /sys/devices/system/cpu/online will be zero and is valid. But if we set count == truethe expectation is to count the number of cpus which should be greater than zero.


Pal/src/host/Linux/db_main.c, line 277 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

broken alignment

Done.


Pal/src/host/Linux/db_main.c, line 278 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…
get_hw_resource

Is this format used anywhere outside of /sys/devices/system/cpu/*? If not, I'd say "cpu" instead of "hw". Or even better, parse_sysfs_cpu_info (this is a global function, get_hw_resource could mean a lot of things in the global namespace).

yes as part of my other changes for sysfs (cpu topology), I intended to use this function to read /sys/devices/system/node/* as well.


Pal/src/host/Linux/db_main-x86_64.c, line 121 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

no need for parentheses

Done.


Pal/src/host/Linux/db_main-x86_64.c, line 127 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

please fix alignment of the wrapped line

Done.


Pal/src/host/Linux/pal_linux.h, line 129 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

* should be on the left

Done.


Pal/src/host/Linux-SGX/db_main.c, line 352 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

sec_info.phy_id resides in untrusted memory, you have to copy it using sgx_copy_to_enclave. Please also prefix this field in struct pal_sec with uptr_ to make it more visible that it's a pointer to untrusted memory.

Thanks, updated to use sgx_copy_to_enclave.
Here we copy from sec_info.phy_id to g_pal_sec.phy_id. Both these variables are of type struct pal_sec, but phy_id in sec_info points to untrusted memory whereas phy_id in g_pal_sec points to trusted memory. So I think renaming to uptr_phy_id might lead to some confusion.


Pal/src/host/Linux-SGX/db_main.c, line 353 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

this cast could be removed after you resolve my previous comment

Done.


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 123 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

space after if

Done.


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 140 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 171 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 184 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


Pal/src/host/Linux-SGX/pal_security.h, line 43 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…
    /* physical cores in a package */
    PAL_NUM cpu_cores;

Why isn't this an array? Can't packages have different cores counts?

yes, they can have when the user offlines a few logical processors in a package, but since this PR doesn't address offlining have kept this an unsigned int. We should look at offlining as phase 2 of CPU topology changes.


Pal/src/host/Linux-SGX/pal_security.h, line 46 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (type + comment about the count)

Changed type to int*. Please see my earlier comment about count and renaming phy_id to uptr_phy_id. Doesn't the comment array of logical processor clarify?


Pal/src/host/Linux-SGX/sgx_main.c, line 686 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (0?)

please see my earlier comment on this.


Pal/src/host/Linux-SGX/sgx_main.c, line 689 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

broken alignment

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 690 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (func name)

please see my earlier comment on this.


Pal/src/host/Linux-SGX/sgx_main.c, line 775 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…
Opens a

ditto (parentheses)

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 793 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

wrong asterisk alignment x2

Fixed * in the ptr as well as add space after multiplication sign.


Pal/src/host/Linux-SGX/sgx_main.c, line 809 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

this cast won't be needed after fixing the type

Done.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r6.
Reviewable status: all files reviewed, 10 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @vijaydhanraj)


Pal/include/arch/x86_64/pal-arch.h, line 191 at r5 (raw file):
But this still doesn't answer one of my questions, quoting: "can't both sockets have different core counts?". If they can, this representation won't work for such cases.

let me know if you want me to rename cpu_num to online_procs and cpu_cores to cores_per_socket

I like this suggestion, but maybe online_cpu_threads or online_logical_cores? Would be even less ambiguous. But let's do this after we resolve the discussion about multisocket configs.


Pal/include/arch/x86_64/pal-arch.h, line 192 at r5 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

This depends on the number of logical processors available in the host. I have added this comment /* array of "logical processor -> physical package" mappings */ to clarify.

What I mean is saying "[..] and has cpu_num elements".


Pal/src/host/Linux/db_main.c, line 274 at r5 (raw file):

So on a single-cpu machine, contents of /sys/devices/system/cpu/online will be zero

Hmm, no. Maybe you mean a single-logical-core machine? I read single-cpu as "single cpu package", i.e. single socket.


Pal/src/host/Linux/db_main.c, line 278 at r5 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

yes as part of my other changes for sysfs (cpu topology), I intended to use this function to read /sys/devices/system/node/* as well.

Sounds good then.


Pal/src/host/Linux/db_main-x86_64.c, line 141 at r6 (raw file):

    ci->cpu_cores = cpu_cores / smt_siblings;

    /* array of "logical processors -> physical package" mappings */

I'd revert adding this "-s" to "processor", if you say "array of" then you describe a single element of the array.


Pal/src/host/Linux-SGX/db_main.c, line 352 at r5 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Thanks, updated to use sgx_copy_to_enclave.
Here we copy from sec_info.phy_id to g_pal_sec.phy_id. Both these variables are of type struct pal_sec, but phy_id in sec_info points to untrusted memory whereas phy_id in g_pal_sec points to trusted memory. So I think renaming to uptr_phy_id might lead to some confusion.

Ah, I see, good point. Let's leave it as is.


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 159 at r6 (raw file):

    ci->cpu_cores = g_pal_sec.cpu_cores;

    /* array of "logical processors -> physical package" mappings */

ditto


Pal/src/host/Linux-SGX/pal_security.h, line 43 at r5 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

yes, they can have when the user offlines a few logical processors in a package, but since this PR doesn't address offlining have kept this an unsigned int. We should look at offlining as phase 2 of CPU topology changes.

Is taking CPUs offline the only case when this can happen? I'm not into servers, so I don't know, but having this assumption embedded in the code when it later turns wrong sounds dangerous. Could you please verify that this is indeed the case and it's not possible to have 2 CPUs with different number of cores in two sockets?


Pal/src/host/Linux-SGX/pal_security.h, line 46 at r5 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Changed type to int*. Please see my earlier comment about count and renaming phy_id to uptr_phy_id. Doesn't the comment array of logical processor clarify?

See my response to the corresponding thread.


Pal/src/host/Linux-SGX/pal_security.h, line 45 at r6 (raw file):

    PAL_NUM cpu_cores;

    /* array of "logical processors -> physical package" mappings */

ditto


Pal/src/host/Linux-SGX/sgx_main.c, line 686 at r5 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

please see my earlier comment on this.

ditto ("single-cpu")


Pal/src/host/Linux-SGX/sgx_main.c, line 792 at r6 (raw file):

    pal_sec->cpu_cores = cpu_cores / smt_siblings;

    /* array of "logical processors -> physical package" mappings */

ditto

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 10 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, @vijaydhanraj, and @yamahata)


Pal/include/arch/x86_64/pal-arch.h, line 191 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

But this still doesn't answer one of my questions, quoting: "can't both sockets have different core counts?". If they can, this representation won't work for such cases.

let me know if you want me to rename cpu_num to online_procs and cpu_cores to cores_per_socket

I like this suggestion, but maybe online_cpu_threads or online_logical_cores? Would be even less ambiguous. But let's do this after we resolve the discussion about multisocket configs.

usually, the sockets have a symmetrical number of processors by default unless the user offlines. But my knowledge is limited as well so adding @dimakuv and @yamahata for additional inputs.

The current implementation of cpu_cores is definitely broken which gives the total number of online processors as physical cores per socket :).


Pal/include/arch/x86_64/pal-arch.h, line 192 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

What I mean is saying "[..] and has cpu_num elements".

you mean something like this /* array of "logical processor -> physical package" mappings having cpu_num elements */ right?


Pal/src/host/Linux/db_main.c, line 274 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

So on a single-cpu machine, contents of /sys/devices/system/cpu/online will be zero

Hmm, no. Maybe you mean a single-logical-core machine? I read single-cpu as "single cpu package", i.e. single socket.

yes, correct I did mean a machine with a single core. Not sure if it is right to add the logical term to the core. I think core usually refers to physical cores.


Pal/src/host/Linux-SGX/pal_security.h, line 43 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Is taking CPUs offline the only case when this can happen? I'm not into servers, so I don't know, but having this assumption embedded in the code when it later turns wrong sounds dangerous. Could you please verify that this is indeed the case and it's not possible to have 2 CPUs with different number of cores in two sockets?

AFAIK, sockets would be symmetrical with respect to the number of cores. Usually, the number of cores in a socket determines the frequency. For example, a 12 core/socket part will have say 2.5GHz, whereas a 24 core/socket part will lesser frequency due to thermal reasons. Given this example, one cannot easily add asymmetric cores in the socket without special bios changes. But again my knowledge is limited so adding @dimakuv and @yamahata for further inputs.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 9 files at r6.
Reviewable status: all files reviewed, 10 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @mkow, @vijaydhanraj, and @yamahata)


Pal/include/arch/x86_64/pal-arch.h, line 191 at r5 (raw file):

can't both sockets have different core counts?

Theoretically each socket may have a different core count, but I am not aware of any machine like this. To me, the assumption that each socket has the same number of logical cores sounds legit.

I like this suggestion, but maybe online_cpu_threads or online_logical_cores?

I would suggest cpu_num -> online_logical_cores and cpu_cores -> physical_cores_per_socket.


Pal/include/arch/x86_64/pal-arch.h, line 192 at r5 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

you mean something like this /* array of "logical processor -> physical package" mappings having cpu_num elements */ right?

I think Michal wants smth like this:

/* array of "logical processor -> socket" mappings; has cpu_num elements */
int* cpu_socket;

I am also voting to rename phy_id to cpu_socket (read it as "returns a socket for this logical processor/CPU").


Pal/src/host/Linux/db_main.c, line 274 at r5 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

yes, correct I did mean a machine with a single core. Not sure if it is right to add the logical term to the core. I think core usually refers to physical cores.

These terms are overloaded and fuzzy. Let's just stick to one convention in Graphene (and in this PR). I suggest to follow Michal's terminology:

  • logical core to denote a logical thread/logical CPU/logical core
  • physical core to denote a physical CPU/physical core
  • socket to denote a physical package.

I would prefer to remove "CPU" and "physical package" from our variable names and comments and stick to these three.


Pal/src/host/Linux-SGX/pal_security.h, line 43 at r5 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

AFAIK, sockets would be symmetrical with respect to the number of cores. Usually, the number of cores in a socket determines the frequency. For example, a 12 core/socket part will have say 2.5GHz, whereas a 24 core/socket part will lesser frequency due to thermal reasons. Given this example, one cannot easily add asymmetric cores in the socket without special bios changes. But again my knowledge is limited so adding @dimakuv and @yamahata for further inputs.

As mentioned in the previous comment, I have never seen asymmetric sockets.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @vijaydhanraj, and @yamahata)


Pal/include/arch/x86_64/pal-arch.h, line 191 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

can't both sockets have different core counts?

Theoretically each socket may have a different core count, but I am not aware of any machine like this. To me, the assumption that each socket has the same number of logical cores sounds legit.

I like this suggestion, but maybe online_cpu_threads or online_logical_cores?

I would suggest cpu_num -> online_logical_cores and cpu_cores -> physical_cores_per_socket.

Ok, let's leave the implementation as-is then.

Dima's naming suggestions also sound good.


Pal/include/arch/x86_64/pal-arch.h, line 192 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I think Michal wants smth like this:

/* array of "logical processor -> socket" mappings; has cpu_num elements */
int* cpu_socket;

I am also voting to rename phy_id to cpu_socket (read it as "returns a socket for this logical processor/CPU").

+1 to Dima's suggestions.


Pal/src/host/Linux-SGX/pal_security.h, line 43 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

As mentioned in the previous comment, I have never seen asymmetric sockets.

Ok, let's leave it as-is then.

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @yamahata)


Pal/include/arch/x86_64/pal-arch.h, line 191 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Ok, let's leave the implementation as-is then.

Dima's naming suggestions also sound good.

Done.


Pal/include/arch/x86_64/pal-arch.h, line 192 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

+1 to Dima's suggestions.

Done.


Pal/src/host/Linux/db_main.c, line 274 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

These terms are overloaded and fuzzy. Let's just stick to one convention in Graphene (and in this PR). I suggest to follow Michal's terminology:

  • logical core to denote a logical thread/logical CPU/logical core
  • physical core to denote a physical CPU/physical core
  • socket to denote a physical package.

I would prefer to remove "CPU" and "physical package" from our variable names and comments and stick to these three.

Sure will use this naming convention. Also, cleaned-up the variables names accordingly.


Pal/src/host/Linux/db_main-x86_64.c, line 141 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I'd revert adding this "-s" to "processor", if you say "array of" then you describe a single element of the array.

yes I have reverted this change.


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 159 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


Pal/src/host/Linux-SGX/pal_security.h, line 46 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

See my response to the corresponding thread.

Done.


Pal/src/host/Linux-SGX/pal_security.h, line 45 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 686 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto ("single-cpu")

changed to single-core now.


Pal/src/host/Linux-SGX/sgx_main.c, line 792 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


/* array of "logical processors -> physical package" mappings */
int* phy_id;
/* array of "logical processor->physical package" mappings; has online_logical_cores elements*/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will change elements*/ -> elements */ in the next fixup or if things look fine before merge.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 9 files at r7.
Reviewable status: all files reviewed, 8 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @vijaydhanraj, and @yamahata)


Pal/include/arch/x86_64/pal-arch.h, line 195 at r7 (raw file):

    PAL_NUM physical_cores_per_socket;
    /* array of "logical processor->physical package" mappings; has online_logical_cores elements */
    int* cpu_socket ;

trailing space


Pal/src/host/Linux/db_main-x86_64.c, line 127 at r7 (raw file):

    int core_siblings = get_hw_resource("/sys/devices/system/cpu/cpu0/topology/core_siblings_list",
                                           /*count=*/true);

please fix alignment


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 159 at r6 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Done.

Here this comment about element count doesn't make sense, it was only intended for the structure definition for its users.

Btw. now when all fields are more descriptive, we can remove the comments altogether (here and in the 2 assignments above).


Pal/src/host/Linux-SGX/pal_security.h, line 45 at r6 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Done.

ditto (see my previous comment)


Pal/src/host/Linux-SGX/sgx_main.c, line 792 at r6 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Done.

Not done.

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 8 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, @vijaydhanraj, and @yamahata)


Pal/include/arch/x86_64/pal-arch.h, line 195 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

trailing space

Done.


Pal/src/host/Linux/db_main-x86_64.c, line 127 at r7 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

please fix alignment

Done.


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 159 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Here this comment about element count doesn't make sense, it was only intended for the structure definition for its users.

Btw. now when all fields are more descriptive, we can remove the comments altogether (here and in the 2 assignments above).

sure, done.


Pal/src/host/Linux-SGX/pal_security.h, line 45 at r7 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

will change elements*/ -> elements */ in the next fixup or if things look fine before merge.

Not relevant. Removed the comment itself.


Pal/src/host/Linux-SGX/sgx_main.c, line 792 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Not done.

Done.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 5 files at r8.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @vijaydhanraj, and @yamahata)


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 159 at r6 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

sure, done.

No need for the empty lines now (between the cpu-number-related assignments):

    ci->online_logical_cores = g_pal_sec.online_logical_cores;
    ci->physical_cores_per_socket = g_pal_sec.physical_cores_per_socket;
    ci->cpu_socket = g_pal_sec.cpu_socket;

Pal/src/host/Linux-SGX/pal_security.h, line 45 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (see my previous comment)

ditto

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @yamahata)


Pal/src/host/Linux-SGX/db_main-x86_64.c, line 159 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

No need for the empty lines now (between the cpu-number-related assignments):

    ci->online_logical_cores = g_pal_sec.online_logical_cores;
    ci->physical_cores_per_socket = g_pal_sec.physical_cores_per_socket;
    ci->cpu_socket = g_pal_sec.cpu_socket;

sure, clubbed these together.


Pal/src/host/Linux-SGX/pal_security.h, line 45 at r6 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 9 files at r7, 3 of 5 files at r8, 2 of 2 files at r9.
Reviewable status: all files reviewed, 7 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @mkow and @vijaydhanraj)

a discussion (no related file):
I like the new names much better. It makes code much more self-explanatory.



Pal/include/arch/x86_64/pal-arch.h, line 190 at r9 (raw file):

/* PAL_CPU_INFO holds /proc/cpuinfo data */
typedef struct PAL_CPU_INFO_ {
    /* Number of logical processors available in the host */

Can we also change all such comments logical processor(s) -> logical core(s)?


Pal/include/arch/x86_64/pal-arch.h, line 192 at r9 (raw file):

    /* Number of logical processors available in the host */
    PAL_NUM online_logical_cores;
    /* Number of physical cores in a physical package (socket) */

Cam we also change physical package -> socket everywhere (maybe only keep it in this comment here because you show that these two are synonyms, but replace everywhere else)?


Pal/src/host/Linux/db_main-x86_64.c, line 119 at r9 (raw file):

    ci->online_logical_cores = online_logical_cores;

    int possible_cores = get_hw_resource("/sys/devices/system/cpu/possible", /*count=*/true);

Can we rename possible_cores -> possible_logical_cores for clarity?


Pal/src/host/Linux-SGX/db_main.c, line 350 at r9 (raw file):

    int* cpu_socket = (int*)malloc(online_logical_cores * sizeof(int));
    if (!cpu_socket) {
        SGX_DBG(DBG_E, "Allocation for logical processor -> physical package mappings failed\n");

core, socket


Pal/src/host/Linux-SGX/sgx_main.c, line 792 at r6 (raw file):

Previously, vijaydhanraj (Vijay Dhanraj) wrote…

Done.

ditto (update terminology in comments)


Pal/src/host/Linux-SGX/sgx_main.c, line 773 at r9 (raw file):

    pal_sec->online_logical_cores = online_logical_cores;

    int possible_cores = get_hw_resource("/sys/devices/system/cpu/possible", /*count=*/true);

ditto (possible_logical_cores)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r9.
Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @vijaydhanraj)


Pal/src/host/Linux/db_main-x86_64.c, line 119 at r9 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can we rename possible_cores -> possible_logical_cores for clarity?

+1

Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)


Pal/include/arch/x86_64/pal-arch.h, line 190 at r9 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Can we also change all such comments logical processor(s) -> logical core(s)?

Done.


Pal/include/arch/x86_64/pal-arch.h, line 192 at r9 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Cam we also change physical package -> socket everywhere (maybe only keep it in this comment here because you show that these two are synonyms, but replace everywhere else)?

Done.


Pal/src/host/Linux/db_main-x86_64.c, line 119 at r9 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

+1

sure, done.


Pal/src/host/Linux-SGX/db_main.c, line 350 at r9 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

core, socket

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 792 at r6 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto (update terminology in comments)

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 773 at r9 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

ditto (possible_logical_cores)

Done.

mkow
mkow previously approved these changes Nov 4, 2020
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r10.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

@mkow
Copy link
Member

mkow commented Nov 4, 2020

Jenkins, retest this please

Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r10.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @vijaydhanraj)

a discussion (no related file):
Jenkins failed with this:

Bootstrap.c: In function 'main':
Bootstrap.c:65:55: error: 'PAL_CPU_INFO {aka struct PAL_CPU_INFO_}' has no member named 'cpu_num'; did you mean 'cpu_vendor'?
     pal_printf("CPU num: %ld\n", pal_control.cpu_info.cpu_num);
                                                       ^~~~~~~
                                                       cpu_vendor
Makefile:116: recipe for target 'Bootstrap' failed
make[2]: Leaving directory '/home/jenkins/workspace/graphene-18.04/Pal/regression'

You forgot to modify cpu_num in one of the PAL test files. Please grep for cpu_num everywhere in Graphene code and rename remaining uses of this variable.


Copy link
Contributor Author

@vijaydhanraj vijaydhanraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Jenkins failed with this:

Bootstrap.c: In function 'main':
Bootstrap.c:65:55: error: 'PAL_CPU_INFO {aka struct PAL_CPU_INFO_}' has no member named 'cpu_num'; did you mean 'cpu_vendor'?
     pal_printf("CPU num: %ld\n", pal_control.cpu_info.cpu_num);
                                                       ^~~~~~~
                                                       cpu_vendor
Makefile:116: recipe for target 'Bootstrap' failed
make[2]: Leaving directory '/home/jenkins/workspace/graphene-18.04/Pal/regression'

You forgot to modify cpu_num in one of the PAL test files. Please grep for cpu_num everywhere in Graphene code and rename remaining uses of this variable.

ah my bad, I tested LibOS/regression but not the PAL/regression. Fixed this now and it works for both cases.
@dimakuv thanks for checking.


@mkow
Copy link
Member

mkow commented Nov 5, 2020

Jenkins, retest this please

mkow
mkow previously approved these changes Nov 5, 2020
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r11.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

dimakuv
dimakuv previously approved these changes Nov 5, 2020
Copy link

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r11.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners

yamahata
yamahata previously approved these changes Nov 5, 2020
Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 9 files at r6, 3 of 9 files at r7, 2 of 2 files at r9, 4 of 4 files at r10, 1 of 1 files at r11.
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners

Applications tend to use `/proc/cpuinfo` to get the `cpu cores`
and `physical id` for computing number of physical cores in a
socket. Currently `cpu cores` field is incorrectly implemented as
it is set to number of logical processors online and `physical id`
isn't implemented. This patch addresses both of these issues.
@mkow mkow dismissed stale reviews from yamahata, dimakuv, and themself via 41a87ad November 5, 2020 23:25
@mkow mkow force-pushed the vdhanraj/fixup_proc_cpuinfo branch from 5264626 to 41a87ad Compare November 5, 2020 23:25
@mkow
Copy link
Member

mkow commented Nov 5, 2020

Jenkins, retest this please

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r12.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL)

Copy link
Contributor

@yamahata yamahata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r12.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@mkow mkow merged commit 41a87ad into gramineproject:master Nov 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants