Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable new win-arm64 platform #82

Merged
merged 1 commit into from
Jul 5, 2022

Conversation

gaborkertesz-linaro
Copy link
Contributor

@gaborkertesz-linaro gaborkertesz-linaro commented Mar 11, 2022

This PR enables the new Windows on Arm (win-arm64) platform, by implementing the required APIs, reading topology information via Windows API.

@gaborkertesz-linaro
Copy link
Contributor Author

gaborkertesz-linaro commented Mar 11, 2022

Build on win-arm64
cmake . -G "Visual Studio 16 2019" -A ARM64
cmake --build .

src/arm/windows/init.c Outdated Show resolved Hide resolved
break;
}
/* Reset the lowest bit in affinity mask */
group_processors_mask &= (group_processors_mask - 1);
Copy link

@marcpems marcpems Mar 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This resets the most significant bits, not the lowest bit.
Would it be clearer to shift right and test for 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've inherited this code from x86 init: https://github.com/pytorch/cpuinfo/blob/master/src/x86/windows/init.c#L229
I mean it's absolutely not an excuse, I'll check that again.

Copy link
Contributor Author

@gaborkertesz-linaro gaborkertesz-linaro Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it really clears all bits, that are lower than the least set bit.

ff 11111111
fe 11111110
fc 11111100
f8 11111000
f0 11110000
e0 11100000
c0 11000000
80 10000000
0 00000000

0x10
0x0

0x20
0x0

0x40
0x0

0x80
0x0

src/arm/windows/init.c Outdated Show resolved Hide resolved
}

/* 2. Read topology information via MSDN API: packages, cores and caches*/
nr_of_packages = read_packages_for_processors(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are many things that can go wrong in these calls, but there is no way to chain the error and exit the function on error. That's not great for defensive code.
Consider adding throw/catch if permitted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see, there are not throw/catch in the current codebase and I'd think that's intentional, so I'd try to cleanup the error paths and make sure it's going to be safe.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes C code mostly

Copy link

@marcpems marcpems left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general principals of querying CPU details and assigning to the cpuinfo relevant structures looks ok to me. There are some issues with the error handling here that might need addressing.

src/arm/windows/init.c Outdated Show resolved Hide resolved
src/arm/windows/init.c Outdated Show resolved Hide resolved
src/arm/windows/init.c Show resolved Hide resolved
@gaborkertesz-linaro
Copy link
Contributor Author

@malfet Could you review it? Thanks!

@gaborkertesz-linaro
Copy link
Contributor Author

related PR:
pytorch/pytorch#72424

Copy link
Contributor

@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good to me, but please explain why ARM64 needs a special treatment in ELSEIF(NOT CPUINFO_TARGET_PROCESSOR_MATCHES
And use bool instead of BOOL whenever possible (i.e. in callback its probably fine, but otherwise should not be used)

CMakeLists.txt Outdated Show resolved Hide resolved
cmake/DownloadGoogleBenchmark.cmake Outdated Show resolved Hide resolved
src/arm/windows/init.c Outdated Show resolved Hide resolved
src/arm/windows/windows_arm_init.h Outdated Show resolved Hide resolved
src/arm/windows/init_by_logical_sys_info.c Outdated Show resolved Hide resolved
@gaborkertesz-linaro
Copy link
Contributor Author

@malfet Can I do anything else to close this and the related PR?

@strega-nil-ms
Copy link

any updates on this?

@gaborkertesz-linaro
Copy link
Contributor Author

any updates on this?

Not on my side, but I'd happily close this!

@niyas-sait
Copy link

Can this be merged, please?

@kalaskarsanket
Copy link

Any plans on merging this PR ?

Copy link
Contributor

@Maratyszcza Maratyszcza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use dash - to separate filename parts rather than underscore _ for consistency with other files in cpuinfo.

};

/* Please add new SoC/chip info here! */
static struct woa_chip_info woa_chips[] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This covers literally two devices. We need a more generic detection method that doesn't require hard-coding every SoC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely agree that this solution is not ideal, but we haven't found any better and still robust option considering the existing Windows API. Also new SoCs won't turn up frequently, so even though it's a limited solution, it can be maintained.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a similar concern but there doesn't seem to be any better way to do it with the help of Windows API.

Ampere SoC also runs Windows/Arm64 - Does that need to be added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice, but I can't really test that, so I'd ask to do as a follow-up commit.

CMakeLists.txt Outdated
@@ -171,6 +171,9 @@ IF(CPUINFO_SUPPORTED_PLATFORM)
LIST(APPEND CPUINFO_SRCS
src/arm/android/properties.c)
ENDIF()
ELSEIF(CMAKE_SYSTEM_NAME STREQUAL "Windows" AND CPUINFO_TARGET_PROCESSOR STREQUAL "ARM64")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong indentation?

Copy link

@kaadam kaadam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Gabor, thanks for working on this. It will help to solve our build issue in WoA Chromium. I managed to backport your PR to Chromium to check how the build works.
I left some comments about my findings. Basically lot of them are warnings reported by Clang-cl which treat as errors. Thanks.

static void set_cpuinfo_isa_fields();
static bool get_system_info_from_registry(
struct woa_chip_info** chip_info,
enum cpu_info_vendor* vendor);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: cpu_info_vendor -> cpuinfo_vendor


struct vendor_info {
char vendor_name[VENDOR_NAME_MAX];
enum cpu_info_vendor cpu_info_vendor;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

PINIT_ONCE init_once, PVOID parameter, PVOID* context)
{
struct woa_chip_info *chip_info = NULL;
enum cpu_info_vendor vendor = cpuinfo_vendor_unknown;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: cpu_info_vendor -> cpuinfo_vendor


static bool get_system_info_from_registry(
struct woa_chip_info** chip_info,
enum cpu_info_vendor* vendor)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -0,0 +1,32 @@
#pragma once

bool get_core_uarch_for_efficiency(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move these functions down to avoid 'forward references to enum' reported by Clang-cl.

bool result = false;
char* textBuffer = NULL;
LPCTSTR cpu0_subkey =
"HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\0";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mark these strings as wide-character literals by using 'L'.
L"HARDWARE\DESCRIPTION\System\CentralProcessor\0"

Copy link
Contributor Author

@gaborkertesz-linaro gaborkertesz-linaro Jul 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly the MSVC compiler throws the following error to 'L' , even though I'd expected your proposal is the correct one:

warning C4133: 'initializing': incompatible types - from 'unsigned short [47]' to 'LPCTSTR'

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh okay, seems using (LPCTSTR) cast works fine.

break;
case CacheData:
processors[processor_global_index].cache.l1d = current_cache;
break;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here as above

case CacheData:
current_cache = l1d_base + l1d_counter;
l1d_counter++;
break;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here as above

break;
case CacheData:
numbers_of_caches[cpuinfo_cache_level_1d]++;
break;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang-cl complains about the missing other two enum values:
"error: enumeration values 'CacheUnified' and 'CacheTrace' not handled in switch [-Werror,-Wswitch]"
Could you add these values as empty cases as well?

/* Please add new SoC/chip info here! */
static struct woa_chip_info woa_chips[] = {
/* Microsoft SQ1 Kryo 495 4 + 4 cores (3 GHz + 1.80 GHz) */
"Microsoft SQ1",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coud you add some extra braces to each struct elements?
../../third_party/cpuinfo/src/src/arm/windows/init.c(39,2): error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces] "Microsoft SQ1", ^~~~~~~~~~~~~~~~ { ../../third_party/cpuinfo/src/src/arm/windows/init.c(52,2): error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces] "Microsoft SQ2", ^~~~~~~~~~~~~~~~ {

@niyas-sait
Copy link

(@gaborkertesz-linaro is on holiday now and planning to address the comments in couple of weeks)

aarongable pushed a commit to chromium/chromium that referenced this pull request Jun 28, 2022
At the moment the Chrome build is broken for Windows on ARM,
since ruy build relies on cpuinfo third_party library which
doesn't support Windows on ARM port yet.
There is a pull request which implements the required functions
for WoA platfrom [1]. After it is landed we can re-enable cpuinfo build again.

[1] pytorch/cpuinfo#82

Bug: 1329774
Change-Id: Idb619eaa3180fac5068a939704151da0f653ee89
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3721652
Reviewed-by: Tarun Bansal <tbansal@chromium.org>
Commit-Queue: Tarun Bansal <tbansal@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1018705}
@gaborkertesz-linaro
Copy link
Contributor Author

I hope I've fixed all of your findings, please let me know if I missed any of them!
Unfortunetaly I wasn't able to setup building with LLVM by CMake, but I've built with '-Wall' by MSVC compiler.

@kaadam
Copy link

kaadam commented Jul 5, 2022

I hope I've fixed all of your findings, please let me know if I missed any of them! Unfortunetaly I wasn't able to setup building with LLVM by CMake, but I've built with '-Wall' by MSVC compiler.

Gabor, thanks for updating it. Checking on my side, I get back to you asap.

This patch implements the required APIs for the new
win-arm64 platform by reading topology information via
Windows API.
Build config: cmake . -A ARM64
Copy link

@kaadam kaadam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks. I'm not a reviewer, but I give you an informal LGTM.

@gaborkertesz-linaro
Copy link
Contributor Author

@malfet What do you think?

@malfet malfet merged commit 1baac2b into pytorch:master Jul 5, 2022
megengine-bot pushed a commit to MegEngine/cpuinfo that referenced this pull request Sep 5, 2022
This patch implements the required APIs for the new
win-arm64 platform by reading topology information via
Windows API.
Build config: cmake . -A ARM64

GitOrigin-RevId: 1baac2b
mjfroman pushed a commit to mjfroman/moz-libwebrtc-third-party that referenced this pull request Oct 14, 2022
At the moment the Chrome build is broken for Windows on ARM,
since ruy build relies on cpuinfo third_party library which
doesn't support Windows on ARM port yet.
There is a pull request which implements the required functions
for WoA platfrom [1]. After it is landed we can re-enable cpuinfo build again.

[1] pytorch/cpuinfo#82

Bug: 1329774
Change-Id: Idb619eaa3180fac5068a939704151da0f653ee89
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3721652
Reviewed-by: Tarun Bansal <tbansal@chromium.org>
Commit-Queue: Tarun Bansal <tbansal@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1018705}
NOKEYCHECK=True
GitOrigin-RevId: 2db8c63892cff0e269c00f2da300aff3843e5735
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants