Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RISC-V support #272

Closed
wants to merge 14 commits into from
Closed

Add RISC-V support #272

wants to merge 14 commits into from

Conversation

DaniAffCH
Copy link
Contributor

@DaniAffCH DaniAffCH commented Aug 30, 2022

I added RISC-V architecture support on linux by parsing /proc/cpuinfo. It has been tested using QEMU emulator.
I have done the following points:

  • features detection (i kept it as a single letter, is it fine or do i have to replace it with its description?)
  • microarchitecture detection (i'm not sure if it's correct, so check it carefully)
  • vendor detection
    Tests have been added as well

@@ -49,6 +49,7 @@ set(PROCESSOR_IS_ARM FALSE)
set(PROCESSOR_IS_AARCH64 FALSE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gchatelet, should we update the Bazel file in this PR?


#include "cpuinfo_riscv.h"

////////////////////////////////////////////////////////////////////////////////
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to move duplicate logic to a generic place for other operating systems

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment there isn't a duplicate logic except for the GetRiscvInfo() function (just a few lines of code). Do I nevertheless have to move it to another file?

Copy link
Contributor

@toor1245 toor1245 Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but the next developer who will add support for another OS should rewrite/move your common logic. See #268, #204

Comment on lines +45 to +46
char uarch[64]; // 0 terminated string
char vendor[64]; // 0 terminated string
Copy link
Contributor

@toor1245 toor1245 Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should consider API that will be consistent with other operating systems, what about FreeBSD and possible future operating systems? Can we get this info?

Copy link
Contributor Author

@DaniAffCH DaniAffCH Aug 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about other os compatibility. The more general way would be using misa instruction but as we have seen in #247 it's not possible unless you have machine privileges. Do you have any suggestion? We could define custom codes for each uarch and vendor and fill them accordingly with informations retrieved from the os

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that we should define only common info, and avoid features which on certain operating systems will return empty results.

for example, if both Linux and FreeBSD return hart from the cpuinfo file, then we can add to our functionality, but if only one Linux/FreeBSD returns some info and the other doesn't provide that info we should avoid it.

@DaniAffCH DaniAffCH changed the title RISC-V support Add RISC-V support Aug 31, 2022
Copy link
Contributor

@michael-roe michael-roe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your parser of the RISC-V ISA string is incorrect, because the ISA string can contain multi-letter extensions, such as "_zbc". Simply searching for single letters within the ISA string won't work, you need to parse it to find out where the multi-letter extensions begin.

@DaniAffCH
Copy link
Contributor Author

Ok @michael-roe, Do I have to support these extensions as well?

@michael-roe
Copy link
Contributor

Ok @michael-roe, Do I have to support these extensions as well?

Probably ought to support all the user space ones. The supervisor state ones (starting with S) are less relevant to user space programs that call cpu_features, so can probably leave those out.

value.size-=2;
for (size_t i = 0; i < RISCV_LAST_; ++i){
tmp_sv.ptr = kCpuInfoFlags[i];
tmp_sv.size = strlen(kCpuInfoFlags[i]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using libc functions like strlen, strcpy? According to documentation cpu_features is suitable for implementing fundamental libc functions like malloc, memcpy, and memcmp. Hence we use our implementation, see src/copy.inl, src/equals.inl.

Copy link
Contributor

@toor1245 toor1245 Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used to be strongly enforced but it is not anymore as it is hard to prevent the compiler from generating calls to these functions. That is, even if the code contains no call to C memory functions, the compiler may decide to generate them (e.g., https://godbolt.org/z/h5KKM7MEj)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we provide info to REAMDE that even with compiler flags such as freestanding no-stdlib no-builtin there is a possibility that the compiler will generate C functions?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I think so, here is an example where the compiler generates a call even with all the compiler flags https://godbolt.org/z/rTTfzEq15

In fact, compilers require some C functions to exist.

StringView key, value;
if (CpuFeatures_StringView_GetAttributeKeyValue(line, &key, &value)) {
if (CpuFeatures_StringView_IsEquals(key, str("isa"))) {
StringView tmp_sv;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if(CpuFeatures_StringView_StartsWith(value, "rv") == false) continue;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if(!CpuFeatures_StringView_StartsWith(value, "rv")) continue;?

if (CpuFeatures_StringView_IsEquals(key, str("isa"))) {
StringView tmp_sv;
value.ptr+=2; // ignore 'rv' prefix at the beginning
value.size-=2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value = CpuFeatures_StringView_PopFront(value, 2)

value.ptr+=2; // ignore 'rv' prefix at the beginning
value.size-=2;
for (size_t i = 0; i < RISCV_LAST_; ++i){
tmp_sv.ptr = kCpuInfoFlags[i];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringView tmp_sv = str(kCpuInfoFlags[i]);

tmp_sv.ptr = kCpuInfoFlags[i];
tmp_sv.size = strlen(kCpuInfoFlags[i]);
kSetters[i](&info->features,
CpuFeatures_StringView_IndexOf(value, tmp_sv) != -1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract a variable:

bool is_set = CpuFeatures_StringView_IndexOf(value, tmp_sv) != -1;
kSetters[i](&info->features, is_set);

value.size-=2;
for (size_t i = 0; i < RISCV_LAST_; ++i){
tmp_sv.ptr = kCpuInfoFlags[i];
tmp_sv.size = strlen(kCpuInfoFlags[i]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used to be strongly enforced but it is not anymore as it is hard to prevent the compiler from generating calls to these functions. That is, even if the code contains no call to C memory functions, the compiler may decide to generate them (e.g., https://godbolt.org/z/h5KKM7MEj)

}

#endif // defined(CPU_FEATURES_OS_LINUX) || defined(CPU_FEATURES_OS_ANDROID)
#endif // CPU_FEATURES_ARCH_RISCV
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an empty line at EOF

@@ -408,6 +413,12 @@ static Node* CreateTree(void) {
AddMapEntry(root, "microarchitecture",
CreateString(strings.type.base_platform));
AddFlags(root, &info.features);
#elif defined(CPU_FEATURES_ARCH_RISCV)
const RiscvInfo info = GetRiscvInfo();
AddMapEntry(root, "arch", CreateString("risc-v"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With enough time people will rely on this string. Are we sure we want "risc-v"?
Alternative naming scheme I can think of :

  • "riscv"
  • "riscv-32"/ "riscv-64" / "riscv-128"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the second naming scheme seems better to me since this functionality is implemented with GetRiscvInfo and more precise arch detection compared to riscv, but I also don't mind "riscv"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an afterthought, it seems that the 32 / 64 / 128 address space is really seen as a feature so we could stick to "risc-v". As you said the information exists elsewhere. I don't mind either, I just wanted to check whether we could regret this choice later on.

TEST(CpuinfoRiscvTest, KendryteFromCpuInfo) {
ResetHwcaps();
auto& fs = GetEmptyFilesystem();
fs.CreateFile("/proc/cpuinfo", R"(hart : 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put hart : 0 on a new line for visibility

}

} // namespace
} // namespace cpu_features
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new line at EOF

StringView tmp_sv;
value.ptr+=2; // ignore 'rv' prefix at the beginning
value.size-=2;
for (size_t i = 0; i < RISCV_LAST_; ++i){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complexity here is O(N^2)
I'd rather scan the characters after "rv" and check if the character corresponds to one of the riscv extensions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just read the comment from @michael-roe that the extensions can be strings and not single letters so I take my previous comment back.
Do we have a strong guarantee that no extension string can be embedded in another extension string ? If not, we cannot simply look at CpuFeatures_StringView_IndexOf(...) != -1 as we would take an extension for another one.

Comment on lines +50 to +52
RISCV_32,
RISCV_64,
RISCV_128,
Copy link
Contributor

@toor1245 toor1245 Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it right that we mixed isa extensions and arch to RiscvFeaturesEnum should we separate them?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wikipedia separates them into base and extension, I think we should do the same (see my comment on l.40)

| Linux | yes² | yes¹ | yes¹ | yes¹ | yes¹ | yes¹ |
| MacOs | yes² | N/A | not yet | N/A | no | N/A |
| Windows | yes² | not yet | not yet | N/A | N/A | N/A |
| FreeBSD | yes² | not yet | not yet | not yet | not yet | N/A |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FreeBSD - not yet

Comment on lines +50 to +52
RISCV_32,
RISCV_64,
RISCV_128,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wikipedia separates them into base and extension, I think we should do the same (see my comment on l.40)

CPU_FEATURES_START_CPP_NAMESPACE

typedef struct {
int riscv32 : 1; // Is a 32 bit architecture
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Is a 32 bit architecture AFAIU this is more about address space

int m : 1; // Integer Multiplication and Division
int v : 1; // Vector Operations
int q : 1; // Quad Precision Floating Point
} RiscvFeatures;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed lower in this file I think we should make a distinction between base and extensions.
https://en.wikipedia.org/wiki/RISC-V#ISA_base_and_extensions

I was tempted to create two Features structs (i.e., BaseRiscvFeatures and ExtRiscvFeatures) but this will not play nicely with the current introspection facilities.

Let's embed the distinction in the name instead.
I see two paths forward : use the names as they are defined in the wikipedia page or prefix them with base_ (std_?) and ext_.

I think I prefer sticking to the original names (RVWMO, RV32I, RV32E, RV64I, RV128I for base and M,A,F,D,Zicsr,Zifencei,G,Q,L,C,B,J,T,P,V,K,N,H,S,Zam,Ztso for extensions) as long as riscv commits to this naming scheme.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you to stick naming according to riscv documentation, thus we eliminate the confusion with riscv32/riscv64/riscv128, also we can add a comment with the isa definition (base/ext):

typedef struct {
    int rvmwo : 1;  // Base ISA. Weak Memory Ordering
    int rv32i: 1;   // Base ISA. Base Integer Instruction Set, 32-bit
    ...
    int zicsr: 1;   // Extension ISA. Control and Status Register (CSR)
    ...
} RiscvFeatures;

StringView tmp_sv;
value.ptr+=2; // ignore 'rv' prefix at the beginning
value.size-=2;
for (size_t i = 0; i < RISCV_LAST_; ++i){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just read the comment from @michael-roe that the extensions can be strings and not single letters so I take my previous comment back.
Do we have a strong guarantee that no extension string can be embedded in another extension string ? If not, we cannot simply look at CpuFeatures_StringView_IndexOf(...) != -1 as we would take an extension for another one.

value.size-=2;
for (size_t i = 0; i < RISCV_LAST_; ++i){
tmp_sv.ptr = kCpuInfoFlags[i];
tmp_sv.size = strlen(kCpuInfoFlags[i]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I think so, here is an example where the compiler generates a call even with all the compiler flags https://godbolt.org/z/rTTfzEq15

In fact, compilers require some C functions to exist.

@@ -408,6 +413,12 @@ static Node* CreateTree(void) {
AddMapEntry(root, "microarchitecture",
CreateString(strings.type.base_platform));
AddFlags(root, &info.features);
#elif defined(CPU_FEATURES_ARCH_RISCV)
const RiscvInfo info = GetRiscvInfo();
AddMapEntry(root, "arch", CreateString("risc-v"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an afterthought, it seems that the 32 / 64 / 128 address space is really seen as a feature so we could stick to "risc-v". As you said the information exists elsewhere. I don't mind either, I just wanted to check whether we could regret this choice later on.

@Mizux Mizux mentioned this pull request Nov 3, 2022
4 tasks
@gchatelet
Copy link
Collaborator

Hi @DaniAffCH are you still interested in contributing to this branch?
I can take it over if you want.

@gchatelet gchatelet mentioned this pull request Nov 9, 2022
@DaniAffCH
Copy link
Contributor Author

Hi @gchatelet, yes I'm still interested in contributing to this project but I have a really busy period, so I'm unable to fix the PR issues, I'll do it as soon as I'm free though. If you want to keep the PR list clean feel free to close my PR and I'll open it again when I'm ready.

@gchatelet
Copy link
Collaborator

Hi @gchatelet, yes I'm still interested in contributing to this project but I have a really busy period, so I'm unable to fix the PR issues, I'll do it as soon as I'm free though. If you want to keep the PR list clean feel free to close my PR and I'll open it again when I'm ready.

Ok no worries. I've started #287 based on this PR to retain your original work (I hope you don't mind!). I'm now working on fixing the comments we've made. Thx to @Mizux we also have a running CI for RISC-V 💪.

@DaniAffCH
Copy link
Contributor Author

DaniAffCH commented Nov 10, 2022

Hi @gchatelet, yes I'm still interested in contributing to this project but I have a really busy period, so I'm unable to fix the PR issues, I'll do it as soon as I'm free though. If you want to keep the PR list clean feel free to close my PR and I'll open it again when I'm ready.

Ok no worries. I've started #287 based on this PR to retain your original work (I hope you don't mind!). I'm now working on fixing the comments we've made. Thx to @Mizux we also have a running CI for RISC-V muscle.

Ok, good job! I can probably be back active in 2 weeks. Please keep track of the problems you have fixed (or I'm gonna figure it out from your commits).

@gchatelet
Copy link
Collaborator

Merged as part of #287 . Thx a lot @DaniAffCH for the initial implementation and research, this helped !

@gchatelet gchatelet closed this Jan 13, 2023
@gchatelet gchatelet added this to the v0.8.0 milestone Apr 27, 2023
@gchatelet gchatelet added the enhancement New feature or request label Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants