[WIP] Threading phase 2 #301
Looks good to merge. Glad to see continued progress. I'll push a fixup for you after this for other platform breakages.
@@ -45,6 +45,7 @@ cosmo: push %rbp
pop %rax
#endif
call _init
call _main_thread_init # FIXME: use .init.start macro
I will show you how this can be fixed in a follow-up change.
The simplest correct thing to do here is to say:
#if SupportsLinux()
call _main_thread_init
#endif
Then put:
if (!IsLinux()) return;
At the start of that function. Since
But ideally, we wouldn't want to link in threading initialization runtime if threading isn't used. So the idiomatic trick that Cosmopolitan uses to keep binaries minimal (i.e. you only pay for what you use) is what we call "yoinking". To do it in pure C you could have an individual file that looks like this:
static textstartup void cthread_init() {
/* do stuff */
}
const void *const cthread_ctor[] initarray = {
cthread_init,
};
Then, assuming the cthread library implements one function per file (a good practice in general), any function that assumes cthread_init() was called would simply put:
STATIC_YOINK("cthread_ctor");
At the top of the file. It's a code-size-saving technique compared to the more conventional alternative of having every cthread API call cthread_init() at the beginning, and then putting static bool once; if (!lockcmpxchg(&once, false, true)) return; at the beginning of the init function.
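For comparison, the conventional once-guard described above might look roughly like this in standard C11. It is only a sketch: the demo_* names are made up for illustration, and atomic_flag_test_and_set stands in for Cosmopolitan's lockcmpxchg, which is not used here.

```c
#include <stdatomic.h>

/* Hypothetical names; they stand in for the cthread runtime. */
static atomic_flag demo_once = ATOMIC_FLAG_INIT;
static int demo_init_count; /* how many times the init body actually ran */

/* Runs the init body at most once, however many times it is called. */
static void demo_cthread_init(void) {
  /* test_and_set returns the previous value: true means init already ran */
  if (atomic_flag_test_and_set(&demo_once)) return;
  ++demo_init_count; /* do stuff */
}

/* Every API entry point pays for this call; yoinking avoids that cost. */
int demo_cthread_api(void) {
  demo_cthread_init();
  return demo_init_count;
}
```

Note that a guard of this shape only prevents the body from running twice. A second thread that loses the race returns immediately and does not wait for initialization to finish.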
The textstartup keyword is optional and basically asks the linker to relocate initialization code to the same section of the binary, so that fewer page faults occur during startup for large binaries.
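Outside Cosmopolitan, a rough analog of registering an init function in the init array is the GCC/Clang constructor attribute. The sketch below uses that attribute rather than the initarray and textstartup macros from the comment, and the demo_* names are hypothetical.

```c
/* Set by the constructor before main() starts. */
static int demo_ready;

/* GCC/Clang extension: the toolchain records this function in the
   init array, so the C runtime calls it during startup. */
__attribute__((constructor))
static void demo_init(void) {
  demo_ready = 1; /* do stuff */
}

/* Code that depends on the initialization can simply assume it ran. */
int demo_is_ready(void) {
  return demo_ready;
}
```

The difference from yoinking is that this constructor is linked in whenever its object file is linked, whereas the yoink approach pulls the constructor in only when some function that needs it is actually used.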
Finally, there's the extreme code-size-saving technique of embedding code in the _init() function, which runs before all constructors. This has to be written in assembly and diverges from the System V ABI in order to make efficient use of the LODS and STOS instructions. The best example of this pattern is in libc/nexgen32e/kcpuids.S
: "rcx", "r11", "cc", "memory");
return rc;
}
return -1;
What we normally do here is return enosys(), but to be consistent with the above code you would likely want to return -ENOSYS;
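The two conventions can be contrasted in a short sketch. It assumes enosys() follows the usual libc pattern of setting errno to ENOSYS and returning -1; demo_enosys below is a hypothetical stand-in, not the Cosmopolitan function.

```c
#include <errno.h>

/* Hypothetical stand-in for an enosys()-style helper. */
static int demo_enosys(void) {
  errno = ENOSYS;
  return -1;
}

/* libc style: failure is -1 and the reason is left in errno. */
int demo_libc_style(void) {
  return demo_enosys();
}

/* Raw-syscall style: the negated error code is the return value itself. */
int demo_raw_style(void) {
  return -ENOSYS;
}
```

Mixing the two in one function is what the comment warns about: a caller that checks for a negated error code would see a bare -1 as EPERM rather than ENOSYS.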
You're right, sorry for the quick-and-dirty approach.
int cthread_native_sem_init(cthread_native_sem_t* sem, int count) {
static void pause(int attempt) {
if (attempt < 16) {
for (int i = 0; i < (1 << attempt); ++i) {
Here's the exponential backoff latency in nanoseconds for my CPU, assuming a 31-nanosecond pause:
0 = 31
1 = 62
2 = 124
3 = 248
4 = 496
5 = 992
6 = 1,984
7 = 3,968
8 = 7,936
9 = 15,872
10 = 31,744
11 = 63,488
12 = 126,976
13 = 253,952
14 = 507,904
15 = 1,015,808
16 = 2,031,616
17 = 4,063,232
18 = 8,126,464
19 = 16,252,928
20 = 32,505,856
21 = 65,011,712
22 = 130,023,424
23 = 260,046,848
24 = 520,093,696
25 = 1,040,187,392
26 = 2,080,374,784
27 = 4,160,749,568
28 = 8,321,499,136
29 = 16,642,998,272
30 = 33,285,996,544
After 6 you might consider switching to nanosleep.
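One way to read this suggestion is sketched below: spin with the pause instruction up to attempt 6, then hand the wait to nanosleep. The demo_* names, the cap at attempt 20, and reusing the 31 ns figure to size the sleep are all assumptions for illustration, not part of the PR.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

#define DEMO_SPIN_LIMIT 6  /* threshold suggested in the review */
#define DEMO_PAUSE_NS 31LL /* cost of one pause on the reviewer's CPU */
#define DEMO_MAX_ATTEMPT 20 /* assumed cap, roughly 32 ms */

/* Estimated wait for an attempt: 31 ns doubled once per attempt. */
long long demo_backoff_ns(int attempt) {
  return DEMO_PAUSE_NS * (1LL << attempt);
}

static void demo_cpu_relax(void) {
#if defined(__x86_64__) || defined(__i386__)
  __asm__ volatile("pause");
#endif
}

/* Spins for short waits, sleeps for long ones.
   Returns 0 if it spun and 1 if it slept. */
int demo_backoff(int attempt) {
  if (attempt <= DEMO_SPIN_LIMIT) {
    for (int i = 0; i < (1 << attempt); ++i) demo_cpu_relax();
    return 0;
  }
  if (attempt > DEMO_MAX_ATTEMPT) attempt = DEMO_MAX_ATTEMPT;
  long long ns = demo_backoff_ns(attempt);
  struct timespec ts;
  ts.tv_sec = (time_t)(ns / 1000000000LL);
  ts.tv_nsec = (long)(ns % 1000000000LL);
  nanosleep(&ts, NULL); /* lets the scheduler run something else */
  return 1;
}
```

Without a cap, the table above shows the wait passing one second at attempt 25, which is why this sketch clamps the attempt before sleeping.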
This needs tweaking, indeed.
@@ -342,6 +342,19 @@ SECTIONS {
/*END: Read Only Data (only needed for initialization) */
/*END: Read Only Data */
} :Rom

.tdata . : {
The contents of these might need to be moved into .data and .bss so as to not break APE on non-Linux. I'll take a look into it after merging.
I thought just before .data would be fine, as it is just more data somehow. However, .tdata (and maybe .tbss) should most likely be kept in its own section because it has a special TLS flag for the ELF header.
I think it would also be nice to keep them together to enable a smarter init when cthread is disabled (see 91d7833#commitcomment-58693093).
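For readers unfamiliar with these sections, the sketch below shows which C declarations typically end up in each on ELF targets. The demo_* names are hypothetical, and the section placement described in the comments is the usual GCC/Clang behavior rather than something this PR guarantees.

```c
/* Initialized thread-local: its initial image normally lives in .tdata. */
static _Thread_local int demo_tls_counter = 42;

/* Zero-initialized thread-local: normally lives in .tbss and takes
   no space in the file. */
static _Thread_local int demo_tls_scratch;

/* Each thread sees its own copies of both variables. */
int demo_tls_bump(void) {
  demo_tls_scratch += 1;
  return demo_tls_counter + demo_tls_scratch;
}
```

A runtime that sets up TLS manually has to copy the .tdata image and zero the .tbss range for every new thread, which is easier when the two are adjacent.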
There does appear to be a TLS program header. Linux and FreeBSD support it for sure, so it could shave a few microseconds off startup time. OpenBSD probably doesn't and I would hope ignores it, but we'll have to see. I think you're doing the right thing for now, in setting it up manually.
Cosmopolitan Threads are currently Linux-only (with some NetBSD and Windows support too!). This change ensures we only initialize the high-level threading runtime when Cosmopolitan Threads are used.
Continuation of #282
The goal is to implement the pthread interface (or at least something similar) for Cosmopolitan.
The key observation that makes it possible to consider such a daunting task is the following:
It seems that only a few parts of the code are OS-dependent, namely OS thread creation, OS thread destruction, futex, and TLS.
The complex parts are built on top of those, and on top of atomics, which are OS-independent.
The goal of this PR is to focus on the portable abstraction and make it work on Linux (because that's the platform I know).
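As an illustration of the OS-independent layer, here is a minimal spinlock built only on C11 atomics. It is a sketch with hypothetical demo_* names, not code from this PR; a real lock would fall back to futex after spinning.

```c
#include <stdatomic.h>

typedef struct {
  atomic_flag held;
} demo_spinlock_t;

#define DEMO_SPINLOCK_INIT {ATOMIC_FLAG_INIT}

/* Returns 1 if the lock was acquired and 0 if it was already held. */
int demo_spin_trylock(demo_spinlock_t *l) {
  return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-waits until the lock is acquired; no OS call is involved. */
void demo_spin_lock(demo_spinlock_t *l) {
  while (!demo_spin_trylock(l)) {
  }
}

/* Releases the lock so that the next test-and-set succeeds. */
void demo_spin_unlock(demo_spinlock_t *l) {
  atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Nothing here depends on the operating system, which is the point of the observation above: only blocking, thread creation and destruction, and TLS setup need per-OS code.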
Status: