Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix major problem in do_switch routine #89

Merged
merged 1 commit into from
Mar 13, 2024
Merged

Conversation

ghaerr
Copy link
Contributor

@ghaerr ghaerr commented Mar 12, 2024

While spending more time figuring out exactly how Fiwix works, I started looking closely at the do_switch routine, which is used by the scheduler to switch stacks and run another task.

Absolutely amazingly enough, I found that every time a task is switched back to run again, after restoring the flags then the register stack with popa, the restarted task actually falls through and executes the cpuid routine every time before resuming user mode execution. And it looks like this has been occurring since v1.0.0!!!!

I initially thought there must be some magic I wasn't following in the task switch. But inserting the following code will display a dot ('.) on every task switch, instead of falling through into cpuid:

diff --git a/kernel/core386.S b/kernel/core386.S
index 2577ecc..1f4a513 100644
--- a/kernel/core386.S
+++ b/kernel/core386.S
@@ -295,6 +295,16 @@ BUILD_IRQ(15, irq15)
    popfl
    popa

+   pushl   %eax
+   mov     $msg,%eax
+   pushl   %eax
+   call    printk
+   popl    %eax
+   popl    %eax
+   ret
+msg:    .byte      '.'
+        .byte      0
+
 .align 4
 .globl cpuid; cpuid:
    pushl   %ebp

The cpuid routine trashes EAX and ECX. The reason the kernel isn't crashing user programs is that the actual return from do_switch is back to the only place do_switch is normally called, the C routine context_switch, where EAX and ECX are essentially scratch also, as an immediate STI and RET are executed. (The do_switch calls in the KEXEC code appear to never return, so KEXEC did not likely see any register trashing either).

I'm not sure if a Fiwix speedup will be seen, as the normal context switch occurs at only 100HZ, even though cpuid was always executed with interrupts disabled.

Nothing like this one-line fix!! :)

@ghaerr
Copy link
Contributor Author

ghaerr commented Mar 12, 2024

Separately, I notice that the kernel stack location shown in doc/kmem_layout.txt, that of being after the kernel bss, is incorrect. It is now at the top 4K of the first 64K of physical RAM, correct? (i.e. physical address F000-FFFF).

Is there still a guard for stack overflow? No, correct?

Also that (starting) kernel stack is only used for the idle task, (PID 0), right?

I can update the document if you'd like.

@mikaku
Copy link
Owner

mikaku commented Mar 12, 2024

the restarted task actually falls through and executes the cpuid routine every time before resuming user mode execution. And it looks like this has been occurring since v1.0.0!!!!

Oh my!!, 🤦 nice catch!
I think that if CPUID is calling so often, it should affect the performance of the kernel somehow.
I've applied your patch here and just started a whole package build to see if timings changed.

@mikaku
Copy link
Owner

mikaku commented Mar 13, 2024

Sorry for the late reply, but when the whole package build completed I detected that the times of the new build were a lot bigger than the previous and that wasn't what I was expecting. So, the first thing I did was to take an older kernel version from GitHub but the results were the same.

At this time, I chose to do the same test on another Fedora 39 and the results were the same (slow build times). This time I decided to downgrade the QEMU version from 8.1.3 to 8.1.0 and now the build times are back to normal. I'd like to be able to downgrade to even 7.x but that hasn't been possible.

I don't know if the regression in disk performance is a bug in QEMU or something that QEMU changed in 8.1.3 affects somehow to Fiwix. In any case, and unfortunately, your fix in the do_switch routine hasn't brought any visible benefit in the kernel performance, at least not under an emulation. Real-hardware is a completely different story though.

@mikaku mikaku merged commit 39cb5d5 into mikaku:master Mar 13, 2024
@mikaku
Copy link
Owner

mikaku commented Mar 13, 2024

Thank you very much!

@mikaku
Copy link
Owner

mikaku commented Mar 13, 2024

Separately, I notice that the kernel stack location shown in doc/kmem_layout.txt, that of being after the kernel bss, is incorrect. It is now at the top 4K of the first 64K of physical RAM, correct? (i.e. physical address F000-FFFF).

Oh yes, this was modified in #63 and I forgot to update the documentation.

Is there still a guard for stack overflow? No, correct?

Hmm, no.

Also that (starting) kernel stack is only used for the idle task, (PID 0), right?

I think so, because every kernel process has its own stack address in tss->esp0.

I can update the document if you'd like.

Yes, please.
Thank you very much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants