Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HLE: Slice the very slow memset/memcpy variants #18560

Merged
merged 3 commits into from
Dec 17, 2023

Conversation

unknownbrackets
Copy link
Collaborator

I've been busy lately with some family/real life stuff, but had a little time to write this. I found that sometimes, these slow memcpy/memset variants can be called with very large sizes (megabytes) and cause stutters and performance issues by consuming such a large amount of cycles all at once.

This makes them execute in slices so thread scheduling and vsync interrupts can still properly happen.

I didn't test ARM or anything at all so hopefully it's not broken. Also have not checked against recent commits.

-[Unknown]

@hrydgard hrydgard added this to the v1.17.0 milestone Dec 16, 2023
Copy link
Owner

@hrydgard hrydgard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, code looks good. Though since the PSP is doing cooperative multitasking, this only applies to interrupt-caused thread rescheduling, it shouldn't trigger in other circumstances, right?

Just requesting an explanatory comment.

Also, will help out with testing.

@@ -231,6 +248,9 @@ static int Replace_memcpy_jak() {
}
}

if (sliced) {
return 5 + bytes * -8 + 2;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment that a negative value is used to signal Comp_ReplacementFunc that the function call is to be repeated.

@unknownbrackets
Copy link
Collaborator Author

Interesting, code looks good. Though since the PSP is doing cooperative multitasking, this only applies to interrupt-caused thread rescheduling, it shouldn't trigger in other circumstances, right?

No, you misunderstand the PSP's threading model. You actually told me this years ago, and I thought we talked about how it was not really the case.

CPUs of course execute interrupts and the PSP firmware's interrupt handlers reschedule threads. That's enough for me to consider it incorrect to call the threading model strictly "cooperative".

Those interrupts can be vblanks, timers, whatever. It's basically impossible to have a better priority thread that is in a runnable state but isn't running - it will always get switched to, because whatever put it in that state would've been an interrupt handler on a timer or dmac or something, and will switch it.

When I made this change in PPSSPP it fixed a bunch of bugs. I did it after finally deciding to test whether it really was strictly cooperative.

It's easy to test your hypothesis if you want to. Simply create a low-priority background thread that spins on while (true) continue; and then call sceDisplayWaitVblankStart(). If your hypothesis were true, the system would hang and become unusable - somehow the tight loop would overheat the CPU and make interrupts stop triggering? - but you'll see that it just hangs up for one vblank.

Keep in mind these memcpy and memset implementations are SLOW. So a memset() of 8MB (which there are games that do) will take 5 + 8*1024*1024 * 6 + 2 = 50,331,655 cycles. That's 0.2267 seconds which means about 13 vblank interrupts get skipped - in other words, it drops 13 frames. Instead, those interrupts should trigger and better-priority threads waiting on vblank or what have you should run.

-[Unknown]

@hrydgard
Copy link
Owner

hrydgard commented Dec 16, 2023

Yeah, you're right, we have talked about this, and it's true that if you have vblank interrupts or other interrupts enabled, then it's really quite close to a fully preemptive system, just not with a defined rescheduling interval like Windows or Linux have. But it still isn't completely pre-emptive.

Let's say you have only two threads with the same priority, if you don't have any interrupts enabled and both just have their PC in a spin function and don't call anything related to interrupts or scheduling, the one running will keep running and the other will never end up being scheduled. This is unlike Linux and Windows and other pre-emptive systems, and this fact sticks in my mind a little too much.. because as you say, in reality various interrupts that are there for other purposes than purely rescheduling do cause rescheduling.

(Also what I'm requesting a comment about it not this at all, just the negative value thing)

When they take an especially long time, this allows thread switches
meanwhile.  Important for cases where they might consume more than a total
frame worth of cycles in a background thread.
@unknownbrackets
Copy link
Collaborator Author

At least according to JPCSP and pspsdk, there are fields on the thread info struct called preempts:

https://github.com/pspdev/pspsdk/blob/db7ee550eaed98a5a51a5f0b0872ef9f8620288e/src/user/pspthreadman.h#L115-L118

I would say it is a preemptive system (when considering that the firmware/kernel reschedules in its interrupt handlers which are not optional), and it's just that the thread priority is very different from traditional thread priority. So yes, the best-priority thread at the top of that priority queue is always running, and there's no slicing or "fairness" among other threads. But that's all about priority. As soon as a better-priority thread could run, it preempts. It's simply not a cooperative threading model.

At its core, it may arguably be - if you ripped out the rest of the kernel. But that's not a valid way to think of the system.

-[Unknown]

@hrydgard
Copy link
Owner

Alright, I'm convinced - hereby updating my mental model with push -f :)

I'll test this in a bit on all the archs except RISC-V and then merge.

@hrydgard
Copy link
Owner

Hm, this is broken on ARM64 at least - Daxter freaks out completely. I'll take a look later.

@hrydgard
Copy link
Owner

Actually, rebasing on master, it's no longer broken on ARM64. So I guess this was branched at an unfortunate time?

I'm going to test it rebased for ARM32 as well.

@hrydgard
Copy link
Owner

ARM32/ARM64 both confirmed working after rebase. Also checked that these functions actually got called in some games I tested :)

@hrydgard hrydgard merged commit e5af1f8 into hrydgard:master Dec 17, 2023
18 checks passed
@unknownbrackets unknownbrackets deleted the replacement-slice branch December 29, 2023 03:46
@hrydgard hrydgard mentioned this pull request Feb 3, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants