-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unusual slowdown in the MLB 2k games #17623
Comments
The two functions sceKernelReferThreadProfiler and sceKernelReferGlobalProfiler are spammed by this game. So let's reduce logging to DEBUG and delay the response by a tested number of ms. Fixes #17623
OK, this got a lot worse by adding a delay to these functions - now everything is extremely slow. As mentioned in #17666 , tests point to these functions always returning 0. And indeed, the game checks for that, and if zero it does some stuff then calls it again, a bunch of times per frame (but limited). Returning any other value than 0 causes the game to go accessing stuff there, so the function signature in the pspsdk seems correct (that it returns a pointer to a structure, if it does, which it might only do on devkits, or in some specific mode, for example). Anyway, the function is called 2600 times per frame (at the savestate I'm debugging) independent of timing, so it's not being used in a timing loop, which was my first theory. So simply getting rid of the delay and the logging and letting it just execute as quickly as possible seems like it will fix the problem, but will be inconsistent with my timing test. |
Here's the output from an improved test:
Source code: CHECKPOINT_ENABLE_TIME = 1;
int i;
checkpointNext("sceKernelReferThreadProfiler");
for (i = 0; i < 8; i++) {
PspDebugProfilerRegs *regs = sceKernelReferThreadProfiler();
checkpoint("regs: %08x", regs);
}
checkpointNext("sceKernelReferGlobalProfiler");
for (i = 0; i < 8; i++) {
PspDebugProfilerRegs *regs = sceKernelReferGlobalProfiler();
checkpoint("regs: %08x", regs);
}
return 0; But I don't get it, how can those timings be microseconds? 4000 us (4 ms) seems crazy long just for calling checkpointNext? I'm really confused now how to get emulation and the PSP to match here... Adding such long delays just breaks it. |
This fixes timing in the MLB games, but is inconsistent with testing :( See issue #17623
Calling checkpoint() sometimes causes a flush, causes drawing to the screen, causes a lot of things. So calling it each time defeats the purpose of using many calls to reduce the overhead cost. Should also compare test.py output to deduct overhead (we don't really cost as much to draw to the screen, as you know, though.) I used this: void t_sceKernelReferThreadProfiler() {
sceKernelReferThreadProfiler();
sceKernelReferThreadProfiler();
sceKernelReferThreadProfiler();
sceKernelReferThreadProfiler();
sceKernelReferThreadProfiler();
sceKernelReferThreadProfiler();
sceKernelReferThreadProfiler();
sceKernelReferThreadProfiler();
}
void t_sceKernelReferGlobalProfiler() {
sceKernelReferGlobalProfiler();
sceKernelReferGlobalProfiler();
sceKernelReferGlobalProfiler();
sceKernelReferGlobalProfiler();
sceKernelReferGlobalProfiler();
sceKernelReferGlobalProfiler();
sceKernelReferGlobalProfiler();
sceKernelReferGlobalProfiler();
}
#define NUM_SAMPLES 128
u64 base[NUM_SAMPLES + 1];
u64 timings[NUM_SAMPLES + 1];
void doTiming(void func(), void cleanup()) {
int i;
timings[0] = sceKernelGetSystemTimeWide();
for (i = 1; i <= NUM_SAMPLES; ++i) {
func();
timings[i] = sceKernelGetSystemTimeWide();
if (cleanup != NULL) {
cleanup();
}
}
}
void calibrateFunc() {
}
void calibrate() {
doTiming(&calibrateFunc, &calibrateFunc);
memcpy(base, timings, sizeof(base));
}
int averageCycles(void func()) {
doTiming(func, NULL);
int i;
// Well, sum for average.
u64 average = 0;
for (i = 1; i <= NUM_SAMPLES; ++i) {
u64 elapsed = timings[i] - timings[i - 1];
u64 overhead = base[i] - base[i - 1];
if (elapsed > overhead) {
average += elapsed - overhead;
} else {
schedf("warning: too quick to measure?\n");
}
}
flushschedf();
return (int) (average / NUM_SAMPLES);
}
int averageCyclesCleanup(void func(), void cleanup()) {
doTiming(func, cleanup);
int i;
// Well, sum for average.
u64 average = 0;
for (i = 1; i <= NUM_SAMPLES; ++i) {
u64 elapsed = timings[i] - timings[i - 1];
u64 overhead = base[i] - base[i - 1];
if (elapsed > overhead) {
average += elapsed - overhead;
} else {
schedf("warning: too quick to measure?\n");
}
}
flushschedf();
return (int) (average / NUM_SAMPLES);
}
...
schedf("sceKernelReferThreadProfiler: %d\n", averageCycles(&t_sceKernelReferThreadProfiler));
schedf("sceKernelReferGlobalProfiler: %d\n", averageCycles(&t_sceKernelReferGlobalProfiler)); Which is how I typically measure these things. Still slow, but I get an average of this at 111Mhz: sceKernelReferThreadProfiler: 12 This is per 8, though, so -[Unknown] |
Ah, I see, didn't realize the checkpoints had such overhead. Now this makes sense. I think we'll just go with your new and improved timings, indeed - the game performs fine and things now line up. |
Closing, reported fixed after #17677 |
Reported by Brahmadath through e-mail, that it slows down in specific situations between games.
Curiously, according to the screenshot, the game spends a TON of time in
sceKernelReferGlobalProfiler
. Possibly this call should take a lot more simulated time, maybe it's in a tight loop around it for some reason? We've seen similar things before. Or we return bad data confusing the game - which may clearly be the case since we have a stub implementation that just returns 0, currently.The text was updated successfully, but these errors were encountered: