Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unusual slowdown in the MLB 2k games #17623

Closed
hrydgard opened this issue Jun 25, 2023 · 5 comments · Fixed by #17666
Closed

Unusual slowdown in the MLB 2k games #17623

hrydgard opened this issue Jun 25, 2023 · 5 comments · Fixed by #17666
Labels
HLE/Kernel Kernel, memory manager, other HLE issues Performance Unexpected slow performance issues
Milestone

Comments

@hrydgard
Copy link
Owner

hrydgard commented Jun 25, 2023

Reported by Brahmadath through e-mail, that it slows down in specific situations between games.

Curiously, according to the screenshot, the game spends a TON of time in sceKernelReferGlobalProfiler. Possibly this call should take a lot more simulated time, maybe it's in a tight loop around it for some reason? We've seen similar things before. Or we return bad data confusing the game - which may clearly be the case since we have a stub implementation that just returns 0, currently.

Screenshot_20230626-012051_PPSSPP (1)

@hrydgard hrydgard added HLE/Kernel Kernel, memory manager, other HLE issues Performance Unexpected slow performance issues labels Jun 25, 2023
@hrydgard hrydgard added this to the v1.16.0 milestone Jun 25, 2023
hrydgard added a commit that referenced this issue Jul 5, 2023
The two functions sceKernelReferThreadProfiler and
sceKernelReferGlobalProfiler are spammed by this game. So let's reduce
logging to DEBUG and delay the response by a tested number of ms.

Fixes #17623
@hrydgard hrydgard reopened this Jul 8, 2023
@hrydgard
Copy link
Owner Author

hrydgard commented Jul 8, 2023

OK, this got a lot worse by adding a delay to these functions - now everything is extremely slow.

As mentioned in #17666 , tests point to these functions always returning 0. And indeed, the game checks for that, and if zero it does some stuff then calls it again, a bunch of times per frame (but limited).

Returning any other value than 0 causes the game to go accessing stuff there, so the function signature in the pspsdk seems correct (that it returns a pointer to a structure, if it does, which it might only do on devkits, or in some specific mode, for example).

Anyway, the function is called 2600 times per frame (at the savestate I'm debugging) independent of timing, so it's not being used in a timing loop, which was my first theory. So simply getting rid of the delay and the logging and letting it just execute as quickly as possible seems like it will fix the problem, but will be inconsistent with my timing test.

@hrydgard
Copy link
Owner Author

hrydgard commented Jul 8, 2023

Here's the output from an improved test:

[x/183566489] sceKernelReferThreadProfiler
[x/92] regs: 00000000
[x/65] regs: 00000000
[x/57] regs: 00000000
[x/57] regs: 00000000
[x/56] regs: 00000000
[x/58] regs: 00000000
[x/58] regs: 00000000
[x/57] regs: 00000000

[x/4608] sceKernelReferGlobalProfiler
[x/70] regs: 00000000
[x/64] regs: 00000000
[x/57] regs: 00000000
[x/57] regs: 00000000
[x/57] regs: 00000000
[x/56] regs: 00000000
[x/57] regs: 00000000
[x/55] regs: 00000000

Source code:

	CHECKPOINT_ENABLE_TIME = 1;
	int i;

	checkpointNext("sceKernelReferThreadProfiler");
	for (i = 0; i < 8; i++) {
		PspDebugProfilerRegs *regs = sceKernelReferThreadProfiler();
		checkpoint("regs: %08x", regs);
	}

	checkpointNext("sceKernelReferGlobalProfiler");
	for (i = 0; i < 8; i++) {
		PspDebugProfilerRegs *regs = sceKernelReferGlobalProfiler();
		checkpoint("regs: %08x", regs);
	}
	return 0;

But I don't get it, how can those timings be microseconds? 4000 us (4 ms) seems crazy long just for calling checkpointNext?

I'm really confused now how to get emulation and the PSP to match here... Adding such long delays just breaks it.

hrydgard added a commit that referenced this issue Jul 8, 2023
This fixes timing in the MLB games, but is inconsistent with testing :(

See issue #17623
@unknownbrackets
Copy link
Collaborator

unknownbrackets commented Jul 8, 2023

Calling checkpoint() sometimes causes a flush, causes drawing to the screen, causes a lot of things. So calling it each time defeats the purpose of using many calls to reduce the overhead cost. Should also compare test.py output to deduct overhead (we don't really cost as much to draw to the screen, as you know, though.)

I used this:

void t_sceKernelReferThreadProfiler() {
	sceKernelReferThreadProfiler();
	sceKernelReferThreadProfiler();
	sceKernelReferThreadProfiler();
	sceKernelReferThreadProfiler();
	sceKernelReferThreadProfiler();
	sceKernelReferThreadProfiler();
	sceKernelReferThreadProfiler();
	sceKernelReferThreadProfiler();
}

void t_sceKernelReferGlobalProfiler() {
	sceKernelReferGlobalProfiler();
	sceKernelReferGlobalProfiler();
	sceKernelReferGlobalProfiler();
	sceKernelReferGlobalProfiler();
	sceKernelReferGlobalProfiler();
	sceKernelReferGlobalProfiler();
	sceKernelReferGlobalProfiler();
	sceKernelReferGlobalProfiler();
}

#define NUM_SAMPLES 128
u64 base[NUM_SAMPLES + 1];
u64 timings[NUM_SAMPLES + 1];

void doTiming(void func(), void cleanup()) {
	int i;
	timings[0] = sceKernelGetSystemTimeWide();
	for (i = 1; i <= NUM_SAMPLES; ++i) {
		func();
		timings[i] = sceKernelGetSystemTimeWide();
		if (cleanup != NULL) {
			cleanup();
		}
	}
}

void calibrateFunc() {
}

void calibrate() {
	doTiming(&calibrateFunc, &calibrateFunc);
	memcpy(base, timings, sizeof(base));
}

int averageCycles(void func()) {
	doTiming(func, NULL);

	int i;
	// Well, sum for average.
	u64 average = 0;
	for (i = 1; i <= NUM_SAMPLES; ++i) {
		u64 elapsed = timings[i] - timings[i - 1];
		u64 overhead = base[i] - base[i - 1];
		if (elapsed > overhead) {
			average += elapsed - overhead;
		} else {
			schedf("warning: too quick to measure?\n");
		}
	}

	flushschedf();
	return (int) (average / NUM_SAMPLES);
}

int averageCyclesCleanup(void func(), void cleanup()) {
	doTiming(func, cleanup);

	int i;
	// Well, sum for average.
	u64 average = 0;
	for (i = 1; i <= NUM_SAMPLES; ++i) {
		u64 elapsed = timings[i] - timings[i - 1];
		u64 overhead = base[i] - base[i - 1];
		if (elapsed > overhead) {
			average += elapsed - overhead;
		} else {
			schedf("warning: too quick to measure?\n");
		}
	}

	flushschedf();
	return (int) (average / NUM_SAMPLES);
}

...

	schedf("sceKernelReferThreadProfiler: %d\n", averageCycles(&t_sceKernelReferThreadProfiler));
	schedf("sceKernelReferGlobalProfiler: %d\n", averageCycles(&t_sceKernelReferGlobalProfiler));

Which is how I typically measure these things. Still slow, but I get an average of this at 111Mhz:

sceKernelReferThreadProfiler: 12
sceKernelReferGlobalProfiler: 11

This is per 8, though, so hleEatCycles(150) looks like a reasonable starting point... running the test for that gives 12 for both, so probably better to err a bit lower to be safe. hleEatCycles(140); seems good and outputs 11 from the test for each.

-[Unknown]

@hrydgard
Copy link
Owner Author

hrydgard commented Jul 8, 2023

Ah, I see, didn't realize the checkpoints had such overhead. Now this makes sense.

I think we'll just go with your new and improved timings, indeed - the game performs fine and things now line up.

@hrydgard
Copy link
Owner Author

Closing, reported fixed after #17677

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
HLE/Kernel Kernel, memory manager, other HLE issues Performance Unexpected slow performance issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants