
Add an option for using a different governor for integrated GPUs #179

Merged
4 commits merged into FeralInteractive:master from the igpu branch, Jan 10, 2020

Conversation

gfxstrand
Contributor

Overview

This PR adds two new configuration options: igpu_desiredgov and igpu_power_threshold, which allow a different CPU governor to be used when the Intel integrated GPU is under load. This currently only applies to Intel integrated GPUs and not AMD APUs because it uses the Intel RAPL infrastructure for getting power information. On a platform without an Intel integrated GPU, or where the kernel does not support RAPL, the new options are ignored and gamemode falls back to the old behavior.

One of the core principles of gamemode to date has been that, when playing a game, we want to use the "performance" CPU governor to increase CPU performance and prevent CPU-limiting. However, when the integrated GPU is under load, this can be counter-productive because the CPU and GPU share a thermal and power budget. By throwing the CPU governor to "performance", gamemode currently makes the CPU frequency management far too aggressive and burns more power than needed. With a discrete GPU, this is fine because the worst that happens is a bit more fan noise. With an integrated GPU, however, the additional power being burned by the CPU is power not available to the GPU; this can cause the GPU to clock down and lead to significantly worse performance.

By using the "powersave" governor instead of the "performance" governor while the integrated GPU is under load, we can save power on the CPU side which lets the GPU clock up higher. On my Razer Blade Stealth 13 with an i7-1065G7, this improves the performance of "Shadow of the Tomb Raider" by around 25-30% according to its internal benchmark mode.

Design

This PR makes some design trade-offs which may or may not be the right ones, but I think it's enough to kick off the discussion. The ideal design would probably be:

if (game_uses_integrated) {
    governor = "powersave";
} else {
    governor = "performance";
}

However, we don't actually know that information in the gamemode daemon.

The approach taken by this PR is instead to try to figure out whether the integrated GPU is under a reasonably heavy load. It does this by using the Intel RAPL framework to measure how much power is being burned by the CPU and the GPU. The theory is that if the GPU is burning significant power relative to the CPU, then we should use the powersave governor to try to give it more.
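
To make that concrete, here is a stand-alone sketch of the kind of calculation involved (the powercap sysfs paths and domain layout are assumptions about a typical Intel RAPL setup, not the actual code in this PR):

#include <inttypes.h>
#include <stdio.h>
#include <unistd.h>

/* Typical locations of the CPU ("core") and iGPU ("uncore") energy counters. */
#define CPU_ENERGY  "/sys/class/powercap/intel-rapl:0/intel-rapl:0:0/energy_uj"
#define IGPU_ENERGY "/sys/class/powercap/intel-rapl:0/intel-rapl:0:1/energy_uj"

static int read_uj(const char *path, uint64_t *val)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1; /* domain missing: fall back to the old behavior */
    int ok = (fscanf(f, "%" SCNu64, val) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

int main(void)
{
    uint64_t cpu0, cpu1, igpu0, igpu1;
    if (read_uj(CPU_ENERGY, &cpu0) || read_uj(IGPU_ENERGY, &igpu0))
        return 1;
    sleep(5); /* roughly the same window as gamemoded's polling interval */
    if (read_uj(CPU_ENERGY, &cpu1) || read_uj(IGPU_ENERGY, &igpu1))
        return 1;

    /* An idle iGPU can report a zero delta over the window; treat that as 0. */
    uint64_t cpu_delta = cpu1 - cpu0;
    uint64_t igpu_delta = igpu1 - igpu0;
    double ratio = cpu_delta ? (double)igpu_delta / (double)cpu_delta : 0.0;
    printf("iGPU/CPU power ratio over 5s: %.2f\n", ratio);
    /* The daemon would pick igpu_desiredgov when this exceeds igpu_power_threshold. */
    return 0;
}

The daemon would do this inside its existing polling loop rather than sleeping; the ratio test is the whole idea.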

This approach has a number of problems:

  1. I assumed in this MR that the right threshold is the GPU burning more than 50% as much power as the CPU. However, for a lighter-weight GPU that bar may be too high to ever reach, even under full GPU load. For a light-weight processor like an i3 or i5, it could end up being too low: the GPU being used just for compositing could trip it even though you really want that little CPU to pull its weight.
  2. The algorithm is dynamic so it may be switching your governor back-and-forth. If the threshold is set at the wrong spot, it could theoretically do this quite often and yield inconsistent performance.
  3. The algorithm is dynamic so if things aren't configured right, it can end up popping up a little authorization box every time it wants to switch governors rather than once when the game starts. This gets annoying quickly.

It also has some significant upsides:

  1. Because it doesn't look at the static configuration of the system, it works even on a system with both a discrete GPU and an integrated GPU, and it makes the right choice regardless of which one you use, without you having to change your configuration.
  2. The algorithm is dynamic so it will throw your CPU into "performance" mode while sitting at a loading screen compiling shaders but then go into "powersave" within 5 seconds of getting into the game itself.

Other design ideas

When looking into this, I considered a few other ideas which I think are worth enumerating.

  1. Only apply the optimization if the game is using the integrated GPU. This is the idea I mentioned above. The problem is that the app doesn't pass that information to gamemode. We could add it to the dbus protocol but there are already many games shipping with gamemode integration and they would never get the fix. Even if we did try to do that, it'd be tricky to implement it in a consistent way because the vendor strings the app gets from something like GL might not translate well.
  2. Only apply the optimization if the only GPU is integrated. This is one idea which @aejsmith and I threw around when discussing this problem a week ago or so. There are two problems with it. First is that it doesn't work on a hybrid setup so anyone who wants to play a game on their integrated card while they're on battery is toast. The second is that it's actually rather annoying to figure out what your GPU topology looks like from a daemon. It can be done but it's not as fun as one would like.
  3. Look at what GPU devices the app opens. We have the PID of the client which connected to gamemode so we could theoretically look at /proc and see what files it has open to try and figure out which GPU is in use. However, this assumes that gamemode will be started by the process using the GPU rather than a wrapper, that the process is using the GPU before it talks to gamemode, and that the process only opens one GPU.
  4. Look at GPU clock rates. This has the same problems as the power approach only it's a bit worse because the only APIs the kernel provides for querying those rates are instantaneous and so we'd have to poll them fairly often and look for patterns. This, in and of itself, could get CPU intensive.
  5. Look for some sort of GPU busy metric. Unfortunately, we don't have something like that easily exposed through any kernel interfaces at the moment. Also, the obvious metrics we get from the hardware all require the OA unit which slows the whole GPU down when it's sampling.

I'm very much open to other design ideas. I'm trying to enumerate as many as I can here to get the conversation going. However, we need to come up with something because gamemode is actively hurting performance on integrated GPUs. With distros shipping it and games enabling it in the hopes of better Linux perf, that's really bad for all the laptop users out there. 😦

Contributor

@mdiluz mdiluz left a comment

Thank you, Jason, this is an awesome investigation and PR. Apologies if my code had been hurting performance on integrated chips up until now. Perhaps that's a call to implement some small benchmarking backend to make sure no changes here cause problematic perf across a range of setups.

You make a good argument that this is the most pragmatic solution for now, but were you able to confirm that once we hit this power threshold and swap back to powersave we don't hit the CPU-based frame pacing issues that plagued that governor? The protective factor might be that when the threshold is met, the GPU is doing some significant amount of work, so in practice, that may not be a situation where there'd be worry about feeding the GPU more work.

Anyway, I've added some minor feedback, but the gist of the PR seems solid enough that it'd be great to merge.


double ratio = (double)igpu_energy_delta_uj /
               (double)cpu_energy_delta_uj;
if (ratio > threshold) {
Contributor

In your description you mention that it'd be possible to hover right around the threshold, causing a flip-flop. Would it make sense to have an allowance around the threshold? I'm unsure what the variance over time of these energy deltas looks like, so I don't know what the value should be, but maybe it could look something like this:

const double buffer = 0.1;
if (ratio > threshold + buffer) {
  // GAME_MODE_GOVERNOR_IGPU_DESIRED
} else if (ratio < threshold - buffer) {
  // GAME_MODE_GOVERNOR_DESIRED
} else {
  // Leave as before
}

Contributor Author

It's possible but I think ultimately unlikely. Once you go above the threshold, it switches to the powersave governor at which point the power used by the CPU is likely to significantly decrease making the ratio even larger so there's a natural bias in the system which does exactly what you want. Unfortunately, I'm not sure how reliable that is.

Contributor

Ah, that makes sense. It might be prudent to try to graph the response time of that switch, but given our current polling rate is 5 seconds by default, we're likely in the clear.

@gfxstrand
Contributor Author

You make a good argument that this is the most pragmatic solution for now, but were you able to confirm that once we hit this power threshold and swap back to powersave we don't hit the CPU-based frame pacing issues that plagued that governor? The protective factor might be that when the threshold is met, the GPU is doing some significant amount of work, so in practice, that may not be a situation where there'd be worry about feeding the GPU more work.

Typically, with an integrated GPU, feeding it isn't nearly as much of an issue. Even with our latest GPUs (Ice Lake is significantly more powerful than anything we've shipped previously), a game like Tomb Raider can typically keep it fed without requiring the full CPU power budget to do so. In my experiments with SotTR, I was able to limit my CPU to as low as 500-600 MHz without hurting graphics performance. Of course, that's just one game and one laptop, but I think the point still stands that when trying to get the best performance, the GPU is typically going to be the throttle point with integrated graphics.

Also, it's worth pointing out that the names "performance" and "powersave" are somewhat misunderstood these days. The "powersave" of today is not at all the same as the "powersave" of yesteryear when we had the 5 different governors. The "performance" governor basically means "give me maximum performance at all costs" whereas the "powersave" governor isn't "save power at all costs; I'm on battery", it's a more balanced "don't burn more power than you need to." My understanding (I may be wrong) is that we've been recommending the "powersave" governor for basically everything including desktops, laptops that are plugged in, and servers. It's not quite as eager to scale up as "performance" but it should give just as much performance in the aggregate.
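
For context, on systems using the intel_pstate driver those really are the only two governors on offer; for example (output from a typical machine, exact path may vary):

$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
performance powersave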

None of that really answers the question of frame pacing issues. With a discrete GPU, you want the CPU to immediately jump into action the moment the GPU is ready for more work. With an integrated GPU, you want this as well but you don't want it at the cost of your power budget. Because the CPU governor is still the one in control today, if the game starts doing significant CPU work, the CPU will clock up and take power from the GPU if it needs to do so; it just won't over-compensate like it does with "performance". The balancing issues are something we're investigating but any improvements there are going to be in the kernel pstate code and a prerequisite for that working is for gamemode to stop forcing the CPU governor into super-aggressive mode.

@mdiluz
Contributor

mdiluz commented Dec 18, 2019

Great, you've cleared up my worries.

The balancing issues are something we're investigating but any improvements there are going to be in the kernel pstate code and a prerequisite for that working is for gamemode to stop forcing the CPU governor into super-aggressive mode.

Good to hear this. Shifting GameMode away from messing with the CPU governor is a wish of mine as well - performance was always just a brute force method to work around a problematic hiccup.

@gfxstrand
Contributor Author

  1. The algorithm is dynamic so if things aren't configured right, it can end up popping up a little authorization box every time it wants to switch governors rather than once when the game starts. This gets annoying quickly.

This still has me a bit worried. Is the pop-up notification expected in some setups or is it just a weird thing that I'm experiencing because I'm running a development build? I'm just building with bootstrap.sh and not doing anything interesting there.

@aejsmith
Contributor

That happens sometimes if you reinstall without restarting polkit. Should go away if you restart it.

Might be worth adding systemctl restart polkit in bootstrap.sh...

@gfxstrand
Contributor Author

Might be worth adding systemctl restart polkit in bootstrap.sh...

I'll add a commit for that.

@gfxstrand gfxstrand force-pushed the igpu branch 2 times, most recently from 4f80b4d to 8780f0e Compare December 18, 2019 20:09
@gfxstrand
Contributor Author

I think I've addressed all the comments at this point. However, before we land anything, I'd like to play around with my laptop pile a bit and see what I think of that 0.5 figure for the default threshold. It mostly seems to work on my Ice Lake but every laptop is going to be different.

@gfxstrand gfxstrand changed the title WIP: Add an option for using a different governor for integrated GPUs Add an option for using a different governor for integrated GPUs Dec 19, 2019
@gfxstrand
Contributor Author

I dug through my entire laptop pile and ran some tests with power measurements this afternoon. On each laptop, I tested the following:

  1. CPU power at idle
  2. GPU power at idle
  3. GPU power just compositing. For this, I ran glxgears -fullscreen with vsync enabled on a 1920x1080 display. This forces gnome-shell to re-composite every frame and should be a reasonably good proxy for the case where a discrete GPU is doing most of the work and the integrated GPU is just compositing.
  4. CPU power under full single-threaded load. For this, I did a single-threaded compile.
  5. GPU power under full load. For this, I used the Unigine Valley benchmark.

I ran each of those tests on the following hardware:

  • Lenovo X220 (Sandybridge GT2)
  • Lenovo X230 (Ivybridge GT2)
  • Gigabyte Brix (Haswell GT3e)
  • Dell XPS 13 (Broadwell GT2)
  • Dell XPS 13 (Skylake GT2)
  • Skull Canyon NUC (Skylake GT4e)
  • Razer Blade Stealth 13 (Ice Lake G7)

While I don't think I can actually share my raw data (we have rules about that sort of thing), I think I can share the following findings:

  1. GPU power for the compositing case never broke 1 Watt on any platform. Most cases were significantly below that. If we wanted, we could probably key off that instead of iGPU/CPU ratio and probably be fine.
  2. Max GPU power is generally lower than max CPU power. In particular, the old 0.5 number was a bit on the high side because, with the "performance" governor, you could end up in a GPU-limited case where the CPU was still taking 2x the power.
  3. Sometimes, when the GPU is close to idle, the GPU power is so low that it reports zero power used in a 5s window. I've dropped the error message for this case. It now silently turns into an iGPU/CPU power ratio of 0, which is below the threshold, and you get the "performance" governor.
  4. From what I can tell, 0.3 or so is probably the sweet spot. Any game with even moderate CPU usage where the integrated GPU is doing nothing but compositing should have an iGPU/CPU ratio lower than 0.3, and any case with significant iGPU load should have an iGPU/CPU ratio above 0.3. I've changed the default to 0.3.

@eero-t

eero-t commented Dec 20, 2019

A few minor comments:

  • If you used i965 instead of Iris, most real workloads are at least slightly CPU limited with it. This includes Valley. Better benchmark for testing full GPU utilization is e.g. GpuTest FurMark. Or if you want to use Unigine benchmarks, Heaven is less CPU bound than Valley
  • Even for iGPU, power usage matters for performance only when workload gets TDP limited. On desktop machines with high enough TDP limit (e.g. i5-6600K) and cooling matching that, iGPU doesn't get throttled. There powersave governor just saves power
  • Increasing the TDP limit from the BIOS, even when that change is visible in the Linux-side RAPL limits, doesn't guarantee the firmware actually uses the new limit (so far I've seen that issue only with non-Intel BIOS versions)
  • The only reliable way I've found to see whether the iGPU is throttled is to ftrace the GPU frequency change requests done by the kernel, and compare them to the CAGF (Current Actual GPU Frequency) value sampled from sysfs. If CAGF is lower than the currently active kernel frequency request, the firmware is throttling the GPU (see the sketch after this list)
  • RAPL data isn't always correct. At least on GEN9, it still needs to be calibrated for each new chip type. That can happen as late as when the HW comes publicly available, at which point distro kernels are too old to include that support. Before that, RAPL either doesn't provide data, or data can be bogus
  • I don't think it really matters for this, but RAPL "uncore" value (also reported by i915 kernel module as GPU power consumption), isn't just about GPU, AFAIK it's also other things on chip (e.g. LLC) besides CPU cores. I.e. in theory "uncore" may show slight power consumption also without GPU being used
  • There are some knobs in sysfs for tuning how eagerly P-state tries to save power: https://www.kernel.org/doc/html/latest/admin-guide/pm/intel_pstate.html#user-space-interface-in-sysfs
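
As a rough illustration of the CAGF comparison mentioned above (the sysfs paths are assumptions about a typical i915 card0 layout, and this skips the ftrace side):

# Requested vs. actual GPU frequency; if the actual value stays below the
# requested one under load, the firmware is throttling the GPU.
watch -n1 'cat /sys/class/drm/card0/gt_cur_freq_mhz /sys/class/drm/card0/gt_act_freq_mhz'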

@stephanlachnit
Contributor

stephanlachnit commented Dec 27, 2019

Even for iGPU, power usage matters for performance only when workload gets TDP limited. On desktop machines with high enough TDP limit (e.g. i5-6600K) and cooling matching that, iGPU doesn't get throttled. There powersave governor just saves power

Good point. Maybe it's a better idea to check the power consumption with Intel RAPL instead of the GPU usage? Meaning to switch to a different governor only if you hit the TDP limit, and if the GPU usage is lower than ~70%, for example.

Maybe this code helps with implementing it: https://github.com/kitsunyan/intel-undervolt

@gfxstrand
Contributor Author

gfxstrand commented Jan 2, 2020

  • If you used i965 instead of Iris, most real workloads are at least slightly CPU limited with it. This includes Valley. Better benchmark for testing full GPU utilization is e.g. GpuTest FurMark. Or if you want to use Unigine benchmarks, Heaven is less CPU bound than Valley

Yes, I'm aware that i965 and some of the Unigine benchmarks have some CPU stalling issues, particularly with unsynchronized maps. However, my objective in these explorations was to get a rough idea of how much power the GPU burns under load. For this, it was more important to get something which lights up as much of the GPU as possible rather than something which is guaranteed to be GPU limited. If it's not 100%, that's fine. Again, I'm going for "rough idea" here, not an exact measurement of max GPU power consumption.

  • Even for iGPU, power usage matters for performance only when workload gets TDP limited. On desktop machines with high enough TDP limit (e.g. i5-6600K) and cooling matching that, iGPU doesn't get throttled. There powersave governor just saves power

Sure, if you're gaming on a desktop GT2, powersave may not get you any better throughput. However, even with a 100W+ TDP, if you have a GT3 or GT4 in a desktop with good cooling, you can still end up TDP limited. You probably have to be running the CPU pretty hard to get there but that's exactly what the performance governor does (run things hard). 😄

  • Increasing the TDP limit from the BIOS, even when that change is visible in the Linux-side RAPL limits, doesn't guarantee the firmware actually uses the new limit (so far I've seen that issue only with non-Intel BIOS versions)

That is true and also completely irrelevant to this MR.

  • The only reliable way I've found to see whether the iGPU is throttled is to ftrace the GPU frequency change requests done by the kernel, and compare them to the CAGF (Current Actual GPU Frequency) value sampled from sysfs. If CAGF is lower than the currently active kernel frequency request, the firmware is throttling the GPU

That probably works but it also requires running with ftrace and polling CAGF at a fairly high rate. Those two things are likely to put additional CPU load on the system and possibly make any power problems we have worse.

  • RAPL data isn't always correct. At least on GEN9, it still needs to be calibrated for each new chip type. That can happen as late as when the HW comes publicly available, at which point distro kernels are too old to include that support. Before that, RAPL either doesn't provide data, or data can be bogus

Yes, and if the data hasn't been calibrated, the RAPL driver in the kernel won't expose anything via sysfs and the code in this MR will log the lack of RAPL data and fall back to the old gamemode behavior. When the user finally does get a new enough kernel, the iGPU stuff will kick on and they'll start getting better perf.

  • I don't think it really matters for this, but RAPL "uncore" value (also reported by i915 kernel module as GPU power consumption), isn't just about GPU, AFAIK it's also other things on chip (e.g. LLC) besides CPU cores. I.e. in theory "uncore" may show slight power consumption also without GPU being used

Sure. However, I doubt USB or whatever other things they put on "uncore" burn that much power. It's close enough that our power management people have labeled it "GFXWatt" in turbostat...

If some user wants to play with those pstate knobs, they're more than welcome to. However, I'm happy to leave pstate tuning up to our (Intel's) power management people. Right now the problem we have is that gamemode is overriding that tuning and throwing it into "burn all the power" mode.

I'm not sure what, if anything, from that list of bullet points is actionable. I'm not going to claim that looking at GPUWatt/CPUWatt is the best solution for overall power management. However, it does seem to be a fairly effective way to detect when the iGPU is under non-trivial load, so that gamemode stops overriding the pstate driver and throwing the CPU into "burn all the power" mode. If we want to do finer tuning of pstate based on GPU usage, that really needs to happen in the kernel.

@gfxstrand
Contributor Author

gfxstrand commented Jan 2, 2020

Even for iGPU, power usage matters for performance only when workload gets TDP limited. On desktop machines with high enough TDP limit (e.g. i5-6600K) and cooling matching that, iGPU doesn't get throttled. There powersave governor just saves power

Good point. Maybe it's a better idea to check the power consumption with Intel RAPL instead of the GPU usage? Meaning to switch to a different governor only if you hit the TDP limit, and if the GPU usage is lower than ~70%, for example.

That doesn't work. The TDP reported by BIOS and exposed via RAPL isn't the actual power at which things will start throttling. On my laptop, for instance, the BIOS reports a TDP of 35W even though the chip is only rated at 25W and actual throttling happens even lower yet. Throttling can happen due to any number of reasons including BIOS limits, motherboard power delivery limitations, cooling limits, internal chip limits, and others. The BIOS limit we have via RAPL is only one of the many limits. There's no way to objectively ask from software "Am I using 70% of the available power?" without significant tuning to your specific machine. Maybe for laptops one could build a database but that's not a practical general solution.

@mdiluz
Contributor

mdiluz commented Jan 2, 2020

Thanks Jason, my vote goes to this more pragmatic solution as is.

The thing is, philosophically, the more complicated you go in analyzing the details and reacting to them, the harder it becomes to understand the emergent behavior of the system. It also becomes incredibly easy to introduce unknown edge cases with very bad issues, rather than a simpler set of known cases with easier-to-handle problems. I'd prefer GameMode stays on the simpler side, and we leave the more complicated things to driver code that has a much better handle on its entire problem space.

TL;DR: Less code is more.

@stephanlachnit
Contributor

That doesn't work. The TDP reported by BIOS and exposed via RAPL isn't the actual power at which things will start throttling. On my laptop, for instance, the BIOS reports a TDP of 35W even though the chip is only rated at 25W and actual throttling happens even lower yet. Throttling can happen due to any number of reasons including BIOS limits, motherboard power delivery limitations, cooling limits, internal chip limits, and others. The BIOS limit we have via RAPL is only one of the many limits.

Well, that isn't RAPL's fault but that of lazy manufacturers. However, you're right, if things like this are common it doesn't make sense to use it.
I would still like to see an option to completely disable this feature for desktop users.

What would also be possible, instead of just assuming that one uses an integrated GPU, is a script that checks during install whether the machine uses an iGPU or not, changes the desiredgov in the default config accordingly, and maybe adds some notes in the config for people who use more complicated setups (like a hybrid GPU or eGPU).

Maybe something like this:

if glxinfo | grep -q Intel; then
    # change default config
fi

While glxinfo | grep -q Intel isn't the best option, it works since there are no non-integrated GPUs from Intel right now, and it should be easy to extend this for AMD iGPUs.

@aejsmith
Contributor

aejsmith commented Jan 4, 2020

Well, that isn't RAPL's fault but that of lazy manufacturers. However, you're right, if things like this are common it doesn't make sense to use it.
I would still like to see an option to completely disable this feature for desktop users.

At least on my desktop CPU, which has an integrated GPU that is disabled (i7-9700K), I don't see an "uncore" RAPL entry in sysfs (I only have "core" and "dram"). As far as I can see, if uncore isn't present, then we'll fall back to the original behaviour.

I think it can also be disabled by setting igpu_power_threshold to a very large number in the config such that the threshold is impossible to meet. Perhaps that could be documented in the example config.

What would also be possible, instead of just assuming that one uses an integrated GPU, is a script that checks during install whether the machine uses an iGPU or not, changes the desiredgov in the default config accordingly, and maybe adds some notes in the config for people who use more complicated setups (like a hybrid GPU or eGPU).

Maybe something like this:

if glxinfo | grep -q Intel; then
    # change default config
fi

While glxinfo | grep -q Intel isn't the best option, it works since there are no non-integrated GPUs from Intel right now, and it should be easy to extend this for AMD iGPUs.

While this may work for installing from source, it wouldn't help for distro packages unless they all made sure their post install scripts did this. I don't think it's reasonable to expect that to be done, plus it's also entirely possible that the package installation is being done from an environment where glxinfo won't work (anywhere $DISPLAY isn't set).

@stephanlachnit
Contributor

stephanlachnit commented Jan 4, 2020

At least on my desktop CPU, which has an integrated GPU that is disabled (i7-9700K), I don't see an "uncore" RAPL entry in sysfs (I only have "core" and "dram"). As far as I can see, if uncore isn't present, then we'll fall back to the original behaviour.

You can still have the iGPU enabled without using it as your main graphics card, for example to power a secondary monitor. Most mainboards disable the iGPU by default if a GPU is installed, some don't. Relying on this is not a very clean solution imho.

I think it can also be disabled by setting igpu_power_threshold to a very large number in the config such that the threshold is impossible to meet. Perhaps that could be documented in the example config.

But isn't it pointless then to check the power threshold all the time? It shouldn't be too hard to add an option to the config file to disable this.

While this may work for installing from source, it wouldn't help for distro packages unless they all made sure their post install scripts did this. I don't think it's reasonable to expect that to be done, plus it's also entirely possible that the package installation is being done from an environment where glxinfo won't work (anywhere $DISPLAY isn't set).

Distro packages are a problem here, that's true. Using glxinfo was just an idea; one could also use, for example, lspci -v | grep -q Intel.*Graphics.*"[VGA controller]".
If lspci doesn't show [VGA controller] at the end of a GPU entry it shouldn't be the active one, but I have to double-check this with my hybrid GPU.
Actually, this test could also be done every time the daemon starts. The config could add an option which looks something like igpu_power_behavior={on,off,auto}, with auto being the default and checking for an iGPU every time the daemon starts.

Edit: at least for me it sadly doesn't work with hybrid GPUs; even with a fullscreen application, lspci still shows the iGPU as the main one. However, hybrid GPUs are an edge case anyway; with a normal discrete setup it should work fine. It would be nice if someone could test this. Maybe there is a solution which is more elegant than lspci.

Edit2: I did a little bit of research and found this. If one has the pid (which the daemon should know afaik), one can check which GPU the process is using. It's not super trivial, but it should be somewhat clean.

First, one checks whether there is an iGPU and, if yes, extracts its PCI ID. Then one checks with lsof -p ${pid} | grep /dev/dri which card the process is using and matches it against the device IDs in /dev/dri/by-path. Now one knows which GPU the process uses and can change the governor if it's an iGPU (see the sketch below).
This even works with my hybrid GPU setup, though I don't know any trivial way to check for an internal GPU. The name should be enough, but I don't know how to find the name in sysfs.
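
A rough sketch of that lookup, assuming the iGPU sits at the usual 0000:00:02.0 PCI address (untested, just to illustrate the steps):

# 1. Which DRM nodes does the game have open?
lsof -p ${pid} 2>/dev/null | grep /dev/dri
# 2. Map card/render nodes back to PCI devices; on Intel systems the iGPU
#    normally shows up as pci-0000:00:02.0-card.
ls -l /dev/dri/by-path/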

@gfxstrand
Contributor Author

That doesn't work. The TDP reported by BIOS and exposed via RAPL isn't the actual power at which things will start throttling. On my laptop, for instance, the BIOS reports a TDP of 35W even though the chip is only rated at 25W and actual throttling happens even lower yet. Throttling can happen due to any number of reasons including BIOS limits, motherboard power delivery limitations, cooling limits, internal chip limits, and others. The BIOS limit we have via RAPL is only one of the many limits.

Well, that isn't RAPL's fault but that of lazy manufacturers. However, you're right, if things like this are common it doesn't make sense to use it.

No, it's just reality. How much power you can burn is also determined by cooling which can be affected by things such as ambient temperature, cooling rig (not a variable for laptops), how good your thermal paste is (can change with age), and how much dust you've accumulated in your heat sink. There is absolutely no way for the manufacturer to provide accurate data for this even if they wanted to.

I would still like to see an option to completely disable this feature for desktop users.

Set igpu_power_threshold=-1. Maybe that should be documented?

But isn't it pointless then to check the power threshold all the time? It shouldn't be too hard to add an option to the config file to disable this.

I already thought of this. If you set it to -1, it will always use desiredgov and never query RAPL. Even if it does query RAPL, it's a few sysfs reads every 5 seconds; it's not like that actually costs much.
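
In other words, something along these lines in the config (section placement assumed, as in the sketch earlier):

[general]
; never switch governors based on iGPU power; always use desiredgov
igpu_power_threshold=-1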

What would also be possible, instead of just assuming that one uses an integrated GPU, is a script that checks during install whether the machine uses an iGPU or not, changes the desiredgov in the default config accordingly, and maybe adds some notes in the config for people who use more complicated setups (like a hybrid GPU or eGPU).

What if you add or remove a card? What if you move your drive between machines? Any sort of install-time hardware detection is sketchy at best.

At least on my desktop CPU, which has an integrated GPU that is disabled (i7-9700K), I don't see an "uncore" RAPL entry in sysfs (I only have "core" and "dram"). As far as I can see, if uncore isn't present, then we'll fall back to the original behaviour.

You can still have the iGPU enabled without using it as your main graphics card, for example to power a secondary monitor. Most mainboards disable the iGPU by default if a GPU is installed, some don't. Relying on this is not a very clean solution imho.

It doesn't rely on that. If you aren't using the iGPU, it will use little to no power and you'll get the old behavior. Even if you are using the iGPU but only for compositing, with the discrete GPU doing the rendering, the 0.3 threshold is high enough that it should consider that as the discrete GPU case and you'll get desiredgov. I've tested this thoroughly on my laptop. When I'm running with the NVIDIA card and the Intel GPU doing the compositing, I get desiredgov, and when I'm running the game on the Intel card, I get igpu_desiredgov.

While glxinfo | grep -q Intel isn't the best option, it works since there are no non-integrated GPUs from Intel right now

Not right now but they're coming. We don't want anything to be done based on hard-coding "Is it Intel?" because that's just a ticking time-bomb.

First, one checks whether there is an iGPU and, if yes, extracts its PCI ID. Then one checks with lsof -p ${pid} | grep /dev/dri which card the process is using and matches it against the device IDs in /dev/dri/by-path. Now one knows which GPU the process uses and can change the governor if it's an iGPU.

I also considered that strategy and rejected it for two reasons: First, it's complicated and complicated things have a tendency to break in strange ways in the wild. Second, and more importantly, there's no way to guarantee that the app only opens one GPU. I think it's usually true with OpenGL that the app will only open one GPU but with Vulkan, it's not. The Vulkan loader opens every ICD on the system and presents them to the app and the app is free to pick-and-choose which devices it wants to use and possibly use both. From the perspective of lsof tricks, they're both open. There's no way to tell which of those GPUs the app is actually using except by trying to see which one is busy by, say, looking at power consumption.

@mdiluz
Contributor

mdiluz commented Jan 4, 2020

Maybe that should be documented?

Yes please :)

Other than that, my previous comments stand. I especially agree with not assuming Intel means iGPU. That's asking for trouble: once you add one bad assumption to a program, someone is bound to rely on that assumption elsewhere too, and before you know it, you have computers unable to handle new millenniums.

@stephanlachnit
Contributor

stephanlachnit commented Jan 4, 2020

Set igpu_power_threshold=-1. Maybe that should be documented?

Ah, that makes sense.

Not right now but they're coming. We don't want anything to be done based on hard-coding "Is it Intel?" because that's just a ticking time-bomb.

My code is just an example; you could easily test against stuff like checking whether there is "GT" inside the name, or whatever. The same goes for checking AMD iGPUs, btw.
Assuming that the first device exposed by RAPL is the iGPU could go wrong with Intel dGPUs as well, or if they start using chiplets. Not that I think it's going to be a problem; the code can always be adjusted.

Second, and more importantly, there's no way to guarantee that the app only opens one GPU.

Yeah that is problematic, didn't think of that. Too bad that there is no easy way to check all this stuff.

@gfxstrand
Contributor Author

Maybe that should be documented?

Yes please :)

Done.

My code is just an example; you could easily test against stuff like checking whether there is "GT" inside the name, or whatever. The same goes for checking AMD iGPUs, btw.

I don't want to trust any sort of name matching, for a few reasons:

  1. I don't know what the discrete GPUs will be called so I don't know that whatever name matching thing I put in there will be valid.
  2. Even if I did know what they would be called, I wouldn't be able to say in a public forum until very close to launch.
  3. There's no real guarantee of stability or sensibility when it comes to marketing names. Intel uses "HD Graphics", "UHD Graphics", "Iris", and "Iris Pro" and, while they roughly correspond to "how powerful is your GPU?" for any given generation, from a purely technical perspective they're all over the map. They also come out with new ones from time to time. "UHD Graphics" and "Iris Pro" were both introduced with Kaby Lake IIRC even though it's really no different from Skylake (the previous generation). I also wouldn't be at all surprised if we end up with discrete and integrated GPUs with very similar names because someone in marketing wants to say "this is almost as good as discrete".

Assuming that the first device exposed by RAPL is the iGPU could go wrong with Intel dGPUs as well, or if they start using chiplets. Not that I think it's going to be a problem; the code can always be adjusted.

I highly doubt that RAPL will be used for discrete GPUs. So far, it's entirely used for things that are part of the CPU package and, therefore, share its power budget. I expect that if we report any power usage for discrete GPUs, it'll be reported through the graphics driver. That said, I can't make any guarantees about it, so there is always the possibility that things will have to be re-thought. However, I think "if it's in RAPL it's integrated" is probably a fairly safe assumption at the moment.

In general, future-proofing this for Intel discrete GPUs is tricky. What I do know is that matching "intel" is guaranteed to be wrong and any other sort of name matching is likely to be wrong eventually.

@stephanlachnit
Contributor

Yeah, I guess this is the best option for the average user right now. Stuff like hybrid GPUs or desktop iGPU setups are probably pretty rare anyway, so better this than a hacky solution that doesn't work properly. Thanks for the work 👍

Btw @mdiluz, what are the chances that 1.6 (with this merged) is released in time for the Ubuntu 20.04 import freeze in late February? Since 20.04 is an LTS it's probably used by a lot of people who game, so it would be nice to have this and the latest improvements included.

@mdiluz
Contributor

mdiluz commented Jan 5, 2020

@stephanlachnit I no longer work at Feral, so can't guarantee anything, but will ping someone who does.

@eero-t

eero-t commented Jan 8, 2020

How much power you can burn is also determined by cooling which can be affected by things such as ambient temperature, cooling rig (not a variable for laptops), how good your thermal paste is (can change with age), and how much dust you've accumulated in your heat sink. There is absolutely no way for the manufacturer to provide accurate data for this even if they wanted to.

The Thermal Design Power (TDP) limit set for a given chip is supposed to reflect the cooling capability of the machine where it will be / is installed. If the machine is getting temperature limited, I think the user really needs to fix the cooling, as higher temperatures are going to degrade the CPU/iGPU faster.

Fixing cooling is also one of the cheapest ways to significantly improve (and stabilize) performance. Could the daemon show some note to the user about that when it detects things being temperature limited?

@gfxstrand
Contributor Author

How much power you can burn is also determined by cooling which can be affected by things such as ambient temperature, cooling rig (not a variable for laptops), how good your thermal paste is (can change with age), and how much dust you've accumulated in your heat sink. There is absolutely no way for the manufacturer to provide accurate data for this even if they wanted to.

The Thermal Design Power (TDP) limit set for a given chip is supposed to reflect the cooling capability of the machine where it will be / is installed. If the machine is getting temperature limited, I think the user really needs to fix the cooling, as higher temperatures are going to degrade the CPU/iGPU faster.

Roughly, yes. However, that doesn't mean it actually does, or that we can rely on anything the OEM decides to advertise. Also, even if your system isn't broken (clogged fan, etc.), there's still going to be a difference between when it's sitting on your lap vs. a table vs. a cooling pad. My point with that comment was that the best the OEM can provide is a rough estimate. The actual cooling abilities are environmentally dependent.

Fixing cooling is also one of the cheapest ways to significantly improve (and stabilize) performance. Could the daemon show some note to the user about that when it detects things being temperature limited?

No, there's no way to accurately detect that. There are too many components involved that are all trying to manage power and adjusting things up and down for various reasons. There's no way that we can reliably detect that your power management problems are due to cooling in a way that would actually be useful to users. Even if we could, this daemon would be the wrong place for that.

@eero-t

eero-t commented Jan 8, 2020

No, there's no way to accurately detect that. There are too many components involved that are all trying to manage power and adjusting things up and down for various reasons. There's no way that we can reliably detect that your power management problems are due to cooling in a way that would actually be useful to users.

There are temperature & power files and their limit value files in hwmon sysfs, and alarm files telling whether the limits were triggered:
https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface.rst

However, although I see hwmon sysfs input files showing first the Core and then the Package temperature getting close (99°C) to the max/crit temperature values (100°C) [1], and as a result dmesg showing the following on the Skull Canyon:

...
[  859.103328] mce: CPU2: Core temperature above threshold, cpu clock throttled (total events = 1)
[  859.103331] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 1)
...

All the alarms still show zero:

$ head /sys/class/hwmon/*/*alarm
==> /sys/class/hwmon/hwmon1/temp1_crit_alarm <==
0

==> /sys/class/hwmon/hwmon1/temp2_crit_alarm <==
0

==> /sys/class/hwmon/hwmon1/temp3_crit_alarm <==
0

==> /sys/class/hwmon/hwmon1/temp4_crit_alarm <==
0

==> /sys/class/hwmon/hwmon1/temp5_crit_alarm <==
0

[1] I assume the *_input value never hits the _max/_crit value because of throttling; one only sees it getting very close.

And I assume that on this HW, alarms are triggered only if limits are exceeded.

=> While checking *_alarm files shouldn't be harmful, they don't seem useful on this HW.

(It's easy to test in a high-end NUC, just block ventilation holes and run some heavy loads.)

Then there's also:
https://www.kernel.org/doc/Documentation/hwmon/acpi_power_meter.rst

"Some computers have the ability to enforce a power cap in hardware. If this is the case, the power[1-*]_cap and related sysfs files will appear. When the average power consumption exceeds the cap, an ACPI event will be broadcast on the netlink event socket and a poll notification will be sent to the appropriate power[1-*]_alarm file to indicate that capping has begun, and the hardware has taken action to reduce power consumption."

@mdiluz
Contributor

mdiluz commented Jan 8, 2020

@eero-t this seems like a separate feature request - could we split it into a new issue?

@afayaz-feral
Contributor

I've forced over a fix for Travis CI from master just to check that all tests pass; unfortunately there are a couple of failures. Can you rebase and run clang-format on the code? The policykit issue might be due to the CI setup, I'm not entirely sure.

I've tested this using one of our games and it behaves as expected so I'll happily merge once those issues are fixed.

@gfxstrand gfxstrand force-pushed the igpu branch 2 times, most recently from 0b6694f to 5c769fd Compare January 9, 2020 17:04
This makes it install things more properly so that you don't get the nasty pkexec pop-ups with dev builds.
@gfxstrand
Contributor Author

@afayaz-feral Passing now. The polkit issue was thanks to the last patch and the fact that the Ubuntu container doesn't have polkit installed. I made it query for available systemd services before attempting to restart polkit and now we're good.
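
Something in the spirit of this in bootstrap.sh (a guess at the shape of the check, not the actual commit):

# Only restart polkit if systemd actually knows about the service,
# e.g. skip it inside a bare CI container.
if systemctl list-unit-files polkit.service | grep -q polkit.service; then
    systemctl restart polkit
fi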

@afayaz-feral afayaz-feral merged commit f62b1e8 into FeralInteractive:master Jan 10, 2020
@eero-t

eero-t commented Jan 10, 2020

@eero-t this seems like a separate feature request - could we split it into a new issue?

Added #185.

(I won't be verifying it as I don't use gamemode myself. Somebody else needs to do that if the feature gets implemented.)
