Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression with new osd-overlay API on unchanged input #7615

Closed
FichteFoll opened this issue Apr 14, 2020 · 6 comments
Closed

Performance regression with new osd-overlay API on unchanged input #7615

FichteFoll opened this issue Apr 14, 2020 · 6 comments

Comments

@FichteFoll
Copy link

FichteFoll commented Apr 14, 2020

Important Information

  • mpv 0.31+
  • Arch Linux x86_64 (Kernel: 5.5.13)
  • Arch repos and self-built using mpv-git's PKGBUILD
  • Problem first encountered: 0728726 (bisected)
  • Window Manager: i3 4.18, xorg-server 1.20.8-1
  • GPU: AMD Radeon HD 7950
  • Drivers: mesa 20.0.4-1, xf86-video-amdgpu 19.1.0-1

I initially reported and investigated this issue in torque/mpv-progressbar#56, but am moving it here now because I'm not sure whether this change was intentional. You can also find the script I used for bisecting there.

Reproduction steps

Create the following script:

osd = mp.create_osd_overlay("ass-events")
osd.res_x = 1920
osd.res_y = 1080
osd.data = [[{\an4\fs40}OSD test]]

function tick()
    -- no changes are performed between the updates
    osd:update()
end

mp.add_periodic_timer(0.01, tick)

Set the following config:

interpolation=yes
video-sync=display-resample
deband=yes
deband-iterations=4
deband-threshold=70
deband-range=20
deband-grain=10

The config is used to emphasize the problem.

Expected behavior

GPU usage (observed through radeontop) is ~0-2% when paused.

Actual behavior

GPU usage is around 60-100% when paused.

Log file

Nothing of interest, but here it is anyway: output.log

Sample files

Should be reproducable with any file. My test files were 1080p 10-bit h264 in mkv.

@FichteFoll
Copy link
Author

FichteFoll commented Apr 14, 2020

Changing the tick from 0.01 to 1 or seems to normalize GPU usage, as does checking whether the text to be displayed has changed (as indicated in the script's comment). The reason is most definitely a removed strcmp (and comparisons for the other paremters) in 0728726#diff-d88d582039dea993b6229da9f61ba76cL530.

I will submit a PR for this that re-adds the checks shortly after.

@FichteFoll
Copy link
Author

Actually, I will not submit a PR because the code has changed since that commit and I'm unsure what the intention behind the removal of the before and after comparisons was and whether it should be the script's responsibility to only update when necessary. I'd like a maintainer to review this issue first and determine the next steps.

@ghost
Copy link

ghost commented Apr 14, 2020

Wasn't this mentioned somewhere as intentional? It avoids having to check all the parameters etc. for changes in the code.

@FichteFoll
Copy link
Author

I don't see any mention of it in the commit message, documentation, or in the current code. It was likely an intentional change, as I find it hard to imagine to stumple upon these checks and just forget about them when rewriting. It was an undocumented API, but it is a change in behavior regardless.

I believe there are two ways forward for this issue:

  1. Leave the code as-is and document that the caller is responsible for checking whether the update is necessary. Unless the caller knows for sure that the text (or other properties) couldn't have changed, he needs to compare the args on his side. Same for every other caller. This API seems to see use in tick-based patterns.
  2. Add checking back into the API. Potentially does unnecessary checks when the caller was already sure there would be changes. If he was unsure, he could check on his site first and then not call the API at all.

Both are valid options, imo. In an ideal world, 1 is the obvious choice, as it involves the least computational overhead. However, the observed impact from improper usage when the osd is refreshed frequently without any changes was unexpectedly high for a 10ms timeout. It may be worth to do this double-duty internal comparison regardless if it is relatively cheap enough.

I submitted a PR to fix the API usage for syncplay but not for the progressbar script (because that uses moonscript and I switched to a different bar for now), as those were the two offending scripts I used. I have no idea whether other scripts also relied on the old internal check.

@FichteFoll
Copy link
Author

FichteFoll commented May 6, 2020

As an update, both scripts I used that were running into this have been updated and check for redundant update calls on their side now.

@Dudemanguy
Copy link
Member

Closing this one since it is old, the change was intentional, and it seems all relevant scripts have updated long ago to take into account the new behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants