Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dashboard ignored system setting cache #21621

Merged
merged 18 commits into from
Nov 10, 2022

Conversation

lunny
Copy link
Member

@lunny lunny commented Oct 28, 2022

This is a performance regression from #18058

@lunny lunny added issue/regression Indicates a previously functioning feature or behavior that has broken or regressed after a change performance/speed performance issues with slow downs skip-changelog This PR is irrelevant for the (next) changelog, for example bug fixes for unreleased features. outdated/backport/v1.18 This PR should be backported to Gitea 1.18 labels Oct 28, 2022
@lunny lunny added this to the 1.19.0 milestone Oct 28, 2022
@silverwind
Copy link
Member

#21611 related?

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Oct 28, 2022
@lunny
Copy link
Member Author

lunny commented Oct 28, 2022

#21611 related?

yes

@silverwind
Copy link
Member

Not sure thought, the issue mentiones 1.17 but #18058 is only on 1.18.

@lunny
Copy link
Member Author

lunny commented Oct 29, 2022

Not sure thought, the issue mentiones 1.17 but #18058 is only on 1.18.

Wow, so that's not the same issue.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Oct 31, 2022
models/user/avatar.go Outdated Show resolved Hide resolved
lunny and others added 4 commits November 2, 2022 09:27
There's too much duplication of the cache functionality within the PR, and
looking at other models packages we're happy to use cache - so - move the
caching functionality in to models/system and models/user.

Then remove the functions from modules/system and directly call functions in
models/system

Signed-off-by: Andrew Thornton <art27@cantab.net>
…hing

Reduce duplication and move functions out of modules/system
@lunny
Copy link
Member Author

lunny commented Nov 5, 2022

@zeripath done.

@zeripath
Copy link
Contributor

zeripath commented Nov 5, 2022

OK, there are a few other problems here:

  1. The length of time these things are cached needs to be considered. These settings should probably be only cached for 5 minutes rather than the 16h that they are being cached for by default. This is particularly an issue with an external cacher as that time will persist across restarts.
  2. In the case of the internal caches like the memory or twoqueue variants if a pointer is returned from Get it is the pointer that was stored. This means that things that come out of the cache need to be immutable or treated as immutable, otherwise there will be races and they will need to be copied before being touched. Are we sure that these settings are being used immutably?
  3. Could we get away without adding this caching at all because we can just get this data out of the db once per request? I suspect we could restructure things to just make getting the settings ONCE per request very easily.
  4. I suspect there is little point in storing these settings in an external cache like redis or memcache - it seems likely that the time taken to request the data from the db would be equal to (or even potentially less than) the time taken to get this data from the cache. So really this stuff should only be being stored in some memory cache like an LRU but with a strict time requirement.

As an aside, we really need to rethink our caching infrastructure. Ideally I think what we need is a two-or-three level caching infrastructure.

  1. Request-level cache (this could be a smallish LRU) - (this may be unnecessary)
  2. Internal server cache (e.g. memory or twoqueue)
  3. External cache(s) (if present e.g. leveldb, redis, memcache etc - perhaps even a DB one could be made)

Things cached in the internal memory caches would need be considered immutable on Get - request level cache could be considered mutable because they're unlikely to affected by concurrency and the external caches would create new structs by default anyway.


So, I am now concerned that this may not be the correct solution at all and I think the issue that this data is being requested over and over again unnecessary. With a little thought we could prevent this problem entirely.

@lunny
Copy link
Member Author

lunny commented Nov 6, 2022

OK, there are a few other problems here:

  1. The length of time these things are cached needs to be considered. These settings should probably be only cached for 5 minutes rather than the 16h that they are being cached for by default. This is particularly an issue with an external cacher as that time will persist across restarts.

Why the time is important need to be changed? I think once the value changed, the cache has been updated.

  1. In the case of the internal caches like the memory or twoqueue variants if a pointer is returned from Get it is the pointer that was stored. This means that things that come out of the cache need to be immutable or treated as immutable, otherwise there will be races and they will need to be copied before being touched. Are we sure that these settings are being used immutably?

I think yes. We should always return the values as string or int64 so that they cannot be changed.

  1. Could we get away without adding this caching at all because we can just get this data out of the db once per request? I suspect we could restructure things to just make getting the settings ONCE per request very easily.

I think no, we can't remove the caching, it has produced actual affect in the main branch especially in dashboard from some users' reports. A memory cache or a redis cache operation will be nanosecond level but a database operation is microsecond level, memory/redis is fast about 10~1000 than database operations. And even in one request, the same setting maybe reused serveral times.

  1. I suspect there is little point in storing these settings in an external cache like redis or memcache - it seems likely that the time taken to request the data from the db would be equal to (or even potentially less than) the time taken to get this data from the cache. So really this stuff should only be being stored in some memory cache like an LRU but with a strict time requirement.

As above.

As an aside, we really need to rethink our caching infrastructure. Ideally I think what we need is a two-or-three level caching infrastructure.

  1. Request-level cache (this could be a smallish LRU) - (this may be unnecessary)
  2. Internal server cache (e.g. memory or twoqueue)
  3. External cache(s) (if present e.g. leveldb, redis, memcache etc - perhaps even a DB one could be made)

Things cached in the internal memory caches would need be considered immutable on Get - request level cache could be considered mutable because they're unlikely to affected by concurrency and the external caches would create new structs by default anyway.

So, I am now concerned that this may not be the correct solution at all and I think the issue that this data is being requested over and over again unnecessary. With a little thought we could prevent this problem entirely.

@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (main@fd89c06). Click here to learn what that means.
The diff coverage is 46.66%.

@@           Coverage Diff           @@
##             main   #21621   +/-   ##
=======================================
  Coverage        ?   48.19%           
=======================================
  Files           ?     1030           
  Lines           ?   140403           
  Branches        ?        0           
=======================================
  Hits            ?    67664           
  Misses          ?    64629           
  Partials        ?     8110           
Impacted Files Coverage Δ
routers/web/admin/config.go 23.17% <0.00%> (ø)
modules/cache/cache.go 36.97% <25.39%> (ø)
models/user/setting.go 56.09% <52.94%> (ø)
models/avatars/avatar.go 33.62% <85.71%> (ø)
models/system/setting.go 54.59% <86.36%> (ø)
models/system/setting_key.go 100.00% <100.00%> (ø)
models/user/avatar.go 58.20% <100.00%> (ø)
modules/activitypub/user_settings.go 55.55% <100.00%> (ø)
modules/templates/helper.go 46.51% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@zeripath
Copy link
Contributor

zeripath commented Nov 9, 2022

  1. The length of time these things are cached needs to be considered. These settings should probably be only cached for 5 minutes rather than the 16h that they are being cached for by default. This is particularly an issue with an external cacher as that time will persist across restarts.

Why the time is important need to be changed? I think once the value changed, the cache has been updated.

This is not definitely the case. Imagine a clustered system with a non-clustered cache. Imagine a change to the caching configuration.

...

  1. Could we get away without adding this caching at all because we can just get this data out of the db once per request? I suspect we could restructure things to just make getting the settings ONCE per request very easily.

I think no, we can't remove the caching, it has produced actual affect in the main branch especially in dashboard from some users' reports. A memory cache or a redis cache operation will be nanosecond level but a database operation is microsecond level, memory/redis is fast about 10~1000 than database operations. And even in one request, the same setting maybe reused serveral times.

I meant could we restructure this code to only request this only once per HTTP request (eg. per GET/POST level)? There's no good reason to repeatedly get the setting from the db over and over again. In fact it could/would even lead to inconsistent results within the handling of a request if the value were to change mid-way through something.

This is a good example where context passing could be used to ensure that data is cached at a HTTP request level.

Copy link
Contributor

@zeripath zeripath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we should approve this PR as it stands as a bug-fix but we really need to address the issues I've noted above.

It's silly that we risk having to keep hit the db like this. There should be a better way of doing this.

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Nov 9, 2022
@lunny
Copy link
Member Author

lunny commented Nov 10, 2022

I guess we should approve this PR as it stands as a bug-fix but we really need to address the issues I've noted above.

It's silly that we risk having to keep hit the db like this. There should be a better way of doing this.

It's a good idea to cache per request based on context.Context but that could be another PR since it will have a bigger changes.

@lunny lunny added the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Nov 10, 2022
@lunny
Copy link
Member Author

lunny commented Nov 10, 2022

make L-G-T-M work

@lunny lunny merged commit 385462d into go-gitea:main Nov 10, 2022
@lunny lunny deleted the lunny/fix_performance_dashboard branch November 10, 2022 06:43
@lunny lunny removed the reviewed/wait-merge This pull request is part of the merge queue. It will be merged soon. label Nov 10, 2022
lunny added a commit to lunny/gitea that referenced this pull request Nov 10, 2022
This is a performance regression from go-gitea#18058

Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Andrew Thornton <art27@cantab.net>
@lunny lunny added the backport/done All backports for this PR have been created label Nov 10, 2022
lunny added a commit that referenced this pull request Nov 10, 2022
backport #21621

This is a performance regression from #18058

Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Andrew Thornton <art27@cantab.net>
zjjhot added a commit to zjjhot/gitea that referenced this pull request Nov 11, 2022
* giteaofficial/main:
  Revert unrelated changes for SMTP auth (go-gitea#21767)
  Init git module before database migration (go-gitea#21764)
  Extract updateSession function to reduce repetition (go-gitea#21735)
  Fix dashboard ignored system setting cache (go-gitea#21621)
  Add .dockerignore (go-gitea#21753)
fsologureng pushed a commit to fsologureng/gitea that referenced this pull request Nov 11, 2022
This is a performance regression from go-gitea#18058

Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: Andrew Thornton <art27@cantab.net>
lunny added a commit that referenced this pull request Jan 1, 2023
Fix #22281

In #21621 , `Get[V]` and `Set[V]` has been introduced, so that cache
value will be `*Setting`. For memory cache it's OK. But for redis cache,
it can only store `string` for the current implementation. This PR
revert some of changes of that and just store or return a `string` for
system setting.
@go-gitea go-gitea locked and limited conversation to collaborators May 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/done All backports for this PR have been created issue/regression Indicates a previously functioning feature or behavior that has broken or regressed after a change lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. outdated/backport/v1.18 This PR should be backported to Gitea 1.18 performance/speed performance issues with slow downs skip-changelog This PR is irrelevant for the (next) changelog, for example bug fixes for unreleased features.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants