Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

contenthash: improve the correctness of needsScan #5060

Merged
merged 3 commits into from
Jun 21, 2024

Conversation

cyphar
Copy link
Contributor

@cyphar cyphar commented Jun 19, 2024

Commit f724d6f ("contenthash: implement proper Linux symlink semantics for needsScan") fixed issues with needScan's handling of symlinks, but the logic used to figure out if a parent path is in the cache was incorrect in a couple of ways:

  1. The optimisation in getFollowSymlinksCallback to avoid looking up / in the cache lead to the callback not being called when we go through /. The upshot is that needsScan(/non-existent-path) would always return true because we didn't check if / has been scanned.
  2. Similarly, the logic of skipping a cache lookup if the path is the same as the current path meant that we skipped the first / lookup (because we start with currentPath=/).
  3. Because needsScan would only store the last good path, cases with symlink jumps to non-existent paths within directories already scanned would result in a re-scan that isn't necessary. Fix this by saving a set of prefix paths we have seen. Note that in combination with (1) and (2), if / has been scanned then needsScan will always return false now (because the / prefix is always checked against).

Fixes: f724d6f ("contenthash: implement proper Linux symlink semantics for needsScan")
Fixes #5042
Signed-off-by: Aleksa Sarai cyphar@cyphar.com

@cyphar
Copy link
Contributor Author

cyphar commented Jun 19, 2024

@tonistiigi Based on my testing, this fixes the issue you described as well as having a few other fixes that should improve other possible performance regressions (all mainly dealing with non-existent files).

I can cook up a few tests that call needsScan and scanPath directly if those are the kind of tests you'd like to add? (It's not really possible to test this issue with Checksum directly.)

@cyphar cyphar force-pushed the contenthash-scanpath-slash branch from b37ddba to 918c9f9 Compare June 19, 2024 06:17
@cyphar
Copy link
Contributor Author

cyphar commented Jun 19, 2024

pathSet is the simplest way of implementing the structure needed for the prefix checks. If you feel a more complex structure is needed, let me know -- but because we do path lookups from left-to-right now, in practice the very early ancestors will be at the head of the array of prefixes so I don't think it needs anything more complicated than a simple array.

@cyphar cyphar force-pushed the contenthash-scanpath-slash branch 2 times, most recently from 3b9491e to 542e4cb Compare June 19, 2024 11:17
@cyphar cyphar marked this pull request as draft June 20, 2024 04:27
Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pathSet is the simplest way of implementing the structure needed for the prefix checks. If you feel a more complex structure is needed, let me know

I guess another radix tree would work well for this but assuming the length of prefixes array is always expected to be small, the potentially inefficient lookup in includes shouldn't matter for practical cases.

I can cook up a few tests that call needsScan and scanPath directly if those are the kind of tests you'd like to add?

That sgtm, but we could also add some private counters logic using private variables that can be turned on by the tests so we can see how many times the scanning/walking happens for certain conditions and detect if some future change causes the more expensive scanning part happen more often than expected.

cache/contenthash/checksum.go Show resolved Hide resolved
@cyphar
Copy link
Contributor Author

cyphar commented Jun 20, 2024

I guess another radix tree would work well for this but assuming the length of prefixes array is always expected to be small, the potentially inefficient lookup in includes shouldn't matter for practical cases.

Ah yeah, go-immutable-radix has LongestPrefix which would work for this. But yeah, I think in practice it won't matter (and I'm not sure that it would be better for most cases anyway because this use is not write-few-read-many, it's write-many-read-few, so the copies in each Insert are probably not worth it in practice).

That sgtm, but we could also add some private counters logic using private variables that can be turned on by the tests so we can see how many times the scanning/walking happens for certain conditions and detect if some future change causes the more expensive scanning part happen more often than expected.

Already working on the needsScan tests (hence the draft). I'll add some counters as well while I'm at it, though I suspect that testing needsScan should be sufficient.

@cyphar cyphar force-pushed the contenthash-scanpath-slash branch 2 times, most recently from 39e3281 to 6dff9ad Compare June 20, 2024 06:56
@cyphar cyphar marked this pull request as ready for review June 20, 2024 07:00
@cyphar cyphar force-pushed the contenthash-scanpath-slash branch from 6dff9ad to 9a5ddc0 Compare June 20, 2024 07:12
@thompson-shaun thompson-shaun added this to the v0.14.2 milestone Jun 20, 2024
@cyphar cyphar force-pushed the contenthash-scanpath-slash branch 3 times, most recently from a485d4d to 8179763 Compare June 21, 2024 04:39
Commit f724d6f ("contenthash: implement proper Linux symlink
semantics for needsScan") fixed issues with needScan's handling of
symlinks, but the logic used to figure out if a parent path is in the
cache was incorrect in two ways:

 1. The optimisations in getFollowSymlinksCallback to avoid looking up /
    or no-op components in the cache lead to the callback not being
    called when we go through / (both in for the initial currentPath=/
    state and for absolute symlinks). The upshot is that
    needsScan(/non-existent-path) would always return true because we
    didn't check if / has been scanned.

    There were also some issues with the wrong cache record being
    returned if you hit a symlink to only /. These optimisations only
    make sense if are only returning a path (as in rootPath) and not
    anything else.

 2. Because needsScan would only store the _last_ good path, cases with
    symlink jumps to non-existent paths within directories already
    scanned would result in a re-scan that isn't necessary. Fix this by
    saving a set of prefix paths we have seen. This change also means
    that we can also simplify the logic in the needsScan callback.

    Note that in combination with (1) and (2), if / has been scanned
    then needsScan will always return false now (because the / prefix is
    always checked against, and every path has it as a parent).

    The pathSet structure is the "dumb" way of doing it. We could use a
    radix tree and LongestPrefix() to implement it in a smarter way, but
    in practice the list will be very small and is both short-lived and
    write-many-read-few, so go-immutable-radix's node copy cost probably
    makes the array better in practice anyway.

Fixes: f724d6f ("contenthash: implement proper Linux symlink semantics for needsScan")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Most of these tests revolve around making sure that we are scanning the
correct path during Checksum, and that we don't think a scan is required
for subpaths we have already scanned.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
While we now have tests ensuring that needsScan does not regress and
indicate that a scan is neccessary, it seems prudent to also include
checks that scanPath is definitely not running on any new paths when we
don't expect it.

Suggested-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
@cyphar cyphar force-pushed the contenthash-scanpath-slash branch from 8179763 to 7b630eb Compare June 21, 2024 14:47
@tonistiigi tonistiigi merged commit edcf130 into moby:master Jun 21, 2024
76 checks passed
@cyphar cyphar deleted the contenthash-scanpath-slash branch June 22, 2024 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[v0.14] performance regression on scanning contenthash paths
3 participants