Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

de-duplication of key/value pairs #415

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

pohly
Copy link

@pohly pohly commented Nov 21, 2024

What this PR does / why we need it:

Previously, de-duplication was limited such that keys passed to a log call as
parameters overwrote keys passed earlier to WithValues. Using the same key in
different WithValues calls or multiple times in a WithValues call or the
parameters led to duplicates in the output.

Now all key/value pairs get checked for duplicates. The last one wins and is
printed at the place where it was added. If a second occurrence is the same as
a prior one, the first one is kept and the second is discarded.

This is partly motivated by the concern that adding OpenTelemetry will lead to
duplicate IDs (cc @dashpole), partly by prior discussions around instrumentation
not being able to avoid duplicates when using WithValue when the instrumentation
is happening in unrelated packages (cc @sbueringer, if I remember correctly).

Special notes for your reviewer:

Because the new implementation avoids memory allocations does very little extra work
when there is nothing which needs to be de-duplicated, the performance
impact is minimal. Some cases even become faster. See commit messages for
benchmark results.

Release note:

The klog text output now consistently de-duplicates key/value pairs such that any given key is  only printed once with its most recent value.

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Nov 21, 2024
@k8s-ci-robot
Copy link

This issue is currently awaiting triage.

If klog contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 21, 2024
@pohly pohly changed the title de-duplication of key/value pairs WIP: de-duplication of key/value pairs Nov 21, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2024
Recent golangci-lint warns about shadowing the print function.
1.21 enables the usage of generics and the slices package.
When testing with Go v1.23.0 in the PATH, `go test go_vet` panics. Bumping the
tools dependency adds 1.23 as toolchain requirement and bumps the Go
version. This is limited to the examples module and thus does not affect users
of klog.

   panic: runtime error: invalid memory address or nil pointer dereference [recovered]
   	panic: runtime error: invalid memory address or nil pointer dereference
   [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x62d74f]

    goroutine 97 [running]:
    go/types.(*Checker).handleBailout(0xc0005bb340, 0xc0001afc88)
    	/nvme/gopath/go-1.23.0/src/go/types/check.go:412 +0x88
    panic({0x6e6c40?, 0x954490?})
    	/nvme/gopath/go-1.23.0/src/runtime/panic.go:785 +0x132
    go/types.(*StdSizes).Sizeof(0x0, {0x7a5420, 0x957e40})
    	/nvme/gopath/go-1.23.0/src/go/types/sizes.go:229 +0x30f
    go/types.(*Config).sizeof(...)
    	/nvme/gopath/go-1.23.0/src/go/types/sizes.go:334
    go/types.representableConst.func1({0x7a5420?, 0x957e40?})
    	/nvme/gopath/go-1.23.0/src/go/types/const.go:77 +0x86
    go/types.representableConst({0x7a7548, 0x94bb80}, 0xc0005bb340, 0x957e40, 0xc0001af588)
    	/nvme/gopath/go-1.23.0/src/go/types/const.go:93 +0x173
    go/types.(*Checker).representation(0xc0005bb340, 0xc00015be00, 0x957e40)
    	/nvme/gopath/go-1.23.0/src/go/types/const.go:257 +0x65
    go/types.(*Checker).implicitTypeAndValue(0xc0005bb340, 0xc00015be00, {0x7a5588, 0xc0000ec380})
    	/nvme/gopath/go-1.23.0/src/go/types/expr.go:377 +0x2d7
    go/types.(*Checker).assignment(0xc0005bb340, 0xc00015be00, {0x7a5588, 0xc0000ec380}, {0x738a90, 0x14})
    	/nvme/gopath/go-1.23.0/src/go/types/assignments.go:70 +0x451
    go/types.(*Checker).initConst(0xc0005bb340, 0xc000720a80, 0xc00015be00)
    	/nvme/gopath/go-1.23.0/src/go/types/assignments.go:144 +0x1da
    go/types.(*Checker).constDecl(0xc0005bb340, 0xc000720a80, {0x7a6580, 0xc0002828a0}, {0x7a6580, 0xc0002828c0}, 0x0)
    	/nvme/gopath/go-1.23.0/src/go/types/decl.go:482 +0x2e8
    go/types.(*Checker).objDecl(0xc0005bb340, {0x7a9e78, 0xc000720a80}, 0x0)
    	/nvme/gopath/go-1.23.0/src/go/types/decl.go:185 +0xa09
    go/types.(*Checker).packageObjects(0xc0005bb340)
    	/nvme/gopath/go-1.23.0/src/go/types/resolver.go:714 +0x454
    go/types.(*Checker).checkFiles(0xc0005bb340, {0xc00016cb10, 0x3, 0x3})
    	/nvme/gopath/go-1.23.0/src/go/types/check.go:467 +0x15a
    go/types.(*Checker).Files(0xc0001b2000?, {0xc00016cb10?, 0xc000474120?, 0x6?})
    	/nvme/gopath/go-1.23.0/src/go/types/check.go:430 +0x75
    golang.org/x/tools/go/packages.(*loader).loadPackage(0xc0001b2000, 0xc000432780)
    	/nvme/gopath/pkg/mod/golang.org/x/tools@v0.1.5/go/packages/packages.go:961 +0x70f
    golang.org/x/tools/go/packages.(*loader).loadRecursive.func1()
    	/nvme/gopath/pkg/mod/golang.org/x/tools@v0.1.5/go/packages/packages.go:799 +0x1a9
    sync.(*Once).doSlow(0x0?, 0x0?)
    	/nvme/gopath/go-1.23.0/src/sync/once.go:76 +0xb4
    sync.(*Once).Do(...)
    	/nvme/gopath/go-1.23.0/src/sync/once.go:67
    golang.org/x/tools/go/packages.(*loader).loadRecursive(0x0?, 0x0?)
    	/nvme/gopath/pkg/mod/golang.org/x/tools@v0.1.5/go/packages/packages.go:787 +0x3b
    golang.org/x/tools/go/packages.(*loader).loadRecursive.func1.1(0x0?)
    	/nvme/gopath/pkg/mod/golang.org/x/tools@v0.1.5/go/packages/packages.go:794 +0x26
    created by golang.org/x/tools/go/packages.(*loader).loadRecursive.func1 in goroutine 96
    	/nvme/gopath/pkg/mod/golang.org/x/tools@v0.1.5/go/packages/packages.go:793 +0x94
    FAIL	k8s.io/klog/examples/go_vet	0.438s
    FAIL
@pohly pohly force-pushed the deduplication branch 2 times, most recently from bdddf29 to ab1b250 Compare November 21, 2024 19:01
@pohly pohly changed the title WIP: de-duplication of key/value pairs de-duplication of key/value pairs Nov 21, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 21, 2024
@pohly
Copy link
Author

pohly commented Nov 21, 2024

This should be ready for a review.

@mengjiao-liu, @harshanarayana, who has time?

@harshanarayana
Copy link

Checking @pohly

Copy link
Member

@mengjiao-liu mengjiao-liu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I debugged the log outputs, and these tests were passed.
Just a non-blocking comment, LGTM otherwise!

test/output.go Outdated
Comment on lines 578 to 596
t.Errorf("Output mismatch. Expected:\n%s\nActual:\n%s\n", expectedWithPlaceholder, actual)
t.Errorf("Output mismatch.\n\nExpected with placeholders:\n%s\nExpected without placeholds:\n%s\nActual:\n%s\n", expectedWithPlaceholder, expected, actual)
} else {
t.Errorf("Output mismatch. klog:\n%s\nExpected:\n%s\nActual:\n%s\n", test.expectedOutput, expectedWithPlaceholder, actual)
t.Errorf("Output mismatch. klog:\n%s\n\nExpected with placeholders:\n%s\nExpected without placeholders:\n%s\nActual:\n%s\n", test.expectedOutput, expectedWithPlaceholder, expected, actual)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the code in lines 844-850 also be changed? Their output is similar.

klog/test/output.go

Lines 844 to 850 in 7f02688

if actual != expected {
if expectedWithPlaceholder == test.output {
t.Errorf("Output mismatch. Expected:\n%s\nActual:\n%s\n", expectedWithPlaceholder, actual)
} else {
t.Errorf("Output mismatch. klog:\n%s\nExpected:\n%s\nActual:\n%s\n", test.output, expectedWithPlaceholder, actual)
}
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. Changed.

Printing the actual expected string makes it easier to run a diff, something
that we cannot do here automatically because we cannot depend on testify or
go-cmp.
Span and trace ID are potentially going to be duplicated a lot.
MergeKVs is about to become a lot simpler. Removing the tests and benchmarks
simplifies the before/after performance comparison.
Previously, de-duplication was limited such that keys passed to a log call as
parameters overwrote keys passed earlier to WithValues. Using the same key in
different WithValues calls or multiple times in a WithValues call or the
parameters led to duplicates in the output.

Now all key/value pairs get checked for duplicates. The last one wins and is
printed at the place where it was added. A potential improvement would be to
keep it at the place where it first appeared if the value is the same because
then different log entries are more likely to have it at the beginning of their
key/value pairs because it was added by a common WithValues call.

Because the new implementation avoids memory allocations, the performance
impact is minimal. Some cases even become faster.

    goos: linux
    goarch: amd64
    pkg: k8s.io/klog/examples/benchmarks
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                            │    before     │               after1               │
                                                                            │    sec/op     │    sec/op     vs base              │
    Output/klog/fail-verbosity-non-standard-struct-key-check/objects/0-36      385.5n ±  9%   358.9n ±  5%  -6.91% (p=0.041 n=6)
    Output/klog/fail-verbosity-non-standard-struct-key-check/objects/10-36     374.3n ±  4%   357.3n ±  5%  -4.55% (p=0.015 n=6)
    Output/klog/fail-verbosity-non-standard-struct-key-check/objects/20-36     388.4n ±  3%   359.2n ±  4%  -7.53% (p=0.002 n=6)
    ...
    Output/zapr/fail-verbosity-non-standard-int-key-check/objects/90-36        380.9n ±  4%   379.4n ±  2%       ~ (p=0.937 n=6)
    Output/zapr/fail-verbosity-non-standard-int-key-check/objects/100-36       375.1n ±  3%   381.9n ±  4%       ~ (p=0.093 n=6)
    geomean                                                                    3.461µ         3.442µ        -0.55%

    pkg: k8s.io/klog/examples/output_test

    ZaprOutput/regular_error_types_as_value-36                                 3.270µ ± 10%   3.530µ ±  9%       ~ (p=0.132 n=6)
    ZaprOutput/struct_values-36                                                3.932µ ± 11%   4.107µ ± 10%       ~ (p=1.000 n=6)
    ZaprOutput/klog.Format-36                                                  4.276µ ± 12%   4.439µ ±  7%       ~ (p=0.132 n=6)
    ZaprOutput/regular_error_types_when_using_logr.Error-36                    2.790µ ±  7%   2.834µ ± 15%       ~ (p=0.699 n=6)
    ZaprOutput/map_values-36                                                   5.802µ ±  9%   5.829µ ±  3%       ~ (p=1.000 n=6)
    ZaprOutput/log_with_name_and_values-36                                     4.051µ ±  4%   3.864µ ±  6%  -4.63% (p=0.015 n=6)
    ...
    geomean                                                                    5.581µ         5.646µ        +1.16%

    pkg: k8s.io/klog/v2

    Header-36                                                                   1.339µ ±  5%    1.311µ ±  6%        ~ (p=0.513 n=6)
    ...
    KlogOutput/klog.Format-36                                                   4.207µ ± 10%    4.667µ ±  3%  +10.95% (p=0.009 n=6)
    KlogOutput/cyclic_list-36                                                   151.3µ ±  1%    151.1µ ±  1%        ~ (p=0.818 n=6)
    KlogOutput/override_single_value-36                                         2.308µ ±  9%    3.501µ ± 15%  +51.66% (p=0.002 n=6)
    KlogOutput/multiple_WithValues-36                                           10.91µ ±  9%    12.46µ ±  7%  +14.16% (p=0.004 n=6)
    ...
    KlogOutput/duplicates-36                                                    21.34µ ±  2%    22.52µ ±  7%        ~ (p=0.065 n=6)
    ...
    geomean                                                                     252.6n          260.6n         +3.18%

    pkg: k8s.io/klog/v2/textlogger

    TextloggerOutput/html_characters-36                                              1.982µ ± 12%   1.978µ ±  6%        ~ (p=0.937 n=6)
    TextloggerOutput/map_values-36                                                   3.215µ ±  6%   3.429µ ±  4%   +6.67% (p=0.041 n=6)
    ...
    TextloggerOutput/mixed_duplicates-36                                             19.33µ ±  5%   19.59µ ±  7%        ~ (p=0.818 n=6)
    ...
    TextloggerOutput/print_duplicate_keys_in_arguments-36                            2.067µ ±  7%   2.388µ ±  6%  +15.53% (p=0.002 n=6)
    ...
    TextloggerOutput/duplicates-36                                                   16.80µ ± 11%   17.06µ ±  4%        ~ (p=0.310 n=6)
    ...
    geomean                                                                          2.798µ         2.860µ         +2.24%
Suppose the same key/value pair gets logged multiple times:

  WithValues("sameKey", 1).WithValues("newKey", 2).WithValues("sameKey", 1)

Previously, the de-duplication code would have emitted it near the end:

    "hello world" newKey=2 sameKey=1

Now, it gets emitted at the place where it first appeared:

    "hello world" sameKey=1 newKey=2

This makes log entries more consistent when some repeat the key/value pair and
others don't.

Performance is about the same as before.

    goos: linux
    goarch: amd64
    pkg: k8s.io/klog/examples/benchmarks
    cpu: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
                                                                            │    after1    │                after2                │
                                                                            │    sec/op    │    sec/op      vs base               │
    ...
    geomean                                                                   3.442µ          3.445µ         +0.09%

    pkg: k8s.io/klog/examples/output_test
    geomean                                                                    5.646µ         5.631µ        -0.26%

    pkg: k8s.io/klog/v2
    geomean                                                                     260.6n          260.5n        -0.07%

    pkg: k8s.io/klog/v2/textlogger

    TextloggerOutput/odd_WithValues-36                                               6.956µ ±  4%   7.263µ ±  4%  +4.42% (p=0.041 n=6)
    ...
    TextloggerOutput/log_with_multiple_names_and_values-36                           3.353µ ±  4%   3.172µ ±  2%  -5.40% (p=0.002 n=6)
    ...
    TextloggerOutput/duplicates-36                                                   17.06µ ±  4%   17.16µ ±  9%       ~ (p=0.937 n=6)
    ...
    TextloggerOutput/mixed_duplicates-36                                             19.59µ ±  7%   20.35µ ±  5%       ~ (p=0.065 n=6)
    ...
    TextloggerOutput/print_duplicate_keys_in_arguments-36                            2.388µ ±  6%   2.218µ ±  7%  -7.12% (p=0.026 n=6)
    ...
    geomean                                                                          2.860µ         2.849µ        -0.38%
This is a follow-up which removes several functions which don't provide enough
additional value anymore. This wasn't done earlier to keep the previous commits
simple.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants