feat: store well defined metrics as times-series data streams #9730

kruskall · 2022-12-02T09:10:19Z

Motivation/summary

Checklist

Update CHANGELOG.asciidoc
Update package changelog.yml (only if changes to apmpackage have been made)
Documentation has been updated

For functional changes, consider:

Is it observable through the addition of either logging or metrics?
Is its use being published in telemetry to enable product improvement?
Have system tests been added to avoid regression?

How to test these changes

Related issues

Closes #9649

mergify · 2022-12-02T09:10:53Z

This pull request does not have a backport label. Could you fix it @kruskall? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-7.x is the label to automatically backport to the 7.x branch.
backport-7.\d is the label to automatically backport to the 7.\d branch. \d is the digit

NOTE: backport-skip has been added to this pull request.

apmmachine · 2022-12-02T09:37:13Z

💚 Build Succeeded

Build stats

Start Time: 2023-02-13T08:12:01.033+0000
Duration: 21 min 49 sec

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate and publish the docker images.
/test windows : Build & tests on Windows.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

apmmachine · 2022-12-02T09:37:17Z

📚 Go benchmark report

Diff with the main branch

goos: linux
goarch: amd64
pkg: github.com/elastic/apm-server/internal/agentcfg
cpu: 12th Gen Intel(R) Core(TM) i5-12500
                                  │ build/main/bench.out │              bench.out              │
                                  │        sec/op        │    sec/op     vs base               │
FetchAndAdd/FetchFromCache-12               46.15n ± ∞ ¹   41.15n ± ∞ ¹  -10.83% (p=0.008 n=5)
geomean                                     69.01n         62.27n         -9.77%
¹ need >= 6 samples for confidence interval at level 0.95

                                  │ build/main/bench.out │              bench.out              │
                                  │         B/op         │    B/op      vs base                │
geomean                                                ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                                  │ build/main/bench.out │              bench.out              │
                                  │      allocs/op       │  allocs/op   vs base                │
geomean                                                ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/elastic/apm-server/internal/beater/request
                                             │ build/main/bench.out │              bench.out              │
                                             │        sec/op        │    sec/op     vs base               │
ContextResetContentEncoding/empty-12                   136.1n ± ∞ ¹   122.1n ± ∞ ¹  -10.29% (p=0.008 n=5)
ContextResetContentEncoding/uncompressed-12            161.5n ± ∞ ¹   145.4n ± ∞ ¹   -9.97% (p=0.008 n=5)
geomean                                                915.8n         968.4n         +5.74%
¹ need >= 6 samples for confidence interval at level 0.95

                                             │ build/main/bench.out │               bench.out               │
                                             │         B/op         │     B/op       vs base                │
geomean                                                           ³                  +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

                                             │ build/main/bench.out │              bench.out              │
                                             │      allocs/op       │  allocs/op   vs base                │
geomean                                                           ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/elastic/apm-server/internal/publish
             │ build/main/bench.out │          bench.out           │
             │        sec/op        │   sec/op     vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

             │ build/main/bench.out │           bench.out            │
             │         B/op         │     B/op       vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

             │ build/main/bench.out │           bench.out           │
             │      allocs/op       │  allocs/op    vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

pkg: github.com/elastic/apm-server/x-pack/apm-server/aggregation/spanmetrics
                 │ build/main/bench.out │           bench.out           │
                 │        sec/op        │    sec/op     vs base         │
¹ need >= 6 samples for confidence interval at level 0.95

                 │ build/main/bench.out │            bench.out             │
                 │         B/op         │     B/op       vs base           │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                 │ build/main/bench.out │           bench.out            │
                 │      allocs/op       │  allocs/op   vs base           │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

pkg: github.com/elastic/apm-server/x-pack/apm-server/aggregation/txmetrics
                        │ build/main/bench.out │             bench.out              │
                        │        sec/op        │    sec/op     vs base              │
AggregateTransaction-12           82.79n ± ∞ ¹   77.36n ± ∞ ¹  -6.56% (p=0.008 n=5)
¹ need >= 6 samples for confidence interval at level 0.95

                        │ build/main/bench.out │           bench.out            │
                        │         B/op         │    B/op      vs base           │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                        │ build/main/bench.out │           bench.out            │
                        │      allocs/op       │  allocs/op   vs base           │
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

pkg: github.com/elastic/apm-server/x-pack/apm-server/sampling
               │ build/main/bench.out │             bench.out              │
               │        sec/op        │    sec/op     vs base              │
geomean                  624.6n         593.3n        -5.01%
¹ need >= 6 samples for confidence interval at level 0.95

               │ build/main/bench.out │               bench.out               │
               │         B/op         │     B/op       vs base                │
Process-12              9.245Ki ± ∞ ¹   9.176Ki ± ∞ ¹  -0.75% (p=0.016 n=5)
geomean                             ³                  -0.38%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

               │ build/main/bench.out │              bench.out              │
               │      allocs/op       │  allocs/op   vs base                │
geomean                             ³                +0.00%               ³
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
³ summaries must be >0 to compute geomean

pkg: github.com/elastic/apm-server/x-pack/apm-server/sampling/eventstorage
                                            │ build/main/bench.out │              bench.out              │
                                            │        sec/op        │    sec/op     vs base               │
WriteTransaction/json_codec_big_tx-12                 9.292µ ± ∞ ¹   4.894µ ± ∞ ¹  -47.33% (p=0.008 n=5)
ReadEvents/json_codec/0_events-12                     352.7n ± ∞ ¹   310.3n ± ∞ ¹  -12.02% (p=0.008 n=5)
ReadEvents/json_codec_big_tx/0_events-12              346.6n ± ∞ ¹   315.5n ± ∞ ¹   -8.97% (p=0.016 n=5)
ReadEvents/nop_codec/0_events-12                      339.1n ± ∞ ¹   308.6n ± ∞ ¹   -8.99% (p=0.008 n=5)
ReadEvents/nop_codec_big_tx/0_events-12               336.5n ± ∞ ¹   306.9n ± ∞ ¹   -8.80% (p=0.016 n=5)
ReadEvents/nop_codec_big_tx/1000_events-12            978.0µ ± ∞ ¹   893.8µ ± ∞ ¹   -8.61% (p=0.032 n=5)
IsTraceSampled/sampled-12                             76.76n ± ∞ ¹   68.49n ± ∞ ¹  -10.77% (p=0.008 n=5)
IsTraceSampled/unsampled-12                           79.13n ± ∞ ¹   71.05n ± ∞ ¹  -10.21% (p=0.008 n=5)
IsTraceSampled/unknown-12                             414.2n ± ∞ ¹   373.3n ± ∞ ¹   -9.87% (p=0.008 n=5)
geomean                                               30.58µ         29.36µ         -3.99%
¹ need >= 6 samples for confidence interval at level 0.95

                                            │ build/main/bench.out │               bench.out                │
                                            │         B/op         │      B/op       vs base                │
WriteTransaction/json_codec_big_tx-12                3.687Ki ± ∞ ¹    3.686Ki ± ∞ ¹  -0.03% (p=0.008 n=5)
ReadEvents/nop_codec_big_tx/100_events-12            244.5Ki ± ∞ ¹    244.7Ki ± ∞ ¹  +0.05% (p=0.032 n=5)
geomean                                              31.39Ki          31.43Ki        +0.16%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

                                            │ build/main/bench.out │              bench.out               │
                                            │      allocs/op       │  allocs/op    vs base                │
geomean                                                144.7          144.7        +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal

report generated with https://pkg.go.dev/golang.org/x/perf/cmd/benchstat

simitt

My understanding is that the index_mode needs to be set per index template, and the fields are used to create dimensions. We need to look into which fields should be used for a dimension, and which metrics should have a time_series_metric definition.

Other things such as look-ahead time might also need to be considered.

This is supposed to be a PoC and we need to test implications on the APM UI. Did you mean to create this PR to be ready for review? I suggest to put it into draft until everything is figured out.
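
To make the pieces mentioned above concrete, here is a minimal Go sketch of an index template body that sets `index.mode`, marks some fields as dimensions, and gives a metric field a `time_series_metric` type. It is not the template used by this PR: the function name, the index pattern, the chosen fields and the look-ahead value are all assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildTSDSTemplate returns an index template body for a time series data
// stream. The helper and its parameters are hypothetical; only the setting and
// mapping parameter names come from Elasticsearch.
func buildTSDSTemplate(pattern string, dimensions []string, metrics map[string]string) map[string]any {
	props := map[string]any{
		"@timestamp": map[string]any{"type": "date"},
	}
	// Each dimension becomes a keyword field flagged as a time series dimension.
	for _, d := range dimensions {
		props[d] = map[string]any{"type": "keyword", "time_series_dimension": true}
	}
	// Each metric gets a time_series_metric type such as "gauge" or "counter".
	for name, kind := range metrics {
		props[name] = map[string]any{"type": "double", "time_series_metric": kind}
	}
	return map[string]any{
		"index_patterns": []string{pattern},
		"data_stream":    map[string]any{},
		"template": map[string]any{
			"settings": map[string]any{
				"index.mode":            "time_series",
				"index.routing_path":    dimensions,
				"index.look_ahead_time": "2h", // arbitrary example value
			},
			"mappings": map[string]any{"properties": props},
		},
	}
}

func main() {
	body := buildTSDSTemplate(
		"metrics-apm.example-*", // hypothetical pattern
		[]string{"service.name", "transaction.type"},
		map[string]string{"example.requests.count": "counter"},
	)
	out, err := json.MarshalIndent(body, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Which fields end up as dimensions and which metrics get a `time_series_metric` type is exactly the open question raised in this comment, so the lists passed in `main` are placeholders.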

kruskall · 2022-12-05T12:54:20Z

> My understanding is that the index_mode needs to be set per index template, and the fields are used to create dimensions. We need to look into which fields should be used for a dimension, and which metrics should have a time_series_metric definition.
>
> Other things such as look-ahead time might also need to be considered.
>
> This is supposed to be a PoC and we need to test implications on the APM UI. Did you mean to create this PR to be ready for review? I suggest to put it into draft until everything is figured out.

Thanks for sharing! 🙇

I think I misunderstood how this should work. I'll read more of the docs and update the PR.

mergify · 2022-12-15T08:58:09Z

This pull request is now in conflicts. Could you fix it @kruskall? 🙏
To fixup this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b feat/tsds-metric upstream/feat/tsds-metric
git merge upstream/main
git push upstream feat/tsds-metric

Remove index.sort.field for internal metrics: illegal_argument_exception: [illegal_argument_exception] Reason: [index.mode=time_series] is incompatible with [index.sort.field]
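
The error above says that `index.mode=time_series` cannot be combined with `index.sort.field`. One way to guard against it, sketched in Go under the assumption that settings are held in a flat map (the helper name is hypothetical and this is not how the PR implements the change):

```go
package main

import "fmt"

// dropSortForTimeSeries returns a copy of settings without "index.sort.field"
// when "index.mode" is "time_series". Only that one setting is handled because
// it is the only conflict the error message reports.
func dropSortForTimeSeries(settings map[string]any) map[string]any {
	out := make(map[string]any, len(settings))
	for k, v := range settings {
		out[k] = v
	}
	if out["index.mode"] == "time_series" {
		delete(out, "index.sort.field")
	}
	return out
}

func main() {
	// Hypothetical settings for an internal metrics data stream.
	internal := map[string]any{
		"index.mode":       "time_series",
		"index.sort.field": "@timestamp",
	}
	fmt.Println(dropSortForTimeSeries(internal))
}
```

The input map is copied rather than modified, so settings for data streams that stay in standard mode keep their sort configuration.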

simitt

@kruskall lmk once this should be reviewed again. I know that there is currently a Kibana blocker, but we also discussed focusing on the dimensions.

simitt · 2023-01-17T12:37:59Z

@kruskall we discussed that (almost) all of the fields that are part of the transaction metrics aggregation key should be part of the dimensions. I find the following fields from the key missing in this PR:

faasColdstart
faasName
hostOSPlatform
kubernetesPodName
cloudRegion
cloudAvailabilityZone
cloudAccountID
cloudAccountName
cloudMachineType
cloudProjectID
cloudProjectName
serviceNodeName
transactionName
transactionResult
transactionType
eventOutcome
faasTriggerType
hostHostname
hostName
containerID
traceRoot

Can you explain why you excluded these fields from the TSDB key? If in doubt, I'd start out with adding all the fields from the aggregation key to the dimensions and see if that causes any functional or performance issues.

Since the goal of this PoC is not to merge the change to TSDB but to evaluate potential issues on the UI or on performance, please update as soon as possible, so that the work on the evaluation can start.

kruskall · 2023-01-18T04:14:14Z

> @kruskall we discussed that (almost) all of the fields that are part of the transaction metrics aggregation key should be part of the dimensions. I find the following fields from the key missing in this PR:
>
> faasColdstart
> faasName
> hostOSPlatform
> kubernetesPodName
> cloudRegion
> cloudAvailabilityZone
> cloudAccountID
> cloudAccountName
> cloudMachineType
> cloudProjectID
> cloudProjectName
> serviceNodeName
> transactionName
> transactionResult
> transactionType
> eventOutcome
> faasTriggerType
> hostHostname
> hostName
> containerID
> traceRoot
>
> Can you explain why you excluded these fields from the TSDB key? If in doubt, I'd start out with adding all the fields from the aggregation key to the dimensions and see if that causes any functional or performance issues.
>
> Since the goal of this PoC is not to merge the change to TSDB but to evaluate potential issues on the UI or on performance, please update as soon as possible, so that the work on the evaluation can start.

@simitt I was trying to test the changes out progressively. Unfortunately, Rally is uploading the corpora, which takes an absurd amount of time on slow connections. I've added all the dimensions, but I'm unable to get any numbers at the moment.

simitt · 2023-02-20T12:28:57Z

Closing this for now until the blocker with limiting the number of dimensions is closed.

StephanErb · 2024-03-14T23:11:27Z

> Closing this for now until the blocker with limiting the number of dimensions is closed.

@simitt @salvatore-campagna @felixbarny now that elastic/elasticsearch#93564 is solved and the TSDB dimension limit is gone, is it possible to reopen this and bring TSDB support to APM?

mergify bot added the backport-skip Skip notification from the automated backport with mergify label Dec 2, 2022

simitt requested changes Dec 5, 2022

kruskall marked this pull request as draft December 5, 2022 12:55

kruskall force-pushed the feat/tsds-metric branch 2 times, most recently from bb72beb to a9d756a Compare December 15, 2022 08:57

feat: store well defined metrics as times-series data streams

e070f52

Remove index.sort.field for internal metrics: illegal_argument_exception: [illegal_argument_exception] Reason: [index.mode=time_series] is incompatible with [index.sort.field]

kruskall force-pushed the feat/tsds-metric branch from a9d756a to e070f52 Compare December 15, 2022 12:08

kruskall marked this pull request as ready for review December 16, 2022 01:08

kruskall added 3 commits December 19, 2022 06:20

Merge branch 'main' into feat/tsds-metric

9501b0d

Merge branch 'main' into feat/tsds-metric

0cdb799

Merge branch 'main' into feat/tsds-metric

ea9a754

simitt reviewed Dec 23, 2022

kruskall added 8 commits December 29, 2022 04:26

Merge branch 'main' into feat/tsds-metric

31d9f61

feat: use different dimensions

ee7d617

Merge branch 'main' into feat/tsds-metric

04754cc

Merge branch 'main' into feat/tsds-metric

be446c9

feat: update dimension to be consistent with aggregation keys

221921c

feat: add agent.name as dimension

47b3385

feat: add more dimensions

5608987

feat: add more dimensions

7a99baa

simitt mentioned this pull request Jan 16, 2023

PoC: store well defined metrics as times-series data streams #9649

Open

Merge branch 'main' into feat/tsds-metric

4c0bd68

feat: add missing dimensions

640a3d6

kruskall added 6 commits January 28, 2023 05:13

Merge branch 'main' into feat/tsds-metric

e6a6a9b

feat: remove faas.coldstart from dimensions

cdec954

Merge branch 'main' into feat/tsds-metric

54a5966

fix mapper parsin exception

2c29a71

fix: attempt to reduce the number of dimensions to 16

e3f691b

Merge branch 'main' into feat/tsds-metric

60e139f

felixbarny mentioned this pull request Feb 13, 2023

[ECS] [TSDB] Centralisation of Dimension Fields elastic/integrations#5193

Closed

simitt closed this Feb 20, 2023

kruskall deleted the feat/tsds-metric branch April 15, 2024 08:37
