
Add basic decoding for v0.4 msgpack traces #545

Merged

ekump merged 10 commits into main from ekump/APMSP-1214-removing-serde-from-deserialization on Aug 6, 2024

Conversation

Contributor

@ekump ekump commented Jul 22, 2024

What does this PR do?

Deserialize msgpack payloads to protobuf (PB) structs without using serde.

Originally authored by @hoolioh, this PR replaces the use of serde for decoding msgpack v04 trace payloads.

  • Adds methods to decode String and &str values (a minimal sketch follows this list)
  • Adds a Number abstraction layer in order to decode integers and floats
  • Implements a decoder for Spans
  • Implements a decoder for meta attributes
  • Implements a decoder for metrics attributes
  • Implements a decoder for SpanLinks
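
For orientation, here is a minimal sketch of the zero-copy string decoding idea, built directly on rmp. The helper name read_str_and_advance is illustrative and not necessarily the name used in this PR.

use rmp::decode::{read_str_from_slice, DecodeStringError};

// Decode a msgpack string as a borrowed &str and advance the caller's slice
// past it, instead of allocating a String via serde.
fn read_str_and_advance<'a>(buf: &mut &'a [u8]) -> Result<&'a str, DecodeStringError<'a>> {
    let (s, rest) = read_str_from_slice(*buf)?;
    *buf = rest; // continue decoding from the remaining bytes
    Ok(s)
}

fn main() {
    // 0xa5 is the msgpack fixstr marker for a 5-byte string, followed by "hello".
    let payload = [0xa5, b'h', b'e', b'l', b'l', b'o'];
    let mut buf: &[u8] = &payload;
    let s = read_str_and_advance(&mut buf).unwrap();
    assert_eq!(s, "hello");
    assert!(buf.is_empty());
}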

Motivation

This is part of a larger effort to reduce memory allocations when sending traces through the data-pipeline. After we stop using serde we will move on to improving the internal representation of traces to use references rather than allocating new strings.

Additional Notes

Use of the new deserialization path in the trace-utils integration tests will be introduced in a follow-up PR.

How to test the change?

Unit and fuzz tests are included.

@codecov-commenter

codecov-commenter commented Jul 22, 2024

Codecov Report

Attention: Patch coverage is 96.08611% with 40 lines in your changes missing coverage. Please review.

Project coverage is 71.27%. Comparing base (737c1f2) to head (87c03a4).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #545      +/-   ##
==========================================
+ Coverage   70.44%   71.27%   +0.83%     
==========================================
  Files         214      219       +5     
  Lines       28884    29918    +1034     
==========================================
+ Hits        20346    21323     +977     
- Misses       8538     8595      +57     
Components Coverage Δ
crashtracker 21.20% <ø> (ø)
datadog-alloc 98.73% <ø> (ø)
data-pipeline 50.00% <ø> (ø)
data-pipeline-ffi 0.00% <ø> (ø)
ddcommon 83.07% <ø> (ø)
ddcommon-ffi 70.20% <ø> (ø)
ddtelemetry 58.95% <ø> (ø)
ipc 84.18% <ø> (ø)
profiling 84.26% <ø> (ø)
profiling-ffi 77.42% <ø> (ø)
serverless 0.00% <ø> (ø)
sidecar 34.55% <ø> (ø)
sidecar-ffi 0.00% <ø> (ø)
spawn-worker 54.87% <ø> (ø)
trace-mini-agent 70.88% <ø> (ø)
trace-normalization 98.24% <ø> (ø)
trace-obfuscation 95.73% <ø> (ø)
trace-protobuf 77.16% <ø> (ø)
trace-utils 92.96% <96.08%> (+0.57%) ⬆️

@pr-commenter

pr-commenter bot commented Jul 22, 2024

Benchmarks

Comparison

Benchmark execution time: 2024-08-06 12:54:44

Comparing candidate commit 87c03a4 in PR branch ekump/APMSP-1214-removing-serde-from-deserialization with baseline commit 737c1f2 in branch main.

Found 1 performance improvement and 1 performance regression! Performance is the same for 41 metrics; 1 metric is unstable.

scenario:benching deserializing traces from msgpack to their internal representation

  • 🟩 execution_time [-1.102µs; -1.099µs] or [-30.072%; -29.998%]

scenario:tags/replace_trace_tags

  • 🟥 execution_time [+130.551ns; +133.498ns] or [+3.022%; +3.090%]

Candidate

Candidate benchmark details

Group 1

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz 87c03a4 1722947865 ekump/APMSP-1214-removing-serde-from-deserialization
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
two way interface execution_time 29.104µs 52.870µs ± 13.680µs 53.372µs ± 1.560µs 55.590µs 64.229µs 81.871µs 179.258µs 235.87% 3.644 35.393 25.81% 0.967µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
two way interface execution_time [50.975µs; 54.766µs] or [-3.586%; +3.586%] None None None

Group 2

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz 87c03a4 1722947865 ekump/APMSP-1214-removing-serde-from-deserialization
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time 700.712ns 702.237ns ± 1.690ns 702.141ns ± 0.421ns 702.527ns 703.219ns 708.018ns 720.256ns 2.58% 7.593 71.551 0.24% 0.119ns 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time 579.318ns 599.928ns ± 7.233ns 599.966ns ± 2.983ns 602.940ns 606.097ns 610.567ns 673.124ns 12.19% 4.834 51.285 1.20% 0.511ns 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time 255.916ns 270.597ns ± 9.210ns 276.616ns ± 1.393ns 277.688ns 278.121ns 278.458ns 278.690ns 0.75% -0.708 -1.448 3.39% 0.651ns 1 200
normalization/normalize_service/normalize_service/[empty string] execution_time 90.146ns 108.348ns ± 14.218ns 103.298ns ± 8.420ns 127.927ns 128.384ns 128.678ns 128.767ns 24.66% 0.538 -1.450 13.09% 1.005ns 1 200
normalization/normalize_service/normalize_service/test_ASCII execution_time 96.178ns 110.590ns ± 13.232ns 103.312ns ± 6.222ns 127.189ns 128.298ns 129.141ns 129.522ns 25.37% 0.440 -1.662 11.93% 0.936ns 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time [702.003ns; 702.471ns] or [-0.033%; +0.033%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time [598.926ns; 600.930ns] or [-0.167%; +0.167%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time [269.320ns; 271.873ns] or [-0.472%; +0.472%] None None None
normalization/normalize_service/normalize_service/[empty string] execution_time [106.377ns; 110.318ns] or [-1.819%; +1.819%] None None None
normalization/normalize_service/normalize_service/test_ASCII execution_time [108.756ns; 112.424ns] or [-1.658%; +1.658%] None None None

Group 3

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz 87c03a4 1722947865 ekump/APMSP-1214-removing-serde-from-deserialization
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
redis/obfuscate_redis_string execution_time 65.448µs 65.613µs ± 0.063µs 65.611µs ± 0.016µs 65.629µs 65.653µs 65.710µs 66.239µs 0.96% 5.688 53.461 0.10% 0.004µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
redis/obfuscate_redis_string execution_time [65.604µs; 65.622µs] or [-0.013%; +0.013%] None None None

Group 4

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz 87c03a4 1722947865 ekump/APMSP-1214-removing-serde-from-deserialization
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
write only interface execution_time 3.177µs 4.182µs ± 2.612µs 3.963µs ± 0.038µs 3.996µs 4.161µs 5.499µs 32.438µs 718.43% 9.862 97.097 62.31% 0.185µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
write only interface execution_time [3.820µs; 4.544µs] or [-8.658%; +8.658%] None None None

Group 5

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz 87c03a4 1722947865 ekump/APMSP-1214-removing-serde-from-deserialization
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
sql/obfuscate_sql_string execution_time 103.846µs 104.162µs ± 0.148µs 104.130µs ± 0.082µs 104.225µs 104.417µs 104.670µs 104.994µs 0.83% 1.648 5.320 0.14% 0.010µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
sql/obfuscate_sql_string execution_time [104.142µs; 104.183µs] or [-0.020%; +0.020%] None None None

Group 6

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz 87c03a4 1722947865 ekump/APMSP-1214-removing-serde-from-deserialization
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching string interning on wordpress profile execution_time 184.442µs 185.229µs ± 0.493µs 185.127µs ± 0.205µs 185.366µs 185.988µs 187.408µs 187.878µs 1.49% 2.403 8.506 0.27% 0.035µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching string interning on wordpress profile execution_time [185.160µs; 185.297µs] or [-0.037%; +0.037%] None None None

Group 7

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz 87c03a4 1722947865 ekump/APMSP-1214-removing-serde-from-deserialization
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_trace/test_trace execution_time 458.326ns 465.061ns ± 5.124ns 464.179ns ± 1.498ns 465.956ns 471.545ns 478.236ns 519.481ns 11.91% 6.607 63.350 1.10% 0.362ns 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_trace/test_trace execution_time [464.351ns; 465.771ns] or [-0.153%; +0.153%] None None None

Group 8

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz 87c03a4 1722947865 ekump/APMSP-1214-removing-serde-from-deserialization
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time 273.745ns 280.074ns ± 3.322ns 280.692ns ± 2.377ns 282.797ns 284.005ns 284.604ns 295.190ns 5.17% 0.126 0.551 1.18% 0.235ns 1 200
normalization/normalize_name/normalize_name/bad-name execution_time 44.398ns 44.735ns ± 0.143ns 44.707ns ± 0.079ns 44.812ns 45.016ns 45.102ns 45.275ns 1.27% 0.728 1.058 0.32% 0.010ns 1 200
normalization/normalize_name/normalize_name/good execution_time 35.903ns 36.204ns ± 0.380ns 36.110ns ± 0.071ns 36.237ns 36.524ns 36.997ns 40.754ns 12.86% 8.981 101.513 1.05% 0.027ns 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time [279.614ns; 280.535ns] or [-0.164%; +0.164%] None None None
normalization/normalize_name/normalize_name/bad-name execution_time [44.715ns; 44.755ns] or [-0.044%; +0.044%] None None None
normalization/normalize_name/normalize_name/good execution_time [36.151ns; 36.257ns] or [-0.146%; +0.146%] None None None

Group 9

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz 87c03a4 1722947865 ekump/APMSP-1214-removing-serde-from-deserialization
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching deserializing traces from msgpack to their internal representation execution_time 2.552µs 2.564µs ± 0.008µs 2.563µs ± 0.003µs 2.565µs 2.577µs 2.598µs 2.605µs 1.64% 2.804 10.545 0.30% 0.001µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching deserializing traces from msgpack to their internal representation execution_time [2.563µs; 2.565µs] or [-0.042%; +0.042%] None None None

Group 10

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz 87c03a4 1722947865 ekump/APMSP-1214-removing-serde-from-deserialization
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
credit_card/is_card_number/ execution_time 6.800ns 6.923ns ± 0.098ns 6.909ns ± 0.061ns 6.980ns 7.122ns 7.206ns 7.212ns 4.37% 0.917 0.577 1.41% 0.007ns 1 200
credit_card/is_card_number/ throughput 138665279.347op/s 144464618.232op/s ± 2020173.596op/s 144729556.307op/s ± 1291583.399op/s 145925567.719op/s 146891899.843op/s 147037984.478op/s 147052477.425op/s 1.61% -0.845 0.401 1.39% 142847.845op/s 1 200
credit_card/is_card_number/ 3782-8224-6310-005 execution_time 125.746ns 128.065ns ± 0.627ns 128.154ns ± 0.374ns 128.466ns 128.934ns 129.224ns 129.772ns 1.26% -0.658 1.088 0.49% 0.044ns 1 200
credit_card/is_card_number/ 3782-8224-6310-005 throughput 7705829.466op/s 7808696.806op/s ± 38358.778op/s 7803129.942op/s ± 22748.420op/s 7831282.629op/s 7877179.914op/s 7913340.077op/s 7952517.721op/s 1.91% 0.697 1.164 0.49% 2712.375op/s 1 200
credit_card/is_card_number/ 378282246310005 execution_time 121.227ns 122.915ns ± 0.562ns 122.969ns ± 0.401ns 123.321ns 123.725ns 123.894ns 124.326ns 1.10% -0.516 0.168 0.46% 0.040ns 1 200
credit_card/is_card_number/ 378282246310005 throughput 8043356.917op/s 8135898.323op/s ± 37307.838op/s 8132112.365op/s ± 26544.463op/s 8160056.036op/s 8205134.949op/s 8237704.366op/s 8248968.609op/s 1.44% 0.542 0.208 0.46% 2638.063op/s 1 200
credit_card/is_card_number/37828224631 execution_time 6.800ns 6.876ns ± 0.088ns 6.852ns ± 0.043ns 6.915ns 7.050ns 7.201ns 7.215ns 5.30% 1.798 3.534 1.28% 0.006ns 1 200
credit_card/is_card_number/37828224631 throughput 138602397.353op/s 145449650.278op/s ± 1822479.513op/s 145945707.493op/s ± 930890.461op/s 146866370.753op/s 146899657.807op/s 147041160.653op/s 147051675.612op/s 0.76% -1.712 3.138 1.25% 128868.762op/s 1 200
credit_card/is_card_number/378282246310005 execution_time 117.995ns 120.274ns ± 0.800ns 120.242ns ± 0.602ns 120.990ns 121.343ns 121.556ns 121.640ns 1.16% -0.444 -0.389 0.66% 0.057ns 1 200
credit_card/is_card_number/378282246310005 throughput 8220989.096op/s 8314684.600op/s ± 55483.735op/s 8316592.590op/s ± 41747.039op/s 8351059.453op/s 8420409.368op/s 8450943.684op/s 8474933.324op/s 1.90% 0.473 -0.338 0.67% 3923.293op/s 1 200
credit_card/is_card_number/37828224631000521389798 execution_time 106.175ns 108.248ns ± 0.701ns 108.529ns ± 0.344ns 108.793ns 108.958ns 109.184ns 109.975ns 1.33% -0.977 0.544 0.65% 0.050ns 1 200
credit_card/is_card_number/37828224631000521389798 throughput 9092979.277op/s 9238421.301op/s ± 60186.421op/s 9214153.865op/s ± 29218.569op/s 9278843.089op/s 9340906.285op/s 9416844.026op/s 9418430.853op/s 2.22% 1.008 0.620 0.65% 4255.823op/s 1 200
credit_card/is_card_number/x371413321323331 execution_time 61.303ns 62.969ns ± 0.678ns 63.113ns ± 0.578ns 63.571ns 63.808ns 63.893ns 64.047ns 1.48% -0.424 -1.021 1.07% 0.048ns 1 200
credit_card/is_card_number/x371413321323331 throughput 15613531.887op/s 15882551.876op/s ± 171844.207op/s 15844646.655op/s ± 144233.244op/s 16041005.227op/s 16163719.383op/s 16240546.760op/s 16312342.502op/s 2.95% 0.450 -0.986 1.08% 12151.220op/s 1 200
credit_card/is_card_number_no_luhn/ execution_time 7.200ns 7.327ns ± 0.075ns 7.311ns ± 0.024ns 7.335ns 7.493ns 7.601ns 7.605ns 4.02% 2.043 4.647 1.02% 0.005ns 1 200
credit_card/is_card_number_no_luhn/ throughput 131497729.547op/s 136496729.549op/s ± 1368180.233op/s 136781728.689op/s ± 441274.957op/s 137218809.235op/s 137996207.841op/s 138376388.976op/s 138882721.810op/s 1.54% -1.968 4.363 1.00% 96744.952op/s 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time 106.997ns 108.953ns ± 0.674ns 109.008ns ± 0.413ns 109.347ns 110.015ns 110.381ns 110.834ns 1.67% -0.333 0.553 0.62% 0.048ns 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput 9022530.583op/s 9178658.656op/s ± 56911.319op/s 9173619.702op/s ± 34745.999op/s 9214203.586op/s 9270576.219op/s 9336615.334op/s 9346052.572op/s 1.88% 0.379 0.591 0.62% 4024.238op/s 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time 103.503ns 105.000ns ± 0.423ns 105.007ns ± 0.244ns 105.254ns 105.525ns 105.992ns 107.183ns 2.07% 0.260 3.373 0.40% 0.030ns 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 throughput 9329859.482op/s 9523936.641op/s ± 38308.920op/s 9523162.340op/s ± 22157.484op/s 9544676.531op/s 9587000.643op/s 9610231.231op/s 9661597.818op/s 1.45% -0.197 3.178 0.40% 2708.850op/s 1 200
credit_card/is_card_number_no_luhn/37828224631 execution_time 7.214ns 7.335ns ± 0.079ns 7.313ns ± 0.027ns 7.346ns 7.512ns 7.601ns 7.625ns 4.26% 1.862 3.690 1.08% 0.006ns 1 200
credit_card/is_card_number_no_luhn/37828224631 throughput 131147117.944op/s 136350068.944op/s ± 1444878.435op/s 136735588.117op/s ± 501566.549op/s 137155737.945op/s 137887133.029op/s 138488841.833op/s 138618541.786op/s 1.38% -1.791 3.430 1.06% 102168.334op/s 1 200
credit_card/is_card_number_no_luhn/378282246310005 execution_time 98.680ns 101.321ns ± 0.923ns 101.358ns ± 0.540ns 101.908ns 102.663ns 103.088ns 103.153ns 1.77% -0.463 0.013 0.91% 0.065ns 1 200
credit_card/is_card_number_no_luhn/378282246310005 throughput 9694290.924op/s 9870431.853op/s ± 90349.816op/s 9865994.909op/s ± 52558.637op/s 9917789.891op/s 10050774.697op/s 10097896.032op/s 10133761.222op/s 2.71% 0.512 0.067 0.91% 6388.697op/s 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time 106.149ns 107.971ns ± 0.793ns 108.068ns ± 0.536ns 108.623ns 108.974ns 109.264ns 109.410ns 1.24% -0.605 -0.355 0.73% 0.056ns 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput 9139947.317op/s 9262282.011op/s ± 68296.890op/s 9253450.242op/s ± 45666.595op/s 9295932.607op/s 9410398.920op/s 9415935.175op/s 9420681.936op/s 1.81% 0.633 -0.315 0.74% 4829.319op/s 1 200
credit_card/is_card_number_no_luhn/x371413321323331 execution_time 61.424ns 63.116ns ± 0.661ns 63.214ns ± 0.418ns 63.545ns 64.059ns 64.276ns 64.400ns 1.88% -0.755 0.413 1.04% 0.047ns 1 200
credit_card/is_card_number_no_luhn/x371413321323331 throughput 15528011.508op/s 15845507.282op/s ± 167364.646op/s 15819179.689op/s ± 104704.156op/s 15944199.462op/s 16233420.632op/s 16269949.816op/s 16280221.242op/s 2.91% 0.813 0.504 1.05% 11834.468op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
credit_card/is_card_number/ execution_time [6.910ns; 6.937ns] or [-0.196%; +0.196%] None None None
credit_card/is_card_number/ throughput [144184641.601op/s; 144744594.864op/s] or [-0.194%; +0.194%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 execution_time [127.979ns; 128.152ns] or [-0.068%; +0.068%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 throughput [7803380.649op/s; 7814012.964op/s] or [-0.068%; +0.068%] None None None
credit_card/is_card_number/ 378282246310005 execution_time [122.837ns; 122.993ns] or [-0.063%; +0.063%] None None None
credit_card/is_card_number/ 378282246310005 throughput [8130727.815op/s; 8141068.830op/s] or [-0.064%; +0.064%] None None None
credit_card/is_card_number/37828224631 execution_time [6.864ns; 6.889ns] or [-0.178%; +0.178%] None None None
credit_card/is_card_number/37828224631 throughput [145197072.145op/s; 145702228.410op/s] or [-0.174%; +0.174%] None None None
credit_card/is_card_number/378282246310005 execution_time [120.164ns; 120.385ns] or [-0.092%; +0.092%] None None None
credit_card/is_card_number/378282246310005 throughput [8306995.088op/s; 8322374.112op/s] or [-0.092%; +0.092%] None None None
credit_card/is_card_number/37828224631000521389798 execution_time [108.151ns; 108.345ns] or [-0.090%; +0.090%] None None None
credit_card/is_card_number/37828224631000521389798 throughput [9230080.042op/s; 9246762.560op/s] or [-0.090%; +0.090%] None None None
credit_card/is_card_number/x371413321323331 execution_time [62.875ns; 63.063ns] or [-0.149%; +0.149%] None None None
credit_card/is_card_number/x371413321323331 throughput [15858735.922op/s; 15906367.830op/s] or [-0.150%; +0.150%] None None None
credit_card/is_card_number_no_luhn/ execution_time [7.317ns; 7.337ns] or [-0.142%; +0.142%] None None None
credit_card/is_card_number_no_luhn/ throughput [136307112.928op/s; 136686346.171op/s] or [-0.139%; +0.139%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time [108.859ns; 109.046ns] or [-0.086%; +0.086%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput [9170771.295op/s; 9186546.018op/s] or [-0.086%; +0.086%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time [104.942ns; 105.059ns] or [-0.056%; +0.056%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 throughput [9518627.393op/s; 9529245.889op/s] or [-0.056%; +0.056%] None None None
credit_card/is_card_number_no_luhn/37828224631 execution_time [7.324ns; 7.346ns] or [-0.150%; +0.150%] None None None
credit_card/is_card_number_no_luhn/37828224631 throughput [136149822.689op/s; 136550315.199op/s] or [-0.147%; +0.147%] None None None
credit_card/is_card_number_no_luhn/378282246310005 execution_time [101.193ns; 101.449ns] or [-0.126%; +0.126%] None None None
credit_card/is_card_number_no_luhn/378282246310005 throughput [9857910.237op/s; 9882953.468op/s] or [-0.127%; +0.127%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time [107.861ns; 108.080ns] or [-0.102%; +0.102%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput [9252816.719op/s; 9271747.303op/s] or [-0.102%; +0.102%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 execution_time [63.025ns; 63.208ns] or [-0.145%; +0.145%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 throughput [15822312.152op/s; 15868702.412op/s] or [-0.146%; +0.146%] None None None

Group 11

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz 87c03a4 1722947865 ekump/APMSP-1214-removing-serde-from-deserialization
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
tags/replace_trace_tags execution_time 4.432µs 4.452µs ± 0.007µs 4.453µs ± 0.004µs 4.457µs 4.462µs 4.466µs 4.472µs 0.42% -0.464 0.088 0.16% 0.001µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
tags/replace_trace_tags execution_time [4.451µs; 4.453µs] or [-0.023%; +0.023%] None None None

Baseline

Omitted due to size.

@ekump ekump force-pushed the ekump/APMSP-1214-removing-serde-from-deserialization branch 4 times, most recently from 9d56098 to 066807b on July 26, 2024 13:44
@ekump ekump marked this pull request as ready for review July 26, 2024 13:44
@ekump ekump requested review from a team as code owners July 26, 2024 13:44
@ekump ekump force-pushed the ekump/APMSP-1214-removing-serde-from-deserialization branch from 066807b to 97aadee on July 26, 2024 15:22
Contributor

@paullegranddc paullegranddc left a comment

I don't quite understand what the custom deserializer brings.
The goal is to capture &str instead of String for span fields in the future, right?

But from what I know this can be done with serde, using &'a str references for strings and #[serde(borrow)] annotations, right?
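
For reference, the serde-based zero-copy approach being described looks roughly like this sketch. The struct and field names are illustrative, not the crate's actual Span type, and rmp_serde stands in for the deserializer this PR is replacing.

use serde::Deserialize;

// With borrowed &str fields, the deserializer keeps string data as slices into
// the input buffer instead of allocating new Strings.
#[derive(Deserialize)]
struct BorrowedSpan<'a> {
    #[serde(borrow)]
    service: &'a str,
    #[serde(borrow)]
    name: &'a str,
    duration: i64,
}

// The decoded struct borrows from `payload`, so no string copies are made.
fn decode_span(payload: &[u8]) -> Result<BorrowedSpan<'_>, rmp_serde::decode::Error> {
    rmp_serde::from_slice(payload)
}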

Comment on lines 8 to 13
pub enum Number {
U8(u8),
U32(u32),
U64(u64),
I8(i8),
I32(i32),
I64(i64),
F64(f64),
}
Contributor

I don't really see why you need this runtime representation of how the msgpack is encoded.

You could make the code simpler by having two read functions:

  • read_number_signed<T: TryFrom>() -> Result
  • read_number_unsigned<T: TryFrom>() -> Result

What these functions would do is:

  • Widen the decoded number to i64 or u64, since those types are big enough to represent any signed or unsigned msgpack integer, then call T::try_from to cast to the desired output type.

See this playground for a small implementation of this idea: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=2dfcc2b984bcebb4395179a62087c97f
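
Concretely, the suggestion amounts to something like the following sketch. DecodeError here is a stand-in for the crate's error type from the snippets below, and rmp::decode::read_int does the widening; this is not the playground code verbatim.

use std::convert::TryFrom;

// Stand-in for the DecodeError used elsewhere in this PR.
#[derive(Debug)]
pub enum DecodeError {
    InvalidFormat(String),
    InvalidConversion(String),
}

// Widen any msgpack unsigned integer to u64, then narrow with TryFrom so an
// out-of-range value becomes an error instead of a silent truncation.
pub fn read_number_unsigned<T: TryFrom<u64>>(buf: &mut &[u8]) -> Result<T, DecodeError> {
    let wide: u64 = rmp::decode::read_int(buf)
        .map_err(|_| DecodeError::InvalidFormat("expected an unsigned integer".to_owned()))?;
    T::try_from(wide)
        .map_err(|_| DecodeError::InvalidConversion("integer out of range for target type".to_owned()))
}

// Same idea for signed integers, widening to i64.
pub fn read_number_signed<T: TryFrom<i64>>(buf: &mut &[u8]) -> Result<T, DecodeError> {
    let wide: i64 = rmp::decode::read_int(buf)
        .map_err(|_| DecodeError::InvalidFormat("expected a signed integer".to_owned()))?;
    T::try_from(wide)
        .map_err(|_| DecodeError::InvalidConversion("integer out of range for target type".to_owned()))
}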

Contributor Author

If I understand your suggestion correctly, we'd change Number to be:

pub enum Number {
    Signed(i64),
    Unsigned(u64),
    Float(f64),
}

The existing read_number function wouldn't be simplified much because we still need to handle all the different Marker types. It would wind up looking like:

pub fn read_number(buf: &mut &[u8]) -> Result<Number, DecodeError> {
    match rmp::decode::read_marker(buf)
        .map_err(|_| DecodeError::InvalidFormat("Unable to read marker for number".to_owned()))?
    {
        Marker::FixPos(val) => Ok(Number::Unsigned(val as u64)),
        Marker::FixNeg(val) => Ok(Number::Signed(val as i64)),
        Marker::U8 => Ok(Number::Unsigned(
            buf.read_data_u8().map_err(|_| DecodeError::IOError)? as u64,
        )),
        Marker::U16 => Ok(Number::Unsigned(
            buf.read_data_u16().map_err(|_| DecodeError::IOError)? as u64,
        )),
        Marker::U32 => Ok(Number::Unsigned(
            buf.read_data_u32().map_err(|_| DecodeError::IOError)? as u64,
        )),
        Marker::U64 => Ok(Number::Unsigned(
            buf.read_data_u64().map_err(|_| DecodeError::IOError)?,
        )),
        Marker::I8 => Ok(Number::Signed(
            buf.read_data_i8().map_err(|_| DecodeError::IOError)? as i64,
        )),
        Marker::I16 => Ok(Number::Signed(
            buf.read_data_i16().map_err(|_| DecodeError::IOError)? as i64,
        )),
        Marker::I32 => Ok(Number::Signed(
            buf.read_data_i32().map_err(|_| DecodeError::IOError)? as i64,
        )),
        Marker::I64 => Ok(Number::Signed(
            buf.read_data_i64().map_err(|_| DecodeError::IOError)?,
        )),
        Marker::F32 => Ok(Number::Float(
            buf.read_data_f32().map_err(|_| DecodeError::IOError)? as f64,
        )),
        Marker::F64 => Ok(Number::Float(
            buf.read_data_f64().map_err(|_| DecodeError::IOError)?,
        )),
        _ => Err(DecodeError::InvalidType("Invalid number type".to_owned())),
    }
}

A bigger problem arises when we implement the TryFrom traits and need to downcast.

The following will silently truncate a u64 value to u32:

impl TryFrom<Number> for u32 {
    type Error = DecodeError;
    fn try_from(value: Number) -> Result<Self, Self::Error> {
        match value {
            Number::Unsigned(val) => Ok(val as u32),
            _ => Err(DecodeError::InvalidConversion(format!(
                "unable to convert {} to u32",
                value
            ))),
        }
    }
}

To raise an error when we can't downcast, we wind up with:

impl TryFrom<Number> for u32 {
    type Error = DecodeError;
    fn try_from(value: Number) -> Result<Self, Self::Error> {
        match value {
            Number::Unsigned(val) if val <= u32::MAX as u64 => Ok(val as u32),
            _ => Err(DecodeError::InvalidConversion(format!(
                "unable to convert {} to u32",
                value
            ))),
        }
    }
}

I lean slightly towards the existing implementation, as it seems a bit clearer to me. @hoolioh, you originally implemented this. Do you have any thoughts?

And, apologies if I misunderstood your suggestion.

@hoolioh
Contributor

hoolioh commented Jul 29, 2024

I don't quite understand what the custom deserializer brings. The goal is to capture &str instead of String for span fields in the future, right?

But from what I know this can be done with serde, using &'a str references for strings and #[serde(borrow)] annotations, right?

The idea is to have this as a building block for gradually introducing future performance optimizations. We're evaluating different mechanisms for reducing allocation overhead: string interning, a string arena, memory pools, etc. We also want to test different ways of representing the data. In essence, this gives us the flexibility to try those different approaches easily.

@paullegranddc
Contributor

The idea is to have this as a building block for gradually introducing future performance optimizations. We're evaluating different mechanisms for reducing allocation overhead: string interning, a string arena, memory pools, etc. We also want to test different ways of representing the data. In essence, this gives us the flexibility to try those different approaches easily.

That makes sense.

match decode::read_marker(buf)
.map_err(|_| DecodeError::InvalidFormat("Unable to read marker for map".to_owned()))?
{
Marker::FixMap(len) => {
Contributor

Does the v04 spec impose a limit of fewer than 16 map entries? Wouldn't it be more future-proof to support Map16?

Contributor Author

That's a good question. I'm not aware of anything that says v04 can't have maps with > 16 entries. @hoolioh - Are you aware of anything preventing larger maps from being sent from the client libraries?

I've updated the decoder to also support Map16.
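
For reference, handling both markers in the PR's marker-matching style looks roughly like the sketch below. It reuses the DecodeError variants and read_data_* helpers shown in the earlier snippets and is not necessarily the exact code that was merged.

use rmp::Marker;

// Read a map length, accepting both FixMap (fewer than 16 entries) and Map16.
fn read_map_len(buf: &mut &[u8]) -> Result<usize, DecodeError> {
    match rmp::decode::read_marker(buf)
        .map_err(|_| DecodeError::InvalidFormat("Unable to read marker for map".to_owned()))?
    {
        Marker::FixMap(len) => Ok(len as usize),
        Marker::Map16 => {
            // The Map16 marker is followed by a big-endian u16 entry count.
            let len = buf.read_data_u16().map_err(|_| DecodeError::IOError)?;
            Ok(len as usize)
        }
        _ => Err(DecodeError::InvalidType("Invalid map format".to_owned())),
    }
}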

@ekump ekump force-pushed the ekump/APMSP-1214-removing-serde-from-deserialization branch 3 times, most recently from 8e81eb8 to 9007960 on July 31, 2024 20:59
@ekump ekump requested a review from a team as a code owner August 1, 2024 23:35
@ekump ekump force-pushed the ekump/APMSP-1214-removing-serde-from-deserialization branch 2 times, most recently from 654cd8a to 1b29d18 on August 1, 2024 23:37
Contributor

@bantonsson bantonsson left a comment

This looks good to me overall, and the performance numbers are promising (-20% even without much in the way of optimizations).

}

#[inline]
fn read_string(buf: &mut &[u8]) -> Result<String, DecodeError> {
Contributor

Can't this function just use read_string_ref, and then bump the buf like the code in fill_span?

Contributor Author

Yes, that works. Good idea!
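
The resulting shape is roughly the sketch below; read_string_ref is assumed to return the borrowed &str together with the remaining input slice, as in the fill_span snippets on this page.

#[inline]
fn read_string(buf: &mut &[u8]) -> Result<String, DecodeError> {
    // Borrow the string, copy it once, and bump the caller's buffer past it.
    let (value, next) = read_string_ref(buf)?;
    *buf = next;
    Ok(value.to_owned())
}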


match key {
SpanKey::Service => {
let (value, next) = read_string_ref(buf)?;
Contributor

Could this use read_string?

Contributor

This and all the other ones below that directly convert the &str to a String can use read_string.

@ekump ekump force-pushed the ekump/APMSP-1214-removing-serde-from-deserialization branch from 2737928 to 58cbd82 on August 2, 2024 18:22
@ekump ekump requested a review from a team as a code owner August 5, 2024 13:53
@ekump ekump force-pushed the ekump/APMSP-1214-removing-serde-from-deserialization branch from 3548fb9 to 4421efd on August 5, 2024 13:56
Contributor

@bantonsson bantonsson left a comment

The performance regression is real and can be fixed.


match key {
SpanKey::Service => {
let (value, next) = read_string_ref(buf)?;
Contributor

This and all the other ones below that directly convert the &str to a String can use read_string.

hoolioh and others added 3 commits August 6, 2024 08:37
* Add methods to get String and &str values.
* Add a Number abstraction layer in order to decode integers and floats.
* Implement decoder for Span.
* Implement decoder for meta attributes.
* Implement decoder for metrics attributes.
* Implement decoder for SpanLinks.
* Add tests.
- introduce enums for span_link and span keys and switch from long if
  statements to matches to codify all possible enumerations (sketched below).

- include trace payload schema version in namespace for decoder.

- break up decoder into multiple files to be easier to follow.
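
The "enums for span keys" change amounts to roughly the following pattern; this sketch uses an abridged variant set rather than the full key list from the PR.

// Map each map-key string to a SpanKey variant once, then match exhaustively
// when filling in the span fields.
enum SpanKey {
    Service,
    Name,
    Resource,
    Meta,
    Metrics,
}

impl std::str::FromStr for SpanKey {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "service" => Ok(SpanKey::Service),
            "name" => Ok(SpanKey::Name),
            "resource" => Ok(SpanKey::Resource),
            "meta" => Ok(SpanKey::Meta),
            "metrics" => Ok(SpanKey::Metrics),
            _ => Err(format!("unknown span key: {s}")),
        }
    }
}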
@ekump ekump force-pushed the ekump/APMSP-1214-removing-serde-from-deserialization branch from 4421efd to 87c03a4 on August 6, 2024 12:37
@ekump
Contributor Author

ekump commented Aug 6, 2024

The performance regression is real and can be fixed.

@bantonsson - Not only fixed, but it looks like we improved on it:

scenario:benching deserializing traces from msgpack to their internal representation

  • 🟩 execution_time [-1.102µs; -1.099µs] or [-30.072%; -29.998%]

scenario:tags/replace_trace_tags

  • 🟥 execution_time [+130.551ns; +133.498ns] or [+3.022%; +3.090%]

I think the replace_trace_tags regression is a false positive. The changes shouldn't influence that benchmark at all from what I can tell.

Contributor

@bantonsson bantonsson left a comment

Now we're back on track. The criterion results are a bit too sensitive to other things happening on the system, but there are definite performance improvements for the deserialization.

@ekump ekump merged commit 7a0e111 into main Aug 6, 2024
34 checks passed
@ekump ekump deleted the ekump/APMSP-1214-removing-serde-from-deserialization branch August 6, 2024 16:46
hoolioh added a commit that referenced this pull request Aug 21, 2024
* Replace the use of rmp_serde with a custom decoder for decoding v04 traces in msgpack format.
* Introduce fuzz testing for trace_utils 

This is a precursor to reducing the number of String allocations that occur when processing traces.

---------

Co-authored-by: Julio Gonzalez <julio.gonzalez@datadoghq.com>