reduced allocation v04 span representation #598

Merged
merged 19 commits into main from ekump/APMSP-1217-reduced-alloc-span-structs
Sep 19, 2024

Conversation

ekump
Contributor

@ekump ekump commented Aug 26, 2024

What does this PR do?

Deserialize v04 spans into a new internal representation that attempts to minimize new String allocations. Using the existing protobuf definitions leads to a level of performance that tracer libraries are, or will be, uncomfortable with.

Introduce a BytesString that borrows from a tinybytes::Bytes buffer slice, and a BufferWrapper that keeps track of where in the bytes buffer we are slicing from.

This only affects v04 spans at the moment.
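
A minimal sketch of the idea, for illustration only (the field layout and method here are assumptions, not necessarily this PR's actual API; it only relies on Bytes being cheaply cloneable and dereferencing to [u8]):

use tinybytes::Bytes;

// Illustrative only: a string view that keeps the shared buffer alive
// instead of allocating a new String for every span field.
pub struct BytesString {
    bytes: Bytes, // cheaply cloneable handle to the underlying buffer
    start: usize, // byte offset of the string within `bytes`
    end: usize,   // exclusive end offset
}

impl BytesString {
    // Assumes the range was validated as UTF-8 when the span was decoded.
    pub fn as_str(&self) -> &str {
        let buf: &[u8] = &self.bytes; // deref coercion from Bytes to [u8]
        unsafe { std::str::from_utf8_unchecked(&buf[self.start..self.end]) }
    }
}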

Motivation

The PHP Language Platform team observed an unacceptable increase in memory footprint when testing the use of the Sidecar for sending traces to the agent.

Additional Notes

The use of non-static &str or Cow types was not possible: once spans are decoded they may need to be passed to async code paths in SendData on the way to the agent, and this caused several lifetime issues. The use of tinybytes is necessary because Bytes doesn't support &[u8] input, which is needed when we can't own the data (e.g. shared memory from the tracer using IPC).
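
A rough illustration of the lifetime issue with borrowed data (Span, decode, and handle below are made-up stand-ins, not the real SendData types):

// Stand-in types for illustration only.
struct Span<'a> {
    service: &'a str, // borrowed from the receive buffer
}

fn decode(buf: &[u8]) -> Vec<Span<'_>> {
    // Pretend this parses msgpack and borrows string fields from `buf`.
    vec![Span { service: std::str::from_utf8(buf).unwrap_or("") }]
}

fn handle(buf: Vec<u8>) {
    let spans = decode(&buf);
    // Handing `spans` to another task/thread requires 'static data, but the
    // spans still borrow from `buf`, so this would not compile:
    // std::thread::spawn(move || drop(spans));
    drop(spans);
}

fn main() {
    handle(b"web".to_vec());
}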

How to test the change?

Unit and integration tests have been updated.

@ekump ekump force-pushed the ekump/APMSP-1217-reduced-alloc-span-structs branch 2 times, most recently from 93c4665 to c7e268b on August 26, 2024 20:27
@pr-commenter

pr-commenter bot commented Aug 26, 2024

Benchmarks

Comparison

Benchmark execution time: 2024-09-19 15:25:19

Comparing candidate commit 7fc6dcf in PR branch ekump/APMSP-1217-reduced-alloc-span-structs with baseline commit 790a48b in branch main.

Found 8 performance improvements and 34 performance regressions! Performance is the same for 9 metrics, 2 unstable metrics.

scenario:benching deserializing traces from msgpack to their internal representation

  • 🟩 execution_time [-189.771ns; -172.867ns] or [-13.775%; -12.548%]

scenario:credit_card/is_card_number/

  • 🟥 execution_time [+1.207µs; +1.207µs] or [+99.741%; +99.786%]
  • 🟥 throughput [-412992386.902op/s; -412669589.754op/s] or [-49.960%; -49.921%]

scenario:credit_card/is_card_number/ 3782-8224-6310-005

  • 🟥 execution_time [+5.059µs; +5.457µs] or [+4.557%; +4.915%]
  • 🟥 throughput [-421956.436op/s; -391609.695op/s] or [-4.684%; -4.347%]

scenario:credit_card/is_card_number/ 378282246310005

  • 🟥 execution_time [+3.529µs; +3.913µs] or [+3.407%; +3.777%]
  • 🟥 throughput [-350820.681op/s; -317044.785op/s] or [-3.634%; -3.284%]

scenario:credit_card/is_card_number/37828224631

  • 🟥 execution_time [+1.207µs; +1.207µs] or [+99.751%; +99.807%]
  • 🟥 throughput [-413038721.942op/s; -412709509.251op/s] or [-49.965%; -49.925%]

scenario:credit_card/is_card_number/378282246310005

  • 🟥 execution_time [+5.291µs; +5.703µs] or [+5.277%; +5.687%]
  • 🟥 throughput [-536226.171op/s; -498291.671op/s] or [-5.377%; -4.997%]

scenario:credit_card/is_card_number/37828224631000521389798

  • 🟥 execution_time [+11.981µs; +12.170µs] or [+12.675%; +12.874%]
  • 🟥 throughput [-1208276.642op/s; -1188976.660op/s] or [-11.421%; -11.238%]

scenario:credit_card/is_card_number/x371413321323331

  • 🟥 execution_time [+1.668µs; +1.801µs] or [+7.876%; +8.505%]
  • 🟥 throughput [-3715142.206op/s; -3439539.757op/s] or [-7.865%; -7.281%]

scenario:credit_card/is_card_number_no_luhn/

  • 🟥 execution_time [+1.208µs; +1.208µs] or [+99.936%; +99.959%]
  • 🟥 throughput [-413670041.297op/s; -413572859.387op/s] or [-49.993%; -49.981%]

scenario:credit_card/is_card_number_no_luhn/ 3782-8224-6310-005

  • 🟩 execution_time [-8.797µs; -8.619µs] or [-9.164%; -8.978%]
  • 🟩 throughput [+1028836.897op/s; +1048942.384op/s] or [+9.876%; +10.069%]

scenario:credit_card/is_card_number_no_luhn/ 378282246310005

  • 🟩 execution_time [-7.177µs; -6.931µs] or [-8.227%; -7.945%]
  • 🟩 throughput [+991131.207op/s; +1026146.237op/s] or [+8.646%; +8.951%]

scenario:credit_card/is_card_number_no_luhn/37828224631

  • 🟥 execution_time [+1.206µs; +1.206µs] or [+99.606%; +99.650%]
  • 🟥 throughput [-412451298.865op/s; -412126939.540op/s] or [-49.926%; -49.887%]

scenario:credit_card/is_card_number_no_luhn/378282246310005

  • 🟩 execution_time [-5.899µs; -5.620µs] or [-7.110%; -6.773%]
  • 🟩 throughput [+877660.039op/s; +921192.963op/s] or [+7.282%; +7.643%]

scenario:credit_card/is_card_number_no_luhn/37828224631000521389798

  • 🟥 execution_time [+12.111µs; +12.270µs] or [+12.832%; +13.000%]
  • 🟥 throughput [-1219868.373op/s; -1204056.048op/s] or [-11.513%; -11.364%]

scenario:credit_card/is_card_number_no_luhn/x371413321323331

  • 🟥 execution_time [+1.562µs; +1.696µs] or [+7.367%; +7.995%]
  • 🟥 throughput [-3504027.980op/s; -3227006.546op/s] or [-7.430%; -6.842%]

scenario:normalization/normalize_name/normalize_name/bad-name

  • 🟥 execution_time [+2.297µs; +2.311µs] or [+8.901%; +8.954%]
  • 🟥 throughput [-3185944.625op/s; -3166373.941op/s] or [-8.221%; -8.171%]

scenario:normalization/normalize_name/normalize_name/good

  • 🟥 execution_time [+1.097µs; +1.114µs] or [+7.134%; +7.249%]
  • 🟥 throughput [-4397824.088op/s; -4329698.581op/s] or [-6.761%; -6.656%]

scenario:normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...

  • 🟥 execution_time [+30.547µs; +30.680µs] or [+6.018%; +6.044%]
  • 🟥 throughput [-112295.354op/s; -111814.434op/s] or [-5.700%; -5.676%]

scenario:normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters

  • 🟥 execution_time [+4.506µs; +4.615µs] or [+2.498%; +2.558%]
  • 🟥 throughput [-138358.820op/s; -135076.812op/s] or [-2.496%; -2.436%]

scenario:normalization/normalize_service/normalize_service/[empty string]

  • 🟥 execution_time [+1.058µs; +1.088µs] or [+2.419%; +2.489%]
  • 🟥 throughput [-555950.165op/s; -539940.691op/s] or [-2.431%; -2.361%]

scenario:redis/obfuscate_redis_string

  • 🟥 execution_time [+1.450µs; +1.806µs] or [+3.856%; +4.801%]

scenario:sql/obfuscate_sql_string

  • 🟩 execution_time [-3.032µs; -2.973µs] or [-4.005%; -3.927%]

scenario:tags/replace_trace_tags

  • 🟥 execution_time [+55.317ns; +61.954ns] or [+2.043%; +2.288%]

Candidate

Candidate benchmark details

Group 1

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 7fc6dcf 1726758945 ekump/APMSP-1217-reduced-alloc-span-structs
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time 276.029µs 276.337µs ± 0.178µs 276.315µs ± 0.131µs 276.471µs 276.648µs 276.791µs 276.866µs 0.20% 0.382 -0.327 0.06% 0.013µs 1 200
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput 3611857.804op/s 3618771.843op/s ± 2333.481op/s 3619054.798op/s ± 1718.576op/s 3620288.421op/s 3622179.341op/s 3622688.770op/s 3622802.770op/s 0.10% -0.379 -0.331 0.06% 165.002op/s 1 200
normalization/normalize_name/normalize_name/bad-name execution_time 28.030µs 28.109µs ± 0.030µs 28.119µs ± 0.011µs 28.126µs 28.151µs 28.185µs 28.256µs 0.49% 0.523 2.287 0.11% 0.002µs 1 200
normalization/normalize_name/normalize_name/bad-name throughput 35390990.475op/s 35575791.714op/s ± 37469.197op/s 35563756.497op/s ± 14308.154op/s 35615371.115op/s 35619284.850op/s 35651251.439op/s 35675681.980op/s 0.31% -0.511 2.234 0.11% 2649.472op/s 1 200
normalization/normalize_name/normalize_name/good execution_time 16.328µs 16.479µs ± 0.054µs 16.472µs ± 0.035µs 16.517µs 16.567µs 16.607µs 16.620µs 0.90% 0.047 0.010 0.33% 0.004µs 1 200
normalization/normalize_name/normalize_name/good throughput 60169028.687op/s 60685764.697op/s ± 200630.504op/s 60709727.862op/s ± 129304.800op/s 60807381.706op/s 61007738.356op/s 61148114.196op/s 61242628.638op/s 0.88% -0.027 0.018 0.33% 14186.719op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time [276.312µs; 276.362µs] or [-0.009%; +0.009%] None None None
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput [3618448.445op/s; 3619095.241op/s] or [-0.009%; +0.009%] None None None
normalization/normalize_name/normalize_name/bad-name execution_time [28.105µs; 28.113µs] or [-0.015%; +0.015%] None None None
normalization/normalize_name/normalize_name/bad-name throughput [35570598.844op/s; 35580984.585op/s] or [-0.015%; +0.015%] None None None
normalization/normalize_name/normalize_name/good execution_time [16.471µs; 16.486µs] or [-0.046%; +0.046%] None None None
normalization/normalize_name/normalize_name/good throughput [60657959.239op/s; 60713570.155op/s] or [-0.046%; +0.046%] None None None

Group 2

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 7fc6dcf 1726758945 ekump/APMSP-1217-reduced-alloc-span-structs
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching string interning on wordpress profile execution_time 142.442µs 143.428µs ± 0.385µs 143.367µs ± 0.145µs 143.526µs 144.063µs 144.773µs 145.555µs 1.53% 1.903 6.908 0.27% 0.027µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching string interning on wordpress profile execution_time [143.375µs; 143.482µs] or [-0.037%; +0.037%] None None None

Group 3

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 7fc6dcf 1726758945 ekump/APMSP-1217-reduced-alloc-span-structs
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_trace/test_trace execution_time 296.692ns 306.338ns ± 12.164ns 301.241ns ± 2.973ns 308.153ns 331.340ns 347.429ns 350.786ns 16.45% 1.957 3.235 3.96% 0.860ns 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_trace/test_trace execution_time [304.652ns; 308.024ns] or [-0.550%; +0.550%] None None None

Group 4

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 7fc6dcf 1726758945 ekump/APMSP-1217-reduced-alloc-span-structs
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
redis/obfuscate_redis_string execution_time 38.691µs 39.245µs ± 0.891µs 38.833µs ± 0.048µs 38.947µs 41.137µs 41.187µs 42.295µs 8.91% 1.723 1.145 2.26% 0.063µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
redis/obfuscate_redis_string execution_time [39.121µs; 39.368µs] or [-0.315%; +0.315%] None None None

Group 5

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 7fc6dcf 1726758945 ekump/APMSP-1217-reduced-alloc-span-structs
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
two way interface execution_time 19.181µs 25.717µs ± 11.750µs 19.978µs ± 0.254µs 27.826µs 45.986µs 49.481µs 104.230µs 421.73% 3.115 14.708 45.58% 0.831µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
two way interface execution_time [24.088µs; 27.345µs] or [-6.332%; +6.332%] None None None

Group 6

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 7fc6dcf 1726758945 ekump/APMSP-1217-reduced-alloc-span-structs
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
sql/obfuscate_sql_string execution_time 72.561µs 72.715µs ± 0.138µs 72.702µs ± 0.031µs 72.735µs 72.792µs 72.944µs 74.488µs 2.46% 10.757 135.098 0.19% 0.010µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
sql/obfuscate_sql_string execution_time [72.695µs; 72.734µs] or [-0.026%; +0.026%] None None None

Group 7

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 7fc6dcf 1726758945 ekump/APMSP-1217-reduced-alloc-span-structs
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
credit_card/is_card_number/ execution_time 2.415µs 2.417µs ± 0.001µs 2.416µs ± 0.001µs 2.417µs 2.418µs 2.419µs 2.420µs 0.13% 0.761 -0.082 0.04% 0.000µs 1 200
credit_card/is_card_number/ throughput 413298132.078op/s 413807866.387op/s ± 158359.460op/s 413841073.745op/s ± 109190.200op/s 413934870.331op/s 413996913.492op/s 414024132.792op/s 414057005.183op/s 0.05% -0.759 -0.086 0.04% 11197.705op/s 1 200
credit_card/is_card_number/ 3782-8224-6310-005 execution_time 110.854µs 116.269µs ± 1.231µs 116.566µs ± 0.616µs 117.082µs 117.840µs 118.344µs 118.628µs 1.77% -1.214 2.219 1.06% 0.087µs 1 200
credit_card/is_card_number/ 3782-8224-6310-005 throughput 8429720.404op/s 8601716.096op/s ± 92307.299op/s 8578866.423op/s ± 45326.066op/s 8640955.305op/s 8790912.923op/s 8880351.569op/s 9020882.721op/s 5.15% 1.304 2.571 1.07% 6527.112op/s 1 200
credit_card/is_card_number/ 378282246310005 execution_time 104.280µs 107.311µs ± 1.220µs 107.198µs ± 0.836µs 108.108µs 109.370µs 110.436µs 111.099µs 3.64% 0.325 0.084 1.13% 0.086µs 1 200
credit_card/is_card_number/ 378282246310005 throughput 9000965.593op/s 9319886.436op/s ± 105590.387op/s 9328520.224op/s ± 72844.812op/s 9388739.472op/s 9488637.231op/s 9557504.396op/s 9589566.552op/s 2.80% -0.258 0.024 1.13% 7466.368op/s 1 200
credit_card/is_card_number/37828224631 execution_time 2.415µs 2.417µs ± 0.002µs 2.417µs ± 0.001µs 2.417µs 2.418µs 2.419µs 2.439µs 0.90% 10.143 125.569 0.07% 0.000µs 1 200
credit_card/is_card_number/37828224631 throughput 410074920.430op/s 413788190.676op/s ± 294585.281op/s 413780800.575op/s ± 85772.818op/s 413920813.145op/s 414016528.576op/s 414054533.814op/s 414071423.818op/s 0.07% -10.091 124.679 0.07% 20830.325op/s 1 200
credit_card/is_card_number/378282246310005 execution_time 101.439µs 105.776µs ± 1.319µs 105.911µs ± 0.775µs 106.680µs 107.717µs 108.037µs 108.316µs 2.27% -0.657 0.339 1.24% 0.093µs 1 200
credit_card/is_card_number/378282246310005 throughput 9232271.250op/s 9455425.449op/s ± 118953.933op/s 9441883.855op/s ± 68766.572op/s 9515689.095op/s 9694996.549op/s 9774860.808op/s 9858133.247op/s 4.41% 0.730 0.476 1.25% 8411.313op/s 1 200
credit_card/is_card_number/37828224631000521389798 execution_time 105.678µs 106.601µs ± 0.381µs 106.647µs ± 0.264µs 106.844µs 107.164µs 107.484µs 107.704µs 0.99% 0.011 -0.145 0.36% 0.027µs 1 200
credit_card/is_card_number/37828224631000521389798 throughput 9284667.715op/s 9380910.987op/s ± 33492.623op/s 9376690.018op/s ± 23204.447op/s 9406863.483op/s 9436256.836op/s 9453809.620op/s 9462683.655op/s 0.92% 0.009 -0.157 0.36% 2368.286op/s 1 200
credit_card/is_card_number/x371413321323331 execution_time 21.976µs 22.909µs ± 0.344µs 22.948µs ± 0.222µs 23.124µs 23.530µs 23.658µs 23.749µs 3.49% -0.087 -0.084 1.50% 0.024µs 1 200
credit_card/is_card_number/x371413321323331 throughput 42107149.688op/s 43661223.292op/s ± 656654.262op/s 43576562.251op/s ± 420180.180op/s 44136485.567op/s 44789956.665op/s 45181355.685op/s 45505190.924op/s 4.43% 0.173 -0.073 1.50% 46432.468op/s 1 200
credit_card/is_card_number_no_luhn/ execution_time 2.415µs 2.416µs ± 0.001µs 2.416µs ± 0.001µs 2.417µs 2.418µs 2.419µs 2.420µs 0.16% 0.944 0.607 0.04% 0.000µs 1 200
credit_card/is_card_number_no_luhn/ throughput 413236259.919op/s 413837220.560op/s ± 151374.759op/s 413884210.318op/s ± 97114.133op/s 413948763.164op/s 414016796.213op/s 414045257.844op/s 414064549.252op/s 0.04% -0.943 0.601 0.04% 10703.812op/s 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time 86.977µs 87.289µs ± 0.256µs 87.259µs ± 0.070µs 87.323µs 87.537µs 88.817µs 89.005µs 2.00% 4.463 24.130 0.29% 0.018µs 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput 11235338.128op/s 11456274.833op/s ± 33153.902op/s 11460137.642op/s ± 9190.636op/s 11470283.363op/s 11485037.524op/s 11488551.911op/s 11497286.640op/s 0.32% -4.409 23.644 0.29% 2344.335op/s 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time 79.640µs 80.183µs ± 0.636µs 79.977µs ± 0.231µs 80.269µs 81.896µs 82.561µs 82.723µs 3.43% 2.334 5.480 0.79% 0.045µs 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 throughput 12088556.315op/s 12472196.528op/s ± 97121.338op/s 12503651.069op/s ± 36195.550op/s 12533916.159op/s 12552254.660op/s 12555196.095op/s 12556542.109op/s 0.42% -2.286 5.252 0.78% 6867.516op/s 1 200
credit_card/is_card_number_no_luhn/37828224631 execution_time 2.415µs 2.416µs ± 0.001µs 2.416µs ± 0.001µs 2.417µs 2.418µs 2.419µs 2.420µs 0.16% 1.005 1.397 0.04% 0.000µs 1 200
credit_card/is_card_number_no_luhn/37828224631 throughput 413183416.002op/s 413828415.192op/s ± 148759.089op/s 413855800.206op/s ± 107811.217op/s 413951630.118op/s 414002753.413op/s 414030882.202op/s 414045495.587op/s 0.05% -1.003 1.387 0.04% 10518.856op/s 1 200
credit_card/is_card_number_no_luhn/378282246310005 execution_time 76.627µs 77.214µs ± 0.756µs 76.904µs ± 0.111µs 77.165µs 78.992µs 79.638µs 81.427µs 5.88% 2.584 7.237 0.98% 0.053µs 1 200
credit_card/is_card_number_no_luhn/378282246310005 throughput 12280927.153op/s 12952143.144op/s ± 123739.897op/s 13003189.683op/s ± 18688.196op/s 13015956.262op/s 13031052.999op/s 13039953.283op/s 13050223.786op/s 0.36% -2.513 6.668 0.95% 8749.732op/s 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time 105.523µs 106.575µs ± 0.416µs 106.592µs ± 0.283µs 106.846µs 107.115µs 107.881µs 108.021µs 1.34% 0.262 0.645 0.39% 0.029µs 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput 9257431.847op/s 9383182.345op/s ± 36623.362op/s 9381604.131op/s ± 24946.210op/s 9408600.637op/s 9439643.007op/s 9460075.641op/s 9476649.988op/s 1.01% -0.232 0.592 0.39% 2589.663op/s 1 200
credit_card/is_card_number_no_luhn/x371413321323331 execution_time 21.821µs 22.838µs ± 0.341µs 22.882µs ± 0.218µs 23.085µs 23.319µs 23.562µs 23.573µs 3.02% -0.588 0.214 1.49% 0.024µs 1 200
credit_card/is_card_number_no_luhn/x371413321323331 throughput 42421501.406op/s 43796948.985op/s ± 659814.839op/s 43702106.133op/s ± 415778.155op/s 44252970.956op/s 45055643.291op/s 45561969.450op/s 45826581.656op/s 4.86% 0.673 0.339 1.50% 46655.955op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
credit_card/is_card_number/ execution_time [2.416µs; 2.417µs] or [-0.005%; +0.005%] None None None
credit_card/is_card_number/ throughput [413785919.289op/s; 413829813.485op/s] or [-0.005%; +0.005%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 execution_time [116.098µs; 116.440µs] or [-0.147%; +0.147%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 throughput [8588923.192op/s; 8614509.000op/s] or [-0.149%; +0.149%] None None None
credit_card/is_card_number/ 378282246310005 execution_time [107.142µs; 107.480µs] or [-0.158%; +0.158%] None None None
credit_card/is_card_number/ 378282246310005 throughput [9305252.624op/s; 9334520.248op/s] or [-0.157%; +0.157%] None None None
credit_card/is_card_number/37828224631 execution_time [2.416µs; 2.417µs] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/37828224631 throughput [413747363.989op/s; 413829017.363op/s] or [-0.010%; +0.010%] None None None
credit_card/is_card_number/378282246310005 execution_time [105.593µs; 105.959µs] or [-0.173%; +0.173%] None None None
credit_card/is_card_number/378282246310005 throughput [9438939.578op/s; 9471911.320op/s] or [-0.174%; +0.174%] None None None
credit_card/is_card_number/37828224631000521389798 execution_time [106.548µs; 106.654µs] or [-0.049%; +0.049%] None None None
credit_card/is_card_number/37828224631000521389798 throughput [9376269.232op/s; 9385552.743op/s] or [-0.049%; +0.049%] None None None
credit_card/is_card_number/x371413321323331 execution_time [22.861µs; 22.956µs] or [-0.208%; +0.208%] None None None
credit_card/is_card_number/x371413321323331 throughput [43570217.327op/s; 43752229.258op/s] or [-0.208%; +0.208%] None None None
credit_card/is_card_number_no_luhn/ execution_time [2.416µs; 2.417µs] or [-0.005%; +0.005%] None None None
credit_card/is_card_number_no_luhn/ throughput [413816241.475op/s; 413858199.646op/s] or [-0.005%; +0.005%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time [87.254µs; 87.325µs] or [-0.041%; +0.041%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput [11451680.021op/s; 11460869.645op/s] or [-0.040%; +0.040%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time [80.095µs; 80.271µs] or [-0.110%; +0.110%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 throughput [12458736.444op/s; 12485656.611op/s] or [-0.108%; +0.108%] None None None
credit_card/is_card_number_no_luhn/37828224631 execution_time [2.416µs; 2.417µs] or [-0.005%; +0.005%] None None None
credit_card/is_card_number_no_luhn/37828224631 throughput [413807798.613op/s; 413849031.771op/s] or [-0.005%; +0.005%] None None None
credit_card/is_card_number_no_luhn/378282246310005 execution_time [77.110µs; 77.319µs] or [-0.136%; +0.136%] None None None
credit_card/is_card_number_no_luhn/378282246310005 throughput [12934993.984op/s; 12969292.304op/s] or [-0.132%; +0.132%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time [106.518µs; 106.633µs] or [-0.054%; +0.054%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput [9378106.699op/s; 9388257.991op/s] or [-0.054%; +0.054%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 execution_time [22.791µs; 22.885µs] or [-0.207%; +0.207%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 throughput [43705504.994op/s; 43888392.976op/s] or [-0.209%; +0.209%] None None None

Group 8

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 7fc6dcf 1726758945 ekump/APMSP-1217-reduced-alloc-span-structs
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching deserializing traces from msgpack to their internal representation execution_time 1.130µs 1.196µs ± 0.023µs 1.204µs ± 0.012µs 1.214µs 1.217µs 1.219µs 1.219µs 1.29% -1.428 1.160 1.94% 0.002µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching deserializing traces from msgpack to their internal representation execution_time [1.193µs; 1.200µs] or [-0.269%; +0.269%] None None None

Group 9

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 7fc6dcf 1726758945 ekump/APMSP-1217-reduced-alloc-span-structs
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
write only interface execution_time 1.491µs 3.244µs ± 1.544µs 3.094µs ± 0.023µs 3.115µs 3.177µs 13.860µs 17.715µs 472.50% 7.927 63.613 47.49% 0.109µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
write only interface execution_time [3.030µs; 3.458µs] or [-6.598%; +6.598%] None None None

Group 10

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 7fc6dcf 1726758945 ekump/APMSP-1217-reduced-alloc-span-structs
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
concentrator/add_spans_to_concentrator execution_time 9.019ms 9.066ms ± 0.015ms 9.066ms ± 0.010ms 9.076ms 9.092ms 9.100ms 9.106ms 0.43% -0.051 0.035 0.17% 0.001ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
concentrator/add_spans_to_concentrator execution_time [9.064ms; 9.068ms] or [-0.023%; +0.023%] None None None

Group 11

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 7fc6dcf 1726758945 ekump/APMSP-1217-reduced-alloc-span-structs
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
tags/replace_trace_tags execution_time 2.723µs 2.766µs ± 0.017µs 2.772µs ± 0.011µs 2.778µs 2.785µs 2.796µs 2.796µs 0.87% -0.724 -0.244 0.60% 0.001µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
tags/replace_trace_tags execution_time [2.764µs; 2.769µs] or [-0.083%; +0.083%] None None None

Group 12

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz 7fc6dcf 1726758945 ekump/APMSP-1217-reduced-alloc-span-structs
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time 537.621µs 538.219µs ± 0.375µs 538.170µs ± 0.205µs 538.366µs 538.877µs 539.433µs 539.730µs 0.29% 1.089 1.567 0.07% 0.027µs 1 200
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput 1852777.837op/s 1857979.420op/s ± 1294.970op/s 1858149.217op/s ± 707.772op/s 1858855.668op/s 1859859.693op/s 1859999.812op/s 1860047.407op/s 0.10% -1.084 1.551 0.07% 91.568op/s 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time 466.799µs 467.495µs ± 0.295µs 467.480µs ± 0.168µs 467.660µs 468.000µs 468.144µs 468.934µs 0.31% 0.670 2.133 0.06% 0.021µs 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput 2132496.976op/s 2139059.726op/s ± 1349.677op/s 2139130.994op/s ± 766.667op/s 2139829.947op/s 2141208.557op/s 2141604.755op/s 2142251.353op/s 0.15% -0.663 2.106 0.06% 95.437op/s 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time 184.564µs 184.929µs ± 0.252µs 184.867µs ± 0.142µs 185.026µs 185.421µs 185.929µs 186.205µs 0.72% 1.817 5.020 0.14% 0.018µs 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput 5370412.091op/s 5407500.364op/s ± 7353.060op/s 5409282.545op/s ± 4150.975op/s 5412304.082op/s 5417243.583op/s 5417921.015op/s 5418188.670op/s 0.16% -1.801 4.940 0.14% 519.940op/s 1 200
normalization/normalize_service/normalize_service/[empty string] execution_time 44.722µs 44.797µs ± 0.037µs 44.795µs ± 0.023µs 44.819µs 44.855µs 44.875µs 45.018µs 0.50% 1.115 5.161 0.08% 0.003µs 1 200
normalization/normalize_service/normalize_service/[empty string] throughput 22213581.881op/s 22323028.554op/s ± 18375.595op/s 22324049.027op/s ± 11265.901op/s 22335086.430op/s 22349203.730op/s 22358688.714op/s 22360492.419op/s 0.16% -1.100 5.068 0.08% 1299.351op/s 1 200
normalization/normalize_service/normalize_service/test_ASCII execution_time 49.717µs 49.806µs ± 0.116µs 49.741µs ± 0.017µs 49.860µs 49.925µs 49.982µs 50.980µs 2.49% 5.416 50.929 0.23% 0.008µs 1 200
normalization/normalize_service/normalize_service/test_ASCII throughput 19615584.719op/s 20078089.704op/s ± 46108.598op/s 20104259.286op/s ± 6983.951op/s 20108090.338op/s 20110387.554op/s 20112125.051op/s 20113773.795op/s 0.05% -5.255 48.698 0.23% 3260.370op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time [538.167µs; 538.271µs] or [-0.010%; +0.010%] None None None
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput [1857799.949op/s; 1858158.890op/s] or [-0.010%; +0.010%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time [467.454µs; 467.536µs] or [-0.009%; +0.009%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput [2138872.673op/s; 2139246.778op/s] or [-0.009%; +0.009%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time [184.894µs; 184.964µs] or [-0.019%; +0.019%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput [5406481.301op/s; 5408519.428op/s] or [-0.019%; +0.019%] None None None
normalization/normalize_service/normalize_service/[empty string] execution_time [44.792µs; 44.802µs] or [-0.011%; +0.011%] None None None
normalization/normalize_service/normalize_service/[empty string] throughput [22320481.874op/s; 22325575.235op/s] or [-0.011%; +0.011%] None None None
normalization/normalize_service/normalize_service/test_ASCII execution_time [49.790µs; 49.822µs] or [-0.032%; +0.032%] None None None
normalization/normalize_service/normalize_service/test_ASCII throughput [20071699.496op/s; 20084479.912op/s] or [-0.032%; +0.032%] None None None

Baseline

Omitted due to size.

@codecov-commenter

codecov-commenter commented Aug 26, 2024

Codecov Report

Attention: Patch coverage is 96.99399% with 15 lines in your changes missing coverage. Please review.

Project coverage is 73.19%. Comparing base (790a48b) to head (7fc6dcf).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #598      +/-   ##
==========================================
+ Coverage   73.05%   73.19%   +0.13%     
==========================================
  Files         252      254       +2     
  Lines       36072    36312     +240     
==========================================
+ Hits        26352    26578     +226     
- Misses       9720     9734      +14     
Components Coverage Δ
crashtracker 20.53% <ø> (ø)
datadog-alloc 98.73% <ø> (ø)
data-pipeline 90.17% <0.00%> (+0.05%) ⬆️
data-pipeline-ffi 0.00% <ø> (ø)
ddcommon 83.08% <ø> (ø)
ddcommon-ffi 69.52% <ø> (ø)
ddtelemetry 59.10% <ø> (ø)
ipc 83.58% <0.00%> (-0.06%) ⬇️
profiling 84.26% <ø> (ø)
profiling-ffi 77.42% <ø> (ø)
serverless 0.00% <ø> (ø)
sidecar 40.10% <0.00%> (-0.07%) ⬇️
sidecar-ffi 0.00% <ø> (ø)
spawn-worker 50.36% <ø> (ø)
tinybytes 94.77% <100.00%> (+3.10%) ⬆️
trace-mini-agent 72.47% <100.00%> (-0.03%) ⬇️
trace-normalization 98.25% <ø> (ø)
trace-obfuscation 95.73% <ø> (ø)
trace-protobuf 77.67% <ø> (ø)
trace-utils 93.54% <100.00%> (+0.23%) ⬆️

@ekump ekump force-pushed the ekump/APMSP-1217-reduced-alloc-span-structs branch from b3b1fdc to 2626b37 on August 29, 2024 21:29
@bwoebi
Contributor

bwoebi commented Aug 30, 2024

I wonder whether we should put NoAllocString (maybe call it BytesString?) into the tinybytes crate directly. It seems like a useful wrapper for use with tinybytes in general.

@ekump ekump force-pushed the ekump/APMSP-1217-reduced-alloc-span-structs branch from a3e38c1 to f48e785 on September 4, 2024 13:22
@ekump
Contributor Author

ekump commented Sep 4, 2024

I wonder whether we should put NoAllocString (maybe call it BytesString?) into the tinybytes crate directly. It seems like a useful wrapper for use with tinybytes in general.

@bwoebi - That's a good idea. NoAllocString is sufficiently coupled to tinybytes at the moment that it doesn't make sense to separate them.

@ekump ekump force-pushed the ekump/APMSP-1217-reduced-alloc-span-structs branch from f48e785 to bd5d7bb on September 4, 2024 16:51
@ekump ekump force-pushed the ekump/APMSP-1217-reduced-alloc-span-structs branch from bd5d7bb to 3caff9e on September 4, 2024 17:05
@ekump ekump force-pushed the ekump/APMSP-1217-reduced-alloc-span-structs branch from 1a18a73 to 1de65da on September 5, 2024 20:33
@ekump ekump marked this pull request as ready for review September 5, 2024 20:35
@ekump ekump requested review from a team as code owners September 5, 2024 20:35
@bwoebi
Contributor

bwoebi commented Sep 6, 2024

@ekump (and reviewers) Please consider 386a684 first. It is quite a refactor of the ByteWrapper approach, which hopefully increases soundness and is probably also a bit more straightforward (see also the commit description).
It implements what I've been commenting on before (albeit at a larger scale than I initially anticipated).

To view the full diff of this PR plus my commit, please look at https://github.com/DataDog/libdatadog/compare/bob/refactor-bytes-wrapper-away.

@bantonsson
Contributor

There is a reason why Bytes is immutable, @bwoebi: so that it is Sync and Send. I don't want to introduce a way to mutate it.

@bwoebi
Contributor

bwoebi commented Sep 6, 2024

@bantonsson What actually needs to be immutable is the underlying content, which it still is. This commit just allows manipulating the range into the underlying slice, which in itself doesn't violate the Sync/Send constraints: you still need a &mut reference to even call Bytes::as_mut_slice(), i.e. you're the sole owner of the Bytes struct (the struct, not the underlying data) at that point.
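
A small self-contained sketch of that argument; View stands in for Bytes and genuinely uses 'static data, whereas in Bytes the 'static slice is backed by the owned buffer:

// The buffer content stays immutable; only which part of it we look at
// changes, and changing that requires exclusive (&mut) access to this handle.
#[derive(Clone)]
struct View {
    slice: &'static [u8],
}

impl View {
    fn advance(&mut self, n: usize) {
        self.slice = &self.slice[n..];
    }
}

// &'static [u8] is Send + Sync, so the handle is too.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<View>();
    let mut v = View { slice: b"hello" };
    v.advance(2);
    assert_eq!(v.slice, &b"llo"[..]);
}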

@bantonsson
Contributor

@bwoebi You're indeed right about Sync and Send, and for decoding MessagePack this way we will only create new slices within the existing slice before we advance it.

@bwoebi
Contributor

bwoebi commented Sep 6, 2024

Okay, brought my commit to this branch after talking to Edmund :-)

@@ -10,8 +10,7 @@ use std::{
 /// Immutable bytes type with zero copy cloning and slicing.
 #[derive(Clone)]
 pub struct Bytes {
-    ptr: *const u8,
-    len: usize,
+    slice: &'static [u8],
Contributor

I think the {ptr, len} representation is better, since the slice is not really 'static. I think it'd be pretty easy to misuse the definition.

On a side note, NonNull<u8> is better than *const u8, since the type can then benefit from the non-null pointer optimization.
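
For reference, the niche optimization mentioned here is easy to check with plain std (nothing PR-specific):

use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // NonNull<u8> has a niche, so Option<NonNull<u8>> stays pointer-sized...
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<NonNull<u8>>());
    // ...while Option<*const u8> needs an extra discriminant word.
    assert!(size_of::<Option<*const u8>>() > size_of::<*const u8>());
}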

Contributor

@bwoebi bwoebi Sep 6, 2024

I do prefer (ptr, len), but it doesn't really work with as_mut_slice(). I think 'static is fine, though: it's ultimately no different from a bare pointer in terms of lifetime, whenever you reassemble it into a slice.

Comment on lines 48 to 56
 pub const fn len(&self) -> usize {
-    self.len
+    self.slice.len()
 }

 /// Returns `true` if the `Bytes` is empty.
 #[inline]
 pub const fn is_empty(&self) -> bool {
-    self.len == 0
+    self.slice.is_empty()
 }
Contributor

@paullegranddc paullegranddc Sep 6, 2024

Nit, but these functions don't actually need to be implemented on Bytes, since they are already present on the type through its impl Deref<Target = [u8]>.

Contributor

Except they are not const through deref, so that would be one reason to keep them (I doubt it matters here; this is merely an educational mention).
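
A tiny illustration of both points, using a stand-in type rather than the real Bytes:

use std::ops::Deref;

struct Demo {
    slice: &'static [u8],
}

impl Deref for Demo {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        self.slice
    }
}

fn main() {
    let demo = Demo { slice: b"abc" };
    // len() and is_empty() come for free through Deref...
    assert_eq!(demo.len(), 3);
    assert!(!demo.is_empty());
    // ...but not in const contexts, because Deref::deref is not a const fn,
    // which is the reason given for keeping the inherent const versions.
}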


let mut trace: Vec<Span> = Default::default();
(0..trace_count).try_fold(
Contributor

Really a small nit, but I think the for loop was clearer than the nested try_fold.
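
For comparison, a minimal sketch of the two shapes being discussed (decode_one is a stand-in for decoding one span from the buffer):

fn decode_one(i: u32) -> Result<u32, String> {
    Ok(i * 2) // stand-in: pretend this decodes one span
}

fn with_for_loop(count: u32) -> Result<Vec<u32>, String> {
    let mut out = Vec::new();
    for i in 0..count {
        out.push(decode_one(i)?);
    }
    Ok(out)
}

fn with_try_fold(count: u32) -> Result<Vec<u32>, String> {
    (0..count).try_fold(Vec::new(), |mut out, i| {
        out.push(decode_one(i)?);
        Ok(out)
    })
}

fn main() {
    assert_eq!(with_for_loop(3).unwrap(), with_try_fold(3).unwrap());
}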

Comment on lines +171 to +173
pub unsafe fn as_mut_slice(&mut self) -> &mut &'static [u8] {
    &mut self.slice
}
Contributor

@paullegranddc paullegranddc Sep 6, 2024

I guess you added this function because you want to pass an &mut &[u8] to the rmp decode functions, but there is a simpler and safer way to do that.
rmp reads from types implementing RmpRead, which is implemented by default for all types implementing io::Read.

So IMO the correct way to use Bytes with rmp is to just impl Read on it and advance the base pointer by however many bytes were read.

impl io::Read for Bytes {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Copy as much as fits into `buf`, then advance past it.
        let bytes_read = buf.len().min(self.len);
        let src = unsafe { std::slice::from_raw_parts(self.ptr, bytes_read) };
        buf[..bytes_read].copy_from_slice(src);
        self.ptr = unsafe { self.ptr.add(bytes_read) };
        self.len -= bytes_read;
        Ok(bytes_read)
    }
}

Then you could just do rmp::decode::read_array_len(&mut data)
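
For example, assuming an rmp version whose decode helpers accept any reader implementing io::Read (exercised here with a plain Cursor rather than Bytes):

use std::io::Cursor;

fn main() {
    // 0x93 is the msgpack header for a 3-element fixarray.
    let payload = [0x93u8, 0x01, 0x02, 0x03];
    let mut rd = Cursor::new(&payload[..]);
    let len = rmp::decode::read_array_len(&mut rd).expect("valid array header");
    assert_eq!(len, 3);
}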

Contributor

It's not quite as simple as that, in particular with the string reading:

    read_string_ref_nomut(unsafe { buf.as_mut_slice() }).map(|(str, newbuf)| {
        let string = BytesString::from_bytes_slice(buf, str);
        *unsafe { buf.as_mut_slice() } = newbuf;
        string
    })

There's a dance around the ordering of operations: first read the string, then fetch the string slice, then adjust the size.
Because Bytes asserts that any read from it is within its bounds, we'd then have to craft an unsafe method for that instead.

Also, your Read impl is actually copying every single byte. Passing a slice directly does not incur that overhead. It's maybe a bit more elegant to use Read, but not as performant. And this PR here is supposed to solve ... performance problems.

Contributor

I've just been profiling the whole patch. There's currently still one copy happening (the whole data buffer at once) in SidecarServer::send_trace_v04. That single copy accounts for 35% of the whole overhead. So copying here is most likely unacceptable.

Contributor

Yeah for sure we don't want to copy the strings we read, but for that you can make cleaner abstractions than using read_string_ref_nomut.

We could very well rewrite it to use the following primitives, which encapsulate the unsafe manipulation in the Bytes implementation:

impl Bytes {
    // Make the Bytes instance point to original_buffer[n..].
    // Returns None if the buffer is smaller than n.
    fn advance(&mut self, n: usize) -> Option<()> {
        if n > self.len {
            return None;
        }
        self.ptr = unsafe { self.ptr.add(n) };
        self.len -= n;
        Some(())
    }

    // Make the Bytes instance point to original_buffer[..n].
    // Returns None if the buffer is smaller than n.
    fn shrink(&mut self, n: usize) -> Option<()> {
        if n > self.len {
            return None;
        }
        self.len = n;
        Some(())
    }

    // Yield the first n bytes and advance the offset by as much.
    fn split_front_ref(&mut self, n: usize) -> Option<&[u8]> {
        let start = self.ptr;
        self.advance(n)?;
        // Safe because the returned slice borrows from the Bytes
        // and n is no larger than the remaining length.
        Some(unsafe { std::slice::from_raw_parts(start, n) })
    }

    // Yield the first n bytes as their own Bytes and advance past them.
    fn split_front(&mut self, n: usize) -> Option<Bytes> {
        if n > self.len {
            return None;
        }
        let mut front = self.clone();
        self.advance(n)?;
        front.shrink(n)?;
        Some(front)
    }
}

fn read_string_ref<'a>(bytes: &'a mut Bytes) -> Result<&'a str, DecodeError> {
    let len = rmp::decode::read_str_len(bytes)? as usize;
    let str_ref = bytes
        .split_front_ref(len)
        .ok_or(DecodeStringError::BufferSizeTooSmall(len))?;
    std::str::from_utf8(str_ref).map_err(|_| DecodeStringError::InvalidUtf8)
}

fn read_string_bytes(bytes: &mut Bytes) -> Result<BytesString, DecodeError> {
    let len = rmp::decode::read_str_len(bytes)? as usize;
    let str_bytes = bytes
        .split_front(len)
        .ok_or(DecodeStringError::BufferSizeTooSmall(len))?;
    BytesString::from_bytes(str_bytes).map_err(...)
}

Contributor

@paullegranddc paullegranddc Sep 7, 2024

By the way, I'm curious where we spend time other than in serialization. Can you share the profile?

Contributor Author

@bwoebi and @paullegranddc - Is this conversation still ongoing? Are there any changes either of you require here? I don't have a strong opinion, so I'm happy to implement whatever the group agrees on.

Contributor

@bwoebi bwoebi Sep 16, 2024

Feel free to implement @paullegranddc's suggestion with io::Read and the additional methods on Bytes. That should probably also fix the Miri failure, I think.

Contributor

On my end, I don't have a strong opinion. The code currently works, so I think we can merge it, but it'd be nice if you have time to refactor it, either in this PR or later.

Contributor Author

@ekump ekump Sep 18, 2024

@paullegranddc @bwoebi - I do like the impl io::Read solution as it seems to have fewer sharp edges, but benchmarks wind up being around 30% slower compared to the as_mut_slice implementation. It's possible I misunderstood the suggestion and implemented something incorrectly here, but if neither of you sees any glaring errors I will revert to the as_mut_slice implementation for now and create a ticket to follow up in another PR.

Contributor

This does not surprise me too much; at the very least it would need much more optimization by the compiler. So if it's actually 30% slower, then please revert it.

@ekump ekump force-pushed the ekump/APMSP-1217-reduced-alloc-span-structs branch from 7df2b5b to ec21732 on September 13, 2024 20:05
@ekump ekump force-pushed the ekump/APMSP-1217-reduced-alloc-span-structs branch 3 times, most recently from c814f9b to 7ca377c on September 18, 2024 12:56
ekump and others added 18 commits September 19, 2024 10:16
Bytes itself is already a structure with a slice into some underlying bytes.
Thus, we are adding a cheap way to directly modify the slice of the Bytes struct.

To facilitate this, we now store the (ptr, len) tuple as a &[u8] slice within the Bytes struct. The representation is exactly identical to before, but it allows trivial manipulation of the slice via direct assignment or a &mut &[u8] reference.

The way the underlying field was exposed on the BytesWrapper struct was also unsound: access to it was not wrapped in an unsafe method.

The decoder code can now trivially clone() the given Bytes struct and carry it around, directly shrinking the slice size of the underlying Bytes instance as the data is being processed.

Additionally, there now exist two helper methods for numbers and strings (read_string_bytes and read_number_bytes) which handle a bit of the boilerplate.
Sadly it is necessary to always write "unsafe { buf.as_mut_slice() }", but this is needed for soundness: we can only actually guarantee safety when calling the actual decode functions, as long as we want to carry Bytes around.

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>
Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>
@ekump ekump force-pushed the ekump/APMSP-1217-reduced-alloc-span-structs branch from a9211b7 to e0f127e on September 19, 2024 14:53
Contributor

@bwoebi bwoebi left a comment

Looks good now, thanks a lot for the effort in optimizing this!

@ekump ekump merged commit 4c7552b into main Sep 19, 2024
30 checks passed
@ekump ekump deleted the ekump/APMSP-1217-reduced-alloc-span-structs branch September 19, 2024 16:29