
feat: use Uint8Array for ids #786

Closed

Conversation


@Flarna Flarna commented Feb 14, 2020

Which problem is this PR solving?

closes #698

Short description of the changes

Use Uint8Array for traceId/spanId to avoid the conversion from buffer to
hex string during creation and from hex string to buffer during export.

The OpenTelemetry protocol also uses a byte array in protobuf.

There are still conversions needed for propagation via HTTP headers, but
overall the number of conversions should be reduced.

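For illustration, a minimal sketch of the hex/byte conversions involved (helper names are hypothetical, not code from this PR):

```typescript
// Hypothetical helpers showing the conversions this PR tries to minimize:
// ids kept as Uint8Array only need the hex form at propagation boundaries.
function idToHex(id: Uint8Array): string {
  return Array.from(id, byte => byte.toString(16).padStart(2, '0')).join('');
}

function idFromHex(hex: string): Uint8Array {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.substring(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}
```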
@Flarna Flarna changed the title from "feat: use Uint8Array for ids" to "WIP feat: use Uint8Array for ids" on Feb 14, 2020
@OlivierAlbertini

Can we get a quick benchmark? Something like an ab test would be acceptable to me.


Flarna commented Feb 14, 2020

> Can we get a quick benchmark? Something like an ab test would be acceptable to me.

Are there instructions somewhere on how to run the OTel benchmarks?


dyladan commented Feb 14, 2020

> Can we get a quick benchmark? Something like an ab test would be acceptable to me.

> Are there instructions somewhere on how to run the OTel benchmarks?

You can just run `npm run bench` in the root of the project.


codecov-io commented Feb 14, 2020

Codecov Report

Merging #786 into master will increase coverage by 1.96%.
The diff coverage is 93.48%.

@@            Coverage Diff             @@
##           master     #786      +/-   ##
==========================================
+ Coverage   90.67%   92.64%   +1.96%     
==========================================
  Files         236      243       +7     
  Lines       11045    10816     -229     
  Branches     1024     1064      +40     
==========================================
+ Hits        10015    10020       +5     
+ Misses       1030      796     -234
| Impacted Files | Coverage Δ |
| --- | --- |
| ...s/opentelemetry-metrics/test/MeterProvider.test.ts | 100% <ø> (ø) ⬆️ |
| ...elemetry-tracing/src/export/SimpleSpanProcessor.ts | 100% <ø> (ø) ⬆️ |
| ...telemetry-tracing/src/export/BatchSpanProcessor.ts | 100% <ø> (ø) ⬆️ |
| ...opentelemetry-node/test/NodeTracerProvider.test.ts | 100% <ø> (ø) |
| ...ackages/opentelemetry-api/src/metrics/NoopMeter.ts | 72.09% <ø> (-9.04%) ⬇️ |
| packages/opentelemetry-api/src/metrics/Metric.ts | 100% <ø> (ø) ⬆️ |
| ...kages/opentelemetry-metrics/test/mocks/Exporter.ts | 50% <ø> (-50%) ⬇️ |
| ...s/opentelemetry-web/test/WebTracerProvider.test.ts | 0% <0%> (ø) |
| ...ges/opentelemetry-exporter-jaeger/src/transform.ts | 100% <100%> (ø) ⬆️ |
| packages/opentelemetry-metrics/src/export/types.ts | 100% <100%> (ø) ⬆️ |

... and 193 more


Flarna commented Feb 14, 2020

Results of the benchmarks:

IDs as Uint8Array

Beginning NoopTracerProvider Benchmark...
  5 tests completed.

  #startSpan                     x 1,092,254,357 ops/sec ±1.06% (20 runs sampled)
  #startSpan:parent              x 1,098,253,393 ops/sec ±1.24% (20 runs sampled)
  #startSpan with attribute      x 1,088,682,868 ops/sec ±1.94% (20 runs sampled)
  #startSpan with 30 attributes  x    11,387,274 ops/sec ±1.49% (20 runs sampled)
  #startSpan with 100 attributes x     3,721,347 ops/sec ±0.20% (20 runs sampled)

Beginning BasicTracerProvider Benchmark...
  5 tests completed.

  #startSpan                     x 209,398 ops/sec ±1.49% (20 runs sampled)
  #startSpan:parent              x 115,787 ops/sec ±1.21% (20 runs sampled)
  #startSpan with attribute      x 207,648 ops/sec ±1.53% (20 runs sampled)
  #startSpan with 30 attributes  x  64,260 ops/sec ±0.43% (20 runs sampled)
  #startSpan with 100 attributes x   5,902 ops/sec ±1.33% (20 runs sampled)

Beginning BasicTracerProvider with SimpleSpanProcessor Benchmark...
  5 tests completed.

  #startSpan                     x 167,168 ops/sec ±3.13% (20 runs sampled)
  #startSpan:parent              x  97,077 ops/sec ±3.16% (20 runs sampled)
  #startSpan with attribute      x 143,967 ops/sec ±16.37% (20 runs sampled)
  #startSpan with 30 attributes  x  52,678 ops/sec ±1.60% (20 runs sampled)
  #startSpan with 100 attributes x   5,680 ops/sec ±1.82% (20 runs sampled)

Beginning BasicTracerProvider with BatchSpanProcessor Benchmark...
  5 tests completed.

  #startSpan                     x 164,040 ops/sec ±3.30% (20 runs sampled)
  #startSpan:parent              x  93,457 ops/sec ±1.19% (20 runs sampled)
  #startSpan with attribute      x 124,553 ops/sec ±15.51% (20 runs sampled)
  #startSpan with 30 attributes  x  49,418 ops/sec ±2.17% (20 runs sampled)
  #startSpan with 100 attributes x   5,572 ops/sec ±1.12% (20 runs sampled)


Beginning B3Format Benchmark...
  2 tests completed.

  #Inject  x 1,188,475 ops/sec ±0.51% (100 runs sampled)
  #Extract x 1,263,576 ops/sec ±2.83% (100 runs sampled)

Beginning HttpTraceContext Benchmark...
  2 tests completed.

  #Inject  x 1,448,021 ops/sec ±0.59% (100 runs sampled)
  #Extract x 1,064,664 ops/sec ±0.44% (100 runs sampled)

IDs as hex strings

Beginning NoopTracerProvider Benchmark...
  5 tests completed.

  #startSpan                     x 1,059,885,282 ops/sec ±2.87% (20 runs sampled)
  #startSpan:parent              x 1,040,236,747 ops/sec ±2.38% (20 runs sampled)
  #startSpan with attribute      x 1,054,605,044 ops/sec ±2.23% (20 runs sampled)
  #startSpan with 30 attributes  x    11,243,262 ops/sec ±1.37% (20 runs sampled)
  #startSpan with 100 attributes x     3,674,193 ops/sec ±0.79% (20 runs sampled)

Beginning BasicTracerProvider Benchmark...
  5 tests completed.

  #startSpan                     x 146,849 ops/sec ±1.65% (20 runs sampled)
  #startSpan:parent              x  97,562 ops/sec ±0.57% (20 runs sampled)
  #startSpan with attribute      x 148,188 ops/sec ±0.45% (20 runs sampled)
  #startSpan with 30 attributes  x  54,051 ops/sec ±0.30% (20 runs sampled)
  #startSpan with 100 attributes x   5,793 ops/sec ±1.01% (20 runs sampled)

Beginning BasicTracerProvider with SimpleSpanProcessor Benchmark...
  5 tests completed.

  #startSpan                     x 131,322 ops/sec ±3.49% (20 runs sampled)
  #startSpan:parent              x  87,738 ops/sec ±2.40% (20 runs sampled)
  #startSpan with attribute      x 135,653 ops/sec ±1.92% (20 runs sampled)
  #startSpan with 30 attributes  x  43,870 ops/sec ±14.04% (20 runs sampled)
  #startSpan with 100 attributes x   5,693 ops/sec ±1.21% (20 runs sampled)

Beginning BasicTracerProvider with BatchSpanProcessor Benchmark...
  5 tests completed.

  #startSpan                     x 132,999 ops/sec ±2.03% (20 runs sampled)
  #startSpan:parent              x  85,832 ops/sec ±2.31% (20 runs sampled)
  #startSpan with attribute      x 131,084 ops/sec ±2.65% (20 runs sampled)
  #startSpan with 30 attributes  x  47,688 ops/sec ±1.20% (20 runs sampled)
  #startSpan with 100 attributes x   5,545 ops/sec ±2.47% (20 runs sampled)


Beginning B3Format Benchmark...
  2 tests completed.

  #Inject  x 5,770,521 ops/sec ±0.33% (100 runs sampled)
  #Extract x 5,599,810 ops/sec ±0.48% (100 runs sampled)

Beginning HttpTraceContext Benchmark...
  2 tests completed.

  #Inject  x 15,769,315 ops/sec ±1.15% (100 runs sampled)
  #Extract x  3,545,804 ops/sec ±0.39% (100 runs sampled)

I don't know if this is enough for a conclusion. As expected, span creation and binary propagation are faster, but text propagation is slower. An end-to-end setup using text tagging and a few internal spans with e.g. the Jaeger exporter would be better.

@Flarna Flarna changed the title from "WIP feat: use Uint8Array for ids" to "feat: use Uint8Array for ids" on Feb 14, 2020
@OlivierAlbertini

Thx @Flarna


dyladan commented Feb 14, 2020

Summary of benchmark:

  • Minor speed improvement in span creation
  • B3 Inject 5.7M/sec to 1B/sec (~600x)
  • B3 Extract 5.7M to 1.3M (~0.25x)
  • TC Inject 15.8M to 900K (~0.057x)
  • TC Extract 3.5M to 1M (~0.3x)

The speed improvement in span creation is fairly minor and seems fine, but the really interesting results are the propagators. The B3 inject has a seemingly impossible speedup, but everything else suffered quite a bit, with the trace context injection showing the biggest drop.


dyladan commented Feb 14, 2020

All the conversions between Buffer and Uint8Array here https://github.com/open-telemetry/opentelemetry-js/pull/786/files#diff-c818867edf5b3960f94f5c579ff06043 are unfortunate, as in most cases a Buffer can be used as a Uint8Array anyway. I wonder if we would be better off adding platform-specific wrapper classes with the methods we use (toHex, fromHex, equality), so that the platform-specific code could use whatever platform-specific object (Node: Buffer, web: Uint8Array) makes the most sense for it.
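A rough sketch of what such a wrapper could look like (interface and class names are hypothetical, not proposed API):

```typescript
// Hypothetical id wrapper: each platform backs it with whatever byte
// container is cheapest there (Node: Buffer, web: Uint8Array).
interface SpanIdLike {
  toHex(): string;
  equals(other: SpanIdLike): boolean;
}

// Possible Node implementation backed by Buffer (sketch only).
class NodeSpanId implements SpanIdLike {
  constructor(private readonly bytes: Buffer) {}

  static fromHex(hex: string): NodeSpanId {
    return new NodeSpanId(Buffer.from(hex, 'hex'));
  }

  toHex(): string {
    return this.bytes.toString('hex');
  }

  equals(other: SpanIdLike): boolean {
    return this.toHex() === other.toHex();
  }
}
```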


Flarna commented Feb 15, 2020

I updated the benchmark results above. I noticed that I had forgotten to adapt the benchmarks, therefore the measurements for the propagators were garbage.
It still looks like the improvement is quite low. Most likely the main overhead of OTel is in other areas than string/buffer conversions.


Flarna commented Feb 17, 2020

I think we should also be careful in interpreting the benchmarks, as they show ops/sec but real use cases don't perform these ops independently, e.g. you will always have at least one span created for an inject and one more for an extract.
Maybe they should display time/op instead, which would allow summing up.

Examples:
#startSpan:parent improved from 11.4 µs/op to 10.3 µs/op
#inject degraded from 0.063 µs/op to 0.69 µs/op
#extract degraded from 0.28 µs/op to 0.94 µs/op

Summing this up (2 spans + inject + extract): 23.14 µs => 22.23 µs

But I'm still not sure if the added complexity is worth it.
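For reference, a small sketch of how these per-op timings follow from the ops/sec figures above (numbers taken from the benchmark output; the scenario sum mirrors the one in this comment):

```typescript
// Convert ops/sec into microseconds per operation.
const usPerOp = (opsPerSec: number): number => 1e6 / opsPerSec;

// #startSpan:parent with SimpleSpanProcessor: hex strings vs. Uint8Array ids.
const spanHex = usPerOp(87_738); // ~11.4 µs/op
const spanU8 = usPerOp(97_077);  // ~10.3 µs/op

// "2 spans + inject + extract" scenario (HttpTraceContext inject/extract).
const totalHex = 2 * spanHex + usPerOp(15_769_315) + usPerOp(3_545_804); // ~23.14 µs
const totalU8 = 2 * spanU8 + usPerOp(1_448_021) + usPerOp(1_064_664);    // ~22.23 µs
console.log(totalHex.toFixed(2), totalU8.toFixed(2));
```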


obecny commented Feb 27, 2020

Guys, with regards to your tests I completely agree with comparing the total sum of CPU usage, taking into consideration the creation of a span and then exporting it, in the most typical scenario. Otherwise it will be hard to understand which of these operations is "more valuable" / "more used" and which of them will have the bigger impact on total CPU usage.


obecny commented Feb 27, 2020

Are those tests run only for Node or also for different browsers?


dyladan commented Feb 27, 2020

I agree using total timings makes sense.

| operation | speed improvement (µs) |
| --- | --- |
| span creation (root or non-injected outgoing) | 1.1 |
| incoming request (span + extract) | 0.44 |
| outgoing request (span + inject) | 0.473 |

| scenario | speed improvement (µs) |
| --- | --- |
| 1 incoming + 1 outgoing | 0.913 |
| 1 incoming + 10 outgoing | 5.17 |
| 1 incoming + 1 db (no inject) | 1.54 |
| 1 incoming + 10 db | 11.44 |

Overall this seems like a micro-optimization that may not be worth the effort of maintaining a more complex implementation. It may also end up being slower in cases where ids are logged many times. It is easy to envision a use-case where span context is logged on every log message. In this case you must pay the encoding cost each time you log (ignoring the possibility of saving and re-using the encoded version, which is yet more complexity).
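A sketch of the logging pattern being described, assuming a hypothetical log helper and Uint8Array-based ids (not actual project code):

```typescript
// With Uint8Array ids, every log line pays the hex-encoding cost again
// unless the encoded form is cached somewhere.
interface SpanContextLike {
  traceId: Uint8Array;
  spanId: Uint8Array;
}

function logWithContext(message: string, ctx: SpanContextLike): void {
  const traceId = Buffer.from(ctx.traceId).toString('hex'); // re-encoded per call
  const spanId = Buffer.from(ctx.spanId).toString('hex');
  console.log(`trace_id=${traceId} span_id=${spanId} ${message}`);
}
```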


obecny commented Feb 27, 2020

What if we add some caching when converting, so that each id is converted only once, no matter how many times you convert it?
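One possible shape for such a cache, sketched with a WeakMap keyed by the id object (hypothetical, not proposed code):

```typescript
// Hypothetical conversion cache: each Uint8Array id is hex-encoded at most
// once; later conversions are served from the WeakMap.
const hexCache = new WeakMap<Uint8Array, string>();

function toHexCached(id: Uint8Array): string {
  let hex = hexCache.get(id);
  if (hex === undefined) {
    hex = Buffer.from(id).toString('hex');
    hexCache.set(id, hex);
  }
  return hex;
}
```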


dyladan commented Feb 27, 2020

I think conversion caching would add additional complexity that we may not want to maintain for such a small performance gain.

Maybe @Flarna can provide some stats as to the mean and median numbers of outgoing span operations for an incoming request?


Flarna commented Feb 28, 2020

> Are those tests run only for Node or also for different browsers?

I used only Node.js (I think one of the latest 12.x versions, on Windows). No idea how to run OTel and its benchmarks in a browser...
But it would be interesting to get some numbers, as at least some conversion functions have different implementations in the browser and in Node.

> It may also end up being slower in cases where ids are logged many times. It is easy to envision a use-case where span context is logged on every log message.

Well, if people start logging span context that often, the conversion cost is most likely the wrong area to start improving...

> What if we add some caching when converting, so that each id is converted only once, no matter how many times you convert it?

I thought about this too. But it would require encapsulating this in some class with getters/setters, which adds overhead again. And it's no longer that easy to just create a SpanContext from literals. See also #698 (comment).

> Maybe @Flarna can provide some stats as to the mean and median numbers of outgoing span operations for an incoming request?

Sorry, no numbers regarding this. Incoming and outgoing spans usually have several attributes on them (e.g. HTTP semantic conventions, ...), therefore the relative gain for them will be small.
The biggest relative gain will be on internal spans with no attributes, but I would not assume that they are created that frequently.


In general I was expecting more gain from this change. I have to admit that using Uint8Array instead of string is more error prone. I have not yet closed this PR in order to give some more time to discuss it and to let people try some other use cases to get more insight into the impact.
I think we should decide on this before reaching beta as it is an incompatible API change.


Flarna commented Feb 28, 2020

I had a chat with @dyladan today and we found that a significant part of the improvements seen here (at least for Node.js) come just from the fact that randomTraceId() has been changed from two calls to randomBytes to one call requesting 16 bytes.
Or in other words, replacing randomSpanId() + randomSpanId() by crypto.randomBytes(TRACE_ID_BYTES).toString('hex') is most likely the better way to go.
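Roughly, the change being described (a sketch; the byte-length constants are assumed, not the exact project code):

```typescript
import { randomBytes } from 'crypto';

const TRACE_ID_BYTES = 16; // assumed trace id length
const SPAN_ID_BYTES = 8;   // assumed span id length

// Before (roughly): a trace id built from two separate random span ids.
function randomSpanId(): string {
  return randomBytes(SPAN_ID_BYTES).toString('hex');
}
function randomTraceIdOld(): string {
  return randomSpanId() + randomSpanId();
}

// After: a single randomBytes call for the full 16 bytes.
function randomTraceIdNew(): string {
  return randomBytes(TRACE_ID_BYTES).toString('hex');
}
```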

It seems that at least V8 is significantly better at working with strings than with Buffer/Uint8Array. I think I had a similar experience a while ago in a different context. I assume the reason is that strings are much more widely used and have been around for longer, so much more optimization work has been done for them.


dyladan commented Feb 28, 2020

Closing in favor of #824

@dyladan dyladan closed this Feb 28, 2020
@Flarna Flarna deleted the ids-as-uint8array branch February 28, 2020 21:36
pichlermarc pushed a commit to dynatrace-oss-contrib/opentelemetry-js that referenced this pull request Dec 15, 2023