Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Size() method to pdata Structures #5920

Closed
cpheps opened this issue Aug 16, 2022 · 7 comments
Closed

Add Size() method to pdata Structures #5920

cpheps opened this issue Aug 16, 2022 · 7 comments

Comments

@cpheps
Copy link
Contributor

cpheps commented Aug 16, 2022

Currently all the pdata structures have a way to count the number of objects in them:

  • plogs have LogRecordCount()
  • pmetrics have MetricCount() and DataPointCount()
  • ptraces have SpanCount()

It would be useful in some cases to have access to the size of the pdata structure. The primary use case would be so components could measure their data throughput and record it via custom telemetry metrics. I have done some benchmarks and this operation can be expensive at scale. Due to this I would not want to include the data throughput metrics in the current set of built in component metrics.

The implementation is very straight forward for all pdata types, here is an example using metrics:

func (md Metrics) Size() int {
	return md.orig.Size()
}

This will give the protobuf size of the object, counting all the sub objects. I would want to add a warning to the function comment indicating possible performance impact at scale.

@bogdandrutu
Copy link
Member

This operation is pretty expensive, and we try to discourage the use of this in metrics, since the amount of CPU needed to calculate this is way too much compare with the usefulness.

This may change when we change to lazy unmarshalling, but I would suggest to retain that until then. As an alternative for the moment you still have Sizer as an interface that the encoders may implement (the protobuf does that), see https://github.com/open-telemetry/opentelemetry-collector/blob/main/pdata/ptrace/encoding.go#L33 and https://github.com/open-telemetry/opentelemetry-collector/blob/main/pdata/ptrace/pb.go#L23

@cpheps
Copy link
Contributor Author

cpheps commented Aug 17, 2022

@bogdandrutu I wasn't aware of the Sizer interface. I think that more or less is the same thing I'm proposing. I will look at that. Closing this issue.

@cpheps cpheps closed this as completed Aug 17, 2022
@cpheps
Copy link
Contributor Author

cpheps commented Aug 17, 2022

@bogdandrutu One note about this it's not very friendly to interact with. Seems like the only way to get a Sizer is to cast a Marshaler as a Sizer. Shown here in the batch processor. To someone importing the collector as a dependency it isn't obvious this is a safe operation without digging into the code. Seems like there should be a function analogous to NewProtoMarshaler and NewProtoUnmarshaler called NewProtoSizer that instantiates something that directly implements the Sizer interface.

@cpheps cpheps reopened this Aug 17, 2022
@bogdandrutu
Copy link
Member

Someone closed to soon this #3842

Happy to review a proposed API (short PR just showing the public things initially to discuss).

@bogdandrutu
Copy link
Member

Where do you intend to use this? As mentioned usually is expensive operation and value added is not worth it (in my opinion)

@cpheps
Copy link
Contributor Author

cpheps commented Aug 18, 2022

@bogdandrutu We plan to use it to observe how data flow changes over a pipeline. While the operation is expensive we plan to mitigate that by doing sampling and short-duration observations.

@bogdandrutu
Copy link
Member

fyi: @cpheps we have a metric at grpc/http level for bytes received/exported that may help, that is cheap to have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants