InstrumentedStreams for input & output streams #1314
Track bytes read/written via meter and throughput histogram
Need baseline bump #1313 to merge first to fix
import java.io.OutputStream;
import java.util.Objects;

abstract class ForwardingOutputStream extends FilterOutputStream {
I left these `Forwarding*Stream` classes package-private for now. If one really wants, Apache commons-io's `Proxy*Stream` provides a similar framework.
final class InstrumentedInputStream extends ForwardingInputStream {
private final Meter bytes;
private final Histogram throughput;
private long start;
Concurrent stream access never works well, but we may want to pass `start` as a parameter rather than keep it as an object field.
Yeah, are you thinking of passing `startNanos` as another arg to `after`?
From an API consumer's perspective, I need to Javadoc these to make the args clear, since both will be `long`.
If we keep the throughput histograms, I think that would be cleaner. I commented about some questionable aspects of the throughput histograms given we don't know which operations actually flush and incur cost, or how much data is included within those operations. Perhaps if we do keep the throughput histograms, we should instead accumulate total bytes written for the lifespan of the stream, and sum all the time spent in read/write/flush/close.
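The accumulate-over-lifespan idea could look roughly like the sketch below. It is not the PR's implementation: `CumulativeThroughputOutputStream` and its accessors are hypothetical names, the start time is held in a local variable rather than a field (as suggested earlier in this thread), and the class is not thread-safe.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch: sums bytes written and time spent over the stream's lifespan. */
final class CumulativeThroughputOutputStream extends FilterOutputStream {
    private long totalBytes;
    private long totalNanos;

    CumulativeThroughputOutputStream(OutputStream delegate) {
        super(delegate);
    }

    @Override
    public void write(int b) throws IOException {
        long startNanos = System.nanoTime(); // local variable, not an object field
        out.write(b);
        record(1, startNanos);
    }

    @Override
    public void write(byte[] buffer, int offset, int length) throws IOException {
        long startNanos = System.nanoTime();
        // Delegate in bulk; FilterOutputStream's default writes one byte at a time.
        out.write(buffer, offset, length);
        record(length, startNanos);
    }

    @Override
    public void flush() throws IOException {
        long startNanos = System.nanoTime();
        out.flush();
        record(0, startNanos);
    }

    @Override
    public void close() throws IOException {
        long startNanos = System.nanoTime();
        try {
            // This wrapper buffers nothing, so it closes the delegate directly.
            // Calling super.close() would invoke flush() and count that time twice.
            out.close();
        } finally {
            record(0, startNanos);
        }
    }

    long totalBytes() {
        return totalBytes;
    }

    /** Bytes per second across every timed call; 0 if no time was recorded. */
    double bytesPerSecond() {
        return totalNanos == 0 ? 0.0 : totalBytes / (totalNanos / 1_000_000_000.0);
    }

    private void record(long bytes, long startNanos) {
        totalBytes += bytes;
        totalNanos += System.nanoTime() - startNanos;
    }
}
```

Because time spent in `flush()` and `close()` is included in the denominator, the resulting rate does not depend on which call actually pushed buffered data downstream.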
protected void after(long bytesWritten) {
double elapsedSeconds = (System.nanoTime() - start) / 1_000_000_000.0;
long bytesPerSecond = Math.round(bytesWritten / elapsedSeconds);
throughput.update(bytesPerSecond);
I'm not entirely sure I'd always trust the throughput value because all the work may occur in `flush()` and `close()`, where `write` methods largely push data into a buffer.
Given the variance in histogram values, we may be better off only using the meter (which is limited to the reporting interval). What do you think?
Yeah, I'm leaning toward removing the throughput histogram. There might be some value in tracking a histogram of write sizes to identify small reads/writes.
+1, we can always add that sort of thing later on if we need it
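With the histogram dropped and only the byte meter kept, the input-side wrapper reduces to roughly the following. This is a minimal sketch rather than the PR's code: `ByteCountingInputStream` is a hypothetical name, and a `java.util.concurrent.atomic.LongAdder` stands in for the `Meter` so the example has no external dependencies.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: counts bytes read, with a LongAdder standing in for a Meter. */
final class ByteCountingInputStream extends FilterInputStream {
    private final LongAdder bytes;

    ByteCountingInputStream(InputStream delegate, LongAdder bytes) {
        super(delegate);
        this.bytes = bytes;
    }

    @Override
    public int read() throws IOException {
        int value = in.read();
        if (value != -1) { // -1 signals end of stream, not a byte
            bytes.increment();
        }
        return value;
    }

    @Override
    public int read(byte[] buffer, int offset, int length) throws IOException {
        int count = in.read(buffer, offset, length);
        if (count > 0) { // skip the -1 end-of-stream marker and zero-length reads
            bytes.add(count);
        }
        return count;
    }
}
```

`read(byte[])` needs no override because `FilterInputStream` routes it through `read(byte[], int, int)`. Bytes passed over by `skip()` are not counted in this sketch.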
* @param throughput bytes read per second
* @return instrumented input stream
*/
public static InputStream input(InputStream in, Meter bytes, Histogram throughput) {
Thoughts on taking a `TaggedMetricRegistry` + name, and using metric-schema to define a standard structure? That way we can define reusable dashboards.
Yep, will do when I have some cycles
Interested in thoughts on the tagging structure. Right now I have a single `type` tag that one sets to distinguish streams. We will need to be cautious with tag cardinality, though we can enforce compile-time tagging.
io.stream.read:{libraryName=tritium, libraryVersion=unknown, type=test-in}
count = 2147483648
mean rate = 146651358.56 events/second
1-minute rate = 155940259.30 events/second
5-minute rate = 156866618.01 events/second
15-minute rate = 157027104.69 events/second
io.stream.write:{libraryName=tritium, libraryVersion=unknown, type=gzip-out}
count = 2147483648
mean rate = 146654007.80 events/second
1-minute rate = 155943552.02 events/second
5-minute rate = 156869248.25 events/second
15-minute rate = 157029620.15 events/second
io.stream.write:{libraryName=tritium, libraryVersion=unknown, type=raw-out}
count = 10409991
mean rate = 710861.07 events/second
1-minute rate = 752956.85 events/second
5-minute rate = 757115.18 events/second
15-minute rate = 757835.58 events/second
👍 I like it. I'm not sure if it's worthwhile to limit values to compile-time constants because that prevents the tool from being used within another library, even when the cardinality is known to be low.
Easier to open that up later than to ratchet it down, happy to merge with that constraint.
Released 0.37.0
Before this PR
Tracking `InputStream` and `OutputStream` progress and throughput required rolling your own stream wrappers.
After this PR
==COMMIT_MSG==
InstrumentedStreams for input & output streams
Track bytes read/written via meter and throughput histogram
==COMMIT_MSG==
Possible downsides?