Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InstrumentedStreams for input & output streams #1314

Merged
merged 9 commits into from
Jan 13, 2022

Conversation

schlosna
Copy link
Contributor

Before this PR

Tracking InputStream and OutputStream progress and throughput required rolling your own stream wrappers.

After this PR

==COMMIT_MSG==
InstrumentedStreams for input & output streams

Track bytes read/written via meter and throughput histogram
==COMMIT_MSG==

Possible downsides?

Track bytes read/written via meter and throughput histogram
@changelog-app
Copy link

changelog-app bot commented Jan 12, 2022

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

InstrumentedStreams for input & output streams

Track bytes read/written via meter and throughput histogram

Check the box to generate changelog(s)

  • Generate changelog entry

@schlosna
Copy link
Contributor Author

Need baseline bump #1313 to merge first to fix FilterOutputStreamSlowMultibyteWrite (see palantir/gradle-baseline#2031)

import java.io.OutputStream;
import java.util.Objects;

abstract class ForwardingOutputStream extends FilterOutputStream {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left these Forwarding*Stream as package private for now. If one really wants, Apache commons-io's Proxy*Stream provides similar framework

@schlosna schlosna added the update me Keep PR updated with any merged changes label Jan 12, 2022
@schlosna schlosna marked this pull request as ready for review January 12, 2022 03:22
@policy-bot policy-bot bot requested a review from tetigi January 12, 2022 03:22
@schlosna schlosna requested review from carterkozak and removed request for tetigi January 12, 2022 03:22
final class InstrumentedInputStream extends ForwardingInputStream {
private final Meter bytes;
private final Histogram throughput;
private long start;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concurrent stream access never works great, but we may want to move start to a parameter instead rather than object field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, are you thinking passing startNanos as another arg to after?

From API consumer perspective, I need to JavaDoc these to make it clear on the args as there will be both long

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep the throughput histograms, I think that would be cleaner. I commented about some questionable aspects of the throughput histograms given we don't know which operations actually flush and incur cost, or how much data is included within those operations. Perhaps if we do keep the throughput histograms, we should instead accumulate total bytes written for the lifespan of the stream, and sum all the time spent in read/write/flush/close.

protected void after(long bytesWritten) {
double elapsedSeconds = (System.nanoTime() - start) / 1_000_000_000.0;
long bytesPerSecond = Math.round(bytesWritten / elapsedSeconds);
throughput.update(bytesPerSecond);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure I'd always trust the throughput value because all the work may occur in flush() and close(), where write methods largely push data into a buffer.

Given the variance in histogram values, we may be better off only using the meter (which is limited to the reporting interval). What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm leaning toward removing the throughput histogram. There might be some value in tracking a histogram of write sizes to identify small reads/writes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, we can always add that sort of thing later on if we need it

* @param throughput bytes read per second
* @return instrumented input stream
*/
public static InputStream input(InputStream in, Meter bytes, Histogram throughput) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on taking a TaggedMetricRegistry + name, and using metric-schema to define a standard structure? That way we can define reusable dashboards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will do when I have some cycles

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interested in thoughts for the tagging structure. Right now I have a single type tag that one sets to distinguish streams. We will need to be cautious with tag cardinality, though we can enforce compile time tagging.

Example from https://17129-66020851-gh.circle-artifacts.com/0/~/artifacts/junit/tritium-lib/test/classes/com.palantir.tritium.io.InstrumentedStreamsTest.html

io.stream.read:{libraryName=tritium, libraryVersion=unknown, type=test-in}
             count = 2147483648
         mean rate = 146651358.56 events/second
     1-minute rate = 155940259.30 events/second
     5-minute rate = 156866618.01 events/second
    15-minute rate = 157027104.69 events/second
io.stream.write:{libraryName=tritium, libraryVersion=unknown, type=gzip-out}
             count = 2147483648
         mean rate = 146654007.80 events/second
     1-minute rate = 155943552.02 events/second
     5-minute rate = 156869248.25 events/second
    15-minute rate = 157029620.15 events/second
io.stream.write:{libraryName=tritium, libraryVersion=unknown, type=raw-out}
             count = 10409991
         mean rate = 710861.07 events/second
     1-minute rate = 752956.85 events/second
     5-minute rate = 757115.18 events/second
    15-minute rate = 757835.58 events/second

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I like it. I'm not sure if it's worthwhile to limit values to compile-time constants because that prevents the toll from being used within another library, even when the cardinality is known to be low.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easier to open that up later than to ratchet it down, happy to merge with that constraint.

@schlosna schlosna requested a review from carterkozak January 13, 2022 20:53
@bulldozer-bot bulldozer-bot bot merged commit d9bcd62 into develop Jan 13, 2022
@bulldozer-bot bulldozer-bot bot deleted the ds/instrumented-streams branch January 13, 2022 21:12
@svc-autorelease
Copy link
Collaborator

Released 0.37.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autorelease merge when ready update me Keep PR updated with any merged changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants