[JEP-210] Log handling rewrite #17

Merged: 43 commits merged into jenkinsci:master on Oct 4, 2018

Conversation

@jglick (Member) commented Sep 23, 2016

JEP-210

  • way to get a log

@reviewbybees esp. @oleg-nenashev

jglick added a commit to jglick/workflow-support-plugin that referenced this pull request Dec 7, 2016
jglick added a commit to jglick/workflow-job-plugin that referenced this pull request Dec 7, 2016
* @param start the start position to begin reading from (normally 0)
* @throws EOFException if the start position is larger than the log size (or you may simply return EOF immediately when read)
*/
public @Nonnull InputStream getLog(long start) throws IOException {
Contributor:

@Beta ?

Member Author:

Yes, ought to be.

Otherwise when resuming a build you can get:
… org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1 run
WARNING: null
java.io.IOException: java.io.IOException: not implemented
	at org.jenkinsci.plugins.workflow.log.BrokenLogStorage.overallListener(BrokenLogStorage.java:52)
	at org.jenkinsci.plugins.workflow.flow.FlowExecutionOwner.getListener(FlowExecutionOwner.java:130)
	at org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1.run(TryRepeatedly.java:96)
	at …
Caused by: java.io.IOException: not implemented
	at org.jenkinsci.plugins.workflow.flow.FlowExecutionOwner$DummyOwner.getExecutable(FlowExecutionOwner.java:166)
	at io.jenkins.plugins.pipeline_log_fluentd_cloudwatch.PipelineBridge.forBuild(PipelineBridge.java:55)
	at org.jenkinsci.plugins.workflow.log.LogStorage.of(LogStorage.java:115)
	... 10 more
…complains about, causing bad display of logs from before the upgrade.
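As a sketch of the @Beta suggestion above, one way the method might be marked, assuming the access-modifier-annotation library's Beta restriction is available (the class name LogStorageSketch is invented purely for illustration, not the committed code):

```java
// Sketch only: LogStorageSketch is an invented name, and the annotations assume
// the access-modifier-annotation library (org.kohsuke.accmod) is on the classpath.
import java.io.IOException;
import java.io.InputStream;
import javax.annotation.Nonnull;
import org.kohsuke.accmod.Restricted;
import org.kohsuke.accmod.restrictions.Beta;

public abstract class LogStorageSketch {
    /** Marked beta so tooling flags external callers while the signature may still change. */
    @Restricted(Beta.class)
    public abstract @Nonnull InputStream getLog(long start) throws IOException;
}
```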
lastLine = line;
}
}
if (lastLine != null) {
Member:

What if the content did not finish writing and the "lastId" has been truncated? (i.e. the line has only been partially written)

Member Author:

Then the line → step mapping for this build will not be perfect. For example, in the classic UI some text would incorrectly not be associated with a step block and would just be shown as part of the general output. Too minor to worry about.

Member:

Sounds like a bug to me, but I'll let you fix it when the first user reports it.

Member Author:

If log-index has been corrupted for some reason, then by definition some aspects of the line → step mapping will be incorrect. The only question is how much of an effort we make to reconstruct some of it.

// TODO can probably be done a bit more efficiently with FileChannel methods
byte[] data = new byte[(int) (lastTransition - pos)];
raf.readFully(data);
buf.write(data);
Member:

🐜 Could we not write this a chunk at a time or stream it? This is going to generate a lot of GC pressure due to excess array allocation.

Member Author:

Sort of beside the point when the whole buf is memory-allocated, which is hard to get around in current core versions since the only [Annotated]LargeText constructors take a file (useless) or a memory buffer. To write this properly, we would likely need the Source and Session interfaces to be made public.

Member:

Answering my own question after digging in a bit more: no, because our ByteBuffer cannot be pre-sized for our content since its ensureCapacity method is private.

For those who can't follow the code: essentially what we're doing is allocating an array containing all the log content and then copying it into the Jenkins-internal ByteBuffer implementation (org.kohsuke.stapler.framework.io.ByteBuffer), which dynamically resizes to fit the content. Essentially we're double-allocating.

However, the ByteBuffer resizes either to double its capacity or to fit the content (whichever is larger), which means that if we copied a chunk at a time we'd likely trigger multiple backing-array resize operations for the ByteBuffer.

Kind of a wash in that case, without a way to more effectively size the buffer, which would require upstream changes to Stapler (which can be done later).
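For illustration, the chunk-at-a-time variant under discussion might look roughly like this (raf and buf as in the excerpt above); it is a sketch rather than the committed code, and as noted it still feeds the same dynamically resizing ByteBuffer:

```java
// Sketch only, not the committed code: copy [pos, lastTransition) from the
// RandomAccessFile in fixed-size chunks instead of one full-size array.
byte[] chunk = new byte[8192];
long remaining = lastTransition - pos;
while (remaining > 0) {
    int n = raf.read(chunk, 0, (int) Math.min(chunk.length, remaining));
    if (n < 0) {
        break; // unexpected EOF: the index pointed past the end of the log file
    }
    buf.write(chunk, 0, n);
    remaining -= n;
}
```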

Member:

AFAICT the FileChannel comment -- which is another clever option -- won't work because our ByteBuffer (drumroll please) does not extend java.nio.ByteBuffer... instead it extends OutputStream 😆

So... I smell some room for core improvements in LargeText/AnnotatedLargeText that reduce our tendency to throw around (needless) byte array allocations. Might help with some of the humongous object allocations I've seen spiking our GC logs.

Member Author:

> I smell some room for core improvements in LargeText/AnnotatedLargeText

Optimizing buffer allocations would be nice, but the critical improvement needed in these APIs is the ability to use an opaque, implementation-dependent cursor rather than a long, which is fine for the built-in storage but terrible for e.g. CloudWatch Logs. That would require changes in both the classic UI and Blue Ocean.
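Purely as a sketch of that direction, with every name below invented for illustration rather than taken from any existing API, an opaque-cursor shape might look like:

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical: an opaque resume token (e.g. a CloudWatch Logs next-token) instead of a byte offset. */
interface LogCursor {
    /** Serialized form that the UI hands back on its next poll. */
    String token();
}

/** Hypothetical storage API keyed on the cursor rather than a long offset. */
interface CursorAwareLogStorage {
    /** @param start null means read from the beginning */
    Chunk readOverallLog(LogCursor start) throws IOException;

    /** One response: some content plus the cursor at which to resume. */
    interface Chunk {
        InputStream content();
        LogCursor next();
    }
}
```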

Member:

Fair enough -- nothing to do here at the moment, but I'd like to leave the conversation marked unresolved as a nudge toward future enhancements.

BufferedReader indexBR = index.isFile() ? Files.newBufferedReader(index.toPath(), StandardCharsets.UTF_8) : new BufferedReader(new NullReader(0))) {
String line;
long pos = -1; // -1 if not currently in this node, start position if we are
while ((line = indexBR.readLine()) != null) {
@svanoort (Member) Sep 26, 2018

For those keeping track at home: any time we try to read a single node's logs we trigger a full read of the entire index file, and as we scan we look for lines with our FlowNode ID. Thankfully the index is smaller than the entire logfile, probably by orders of magnitude.

But this still feels both overly complicated and inefficient, with O(m) performance to find all of a FlowNode's logs, where m is the size of the index. Is there a reason we can't simply store when a FlowNode is "done" for logging purposes? We have that logic handy already.

Member Author:

> store when a FlowNode is "done"

Not sure what you are proposing exactly.

Member:

@jglick We know how to compute when a FlowNode is "closed" for logging, depending on whether it's an AtomNode, StartNode, or EndNode. This limits the scope we'd need to scan.

Now, if we marked the offsets of the first and last log content for a FlowNode and stored them specially (say, as fields in our LogAction implementation, or by showing boundaries a bit more clearly in the index), we could further limit the amount of scanning done.

However... eh, this can be optimized later.

Member Author:

> stored it specially (say as fields in our LogAction implementation

I preferred to avoid duplicating information like this. In other words, log + log-index should be self-contained.

Now another line type (not a transition) could be added to log-index indicating that a given FlowNode has been closed, and thus that there is no need to search past that point in the file looking for subsequent blocks. The extra complexity just to avoid reading the rest of what would usually be a short text file (smaller than a typical “disk” block size, I think) does not seem worth it.
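For illustration only (this was not implemented), such an extra, non-transition line type in log-index might look something like the hypothetical `done` marker below; the offsets, node IDs, and the `done` keyword itself are all made up:

```
120 4    transition: output for node 4 begins at byte offset 120
310      transition: node 4's block ends at offset 310
done 4   hypothetical marker: node 4 is closed, so no later blocks need to be scanned for it
```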

Member:

Please leave this comment unresolved so we know in the future where to start if we need to optimize.

@svanoort (Member) left a comment

Overall a -0 review: this needs some explanatory comments or it won't be maintainable. I have also lodged a couple of questions that I'd like to see answered before giving this a firm +1 / -1 (they may be buglets, but potentially not worth handling).


/**
* Simple implementation of log storage in a single file that maintains a side file with an index indicating where node transitions occur.
* Each line in the index file is a byte offset, optionally followed by a space and then a node ID.
Member Author:

Note: I had considered using Data{Input,Output}Stream, whereby the format would be a sequence of 64-bit offsets each followed by a node ID (as null-terminated UTF-8, or maybe even finally declare that these must be ints). That would make the log-index file more compact and quicker to parse, at the expense of making problems harder to debug. I do not have strong feelings either way.
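For readers unfamiliar with the current text format, here is an illustrative log-index with made-up offsets and node IDs, on the reading that a line without a node ID marks a transition back to output not associated with any node:

```
0 2      output for node 2 starts at byte offset 0 of the log file
120 4    transition at offset 120: output for node 4 begins
310      transition at offset 310: back to output not associated with any node
```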

@svanoort (Member) left a comment

Approving -- with the caveat that we'll probably have to do some quick fixes as we discover totally unexpected failure modes in the way we're storing and indexing logs. That's kind of a given for a change this big, though; you just grin and bear it.

@jglick merged commit 6f39eb9 into jenkinsci:master Oct 4, 2018
@jglick deleted the logs-JENKINS-38381 branch October 4, 2018 20:43