[JEP-210] Log handling rewrite #17
Conversation
Using jitpack for jenkinsci/workflow-api-plugin#17, and picking up jenkinsci/maven-hpi-plugin#46.
Also using jitpack and linking to jenkinsci/maven-hpi-plugin#46 & jenkinsci/workflow-api-plugin#17 & jenkinsci/workflow-support-plugin#15.
 * @param start the start position to begin reading from (normally 0)
 * @throws EOFException if the start position is larger than the log size (or you may simply return EOF immediately when read)
 */
public @Nonnull InputStream getLog(long start) throws IOException {
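For context, a minimal sketch of one way an implementation could satisfy this contract, assuming a plain file-backed store (the class, field, and use of commons-io's `IOUtils.skipFully` here are illustrative, not the PR's actual code):

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.annotation.Nonnull;
import org.apache.commons.io.IOUtils;

class FileBackedLogSketch {
    private final File log; // the main log file (hypothetical field)

    FileBackedLogSketch(File log) {
        this.log = log;
    }

    public @Nonnull InputStream getLog(long start) throws IOException {
        long size = log.length();
        if (start > size) {
            // per the javadoc: the other allowed behavior is to return a stream already at EOF
            throw new EOFException("start position " + start + " > log size " + size);
        }
        InputStream is = new FileInputStream(log);
        IOUtils.skipFully(is, start); // position the stream at the requested offset
        return is;
    }
}
```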
`@Beta`?
Yes, ought to be.
Otherwise when resuming a build you can get:

    … org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1 run
    WARNING: null
    java.io.IOException: java.io.IOException: not implemented
        at org.jenkinsci.plugins.workflow.log.BrokenLogStorage.overallListener(BrokenLogStorage.java:52)
        at org.jenkinsci.plugins.workflow.flow.FlowExecutionOwner.getListener(FlowExecutionOwner.java:130)
        at org.jenkinsci.plugins.workflow.support.pickles.TryRepeatedly$1.run(TryRepeatedly.java:96)
        at …
    Caused by: java.io.IOException: not implemented
        at org.jenkinsci.plugins.workflow.flow.FlowExecutionOwner$DummyOwner.getExecutable(FlowExecutionOwner.java:166)
        at io.jenkins.plugins.pipeline_log_fluentd_cloudwatch.PipelineBridge.forBuild(PipelineBridge.java:55)
        at org.jenkinsci.plugins.workflow.log.LogStorage.of(LogStorage.java:115)
        ... 10 more

…complains about, causing bad display of logs from before upgrade.
lastLine = line;
}
}
if (lastLine != null) {
What if the content did not finish writing and the "lastId" has been truncated (i.e., the line was only partially written)?
Then the line → step mapping for this build will not be perfect. For example, in the classic UI it would mean that some text would incorrectly not be associated with a step block, but just shown as part of the general output. Too minor to worry about.
Sounds like a bug to me, but I'll let you fix it when the first user reports it.
If `log-index` has been corrupted for some reason, then by definition some aspects of the line → step mapping will be incorrect. The only question is how much of an effort we make to reconstruct some of it.
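To make the concern concrete, here is a small sketch (not the PR's actual code) of how the index scan could tolerate a partially written trailing line by only remembering lines whose offset field parses cleanly; this is a partial mitigation, since a truncated node ID would still slip through:

```java
import java.io.BufferedReader;
import java.io.IOException;

class TruncationTolerantScan {
    /** Returns the last cleanly parseable index line, ignoring a possibly truncated tail. */
    static String lastCompleteLine(BufferedReader indexBR) throws IOException {
        String lastLine = null;
        String line;
        while ((line = indexBR.readLine()) != null) {
            int space = line.indexOf(' ');
            String offsetPart = space == -1 ? line : line.substring(0, space);
            try {
                Long.parseLong(offsetPart); // validate the offset field
                lastLine = line;            // only keep lines that parse cleanly
            } catch (NumberFormatException x) {
                // partially written trailing line: ignore it and keep the previous one
            }
        }
        return lastLine;
    }
}
```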
// TODO can probably be done a bit more efficiently with FileChannel methods
byte[] data = new byte[(int) (lastTransition - pos)];
raf.readFully(data);
buf.write(data);
🐜 Could we not write this a chunk at a time or stream it? This is going to generate a lot of GC pressure due to excess array allocation.
Sort of beside the point when the whole `buf` is memory-allocated. Which is hard to get around in current core versions since the only `[Annotated]LargeText` constructors take a file (useless) or a memory buffer. To write this properly, we would likely need the `Source` and `Session` interfaces to be made `public`.
Answering my own question after digging in a bit more: no, because our `ByteBuffer` cannot be pre-sized for our content since its `ensureCapacity` method is private.

For those who can't follow the code: essentially what we're doing is allocating an array containing all the log content and then copying it into the Jenkins-internal `ByteBuffer` implementation (`org.kohsuke.stapler.framework.io.ByteBuffer`), which will dynamically resize to fit the content. Essentially we're double-allocating.

However, the `ByteBuffer` will resize to either double its size or fit the content (whichever is larger). Which means that if we copied a chunk at a time, we'd likely end up triggering multiple backing-array resize operations for the `ByteBuffer`.

Kind of a wash in that case, without a way to size the buffer more effectively up front, which would require upstream changes to Stapler (which can be done later).
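For reference, a chunked copy along the lines suggested above might look like the sketch below (`raf`, `pos`, and `lastTransition` as in the quoted snippet; the chunk size is arbitrary). As noted, this only helps if the destination is a real stream rather than the memory-backed `ByteBuffer`:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

class ChunkedCopySketch {
    /** Copies the byte range [pos, lastTransition) from raf to out in fixed-size chunks. */
    static void copyRange(RandomAccessFile raf, long pos, long lastTransition, OutputStream out) throws IOException {
        byte[] chunk = new byte[64 * 1024];
        raf.seek(pos);
        long remaining = lastTransition - pos;
        while (remaining > 0) {
            int n = raf.read(chunk, 0, (int) Math.min(chunk.length, remaining));
            if (n == -1) {
                break; // file shorter than expected; stop rather than fail
            }
            out.write(chunk, 0, n);
            remaining -= n;
        }
    }
}
```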
AFAICT the `FileChannel` comment -- which is another clever option -- won't work because our `ByteBuffer` (drumroll please) does not extend off of `java.nio.ByteBuffer`... instead it extends `OutputStream` 😆

So... I smell some room for core improvements in `LargeText`/`AnnotatedLargeText` that reduce our tendency to throw around (needless) byte array allocations. Might help with some of the humongous object allocations I've seen spiking our GC logs.
> I smell some room for core improvements in `LargeText`/`AnnotatedLargeText`

Optimizing buffer allocations would be nice, but the critical improvement needed in these APIs is the ability to use an opaque, implementation-dependent cursor rather than a `long`, which is fine for the built-in storage but terrible for e.g. CloudWatch Logs. That would require changes in both the classic UI and Blue Ocean.
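Purely as a hypothetical illustration of the opaque-cursor idea (none of these types exist in core today, and this is not a proposal from the PR itself):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.Serializable;

/** Hypothetical: an implementation-defined position in a log stream, e.g. a byte
 *  offset for file storage or a continuation token for CloudWatch Logs. */
interface LogCursor extends Serializable {}

/** Hypothetical variant of the LargeText contract that avoids assuming a long offset. */
interface CursorAwareLargeText {
    /** Writes log content starting at {@code start} (null = beginning) to {@code sink}
     *  and returns the cursor at which a later call should resume. */
    LogCursor writeLogTo(LogCursor start, OutputStream sink) throws IOException;
}
```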
Fair enough -- nothing to do here at the moment, but I'd like to leave the conversation marked unresolved to nudge us toward future enhancements.
BufferedReader indexBR = index.isFile() ? Files.newBufferedReader(index.toPath(), StandardCharsets.UTF_8) : new BufferedReader(new NullReader(0))) {
    String line;
    long pos = -1; // -1 if not currently in this node, start position if we are
    while ((line = indexBR.readLine()) != null) {
For those keeping track at home: any time we try to read a single node's logs we trigger a full read of the entire index file, and as we scan we look for lines with our FlowNode ID. Which thankfully is smaller than the entire logfile by probably orders of magnitude.
But still, this feels both overly complicated and inefficient - with `O(m)` performance to find all of a FlowNode's logs, where `m` is the size of the index. Is there a reason we can't simply store when a FlowNode is "done" for logging purposes? We have that logic handy already.
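For readers following along, a self-contained sketch of the scan being described (an assumed simplification, not the PR's exact code): walk the index transitions and collect the `[start, end)` byte ranges in the main log file that belong to one node:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

class IndexScanSketch {
    /** Collects the [start, end) byte ranges in the log file belonging to one node. */
    static List<long[]> rangesFor(String nodeId, File index) throws IOException {
        List<long[]> ranges = new ArrayList<>();
        long pos = -1; // -1 = not currently inside this node's output
        try (BufferedReader br = Files.newBufferedReader(index.toPath(), StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                int space = line.indexOf(' ');
                long offset = Long.parseLong(space == -1 ? line : line.substring(0, space));
                String id = space == -1 ? null : line.substring(space + 1);
                if (pos == -1 && nodeId.equals(id)) {
                    pos = offset;                          // our node's output starts here
                } else if (pos != -1 && !nodeId.equals(id)) {
                    ranges.add(new long[] {pos, offset});  // transitioned away from our node
                    pos = -1;
                }
            }
        }
        if (pos != -1) {
            ranges.add(new long[] {pos, Long.MAX_VALUE});  // still open at end of index
        }
        return ranges;
    }
}
```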
> store when FlowNode is "done"

Not sure what you are proposing exactly.
@jglick We know how to compute when a FlowNode is "closed" for logging, depending on whether it's an `AtomNode`, `StartNode`, or `EndNode`. This limits the scope we'd need to scan.
Now, if we specially marked the offsets of the first and last log content for a FlowNode and stored them specially (say as fields in our `LogAction` implementation, or by showing the boundaries a bit more clearly in the index), we could further limit the amount of scanning done.
However... eh, this can be optimized later.
> stored it specially (say as fields in our `LogAction` implementation

I preferred to avoid duplicating information like this. In other words, `log` + `log-index` should be self-contained.

Now, another line type (not a transition) could be added to `log-index` indicating that a given `FlowNode` has now been closed, and that thus there is no need to search past that point in the file looking for subsequent blocks. The extra complexity just to avoid reading the rest of what would usually be a short text file (smaller than a typical "disk" block size, I think) does not seem worth it.
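Purely to illustrate that rejected alternative, such an index might look like the following, where a hypothetical `end <id>` line would mean no further blocks for that node appear later in the file (again, this is not the format the PR uses):

```
0 4
120
200 4
350
end 4
```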
Please leave this comment unresolved so we know in the future where to start if we need to optimize.
Overall a -0 review: this needs some explanatory comments or it won't be maintainable - and I've lodged a couple of questions that I'd like to see answered before giving a firm +1 / -1 (they may be buglets, but potentially not worth handling).
/**
 * Simple implementation of log storage in a single file that maintains a side file with an index indicating where node transitions occur.
 * Each line in the index file is a byte offset, optionally followed by a space and then a node ID.
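To illustrate the documented format, a `log-index` might contain something like the following, where a bare offset marks a transition back to general build output and an offset plus ID marks the start of output for that node (the specific numbers and IDs here are made up):

```
0 4
1370
2048 7
2925
```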
Note: I had considered use of `Data{Input,Output}Stream`, whereby the format would be a sequence of 64-bit offsets followed by node ID (as null-terminated UTF-8, or maybe even finally declare that these must be `int`s). Would make the `log-index` file more compact and quicker to parse, at the expense of it being harder to debug problems. I do not have strong feelings either way.
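A sketch of what writing that hypothetical binary format could look like (illustrative only; this is not what the PR implements):

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class BinaryIndexSketch {
    /** One record: a 64-bit offset, then the node ID as NUL-terminated UTF-8
     *  (an empty ID meaning a transition back to general output). */
    static void writeTransition(DataOutputStream out, long offset, String nodeId) throws IOException {
        out.writeLong(offset);
        if (nodeId != null) {
            out.write(nodeId.getBytes(StandardCharsets.UTF_8));
        }
        out.write(0); // NUL terminator
    }
}
```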
Approving -- with caveats that we'll probably have to do some quick-fixes as we discover totally unexpected failure modes with the way we're storing and indexing logs. That's kind of a given for a change this big though, you just grin and bear it.
JEP-210
@reviewbybees esp. @oleg-nenashev