
Join Docker log lines when they are split because of size #6967

Merged 1 commit, May 8, 2018

Conversation

@exekias exekias commented Apr 27, 2018

Docker's json-file driver splits lines longer than 16k bytes; this
change adapts the code to detect that situation and join the parts again to
output a single event.

This behavior is enabled by default; it can be disabled with the new
combine_partials flag.

Fixes #6605

return message, err
}
// Handle multiline messages, join lines that don't end with \n
for message.Content[len(message.Content)-1] != byte('\n') {
Should we enable this first through a config option and have it as default later on? This would allow us to start playing around with it.

@exekias I'm glad we have the docker prospector ;-)


That's a great idea, I'll add it

exekias commented Apr 27, 2018

@ruflin I added the partial setting; after some time we should switch it to true by default, as this should be its default behavior.

===== `containers.partial`

Enable joining of partial messages. Docker splits log lines larger than 16k bytes; enable this
to join them while reading (default: false).
I think it would be useful to specify that it works by joining lines that don't end with \n.

@exekias exekias Apr 30, 2018
Uhm, that's something specific to the json-file format; it's not the user who adds the \n. Do you think it's worth it? If so, I would explain the whole story of how the format works.

I thought this whole input was specific to json-file log driver?

I do think it's useful for the user to understand what's happening, if for nothing else than to avoid questions on the subject. I would add the info if it's simple to explain, but if the explanation is complicated I wouldn't put it in the docs, because it could confuse users.

ruflin commented Apr 30, 2018

Code LGTM. For the config, I was thinking of putting it outside containers and directly at the prospector level. Otherwise it gives the impression it's a container-specific setting, but I don't think it's really related to the containers themselves.

Naming-wise, I was wondering if we could find a name that points at what it does instead of what problem it solves. So instead of partial, something more like combine_partial_messages, which is too long but hopefully describes what I mean. Any suggestions?

exekias commented Apr 30, 2018

👍 How about combine_partials? I'm not too worried, as this will be a rather exotic flag (once we set it to true by default).

ruflin commented Apr 30, 2018

👍 on combine_partials. I just realised that the docker input is still experimental, so we could already turn it on by default and see what happens. Still good to have a config option so that, in case things go wrong, we can turn it off.

exekias commented May 7, 2018

I've renamed the flag to combine_partials, set it to true by default, and updated the docs.
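
Putting the thread's decisions together (flag renamed to `combine_partials`, moved to the prospector level, default `true`), a docker prospector config would presumably look something like the sketch below; the exact layout should be checked against the 6.4 reference file:

```yaml
filebeat.prospectors:
- type: docker
  containers.ids:
    - '*'
  # Join log lines split by the json-file driver (lines >16k bytes).
  # Enabled by default; set to false to receive the raw split parts.
  combine_partials: true
```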

@ruflin ruflin left a comment
LGTM. Could you add the config entry to the reference file?

ruflin commented May 8, 2018

@exekias Could you rebase and squash your PR when updating the config file and update the commit and PR message?

@exekias exekias force-pushed the docker-multiline branch from b96357f to cfc8798 on May 8, 2018 10:04
Docker `json-file` driver splits lines longer than 16k bytes, this
change adapts the code to detect that situation and join them again to
output a single event.

This behavior is enabled by default, it can be disabled by using the new
`combine_partials` flag.

Fixes elastic#6605
exekias commented May 8, 2018

done!

@exekias exekias force-pushed the docker-multiline branch from cfc8798 to 6909ce1 on May 8, 2018 10:11
@kvch kvch merged commit 1789ef9 into elastic:master May 8, 2018
@bitsofinfo

nice!

stevea78 pushed a commit to stevea78/beats that referenced this pull request May 20, 2018
stevea78 pushed a commit to stevea78/beats that referenced this pull request May 20, 2018
@SharpEdgeMarshall

When should this be available? Version 6.4?

exekias commented May 29, 2018

That's correct, it should be available by 6.4

towolf commented Jun 13, 2018

@exekias This was not released with 6.3.0 today right?

@bitsofinfo

?! hope it was.... really need this

@bitsofinfo

What's the ETA on 6.4?

exekias commented Jun 14, 2018

I'm afraid this didn't make it into 6.3.0; it should be available with 6.4.0. We can consider a backport to 6.3.1 if we consider this a bugfix. Any opinion here, @ruflin?

ruflin commented Jun 15, 2018

As Carlos mentioned, we normally don't backport features, and I don't think we can qualify this as a bug. The biggest worry with backporting is that it could break something that currently works, which can always happen. I would personally prefer not to backport, but to make sure it is as stable as possible in 6.4.

towolf commented Jun 15, 2018

So the cadence is roughly quarterly and we can expect 6.4 in 3 months or so?

ruflin commented Jun 15, 2018

We don't give estimates on release dates ;-) If you want to test or use the feature, we have some snapshots available: https://beats-package-snapshots.s3.amazonaws.com/index.html?prefix=filebeat/

@bitsofinfo

so 3 more months?

@buddyledungarees

Is it possible to use the existing multiline settings to replicate this functionality?

multiline.pattern: '\n$'
multiline.negate: true
multiline.match: after

wallies commented Jul 6, 2018

We are running a fork with this fix, but we are noticing that the for loop doesn't catch EOF and will sometimes exit halfway through processing a long line. This means the current partial log line will not have been completely processed. Later, when the next part of the now-half-processed log line arrives in the tailed log file, filebeat will fail to parse it. Specifically, the following check will fail:

if strings.HasPrefix(string(message.Content), "{") {

Below is the parse error we see:

ERROR	log/harvester.go:243	Read line error: parsing CRI timestamp: parsing time ",\\\"protocol\\\":\\\" as "2006-01-02T15:04:05Z07:00": cannot parse

We have added a retry when io.EOF is reached for now, but I'm wondering whether the backoff code I've seen in other parts of the codebase reaches this part of the code, or whether it should.

exekias commented Jul 6, 2018

Hi @wallies, thank you for your feedback, and thank you for testing unreleased features! Could you please open a new issue for this (or a pull request 😉)? Happy to discuss it there and try to fix it for 6.4.

towolf commented Aug 23, 2018

@ruflin it's been 32 months now. Can't you just push this out?

ruflin commented Aug 23, 2018

@towolf It will be released with the rest of the stack in 6.4 hopefully pretty soon.

9 participants