Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parser_json: Add stream_buffer_size config param #2381

Merged
merged 1 commit into from
Apr 25, 2019

Conversation

benwh
Copy link
Contributor

@benwh benwh commented Apr 14, 2019

Which issue(s) this PR fixes:
Fixes #609

What this PR does / why we need it:
Allow configuration of the size of the buffer that Yajl uses when parsing streaming input.

The advantage of this is that when using out_exec_filter, and parsing
as JSON, it's now possible to configure this plugin to avoid having to
wait for 8192 bytes of data to be parsed before events are emitted.

Docs Changes: fluent/fluentd-docs#631

Release Note:
Use PR title.

benwh added a commit to benwh/fluentd-docs that referenced this pull request Apr 14, 2019
See fluent/fluentd#2381

In addition to documenting the parameter, update the `in_exec` and
`out_exec_filter` articles to provide a workaround for users
encountering this problem.
This note exists in the v0.12 version of the `in_exec` page but was
absent in the v1.0 docs, so add the whole note.

Closes fluent#624

Signed-off-by: Ben Wheatley <contact@benwh.com>
benwh added a commit to benwh/fluentd-docs that referenced this pull request Apr 14, 2019
See fluent/fluentd#2381

In addition to documenting the parameter, update the `in_exec` and
`out_exec_filter` articles to provide a workaround for users
encountering this problem.
This note exists in the v0.12 version of the `in_exec` page but was
absent in the v1.0 docs, so add the whole note.

Closes fluent#624

Signed-off-by: Ben Wheatley <contact@benwh.com>
@@ -95,6 +95,7 @@ class ExecFilterOutput < Output
COMPAT_PARSE_PARAMS = {
'out_format' => '@type',
'out_keys' => 'keys',
'out_stream_buffer_size' => 'stream_buffer_size',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason that this was added was to shave another 30s off the tests

@@ -111,4 +111,29 @@ def test_parse_with_keep_time_key_without_time_format(data)
assert_equal text, record['time']
end
end

data('yajl' => 'yajl')
def test_yajl_parse_io_with_buffer_smaller_than_input(data)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering adding a test case for the inverse of this (input smaller than buffer) but didn't do this because it seemed like it would be quite awkward to implement - due to needing to assert on the parser not emitting an event - and probably not provide much value.
Happy to reconsider this if anyone has a suggestion on how this could be done neatly though!

# The Yajl library defines a default buffer size of 8092 when parsing
# from IO streams, so maintain this for backwards-compatibility.
# https://www.rubydoc.info/github/brianmario/yajl-ruby/Yajl%2FParser:parse
config_param :stream_buffer_size, :integer, default: 8092
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add :desc parameter to explain the parameter. This is only for yajl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use desc method, not :desc parameter, it should be put on top of config_param.
Currently, desc 'Set the buffer size that Yajl will use when parsing streaming input' is for time_tupe, not stream_buffer_size.
See https://docs.fluentd.org/v1.0/articles/api-plugin-base

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that was a silly mistake, thanks for the pointer! Fixed.

@@ -111,4 +111,29 @@ def test_parse_with_keep_time_key_without_time_format(data)
assert_equal text, record['time']
end
end

data('yajl' => 'yajl')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use data for only 1 pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, sorted :)

@benwh benwh force-pushed the configurable-json-stream-buffer branch from e6ef491 to 09ae8bc Compare April 15, 2019 09:06
@benwh
Copy link
Contributor Author

benwh commented Apr 16, 2019

Hi @repeatedly, those issues are fixed so could you take another look when you get some time? Thanks!

@benwh benwh force-pushed the configurable-json-stream-buffer branch from 09ae8bc to 9f9a631 Compare April 17, 2019 18:02
@@ -30,6 +30,12 @@ class JSONParser < Parser
desc 'Set JSON parser'
config_param :json_parser, :enum, list: [:oj, :yajl, :json], default: :oj

# The Yajl library defines a default buffer size of 8092 when parsing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nits] It may be better a default buffer size is 8192 because of brianmario/yajl-ruby@9d4cca0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not a nit; thanks for picking up on this! I did think that 8092 bytes was a bit of a strange size for a buffer, but didn't look into it 🤦‍♂️.
Fixed and I'll update the docs PR too.

Copy link
Member

@ganmacs ganmacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM although I left a few nits comments.

)
text = "100"

waiting(5) do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need? this test suit seems not to block since L134.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's not actually required for the test to work, I put it there to protect against any regression in the code being tested (or the test scaffolding) that might cause the test to fail, because if it does then it will block forever which would be quite annoying when it runs in CI, if nothing else.

@benwh benwh force-pushed the configurable-json-stream-buffer branch from 9f9a631 to 34567ee Compare April 22, 2019 20:32
Allow configuration of the size of the buffer that Yajl uses when
parsing streaming input.

The advantage of this is that when using `out_exec_filter`, and parsing
as JSON, it's now possible to configure this plugin to avoid having to
wait for 8192 bytes of data to be parsed before events are emitted.

Configuration in the `out_exec_filter` tests has been modified to use
this parameter, as it shaves 60 seconds off the test run time.

Signed-off-by: Ben Wheatley <contact@benwh.com>
@benwh benwh force-pushed the configurable-json-stream-buffer branch from 34567ee to 811390d Compare April 22, 2019 20:34
benwh added a commit to benwh/fluentd-docs that referenced this pull request Apr 22, 2019
See fluent/fluentd#2381

In addition to documenting the parameter, update the `in_exec` and
`out_exec_filter` articles to provide a workaround for users
encountering this problem.
This note exists in the v0.12 version of the `in_exec` page but was
absent in the v1.0 docs, so add the whole note.

Closes fluent#624

Signed-off-by: Ben Wheatley <contact@benwh.com>
@repeatedly repeatedly merged commit 73a4f01 into fluent:master Apr 25, 2019
@repeatedly
Copy link
Member

Thx!

@benwh benwh deleted the configurable-json-stream-buffer branch February 23, 2020 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

in_exec json streams does not work as expected
3 participants