-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
in_sample: duplicate record to support destructive change #4586
Conversation
checking whether it does not break existing test cases with CI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this fix!
Shouldn't we remove this dup
?
fluentd/lib/fluent/plugin/in_sample.rb
Lines 127 to 130 in c9dcf25
if @auto_increment_key | |
d = d.dup | |
d[@auto_increment_key] = @storage.update(:auto_increment_value){|v| v + 1 } | |
end |
Thanks, it should be removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested the impact on performance.
There seems to be a noticeable load on the CPU and RES for large data flows.
It seems acceptable to me because it is hard to imagine that the performance of in_sample
would be a problem.
In addition, this fix looks very natural for in_sample
's design.
This may affect users using in_sample
to test performance.
So, it would be enough to report at the release that the in_sample
has become a bit overloaded when flowing large data for testing.
I would like to hear one more person's opinion just to be sure.
the impact on performance
<source>
@type sample
tag test
size xxx
rate xxx
</source>
size: 10000, rate: 100
Current
$ top -o RES -p xxx
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1610784 daipom 20 0 273060 61036 11152 S 41.6 0.2 0:10.14 ruby
This PR
$ top -o RES -p xxx
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1610963 daipom 20 0 274264 69680 11260 S 57.0 0.2 0:09.73 ruby
size: 100000, rate: 100
Current
$ top -o RES -p xxx
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1611452 daipom 20 0 289000 80492 11072 S 101.0 0.2 0:16.52 ruby
This PR
$ top -o RES -p xxx
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1611243 daipom 20 0 304720 148012 11152 S 100.0 0.5 0:11.36 ruby
Thanks for checking in point of performance. If user uses in_sample for their own plugin benchmark, it affects because baseline was changed. |
I'll wait a bit to see if anyone else has an opinion. |
e51097c
to
2d546cc
Compare
Added renew_record parameter like this:
size: 10000
size: 100000
With added conditional switch, there is a slightly performance regression with previous behavior. |
Changed to more conservative approach. What do you think? @daipom |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It's very clear with reuse_record
!
Sorry for the minor points..., but the following are points of concern.
When in_sample plugin is used with filter parser which uses remove_key_name_field, it raises the following error repeatedly. #0 dump an error event: error_class=ArgumentError error="message does not exist" This kind of error occurs when key_name and remove_key_name_field removes key from record with destructive change in filter parser. It affects generated sample data. To fix this issue, it is simple to just dup every record even though it has a significant performance penalty. Considering keeping compatibility and providing way to a workaround, added option to enable previous behavior - reuse_record, disabled by default. ref. fluent#4575 Here is the small benchmark: <source> @type sample tag test size xxx rate 100 reuse_record </source> <match test> @type null </match> size: 100000 master: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 330767 kenhys 20 0 364316 81036 13620 S 100.3 0.1 0:52.10 ruby reuse_record: true PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 333640 kenhys 20 0 364328 80956 13560 S 100.0 0.1 0:17.04 ruby reuse_record: false PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 335843 kenhys 20 0 366188 113300 13536 S 100.3 0.2 0:17.24 ruby Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
Which issue(s) this PR fixes:
Fixes #4575
What this PR does / why we need it:
When in_sample plugin is used with filter parser which uses remove_key_name_field, it raises the following error repeatedly.
This kind of error occurs when key_name and remove_key_name_field removes key from record with destructive change in filter parser affects generated sample data.
Docs Changes:
TODO: Add
reuse_record
parameter.Release Note:
N/A