Trim large messages #2644
Conversation
Build FAILURE
Force-pushed from 11ab942 to 0efdf2c
Build SUCCESS
This pull request introduces 1 alert when merging 0efdf2c3313ae1bad02dbe90a5d5fe160d92ecb6 into b1620af - view on LGTM.com.
New alerts:
Comment posted by LGTM.com
Force-pushed from 0efdf2c to 6c40bc9
Build FAILURE
Force-pushed from 6c40bc9 to 9c4aefc
Build SUCCESS
Could you cover with a unit test the case when a small message and a large message are both available in the buffer, and the large message is not truncated below?
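A rough sketch of the scenario such a test could exercise, using a standalone toy extractor rather than the project's real test harness (all names are illustrative, not syslog-ng API):

```c
/* Toy model of the requested case: a small and an oversized frame sit in the
 * same buffer; the oversized one is trimmed, the small one is untouched.
 * Illustrative only -- not the syslog-ng test harness or API. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one "<len> <payload>" frame at *pos, copy at most max_msg bytes of
 * the payload into out, and consume the whole frame regardless of trimming. */
static size_t
extract_trimmed(const char *buf, size_t len, size_t *pos, char *out, size_t max_msg)
{
  size_t frame_len = 0, i = *pos;

  while (i < len && buf[i] >= '0' && buf[i] <= '9')
    frame_len = frame_len * 10 + (buf[i++] - '0');
  assert(i < len && buf[i] == ' ');
  i++;

  size_t copy = frame_len < max_msg ? frame_len : max_msg;
  memcpy(out, buf + i, copy);
  out[copy] = '\0';
  *pos = i + frame_len;     /* skip the trimmed tail as well */
  return copy;
}

int
main(void)
{
  const char *input = "5 small20 aaaaaaaaaaaaaaaaaaaa";  /* small + large frame */
  size_t pos = 0;
  char msg[64];

  extract_trimmed(input, strlen(input), &pos, msg, 10);
  assert(strcmp(msg, "small") == 0);      /* small message passes through */

  extract_trimmed(input, strlen(input), &pos, msg, 10);
  assert(strlen(msg) == 10);              /* large message trimmed to max_msg */
  assert(pos == strlen(input));           /* remainder of the frame consumed */

  puts("ok");
  return 0;
}
```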
Force-pushed from 9c4aefc to cce53c6
Thanks!
Build FAILURE
@kira-syslogng retest this please
Build SUCCESS
Note to your ToDo: aux (or ancillary) data is a transport functionality (credentials, peer address, pid, etc.), while trim is a proto functionality.
- required: check "Refactor trim incoming messages" szemere/syslog-ng#6
Note: I'm not sure the state machine couldn't still be minimized.
@szemere: and one more note on the other topic (the tagging issue): maybe it is time to modify the fetch method: what if we create a
additionally corrected some nearby indentations Signed-off-by: Laszlo Szemere <laszlo.szemere@balabit.com>
The framed server has never supported the encoding option, thus init_buffer_size == max_buffer_size == max_msg_size Signed-off-by: Laszlo Szemere <laszlo.szemere@balabit.com>
As a preparation to make log_proto_framed_server_fetch states more independent from each other (will be easier to introduce a new state) eliminated the try_read variable with some code duplication. Signed-off-by: Laszlo Szemere <laszlo.szemere@balabit.com>
Implementing the "trim_large_messages" option. (Currently aligning with the logic of log_proto_framed_server_fetch, with lots of goto statements and code duplication.) Signed-off-by: Laszlo Szemere <laszlo.szemere@balabit.com>
in log_proto_framed_server_fetch into enums and continue statements Signed-off-by: Laszlo Szemere <laszlo.szemere@balabit.com>
Force-pushed from cce53c6 to 8d45b47
Build SUCCESS
On the caller side, we are only interested in the case when we were able to fetch any data from the input. Before this change, EAGAIN was masked with LPS_SUCCESS, giving mixed results on the caller side. Signed-off-by: Laszlo Szemere <laszlo.szemere@balabit.com>
With the previous change of log_proto_framed_server_fetch_data, it was easier to create "read" states in log_proto_framed_server_fetch. By separating the "read" and "extract" steps, the code duplication caused by the "extract -> read -> extract" logic was eliminated. The extract state will jump into read if there is not enough data to finish. The read state will return LPS_SUCCESS and continue later if there is nothing to read. Signed-off-by: Laszlo Szemere <laszlo.szemere@balabit.com>
From now on, frame header reading will ensure that there is enough space in the buffer for reading the frame header (with a low chance of an unsuccessful parsing attempt). And with the knowledge of the actual frame_len, it will also make sure that there will be enough room for the message later. Signed-off-by: Laszlo Szemere <laszlo.szemere@balabit.com>
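For readers following along, here is a compact, self-contained sketch of the read/extract split these commits describe; the state names and helpers are made up for illustration and do not match the real LogProtoFramedServer code:

```c
/* Sketch of the "extract falls back to read, read resumes later" control flow
 * described above. Names (State, Proto, fetch, ...) are illustrative and do
 * not match the real LogProtoFramedServer implementation. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { EXTRACT_HEADER, READ_HEADER, EXTRACT_MSG, READ_MSG } State;

typedef struct
{
  const char *input;        /* stands in for the transport */
  size_t input_pos;
  char buffer[16];          /* deliberately small proto buffer */
  size_t buffer_end;
  size_t frame_len;
  State state;
} Proto;

/* Pull whatever the "transport" has into the buffer; false means EAGAIN. */
static bool
read_data(Proto *self)
{
  size_t avail = strlen(self->input) - self->input_pos;
  size_t room = sizeof(self->buffer) - self->buffer_end;
  size_t n = avail < room ? avail : room;

  if (n == 0)
    return false;
  memcpy(self->buffer + self->buffer_end, self->input + self->input_pos, n);
  self->buffer_end += n;
  self->input_pos += n;
  return true;
}

/* Fetch one "<len> <payload>" frame; false means "no complete message yet",
 * and because the state is stored, the next call resumes where we stopped. */
static bool
fetch(Proto *self, char *msg)
{
  while (true)
    {
      switch (self->state)
        {
        case EXTRACT_HEADER:
          {
            char *space = memchr(self->buffer, ' ', self->buffer_end);
            if (!space)
              { self->state = READ_HEADER; continue; }
            self->frame_len = strtoul(self->buffer, NULL, 10);
            size_t used = (size_t) (space - self->buffer) + 1;
            self->buffer_end -= used;
            memmove(self->buffer, self->buffer + used, self->buffer_end);
            self->state = EXTRACT_MSG;
            continue;
          }
        case EXTRACT_MSG:
          if (self->buffer_end < self->frame_len)
            { self->state = READ_MSG; continue; }
          memcpy(msg, self->buffer, self->frame_len);
          msg[self->frame_len] = '\0';
          self->buffer_end -= self->frame_len;
          memmove(self->buffer, self->buffer + self->frame_len, self->buffer_end);
          self->state = EXTRACT_HEADER;
          return true;
        case READ_HEADER:
        case READ_MSG:
          if (!read_data(self))
            return false;   /* nothing to read now; resume from here later */
          self->state = (self->state == READ_HEADER) ? EXTRACT_HEADER : EXTRACT_MSG;
          continue;
        }
    }
}

int
main(void)
{
  Proto p = { .input = "5 hello6 world!", .state = EXTRACT_HEADER };
  char msg[16];

  while (fetch(&p, msg))
    puts(msg);
  return 0;
}
```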
Force-pushed from 8d45b47 to 245bf0f
Build SUCCESS
Build FAILURE
I'm dismissing my review because I'm feeling sick and don't want to block the PR. Thanks for addressing my notes!
Force-pushed from 0594915 to d9bb667
Build FAILURE
@kira-syslogng retest this please
Build SUCCESS
from the state machine in log_proto_framed_server_fetch Signed-off-by: Laszlo Szemere <laszlo.szemere@balabit.com>
split into smaller methods, each state got a separate method Signed-off-by: Laszlo Budai <stentor.bgyk@gmail.com>
Signed-off-by: Laszlo Budai <stentor.bgyk@gmail.com>
Signed-off-by: Laszlo Budai <stentor.bgyk@gmail.com>
to prevent the starvation of other sources. Signed-off-by: Laszlo Szemere <laszlo.szemere@balabit.com>
Force-pushed from d9bb667 to 4cd0e8f
@@ -99,6 +101,9 @@ log_proto_framed_server_fetch_data(LogProtoFramedServer *self, gboolean *may_read)
   if (!(*may_read))
     return FALSE;

+  if (self->fetch_counter++ >= MAX_FETCH_COUNT)
+    return FALSE;
I think this is not necessary.
I just read about this yesterday: every edge-triggered server works like this, it has to read until EAGAIN is returned, and server programs usually do not deal with the possibility of starvation at this level.
In theory, starvation is possible with super-fast senders, but in my opinion this is not a real-world scenario.
The same question was asked on the nginx forum a few years ago:
https://forum.nginx.org/read.php?29,250351,250360#msg-250360
When an in-memory loopback interface is used, it might be possible to reproduce a starvation problem, but with real network connections or files, reading from a memory buffer is orders of magnitude faster than those devices, so the problem does not really exist (unless the CPU is under very high load).
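For reference, the edge-triggered pattern referred to here looks roughly like this (a generic sketch, not syslog-ng code); the fetch_counter cap from the diff above would simply bound how many iterations of such a loop one wakeup may spend on a single connection:

```c
/* Generic edge-triggered drain loop: keep reading until the kernel reports
 * EAGAIN, because with EPOLLET we will not be notified again for data that
 * is already sitting in the socket buffer. Sketch only, not syslog-ng code. */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_ITERATIONS 20   /* optional cap, analogous to MAX_FETCH_COUNT */

static void
drain_fd(int fd, void (*process)(const char *data, ssize_t len))
{
  char buf[4096];
  int iterations = 0;

  while (iterations++ < MAX_ITERATIONS)
    {
      ssize_t n = read(fd, buf, sizeof(buf));

      if (n > 0)
        {
          process(buf, n);
          continue;             /* more data may still be buffered */
        }
      if (n == 0)
        break;                  /* peer closed the connection */
      if (errno == EAGAIN || errno == EWOULDBLOCK)
        break;                  /* drained: wait for the next edge */
      if (errno == EINTR)
        continue;
      break;                    /* real error; caller should close the fd */
    }
}
```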
However, if we really want to care about this case, our other proto server implementations would have to be adjusted as well.
Fortunately, epoll uses round robin, so the concept will actually work:
"If more than maxevents file descriptors are ready when epoll_wait() is called, then successive epoll_wait() calls will round robin through the set of ready file descriptors. This behavior helps avoid starvation scenarios, where a process fails to notice that additional file descriptors are ready because it focuses on a set of file descriptors that are already known to be ready."
Note: We have log-msg-size and log-iw-size; together with our flow-control mechanism they can avoid starvation even if flags(flow-control) is not set (but this is on a different level).
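A minimal sketch of the behaviour the man page excerpt above describes: with a deliberately small maxevents, successive epoll_wait() calls rotate through the ready descriptors instead of always returning the same ones (generic example, not syslog-ng code):

```c
/* Minimal event loop illustrating the epoll_wait() round-robin behaviour:
 * maxevents is kept small, so when more descriptors are ready than fit in
 * one call, the remaining ones are returned by the following calls.
 * Generic sketch, not syslog-ng code. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>

#define MAX_EVENTS 8   /* deliberately smaller than the number of connections */

void
event_loop(int epfd)
{
  struct epoll_event events[MAX_EVENTS];

  for (;;)
    {
      int n = epoll_wait(epfd, events, MAX_EVENTS, -1);

      if (n < 0)
        {
          perror("epoll_wait");
          exit(EXIT_FAILURE);
        }
      for (int i = 0; i < n; i++)
        {
          /* here each ready fd would be drained until EAGAIN,
           * as in the drain_fd() sketch earlier */
          printf("ready: fd %d\n", events[i].data.fd);
        }
      /* If more than MAX_EVENTS fds were ready, the next epoll_wait() call
       * picks up the ones we did not get this time (round robin). */
    }
}
```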
MAX_FETCH_COUNT adds back the old try_read logic in a much cleaner way. Removing this patch did not really change our performance numbers, so I'm approving the PR.
Build SUCCESS
@kira-syslogng test this please test branch=pzolee-trim-large-messages;
Build SUCCESS
Add the trim-large-messages option to the logproto-framed-server (syslog source driver).

Without trimming, the framed server simply drops oversized messages (> log_msg_size) and closes the incoming connection.

With trim-large-messages enabled, the framed server creates a log message from the first (log_msg_size sized) part of the message and ignores the rest of it. The communication continues uninterrupted with the following message.

TODO: the fact of the trimming could be passed to the LogReader via LogTransportAuxData, and could be marked on the new log message, i.e. with a tag. But I don't like the idea that LogReader has to deal with a property from one specific protocol.
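To illustrate the difference in behaviour, here is a sketch of the extract-time decision under assumed names (the real logic lives in the framed server's fetch path and is more involved):

```c
/* Sketch of the extract-time decision described above. "Trim" keeps the first
 * max_msg_size bytes and silently skips the rest of the frame; without it the
 * only option is to give up on the input. Names are illustrative, not the
 * actual syslog-ng symbols. */
#include <stddef.h>
#include <string.h>

typedef enum { EXTRACT_SUCCESS, EXTRACT_ERROR } ExtractResult;

ExtractResult
extract_message(const char *frame, size_t frame_len,
                char *msg, size_t *msg_len,
                size_t max_msg_size, int trim_large_messages)
{
  if (frame_len <= max_msg_size)
    {
      memcpy(msg, frame, frame_len);
      *msg_len = frame_len;
      return EXTRACT_SUCCESS;
    }
  if (trim_large_messages)
    {
      /* keep the first max_msg_size bytes, drop the tail, and keep the
       * connection open so the next frame can be processed normally */
      memcpy(msg, frame, max_msg_size);
      *msg_len = max_msg_size;
      return EXTRACT_SUCCESS;
    }
  /* old behaviour: oversized frame => drop the message and close the input */
  return EXTRACT_ERROR;
}
```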