Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

downstream: PR 6882 rework #6906

Merged
merged 10 commits into from
Feb 24, 2023
Merged

Conversation

leonardo-albertovich
Copy link
Collaborator

This pull request addresses the same issues that PR #6882 covered.

input: additionally, changed the stream parameter type to make the
client code simpler.

Signed-off-by: Leonardo Alminana <leonardo@calyptia.com>
Removed a stored reference to the main event loop in favor of obtaining
the current event loop pointer from the TLS which is required to prevent
client sockets from being registered in the main thread when the plugin
instance is running in its own thread.

Additionally, a call to flb_input_downstream_set was added in order to
enable thread safety and move the downstream to the plugin instance
downstream list if needed in order to ensure that the downstream is
processed by the timeout callback runing in the plugins thread instead
of the main thread.

Signed-off-by: Leonardo Alminana <leonardo@calyptia.com>
Removed a stored reference to the main event loop in favor of obtaining
the current event loop pointer from the TLS which is required to prevent
client sockets from being registered in the main thread when the plugin
instance is running in its own thread.

Additionally, a call to flb_input_downstream_set was added in order to
enable thread safety and move the downstream to the plugin instance
downstream list if needed in order to ensure that the downstream is
processed by the timeout callback runing in the plugins thread instead
of the main thread.

Signed-off-by: Leonardo Alminana <leonardo@calyptia.com>
Removed a stored reference to the main event loop in favor of obtaining
the current event loop pointer from the TLS which is required to prevent
client sockets from being registered in the main thread when the plugin
instance is running in its own thread.

Additionally, a call to flb_input_downstream_set was added in order to
enable thread safety and move the downstream to the plugin instance
downstream list if needed in order to ensure that the downstream is
processed by the timeout callback runing in the plugins thread instead
of the main thread.

Signed-off-by: Leonardo Alminana <leonardo@calyptia.com>
Removed a stored reference to the main event loop in favor of obtaining
the current event loop pointer from the TLS which is required to prevent
client sockets from being registered in the main thread when the plugin
instance is running in its own thread.

Additionally, a call to flb_input_downstream_set was added in order to
enable thread safety and move the downstream to the plugin instance
downstream list if needed in order to ensure that the downstream is
processed by the timeout callback runing in the plugins thread instead
of the main thread.

Signed-off-by: Leonardo Alminana <leonardo@calyptia.com>
Removed a stored reference to the main event loop in favor of obtaining
the current event loop pointer from the TLS which is required to prevent
client sockets from being registered in the main thread when the plugin
instance is running in its own thread.

Additionally, a call to flb_input_downstream_set was added in order to
enable thread safety and move the downstream to the plugin instance
downstream list if needed in order to ensure that the downstream is
processed by the timeout callback runing in the plugins thread instead
of the main thread.

Signed-off-by: Leonardo Alminana <leonardo@calyptia.com>
Removed a stored reference to the main event loop in favor of obtaining
the current event loop pointer from the TLS which is required to prevent
client sockets from being registered in the main thread when the plugin
instance is running in its own thread.

Additionally, a call to flb_input_downstream_set was added in order to
enable thread safety and move the downstream to the plugin instance
downstream list if needed in order to ensure that the downstream is
processed by the timeout callback runing in the plugins thread instead
of the main thread.

Signed-off-by: Leonardo Alminana <leonardo@calyptia.com>
Removed a stored reference to the main event loop in favor of obtaining
the current event loop pointer from the TLS which is required to prevent
client sockets from being registered in the main thread when the plugin
instance is running in its own thread.

Additionally, a call to flb_input_downstream_set was added in order to
enable thread safety and move the downstream to the plugin instance
downstream list if needed in order to ensure that the downstream is
processed by the timeout callback runing in the plugins thread instead
of the main thread.

Signed-off-by: Leonardo Alminana <leonardo@calyptia.com>
Removed a stored reference to the main event loop in favor of obtaining
the current event loop pointer from the TLS which is required to prevent
client sockets from being registered in the main thread when the plugin
instance is running in its own thread.

Additionally, a call to flb_input_downstream_set was added in order to
enable thread safety and move the downstream to the plugin instance
downstream list if needed in order to ensure that the downstream is
processed by the timeout callback runing in the plugins thread instead
of the main thread.

Signed-off-by: Leonardo Alminana <leonardo@calyptia.com>
Removed a stored reference to the main event loop in favor of obtaining
the current event loop pointer from the TLS which is required to prevent
client sockets from being registered in the main thread when the plugin
instance is running in its own thread.

Additionally, a call to flb_input_downstream_set was added in order to
enable thread safety and move the downstream to the plugin instance
downstream list if needed in order to ensure that the downstream is
processed by the timeout callback runing in the plugins thread instead
of the main thread.

Signed-off-by: Leonardo Alminana <leonardo@calyptia.com>
@edsiper
Copy link
Member

edsiper commented Feb 23, 2023

the changes makes sense to me, but we need more testing. @nokute78 can you validate the approach proposed here ?

@leonardo-albertovich leonardo-albertovich temporarily deployed to pr February 23, 2023 17:05 — with GitHub Actions Inactive
@leonardo-albertovich
Copy link
Collaborator Author

I just realized that the test case I created for this has already been running for 24 hours when I went to switch branches to backport it to 2.0.

The configuration I used was this :

[SERVICE]
    flush 1
    grace 1
    log_level info

[INPUT]
    name forward
    threaded On

[OUTPUT]
    name stdout
    match *

The clients were eight shells running this line :

while [ 1 ] ; do  telnet 127.0.0.1 24224 <sample_file >/dev/null 2>&1; sleep 0.1 ; done

And the contents of sample_file were these :

{"a": "b"}

Which obviously is wrong considering this is in_forward but what's important in this case is that those 8 clients were constantly establishing new connections, sending a bit of data and closing the connection which combined with the timers and main thread event loop processing caused all "three" issues to trigger originally.

So I am fairly confident that this PR works.

@lecaros lecaros added this to the Fluent Bit v2.1.0 milestone Feb 24, 2023
@edsiper edsiper merged commit fc947c5 into master Feb 24, 2023
@edsiper edsiper deleted the leonardo-master-downstream-pr-6882-rework branch February 24, 2023 15:32
@nokute78
Copy link
Collaborator

@leonardo-albertovich Thank you for reworking.
I tested using thread sanitizer and the race was not detected (except #6776)

Note: Side effect of thread sanitizer ?
#6882 (comment)

I tested following configuration and enabled thread sanitizer.
The receiver didn't output and ignored signal (Ctrl+C).

If I disabled thread sanitizer, the receiver worked correctly.

Receiver:

[INPUT]
    Name forward
    Threaded on

[OUTPUT]
    Name stdout

Sender:

fluent-bit -i dummy -p rate=100000 -o forward

Note: Detail of the PR is
#6882 (comment)
#6882 (comment)

@leonardo-albertovich
Copy link
Collaborator Author

I think once we move the atomic operation abstraction layer from cmetrics to cfl we'll be in a much better position to fix that race in a performant way.

I think it's about time we move atomics to cfl, it's not the first time I've been in a situation where I'd like to use them outside of cmetrics and from a purely organizational point of view it would make sense (which I don't think anyone disagrees with).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants