
Unsafe concurrent writes #196

Closed
thedevop opened this issue Mar 24, 2023 · 3 comments
thedevop commented Mar 24, 2023

Resurfacing this issue.

Currently, both the client read and write goroutines may call cl.WritePacket:
https://github.com/mochi-co/mqtt/blob/7bd7bd5087c40f96015a61d01ce7ae4e6951c807/clients.go#L481

An example of when this can occur (illustrated in the sketch below):

  1. The broker starts publishing messages to the client from the WriteLoop
  2. The read goroutine concurrently acks a PingReq (or a QoS 2 packet)
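
For illustration, here is a minimal standalone sketch of why this is unsafe (none of it is the broker's actual code; the connection setup and packet bytes are invented): a packet may take more than one Write call, so without a shared lock the two goroutines can interleave their bytes on the same connection, even though each individual Write is safe on its own.

```go
package main

import (
	"io"
	"net"
	"sync"
)

func main() {
	server, client := net.Pipe()

	// Drain whatever arrives so the writers don't block forever.
	go func() {
		io.Copy(io.Discard, server)
	}()

	// Each "packet" is written with two Write calls, as an encoder might do,
	// so with no per-client lock the two goroutines below can interleave
	// their bytes and corrupt the MQTT stream.
	writeUnsafely := func(conn net.Conn, packet []byte, wg *sync.WaitGroup) {
		defer wg.Done()
		conn.Write(packet[:1]) // fixed header
		conn.Write(packet[1:]) // rest of the packet
	}

	var wg sync.WaitGroup
	wg.Add(2)
	go writeUnsafely(client, []byte{0x30, 0x02, 0x00, 0x00}, &wg) // pretend PUBLISH from the WriteLoop
	go writeUnsafely(client, []byte{0xD0, 0x00}, &wg)             // pretend PINGRESP from the read goroutine
	wg.Wait()
	client.Close()
}
```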

Possible solutions include:

  1. Add a per-client lock before nb.WriteTo(cl.Net.Conn) (see the sketch after this list):
    https://github.com/mochi-co/mqtt/blob/7bd7bd5087c40f96015a61d01ce7ae4e6951c807/clients.go#L554
  2. Route all writes through the WriteLoop, but without a refactor the caller will no longer receive write errors
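
A rough sketch of option 1, assuming a per-client sync.Mutex; the struct layout and the field and helper names (wmu, writeToConn) are invented here and may not match the actual clients.go:

```go
package mqtt

import (
	"net"
	"sync"
)

// Sketch of option 1 only: names and layout are assumed for illustration.
type Client struct {
	Net struct {
		Conn net.Conn
	}
	wmu sync.Mutex // guards the final write to Net.Conn
}

// writeToConn flushes an already-encoded packet under the per-client lock, so
// the read and write goroutines can no longer interleave bytes on the wire.
// Contention is limited to the moment the packet is flushed, and only ever
// occurs between a single client's own two goroutines.
func (cl *Client) writeToConn(nb net.Buffers) (int64, error) {
	cl.wmu.Lock()
	defer cl.wmu.Unlock()
	return nb.WriteTo(cl.Net.Conn)
}
```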
@mochi-co mochi-co added the bug Something isn't working label Apr 17, 2023
mochi-co commented

Thanks for identifying this @thedevop! Apologies again for the delay.

With 1. we potentially introduce lock contention, but with 2. we lose the ability to prioritise control messages (if memory serves, the spec indicates that certain messages should not be delayed based on the order of transmission of publish packets).

  1. is probably the most straightforward solution. If I recall, you are running at a pretty good scale - have you performed any benchmarks on adding a lock here to see if it makes any significant difference?
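
For context, a rough sketch of option 2, with an invented outbound channel and struct (none of these names come from mochi-mqtt): every packet is queued to a single writer goroutine, which naturally serialises writes, but the original caller only learns whether the enqueue succeeded, not whether the network write did.

```go
package mqtt

import "net"

// Sketch of option 2 only: the channel, struct, and method names are invented
// for illustration and do not reflect the actual mochi-mqtt internals.
type Client struct {
	conn net.Conn
	out  chan net.Buffers // consumed exclusively by WriteLoop
}

// QueuePacket replaces a direct WritePacket call. Any network write error now
// surfaces inside WriteLoop rather than here, which is the refactor cost
// mentioned above.
func (cl *Client) QueuePacket(nb net.Buffers) bool {
	select {
	case cl.out <- nb:
		return true
	default:
		return false // outbound buffer full; the caller still never sees a write error
	}
}

// WriteLoop is the only goroutine allowed to write to cl.conn, so packets can
// never interleave. Prioritising control packets would require, for example,
// a second higher-priority channel consumed here.
func (cl *Client) WriteLoop() {
	for nb := range cl.out {
		if _, err := nb.WriteTo(cl.conn); err != nil {
			return // handle or log the error here; the original caller has moved on
		}
	}
}
```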


thedevop commented Apr 23, 2023

@mochi-co ,

The use case I have is a bit different: currently it is mainly one-way communication from client -> broker (this will change in the future), and messages from broker to client are mostly control messages. The environment handles a very large number of concurrent clients, but a limited number of messages per client. The bottlenecks I see so far are (in order):

  1. AWS security group connection tracking limit. This limits how many concurrent clients a node can handle.
  2. Memory - using a C instance type (M is on the edge) would bump this up to 1.
  3. CPU - with the exception of a connection thundering herd, there is excess CPU capacity.

Logically, since each client only has two goroutines (read and write), and the client also has a buffered channel to handle publishing, there will not be any cross-client lock contention. The only time lock contention occurs is between a client's own read and write goroutines, as intended.

mochi-co commented May 4, 2023

I opted for solution 1 as it was simpler - this should be fixed in v2.2.8 👍🏻
