Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSL_write_ex on stream object sometimes retransmits parts of stream data #924

Closed
nhorman opened this issue Nov 19, 2024 · 0 comments
Closed
Assignees

Comments

@nhorman
Copy link
Contributor

nhorman commented Nov 19, 2024

In attempting to build out QUIC server testing using the quic-interop-runner, I have written a demo server here:
https://github.com/nhorman/openssl/blob/quic-server-interop/demos/guide/quic-hq-interop-server.c

When running several tests with openssl specified as a server, a number of them failed indicating:

File size of /tmp/download_4d63aram/tknvfrrbre doesn't match. Original: 3145729 bytes, downloaded: 3147207 bytes.

Some debugging in the server code seems to have confirmed that we in fact wrote 3145729 bytes through the SSL_write_ex call, but the client still received 3147207 bytes.

Some additional debugging, in which I modified the interop runner harness to write sequential data into the file to download (I filled the file with a series of sequential values followed by a newline) led to an identification of the differences in the file that was sent and the file that was received:

--- ./tknvfrrbre        2024-11-19 00:50:20.000000000 +0000
+++ ../../client/client-download/tknvfrrbre     2024-11-19 00:50:24.000000000 +0000
@@ -110078,6 +110078,12 @@
 110078
 110079
 110080
+15
+110076
+110077
+110078
+110079
+110080
 110081
 110082
 110083
@@ -128802,6 +128808,18 @@
 128802
 128803
 128804
+128808793
+128794
+128795
+128796
+128797
+128798
+128799
+128800
+128801
+128802
+128803
+128804
 128805
 128806
 128807
@@ -147527,6 +147545,17 @@
...

From the diff it appears that, periodically, chunks of data within the stream get sent a second time over the wire.

Need to investigate the root cause and fix it. The error might be in the server code itself, but it doesn't appear so to me. I think the issue is a race condition in the reactor consuming bytes from the stream map with updates to the stream map from the application, but I've not looked closely enough yet

@nhorman nhorman removed this from Project Board Nov 19, 2024
@nhorman nhorman moved this from Refining to Pre-Refinement in Development Board Nov 19, 2024
@nhorman nhorman added this to the 3.5.0 milestone Nov 19, 2024
@nhorman nhorman self-assigned this Nov 19, 2024
@nhorman nhorman moved this from Pre-Refinement to In Progress in Development Board Nov 19, 2024
@nhorman nhorman removed this from Project Board Nov 19, 2024
@nhorman nhorman moved this from Refining to In Progress in Project Board 3.5.0 planning Nov 19, 2024
nhorman added a commit to nhorman/openssl that referenced this issue Nov 19, 2024
When writing to a blocking quic stream, we sometimes get duplicate
transmitted data.  This occurs when a call to quic_write_blocking has to
wait for space to become available in the ring buffer.  When we do a
wait, the call sets *written to the value returned in args.total_written
as filled out by the calls to block_until_pred->quic_write_again.
However, the value there is based on the amount we requested, which is
only the remaining data that we didn't append in xso_sstream_write.  So
if we call quic_write_blocking with a buffer of length X, and initially
append Y bytes, and write the remainig X-Y bytes via a block_until_pred
call, then *written will return with the value X-Y, even though we wrote
the full X bytes to the ring buffer.

Fix it by recording the initial amount appended into *written, and then
add the args.total_written value if we have to wait on more space

Fixes openssl/project#924
@nhorman nhorman moved this from In Progress to Waiting Review in Development Board Nov 19, 2024
nhorman added a commit to nhorman/openssl that referenced this issue Nov 19, 2024
When writing to a blocking quic stream, we sometimes get duplicate
transmitted data.  This occurs when a call to quic_write_blocking has to
wait for space to become available in the ring buffer.  When we do a
wait, the call sets *written to the value returned in args.total_written
as filled out by the calls to block_until_pred->quic_write_again.
However, the value there is based on the amount we requested, which is
only the remaining data that we didn't append in xso_sstream_write.  So
if we call quic_write_blocking with a buffer of length X, and initially
append Y bytes, and write the remainig X-Y bytes via a block_until_pred
call, then *written will return with the value X-Y, even though we wrote
the full X bytes to the ring buffer.

Fix it by recording the initial amount appended into *written, and then
add the args.total_written value if we have to wait on more space

Fixes openssl/project#924
nhorman added a commit to nhorman/openssl that referenced this issue Nov 19, 2024
When writing to a blocking quic stream, we sometimes get duplicate
transmitted data.  This occurs when a call to quic_write_blocking has to
wait for space to become available in the ring buffer.  When we do a
wait, the call sets *written to the value returned in args.total_written
as filled out by the calls to block_until_pred->quic_write_again.
However, the value there is based on the amount we requested, which is
only the remaining data that we didn't append in xso_sstream_write.  So
if we call quic_write_blocking with a buffer of length X, and initially
append Y bytes, and write the remainig X-Y bytes via a block_until_pred
call, then *written will return with the value X-Y, even though we wrote
the full X bytes to the ring buffer.

Fix it by recording the initial amount appended into *written, and then
add the args.total_written value if we have to wait on more space

Fixes openssl/project#924
nhorman added a commit to nhorman/openssl that referenced this issue Nov 21, 2024
When writing to a blocking quic stream, we sometimes get duplicate
transmitted data.  This occurs when a call to quic_write_blocking has to
wait for space to become available in the ring buffer.  When we do a
wait, the call sets *written to the value returned in args.total_written
as filled out by the calls to block_until_pred->quic_write_again.
However, the value there is based on the amount we requested, which is
only the remaining data that we didn't append in xso_sstream_write.  So
if we call quic_write_blocking with a buffer of length X, and initially
append Y bytes, and write the remainig X-Y bytes via a block_until_pred
call, then *written will return with the value X-Y, even though we wrote
the full X bytes to the ring buffer.

Fix it by recording the initial amount appended into *written, and then
add the args.total_written value if we have to wait on more space

Fixes openssl/project#924
openssl-machine pushed a commit to openssl/openssl that referenced this issue Nov 21, 2024
When writing to a blocking quic stream, we sometimes get duplicate
transmitted data.  This occurs when a call to quic_write_blocking has to
wait for space to become available in the ring buffer.  When we do a
wait, the call sets *written to the value returned in args.total_written
as filled out by the calls to block_until_pred->quic_write_again.
However, the value there is based on the amount we requested, which is
only the remaining data that we didn't append in xso_sstream_write.  So
if we call quic_write_blocking with a buffer of length X, and initially
append Y bytes, and write the remainig X-Y bytes via a block_until_pred
call, then *written will return with the value X-Y, even though we wrote
the full X bytes to the ring buffer.

Fix it by recording the initial amount appended into *written, and then
add the args.total_written value if we have to wait on more space

Fixes openssl/project#924

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #26023)

(cherry picked from commit 2de7e1d)
openssl-machine pushed a commit to openssl/openssl that referenced this issue Nov 21, 2024
When writing to a blocking quic stream, we sometimes get duplicate
transmitted data.  This occurs when a call to quic_write_blocking has to
wait for space to become available in the ring buffer.  When we do a
wait, the call sets *written to the value returned in args.total_written
as filled out by the calls to block_until_pred->quic_write_again.
However, the value there is based on the amount we requested, which is
only the remaining data that we didn't append in xso_sstream_write.  So
if we call quic_write_blocking with a buffer of length X, and initially
append Y bytes, and write the remainig X-Y bytes via a block_until_pred
call, then *written will return with the value X-Y, even though we wrote
the full X bytes to the ring buffer.

Fix it by recording the initial amount appended into *written, and then
add the args.total_written value if we have to wait on more space

Fixes openssl/project#924

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #26023)

(cherry picked from commit 2de7e1d)
openssl-machine pushed a commit to openssl/openssl that referenced this issue Nov 21, 2024
When writing to a blocking quic stream, we sometimes get duplicate
transmitted data.  This occurs when a call to quic_write_blocking has to
wait for space to become available in the ring buffer.  When we do a
wait, the call sets *written to the value returned in args.total_written
as filled out by the calls to block_until_pred->quic_write_again.
However, the value there is based on the amount we requested, which is
only the remaining data that we didn't append in xso_sstream_write.  So
if we call quic_write_blocking with a buffer of length X, and initially
append Y bytes, and write the remainig X-Y bytes via a block_until_pred
call, then *written will return with the value X-Y, even though we wrote
the full X bytes to the ring buffer.

Fix it by recording the initial amount appended into *written, and then
add the args.total_written value if we have to wait on more space

Fixes openssl/project#924

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #26023)

(cherry picked from commit 2de7e1d)
openssl-machine pushed a commit to openssl/openssl that referenced this issue Nov 21, 2024
When writing to a blocking quic stream, we sometimes get duplicate
transmitted data.  This occurs when a call to quic_write_blocking has to
wait for space to become available in the ring buffer.  When we do a
wait, the call sets *written to the value returned in args.total_written
as filled out by the calls to block_until_pred->quic_write_again.
However, the value there is based on the amount we requested, which is
only the remaining data that we didn't append in xso_sstream_write.  So
if we call quic_write_blocking with a buffer of length X, and initially
append Y bytes, and write the remainig X-Y bytes via a block_until_pred
call, then *written will return with the value X-Y, even though we wrote
the full X bytes to the ring buffer.

Fix it by recording the initial amount appended into *written, and then
add the args.total_written value if we have to wait on more space

Fixes openssl/project#924

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from #26023)
@nhorman nhorman closed this as completed Nov 21, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Project Board 3.5.0 planning Nov 21, 2024
@github-project-automation github-project-automation bot moved this from Waiting Review to Done in Development Board Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Status: Done
1 participant