Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataChan send file fail #1298

Open
changtsing opened this issue Nov 27, 2024 · 10 comments
Open

DataChan send file fail #1298

changtsing opened this issue Nov 27, 2024 · 10 comments

Comments

@changtsing
Copy link

changtsing commented Nov 27, 2024

Hello.
My program is running on the macOS system and sends files to the peer (app) through Datachan. Datachan is set to send files sequentially.
The general logic of the code is as follows:

while (read file) {
    std::cout << "bufferedAmount ....." << dc->bufferedAmount() << std::endl;
    while (dc->bufferedAmount()!= 0) {
        std::cout << "bufferedAmount wait....." << dc->bufferedAmount() << std::endl;
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
    }
    dc->send(data);
}

At the beginning, the peer could receive messages. However, after a period of time, the peer couldn't receive any messages. Then I added a waiting mechanism for the condition that bufferedAmount!= 0, and I found that there was a situation where bufferedAmount didn't decrease.
I don't know what happened and I'm looking forward to your suggestions.

iperf3 test results

image
@paullouisageneau
Copy link
Owner

If the other peer stops receiving messages, it probably means the connection is lost. The peer connection on sender side should time out after some time and the data channel should close. You need to check for it in your code.

Looping with a sleep call is not an efficient way to control the buffer level. You need to use the onBufferedAmountLow callback like in the benchmark for instance.

@changtsing
Copy link
Author

changtsing commented Nov 29, 2024

If the other peer stops receiving messages, it probably means the connection is lost. The peer connection on sender side should time out after some time and the data channel should close. You need to check for it in your code.

The peer has not stopped receiving, and the channel has not been closed either. The peer (the receiver) can still send messages and the receiver can also receive them.

Looping with a sleep call is not an efficient way to control the buffer level. You need to use the onBufferedAmountLow callback like in the benchmark for instance.

Yes, these codes are rather crude. I just wanted to conduct tests. When bufferedAmount!= 0 occurs, the onBufferedAmountLow function will not be triggered.
I'll add some more logs.

@paullouisageneau
Copy link
Owner

The peer has not stopped receiving, and the channel has not been closed either. The peer (the receiver) can still send messages and the receiver can also receive them.

You wrote "At the beginning, the peer could receive messages. However, after a period of time, the peer couldn't receive any messages." If the peer stops receiving but the connection is not lost, it probably means your receiver application is blocking a callback.

When bufferedAmount!= 0 occurs, the onBufferedAmountLow function will not be triggered.

This is how it works, the onBufferedAmountLow callback is called only when bufferedAmount falls at or below the threshold (by default 0). The is the same behavior as the onbufferedamountlow event on the Web API, except the libdatachannel callback is also called when the threshold is 0.

log.zip

It looks like the remote peer stops receiving. You really don't need debug logging on server side here, it looks like a straightforward receiver side issue, probably a blocked callback.

@changtsing
Copy link
Author

It looks like the remote peer stops receiving. You really don't need debug logging on server side here, it looks like a straightforward receiver side issue, probably a blocked callback.

Based on your suspicion, I conducted a test and found that when the sender is replaced with one written in JavaScript and files are sent via the browser, this problem doesn't occur. Could it be related to the sending rate?
I also have a doubt about how to clear the buffer of SCTP when it's full. Is it necessary to wait for the SACK from the peer end? My understanding is that as long as the channel isn't closed, the sender will attempt to retry the cached messages until an abnormality occurs in the channel. However, I didn't see this in the logs. What I saw seemed to be the sending of heartbeat messages.
This problem is quite thorny.

@paullouisageneau
Copy link
Owner

Based on your suspicion, I conducted a test and found that when the sender is replaced with one written in JavaScript and files are sent via the browser, this problem doesn't occur. Could it be related to the sending rate?

Please show your code to send and receive data.

I also have a doubt about how to clear the buffer of SCTP when it's full. Is it necessary to wait for the SACK from the peer end?

You don't "clear" the buffer, SCTP can't send more until the receiver acks something.

@changtsing
Copy link
Author

Please show your code to send and receive data.请显示您的代码以发送和接收数据。

Recipient :

 async #registerChannelEvent() {
        this.#channel.onopen = () => {
            console.log('DataChannel opened!');
        }
        this.#channel.onclose = () => {
            console.log('DataChannel closed!');
        }
        this.#channel.onerror = (e) => {
            this.#configure.handleErrorCallback(ErrorCode.callFailed);
            console.log(`DataChannel err:${e.toString()}`);
        }
        this.#channel.onmessage = (e) => {
            this.#receiveChannelMsg(e.data);
        };
    }
    
     async #receiveChannelMsg(msg) {
        const uint8Array = new Uint8Array(msg, 0, 32);
        const hexString = Array.from(uint8Array).map(byte => byte.toString(16).padStart(2, '0')).join(' ');
        console.log('Received  data:', hexString);
        const view = new DataView(msg);
        let magic = view.getUint32(0, true);
        let isText = view.getUint32(4, true);
        let number = view.getUint32(8, true);
        let dataLen = view.getUint32(12, true);
        const data = new Uint8Array(msg, 16); 
        if (isText === 0x00) {
            const decoder = new TextDecoder('utf-8');
            const jsonString = decoder.decode(data);
            const jsonData = JSON.parse(jsonString);
            console.log(jsonData)
            
            if (jsonData.value.action === 'start') {
                
                if (!this.#file) {
                    this.#channel.send(this.#buildCommand({type: "transfer", value: {action: "no"}}));
                    return
                }
                
                this.#channel.send(this.#buildCommand({type: "transfer", value: {action: "ok"}}));
                
                await this.#sendFile(this.#file, 0);
                
                this.#channel.send(this.#buildCommand({type: "transfer", value: {action: "stop"}}));
            } else if (jsonData.value.action === 'stop') {
                this.#configure.receiveChunkCallback((new TextEncoder()).encode("END").buffer);
            } else if (jsonData.value.action === 'ok') {
                this.#configure.receiveChunkCallback((new TextEncoder()).encode("START").buffer);
            }
        } else if (isText === 0x01) {
            console.log('seq num:', number);
            this.#configure.receiveChunkCallback(msg.slice(16));
        }
    }

   receiveChunkCallback: (chunk) => {
            console.log('Got chunk:', chunk);
        }

sender :

void fileSendThreadFunction(FileSendParams& params) {
        std::cout << "filepath:" << params.filePath << std::endl;
        const size_t bufferSize = 1024;
        std::ifstream file(params.filePath, std::ios::binary);
        if (file.is_open()) {
            std::vector<char> buffer(bufferSize);
            int i = 0;
            while (file.read(buffer.data(), bufferSize)) {
                buffer.insert(buffer.begin(), static_cast<char>(i));
                std::vector<std::byte> byteBuffer(buffer.size());
                std::transform(buffer.begin(), buffer.end(), byteBuffer.begin(),
                            [](char c) { return static_cast<std::byte>(c); });
                {
                    rtc::binary fileData(byteBuffer);
                    if (!params.dataChannel->isOpen()) {
                        std::cout << "Data sent fail. datachan open fail" << std::endl;
                        break;
                    }
                    try {
                        std::cout << "bufferedAmount....." << params.dataChannel->bufferedAmount() << std::endl;
                        while (params.dataChannel->bufferedAmount() > 0) {
                            std::cout << "bufferedAmount wait....." << params.dataChannel->bufferedAmount() << std::endl;
                            std::this_thread::sleep_for(std::chrono::milliseconds(500));
                        }
                        params.dataChannel->send(byteBuffer);
                        std::cout << "Data sent successfully. " <<std::endl;
                        
                        
                    } catch (const std::exception &e) {
                        std::cout << "Send failed: " << e.what() << std::endl;
                    }
                }
                buffer.assign(bufferSize, '\0');
                
            }
            if (params.dataChannel) {
                params.dataChannel->send("0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF");
            }
            std::cout << "send ok " << params.filePath << "  end " << std::endl;
            file.close();
        } else {
            std::cerr << "file open fail " << params.filePath << std::endl;
        }
        
}

The code is rather simple and crude, and some unimportant exceptions haven't been handled. I mainly focused on testing the function of sending files.

Another question is that when the peer side doesn't receive or when the onMessage function gets blocked, how can I find clues? The packets captured by Wireshark are in the DTLS format. I'm trying to figure out a way to decrypt them and then analyze the acknowledgments (acks) and windows of SCTP to look for clues. Is this feasible? Do you have any other recommended methods?

@paullouisageneau
Copy link
Owner

So the receiver is web? I assumed it was libdatachannel. The code makes no sense. Sender and receiver are completely unrelated and obviously not made to work together.

Still, it shouldn't end up with a blocked transfer. Please show the whole sender code.

@changtsing
Copy link
Author

changtsing commented Dec 3, 2024

So it's very strange that I can't find anything suspicious.

sender code.

@changtsing
Copy link
Author

changtsing commented Dec 5, 2024

After a large number of tests and packet capture and analysis with Wireshark, it was found that the problem was caused by the fact that the sender did not retransmit. It is speculated that there is a high probability that the issue is caused by the usrsctp library, as it did not resend the lost packets. An attempt was made to make changes and the problem was solved. Of course, continuous testing is still being carried out. I'm not sure whether the problem has been completely resolved.

The following is the modified code, which has been provided for others as a reference.

diff --git a/usrsctplib/netinet/sctputil.c b/usrsctplib/netinet/sctputil.c
index bd41414..24d7282 100755
--- a/usrsctplib/netinet/sctputil.c
+++ b/usrsctplib/netinet/sctputil.c
@@ -1913,7 +1913,7 @@ sctp_timeout_handler(void *t)
 #endif
                sctp_chunk_output(inp, stcb, SCTP_OUTPUT_FROM_T3, SCTP_SO_NOT_LOCKED);
                did_output = true;
-               if ((stcb->asoc.num_send_timers_up == 0) &&
+               if (
                    (stcb->asoc.sent_queue_cnt > 0)) {
                        struct sctp_tmit_chunk *chk;

Do you have any other suggestions regarding the bug? I'd be glad to hear your suggestions.

@paullouisageneau
Copy link
Owner

The usrsctp code you modified has a comment indicating this is only a safeguard for a situation that should not happen: there is something to send but no timer running. You removed the check for running timers, so now as soon as there is something to send it overrides everything and always starts the timer. Don't do this.

If you really believe this is a problem with usrsctp, you should report the issue there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants