Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Socket Write and Close Hangs Indefinitely When There Is No Connectivity #905

Closed
alvarolb opened this issue Jun 21, 2024 · 1 comment · Fixed by #906 or #909
Closed

Socket Write and Close Hangs Indefinitely When There Is No Connectivity #905

alvarolb opened this issue Jun 21, 2024 · 1 comment · Fixed by #906 or #909

Comments

@alvarolb
Copy link

alvarolb commented Jun 21, 2024

Hi,

I am not sure why this problem has not been reported before, but there is a serious issue when using TCP/TLS sockets. I think that the current write implementation is wrong. This is the related code from MbedClient.cpp:

size_t arduino::MbedClient::write(const uint8_t *buf, size_t size) {
  if (sock == nullptr)
    return 0;

  sock->set_blocking(true);
  sock->set_timeout(SOCKET_TIMEOUT);
  int ret = NSAPI_ERROR_WOULD_BLOCK;
  do {
    ret = sock->send(buf, size);
  } while ((ret != size && ret == NSAPI_ERROR_WOULD_BLOCK) && connected());
  configureSocket(sock);
  return size;
}

The problem is that if you turn off the WiFi router or remove the Ethernet cable while a socket is open, a normal write to this socket will hang indefinitely. In this situation, the send function will always report NSAPI_ERROR_WOULD_BLOCK, and the connected() method will always return true. As a result, the loop remains indefinitely, blocking the application. If the Ethernet cable is reconnected, the application resumes normal operation. However, this is not the desired behavior, as the application cannot detect disconnections.

I have slightly modified the method, and it seems to work:

size_t arduino::MbedClient::write(const uint8_t *buf, size_t size) {
  if (sock == nullptr)
    return 0;

  sock->set_blocking(true);
  sock->set_timeout(SOCKET_TIMEOUT);
  int ret = sock->send(buf, size);
  configureSocket(sock);
  if(ret==NSAPI_ERROR_WOULD_BLOCK) return 0;
  return size;
}

I'm not sure why the code included this while loop, as the socket is configured to block. If no bytes are written after the specified timeout, it will always return NSAPI_ERROR_WOULD_BLOCK. For more information, refer to the documentation: mbed-os TCP Socket.

By default, send blocks until all data is sent. If socket is set to non-blocking or times out, a partial amount can be written. NSAPI_ERROR_WOULD_BLOCK is returned if no data was written.

Additionally, I’m not sure why we have to call configureSocket(sock) repeatedly after each write. This could potentially impact performance.

void arduino::MbedClient::configureSocket(Socket *_s) {
  _s->set_timeout(_timeout);
  _s->set_blocking(false);
  _s->getpeername(&address);

  if (event == nullptr) {
    event = new rtos::EventFlags;
  }
  if (mutex == nullptr) {
    mutex = new rtos::Mutex;
  }
  mutex->lock();
  if (reader_th == nullptr) {
    reader_th = new rtos::Thread(osPriorityNormal - 2);
    reader_th->start(mbed::callback(this, &MbedClient::readSocket));
  }
  mutex->unlock();
  _s->sigio(mbed::callback(this, &MbedClient::getStatus));
  _status = true;
}

Moreover, if we detect a write failure in the application (via the modified write method) and try to close the socket, it also hangs indefinitely (only on TLS). I suspect that the current mBed configuration included in the Arduino build uses the default behavior when closing a TLS connection, which involves sending a close notify to the remote server. This close notify requires a connection to work, but since the connection is not available, it hangs forever, just like the previous write. This issue is also reported in this ARMmbed topic:

ARMmbed/mbed-os#15209

This is the close method from TLSWrapper.cpp using the mbed_tls_close_notify:

nsapi_error_t TLSSocketWrapper::close()
{
    if (!_transport) {
        return NSAPI_ERROR_NO_SOCKET;
    }

    tr_info("Closing TLS");

    int ret = 0;
    if (_handshake_completed) {
        _transport->set_blocking(true);
        ret = mbedtls_ssl_close_notify(&_ssl);
        if (ret) {
            print_mbedtls_error("mbedtls_ssl_close_notify", ret);
        }
        _handshake_completed = false;
    }

    if (_close_transport) {
        int ret2 = _transport->close();
        if (!ret) {
            ret = ret2;
        }
    }

    _transport = NULL;

    return ret;
}

I would definitely recommend to skip the close notify, or at least configure a timeout.

@domfie
Copy link

domfie commented Jun 27, 2024

Finished my application with the Opta, started testing for basic things like what happens if ethernet goes down due to switch failure and stumbled upon this issue. Huge disappointment. If arduino claims to do something industrial, it should definitely have been tested to work even if parts of external systems failing.

After applying your patch it no longer hangs during mqttclient.poll(). Thank you @alvarolb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants