Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2.3.0 infinite loop in WifiClientSecure.write when no internet (found reason) #3537

Closed
odelot opened this issue Aug 21, 2017 · 8 comments
Closed

Comments

@odelot
Copy link

odelot commented Aug 21, 2017

Hi guys,

I had this bug, then tried to migrate to 2.4.0rc but there the behavior is more unpredictible (details here). So, I came back to 2.3.0 and tried to debug.

the infinite loop happens when you are connected with a secured TCP socket in a long duration connection, is connected to a wifi router but the wifi router lost internet access.

ClientContext write tries to send the package and after a few tries logs the error :wr !ERR_OK and returns 0

err_t err = tcp_write(_pcb, data, will_send, 0);
            if(err != ERR_OK) {
                DEBUGV(":wr !ERR_OK\r\n");
                return 0;
            }

but the tls1.c send_raw_packet function (from igrr/axtls-8266 library - used this tree version to build and debug) has this while code that just gives error with a ret < 0

while (sent < pkt_size)
    {
        ret = SOCKET_WRITE(ssl->client_fd, 
                        &ssl->bm_all_data[sent], pkt_size-sent);

        if (ret >= 0)
            sent += ret;
        else
        {

#ifdef WIN32
            if (GetLastError() != WSAEWOULDBLOCK)
#else
            if (errno != EAGAIN && errno != EWOULDBLOCK)
#endif
                return SSL_ERROR_CONN_LOST;
        }
#ifndef ESP8266
        /* keep going until the write buffer has some space */
        if (sent != pkt_size)
        {
            fd_set wfds;
            FD_ZERO(&wfds);
            FD_SET(ssl->client_fd, &wfds);

            /* block and wait for it */
            if (select(ssl->client_fd + 1, NULL, &wfds, NULL, NULL) < 0)
                return SSL_ERROR_CONN_LOST;
        }
#endif
    }

so it never detects the error and stay in loop.

Is this analysis right?

I was thinking to change the ClientContext write to return err on error. But I cannot predict all the impacts of this change.

Solution (so far so good)

ClientContext write

err_t err = tcp_write(_pcb, data, will_send, 0);
            if(err != ERR_OK) {
                DEBUGV(":wr !ERR_OK\r\n");
                return err;                                           //<- CHANGED HERE
            }

tsl1.c send_raw_packet

while (sent < pkt_size)
    {
        ret = SOCKET_WRITE(ssl->client_fd, 
                        &ssl->bm_all_data[sent], pkt_size-sent);

        if (ret >= 0)
            sent += ret;
        else
        {
            errno = ret; //? can I do this?                     //<- CHANGED HERE

#ifdef WIN32
            if (GetLastError() != WSAEWOULDBLOCK)
#else
            if (errno != EAGAIN && errno != EWOULDBLOCK)
#endif
                return SSL_ERROR_CONN_LOST;
        }
#ifndef ESP8266
        /* keep going until the write buffer has some space */
        if (sent != pkt_size)
        {
            fd_set wfds;
            FD_ZERO(&wfds);
            FD_SET(ssl->client_fd, &wfds);

            /* block and wait for it */
            if (select(ssl->client_fd + 1, NULL, &wfds, NULL, NULL) < 0)
                return SSL_ERROR_CONN_LOST;
        }
#endif
    }
@cristovao-trevisan
Copy link

cristovao-trevisan commented Aug 21, 2017

Hello @odelot, where is the tsl1.c filr? I am having what seems to be same problem, but cannot find the file to test your solution.

@odelot
Copy link
Author

odelot commented Aug 21, 2017

@cristovao-trevisan this file is from igrr/axtls-8266 library - they include it compiled

as it does not have tag versions (and I am not very good using git) I took the date of 2.3.0 release to get a version of the source code. I've used this snapshot ->
https://github.com/igrr/axtls-8266/tree/ab516f799dd43faccaf01ce5eacd5f7a1b9b109a

I´ve had to configure a linux environment to compile it. I can send you the binary, but I am not sure if this is the right fix. Need to test more and have the opinion from those who are used to these libs and files.

@cristovao-trevisan
Copy link

Since errno is global there is no need to change the axtls library. My solution was to create a retry count and close the connection after 25 tries (got the number somewhere in this repo). Here it is (in file libraries/ESP8266WiFi/src/WiFiClientSecure.cpp):

uint8_t retries = 0;
extern "C" int ax_port_write(int fd, uint8_t* buffer, size_t count) {
    ClientContext* _client = SSLContext::getIOContext(fd);
    if (!_client || _client->state() != ESTABLISHED) {
        errno = EIO;
        return -1;
    }
    size_t cb = _client->write((const char*) buffer, count);
    if (cb != count) {
        if (++retries > 0x19) {
            DEBUGV("ssl_write: Exceeded max write retries");
            _client->close();
            retries = 0;
            errno = ENOTCONN;
        }
        else errno = EAGAIN;
    } else {
        retries = 0;
    }
    return cb;
}

Not sure about the global variable, but ax_port_write is already global (and the ClientContext is static), so I don't think there is a problem. Also didn't check for memory leaks, but it does prevent the issue for now.

Can someone confirm this solution so I can make a PR out of it?

cristovao-trevisan added a commit to cristovao-trevisan/Arduino that referenced this issue Aug 22, 2017
@odelot
Copy link
Author

odelot commented Aug 22, 2017

Wow. nice. Way better than my fix (Y) I didn't realize that errno was global.

Have you tried this fix on 2.4.0rc? In this release the WifiClientSecure.write instead of went to a infinite loop was returning that it has sent data even without internet. I didn't debug this code to find why.

@cristovao-trevisan
Copy link

I will try to dig more on this (maybe on sunday), I didn't check for the result

@martinren87
Copy link

Hello guys, are there any news regarding to this issue? Will it be fixed soon? Or can anybody share with me a fixed unofficial version of the library? Thank you!

@cristovao-trevisan
Copy link

cristovao-trevisan commented Sep 28, 2017

If you're using v2.3 follow my instructions above and replace the code. For v2.4 take a look at my PR (fork referenced here), though I didn't test it (2.4, cause 2.3 fix works for sure).

@devyte
Copy link
Collaborator

devyte commented May 29, 2018

The referenced code in ClientContext has since been rewritten, and seems to have a different behavior now.
Also, BearSSL is merged in #4273 , with alternate BearSSL::WiFi* classes. Although axtls-based classes are still available and even the default, they are planned for deprecation and then retirement, hence won't be fixed. Any issues with BearSSL-based classes should be reported in new issues.
Closing this. If the issue is still relevant, please open a new issue.

@devyte devyte closed this as completed May 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants