'http-callback' connects to the local 8085 server port, with a certain probability of socket connection failure ret=1018(No such file or directory) #533

Closed
haofz opened this issue Nov 18, 2015 · 8 comments

Contributor

haofz commented Nov 18, 2015

Hello Winlin,

I am using the on_publish callback in SRS 1.0, and recently SRS has sometimes been unable to connect to port 8085. The Python server is the one shipped with SRS itself (# python research/api-server/server.py 8085). I previously ran a Python script that called the server on that port continuously and saw no socket connection problems, so I suspect the problem is in st_connect(). Is it similar to this issue?
#511

Error log:

[2015-11-01 00:14:22.306][error][7187][198601][2] connect to server error. ip=127.0.0.1, port=8085, ret=1018(No such file or directory)
[2015-11-01 00:14:22.306][warn][7187][198601][16] http client failed, server=127.0.0.1, port=8085, timeout=3000000, ret=1018
[2015-11-01 00:14:22.306][warn][7187][198601][16] http connect server failed. ret=1018
[2015-11-01 00:14:22.306][error][7187][198601][16] http post on_publish uri failed.
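
The standalone check described above (connecting to the callback port repeatedly from a script, outside SRS) could look roughly like the following. This is a minimal sketch, not the script the reporter actually used: the function name `probe` and its parameters are made up for illustration, and the 3-second timeout only mirrors the `timeout=3000000` value in the log, which is presumably in microseconds.

```python
import errno
import socket
import time


def probe(host, port, attempts=100, timeout=3.0, pause=0.01):
    """Open and close a TCP connection `attempts` times.

    Returns a list of failures as (attempt_index, error_name, elapsed_seconds).
    An empty list means every connect succeeded.
    """
    failures = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            # create_connection performs the connect() call and raises OSError on failure.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError as exc:
            elapsed = time.monotonic() - start
            # Timeouts carry no errno, so fall back to the exception class name.
            name = errno.errorcode.get(exc.errno, type(exc).__name__)
            failures.append((i, name, elapsed))
        time.sleep(pause)
    return failures


if __name__ == "__main__":
    failed = probe("127.0.0.1", 8085)
    print(f"{len(failed)} of 100 connects failed")
    for index, name, elapsed in failed:
        print(f"attempt {index}: {name} after {elapsed:.3f}s")
```

If this loop never fails while SRS still logs connect errors against the same port, that points toward the client side (SRS and st_connect()) rather than the callback server.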

Member

winlinvip commented Nov 18, 2015

Please try SRS2 and see if it also has this issue.

@winlinvip winlinvip added the Bug It might be a bug. label Nov 18, 2015
@winlinvip winlinvip modified the milestone: srs 1.0 release Nov 18, 2015
Contributor Author

haofz commented Nov 19, 2015

I would love to say that SRS2 has the same issue, so that Winlin pays attention to it...
But in reality this bug is hard to reproduce. On both SRS 1.0 and 2.0, I pushed 100 streams concurrently for half a day without hitting it, yet it occasionally occurs in daily use.
So far I have not found a pattern that reproduces the bug.
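
One cheap way to look for a pattern is to bucket the `ret=1018` log lines by minute and compare the buckets with publish activity. The sketch below assumes the log keeps the layout shown in the error log earlier in this thread (`[timestamp][level][pid][id][n] message`); the function name `failures_per_minute` is hypothetical. It counts only `error` lines that carry a `ret=` field, so the follow-up `warn` lines for the same failure are not counted again.

```python
import re
from collections import Counter

# Captures the timestamp down to the minute, the log level, and the ret=<code> field.
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d{3}\]"
    r"\[(?P<level>\w+)\].*?ret=(?P<ret>\d+)"
)


def failures_per_minute(lines, ret_code="1018", level="error"):
    """Count log lines with the given level and ret code, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("ret") == ret_code and match.group("level") == level:
            counts[match.group("ts")] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the error log quoted in this issue.
    sample = [
        "[2015-11-01 00:14:22.306][error][7187][198601][2] connect to server error. "
        "ip=127.0.0.1, port=8085, ret=1018(No such file or directory)",
        "[2015-11-01 00:14:22.306][warn][7187][198601][16] http connect server failed. ret=1018",
    ]
    print(dict(failures_per_minute(sample)))  # {'2015-11-01 00:14': 1}
```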

@winlinvip winlinvip added this to the srs 2.0 release milestone Nov 19, 2015
Contributor Author

haofz commented Nov 27, 2015

  1. After several days of testing I still have no lead. What I can confirm is that the CherryPy Python program is not at fault. First, it holds up fine under stress testing. Second, even when I point the "on_publish" callback at a different test address or at an Nginx endpoint, "st_connect()" still frequently fails to connect.

  2. If I keep retrying inside "http_callback" instead of ending the session, it usually takes about 10 retries to succeed, roughly 30 seconds in total.

  3. I noticed that "SRS_HTTP_CLIENT_SLEEP_US" in SRS 1.0 is 3 seconds, while "SRS_HTTP_CLIENT_TIMEOUT_US" in SRS 2.0 is 30 seconds. So I changed the SRS 1.0 value to 30 seconds and tested again. I found that:
    (1) The "ret=1018" error occurs far less often.
    (2) However, whenever the "ret=1018" error does occur, the SRS process restarts immediately... ToT
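
The retry experiment in item 2 can be sketched outside SRS as a loop that re-sends the callback until it succeeds and records how many attempts and how much time that took. This is only an illustration in Python, not SRS code: `post_with_retry` and its parameters are hypothetical, and the URL in the usage part is an assumed example. If each failed attempt blocks for a full 3-second timeout, ten attempts add up to about 30 seconds, which would be consistent with the numbers reported above.

```python
import time
import urllib.request


def post_with_retry(url, body, attempts=10, timeout=3.0, pause=0.0,
                    opener=urllib.request.urlopen):
    """POST `body` to `url`, retrying when the connection or request fails.

    Returns (attempts_used, elapsed_seconds, response_bytes or None).
    `opener` is injectable so the loop can be exercised without a network.
    """
    start = time.monotonic()
    for attempt in range(1, attempts + 1):
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        try:
            with opener(request, timeout=timeout) as response:
                return attempt, time.monotonic() - start, response.read()
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses,
            # so refused connects, timeouts and HTTP error statuses are retried.
            time.sleep(pause)
    return attempts, time.monotonic() - start, None


if __name__ == "__main__":
    # Assumed callback URL; adjust to whatever on_publish points at.
    used, elapsed, data = post_with_retry(
        "http://127.0.0.1:8085/api/v1/streams", b'{"action": "on_publish"}'
    )
    print(f"attempts={used} elapsed={elapsed:.1f}s response={data!r}")
```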

Contributor Author

haofz commented Nov 27, 2015

Two quick questions:

  1. Why does SRS 2.0 default time_out to 30s? I remember a debate about the 30s value in an earlier issue, but I can't find it now.
  2. If the HTTP connection is blocked for the full 30s, the SRS service restarts, and I don't know why.

Member

winlinvip commented Nov 27, 2015

I don't think there was any debate. If a request cannot be processed within 30 seconds, the client has to wait the full 30 seconds. What user would keep waiting for 30 seconds?

Member

winlinvip commented Nov 27, 2015

Also, a 30-second timeout should only close the connection. Why would the server restart?

Contributor Author

haofz commented Nov 27, 2015

That test was on SRS 1.0. There, if the HTTP connection is blocked for 30 seconds, the service does restart.
I will continue testing SRS 2.0.
What puzzles me most now is why an HTTP request can go 30 seconds without returning.
  1. Normally, an HTTP callback request returns successfully within 20 ms.
  2. During those 30 seconds, HTTP requests for other newly started streams complete normally, and so do continuous HTTP stress-test requests.

Member

winlinvip commented Jan 11, 2017

This bug may be related to the fly fd issue; see #511.
It is also possible that CherryPy is not coping. It seems able to handle only a few requests at a time, and it hangs when there are too many. You could try switching to an API backend written in Go.
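
To test the "backend cannot keep up" theory without changing languages, one option is to swap CherryPy for a small thread-per-request server from the Python standard library. Note that this is a stand-in for the Go backend suggested above, not the same thing, and it is not the server shipped in research/api-server. It also assumes that SRS accepts an HTTP 200 response whose body is `0` as success; `CallbackHandler`, `decide` and `make_server` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def decide(event):
    """Return the callback result code; 0 is assumed to mean 'allow'."""
    return 0


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the bytes announced by the client, then parse them as JSON.
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            event = json.loads(raw or b"{}")
        except ValueError:
            event = {}
        body = str(decide(event)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # Silence per-request logging so it does not distort load tests.


def make_server(port=8085, host="127.0.0.1"):
    # ThreadingHTTPServer handles each connection in its own thread.
    return ThreadingHTTPServer((host, port), CallbackHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

If the `ret=1018` errors persist against a server like this, the backend's concurrency limit becomes a less likely explanation.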

@winlinvip winlinvip self-assigned this Sep 24, 2021
@winlinvip winlinvip changed the title from the original Chinese to "'http-callback' connects to the local 8085 server port, with a certain probability of socket connection failure ret=1018(No such file or directory)" Jul 28, 2023
@winlinvip winlinvip added the TransByAI Translated by AI/GPT. label Jul 28, 2023