Local HTTP listener causing startup issues #1273
This is the port we are opening. We should be shutting it down properly, so it is concerning that you are seeing it cause stalls when restarting your application. That being said, I find it interesting that this code is starting up in your app at all. We only try to turn it on for non-.NET apps, because it is only used by our out-of-proc language SDKs (that is, JavaScript and soon Python). To mitigate in the meantime, can you make sure your application has
@ConnorMcMahon thanks for the swift reply! Yes, it appears that I don't have Btw, is this configuration mandatory? I guess I'm confused because it doesn't feel mandatory, but it kinda actually is?
I know it is something we flag as a warning in our internal tooling for investigating customer issues. That being said, I'm not sure it is explicitly required, because we can often infer it from the code we end up seeing. In general, I think our tooling now sets this automatically when you select your language, but I may be wrong about that. If your function app has been around for a long time, it may predate our tools setting this by default during app creation.
It just occurred to me that this listener behavior could be very problematic for apps that use slots, because multiple instances may be running simultaneously on the same VM. We probably need to prioritize making the listener port selection dynamic, as @anthonychu suggested, to avoid these kinds of problems.
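A common way to make a listener's port dynamic is to let the operating system choose it: binding to port 0 assigns a free ephemeral port, and holding the socket open keeps that port reserved. The Python sketch below illustrates the general technique only; it is not the Durable extension's implementation (which is .NET), and `open_listener_on_dynamic_port` is a hypothetical name. Whatever port is chosen would still have to be communicated to the client that calls it.

```python
import socket


def open_listener_on_dynamic_port(host: str = "127.0.0.1") -> socket.socket:
    """Bind to port 0 so the OS picks a free ephemeral port, then listen.

    The socket is returned still open: the port stays reserved for as long
    as the caller holds it, so another process on the VM cannot claim it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))  # port 0 means "any free port"
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    listener = open_listener_on_dynamic_port()
    bound_host, bound_port = listener.getsockname()
    print(f"listening on http://{bound_host}:{bound_port}")  # port varies per run
    listener.close()
```

Because two sockets that are open at the same time can never receive the same OS-assigned port, two slots using this approach on one VM would not collide the way a hard-coded port like 17071 can.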
@cgillum @ConnorMcMahon We had this again last night; my fault for not getting the config update rolled out to prod in time. However, this time we didn't get any traces from the host trying to start up, just exceptions. As a result, this Function App was not consuming messages, which built up on Service Bus for the last 9 hours. It seems odd that we've had two instances of this failure in the last few days. Has something changed in the underlying host environment that would make this issue more likely to happen?
This is being addressed in #1307.
@cgillum @ConnorMcMahon Just got the error: `Failed to bind to address http://127.0.0.1:17071: address already in use. Only one usage of each socket address (protocol/network address/port) is normally permitted. Only one usage of each socket address (protocol/network address/port) is normally permitted.` I am using NodeJS for my functions code.
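The error above is the standard "address in use" failure: by default only one socket can be bound to a given IP/port pair at a time, and "Only one usage of each socket address ... is normally permitted" is the Windows text for that error (WSAEADDRINUSE). The Python sketch below reproduces the failure locally; `try_bind` is a hypothetical helper, and it deliberately does not set options like `SO_REUSEADDR`, whose semantics differ between Windows and Linux.

```python
import errno
import socket
from typing import Optional


def try_bind(host: str, port: int) -> Optional[socket.socket]:
    """Return a listening socket, or None if host:port is already in use."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
        return sock
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            # Another socket (possibly in another process) already owns host:port.
            return None
        raise


if __name__ == "__main__":
    first = try_bind("127.0.0.1", 0)  # let the OS choose a free port for the demo
    assert first is not None
    taken_port = first.getsockname()[1]
    second = try_bind("127.0.0.1", taken_port)  # same address: expected to fail
    print("second bind succeeded?", second is not None)
    first.close()
```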
Hmm, that is very curious. Do you have an application name you can share with us, publicly or privately? A rough timestamp (in UTC) of when this error occurred would also be helpful.
I am having this issue as well. A Node Durable Functions app that has been running for months suddenly went dead in the water and cannot start due to a port-in-use error.
@CastleArg does it stay in that state, or does it recover after a minute or so?
Nope, it can't start at all. I even deleted and redeployed the app, with no result. We have a prod environment set up identically, and funnily enough that one is still running.

`2020-05-28T08:21:29.406 [Error] A host error has occurred during startup operation '629af924-5cfd-4642-9647-847b8124ac55'.`
Thanks for the info. Which version of the Durable extension are you using? The latest version is supposed to select a different port number based on availability. In any case, we added a kill switch to this feature in case it caused problems in unexpected scenarios. You can disable it by setting `localRpcEndpointEnabled` to `false` in host.json. Here is an example:

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "localRpcEndpointEnabled": false
    }
  }
}
```

Try that and let us know if it resolves the issue. The side effect is that the durableClient reverts to its old behavior of invoking the external-facing management APIs instead of the internal ones on the local machine. In most cases, you should notice only a slight performance degradation for durableClient API calls.
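If you manage many function apps, the kill switch above can be applied with a small script rather than by hand. The Python sketch below is a hypothetical helper, not part of any Azure tooling; it assumes host.json is plain JSON (no comments) and only adds or overwrites `extensions.durableTask.localRpcEndpointEnabled`, leaving other settings intact.

```python
import copy
import json
from pathlib import Path
from typing import Any, Dict


def disable_local_rpc(host_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a host.json dict with the local RPC endpoint disabled."""
    updated = copy.deepcopy(host_config)  # never mutate the caller's dict
    extensions = updated.setdefault("extensions", {})
    durable = extensions.setdefault("durableTask", {})
    durable["localRpcEndpointEnabled"] = False
    return updated


def patch_host_json(path: Path) -> None:
    """Rewrite host.json in place with localRpcEndpointEnabled set to false."""
    config = json.loads(path.read_text(encoding="utf-8"))
    patched = disable_local_rpc(config)
    path.write_text(json.dumps(patched, indent=2) + "\n", encoding="utf-8")
```

For example, `patch_host_json(Path("host.json"))` run from the project root would produce a file equivalent to the example above plus any settings that were already there.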
Thanks, will give this a try.
I am using the extension bundle like so.
@cgillum setting `"localRpcEndpointEnabled": false` allowed it to start again. Thanks for the quick advice.
This bug still affects my production Durable Functions V3 instances! Earlier today, all backend operations died because some other function app on my App Service plan grabbed port 17071, leaving my production Functions instance unable to start. I've been seeing this behavior on Azure since May 28.
@kepikoi, did you try disabling the feature in your application as recommended above? The problem for apps using Node is that they use extension bundles, which run v1.x of the extension. This feature was introduced in v1.8.5 of the extension, which likely rolled out in extension bundles recently. The fix is currently only in v2 of the extension. You can install v2 manually by not using extension bundles and installing it via the CLI, or you can update once version 2 of the extension bundles rolls out in the near future. I am reopening the issue until we port this fix into v1 of the extension and include it in an extension bundle release.
@ConnorMcMahon I am up and running again using
Good to know that manually installing the extension fixes it. Looks like I need to move away from extension bundles to regain control over my environments.
@ConnorMcMahon @cgillum Since getting an updated bundles release out has some lead time, I'm wondering if we can prioritize backporting this to v1 so we can ship it ASAP.
The backport is merged, and I am hoping to release 1.8.6 today and update the extension bundles repo so it goes out on the next train.
Is there any update on this? When can we expect it?
Just wanted to comment that I experienced this issue tonight, on a Durable Function running on Node. Added the I'm wondering whether the backport to v1 was ever released. A second, ancillary question: will Node.js Durable Functions be able to run on the 2.x extension at some point?
The backport should definitely be running on v1 of the bundles at this point. @jawa-the-hutt, do you have an app name and timestamp? As for running v2 of the extension, you should be able to do that easily now with extension bundles v2, which use v2.x of the extension.
App name: mea-browserless-fa-testsuite

The issue first came up when we auto-deployed code to the Here's a list of the approximate GMT times, based on the graph below. I think these are the times, but I'm not 100% sure. Regardless, the overall timeframe between 12:27 AM GMT and 4:30 AM GMT is when I was working last night and experiencing issues.
Sincere apologies for the delay here. For some reason, your application was still on 1.8.5 rather than 1.8.6, which has the fix for this issue. An internal issue with extension bundles regressed this version of the extension during the fall, but that should have been fixed by January, when you saw this. Unfortunately, I don't have enough telemetry to identify why this happened. I noticed you switched to extension bundles v2 at some point; that is now our recommendation for all customers, as it gets fixes and features much more quickly.
@ConnorMcMahon Coming back to this, as we are still getting this error with the v2 bundle as far as I can tell. Just ran some things through the Function App at around I'd also mention that, due to some other issues, we redeployed a couple of weeks ago under a different app name: Also, not sure if it's related, but as things start to scale down, we get something like this each time:
I'm taking a look, and it looks like there is still a sizeable gap between when we determine which ports are available and when we start listening on those ports. That means the fix in v2.2.1 only reduced the surface area of this issue; it did not eliminate it. In v2.4.2 we aim to close this gap entirely, which should fully fix the issue. In the meantime, if you are experiencing this issue, the only workarounds are, unfortunately, to stagger the startup of your slots or to disable localRpc.
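The gap described above is a classic time-of-check-to-time-of-use race: probing for a free port and binding it later leaves a window in which another process (for example, another slot on the same VM) can claim the port. The Python sketch below contrasts the two patterns. It illustrates the general technique, not the extension's actual v2.4.2 change; both function names are hypothetical, and the retry-with-jitter loop is only a loose analogue of the "stagger the startup" workaround.

```python
import errno
import random
import socket
import time
from typing import Iterable, Optional


def find_free_port_racy(host: str = "127.0.0.1") -> int:
    """Check-then-use (RACY): probe for a free port, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((host, 0))
        port = probe.getsockname()[1]
    # Leaving the with-block closed the probe socket. Until the caller binds
    # `port` again, any other process on the machine can take it.
    return port


def bind_first_available(
    ports: Iterable[int],
    host: str = "127.0.0.1",
    attempts: int = 3,
    base_delay: float = 0.05,
) -> socket.socket:
    """Bind-and-hold: return the socket that actually bound, so choosing the
    port and listening on it happen in one step, with no gap in between.

    If every candidate is in use, retry the whole list after a jittered,
    exponentially growing delay.
    """
    candidates = list(ports)
    last_error: Optional[OSError] = None
    for attempt in range(attempts):
        for port in candidates:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            try:
                sock.bind((host, port))
                sock.listen()
                return sock  # caller keeps it open; the port stays reserved
            except OSError as exc:
                sock.close()
                if exc.errno != errno.EADDRINUSE:
                    raise  # unrelated failure (e.g. permissions): don't mask it
                last_error = exc
        if attempt + 1 < attempts:
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise OSError(f"no candidate port could be bound on {host}") from last_error
```

Passing `0` as the last candidate falls back to an OS-assigned port, so a preferred fixed port can be tried first without risking a startup failure when it is taken.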
This will be released in v2.4.2 of the extension this week. It will take some time to deploy to extension bundles. The fix for v1 (and v1 extension bundles) is tracked in #1723.
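When triaging an app against this thread, the useful question is which extension version it actually runs (as the 1.8.5 vs 1.8.6 case above shows, bundles can lag). The Python sketch below is a hypothetical triage helper that encodes only the version numbers mentioned in this thread (listener introduced in 1.8.5, v1 backport in 1.8.6, partial v2 fix in 2.2.1, gap closed in 2.4.2); it is not an official compatibility table, and it does not parse pre-release suffixes.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH'; missing parts default to 0."""
    parts = [int(p) for p in text.strip().split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def triage_local_rpc(version: str) -> str:
    """Summarize what this thread says about a given extension version."""
    v = parse_version(version)
    if v[0] >= 2:
        if v >= (2, 4, 2):
            return "fixed: port chosen and bound without a gap"
        if v >= (2, 2, 1):
            return "mitigated: dynamic port, but a race is still possible"
        return "unknown: consider localRpcEndpointEnabled=false"
    if v[0] == 1:
        if v < (1, 8, 5):
            return "not affected: local RPC listener not present"
        if v == (1, 8, 5):
            return "affected: upgrade or set localRpcEndpointEnabled=false"
        # 1.8.6+ carries the backport; the complete v1 fix is tracked in #1723.
        return "mitigated: v1 backport; full v1 fix tracked in #1723"
    return "unknown: consider localRpcEndpointEnabled=false"
```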
Tracking this information as an issue here from @olitomlinson. Quotes below from him:
> I've got an open support case (120031225000535) because my DF app failed to start up for approx 30 minutes and then eventually corrected itself without intervention.
>
> The error message during startup was:
>
> `Failed to bind to address http://127.0.0.1:17071: address already in use. Only one usage of each socket address (protocol/network address/port) is normally permitted`
>
> My non-scientific googling has brought me to this GitHub issue. I don't really know if this work item impacts my particular issue (it probably doesn't), but are you in a position to shed any light? No worries if not, I'll just keep following up with support to get to the root cause. Thanks in advance!