Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async methods cause event hanging in express #325

Closed
paddyb opened this issue Oct 8, 2015 · 42 comments
Closed

async methods cause event hanging in express #325

paddyb opened this issue Oct 8, 2015 · 42 comments

Comments

@paddyb
Copy link

paddyb commented Oct 8, 2015

Hi
Since recently upgrading from node 0.10 & edge 0.9.3 to 4.0.0 & 4.0.0 I've noticed regular event loop blocking/hanging when responding from async methods. The issue only seems to manifest from inside an express route handler.
I have a precompiled assembly with a single function getProxy() to return a proxy object with an async function getData() that tends to hang after the callback is received. resolve(result)is added to the event loop but doesn't immediately execute. It can take several minutes or another request into the application can release it (such as a browser refresh).

var edge = require('edge'),
    q = require('q')

var getProxy = edge.func({
    assemblyFile: __dirname + '\\bin\\debug\\proxy.dll',
    typeName: 'proxy.DataProxy',
    methodName: 'GetProxy'
})

var proxy = getProxy(null,true)
module.exports.proxy={
    getData: function () {
        return q.Promise(function (resolve, reject) {
            proxy.getData({},
                function (error, result) {
                    if (error)
                        reject(error)
                    else
                        resolve(result)
                })
        })
    }
}

The clr setup in proxy.dll is pretty straightforward;

public class DataProxy
{
    private readonly IEnumerable<string> data = new[] { "s1", "s2", "s3" };
    public async Task<object> GetProxy(object args)
    {
        return new { getData = (Func<object, Task<object>>)GetData };
    }

    public async Task<object> GetData(object args)
    {
        //return data; // returning directly here is fine. the issue is only under async calls
        return await Task.Run(()=> data);
    }
}

Any help would be greatly appreciated.
Thanks
A full example is @ https://github.com/paddyb/edge_case

@webarchymeta
Copy link

I had notice it in march this year (see #259) and it seems that it had not been solved yet ...

@dsandor
Copy link

dsandor commented Dec 11, 2015

+1 to this issue but I am using restify.

I found a workaround that is really quite wierd. If I console.log('.'); before I resolve(..); the calls do not hang anymore.

I noticed this with the Edge 4.0.0, NodeJS 4.2.3, Restify 4.0.0.

@24HrDeveloper
Copy link

+1 to this issue..worked with dsandor on the issue he experienced.

@Jeff-Lewis
Copy link

We have this issue as well with Node 4+. Works fine with node 0.10.x.
@dsandor What promise lib are you using? We are wrapping our edge C# call in Q.nbind() which never gets then() or catch().

@dsandor
Copy link

dsandor commented Dec 11, 2015

@Jeff-Lewis I am using ES6 Promises.

@paddyb
Copy link
Author

paddyb commented Dec 11, 2015

I'm using kriskowal/q for promises. Saw the same results with console.log(). As far as I could tell log unblocks underlying streams in node (blocked by express). I was lucky enough to be able to remove the async calls at the clr side.

@rabbers
Copy link

rabbers commented May 27, 2016

I'm also having this problem. I had express in my project, but because it's a test I commented it out. Removal of Express and the listener allows promises to complete properly. Is there another way to unblock the event loop (other than logging) either in node, edge or express? Flush or something?

@rabbers
Copy link

rabbers commented May 27, 2016

It looks like it's not Express - I can create a web server using http.createServer, still no problem, but on creating a listener, Promises triggered in an edge callback don't flow properly.

@rabbers
Copy link

rabbers commented May 27, 2016

console.log is a good workaround, but I wanted something somewhat silent. I notice that console internally has _stdout. If you execute console._stdout.write(''); you get a silent unlock. in Typescript you have to do (console as any)._stdout.write('');. Being silent, perhaps the edge crew could pop one of these in before calling back on the proxy as a stopgap solution.

@Jeff-Lewis
Copy link

@rabbers are you able to determine the minimum amount of code to reproduce the hang? Is it just creating a http listener and using promises around the edge calls? Having a easy repro of the issue would be helpful for @tjanczuk.

Our next release (which upgrades our project from node v0.10.x to v4.4) is being held up by this and I would love to make a small repro of this for @tjanczuk.

@Jeff-Lewis
Copy link

adding reference #402

@rabbers
Copy link

rabbers commented May 27, 2016

@Jeff-Lewis I don't currently have a stripped down model, but some visibility on what I've been doing is on my StackOverflow question:

http://stackoverflow.com/questions/37471705/promise-not-completing-when-using-edge-with-node

@tjanczuk can get an idea of the type of thing that I'm doing. The C# bit could be hollowed down to sleeping and then returning a result.

It's pretty much as found in examples on various web sites - nothing clever or special - I'll see what I can do.

@Jeff-Lewis
Copy link

@rabbers Thanks for the info. I'll take a look.

@paddyb
Copy link
Author

paddyb commented May 27, 2016

@Jeff-Lewis take a look at https://github.com/paddyb/edge_case
source from the original issue

@Jeff-Lewis
Copy link

@paddyb awesome. I take a look at your repro later today.

For what it's worth, just now I just replaced Q.nbind with a bluebird new Promise() wrapper around our edge function call and it did NOT block and completed successfully...

@rabbers
Copy link

rabbers commented May 27, 2016

Whilst trying to use edge, I noticed a few things, which are now clearer whilst getting this demo running.

  1. The suggested c# pattern from edge is async (input) => {}.
  2. The given pattern actually marshalls back into node synchronously, I've checked with a long sleep in a thread, and response passes back to node at around the same time that the instruction following the caller is reached.
  3. The actual pattern to return an unfinished async process to node is to not have an async method (as in my demo below).
  4. This means that async operation via edge is acheived via Task handle return rather than async task promise return.
  5. The async mode (which is not actually asynchronous) doesn't break the js event queue, but the task handle method does.
  6. I think that async Task promise is detected, but the result is marshalled straight away, which causes blocking. Conversely, I think that the non-async task version (although working better) is unexpected and somehow causes the JS event faults.

NB: This is all guesswork, but might strike a chord somewhere along the line.

Here's a single piece demo: (It's in Typescript)

/// <reference path="d.ts/node/node.d.ts" />
import * as http from "http";

//--End of References--------------------------------------------------

const edge = require("edge");

const hostname: string = '127.0.0.1';
const port: number = 3000;

const server : http.Server = http.createServer();

server.listen(port, hostname, function () {
    console.log("HTTP Server Listening on port: " + port);
});

function eek(args: Object) : Promise<string> {
    var proxy = edge.func(function() {
        /*
        (input) =>
        {
            return System.Threading.Tasks.Task.Run<object>(
                () => {
                    System.Threading.Thread.Sleep(10000);
                    return System.Threading.Tasks.Task.FromResult<object>("Hurrah!");
                }
            );
        }  
        */
    });
    return new Promise<string>(function (resolve, reject) {
        proxy(args, function(error, result) {
            //(console as any)._stdout.write('');
            //console.log("GLReportProxy.openDI proxy returns, Err = " + (!!error));
            if (error) {
                reject(error);
            } else {
                resolve(result);
            }
        });
    });
}

var p$Problem = eek({});

p$Problem.then(function(message: string){
    console.log("If we get here then it's all good!");
    console.log("message = " + message);

    return true;
});

@ljani
Copy link

ljani commented May 27, 2016

@Jeff-Lewis here's two more (afaik they're all the same bug). The first one is really short:
#381 (comment)
#373 (comment)

EDIT: see also this poor workaround if it helps.

@webarchymeta
Copy link

webarchymeta commented May 27, 2016

We had this problem long times ago. Our solution is not to use true asyc method on the C# side of the system because the mid-layers can not handle it correctly at present. For the current topic, instead of

public class DataProxy
{
    private readonly IEnumerable<string> data = new[] { "s1", "s2", "s3" };
    public async Task<object> GetProxy(object args)
    {
        return new { getData = (Func<object, Task<object>>)GetData };
    }

    public async Task<object> GetData(object args)
    {
        //return data; // returning directly here is fine. the issue is only under async calls
        return await Task.Run(()=> data);
    }
}

The asyc, await method will enter in one thread and leave on another one. To "avoid" that, one can use a less ideal "solution"

public class DataProxy
{
    private readonly IEnumerable<string> data = new[] { "s1", "s2", "s3" };
    public async Task<object> GetProxy(object args)
    {
        return new { getData = (Func<object, Task<object>>)GetData };
    }

    public Task<object> GetData(object args)
    {
        //return data; // returning directly here is fine. the issue is only under async calls
        return Task.FromResult<object>(data);
    }
}

In addition, if there are callbacks from C# to node, the calls are better to be initiated in the thread that setup the callback. In our system before using the same thread principle, when the callback is infrequent, everything seems to be fine, but if the callbacks are very frequent, the system could eventually stops responding after a few consecutive callbacks.

I belief like Windows event loop, libuv or other part of edge do not like multiple threads other than their own. @tjanczuk Maybe adding a custom SynchronizationContext (like WindowsFormsSynchronizationContext on Windows) on the edge layer to manage the entry and exist of true async, await calls could eventually solve the problem?

@nhz-io
Copy link

nhz-io commented May 29, 2016

This Produces nothing:

var nextTick = Edge.Func(
    @"return function (d, cb) {
        process.nextTick(function() {
            console.log('Next Tick');
            cb(null, null);
        })
    }"
);
nextTick(null);

This produces expected "Set immediate"

var setImmediate = Edge.Func(
    @"return function (d, cb) {
        setImmediate(function() {
            console.log('Set immediate');
            cb(null, null);
        })
    }"
);
setImmediate(null);

To make Promises work

npm i promise
const Promise = require('promise/setimmediate')
...

@brettsam
Copy link

We bumped into this in our repo and did some investigation. Hopefully this helps debug the issue.

I found the following scenario:

  • Script1 calls process.nextTick and gets stuck
  • While Script1 is stuck, Script2 is called that performs any (I think?) async operation and Script1 then becomes unstuck. I verified it unsticks with a few async methods -- fs.appendFile(), fs.stat(), setTimeout(), dns.lookup()

Here's a little console app that shows what I mean -- https://gist.github.com/brettsam/42221a70ed2d97a43fc33dada8f5b2e8

@webarchymeta
Copy link

webarchymeta commented Jul 12, 2016

@brettsam we notice that behavior a long times ago (after node upgraded to 0.12.x). It seems that somewhere inside one of the mid-wares, the last result does not get flushed if the calling entering (.NET) thread and the exit one are different. I raised an issue at the libuv project trying to find if the problem is in libuv due to "lazy flushing", but no definite answer was received ... It was not pursuit further due to time constraints.

@brettsam
Copy link

Thanks @webarchymeta -- linking to the issue you mentioned for historical purposes: libuv/libuv#533.

Has anyone found a reliable way to use process.nextTick from Edge?

@dpolivy
Copy link
Contributor

dpolivy commented Jul 19, 2016

Chiming in here as I'm in the process of upgrading from node v0.10.x and edge 0.9.3 to node v4 and edge v5 -- and I think I'm now running into this issue. We use edge to access our SQL database from .NET, and extensively use async/await so refactoring the .NET side isn't really an option.

I don't have much to add to the debugging of others' in the thread, but I'm certainly motivated to find a reliable fix so we can move this into production in the near term. Has anyone figured out a reliable fix, or is anyone actively working on this?

@dpolivy
Copy link
Contributor

dpolivy commented Jul 19, 2016

BTW, as I've been reading through the Node API changes between v0.10 and v4 guide, I found this little tidbit; any change it's related to the issue we're seeing?

  • process.maxTickDepth has been removed, allowing process.nextTick() to starve I/O indefinitely.
    • This is due to adding setImmediate() in 0.10.
    • It is suggested you use setImmediate() over process.nextTick(). setImmediate() likely does what you are hoping for (a more efficient setTimeout(..., 0)), and runs after this tick's I/O. process.nextTick() does not actually run in the "next" tick anymore and will block I/O as if it were a synchronous operation.
    • Refs: 0761c90, 9a6c085

@dpolivy
Copy link
Contributor

dpolivy commented Apr 26, 2017

I think I have a positive update to report: I've figured out what's happening and may have a fix! It appears to be related to how the JS callback is called from C++; in the current code, it is using v8::Function::Call directly. Node has their own node::MakeCallback function which is supposed to be used for async callbacks instead. Apparently, the impact of not doing so leads to strange errors such as the next tick never happening. (See nodejs/nan#284 (comment).) That's exactly the scenario that we seem to be seeing. A quick update of the code to use Nan::Callback to call back into JS appears to fix it, but I need to do a bit more testing on this to be sure there are no other side effects.

Is anyone on this thread still actively interested in a fix for this issue, and have a scenario that we can try out the fix on? I have the repros others have posted, but I'm looking for other real-world scenarios.

@webarchymeta
Copy link

It would be nice to have a fix. We use edge to call into our RDB midwares, which is written in C#, to connect to relational databases in our nodejs services. Here is an repo I used to demonstrate this very issue a long while ago (#259). (I didn't write it for just the demonstration purposes :-), it's actually generated using a tool of us )

@agracio
Copy link
Collaborator

agracio commented Apr 26, 2017

@dpolivy could you create a branch or a PR with this fix? I could do some more testing on some of our internal projects.

@dpolivy
Copy link
Contributor

dpolivy commented Apr 29, 2017

@webarchymeta @agracio I pushed up my fixes to a branch here: https://github.com/dpolivy/edge/tree/fix-async-issues

Do you want to continue the discussion in this thread, or open up a new one?

I was able to test the JS -> .NET path using the https://github.com/paddyb/edge_case repo. I was also able to test the .NET -> JS path using this https://gist.github.com/brettsam/42221a70ed2d97a43fc33dada8f5b2e8 (comment out the Unstick() method call, and it should work with these fixes).

I don't have .NET Core/Mono installed and wasn't able to test any of the alternate codepaths, but from looking at the code it seems they are likely susceptible to the same problem, and a similar fix can be applied.

There is one known issue that I've found: with this change, when calling .NET -> JS, exceptions thrown in JS are not properly marshalled back to .NET. Instead, they propagate in JS land as uncaught errors. This is a regression in behavior and is caught by the automated tests. It appears to be fixed in node v6.4.0, though. I found one possible (& ugly) workaround, but not sure how it will impact performance. Perhaps it's OK to just accept this and document the change well.

I haven't yet done any performance or stress testing to see if this change holds up over time.

I also ran into issues testing the core CLR bits. I have v1.0.3 installed, but perhaps that's not compatible with the current state of things.

@agracio
Copy link
Collaborator

agracio commented Apr 29, 2017

For .Net Core CLR issue there is a separate discussion here: #544, so far we've made very limited progress on the issue.

I am testing your code, however not seeing any regression in tests for .NET -> JS issue.
UPDATE:
There is an issue with the .nextTick() test on both mono and .NET Core : https://travis-ci.org/agracio/edge/builds/227092106

@dpolivy
Copy link
Contributor

dpolivy commented Apr 29, 2017

@agracio OK, let's keep the .NET Core issues separate.

When testing, what version of node are you using? I've found the default test script seemed to use the wrong version of the native edge module, so I had to force it myself here. It works on 6.4.0 but fails on lower versions. This is the test after the process.nextTick() test on the net2node.js tests.

For Mono and .NET Core, looks like we'll need to apply the same fixes to those builds. I can take a look at that later.

@agracio
Copy link
Collaborator

agracio commented Apr 29, 2017

Actually Mono and .Net Core version tests started failing on all NodeJs versions after I merged fixes from your branch as per Travis build https://travis-ci.org/agracio/edge/builds/227092106.
I see no failures on my Windows PC, but I will try to add AppVeyor build to test all NodeJs versions on Windows

@agracio
Copy link
Collaborator

agracio commented Apr 30, 2017

You are right, and I have replicated this behaviour using AppVeyor tests: https://ci.appveyor.com/project/agracio/edge. 6.4.0 and 7.8.0 work while anything below fails that particular test.
I will look into it in the next few days.

@agracio
Copy link
Collaborator

agracio commented Apr 30, 2017

Found out that it is/was a known issue in NodeJs: nodejs/nan#284, nodejs/node#2380.
I believe that this is a serious regression issue and needs addressing before it can be released.

@dpolivy
Copy link
Contributor

dpolivy commented May 2, 2017

@agracio It seems the only real fix to preserve the exception behavior is to revert back to using Nan::Call (basically v8::Function::Call) and then use Nan::MakeCallback with a no-op function to kick the event loop. I'm not sure what the performance overhead of this is. It seems like a less than elegant solution, though.

I'm not sure whether one could wrap the JS call with additional JS to catch/serialize exceptions as another approach.

Properly handling exceptions in JS callbacks seems like a better behavior, though it is a change from previous versions. Since this would have a major version bump, can we just document it in release notes?

@dpolivy
Copy link
Contributor

dpolivy commented May 3, 2017

@agracio For kicks, I put together a revised patch that uses Nan::Call and then manually forces the next tick with a no-op function that uses Nan::Callback. Same branch as before -- give it a try. I have only done cursory testing, but it seems to resolve the exception issue.

I can't speak to performance or stress for this yet, but perhaps it will be ok.

@agracio
Copy link
Collaborator

agracio commented May 3, 2017

@dpolivy I think your approach might have a number of problems.
First I wasn't able to handle exceptions in JS calls as the exception ignores try/catch block.
Second is that if this PR was accepted it would cause errors in thousands of production applications for existing users and would essentially lock them out of any future updates to this project.
So perhaps we need a better approach :)
Let me test your changes on the branch in the next couple of days.

@dpolivy
Copy link
Contributor

dpolivy commented May 3, 2017

@agracio Are you referring to my actual changes, or my proposal for requiring developers to properly handle exceptions in their JS callbacks?

The actual changes in the branch should be the same as the current behavior, just with a fix so the next tick doesn't get stuck.

Also, I'm not sure what the usage of this is in production -- from the threads here, it seems like most folks have abandoned it due to issues and lack of ongoing support/updates. I certainly use it, but the impact may be overstated.

If you have time to catch up on slack, that might be helpful.

@agracio
Copy link
Collaborator

agracio commented May 3, 2017

I was referring to proposal to require developers to handle JS exception. I had no luck wrapping JS call in try/catch as the exception ignores the block and results in hard error.
Also while this and other threads suggest that some people have abandoned the efforts of using edge in their projects, there are more than 17k monthly downloads for the package. So there are many active users out there for whom a breaking change would be a huge issue.
I believe a slight performance hit is a much preferable solution.

@therightstuff
Copy link

@dpolivy - whether there are or aren't a lot users out there, i can assure you that edge is being used in important places :)

personally, i don't have a problem with breaking changes if the end result is more correct (and versioning / documentation is up to standard), but if you do go that route is there any way to provide a wrapper of some sort to ease the transition?

@dpolivy
Copy link
Contributor

dpolivy commented May 3, 2017

@agracio @therightstuff Fair enough, thanks for the input. So far, this issue really only seems to impact the "double edge" case (e.g. .NET -> NodeJS). The fix in my branch just adds the extra call on those invocations. Exceptions generated from a JS callback for a JS -> .NET call seem to work fine.

I'm going to continue testing, but it seems like this might be converging on a reasonable fix that doesn't change the existing behavior. I've had trouble building the double edge stuff in order to run the stress testing for it; not sure if either of you can give that a try?

@agracio
Copy link
Collaborator

agracio commented May 5, 2017

@dpolivy tested your new changes in one of our apps that calls C# from NodeJs. Async issue solved!
You think you could apply the same fix to mono as well?

@dpolivy
Copy link
Contributor

dpolivy commented May 7, 2017

Good news, all! I was able to find some time to meet with @tjanczuk and discuss this issue (amongst others), and he took my fix and pushed it across the finish line. edge@7.10.0 is now published to npm and NuGet, and should have a fix for the async issues on all platforms. Node v7 support was also added.

Give it a spin and report back if you see any issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests