
Fix consume and redirect. #175

Closed. Wants to merge 9 commits from the fix-consume-and-redirect branch.
Conversation

@vqvu vqvu commented Nov 29, 2014

Reimplement the stream backend to fix redirect. The old way of doing things was too hard to get right (as evidenced by the number of bugs that we have related to redirect).

This involves:

  • Totally reimplementing stream redirection.
  • Reimplementing consume, pull, fork, and resume.
  • Reimplementing debounce, throttle, and latest using the public interface instead of private methods.

Possible backward incompatibility concerns:

  1. You can no longer call fork on a stream that has been consumed.
  2. The stream no longer pauses if next isn't synchronously called in consume, since this is no longer necessary to signal (lack of) demand to the source.
  3. The callback for the stream generator and consume is guaranteed not to be called a second time until next is called. Before, you just needed to resume the stream.

I think only (1) is even a potential problem. Strictly speaking, the API never said you were allowed to fork after consume, especially since you were never allowed to consume after fork.
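Point (3) can be illustrated with a small stand-alone model. This is not Highland's code, just a sketch of the contract under assumed names: the driver refuses to re-invoke the generator callback until next() has been called.

```javascript
// Toy driver (assumed names; not Highland's implementation) enforcing the
// new contract: the generator is never called a second time until next()
// has been invoked.
function drive(generator, onValue, limit) {
    var running = false;
    var produced = 0;
    function loop() {
        if (running || produced >= limit) {
            return;
        }
        running = true;
        generator(
            function push(err, x) {   // emit a value downstream
                produced++;
                onValue(x);
            },
            function next() {         // only now may the generator run again
                running = false;
                loop();
            }
        );
    }
    loop();
}

var out = [];
var n = 0;
drive(function (push, next) {
    push(null, n++);
    next();                           // without this call, drive() stalls
}, function (x) { out.push(x); }, 3);
// out is now [0, 1, 2]
```

Under the old behavior, merely resuming the stream could trigger another call; here, as in the new contract, demand is signaled only by next().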

Fixes:

  • caolan#41
  • caolan#141
  • caolan#142
  • caolan#173

Comments welcome.

vqvu commented Nov 29, 2014

Hmm...looks like some bugs crept in while I was cleaning up. Ignore this for now.

@vqvu force-pushed the fix-consume-and-redirect branch 2 times, most recently from 6297c77 to d1c194b on November 29, 2014, 05:14
vqvu commented Nov 29, 2014

Ok, fixed. I ran into a weird issue with the merge - handle backpressure test failing in the browser but not in Node. I had to reimplement merge in a different way to fix it.

@apaleslimghost (Collaborator) commented:

Wow.

This involves:
  - Totally reimplementing stream redirection.
  - Reimplementing `consume`, `pull`, `fork`, and `resume`.

Backward incompatible:
  - The stream no longer pauses if `next` isn't synchronously called in `consume`, since this is no longer necessary to signal (lack of) demand to the source.
  - Lots of methods and properties that start with `_` changed (these were private anyway).
  - The callback for the stream generator and `consume` is guaranteed not to be called a second time until `next` is called.

Fixes:
  - caolan#41
  - caolan#141
  - caolan#142
  - caolan#173
jeromew commented Nov 29, 2014

Cool! It's nice to see that you managed to fix all those redirect and pulling bugs.

Now, this is the kind of commit that everybody fears in a project, since it changes so many things. At least the effort that was put into tests makes it possible to envision such a change. Maybe we could try to publish one last version based on the current engine before merging this, and then start a new major version. I wonder how @caolan would see this going forward.

Regarding merge, I have yet to push a new version of #133. Basically, the new version I have locally closely resembles your new version, except that it deals with late generation of the source.

@jeromew jeromew mentioned this pull request Dec 5, 2014
vqvu commented Dec 5, 2014

Benefits/risk statement requested by @jeromew in #176.

Benefits:

  • Cleaner implementation of stream redirects. It doesn't require any method overriding or special case handling in methods.
  • Cleaner implementation of consume. No custom overriding of core methods (i.e., _send) required. Also reuses the implementation for the push and next callbacks from the stream generator logic instead of having to roll our own.
  • Backpressure is implicitly communicated upstream via pull and explicitly requested via a check for the paused field in the upstream's resume. This saves a lot of the pause/resume dance between upstream and downstream that was happening before.
  • Simplifies buffering. Only one _outgoing queue required instead of both incoming and outgoing queues, one of which is only used by consume.
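The pull/paused interaction described above can be sketched in isolation. The names below (pull, resume, paused, onValue) are illustrative assumptions, not Highland's actual internals: the consumer signals demand with pull(), and the source re-checks a paused flag in resume() before emitting anything.

```javascript
// Minimal demand-driven source: values only flow when the consumer asks.
function makeSource(values) {
    var i = 0;
    var source = {
        paused: true,
        onValue: null,
        // Downstream requests exactly one value.
        pull: function (cb) {
            source.onValue = cb;
            source.paused = false;
            source.resume();
        },
        // The source checks the paused flag before emitting more.
        resume: function () {
            if (source.paused || i >= values.length) {
                return;
            }
            source.paused = true;    // pause again until the next pull()
            var cb = source.onValue;
            source.onValue = null;
            cb(values[i++]);
        }
    };
    return source;
}

var src = makeSource([1, 2, 3]);
var out = [];
function step(x) {
    out.push(x);
    if (out.length < 3) {
        src.pull(step);              // ask for the next value explicitly
    }
}
src.pull(step);
// out is now [1, 2, 3]; no value was emitted without a matching pull()
```

This is the sense in which backpressure becomes implicit: there is no pause/resume dance, just the absence of a pull.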

The obvious risk is that the change breaks documented but untested features, or that it blows up spectacularly in certain corner cases that aren't being tested for. There's also minor compatibility breakage that would likely require a major version bump.

caolan commented Dec 8, 2014

There's a lot to review here, but here's what I've noticed so far:

  • It's a lot slower on sync data processing, especially with large arrays (this might not be important, but it makes me worry a little about long-running processes)
  • I'm concerned there may be a memory leak when there is a recursive redirect; it's hard to test, though, because...
  • I get a stack overflow after many (where presumably many = stack size) next() calls

That said, the code looks nicer. I'm also happy to sacrifice some speed for better predictability, fewer bugs, and cleaner code. Are there any easy performance wins you can find, and do you think you can address the call stack problem? There's a very naive and not particularly important benchmark in the bench directory that you can run with node bench/bench.js; it tests the speed of synchronous data processing. I also suggest you try running the following code to exercise the recursive next() call stack problem:

var _ = require('highland');

var recursiveCounter = function (n) {
    return _(function (push, next) {
        if (n === 0) {
            next(_.nil);
        }
        else {
            setImmediate(function () {
                push(null, n);
                // log memory usage for every 1k redirects
                if (n % 1000 === 0) {
                    console.log(process.memoryUsage());
                }
                next(recursiveCounter(n - 1));
            });
        }
    });
};

// initial memory usage
console.log(process.memoryUsage());

recursiveCounter(1000000).each(function (x) {
    // do nothing
});

Once the stack issue is resolved and we can run this over a long period of time and check the memory usage then I'd be happy with this change. Preferably there would be some improvements in the sync processing speeds too.

caolan commented Dec 8, 2014

Just so you know, the original code had a bit of a hack for nested redirects, where it would trim unused intermediaries from the chain. Say I have a stream A, which calls next(B), which calls next(C): instead of doing A->B->C, when binding to A it actually trims B out and does A->C.
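The trimming hack can be modelled with a toy redirect chain. The names here (redirect, resolve, redirectedTo) are assumptions for illustration, not the actual Highland internals: resolving A's target walks to the end of the chain and re-links A straight to C, so B becomes garbage-collectable.

```javascript
// Toy streams that only model the redirect relationship.
function Stream(name) {
    this.name = name;
    this.redirectedTo = null;
}

Stream.prototype.redirect = function (target) {
    this.redirectedTo = target;
};

// Follow the chain to its end, then trim unused intermediaries by
// pointing this stream directly at the final target.
Stream.prototype.resolve = function () {
    var s = this;
    while (s.redirectedTo) {
        s = s.redirectedTo;
    }
    this.redirectedTo = (s === this) ? null : s;
    return s;
};

var a = new Stream('A');
var b = new Stream('B');
var c = new Stream('C');
a.redirect(b);
b.redirect(c);

console.log(a.resolve().name);      // 'C'
console.log(a.redirectedTo.name);   // 'C' -- B has been trimmed out
```

Without this trimming, a long chain of next() redirects keeps every intermediate stream reachable, which is one plausible source of the memory growth mentioned above.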

vqvu commented Dec 8, 2014

The stack overflow is not supposed to happen, so it is certainly fixable.

Not sure about performance, but I'll look at it.

How do you intend to run it over long periods of time? Do you have some existing code for this?

vqvu commented Dec 9, 2014

I cut sync data processing latency by about 54%. It's not back to where we were, but it's better.

Turns out

var fn = s.foo.bind(s);

is much slower than

var fn = function () { s.foo(); }

for critical path functions.
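A quick way to see this for yourself is a micro-benchmark along these lines. Timings vary by engine and version, so this is a sketch rather than a rigorous benchmark, but on the V8 builds of the time the closure form was reportedly much faster on hot paths.

```javascript
// Compare Function.prototype.bind against a plain closure wrapper for a
// hot-path method call.
var s = {
    count: 0,
    foo: function () { this.count++; }
};

var bound = s.foo.bind(s);              // the slow form
var wrapped = function () { s.foo(); }; // the fast form

function time(fn, iterations) {
    var start = Date.now();
    for (var i = 0; i < iterations; i++) {
        fn();
    }
    return Date.now() - start;
}

var N = 1e6;
console.log('bind:    ' + time(bound, N) + 'ms');
console.log('closure: ' + time(wrapped, N) + 'ms');
```

Both wrappers are semantically equivalent here (each runs s.foo with s as the receiver), so the substitution is safe for critical-path functions.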

vqvu commented Dec 9, 2014

Ok, the ArrayStream case is at rough performance parity with underscore. The sync GeneratorStream case is also better.

I know why it stack overflows. It's because right now, redirection looks a lot like consume, and we can't chain infinite consumes. It does this to preserve the redirection chain, so that things like callbacks registered with on before the redirect still work after it.

Is there a good reason to support infinite redirects in this way, considering we don't support infinite consume? Do you have a use-case in mind?

I think it's possible to fix; I'm just wondering if it's worth the effort...

jeromew commented Dec 10, 2014

@vqvu congrats on the speed improvement. Regarding the stack usage, the main issue I see is that you can't presume how people are going to use the library, and you don't want it to break unexpectedly on them.

Of course, that's easier said than done. I would like highland to be particularly resistant to huge numbers of values and consumes. The way I see it, consumes could even be considered as streams with a way to reorganize the pipeline (inject transforms at a specific location, etc.).

In this sense, consume should probably not be implemented as a call stack descent. I must admit that I have not done my homework on the core engine of highland (you are probably the most knowledgeable about it, along with @caolan), so I have no idea how difficult, or even possible, it would be to have a flattened implementation with minimal call stack impact, whatever the number of transforms.

vqvu commented Dec 10, 2014

Agreed that it would be nice for consume to not be implemented recursively. In principle it is possible, since recursion can always be transformed into iteration, but right now I don't see an obvious way to do it without inserting setImmediate everywhere. I'll have to think about this some more.
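One textbook way to flatten recursion without sprinkling setImmediate everywhere is a trampoline: each step returns a thunk instead of calling the next step directly, and a plain loop bounces through the thunks in constant stack space. This is a general sketch, not a proposal for Highland's actual design:

```javascript
// Wrap a thunk-returning function so that it runs iteratively.
function trampoline(fn) {
    return function () {
        var result = fn.apply(this, arguments);
        while (typeof result === 'function') {
            result = result();   // each bounce runs in constant stack space
        }
        return result;
    };
}

// A deeply "recursive" counter that would overflow the stack if each step
// called the next one directly.
var countDown = trampoline(function step(n, acc) {
    if (n === 0) {
        return acc;
    }
    return function () {         // defer the next step as a thunk
        return step(n - 1, acc + 1);
    };
});

console.log(countDown(1000000, 0)); // 1000000, with no stack overflow
```

The trade-off is that every step in the hot path allocates a closure, so whether this beats an occasional setImmediate depends on the workload.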

jeromew commented Dec 11, 2014

Maybe you could squash your commits and put them on a '3.0.0' branch?
We will probably still need some time before this gets out, and it will be easier not to freeze the master branch if we try to keep 3.0.0 on par with master. Maybe others will be able to contribute to the new engine before it gets @caolan's approval.

From a very rough estimate, I feel like this will need 2 or 3 round trips of review with @caolan before it is ready to be merged.

Maybe it is a good occasion to improve the benchmarks and to create degenerate tests that hit the engine where it hurts.

vqvu commented Dec 11, 2014

Do you mean create a new branch on caolan/highland and put these commits there? If so, I can do that.


jeromew commented Dec 12, 2014

@vqvu yes, that is what I meant. It will be easier to collaborate on this if a branch is created.

@vqvu vqvu mentioned this pull request Dec 13, 2014
vqvu commented Dec 13, 2014

I pushed a 3.0.0 branch. Not all commits were squashed, because I wanted to keep the development history around. A single big commit is not useful to me.

Closing this PR in favor of #179, since it's no longer needed.

@vqvu vqvu closed this Dec 13, 2014
@vqvu vqvu deleted the fix-consume-and-redirect branch November 18, 2015 05:55
5 participants