
Fix consume and redirect. #175

Closed. Wants to merge 9 commits from the fix-consume-and-redirect branch.
Conversation

@vqvu vqvu commented Nov 29, 2014

Reimplement the stream backend to fix redirect. The old way of doing things was too hard to get right (as evidenced by the number of bugs that we have related to redirect).

This involves:

  • Totally reimplementing stream redirection.
  • Reimplementing consume, pull, fork, and resume.
  • Reimplementing debounce, throttle, and latest using the public interface instead of private methods.

Possible backward incompatibility concerns:

  1. You can no longer call fork on a stream that has been consumed.
  2. The stream no longer pauses if next isn't synchronously called in consume, since this is no longer necessary to signal (lack of) demand to the source.
  3. The callback for the stream generator and consume is guaranteed not to be called a second time until next is called. Before, you just needed to resume the stream.

I think only (1) is even a potential problem. Strictly speaking, the API never said you were allowed to fork after consume, especially since you were never allowed to consume after fork.
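Point (3) can be illustrated with a small stand-alone model. This is not Highland's code, just a sketch of the contract under assumed names: the driver refuses to re-invoke the generator callback until next() has been called.

```javascript
// Toy driver (assumed names; not Highland's implementation) enforcing the
// new contract: the generator is never called a second time until next()
// has been invoked.
function drive(generator, onValue, limit) {
    var running = false;
    var produced = 0;
    function loop() {
        if (running || produced >= limit) {
            return;
        }
        running = true;
        generator(
            function push(err, x) {   // emit a value downstream
                produced++;
                onValue(x);
            },
            function next() {         // only now may the generator run again
                running = false;
                loop();
            }
        );
    }
    loop();
}

var out = [];
var n = 0;
drive(function (push, next) {
    push(null, n++);
    next();                           // without this call, drive() stalls
}, function (x) { out.push(x); }, 3);
// out is now [0, 1, 2]
```

Under the old behavior, merely resuming the stream could trigger another call; here, as in the new contract, demand is signaled only by next().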

Fixes:

  • caolan#41
  • caolan#141
  • caolan#142
  • caolan#173

Comments welcome.

vqvu commented Nov 29, 2014

Hmm...looks like some bugs crept in while I was cleaning up. Ignore this for now.

@vqvu force-pushed the fix-consume-and-redirect branch 2 times, most recently from 6297c77 to d1c194b on November 29, 2014, 05:14
vqvu commented Nov 29, 2014

Ok, fixed. I ran into a weird issue with the merge - handle backpressure test failing in the browser but not in Node. I had to reimplement merge in a different way to fix it.

@apaleslimghost (Collaborator) commented:

Wow.

This involves:
  - Totally reimplementing stream redirection.
  - Reimplementing `consume`, `pull`, `fork`, and `resume`.

Backward incompatible:
  - The stream no longer pauses if `next` isn't synchronously called in `consume`, since this is no longer necessary to signal (lack of) demand to the source.
  - Lots of methods and properties that start with `_` changed (these were private anyway).
  - The callback for the stream generator and `consume` is guaranteed not to be called a second time until `next` is called.

Fixes:
  - caolan#41
  - caolan#141
  - caolan#142
  - caolan#173
jeromew commented Nov 29, 2014

Cool! It's nice to see that you managed to fix all those redirect and pulling bugs.

Now, this is the kind of commit that everybody fears in a project, since it changes so many things. At least the effort that was put into tests makes it possible to envision such a change. Maybe we could try to publish one last version based on the current engine before merging this, and then start a new major version. I wonder how @caolan would see this going forward.

Regarding merge, I have yet to push a new version of #133. Basically, the new version I have locally closely resembles your new version, except that it deals with late generation of the source.

@jeromew jeromew mentioned this pull request Dec 5, 2014
vqvu commented Dec 5, 2014

Benefits/risk statement requested by @jeromew in #176.

Benefits:

  • Cleaner implementation of stream redirects. It doesn't require any method overriding or special case handling in methods.
  • Cleaner implementation of consume. No custom overriding of core methods (i.e., _send) required. Also reuses the implementation for the push and next callbacks from the stream generator logic instead of having to roll our own.
  • Backpressure is implicitly communicated upstream via pull and explicitly requested via a check for the paused field in the upstream's resume. This saves a lot of the pause/resume dance between upstream and downstream that was happening before.
  • Simplifies buffering. Only one _outgoing queue required instead of both incoming and outgoing queues, one of which is only used by consume.
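The pull/paused interaction described above can be sketched in isolation. The names below (pull, resume, paused, onValue) are illustrative assumptions, not Highland's actual internals: the consumer signals demand with pull(), and the source re-checks a paused flag in resume() before emitting anything.

```javascript
// Minimal demand-driven source: values only flow when the consumer asks.
function makeSource(values) {
    var i = 0;
    var source = {
        paused: true,
        onValue: null,
        // Downstream requests exactly one value.
        pull: function (cb) {
            source.onValue = cb;
            source.paused = false;
            source.resume();
        },
        // The source checks the paused flag before emitting more.
        resume: function () {
            if (source.paused || i >= values.length) {
                return;
            }
            source.paused = true;    // pause again until the next pull()
            var cb = source.onValue;
            source.onValue = null;
            cb(values[i++]);
        }
    };
    return source;
}

var src = makeSource([1, 2, 3]);
var out = [];
function step(x) {
    out.push(x);
    if (out.length < 3) {
        src.pull(step);              // ask for the next value explicitly
    }
}
src.pull(step);
// out is now [1, 2, 3]; no value was emitted without a matching pull()
```

This is the sense in which backpressure becomes implicit: there is no pause/resume dance, just the absence of a pull.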

The obvious risk is that the change breaks documented but untested features, or that it blows up spectacularly in certain corner cases that aren't being tested for. There's also minor compatibility breakage that would likely require a major version bump.

caolan commented Dec 8, 2014

There's a lot to review here, but here's what I've noticed so far:

  • It's a lot slower on sync data processing, especially with large arrays (this might not be important, but it makes me worry a little about long-running processes)
  • I'm concerned there may be a memory leak when there is a recursive redirect; it's hard to test, though, because...
  • I get a stack overflow after many (where presumably many = stack size) next() calls

That said, the code looks nicer. I'm also happy to sacrifice some speed for better predictability, fewer bugs, and cleaner code. Are there any easy performance wins you can find, and do you think you can address the call stack problem? There's a very naive and not particularly important benchmark in the bench directory that you can run with node bench/bench.js; it tests the speed of synchronous data processing. I also suggest you try running the following code to exercise the recursive next() call stack problem:

var _ = require('highland');

var recursiveCounter = function (n) {
    return _(function (push, next) {
        if (n === 0) {
            next(_.nil);
        }
        else {
            setImmediate(function () {
                push(null, n);
                // log memory usage for every 1k redirects
                if (n % 1000 === 0) {
                    console.log(process.memoryUsage());
                }
                next(recursiveCounter(n - 1));
            });
        }
    });
};

// initial memory usage
console.log(process.memoryUsage());

recursiveCounter(1000000).each(function (x) {
    // do nothing
});

Once the stack issue is resolved and we can run this over a long period of time and check the memory usage then I'd be happy with this change. Preferably there would be some improvements in the sync processing speeds too.

caolan commented Dec 8, 2014

Just so you know, the original code had a bit of a hack for nested redirects, where it would trim unused intermediaries from the chain. Say I have a stream A, which calls next(B), which calls next(C): instead of doing A->B->C, when binding to A it actually trims B out and does A->C.
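The trimming hack can be modelled with a toy redirect chain. The names here (redirect, resolve, redirectedTo) are assumptions for illustration, not the actual Highland internals: resolving A's target walks to the end of the chain and re-links A straight to C, so B becomes garbage-collectable.

```javascript
// Toy streams that only model the redirect relationship.
function Stream(name) {
    this.name = name;
    this.redirectedTo = null;
}

Stream.prototype.redirect = function (target) {
    this.redirectedTo = target;
};

// Follow the chain to its end, then trim unused intermediaries by
// pointing this stream directly at the final target.
Stream.prototype.resolve = function () {
    var s = this;
    while (s.redirectedTo) {
        s = s.redirectedTo;
    }
    this.redirectedTo = (s === this) ? null : s;
    return s;
};

var a = new Stream('A');
var b = new Stream('B');
var c = new Stream('C');
a.redirect(b);
b.redirect(c);

console.log(a.resolve().name);      // 'C'
console.log(a.redirectedTo.name);   // 'C' -- B has been trimmed out
```

Without this trimming, a long chain of next() redirects keeps every intermediate stream reachable, which is one plausible source of the memory growth mentioned above.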

vqvu commented Dec 8, 2014

The stack overflow is not supposed to happen, so it is certainly fixable.

Not sure about performance, but I'll look at it.

How do you intend to run it over long periods of time? Do you have some existing code for this?

vqvu commented Dec 9, 2014

I cut sync data processing latency by about 54%. It's not back to where we were, but it's better.

Turns out

var fn = s.foo.bind(s);

is much slower than

var fn = function () { s.foo(); }

for critical path functions.
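A quick way to see this for yourself is a micro-benchmark along these lines. Timings vary by engine and version, so this is a sketch rather than a rigorous benchmark, but on the V8 builds of the time the closure form was reportedly much faster on hot paths.

```javascript
// Compare Function.prototype.bind against a plain closure wrapper for a
// hot-path method call.
var s = {
    count: 0,
    foo: function () { this.count++; }
};

var bound = s.foo.bind(s);              // the slow form
var wrapped = function () { s.foo(); }; // the fast form

function time(fn, iterations) {
    var start = Date.now();
    for (var i = 0; i < iterations; i++) {
        fn();
    }
    return Date.now() - start;
}

var N = 1e6;
console.log('bind:    ' + time(bound, N) + 'ms');
console.log('closure: ' + time(wrapped, N) + 'ms');
```

Both wrappers are semantically equivalent here (each runs s.foo with s as the receiver), so the substitution is safe for critical-path functions.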

vqvu commented Dec 9, 2014

Ok, the ArrayStream case is at rough performance parity with underscore. The sync GeneratorStream case is also better.

I know why it stack overflows. It's because right now, redirection looks a lot like consume, and we can't chain infinite consumes. It does this to preserve the redirection chain, so that things like callbacks registered with on before the redirect still work after it.

Is there a good reason to support infinite redirects in this way, considering we don't support infinite consume? Do you have a use-case in mind?

I think it's possible to fix; I'm just wondering if it's worth the effort...

jeromew commented Dec 10, 2014

@vqvu congrats on the speed improvement. Regarding the stack usage, the main issue I see is that you can't presume how people are going to use the library, and you don't want it to break unexpectedly on them.

Of course, that's easier said than done. I would like highland to be particularly resistant to huge numbers of values and consumes. The way I see it, consumes could even be considered as streams with a way to reorganize the pipeline (inject transforms at a specific location, etc.).

In this sense, consume should probably not be implemented as a call stack descent. I must admit that I have not done my homework on the core engine of highland (you are probably the most knowledgeable about it, along with @caolan), so I have no idea how difficult, or even possible, it would be to have a flattened implementation with minimal call stack impact, whatever the number of transforms.

vqvu commented Dec 10, 2014

Agreed that it would be nice for consume to not be implemented recursively. In principle it is possible, since recursion can always be transformed into iteration, but right now I don't see an obvious way to do it without inserting setImmediate everywhere. I'll have to think about this some more.
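One textbook way to flatten recursion without sprinkling setImmediate everywhere is a trampoline: each step returns a thunk instead of calling the next step directly, and a plain loop bounces through the thunks in constant stack space. This is a general sketch, not a proposal for Highland's actual design:

```javascript
// Wrap a thunk-returning function so that it runs iteratively.
function trampoline(fn) {
    return function () {
        var result = fn.apply(this, arguments);
        while (typeof result === 'function') {
            result = result();   // each bounce runs in constant stack space
        }
        return result;
    };
}

// A deeply "recursive" counter that would overflow the stack if each step
// called the next one directly.
var countDown = trampoline(function step(n, acc) {
    if (n === 0) {
        return acc;
    }
    return function () {         // defer the next step as a thunk
        return step(n - 1, acc + 1);
    };
});

console.log(countDown(1000000, 0)); // 1000000, with no stack overflow
```

The trade-off is that every step in the hot path allocates a closure, so whether this beats an occasional setImmediate depends on the workload.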

jeromew commented Dec 11, 2014

Maybe you could squash your commits and put them on a '3.0.0' branch?
We will probably still need some time before this gets out, and it will be easier not to freeze the master branch if we try to keep 3.0.0 on par with master. Maybe others will be able to contribute to the new engine before it gets @caolan's approval.

From a very rough estimate, I feel like this will need 2 or 3 round trips of review with @caolan before it is ready to be merged.

Maybe it is a good occasion to improve the benchmarks and to create degenerate tests that hit the engine where it hurts.

vqvu commented Dec 11, 2014

Do you mean create a new branch on caolan/highland and put these commits there? If so, I can do that.


jeromew commented Dec 12, 2014

@vqvu yes, that is what I meant. It will be easier to collaborate on this if a branch is created.

@vqvu vqvu mentioned this pull request Dec 13, 2014
vqvu commented Dec 13, 2014

I pushed a 3.0.0 branch. Not all commits were squashed, because I wanted to keep the development history around. A single big commit is not useful to me.

Closing this PR in favor of #179, since it's no longer needed.

@vqvu vqvu closed this Dec 13, 2014
@vqvu vqvu deleted the fix-consume-and-redirect branch November 18, 2015 05:55
5 participants