
Highland v3.0.0 #179

Open
22 of 34 tasks
vqvu opened this issue Dec 13, 2014 · 67 comments

@vqvu
Collaborator

vqvu commented Dec 13, 2014

This issue tracks any changes we want for v3.0.0. See the 3.0.0 branch.

The biggest change for now is a reimplementation of the Highland engine. See the original PR at #175.

Breaking changes

Before release

Nice to have but can be delayed until after release

  • Port over sync optimizations.
  • Port over iterative consume and redirect.
  • Check for memory leaks.
  • Create some benchmarks.
@vqvu
Collaborator Author

vqvu commented Dec 13, 2014

Ref #176.

@jeromew
Collaborator

jeromew commented Dec 20, 2014

@vqvu thanks for creating the branch. I haven't had time to dig into it before the holiday season. That could be one of my new year's resolutions ;-) Happy holidays!

@LewisJEllis
Collaborator

@vqvu how are iterative resume/redirects going? Is there a way I can be helpful? I was contemplating features/changes for 3.0.0 a bit, but @jeromew wisely suggested that I don't get carried away while the engine is still up in the air.

Added a 3.x potential label to track related items.

@jeromew
Collaborator

jeromew commented Jan 2, 2015

@LewisJEllis I am not sure @vqvu has had time to look into this over the holiday season. I think that the changes needed to flatten the stack are still up for discovery. The last time @vqvu mentioned it, he said he had no idea how convoluted this could get.

I unfortunately could not find time to look deeply at the new branch. If you feel like diving into it, that would be great, and @vqvu will certainly appreciate being able to discuss potential solutions with someone.

I was planning on working on the merge issue #133. @vqvu has rewritten merge in 3.0.0 but, from my understanding, this has nothing to do with the new engine. So I would like to take his new implementation and adapt it to have it meet #133's requirements.

This way we will be able to remove this from the 3.0.0 branch.

After this, we should make sure that 3.0.0 and 2.x are in sync feature-wise (I have seen a few commits that need to be checked).

If possible, we should have strictly the same tests between 2.x and 3.0.0.

@vqvu
Collaborator Author

vqvu commented Jan 3, 2015

I haven't had time to do any work on Highland for the last few weeks...

@LewisJEllis I'm happy to discuss ideas on how iterative resume/redirect could be done. We also need some sort of "memory leak checker" and better performance testing. Also the back-prop stuff (this is pretty useful behavior that's not at all implemented) and pipe passing errors to Highland streams.

@jeromew
Collaborator

jeromew commented Jan 3, 2015

I modified the PR for merge on #133; if @vqvu and @LewisJEllis have time to review it, we could remove the merge refactoring from 3.0.0.

@jeromew
Collaborator

jeromew commented Jan 3, 2015

@LewisJEllis would it make sense to backport ESLint from 3.0.0 to 2.x? I understand that caolan was OK with the change, and I don't see a reason to wait for 3.0.0 on this.

@LewisJEllis
Collaborator

Yes, definitely. I originally did only 3.0.0 because I didn't want to make all the silly little changes twice, and because I wasn't sure how much would be done on 2.x in the meantime. It's now evident that 2.x needs it too, and a couple regexes did most of the work anyway, so I'll go ahead and backport it. That should also help avoid silly conflicts when 3.0.0 lands.

Regarding keeping 2.x and 3.0.0 in sync as mentioned in #189 - most things being done on 2.x in the meantime shouldn't cause any major conflicts, so I think an occasional rebase should work just fine. I'll try rebasing 3.0.0 onto master after I backport the linting stuff.

I'm not too familiar with merge or the new engine yet, but I'll get up to speed soon so I can take a look at #133 and try to think about the iterative stuff.

@jeromew
Collaborator

jeromew commented Jan 5, 2015

ok thanks. We'll have to agree with @vqvu on the 'rebase or not rebase' question. I'm all for removing

  • eslint
  • merge
  • other simple things

from the 3.0.0 potential conflict footprint. I can't say for sure, but I think that it will take some time before we get approval from caolan on 3.0.0, so we'd better not freeze 2.x in the meantime, keep an easy workflow for merging/rebasing, and later have a clean merge from 3.0.0 to master.

@jeromew jeromew closed this as completed Jan 5, 2015
@vqvu
Collaborator Author

vqvu commented Jan 5, 2015

@jeromew Why was this issue closed?

As for "rebase vs merge", I suppose it doesn't really matter too much if we don't expect non-collaborators to contribute. The only issue is that if we start getting a large-ish number of commits on 3.0, it'll get more annoying and error-prone to do rebases. That said, I haven't actually tried to do the rebase yet, so maybe this is a non-issue.

Agreed that 2.x should still be the branch that gets all back-compat improvements for now.

@jeromew
Collaborator

jeromew commented Jan 5, 2015

Oops sorry my mistake. I thought I was closing the issue regarding ESLint.

@jeromew jeromew reopened this Jan 5, 2015
@jeromew
Collaborator

jeromew commented Jan 5, 2015

@vqvu maybe we could:

  • backport to 2.x what can be backported
  • make a new 2.x release
  • merge or rebase or re-create the branch

It is always a pain when there are 2 active branches, but we did not have a choice here given the surface area of the refactoring.

Do you see other things that can be backported without breaking back-compat on 2.x? (eslint and merge are done)

@vqvu
Collaborator Author

vqvu commented Jan 5, 2015

throttle, debounce, and latest were rewritten to purely use pull/consume because they depended on private methods and did weird things like override write and pause.

There's also a few uses of setImmediate in the tests that were changed to _.setImmediate to get things to work in a browser.

I think these can all be cleanly backported to 2.x.

@vqvu
Collaborator Author

vqvu commented Jan 5, 2015

Same thing with pipeline, I think.

@jeromew
Collaborator

jeromew commented Jan 8, 2015

@vqvu I backported throttle, debounce and latest + added one missing _. before setImmediate in test/tests.

I tried to backport pipeline, but the tests freeze with the new impl. I did not try to understand why it freezes; it's maybe related to the 2.x engine. Do you see a reason why it would freeze?

@jeromew
Collaborator

jeromew commented Jan 8, 2015

@vqvu, @LewisJEllis - I propose that we bump the master version to 2.3.0 and then rebase/merge/recreate the 3.0.0 branch. What do you think?

@LewisJEllis
Collaborator

Sounds good, since most/all of the backporting is done, and 2.x needs a bump anyway.

As for whether to rebase or merge - the difference isn't very significant, but I think it comes down to how we want to land 3.0.0. If we want to have a single giant pull request to merge into master, I think rebase should keep that cleaner (although merge probably also works). If we want to eventually just swap the 3.0.0 branch in for master, merge will theoretically keep more information in the history. I've always used rebase in this sort of situation, so I can't comment as well on the merge option.

@vqvu
Collaborator Author

vqvu commented Jan 8, 2015

Agreed with the 2.x version bump. We can rebase this time to get rid of the backported transforms. I just didn't want to have to rebase every time we add new features to the 2.x branch.

I think I know why pipeline doesn't work... probably some difference between the old _send and the new _send. It doesn't matter too much, though; we don't have to backport it.

@jeromew
Collaborator

jeromew commented Jan 8, 2015

@caolan, for info, I just bumped the highland npm version to 2.3.0

@caolan
Owner

caolan commented Jan 8, 2015

thanks @jeromew, I'm hoping to get some time to look at the engine changes in more depth soon too

@jeromew
Collaborator

jeromew commented Jan 23, 2015

@vqvu do you want to try to rebase 3.0.0? Otherwise I might try to do it next week.

@vqvu
Collaborator Author

vqvu commented Jan 23, 2015

Yes, I will do it this weekend.

@vqvu
Collaborator Author

vqvu commented Jan 25, 2015

Done. Slightly painful because of the eslint changes, but it wasn't too bad.

@jeromew
Collaborator

jeromew commented Jan 27, 2015

@vqvu thanks. I started digging into it a little, trying to see if I could find some performance boosters (only using bench.js).

There are 3 things that I tried at this stage:

  • using a derivation of https://github.com/herby/queue to try and have a faster queue. This did not change much on bench.js. hqueue could maybe improve things on a benchmark with lots of async enqueue/dequeue, since it tries to re-use memory allocations, but in our case in bench.js it does not change much.
  • removing the try/catch in map. This improves the 1M benchmark. From what I understand, underscore and lodash do not have a per-call try/catch (I mean around each f() call). Not sure what we can/want to do about this.
  • inlining the ConsumeStream object. I did not reach an all-tests-passing status because I did this rapidly and was not sure about the teardown, but basically I removed the ConsumeStream inheritance by calling new Stream(function () { self.pull(pullCb); }) directly and defining pullCb inside of consume. On my setup this gives a perf boost of about 10%.

It seems that in the 2.x version, the 'sync' path is even shorter, which is probably why it seems to get a better bench than lodash on the 1M test.

Do you have the same results? (Just to make sure that we can compare things on our respective setups.)
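To make the comparison concrete, a minimal timing harness along these lines can serve as a shared baseline (a sketch only; the actual bench.js setup and numbers may differ):

```javascript
// Minimal timing harness in the spirit of bench.js (assumed shape; the
// real bench.js may differ): time a plain-loop 1M-element map as a
// baseline against which stream implementations can be compared.
function benchSyncMap(n) {
    var input = new Array(n);
    for (var i = 0; i < n; i++) {
        input[i] = i;
    }

    var start = process.hrtime();
    var out = new Array(n);
    for (var j = 0; j < n; j++) {
        out[j] = input[j] * 2;
    }
    var diff = process.hrtime(start);

    // hrtime returns [seconds, nanoseconds]; convert to milliseconds.
    return { ms: diff[0] * 1e3 + diff[1] / 1e6, result: out };
}

var run = benchSyncMap(1e6);
console.log('1M map baseline: ' + run.ms.toFixed(1) + 'ms');
```

Running the same harness on both machines would at least tell us whether the setups themselves are comparable before we compare engine numbers.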

@svozza
Collaborator

svozza commented Jan 28, 2015

For the try-catch problem - assuming we want to keep them - would using the trick in the Bluebird performance optimisations document help?

https://github.com/petkaantonov/bluebird/wiki/Optimization-killers#2-unsupported-syntax

Some constructs are flat out not supported in the optimizing compiler and using such syntax will make the containing function unoptimizable.

Currently not optimizable:

  • ...
  • Functions that contain a try-catch statement
  • Functions that contain a try-finally statement
  • ...

Some of these statements cannot be avoided in production code such as try-finally and try-catch. To use such statements with minimal impact, they must be isolated to a minimal function so that the main code is not affected:

var errorObject = {value: null};
function tryCatch(fn, ctx, args) {
    try {
        return fn.apply(ctx, args);
    }
    catch(e) {
        errorObject.value = e;
        return errorObject;
    }
}

var result = tryCatch(mightThrow, void 0, [1,2,3]);
//Unambiguously tells whether the call threw
if(result === errorObject) {
    var error = errorObject.value;
}
else {
    //result is the returned value
}

Depending on the performance improvement it could be overkill.

@jeromew
Collaborator

jeromew commented Jan 28, 2015

@svozza the map loop in the 1M case indeed seems to be helped by extracting a function like

function tryCatch(fn, x) {
    try {
        return fn(x);
    }
    catch (e) {
        return new StreamError(e);
    }
}

and then doing in the loop

var fnRes = tryCatch(f, x);
if (_._isStreamError(fnRes)) {
    push(fnRes);
} else {
    push(null, fnRes);
}

With this modification plus the one inlining ConsumeStream, the 1M bench.js is on par between lodash and highland on my machine.

The try/catch inside the highland loop has a penalty.
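The penalty can be checked directly with a micro-benchmark comparing an inline try/catch against the extracted helper (illustrative only; the numbers depend heavily on the V8 version, and modern engines optimize try/catch far better than the ones discussed here):

```javascript
// Compare an inline try/catch in a hot loop against the extracted
// tryCatch helper. On old V8, functions containing try/catch were not
// optimizable, so the extracted form could be noticeably faster.
function inlineLoop(arr) {
    var out = new Array(arr.length);
    for (var i = 0; i < arr.length; i++) {
        try {
            out[i] = arr[i] * 2;
        } catch (e) {
            out[i] = e;
        }
    }
    return out;
}

// The try/catch is isolated in a tiny function, keeping the loop clean.
function tryCatch(fn, x) {
    try {
        return fn(x);
    } catch (e) {
        return e;
    }
}

function extractedLoop(arr) {
    var double = function (x) { return x * 2; };
    var out = new Array(arr.length);
    for (var i = 0; i < arr.length; i++) {
        out[i] = tryCatch(double, arr[i]);
    }
    return out;
}

var data = [];
for (var i = 0; i < 1e6; i++) {
    data.push(i);
}

console.time('inline try/catch');
inlineLoop(data);
console.timeEnd('inline try/catch');

console.time('extracted tryCatch');
extractedLoop(data);
console.timeEnd('extracted tryCatch');
```

Both loops compute the same result; only the placement of the try/catch differs.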

@jeromew
Collaborator

jeromew commented Jan 28, 2015

@vqvu do you have a test somewhere regarding the stack exhaustion problem?

@svozza
Collaborator

svozza commented Jul 28, 2015

@LewisJEllis Do you mind if I port your changes from #166 to the 3.0.0 branch?

@LewisJEllis
Collaborator

I did that in #191 but I think something about #240 made it no longer needed after some other changes. I'm not too fresh on the details though, so feel free to take another look. If it does seem necessary, maybe we can just merge #191?

@svozza
Collaborator

svozza commented Jul 28, 2015

Well, I think the reason we didn't do this in 2.x was that it would be a breaking change, but I think we still want to do it. #240 was related because through had the same behaviour as pipe even though it works exclusively with Highland streams, so we considered its behaviour a bug.

@LewisJEllis
Collaborator

Right, okay. I do recall the decision not to merge it on 2.x, but I'm otherwise not too sure what the situation is. I left a comment on #191.

@vqvu
Collaborator Author

vqvu commented Jul 28, 2015

@svozza Why is #191 necessary when you can just use through and get the same result?

I see a checkbox in the original post about merging in #191 (which I added), but I wasn't thinking about #240 at the time.

Edit: To clarify. My thinking was that pipe would be used to mimic the behavior of node streams' pipe, while through would be used when you want a Highland stream back. We can even make through short-circuit the double-pipe when the piped-to stream is already a Highland stream. But I didn't see a reason to have pipe behave slightly differently depending on the type of its input. That's why I originally closed #191.

@svozza
Collaborator

svozza commented Jul 28, 2015

Well, I'm not pushed about it, but it's hardly inconceivable that a user might not read the docs and pipe a Highland stream into another one, and if the price of preventing a strange crash for one of those users is just a simple if statement in pipe, then I don't see why we wouldn't do it.

@vqvu
Collaborator Author

vqvu commented Jul 28, 2015

The standard behavior of pipe is to throw if you don't have an error handler attached though, so someone not reading the docs would probably assume the standard behavior, right? How about adding a statement in the pipe docs pointing to through instead?

@svozza
Collaborator

svozza commented Jul 28, 2015

Yep, sounds good to me.

@vqvu
Collaborator Author

vqvu commented Jul 28, 2015

I'll make a PR.

@LewisJEllis
Collaborator

I'm interested in spending some time soon to help push 3.0 out; @vqvu and @quarterto can you give some guidance on what's left to do that I can help with? The todo at the top seems pretty up-to-date, but I haven't been following closely enough to know exactly what each task entails. Happy to help with docs, testing, and any low- to medium-hanging fruit.

Also, is breaking things down into a stream engine core + .useable modules part of the scope of 3.0?

@vqvu
Collaborator Author

vqvu commented Sep 2, 2015

We have:

  1. Tests for making sure back propagation works for all transforms. Roughly speaking, something along the lines of

stream.onDestroy(cb)
    .transform(...)
    .destroy();

// assert that cb was called.

for all transforms. This is more of a problem for transforms (listed in the todo) that return a stream created with this.create and not this.consume. We need a this.createDownstream (or an equivalent name) that does a this.create and binds an onDestroy handler like consume does.

  2. Tests to make sure that a fork that is destroyed does not contribute backpressure. That is, assert that the following no longer blocks indefinitely.

var s = _([1, 2, 3]);
var s1 = s.fork().take(2);
var s2 = s.fork();

s1.each(_.log);
s2.each(_.log);
// => 1
// => 1
// => 2
// => 2
// => 3

  3. Docs for use.

  4. A highland-2 module that restores the behavior of the transforms that changed (i.e., undoes the renames and reargs that we made). I'm not sure exactly how to expose this. Ideally as a separate npm module, but where would we put the repo?

Also, is breaking things down into a stream engine core + .useable modules part of the scope of 3.0?

I'd like to get this done for 3.0, but we'd want to do this last, since the code movement would make it much harder to merge in the PRs that were applied to 2.x.

@jeromew
Collaborator

jeromew commented Nov 13, 2015

@vqvu I saw your mention of 3.0 here - #388 (comment) - and realize that fixing 2.x engine bugs is a bummer, since many things are automatically fixed in the work you did on 3.x.

I think that the engine fixes + the use work that you did with @quarterto on #337 are big improvements to the library that we should not hold very much longer.

The main drawback as I understand it would be on performance, where we would lose some throughput (I don't have numbers on the last version), but the fixes in the engine plus the extensibility would help the community, and @vqvu has already shown that there is a way to unroll his engine to gain more speed at the expense of code readability.

Regarding speed, I noted one of the remarks of @caolan :

I'm a tentative +1 (since I've not had time to try this out on some real projects yet), but the code looks good and I trust your (and the other collaborators') judgement regarding a release. I'm interested in the ramda integration plans, since that could give us a complementary tool for sync use cases where performance is critical.

What is the state of affairs regarding integration of highland with other performance-oriented sync libraries (lodash?)? Do we already have an elegant integration with these for sync cases where speed is a requirement?

If we strip down the functions, I would rather call them highland-core and highland-2, but that is just a naming preference.

highland could be a version of highland-core pre-packaged with basic transforms.

@vqvu how can we help towards the release of 3.0 ?

@vqvu
Collaborator Author

vqvu commented Nov 13, 2015

I think that the engine fixes + the use work that you did with @quarterto on #337 are big improvements to the library that we should not hold very much longer.

Agreed. The hold up is more me (and I suspect the rest of the collaborators too) not having time to work on this rather than any real blocking issue. @quarterto had a good suggestion to release a 3.0.0-beta1 to npm while we work on finalizing the release, and I think that's a good idea.

What is the state-of-affairs is regarding integration of highland with other sync libraries that are performance oriented (lodash?). Do we already have an elegant integration these for sync cases when speed is a requirement ?

We currently only have integration with transducers, and while I don't have hard numbers, I suspect using transducers for sync transforms is faster than using highland directly. Transducers could very well be our answer to the sync problem.

I'm not sure about integrations with libraries like lodash. I don't see a way to integrate with them beyond this little snippet.

stream.batch(some_reasonable_number)
    .flatMap(function (array) {
        // use lodash here to turn array into result.
        return _(result);
    });

At this point, sync performance isn't that bad compared to 2.x, so I'm not too concerned about it. My opinion is that while we should of course care about sync performance, the utility of Highland is more about sequencing async computations than raw throughput.

Ramda integration is kind of stalled at the moment, but I think it's more about allowing people to leverage their library of transforms, and especially the FP programming style that comes with it, rather than performance. The integrations that were being proposed did not have to do with performance. As it relates to the 3.0.0 release, I can't think of any further breaking changes that we might need to make to support ramda integration, so I think we're safe to proceed here. Unless someone thinks otherwise?

If we strip down the functions, i would rather call it highland-core and highland-2 but that is just a naming preference.

highland could be a version of highland-core pre-packaged with basic transforms.

The highland-2 module would be more of an "undo the naming/arg-order changes that we did to the transforms" package. We'd need another one for "non-basic" transforms.

Here's a question for you and the rest of the collaborators. How do we want to handle the multi-module situation? Do we simply package the modules with highland proper? Or is it worth it to create a highland organization to house the different modules in different repos? Bundling everything up would be the simplest, but I think that would defeat the purpose of splitting up the transforms into basic and non-basic packages. If the user is going to download them all anyway, they might as well be allowed to use them by default.

@vqvu how can we help towards the release of 3.0 ?

I think my previous comment here is a good list of the things still pending. They're all boring documentation/testing tasks, which explains why they've been stalled for so long.

Also, if you or someone else with npm access can release the current 3.0.0 branch as 3.0.0-beta1, that would be great. The 3.0.0 branch is a little behind master, but it's only lacking the dependency updates we've made within the past week. If possible, it'd also be nice to add a release:major-beta so we have it for future uses.

I'd like to refactor index.js and test.js into multiple files so that they're easier to manage, but that's quite a bit of work and can be done after the release.

@svozza
Collaborator

svozza commented Nov 18, 2015

What exactly is involved in this task:

Create a test like noValueOnError to test the backpropagation behavior

I should be able to look at it tomorrow.

@vqvu
Collaborator Author

vqvu commented Nov 18, 2015

Basically, given an infinite stream s, if we do

var i = 0;
var s = _(function generator(push, next) {
    push(null, i++);
    next();
});

s.onDestroy(function destructor1() {
})
.through(_.someTransform(...))
.onDestroy(function destructor2() {
})
.take(1)
.resume();

destructor2 and destructor1 should both be called at some point. Furthermore, generator should never be called after destructor2 has been called.

Many transforms get this behavior for free when they use consume. However, some (like sequence) actually return a new generator stream. These transforms should be using this.create to create the new streams, and create should take care of setting up the appropriate hooks so that an onDestroy on the result stream gets propagated to the parent.
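The propagation rule can be modeled with a toy stream type (purely illustrative; createDownstream and all of these internals are assumed names for the sake of the sketch, not Highland's actual implementation):

```javascript
// Toy model only -- not Highland's internals. It shows the rule being
// described: a stream created for a downstream result forwards destroy()
// to its parent, so the source's destructors also run.
function Stream() {
    this._destructors = [];
    this._destroyed = false;
}

Stream.prototype.onDestroy = function (cb) {
    this._destructors.push(cb);
    return this;
};

Stream.prototype.destroy = function () {
    if (this._destroyed) { return; }
    this._destroyed = true;
    this._destructors.forEach(function (cb) { cb(); });
};

// Hypothetical createDownstream: like create, but it also binds an
// onDestroy hook back to the parent (the behavior consume gets for free).
Stream.prototype.createDownstream = function () {
    var parent = this;
    var child = new Stream();
    child.onDestroy(function () { parent.destroy(); });
    return child;
};

var calls = [];
var source = new Stream()
    .onDestroy(function () { calls.push('destructor1'); });
var result = source.createDownstream()
    .onDestroy(function () { calls.push('destructor2'); });

result.destroy();
console.log(calls); // both destructors have run
```

A generic backprop test would then just assert, per transform, that destroying the result stream eventually runs the destructors registered upstream.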

@svozza
Collaborator

svozza commented Nov 19, 2015

Cool. I'll have a go at this tomorrow evening. Btw, I think the idea of creating an organisation and having the various modules in different repos is a great idea.

@apaleslimghost
Collaborator

I did register the org highland-js a while back...

@jeromew
Collaborator

jeromew commented Nov 19, 2015

@caolan would it make sense for you to migrate highland to a highlandjs org? I don't know if you have followed the work on extensibility that was done by @quarterto and @vqvu in #337, but a lot of modules could exist in this framework.

@svozza
Collaborator

svozza commented Nov 27, 2015

So, I was having a look at this:

Test that destroyed fork does not contribute backpressure

And it looks like the destroyed fork still contributes backpressure. Not quite sure where to look to start fixing it though.

@vqvu
Collaborator Author

vqvu commented Nov 28, 2015

Backpressure for forks is managed by StreamMultiplexer. New forks are created using StreamMultiplexer#newStream. Destroyed forks are removed via StreamMultiplexer#removeConsumer.

When forks want data, they call StreamMultiplexer#pull with their id (an integer assigned at creation time) and a callback. This calls StreamMultiplexer#_resume. _resume will check to see if all forks have requested data. Once this is true, it calls Stream#pull on the source and distributes the result to all forks.

Forks may be added or removed while the Stream#pull call resolves. If they are added, the result will not be distributed. Instead, it's saved in this._cached_value.

The problem here is that the multiplexer uses this._consumers[id] !== undefined to track which streams are currently registered (link). However, the this._consumers object gets reset every time a value is emitted, which wipes away that information.

Fix in #411.
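The shape of such a fix can be sketched as follows (assumed names and structure only; see #411 for the actual patch): keep the set of registered forks separate from the per-value map of pending pulls, so that resetting the pending map on each emit cannot lose registration information.

```javascript
// Sketch of the idea, not the real StreamMultiplexer: registration state
// survives emits, while pending-pull state is reset per value.
function StreamMultiplexer() {
    this._registered = {}; // fork id -> true; survives emits
    this._pending = {};    // fork id -> callback; reset on each emit
    this._nextId = 0;
}

StreamMultiplexer.prototype.newStream = function () {
    var id = this._nextId++;
    this._registered[id] = true;
    return id;
};

// A destroyed fork is forgotten entirely, so it can no longer hold
// back the source (i.e., it contributes no backpressure).
StreamMultiplexer.prototype.removeConsumer = function (id) {
    delete this._registered[id];
    delete this._pending[id];
};

StreamMultiplexer.prototype.pull = function (id, cb) {
    this._pending[id] = cb;
};

// The source is pulled only once every *registered* fork has asked for
// data; unregistered forks are simply not consulted.
StreamMultiplexer.prototype._everyonePulled = function () {
    var self = this;
    return Object.keys(this._registered).every(function (id) {
        return self._pending[id] !== undefined;
    });
};
```

With this split, removing a fork via removeConsumer immediately unblocks _everyonePulled, which matches the desired behavior in the fork/take(2) example above.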

@svozza
Collaborator

svozza commented Nov 28, 2015

Makes sense. Thanks for the explanation!
