
Highland v3.0.0 #179

Open
22 of 34 tasks
vqvu opened this issue Dec 13, 2014 · 67 comments

@vqvu
Collaborator

vqvu commented Dec 13, 2014

This issue tracks any changes we want for v3.0.0. See the 3.0.0 branch.

The biggest change for now is a reimplementation of the Highland engine. See the original PR at #175.

Breaking changes

Before release

Nice to have but can be delayed until after release

  • Port over sync optimizations.
  • Port over iterative consume and redirect.
  • Check for memory leaks.
  • Create some benchmarks.
@vqvu
Collaborator Author

vqvu commented Dec 13, 2014

Ref #176.

@jeromew
Collaborator

jeromew commented Dec 20, 2014

@vqvu thanks for creating the branch. I haven't had time to dig into it before the holiday season. That could be one of my new year's resolutions ;-) Happy holidays!

@LewisJEllis
Collaborator

@vqvu how are iterative resume/redirects going? Is there a way I can be helpful? I was contemplating features/changes for 3.0.0 a bit, but @jeromew wisely suggested that I don't get carried away while the engine is still up in the air.

Added a 3.x potential label to track related items.

@jeromew
Collaborator

jeromew commented Jan 2, 2015

@LewisJEllis I am not sure @vqvu has had time to look into this over the holiday season. I think that the changes needed to flatten the stack are still up for discovery. The last time @vqvu mentioned it, he said he had no idea how convoluted this could get.

I unfortunately could not find time to look deeply at the new branch. If you feel like diving into it, that would be great, and @vqvu will certainly appreciate being able to discuss potential solutions with someone.

I was planning on working on the merge issue #133. @vqvu has rewritten merge in 3.0.0 but, from my understanding, this has nothing to do with the new engine. So I would like to take his new implementation and adapt it to have it meet #133's requirements.

This way we will be able to remove this from the 3.0.0 branch.

After this, we should make sure that 3.0.0 and 2.x are in sync feature-wise (I have seen a few commits that need to be checked).

If possible, we should have strictly the same tests between 2.x and 3.0.0.

@vqvu
Collaborator Author

vqvu commented Jan 3, 2015

I haven't had time to do any work on Highland for the last few weeks...

@LewisJEllis I'm happy to discuss ideas on how iterative resume/redirect could be done. We also need some sort of "memory leak checker" and better performance testing. Also the back-prop stuff (this is pretty useful behavior that's not at all implemented) and pipe passing errors to Highland streams.

@jeromew
Collaborator

jeromew commented Jan 3, 2015

I modified the PR for merge on #133; if @vqvu and @LewisJEllis have time to review it, we could remove the merge refactoring from 3.0.0.

@jeromew
Collaborator

jeromew commented Jan 3, 2015

@LewisJEllis would it make sense to backport ESLint from 3.0.0 to 2.x? I understand that caolan was OK with the change, and I don't see a reason to wait for 3.0.0 on this.

@LewisJEllis
Collaborator

Yes, definitely. I originally did only 3.0.0 because I didn't want to make all the silly little changes twice, and because I wasn't sure how much would be done on 2.x in the meantime. It's now evident that 2.x needs it too, and a couple regexes did most of the work anyway, so I'll go ahead and backport it. That should also help avoid silly conflicts when 3.0.0 lands.

Regarding keeping 2.x and 3.0.0 in sync as mentioned in #189 - most things being done on 2.x in the meantime shouldn't cause any major conflicts, so I think an occasional rebase should work just fine. I'll try rebasing 3.0.0 onto master after I backport the linting stuff.

I'm not too familiar with merge or the new engine yet, but I'll get up to speed soon so I can take a look at #133 and try to think about the iterative stuff.

@jeromew
Collaborator

jeromew commented Jan 5, 2015

ok thanks. We'll have to agree with @vqvu on the 'rebase or not rebase' question. I'm all for removing

  • eslint
  • merge
  • other simple things

from the 3.0.0 potential conflict footprint. I can't say for sure, but I think that it will take some time before we get approval from caolan on 3.0.0, so we'd better not freeze 2.x in the meantime, keep an easy workflow for merging/rebasing, and later have a clean merge from 3.0.0 to master.

@jeromew jeromew closed this as completed Jan 5, 2015
@vqvu
Collaborator Author

vqvu commented Jan 5, 2015

@jeromew Why was this issue closed?

As for "rebase vs merge", I suppose it doesn't really matter too much if we don't expect non-collaborators to contribute. The only issue is that if we start getting a large-ish number of commits on 3.0, it'll get more annoying and error-prone to do rebases. That said, I haven't actually tried to do the rebase yet, so maybe this is a non-issue.

Agreed that 2.x should still be the branch that gets all back-compat improvements for now.

@jeromew
Collaborator

jeromew commented Jan 5, 2015

Oops sorry my mistake. I thought I was closing the issue regarding ESLint.

@jeromew jeromew reopened this Jan 5, 2015
@jeromew
Collaborator

jeromew commented Jan 5, 2015

@vqvu maybe we could:

  • backport to 2.x what can be backported
  • make a new 2.x release
  • merge or rebase or re-create the branch

It is always a pain when there are 2 active branches, but we did not have a choice here given the surface area of the refactoring.

Do you see other things that can be backported without breaking back-compat on 2.x? (eslint and merge are done)

@vqvu
Collaborator Author

vqvu commented Jan 5, 2015

throttle, debounce, and latest were rewritten to purely use pull/consume because they depended on private methods and did weird things like override write and pause.

There's also a few uses of setImmediate in the tests that were changed to _.setImmediate to get things to work in a browser.

I think these can all be cleanly backported to 2.x.

@vqvu
Collaborator Author

vqvu commented Jan 5, 2015

Same thing with pipeline, I think.

@jeromew
Collaborator

jeromew commented Jan 8, 2015

@vqvu I backported throttle, debounce and latest + added one missing _. before setImmediate in test/tests.

I tried to backport pipeline, but the tests freeze with the new impl. I did not try to understand why it freezes; it's maybe related to the 2.x engine. Do you see a reason why it would freeze?

@jeromew
Collaborator

jeromew commented Jan 8, 2015

@vqvu, @LewisJEllis - I propose that we bump the master version to 2.3.0 and then rebase/merge/recreate the 3.0.0 branch. What do you think?

@LewisJEllis
Collaborator

Sounds good, since most/all of the backporting is done, and 2.x needs a bump anyway.

As for whether to rebase or merge - the difference isn't very significant, but I think it comes down to how we want to land 3.0.0. If we want to have a single giant pull request to merge into master, I think rebase should keep that cleaner (although merge probably also works). If we want to eventually just swap the 3.0.0 branch in for master, merge will theoretically keep more information in the history. I've always used rebase in this sort of situation, so I can't comment as well on the merge option.

@vqvu
Collaborator Author

vqvu commented Jan 8, 2015

Agreed with the 2.x version bump. We can rebase this time to get rid of the backported transforms. I just didn't want to have to rebase every time we add new features to the 2.x branch.

I think I know why pipeline doesn't work... probably some difference between the old _send and the new _send. It doesn't matter too much, though; we don't have to backport it.

@jeromew
Collaborator

jeromew commented Jan 8, 2015

@caolan, for info, I just bumped the highland npm version to 2.3.0

@caolan
Owner

caolan commented Jan 8, 2015

thanks @jeromew, I'm hoping to get some time to look at the engine changes in more depth soon too

@jeromew
Collaborator

jeromew commented Jan 23, 2015

@vqvu do you want to try to rebase 3.0.0? Otherwise I might try to do it next week.

@vqvu
Collaborator Author

vqvu commented Jan 23, 2015

Yes, I will do it this weekend.

@vqvu
Collaborator Author

vqvu commented Jan 25, 2015

Done. Slightly painful because of the eslint changes, but it wasn't too bad.

@jeromew
Collaborator

jeromew commented Jan 27, 2015

@vqvu thanks. I started digging into it a little, trying to see if I could find some performance boosters (only using bench.js).

There are 3 things that I tried at this stage:

  • using a derivation of https://github.com/herby/queue to try and have a faster queue. This did not change much on bench.js. hqueue could maybe improve things on a benchmark with lots of async enqueue/dequeue, since it tries to re-use memory allocations, but in our case in bench.js it does not change much.
  • removing the try/catch in map. This improves the 1M benchmark. From what I understand, underscore and lodash do not have a per-call try/catch (I mean around each f() call). Not sure what we can/want to do about this.
  • inlining the ConsumeStream object. I did not reach an all-tests-passing status because I did this rapidly and was not sure about the teardown, but basically I removed the ConsumeStream inheritance by calling new Stream(function () { self.pull(pullCb); }) directly and defining pullCb inside of consume. On my setup this gives a perf boost of about 10%.

It seems that in the 2.x version, the 'sync' path is even shorter, which is probably why it seems to get a better bench than lodash on the 1M test.

Do you have the same results? (Just to make sure that we can compare things on our respective setups.)
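To make the comparison concrete, a minimal timing harness along these lines can serve as a shared baseline (a sketch only; the actual bench.js setup and numbers may differ):

```javascript
// Minimal timing harness in the spirit of bench.js (assumed shape; the
// real bench.js may differ): time a plain-loop 1M-element map as a
// baseline against which stream implementations can be compared.
function benchSyncMap(n) {
    var input = new Array(n);
    for (var i = 0; i < n; i++) {
        input[i] = i;
    }

    var start = process.hrtime();
    var out = new Array(n);
    for (var j = 0; j < n; j++) {
        out[j] = input[j] * 2;
    }
    var diff = process.hrtime(start);

    // hrtime returns [seconds, nanoseconds]; convert to milliseconds.
    return { ms: diff[0] * 1e3 + diff[1] / 1e6, result: out };
}

var run = benchSyncMap(1e6);
console.log('1M map baseline: ' + run.ms.toFixed(1) + 'ms');
```

Running the same harness on both machines would at least tell us whether the setups themselves are comparable before we compare engine numbers.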

@svozza
Collaborator

svozza commented Jan 28, 2015

For the try-catch problem - assuming we want to keep them - would using the trick in the Bluebird performance optimisations document help?

https://github.com/petkaantonov/bluebird/wiki/Optimization-killers#2-unsupported-syntax

Some constructs are flat out not supported in the optimizing compiler and using such syntax will make the containing function unoptimizable.

Currently not optimizable:

  • ...
  • Functions that contain a try-catch statement
  • Functions that contain a try-finally statement
  • ...

Some of these statements cannot be avoided in production code such as try-finally and try-catch. To use such statements with minimal impact, they must be isolated to a minimal function so that the main code is not affected:

var errorObject = {value: null};
function tryCatch(fn, ctx, args) {
    try {
        return fn.apply(ctx, args);
    }
    catch(e) {
        errorObject.value = e;
        return errorObject;
    }
}

var result = tryCatch(mightThrow, void 0, [1,2,3]);
//Unambiguously tells whether the call threw
if(result === errorObject) {
    var error = errorObject.value;
}
else {
    //result is the returned value
}

Depending on the performance improvement it could be overkill.

@jeromew
Collaborator

jeromew commented Jan 28, 2015

@svozza the map loop in the 1M case indeed seems to be helped by extracting a function like

function tryCatch(fn, x) {
    try {
        return fn(x);
    }
    catch (e) {
        return new StreamError(e);
    }
}

and then doing in the loop

var fnRes = tryCatch(f, x);
if (_._isStreamError(fnRes)) {
    push(fnRes);
} else {
    push(null, fnRes);
}

With this modification plus the one inlining ConsumeStream, the 1M bench.js is on par between lodash and highland on my machine.

The try/catch inside the highland loop has a penalty.
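The penalty can be checked directly with a micro-benchmark comparing an inline try/catch against the extracted helper (illustrative only; the numbers depend heavily on the V8 version, and modern engines optimize try/catch far better than the ones discussed here):

```javascript
// Compare an inline try/catch in a hot loop against the extracted
// tryCatch helper. On old V8, functions containing try/catch were not
// optimizable, so the extracted form could be noticeably faster.
function inlineLoop(arr) {
    var out = new Array(arr.length);
    for (var i = 0; i < arr.length; i++) {
        try {
            out[i] = arr[i] * 2;
        } catch (e) {
            out[i] = e;
        }
    }
    return out;
}

// The try/catch is isolated in a tiny function, keeping the loop clean.
function tryCatch(fn, x) {
    try {
        return fn(x);
    } catch (e) {
        return e;
    }
}

function extractedLoop(arr) {
    var double = function (x) { return x * 2; };
    var out = new Array(arr.length);
    for (var i = 0; i < arr.length; i++) {
        out[i] = tryCatch(double, arr[i]);
    }
    return out;
}

var data = [];
for (var i = 0; i < 1e6; i++) {
    data.push(i);
}

console.time('inline try/catch');
inlineLoop(data);
console.timeEnd('inline try/catch');

console.time('extracted tryCatch');
extractedLoop(data);
console.timeEnd('extracted tryCatch');
```

Both loops compute the same result; only the placement of the try/catch differs.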

@jeromew
Collaborator

jeromew commented Jan 28, 2015

@vqvu do you have a test somewhere regarding the stack exhaustion problem?

@svozza
Collaborator

svozza commented Jul 28, 2015

@LewisJEllis Do you mind if I port your changes from #166 to the 3.0.0 branch?

@LewisJEllis
Collaborator

I did that in #191 but I think something about #240 made it no longer needed after some other changes. I'm not too fresh on the details though, so feel free to take another look. If it does seem necessary, maybe we can just merge #191?

@svozza
Collaborator

svozza commented Jul 28, 2015

Well, I think the reason we didn't do this in 2.x was that it would be a breaking change, but I think we still want to do it. #240 was related because through had the same behaviour as pipe even though it works exclusively with Highland streams, so we considered its behaviour a bug.

@LewisJEllis
Collaborator

Right, okay. I do recall the decision not to merge it on 2.x, but I'm otherwise not too sure what the situation is. I left a comment on #191.

@vqvu
Collaborator Author

vqvu commented Jul 28, 2015

@svozza Why is #191 necessary when you can just use through and get the same result?

I see a checkbox in the original post about merging in #191 (which I added), but I wasn't thinking about #240 at the time.

Edit: To clarify. My thinking was that pipe would be used to mimic the behavior of node streams' pipe, while through would be used when you want a Highland stream back. We can even make through short-circuit the double-pipe when the piped-to stream is already a Highland stream. But I didn't see a reason to have pipe behave slightly differently depending on the type of its input. That's why I originally closed #191.

@svozza
Collaborator

svozza commented Jul 28, 2015

Well, I'm not pushed about it, but it's hardly inconceivable that a user might not read the docs and pipe a Highland stream into another one, and if the price of preventing a strange crash for one of those users is just a simple if statement in pipe, then I don't see why we wouldn't do it.

@vqvu
Collaborator Author

vqvu commented Jul 28, 2015

The standard behavior of pipe is to throw if you don't have an error handler attached though, so someone not reading the docs would probably assume the standard behavior, right? How about adding a statement in the pipe docs pointing to through instead?

@svozza
Collaborator

svozza commented Jul 28, 2015

Yep, sounds good to me.

@vqvu
Collaborator Author

vqvu commented Jul 28, 2015

I'll make a PR.

@LewisJEllis
Collaborator

I'm interested in spending some time soon to help push 3.0 out; @vqvu and @quarterto can you give some guidance on what's left to do that I can help with? The todo at the top seems pretty up-to-date, but I haven't been following closely enough to know exactly what each task entails. Happy to help with docs, testing, and any low- to medium-hanging fruit.

Also, is breaking things down into a stream engine core + .useable modules part of the scope of 3.0?

@vqvu
Collaborator Author

vqvu commented Sep 2, 2015

We have:

  1. Tests for making sure back propagation works for all transforms. Roughly speaking, something along the lines of

stream.onDestroy(cb)
    .transform(...)
    .destroy();

// assert that cb was called.

for all transforms. This is more of a problem for transforms (listed in the todo) that return a stream created with this.create and not this.consume. We need a this.createDownstream (or an equivalent name) that does a this.create and binds an onDestroy handler like consume does.

  2. Tests to make sure that a fork that is destroyed does not contribute backpressure. That is, assert that the following no longer blocks indefinitely.

var s = _([1, 2, 3]);
var s1 = s.fork().take(2);
var s2 = s.fork();

s1.each(_.log);
s2.each(_.log);
// => 1
// => 1
// => 2
// => 2
// => 3

  3. Docs for use.

  4. A highland-2 module that restores the behavior of the transforms that changed (i.e., undoes the renames and reargs that we made). I'm not sure exactly how to expose this. Ideally as a separate npm module, but where would we put the repo?

Also, is breaking things down into a stream engine core + .useable modules part of the scope of 3.0?

I'd like to get this done for 3.0, but we'd want to do this last, since the code movement would make it much harder to merge in the PRs that were applied to 2.x.

@jeromew
Collaborator

jeromew commented Nov 13, 2015

@vqvu I saw your mention of 3.0 here - #388 (comment) - and realize that fixing 2.x engine bugs is a bummer, since many things are automatically fixed in the work you did on 3.x.

I think that the engine fixes + the use work that you did with @quarterto on #337 are big improvements to the library that we should not hold very much longer.

The main drawback as I understand it would be on performance, where we would lose some throughput (I don't have numbers on the last version), but the fixes in the engine plus the extensibility would help the community, and @vqvu has already shown that there is a way to unroll his engine to gain more speed at the expense of code readability.

Regarding speed, I noted one of the remarks of @caolan :

I'm a tentative +1 (since I've not had time to try this out on some real projects yet), but the code looks good and I trust your (and the other collaborators') judgement regarding a release. I'm interested in the ramda integration plans, since that could give us a complementary tool for sync use cases where performance is critical.

What is the state of affairs regarding integration of highland with other performance-oriented sync libraries (lodash?)? Do we already have an elegant integration with these for sync cases where speed is a requirement?

If we strip down the functions, I would rather call them highland-core and highland-2, but that is just a naming preference.

highland could be a version of highland-core pre-packaged with basic transforms.

@vqvu how can we help towards the release of 3.0 ?

@vqvu
Collaborator Author

vqvu commented Nov 13, 2015

I think that the engine fixes + the use work that you did with @quarterto on #337 are big improvements to the library that we should not hold very much longer.

Agreed. The hold up is more me (and I suspect the rest of the collaborators too) not having time to work on this rather than any real blocking issue. @quarterto had a good suggestion to release a 3.0.0-beta1 to npm while we work on finalizing the release, and I think that's a good idea.

What is the state-of-affairs is regarding integration of highland with other sync libraries that are performance oriented (lodash?). Do we already have an elegant integration these for sync cases when speed is a requirement ?

We currently only have integration with transducers, and while I don't have hard numbers, I suspect using transducers for sync transforms is faster than using highland directly. Transducers could very well be our answer to the sync problem.

I'm not sure about integrations with libraries like lodash. I don't see a way to integrate with them beyond this little snippet.

stream.batch(some_reasonable_number)
    .flatMap(function (array) {
        // use lodash here to turn array into result.
        return _(result);
    });

At this point, sync performance isn't that bad compared to 2.x, so I'm not too concerned about it. My opinion is that while we should of course care about sync performance, the utility of Highland is more about sequencing async computations than raw throughput.

Ramda integration is kind of stalled at the moment, but I think it's more about allowing people to leverage their library of transforms, and especially the FP programming style that comes with it, rather than performance. The integrations that were being proposed did not have to do with performance. As it relates to the 3.0.0 release, I can't think of any further breaking changes that we might need to make to support ramda integration, so I think we're safe to proceed here. Unless someone thinks otherwise?

If we strip down the functions, i would rather call it highland-core and highland-2 but that is just a naming preference.

highland could be a version of highland-core pre-packaged with basic transforms.

The highland-2 module would be more of an "undo the naming/arg-order changes that we did to the transforms" package. We'd need another one for "non-basic" transforms.

Here's a question for you and the rest of the collaborators. How do we want to handle the multi-module situation? Do we simply package the modules with highland proper? Or is it worth it to create a highland organization to house the different modules in different repos? Bundling everything up would be the simplest, but I think that would defeat the purpose of splitting up the transforms into basic and non-basic packages. If the user is going to download them all anyway, they might as well be allowed to use them by default.

@vqvu how can we help towards the release of 3.0 ?

I think my previous comment here is a good list of the things still pending. They're all boring documentation/testing tasks, which explains why they've been stalled for so long.

Also, if you or someone else with npm access can release the current 3.0.0 branch as 3.0.0-beta1, that would be great. The 3.0.0 branch is a little behind master, but it's only lacking the dependency updates we've made within the past week. If possible, it'd also be nice to add a release:major-beta so we have it for future uses.

I'd like to refactor index.js and test.js into multiple files so that they're easier to manage, but that's quite a bit of work and can be done after the release.

@svozza
Collaborator

svozza commented Nov 18, 2015

What exactly is involved in this task:

Create a test like noValueOnError to test the backpropagation behavior

I should be able to look at it tomorrow.

@vqvu
Collaborator Author

vqvu commented Nov 18, 2015

Basically, given an infinite stream s, if we do

var i = 0;
var s = _(function generator(push, next) {
    push(null, i++);
    next();
});

s.onDestroy(function destructor1() {
})
.through(_.someTransform(...))
.onDestroy(function destructor2() {
})
.take(1)
.resume();

destructor2 and destructor1 should both be called at some point. Furthermore, generator should never be called after destructor2 has been called.

Many transforms get this behavior for free when they use consume. However, some (like sequence) actually return a new generator stream. These transforms should be using this.create to create the new streams, and create should take care of setting up the appropriate hooks so that an onDestroy on the result stream gets propagated to the parent.
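The propagation rule can be modeled with a toy stream type (purely illustrative; createDownstream and all of these internals are assumed names for the sake of the sketch, not Highland's actual implementation):

```javascript
// Toy model only -- not Highland's internals. It shows the rule being
// described: a stream created for a downstream result forwards destroy()
// to its parent, so the source's destructors also run.
function Stream() {
    this._destructors = [];
    this._destroyed = false;
}

Stream.prototype.onDestroy = function (cb) {
    this._destructors.push(cb);
    return this;
};

Stream.prototype.destroy = function () {
    if (this._destroyed) { return; }
    this._destroyed = true;
    this._destructors.forEach(function (cb) { cb(); });
};

// Hypothetical createDownstream: like create, but it also binds an
// onDestroy hook back to the parent (the behavior consume gets for free).
Stream.prototype.createDownstream = function () {
    var parent = this;
    var child = new Stream();
    child.onDestroy(function () { parent.destroy(); });
    return child;
};

var calls = [];
var source = new Stream()
    .onDestroy(function () { calls.push('destructor1'); });
var result = source.createDownstream()
    .onDestroy(function () { calls.push('destructor2'); });

result.destroy();
console.log(calls); // both destructors have run
```

A generic backprop test would then just assert, per transform, that destroying the result stream eventually runs the destructors registered upstream.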

@svozza
Collaborator

svozza commented Nov 19, 2015

Cool. I'll have a go at this tomorrow evening. Btw, I think the idea of creating an organisation and having the various modules in different repos is a great idea.

@apaleslimghost
Collaborator

I did register the org highland-js a while back...

@jeromew
Collaborator

jeromew commented Nov 19, 2015

@caolan would it make sense for you to migrate highland to a highlandjs org? I don't know if you have followed the work on extensibility that was done by @quarterto and @vqvu in #337, but a lot of modules could exist in this framework.

@svozza
Collaborator

svozza commented Nov 27, 2015

So, I was having a look at this:

Test that destroyed fork does not contribute backpressure

And it looks like the destroyed fork still contributes backpressure. Not quite sure where to look to start fixing it though.

@vqvu
Collaborator Author

vqvu commented Nov 28, 2015

Backpressure for forks is managed by StreamMultiplexer. New forks are created using StreamMultiplexer#newStream. Destroyed forks are removed via StreamMultiplexer#removeConsumer.

When forks want data, they call StreamMultiplexer#pull with their id (an integer assigned at creation time) and a callback. This calls StreamMultiplexer#_resume. _resume will check to see if all forks have requested data. Once this is true, it calls Stream#pull on the source and distributes the result to all forks.

Forks may be added or removed while the Stream#pull call resolves. If they are added, the result will not be distributed. Instead, it's saved in this._cached_value.

The problem here is that the multiplexer uses this._consumers[id] !== undefined to track which streams are currently registered (link). However, the this._consumers object gets reset every time a value is emitted, which wipes away that information.

Fix in #411.
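The shape of such a fix can be sketched as follows (assumed names and structure only; see #411 for the actual patch): keep the set of registered forks separate from the per-value map of pending pulls, so that resetting the pending map on each emit cannot lose registration information.

```javascript
// Sketch of the idea, not the real StreamMultiplexer: registration state
// survives emits, while pending-pull state is reset per value.
function StreamMultiplexer() {
    this._registered = {}; // fork id -> true; survives emits
    this._pending = {};    // fork id -> callback; reset on each emit
    this._nextId = 0;
}

StreamMultiplexer.prototype.newStream = function () {
    var id = this._nextId++;
    this._registered[id] = true;
    return id;
};

// A destroyed fork is forgotten entirely, so it can no longer hold
// back the source (i.e., it contributes no backpressure).
StreamMultiplexer.prototype.removeConsumer = function (id) {
    delete this._registered[id];
    delete this._pending[id];
};

StreamMultiplexer.prototype.pull = function (id, cb) {
    this._pending[id] = cb;
};

// The source is pulled only once every *registered* fork has asked for
// data; unregistered forks are simply not consulted.
StreamMultiplexer.prototype._everyonePulled = function () {
    var self = this;
    return Object.keys(this._registered).every(function (id) {
        return self._pending[id] !== undefined;
    });
};
```

With this split, removing a fork via removeConsumer immediately unblocks _everyonePulled, which matches the desired behavior in the fork/take(2) example above.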

@svozza
Collaborator

svozza commented Nov 28, 2015

Makes sense. Thanks for the explanation!
