-
-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Records list now accepts any iterable, not only an array #11
Records list now accepts any iterable, not only an array #11
Conversation
Previously there were several calls to `input.map` which work for arrays but not for generators or most other iterable objects. The code has been changed to use a `for(x of y)` loop instead of `map` functions. (As a side-effect, the new code is also more performant and memory-efficient!)
@@ -16,10 +16,12 @@ class AbstractCsvStringifier { | |||
} | |||
|
|||
stringifyRecords(records) { | |||
const csvLines = records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't it also work if we just change records
to Array.from(records)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be functional, but it would be very inefficient.
Hello @pineapplemachine , thank you for your PR! Code looks good to me except one trivial point I asked in the review. Do you want to share the context of why you want it to accept iterable instead of just an array? When we introduce new functionalities, I want to understand how they help you (not how they may help you in the future). This is something I try to ask people including myself 😉 It could be just you heavily use generator and without this you have to do |
I am working on adding functionality to a web backend to export certain data in CSV format. It is potentially rather a lot of data. When I installed the package and used your package for the CSV export functionality, I was surprised to receive an error when I passed in my list of records. When I looked at the source I saw that this was because you assume that the input has a There are many advantages to accepting any iterable, not only arrays, but the one that I need is that the function can accept generators. (Generators are iterators that do not contain the entire result, but compute the output list one element at a time, as each element as needed.) Note also that each call to
Accepting a generator and not using
Since the application I am working on needs to be able to export large datasets, it is advantageous to pass a generator instead of an array. If the function effectively stores the dataset in memory four times over, I run the real risk of servers running out of ram. |
I wrote before,
I realize now that I chose my words poorly and this may have led to confusion. My uncertainty was not about this implementation being more efficient than what was there before, but about the performance and memory characteristics of string concatenation vs. joining. It may be worth investigating later how concatenation (what's in the PR) compares to eagerly building an array of row strings and joining, and how it compares to using Array.prototype.join.call with a generator function as input to transform the input records into string rows. (If Array.join will in fact accept an arbitrary iterable, which I don't know off the top of my head.) |
To find out without a doubt, I wrote a script to determine how the performance characteristics actually compare. I have determined that V8 JIT works in mysterious ways. Any time I think I know about the performance characteristics of code, V8 turns out to be doing something strange. Using Array.prototype.join with a generator sadly did not work. Simply adding Overall, this implementation of the method seems to come out on top, stringifyRecords(records) {
const array = [];
for (let record of records) {
array.push(this._getCsvLine(this._getRecordAsArray(record)));
}
array.push('');
return array.join(RECORD_DELIMITER);
} I also tested this against using It's also worth noting that in V8, loops of the form Here are the full results of the tests on my machine, using node v10.7.0. Here's the code to reproduce the results: https://gist.github.com/pineapplemachine/ec5f2356b6470729084f022441d0954c
Anyway, that was an interesting exercise. I expect I will update the PR to use the join implementation. |
See #11 (comment) for an in depth explanation
Thanks for the detailed explanations and the performance benchmark! Few thoughts.
So, in short, I'm not too convinced about the performance improvements necessity, I'm happy with your change. (But without continuous performance test proving the benefit, someday someone might, by accident, change it to immutable code...) |
Unfortunately, JavaScript does not have immutability, which means writing code as you would for a language that does have immutability will get you poor performance. Languages with good immutability features are able to back that up with good optimizations. JavaScript does not. As you can see in the benchmarks I ran, the implementation in the PR runs in roughly half the time of your Since I'm not writing to a file it had not occurred to me to use streams. I'll have to look into it more to see if it's possible in my case. A little bit of googling conveys a resounding "maybe". Regardless, I can't think of any reason not to use the implementation in this PR. If you're concerned about continuous performance tests, then I encourage you to add them. All the tools you might need are in that gist linked in my previous post. |
I'm not trying to achieve the true immutability as you can see in my code, e.g. private methods are just relying on underscore naming convention and not even using closures. But avoiding variable reassignment or object updates makes it easy to reason the code and that's what I'm after. Whether you write large data in a single But as I wrote in my previous comment, I'm going to merge the change. Thanks for your contribution 👍 |
I found that I can pass an optional map function to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/from#Syntax Below is the benchmark with your snippet, the new version is So it's good enough for me to keep the functional way of writing it. I decided to go with this.
|
Released as v1.2.0 |
Previously there were several calls to
input.map
inAbstractCsvStringifier.stringifyRecords
, which works for arrays but not for generators or most other iterable objects. The code has been changed to use afor(x of y)
loop instead ofmap
functions.Also added two new unit tests to verify this behavior.
As a side-effect, the new code is also more performant and memory-efficient! (edit: Probably? I haven't actually benchmarked. Depends on how string concatenation compares to building an array and then calling join, I guess)