-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Deno.lines #2101
Add Deno.lines #2101
Conversation
if (newLineIndex > -1) { | ||
const line = scratch.slice(0, newLineIndex); | ||
scratch = scratch.slice(newLineIndex + 1); | ||
yield line; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't only one yield per chunk mean scratch could grow linearly?
I think this newLineIndex/yield should be in a while loop.
const decoder = new TextDecoder(); | ||
let scratch = ""; | ||
|
||
for await (const chunk of toAsyncIterator(r)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we could import code from deno_std here. The correct way to solve this is to use BufReader in std/io/bufio.ts
https://github.com/denoland/deno_std/blob/master/io/bufio.ts#L220
In order to use that code, we need to change //js/ to use extensions in the imports... (cc @kitsonk )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is complicated (using extensions). I was planning to tackle it in the medium term. Essentially we have to replace rollup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should move bufio into //js then (problem is that it will likely bring all of std/io with it as dependencies)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe "lines" should just be in deno_std instead of core..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like BufReader
provides a nice way to solve lines
👍
Reading line by line is such a common operation that IMHO it should be Deno.lines
instead of living in deno_std.
It's greatly illustrated by @ry's example. This is super simple:
cat /etc/passwd | deno eval " \
for await (let line of Deno.lines(Deno.stdin) { \
console.log(line.split(':')[0]) }"
This is not:
cat /etc/passwd | deno eval " \
import { lines } from "https://deno.land/x/io/mod.ts; \
for await (let line of lines(Deno.stdin) { \
console.log(line.split(':')[0]) }"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bartlomieju I agree it would be great to have built in to the CLI... but I really want to ensure that we use the proper bufio infrastructure to solve these problems rather than having a bunch of one-off algorithms for buffering. So either we need to first be able to import deno_std code into the CLI (ideal) or move bufio from deno_std into the main repo (less ideal)
Closing this one for now. Thanks @bartlomieju !
for await (const chunk of toAsyncIterator(r)) { | ||
scratch += decoder.decode(chunk); | ||
|
||
let newLineIndex = scratch.indexOf("\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget about windows carriage return line feeds \r\n
. Maybe you could keep the code for finding the index of \n
, but also check the index in scratch before the \n
and check if it's the \r
character then skip both the characters if that is the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using something like this for the remaining lines too.
let crLf = false;
let idx = newLineIndex;
if(scratch[idx-1]=="\r") {
idx--;
crlf = true;
}
const line = scratch.slice(0, idx);
...
const remainingLines;
if(crLF) {
remainingLines = scratch.split("\n");
} else {
remainingLines = scratch.split("\r\n");
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnsonjo4531 you're right, that's where your implementation is better :) let's wait on BufReader
it should solve this issue.
Great job @bartlomieju your code is very clean and simple! One of the differences between my implementation and Bartek's is I tried not to decode mine as a string and kept it in bytes (i.e. Uint8Arrays). I should also remark I am not used to working with byte data, so I could have made some very newbie mistakes, so feel free to leave some comments on my commits in my repo. @bartlomieju also used a lot better tools from Deno for the job. Maybe it would be good to not only have a method for getting the lines as strings (as in @bartlomieju's implemenation), but maybe also have the ability to get the lines as Uint8Arrays (as in my implementation)? I'm not sure the benefit other than preventing overhead of decoding the data as a string just throwing ideas out there. Honestly if we did use both we might just make the string lines implementation wrap the bytes lines implementation just to save build size. It could look like this. Yet again just throwing ideas out there. async function* lines (r: Deno.Reader) {
const encoder = new TextEncoder();
for await(const line of lines_bytes(r)) {
yield encoder.encode(line);
}
} |
Renamed the main file. Decided to follow @ry's suggestion in denoland/deno#2101 (comment) Added two extra examples. Decided not to open and close files for the user, but instead use a Reader.
@johnsonjo4531 thanks I appreciate it. I agree with you, we can solve it simply with: |
This PR adds
Deno.lines
method, mentioned in #2081Alternative implementation by @johnsonjo4531.