api: add record-oriented IO functions #41

Freaky · 2020-02-28T05:41:05Z

This adds:

- byte_records() iterator
- for_byte_terminated_record_with_terminator()
- for_byte_terminated_record()

These are basically parametrised versions of the existing functions for reading lines.

Freaky · 2020-02-28T05:41:47Z

I blame any mistakes on Japanese whisky.

BurntSushi

Thanks! I left a few comments that we should address before merging. But overall this looks great.

In terms of whether we want these APIs though... Could you say more about your use case for them? I grant that they are a nice convenience, but still, always good to hear about use cases.

BurntSushi · 2020-02-28T11:29:50Z

src/io.rs

+    /// assert_eq!(records[2], "dolor".as_bytes());
+    /// # Ok(()) }; example().unwrap()
+    /// ```
+    fn for_byte_terminated_record<F>(


Why the inconsistency in naming? Seems like this should be for_byte_record given that you have byte_records above. It's also shorter. :-)

That was actually the last bit I added - I was leaning towards being more explicit, but I guess I was a bit sick of it by the time I got to the iterator :)

BurntSushi · 2020-02-28T11:33:47Z

src/io.rs

@@ -208,6 +321,20 @@ pub struct ByteLines<B> {
    buf: B,
 }

+/// An iterator over records from an instance of
+/// [`std::io::BufRead`](https://doc.rust-lang.org/std/io/trait.BufRead.html).


Could this add a sentence explaining what a byte record is? e.g., "A byte record is any sequence of bytes terminated by a particular byte chosen by the caller. For example, NUL-separated byte strings are said to be NUL-terminated byte records."
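To make the definition concrete, here is a minimal std-only sketch that splits an in-memory byte slice into terminated records. The function name terminated_records is hypothetical and not part of this PR. It assumes a trailing terminator ends the last record rather than starting a new, empty one, which mirrors how a trailing newline is usually treated for lines.

```rust
/// Hypothetical helper (not part of this PR): splits `data` into records,
/// each terminated by `terminator` (e.g. b'\0' for NUL-terminated records).
/// A final record without a terminator is still returned, and a single
/// trailing terminator does not produce an extra empty record.
fn terminated_records(data: &[u8], terminator: u8) -> Vec<&[u8]> {
    // Empty input contains no records at all.
    if data.is_empty() {
        return Vec::new();
    }
    // Drop one trailing terminator so it ends the last record
    // instead of introducing an empty one after it.
    let body = if data.last() == Some(&terminator) {
        &data[..data.len() - 1]
    } else {
        data
    };
    // Every remaining terminator separates two records.
    body.split(|&b| b == terminator).collect()
}
```

With this convention, b"lorem\0ipsum\0dolor\0" and b"lorem\0ipsum\0dolor" both yield the same three records.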

BurntSushi · 2020-02-28T11:36:10Z

src/io.rs

+            if chunk.last() == Some(&terminator) {
+                for_each_record(&chunk[0..chunk.len() - 1])
+            } else {
+                for_each_record(&chunk)


Just to keep things consistent, maybe rename trim_slice to trim_line_slice and add trim_record_slice so that it can be used here. Or at least, it would be nice to call for_each_record in only one place in the source.
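A minimal sketch of a trim_record_slice along the lines suggested above. It assumes the helper only ever strips a single trailing terminator. The signature is illustrative and is not necessarily what was merged.

```rust
/// Illustrative version of the suggested `trim_record_slice` helper:
/// strips exactly one trailing `terminator` from `record`, if present.
fn trim_record_slice(record: &[u8], terminator: u8) -> &[u8] {
    if record.last() == Some(&terminator) {
        &record[..record.len() - 1]
    } else {
        record
    }
}
```

With such a helper, the if/else in the diff collapses to a single call site, e.g. for_each_record(trim_record_slice(&chunk, terminator)).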

Freaky · 2020-02-28T21:09:39Z

Regarding use cases, I mainly see it being used by apps wanting to support -print0-style output. I'm sure there are the odd weird data files using other delimiters too - someone has to have used these, right?

This article is what prompted me - she wrote her own custom implementation, lamenting the effort it took. Which is a bit of a shame when I've had a previous iteration of this lying forgotten on my disk for the past 6 months.

thomcc · 2020-02-28T21:24:20Z

I also mentioned similar use cases in #12 (comment) -- I've worked with files that delimited entries with \xff before (as well as \x00). That said, for my case it would have been better on ByteSlice rather than on io (sorry, I missed where this was in my initial comment).

Freaky · 2020-02-28T21:38:27Z

Note that there are iterators included here, but they're just thin wrappers around read_until(): each iteration allocates a Vec and fills it from BufRead's internal buffer.
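A minimal std-only sketch of what such a read_until()-based iterator looks like. The Records type is hypothetical and is not the type added in this PR. It assumes the terminator is stripped from each yielded record. The point to notice is the fresh Vec allocated on every call to next.

```rust
use std::io::{self, BufRead};

/// Hypothetical iterator (not this PR's type) over `terminator`-terminated
/// records from a `BufRead`, yielding each record as an owned `Vec<u8>`.
struct Records<B> {
    buf: B,
    terminator: u8,
}

impl<B: BufRead> Iterator for Records<B> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        // A new allocation per record: read_until copies bytes out of the
        // reader's internal buffer into this Vec, up to and including the terminator.
        let mut record = Vec::new();
        match self.buf.read_until(self.terminator, &mut record) {
            // Zero bytes read means EOF: no more records.
            Ok(0) => None,
            Ok(_) => {
                // The last record may lack a terminator, so only pop if present.
                if record.last() == Some(&self.terminator) {
                    record.pop();
                }
                Some(Ok(record))
            }
            Err(e) => Some(Err(e)),
        }
    }
}
```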

The callback functions simply lend out slices of BufRead's internal buffer, only copying when the buffer does not contain a full record. You can't do this with a standard Iterator, since its items can't borrow from the iterator itself, and avoiding that copy can have a substantial impact on performance.
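A minimal sketch of the lending approach, built only on std's BufRead::fill_buf and BufRead::consume. The function name visit_records and its bool-returning callback (return false to stop early) are assumptions for illustration, not necessarily this PR's exact signature. A record lying wholly inside the reader's buffer is handed to the callback with no copy. Only a record that straddles a buffer refill is accumulated in a scratch Vec.

```rust
use std::io::{self, BufRead};

/// Hypothetical callback-style reader: calls `f` with each
/// `terminator`-terminated record (terminator excluded). Returning
/// `Ok(false)` from `f` stops iteration early.
fn visit_records<R, F>(mut rdr: R, terminator: u8, mut f: F) -> io::Result<()>
where
    R: BufRead,
    F: FnMut(&[u8]) -> io::Result<bool>,
{
    // Scratch space, only used when a record spans more than one buffer fill.
    let mut partial: Vec<u8> = Vec::new();
    loop {
        let (consumed, keep_going) = {
            let buf = rdr.fill_buf()?;
            if buf.is_empty() {
                // EOF: emit a final record that had no terminator.
                if !partial.is_empty() {
                    f(&partial)?;
                }
                return Ok(());
            }
            match buf.iter().position(|&b| b == terminator) {
                Some(i) => {
                    let keep_going = if partial.is_empty() {
                        // Zero-copy: lend a slice of the reader's own buffer.
                        f(&buf[..i])?
                    } else {
                        // Finish a record that started in an earlier fill.
                        partial.extend_from_slice(&buf[..i]);
                        let keep_going = f(&partial)?;
                        partial.clear();
                        keep_going
                    };
                    // Consume the record plus its terminator.
                    (i + 1, keep_going)
                }
                None => {
                    // No terminator yet: copy what we have and refill.
                    partial.extend_from_slice(buf);
                    (buf.len(), true)
                }
            }
        };
        rdr.consume(consumed);
        if !keep_going {
            return Ok(());
        }
    }
}
```

Wrapping the input in a BufReader with a tiny capacity (say 4 bytes) is a handy way to exercise the copying path, since most records then span several fills.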

thomcc · 2020-02-28T21:46:33Z

Yeah, I missed that this was in the io module, and thus the iterator approach involved a good amount of overhead, and edited my comment to remove that bit.

This mirrors the line iterating methods, but permits the caller to specify an arbitrary terminator. Closes #41

Freaky force-pushed the byte-record branch from af2cb60 to f3e6c49 February 28, 2020 05:45

BurntSushi requested changes Feb 28, 2020


api: add record-oriented IO functions

002107b

Freaky force-pushed the byte-record branch from f3e6c49 to 002107b February 28, 2020 21:51

Freaky requested a review from BurntSushi February 28, 2020 23:24

BurntSushi pushed a commit that referenced this pull request May 10, 2020

api: add byte record oriented IO functions

342e443

This mirrors the line iterating methods, but permits the caller to specify an arbitrary terminator. Closes #41

BurntSushi closed this in 8b26a9f May 10, 2020
