Stream from stdin rather than doing `stdin.readlines()` (#138)
On Wed, Nov 30 2016, Peter VandeHaar wrote:
> I'd like for stdin to only be read as needed. This could probably also let `tabview` work with iterators when used from Python, but I don't use that so I don't know.

This is hard with the current code.

> (Do you know of another tool that does this? I haven't found one.)

Never found one, even though that's something I'd like as well. You could actually cheat with a buffering program that, besides buffering, sends EOF at regular intervals (so that you could just reload the file live in tabview), but the ones I know of don't do strictly that.

> The problems this introduces are:
>
> 1. When `self.column_width_mode` is `max` or `mode`, the width won't reflect rows that haven't been read yet.

This wouldn't really be a problem if you show what's going on. Triggering a recalculation is generally more user-friendly than auto-sizing the columns randomly.

> If I start work on a PR, do you have recommendations?

Godspeed? ;)

I'm not sure I fully understood the implementation details. Your plan, as far as I understood, is just to keep appending to `csv_data` directly. In that case, I would keep an initial buffer *in* the generator to perform the encoding detection *and* the padding, which are unrelated to what the viewer is doing. The viewer shouldn't be concerned with any part of the reading process: just provide it with a matrix to show. This way, as the data comes in, you can append to `csv_data` in chunks and update the internal state as little as possible. Seen the other way around: if you have a data structure you want to show in tabview when used as a module, you'd like to skip this whole process entirely.
Currently, tabview doesn't work well when used with large or unending files. For example,

```
cat /dev/urandom | tr -cd "fish,\n" | tabview -
```

doesn't work. I'd like for stdin to only be read as needed. Maybe this will also let tabview display iterators when used from Python.

(Do you know of any other commandline csv-viewer that does streaming to handle large files? I haven't found one.)
Changes that will be needed:

- `process_data` needs to be a generator. Then `view()` will do `data_processor = process_data(...)`, and `Viewer` will do `csv_data.append(next(data_processor))` when it reaches the end of `csv_data`.
- `detect_encoding()` will be run on the first 1000 lines to determine `enc`. After those lines are exhausted, `detect_encoding()` will be run on each new line, updating `enc` if needed.
- `pad_data()` can't happen in `process_data`. `Viewer` will run `csv_data = pad_data(csv_data)` if a new line from `data_processor` is longer than `self.num_data_columns`.
- `Viewer` needs a few minor changes:
  - Forward searching will still work, and will just rapidly consume lines from `data_processor`.
  - When the user tries to sort, `Viewer` will do `csv_data.extend(data_processor)`, which might take too long or possibly forever. User's problem.
- Later on, it'd be fun to make mode and max column widths update as new data is read in, by storing a `collections.Counter()`, updating it for each new line, and updating `self.column_width` as needed.

The problems this introduces are:
1. When `self.column_width_mode` is `max` or `mode`, the width won't reflect rows that haven't been read yet.

If you consider these drawbacks quite bad, I'd be happy with a `--stream` flag.

If I start work on a PR, do you have any recommendations?