Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TableTools.readCsv validation of column names happens after all the loading work has been done. #1561

Closed
jcferretti opened this issue Nov 11, 2021 · 7 comments
Assignees
Labels
bug Something isn't working csv
Milestone

Comments

@jcferretti
Copy link
Member

For a huge CSV file that means you may have been waiting for several minutes for the data to load to only then blow up because a column has a name that doesn't validate.

We should do column name validation earlier, before the bulk of the load work.

@jcferretti jcferretti added bug Something isn't working triage labels Nov 11, 2021
@jcferretti jcferretti added this to the Backlog milestone Nov 11, 2021
@rcaudy rcaudy added csv and removed triage labels Nov 11, 2021
@rcaudy rcaudy modified the milestones: Backlog, Nov 2021 Nov 11, 2021
@jcferretti
Copy link
Member Author

The particular case I hit this is interesting. By default, Pandas's dataframe.to_csv method writes the index of the dataframe alongside the columns (unless you explicitly ask the call to omit it adding index=False to the named arguments). The index is written as the first column, but with no column name, with the net result of the header looking like this:

,column_1,column_2,column_3

And DHC trips over the empty column name with the validation exception. Interestingly enough, DHE bard loaded the same file no problem.

Example:

cfs@erke 14:18:46 ~/dh/oss1/deephaven-core/data
$ head -n 10 /a1/tmp/workqueue-no-nulls-100m.csv 
,workqueue_id,user_id,timestamp
0,1,1001,2021-11-07 14:00:00.011463807+00:00
1,2,1003,2021-11-07 14:00:00.080748962+00:00
2,3,1002,2021-11-07 14:00:00.114976256+00:00
3,4,1004,2021-11-07 14:00:00.118275772+00:00
4,5,1003,2021-11-07 14:00:00.131173174+00:00
5,6,1001,2021-11-07 14:00:00.174901899+00:00
6,7,1001,2021-11-07 14:00:00.194499326+00:00

DHE makes up a column name of Column1 for the first column above.

@devinrsmith
Copy link
Member

#1469 is relevant to empty column names

@devinrsmith
Copy link
Member

I'm happy to add empty column name support for our read csv - but I'm a bit curious why pandas would choose to write out an empty column name by default. Is there a way we can dataframe.to_csv w/ a named index column?

@devinrsmith
Copy link
Member

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv:

index_labelstr or sequence, or False, default None

    Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.

@jcferretti
Copy link
Member Author

jcferretti commented Nov 15, 2021

The particular file I was working with was obtained via

import pyarrow.parquet as pq
df = pq.read_table("/blah.parquet").to_pandas()
df.to_csv("/bluh.csv")

So the index for the dataframe that was output to CSV was what was created by the conversion from pyarrow table to dataframe. Which I think is likely to just be the default index that is created by dataframe contructors when one is not provided, namely, a RangeIndex(0..rows-1) with no name.

If I am right in that guess, it means that a default initialized dataframe, when no index is provided explicitly during construction, results in a dataframe with an index with no name. Which, when output to CSV, creates a corresponding column with empty name. Which one could argue is a pretty sad outcome, but maybe this is the world we live in.

@pete-petey pete-petey reopened this Dec 31, 2021
@pete-petey pete-petey modified the milestones: Nov 2021, Jan 2022 Dec 31, 2021
@pete-petey pete-petey assigned kosak and unassigned devinrsmith Dec 31, 2021
@pete-petey
Copy link
Member

@devinrsmith and @jcferretti to watch it.

@kosak
Copy link
Contributor

kosak commented Jan 11, 2022

Fixed in #1629

@kosak kosak closed this as completed Jan 11, 2022
mofojed pushed a commit that referenced this issue Jan 17, 2024
# [0.59.0](deephaven/web-client-ui@v0.58.0...v0.59.0) (2024-01-17)


### Bug Fixes

* GoToRow timestamp fails when selected row is out of view ([#1717](deephaven/web-client-ui#1717)) ([9ddc973](deephaven/web-client-ui@9ddc973)), closes [#1561](deephaven/web-client-ui#1561)
* Interface for IrisGridTableModelTemplate.backgroundColorForCell ([#1699](deephaven/web-client-ui#1699)) ([73e1837](deephaven/web-client-ui@73e1837)), closes [#1697](deephaven/web-client-ui#1697)
* Moved logos so they show in production build ([#1713](deephaven/web-client-ui#1713)) ([a3bea73](deephaven/web-client-ui@a3bea73)), closes [#1712](deephaven/web-client-ui#1712)
* re-colorize command codeblocks when theme changes ([#1731](deephaven/web-client-ui#1731)) ([b1e42f5](deephaven/web-client-ui@b1e42f5))
* TimeInput not triggering onChange on incomplete values ([#1711](deephaven/web-client-ui#1711)) ([6894d96](deephaven/web-client-ui@6894d96)), closes [#1710](deephaven/web-client-ui#1710)


### Features

* Action button tooltips ([#1706](deephaven/web-client-ui#1706)) ([bff6bf9](deephaven/web-client-ui@bff6bf9)), closes [#1705](deephaven/web-client-ui#1705)
* Add support for useDeferredApi ([#1725](deephaven/web-client-ui#1725)) ([51ebe1b](deephaven/web-client-ui@51ebe1b))
* Improved preload variable handling ([#1723](deephaven/web-client-ui#1723)) ([ed41c42](deephaven/web-client-ui@ed41c42)), closes [#1695](deephaven/web-client-ui#1695) [#1679](deephaven/web-client-ui#1679)
* NavTabList component ([#1698](deephaven/web-client-ui#1698)) ([96641fb](deephaven/web-client-ui@96641fb))
* Reject promise immediately if var not found ([#1718](deephaven/web-client-ui#1718)) ([43d40bd](deephaven/web-client-ui@43d40bd)), closes [#1701](deephaven/web-client-ui#1701)
* theming tweaks ([#1727](deephaven/web-client-ui#1727)) ([f919a7e](deephaven/web-client-ui@f919a7e))


### BREAKING CHANGES

* - Subclasses of IrisGridTableModelTemplate or it's subclasses that use
backgroundColorForCell may need to update their signature to accept the
theme if they are calling the superclass


Co-authored-by: deephaven-internal <deephaven-internal@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working csv
Projects
None yet
Development

No branches or pull requests

5 participants