fread skip doesn't get the header names. #2080
Comments
This is consistent with other readers (e.g. I suppose an option along the lines of |
Thanks @skanskan. Yes I know what you mean and agree. A recent change in dev is that the skip= control determines which line the data starts on. Whether column names or not is now correctly determined by |
I think current behavior is good. I notice
That output looks as intended; i.e., no attempt made (not now nor in future) to remove junk lines between the column names and the first data row. If column names are present, they must be on the line immediately before where the data rows start (well, other than blank lines, depending on |
If you view skipped rows as junk lines, this makes sense. However, if you skip rows as a way to save reading time, it does not. I have a million-row dataset saved as CSV, with a timestamp in the first column serving as a sorted index. I read the first column first to locate the rows I need, and then I read the full data between row a and row b. I am not saying which way is right or wrong, just presenting a valid use case for keeping the first row as the header row.
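The use case above can be sketched roughly as follows. This is only an illustration under assumptions: the sample file, the column names `ts` and `x`, and the variables `a` and `b` are hypothetical, and the header is re-attached by hand because `skip=` moves the starting point past the header line.

```r
library(data.table)

# Hypothetical sample file: a sorted timestamp in column 1, one header line.
path <- tempfile(fileext = ".csv")
writeLines(c("ts,x", "100,a", "200,b", "300,c", "400,d", "500,e"), path)

# Cheap first pass: read only the index column.
idx <- fread(path, select = 1L)

# Column names taken from a zero-row read of the same file.
col_names <- names(fread(path, nrows = 0))

# Locate the first and last data rows of interest (1-based, header excluded).
rows <- which(idx$ts >= 200 & idx$ts <= 400)
a <- min(rows)
b <- max(rows)

# Second pass: skipping `a` lines drops the header plus the first a - 1 data
# rows, so reading starts at data row `a`; nrows limits the read to row `b`.
chunk <- fread(path, skip = a, nrows = b - a + 1,
               header = FALSE, col.names = col_names)
```

Here `header = FALSE` is set explicitly so that the first line of the chunk is never mistaken for column names.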
@jflycn To implement that reliably, there should be two different arguments: one for skipping junk rows and another for chunking. I recall @st-pasha recently explained why that matters.
Currently the This would be independent of skipping, so that you can say This could be taken even further: |
@st-pasha, is there another open or resolved issue associated with this? I see it was closed but has no reference to a PR. I am very much interested in this feature. |
When you use fread with the option skip= the file is read skipping the first lines...
That's OK, but there is a small problem: the first line contains the header, so you end up with no column names in your data.table.
I work around the problem by calling fread twice: first reading only the first line and saving its content, then reading the file again, skipping as desired, and then renaming the columns.
I think it would be a good idea if fread could always read the first line and use it as the column names when requested, no matter the skip value.
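A minimal sketch of that two-pass workaround, assuming a hypothetical sample file and a hypothetical `n_skip` value; it is one way to do it, not necessarily the exact code used by the original poster:

```r
library(data.table)

# Hypothetical sample file: one header line followed by five data rows.
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10", "2,20", "3,30", "4,40", "5,50"), path)

# Pass 1: a zero-row read returns just the column names from the header.
col_names <- names(fread(path, nrows = 0))

# Pass 2: skip the header line plus the first three data rows.
n_skip <- 4
dt <- fread(path, skip = n_skip, header = FALSE)

# Restore the saved names (fread assigned V1, V2, ... by default).
setnames(dt, col_names)
```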