fwrite(): final items #1664

mattdowle · 2016-04-20T17:22:53Z

eantonya · 2016-04-20T18:24:12Z

Do people actually like having quote=TRUE when writing to csv? I find it to be a big nuisance and would much prefer for fwrite to have quote=FALSE by default.

MichaelChirico · 2016-04-20T18:33:54Z

I find quote = TRUE to be more robust -- you never know when you have a JAMES SMITH, JR in a character column and it can be a huge pain to get a .csv read when it has nuisance commas strewn about.

mattdowle · 2016-04-20T18:34:16Z

@eantonya Agree. I prefer quote=FALSE too. The base R thinking I believe has numbers/ids with leading 0's stored as character format ... the default ensures they get read by Excel as character and the leading 0's not lost. But fwrite could detect that situation and quote just that situation by default. Where character columns contain letters and no embedded quotes, I really don't see why quotes are needed. Plus we save a bit on file size by saving the 2 extra quotes per field.

mattdowle · 2016-04-20T18:37:43Z

@MichaelChirico Agree with you too. fwrite can detect that and put the quotes in those situations. fwrite already does a first-pass through all strings to calculate maximum line length before allocating buffer sizes. It could test if there are any sep or quote in the string at that point. So I guess I'm suggesting quote='auto' by default.
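
To make the proposal concrete, here is a minimal sketch of what quote = "auto" is meant to do, assuming a data.table release in which that option is available (the later update in this thread says it was implemented). The column names and sample values are invented for illustration.

```r
library(data.table)

# Hypothetical sample: one name contains the separator, the other does not.
dt <- data.table(id = c("001", "002"),
                 name = c("JAMES SMITH, JR", "ANNA LEE"))

tmp <- tempfile(fileext = ".csv")
fwrite(dt, tmp, quote = "auto")

# Inspect the raw file: only the field with an embedded comma should be quoted.
out_lines <- readLines(tmp)
print(out_lines)

# Round trip: the name column should come back unchanged.
back <- fread(tmp)
print(back$name)
```

Whether the leading zeros in id survive the round trip depends on the reader's settings, so the sketch only checks the name column.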

MichaelChirico · 2016-04-20T19:50:54Z

@mattdowle great, good point. Should only marginally affect speed then.

PS IIRC Excel converts "001" to 1 anyway :|

mattdowle · 2016-04-20T20:46:41Z

@MichaelChirico Now you mention it I do seem to remember Excel doing that. I haven't used Excel for many years now thankfully.

jangorecki · 2016-04-20T21:02:46Z

I would assume Excel behaves inconsistently in that respect (across OS versions, Office versions, OS locales, Office locales, 365s, etc.).

rafapereirabr · 2016-04-25T10:45:53Z

Are you planning to include the append = T ? Please?! Anyway, congrats for the great job with data.table that will become even greater with fwrite() !

MichaelChirico · 2016-04-25T19:45:55Z

@rafapereirabr append = TRUE in fact works for me. Are you suggesting that it should be the default?

rafapereirabr · 2016-04-25T20:08:52Z

@MichaelChirico, I didn't know it was already implemented! I couldn't try it as I was planning to test it tonight. Just ignore my comment then. PS: I don't think this should be the default.
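
For readers who have not used it, a small sketch of append = TRUE follows. The table contents and the temporary file are made up for the example.

```r
library(data.table)

tmp <- tempfile(fileext = ".csv")
first  <- data.table(x = 1:2, y = c("a", "b"))
second <- data.table(x = 3:4, y = c("c", "d"))

fwrite(first, tmp)                  # creates the file, with a header row
fwrite(second, tmp, append = TRUE)  # adds rows to the existing file

# Read everything back to check that both writes ended up in one table.
combined <- fread(tmp)
print(combined)
```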

MichaelChirico · 2016-06-17T09:51:19Z

[ Update : quote='auto' now fully implemented ]

Can we please set quote = TRUE to default until auto is supported?

I'm being royally screwed right now by having written a data file I needed to carry remotely with quote = FALSE by accident. The file is now basically un-usable because of all the unpredictable commas scattered in some string fields, which really sucks because I have no way of fixing the mistake (other than by hand).

quote = "auto" sounds like the best solution, but I would hate for this to happen to others in the meantime.

Until then, the marginal cost of adding " relative to FALSE (conservatively, a 5% speed/file size hit) seems to be far outweighed by the cost, relative to TRUE, of creating un-usably dirty data files.
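
The failure mode described here can be reproduced in a few lines. This is only an illustrative sketch with invented data: it writes the same table with and without quoting, then counts the comma-separated fields per line using base R's count.fields().

```r
library(data.table)

# Hypothetical sample with one embedded comma.
dt <- data.table(id = 1:2, name = c("JAMES SMITH, JR", "ANNA LEE"))

unquoted <- tempfile(fileext = ".csv")
quoted   <- tempfile(fileext = ".csv")
fwrite(dt, unquoted, quote = FALSE)
fwrite(dt, quoted,   quote = TRUE)

# quote = "" treats every comma as a separator, which is all a reader
# can do with the unquoted file.
fields_unquoted <- count.fields(unquoted, sep = ",", quote = "")
fields_quoted   <- count.fields(quoted,   sep = ",", quote = "\"")

print(fields_unquoted)  # the row with the embedded comma gains a field
print(fields_quoted)    # every line has the same number of fields
```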

MichaelChirico · 2016-08-24T19:21:08Z

[ Update: now fixed and fwrite is consistent with write.csv ]

I find it a bit odd that fwrite distorts numerics, e.g.:

fwrite(data.table(a = -75.16374), "test.csv")

Has output:

a
-7.516374E1

fread and read.csv indeed recognize this as a numeric column still (fread("test.csv")$a is numeric), but I'm not sure this is robust across all readers -- I've got in mind outputting a .csv from R and sharing it with users who may be using any platform.

Not sure of the ideal approach, as floating points are always going to cause headaches...

Also note that write.csv doesn't have this effect.
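
A quick way to check the behaviour on a given installation is to write the same value with both functions and compare the data lines. This is a sketch only; per the update note above, recent versions are expected to agree with write.csv.

```r
library(data.table)

dt <- data.table(a = -75.16374)

f_fwrite <- tempfile(fileext = ".csv")
f_base   <- tempfile(fileext = ".csv")
fwrite(dt, f_fwrite)
write.csv(dt, f_base, row.names = FALSE)

# Line 1 is the header; line 2 holds the single numeric value.
fwrite_value <- readLines(f_fwrite)[2]
base_value   <- readLines(f_base)[2]
print(c(fwrite = fwrite_value, write.csv = base_value))
```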

mattdowle · 2016-11-02T20:22:40Z

integer64 implemented: 6d55d2f

HughParsonage · 2016-11-07T00:37:37Z

Can you clarify what you are looking for in "Confirm fwrite() writes 10GB ok on Windows (it should do) to ensure 'big' file > 4GB ok. User help needed please as we don't have Windows other than via AppVeyor for test suite."? I managed to do it (Windows 10) on a 14GB file -- with a minor bug (there's an "s." printed on the far right of the console afterwards).

It's also tremendously fast: less than a minute (fread takes 5 minutes; readRDS takes 3:40).

mattdowle · 2016-11-07T08:56:51Z

@HughParsonage Perfect - that's a pass then. Thanks! Windows has different C functions for reading from files bigger than 4GB so it was feasible that something extra was required for writing too.
I'll increase the blanking width that clears the progress status ... sounds like what the s. is.

MichaelChirico · 2016-11-11T16:58:29Z

excellent stuff Matt, thanks so much!!

skanskan · 2016-11-18T10:34:32Z

Do we need to use "library(bit64)" with fwrite and fread when we have long numbers or not anymore?

stanislav-a · 2016-11-21T12:15:42Z

Thank you very much for your work, this feature is really useful.
But the latest version does not look very stable.

I caught 2 strange issues:

showProgress should be set explicitly:

a <- c("1", "2", "3", "4", "5")
d <- rep("2016-11-21", 5)
c <- rep("a", 5)
m <- rep("0.5", 5)

data<-data.table(a, d, c, m)

fwrite(data, "e:/tmp_buf/tmp.csv", sep="~",
       col.names=FALSE, append=FALSE, ..turbo = T, quote = F)


#Error: isLOGICAL(showProgress) is not TRUE

eol delimiter does not work correctly

fwrite(data, "e:/tmp_buf/tmp.csv", sep="~",
       col.names=FALSE, append=FALSE, ..turbo = T, quote = F, showProgress = T)

#result:
#"1"~"2016-11-21"~"a"~"0.5""2"~"2016-11-21"~"a"~"0.5""3"~"2016-11-21"~"a"~"0.5""4"~"2016-11-21"~"a"~"0.5""5"~"2016-11-21"~"a"~"0.5"
#No eol delimiters

fwrite(data, "e:/tmp_buf/tmp.csv", sep="~",
       eol = "\r\n",
       col.names=FALSE, append=FALSE, ..turbo = T, quote = F, showProgress = T)
#result:
#"1"~"2016-11-21"~"a"~"0.5""2"~"2016-11-21"~"a"~"0.5""3"~"2016-11-21"~"a"~"0.5""4"~"2016-11-21"~"a"~"0.5""5"~"2016-11-21"~"a"~"0.5"
#Still no eol delimiters

Also, I would like to know how I can write .csv files without scientific notation. For example, 93434234223523523.5 is converted to 9.34342342235235E+016. If I want to use this file for a bulk insert, I'll have problems. Can I explicitly set the number of decimal places?

jangorecki · 2016-11-21T14:10:31Z

@stanislav-a It would be useful if you could provide your sessionInfo() and read.dcf(system.file("DESCRIPTION", package="data.table"), "Commit"). Ideally, also re-run on the latest version, as lots of improvements were made recently. I'm on Linux and cannot reproduce the problems you reported. Re scientific notation: it will round your number 93434234223523523.5 on writing, similarly to write.csv. The exact floating point range is mentioned in the manual, ?fwrite.
@skanskan How do you store your long numbers without bit64? If you store them as double, they will be processed as double and you don't need bit64.
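
As a hedged illustration of both points: current data.table releases document a scipen argument on fwrite (it is not mentioned anywhere in this thread, so treat its availability as an assumption about your installed version). The sketch below uses it to favour fixed notation, and then shows why a double cannot hold every large integer exactly.

```r
library(data.table)

dt <- data.table(x = 1e10)

tmp <- tempfile(fileext = ".csv")
# scipen biases the choice towards fixed notation; 50 is an arbitrary large value.
fwrite(dt, tmp, scipen = 50)

value <- readLines(tmp)[2]
print(value)

# Separately: doubles carry roughly 15 to 17 significant digits, so
# neighbouring integers beyond 2^53 collapse to the same stored value.
same_double <- (2^53 + 1 == 2^53)
print(same_double)
```

scipen only changes how the value is formatted. It does not recover digits that the double never stored.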

stanislav-a · 2016-11-21T15:13:56Z

@jangorecki I reinstalled the package from the latest commit and now it works fine, thank you.

thvasilo · 2016-12-02T11:51:16Z

I can confirm the first issue @stanislav-a has mentioned, I'm on the 1.9.8 release.

I use the following generated file, it's a simple csv file: https://gist.github.com/thvasilo/6edffdccda87f09572cbc4184662af47

surv_1k <- fread("surv_1k.csv")

fwrite(surv_1k, "copy.csv")

# Error: isLOGICAL(showProgress) is not TRUE

Session info:

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] purrr_0.2.2      caret_6.0-73     ggplot2_2.2.0    lattice_0.20-34  data.table_1.9.8

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7        magrittr_1.5       splines_3.3.2      MASS_7.3-45       
 [5] munsell_0.4.3      colorspace_1.2-6   foreach_1.4.3      minqa_1.2.4       
 [9] stringr_1.1.0      car_2.1-4          plyr_1.8.4         tools_3.3.2       
[13] parallel_3.3.2     nnet_7.3-12        pbkrtest_0.4-6     grid_3.3.2        
[17] gtable_0.2.0       nlme_3.1-128       mgcv_1.8-16        quantreg_5.29     
[21] MatrixModels_0.4-1 iterators_1.0.8    lme4_1.1-12        lazyeval_0.2.0    
[25] assertthat_0.1     tibble_1.2         Matrix_1.2-7.1     nloptr_1.0.4      
[29] reshape2_1.4.2     ModelMetrics_1.1.0 codetools_0.2-15   stringi_1.1.1     
[33] scales_0.4.1       stats4_3.3.2       SparseM_1.74

I haven't tried the latest master.

MichaelChirico · 2016-12-02T12:30:57Z

please update, Matt just fixed this

david-awam-jansen · 2016-12-05T18:42:39Z

I just upgraded to 1.9.8 today and still have the same issue.
When I try and save a csv file using fwrite I still get "# Error: isLOGICAL(showProgress) is not TRUE"

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] xtable_1.8-2 lubridate_1.6.0 ggrepel_0.6.3 data.table_1.10.0 cowplot_0.7.0 ggplot2_2.2.0 RPostgreSQL_0.4-1 DBI_0.5-1

loaded via a namespace (and not attached):
[1] Rcpp_0.12.8 assertthat_0.1 grid_3.3.2 plyr_1.8.4 gtable_0.2.0 magrittr_1.5 scales_0.4.1 stringi_1.1.2 lazyeval_0.2.0 tools_3.3.2
[11] stringr_1.1.0 munsell_0.4.3 colorspace_1.3-0 knitr_1.15 tibble_1.2

jangorecki · 2016-12-05T18:52:10Z

@MichaelChirico David is already on 1.10 according to session info.
@david-awam-jansen Please open a new issue with your report. If possible, include code to reproduce it (at least on your machine), but please include only the relevant part. I see in your session info that you have many other unrelated packages loaded. Before reporting, it is always good to ensure that the issue is reproducible in a clean session in the R console. #1111 is the same issue but on fread; you may try one solution from there:

what solved the problem is the closing of all R sessions running on the computer before installing data.table
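
After reinstalling, one way to confirm which copy of data.table a session picked up is sketched below. It uses only base R calls and is not an official diagnostic procedure.

```r
# Run in a fresh R session with no other packages attached.
library(data.table)

# Version of the copy of data.table that this session actually loaded.
loaded_version <- packageVersion("data.table")
print(loaded_version)

# Library path it was loaded from; a stale copy in another library can
# explain a mismatch between the installed and the running version.
install_path <- system.file(package = "data.table")
print(install_path)

# Commit field from DESCRIPTION, as requested earlier in the thread
# (NA if the build did not record one).
print(read.dcf(system.file("DESCRIPTION", package = "data.table"), "Commit"))
```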

mattdowle added the benchmark label Apr 20, 2016

mattdowle added this to the v1.9.8 milestone Apr 20, 2016

mattdowle changed the title ~~fwrite final items~~ fwrite(): final items Apr 20, 2016

mattdowle added enhancement and removed benchmark labels Apr 20, 2016

jangorecki added the fread label May 12, 2016

arunsrinivasan added fwrite and removed fread labels May 13, 2016

MichaelChirico mentioned this issue May 19, 2016

fwrite with POSIX type #1715

Closed

arunsrinivasan mentioned this issue Jul 14, 2016

fwrite doesn't save dates #1772

Closed

arunsrinivasan modified the milestones: v2.0.0, v1.9.8 Aug 26, 2016

MichaelChirico mentioned this issue Sep 2, 2016

Interim reset of default for quote on fwrite #1838

Closed

MichaelChirico mentioned this issue Sep 19, 2016

Fwrite fails when writing integer64 #1850

Closed

mattdowle added a commit that referenced this issue Oct 29, 2016

fwrite scientific/decimal format to exactly match write.csv, #1664

6c1ed96

mattdowle added a commit that referenced this issue Nov 3, 2016

fwrite quote='auto' implemented and tests added. #1664

601d2df

mattdowle added a commit that referenced this issue Nov 3, 2016

fwrite gains dec for changing '.' to ','. #1664

bcf60ac

mattdowle added a commit that referenced this issue Nov 3, 2016

fwrite gains progress meter, #1664

eb9f7ef

mattdowle added a commit that referenced this issue Nov 4, 2016

Added comments to fwrite.c as to why not interruptable for now, #1664

1a4263f

mattdowle added a commit that referenced this issue Nov 4, 2016

Refined fwrite error messages, #1664

28502a9

mattdowle added a commit that referenced this issue Nov 5, 2016

fwrite gains logicalAsInt, buffMB and nThread. maxLineLength now an estimate based on sample for efficiency and to prep for sep2 now we can realloc the buffers if needed. #1664

0d7bdf0

mattdowle added a commit that referenced this issue Nov 5, 2016

fwrite thread safe realloc buffers when longer lines are seen than longest from sample. #1664

35cac46

mattdowle added a commit that referenced this issue Nov 7, 2016

fwrite sep2 implemented. #1664. Closes #806.

bb12b7b

mattdowle added a commit that referenced this issue Nov 7, 2016

Increased progress meter blanking width, thanks to testing from Hugh Parsonage, #1664

4364b61

mattdowle added a commit that referenced this issue Nov 7, 2016

Added test of quote='auto' containing sep2 when and when not list columns are present. Changed default sep2 from ; to | to distinguish it more from sep=, default. #1664

0b656b2

mattdowle mentioned this issue Nov 7, 2016

First version of the fwrite function #580 #1613

Merged

mattdowle added a commit that referenced this issue Nov 7, 2016

Tidied ?fwrite, #1664. Added tests for plain list() input.

33f63a8

mattdowle added a commit that referenced this issue Nov 7, 2016

Simplified fwrite example. #1664

bcefe8d

mattdowle added a commit that referenced this issue Nov 8, 2016

fwrite ITime implemented, #1664

cac3b6e

mattdowle added a commit that referenced this issue Nov 9, 2016

fwrite Date and IDate implemented, #1664. Including dateAs='yyyy-mm-dd'|'yyyymmdd'|'epoch'

de932e0

mattdowle closed this as completed in 0f10613 Nov 11, 2016

mattdowle added a commit that referenced this issue Nov 11, 2016

fwrite > 1e6 columns fixed stack overflow segfault by removing VLAs at C level. Closes #1903. #1664.

4fb148f

mattdowle mentioned this issue Nov 18, 2016

[R-Forge #2622] Add command "fwrite" to faster save csv files #580

Closed

franknarf1 mentioned this issue Feb 1, 2018

Use lookup (join to dictionary) for performance boost #2603

Open

jangorecki added the openmp label May 30, 2018

MichaelChirico added the idate/itime label Sep 8, 2020
