Trying to concat rows of ~55000 CSV files with a cumulative size of 1.4gb, xsv killed by oom_reaper #230

Closed
smabie opened this issue Jul 27, 2020 · 6 comments

smabie commented Jul 27, 2020

Hi, so I have 55161 CSV files in a directory (1.csv to 55161.csv). I'm trying to concat them all with:

xsv cat rows $(ls *.csv | sort -n) -o daily.csv

But xsv is being killed by the oom_reaper after exhausting all of my 32GB of RAM. I wouldn't really expect xsv cat to use very much memory at all, much less over 30GB!

Does anyone know what's going on?

BurntSushi (Owner) commented Jul 27, 2020

Please provide a reproduction. If you can't share the data, then please consider obfuscating or censoring it somehow. Indeed, this command should use very little memory. Its code is very simple and it is implemented in a straightforward streaming fashion:

xsv/src/cmd/cat.rs, lines 71 to 84 at commit 3de6c04:

fn cat_rows(&self) -> CliResult<()> {
    let mut row = csv::ByteRecord::new();
    let mut wtr = Config::new(&self.flag_output).writer()?;
    for (i, conf) in self.configs()?.into_iter().enumerate() {
        let mut rdr = conf.reader()?;
        if i == 0 {
            conf.write_headers(&mut rdr, &mut wtr)?;
        }
        while rdr.read_byte_record(&mut row)? {
            wtr.write_byte_record(&row)?;
        }
    }
    wtr.flush().map_err(From::from)
}

The only thing that's required is that each row must fit into memory.

smabie (Author) commented Jul 27, 2020

Okay, here's a link to the tarball: https://drive.google.com/file/d/19UdCh9qFeuZsy1JOYUQvEPl773EVuvVc/view

So, steps to reproduce:

tar xf data.tar.gz
cd data
xsv cat rows *.csv -o out.csv

By looking at top, you'll see that xsv consumes more and more memory until it is killed by the oom_reaper.
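
If a single concrete number is more useful than watching top, GNU time, assuming it is installed at /usr/bin/time, reports peak memory among its statistics:

$ /usr/bin/time -v xsv cat rows *.csv -o out.csv

The "Maximum resident set size" line in its report shows how much memory the process reached before being killed.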

Thanks for the help!

smabie (Author) commented Jul 27, 2020

Oh, and:

$ xsv --version
0.13.0

BurntSushi (Owner) commented

Thank you for the easy reproduction! Unfortunately, this is a problem with the argv parser that xsv uses: docopt/docopt.rs#207

At some point, I'd like to move off that parser and use clap instead. But it's a big refactor.

The only work-around available to you, I think, is to chunk it up into multiple xsv processes. The simplest way to do that is with xargs:

$ find ./ -name '*.csv' -print0 | xargs -0 -n1000 xsv cat rows > ../out.csv
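
If the numeric order of 1.csv through 55161.csv matters for the output, a hypothetical variant of the same chunking idea, assuming GNU sort (its -z option handles NUL-delimited input), is:

$ printf '%s\0' *.csv | sort -z -n | xargs -0 -n1000 xsv cat rows > ../out.csv

Each batch is still a separate xsv process, so no single invocation hands the argv parser more than 1000 file names.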

BurntSushi added the bug label Jul 28, 2020
smabie (Author) commented Jul 29, 2020

Thanks, I ended up just using awk instead:

$ awk '(NR==1)||(FNR>1)' $(ls *.csv | sort -n) > daily.csv

Quite elegant! But I digress. I never thought I would see the day when a command-line parser eats all of my RAM. It's probably trying to do something far too clever!

smabie closed this as completed Jul 29, 2020
BurntSushi (Owner) commented Jul 29, 2020

awk can't parse CSV correctly, so I'd be careful with that. That one-liner assumes the header record of each file only spans a single physical line, which might be true in your case but isn't in general.
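
For example, with a hypothetical pair of files whose header record spans two physical lines because a quoted field contains a newline, the awk one-liner leaks part of the second file's header into the data:

$ printf 'name,"notes\nand more",value\na,b,1\n' > 1.csv
$ printf 'name,"notes\nand more",value\nc,d,2\n' > 2.csv
$ awk '(NR==1)||(FNR>1)' 1.csv 2.csv
name,"notes
and more",value
a,b,1
and more",value
c,d,2

A real CSV parser treats the first two physical lines as one header record, so xsv cat rows should write it once and keep only the data rows from the second file.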

It's probably trying to do something far too clever!

I wrote the parser and abandoned it ages ago, because of this and other problems. The specific problem is that it uses backtracking to implement the "docopt" style. So it goes exponential in the worst case. I'd say it's decidedly not clever.
