I tried CSV.File, CSV.read with NamedTuples, DTable, DataFrame, etc.; all have the same issue.
There's just some sticky memory left over from loading the CSV files.
The table generated is about 1.6 GB (4 columns × 1e8 rows × 4 bytes per Int32).
# generate
using DataFrames
d = DataFrame((; [Symbol("a$i") => rand(Int32(1):Int32(1000), Int(1e8)) for i in 1:4]...));
# run GC.gc() a few times, memory usage settles at ~

# prep
using CSV
genchunk = () -> (; [Symbol("a$i") => rand(Int32(1):Int32(1000), Int(1e7)) for i = 1:4]...)
mkpath("data")

for i = 1:10
    CSV.write(joinpath(["data", "datapart_$i.csv"]), genchunk())
end

# load from multiple files
files = readdir("data", join=true)  # paths to the chunk files (assumed; the original snippet doesn't show how `files` was built)
d = CSV.read(files, DataFrame)
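A quick way to compare the two cases is to measure the table itself rather than watching process memory. A minimal sketch, assuming `d` is the table from either snippet above (the numbers are what I'd expect, not measured output from the original report):

using DataFrames
# 4 Int32 columns × 1e8 rows × 4 bytes ≈ 1.6e9 bytes for the generated table
Base.summarysize(d) / 1e9
# the column element types show how many bytes each value actually costs in memory
eltype.(eachcol(d))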
(screenshots: memory usage when the table is generated in-process vs. loaded from files)
d = CSV.read(files, DataFrame, types=Int32)
Forgot it parses integers as Int64 by default, and that's where my double memory usage was coming from.
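To make the difference concrete, here's a small sketch comparing the default parse with the explicit `types=Int32` call. The `files` vector is assumed to be the chunk paths from the prep step above, and the annotated results are what I'd expect rather than measured output:

using CSV, DataFrames
files = readdir("data", join=true)               # chunk files written in the prep step (assumed)

d64 = CSV.read(files, DataFrame)                 # integers parse as Int64 by default
d32 = CSV.read(files, DataFrame, types=Int32)    # force Int32 columns

eltype.(eachcol(d64))                            # Int64 for every column
eltype.(eachcol(d32))                            # Int32 for every column
Base.summarysize(d64) / Base.summarysize(d32)    # ≈ 2, i.e. the "double" memory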
The sticky memory related to the glibc issue is still observable on my end, though, but that's a different issue.
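For reference, the glibc part is commonly worked around on Linux by asking the allocator to return freed pages to the OS after a GC pass. A minimal sketch (Linux/glibc only, so not applicable to my Windows setup, and not part of the original report):

# Linux/glibc only: collect garbage, then ask glibc to give freed heap pages back to the OS
GC.gc(); GC.gc()
ccall(:malloc_trim, Cint, (Csize_t,), 0)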
I'm on Julia master/1.8 and Windows.