Accessing data in a DataFrameColumn is insanely slow. #5966
Labels
Microsoft.Data.Analysis
All DataFrame related issues and PRs
perf
Performance and Benchmarking related
Describe the bug
Accessing data in a PrimitiveDataFrameColumn<> is very slow.
To Reproduce
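using Microsoft.Data.Analysis;
int n = 1_000_000;
// Element type assumed to be int, since the loop assigns the literal 1.
PrimitiveDataFrameColumn<int> column = new PrimitiveDataFrameColumn<int>("Name", n);
for (int i = 0; i < n; i++)
    column[i] = 1;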
Expected behavior
Filling in values in a column should cost only a few clock cycles per value, so at least 100 million values per second should be achievable on a normal computer. Instead, filling 1 million elements takes around 0.5 s on a new high-performance laptop.
Is it simply that nullable objects are this slow? If that is the case, why did you go for such a technology for a data processing library where performance is a key factor?
For perspective, writing the data to disk is 10 times faster!
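To help answer the nullable question, a baseline along these lines could be timed next to the column loop. It is only a rough sketch: it fills a plain `int[]` and an `int?[]` of the same size using nothing but the standard library, so it says nothing about the internals of Microsoft.Data.Analysis. The class and method names (`NullableFillBaseline`, `FillPlain`, `FillNullable`) are made up for illustration, and each timing includes the array allocation.

```csharp
using System;
using System.Diagnostics;

public static class NullableFillBaseline
{
    // Writes the value 1 into every slot of a plain int array.
    public static int[] FillPlain(int n)
    {
        var data = new int[n];
        for (int i = 0; i < n; i++)
            data[i] = 1;
        return data;
    }

    // Same loop, but each slot is a Nullable<int> (a value plus a HasValue flag).
    public static int?[] FillNullable(int n)
    {
        var data = new int?[n];
        for (int i = 0; i < n; i++)
            data[i] = 1;
        return data;
    }

    // Runs one fill and prints the elapsed time and the cost per element.
    private static void Report(string label, int n, Action fill)
    {
        var sw = Stopwatch.StartNew();
        fill();
        sw.Stop();
        double ms = sw.Elapsed.TotalMilliseconds;
        double nsPerElement = ms * 1_000_000.0 / n;
        Console.WriteLine($"{label}: {ms:F2} ms, {nsPerElement:F2} ns/element");
    }

    public static void Main()
    {
        const int n = 1_000_000;

        // Warm-up calls so JIT compilation is not part of the measurement.
        FillPlain(n);
        FillNullable(n);

        Report("int[]", n, () => FillPlain(n));
        Report("int?[]", n, () => FillNullable(n));
    }
}
```

If the `int?[]` loop turns out to be far faster than the column loop, nullable storage by itself would not account for the 0.5 s, and the overhead would have to be somewhere in the column's indexer.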