DataFrame performance is relatively slow and can be improved.
As this is a complex issue, it makes sense to split it into several independent steps. This Epic is a container for the related changes, keeping them accessible from one place. Here is the list of proposed changes:
Improve Performance of DataFrame Arithmetic Operations
Improve the speed of binary arithmetic and comparison operations on columns with the same underlying data type.
This can be achieved by improving the PrimitiveDataFrameColumn.Clone method to use memory block copying. Avoid using the CloneAs method, which involves type conversion, for columns with the same data type.
PR: Improve performance of column cloning inside DataFrame arithmetics #6814
PR: Improve performance of DataFrame binary comparison operations #6869
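To illustrate the difference between the two cloning strategies, here is a minimal sketch. It does not use the DataFrame internals: `CloneSketch` and both method names are hypothetical, and plain `int` arrays stand in for the column buffers. The first method mimics a conversion-based clone, the second a block copy.

```csharp
using System;

// Hypothetical helper, not part of Microsoft.Data.Analysis.
public static class CloneSketch
{
    // Element-by-element clone with a per-value conversion, similar in spirit
    // to a CloneAs-style path even when source and target types match.
    public static int[] CloneWithConversion(ReadOnlySpan<int> source)
    {
        var result = new int[source.Length];
        for (int i = 0; i < source.Length; i++)
        {
            result[i] = Convert.ToInt32(source[i]);
        }
        return result;
    }

    // Block copy: a single bulk memory copy with no per-element work.
    public static int[] CloneWithBlockCopy(ReadOnlySpan<int> source)
    {
        var result = new int[source.Length];
        source.CopyTo(result);
        return result;
    }
}
```

Both methods return the same values; the block copy simply avoids the per-element call when no conversion is needed.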
Reduce the number of copies in binary operations for columns with different underlying data types (for example, Int32DataFrameColumn + Int16DataFrameColumn).
We make copies of columns in the binary operation APIs mostly to reuse existing code. This is a well-known issue. There are already tasks to eliminate excessive copying and to address the behavior of binary operations when types mismatch.
Issue: Reduce the number of copies in binary operations in DataFrame #5663
Issue: Improve PrimitiveDataFrameColumn.BinaryOperations.tt #5665
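As a rough sketch of what "fewer copies" could mean for an Int32 + Int16 addition, the example below compares a two-step approach (widen into a temporary, then add) with a single pass that widens on the fly. `MixedTypeAddSketch` and its methods are hypothetical names, and plain spans stand in for the column buffers; this is not how the library currently implements the operation.

```csharp
using System;

// Hypothetical helper, not part of Microsoft.Data.Analysis.
public static class MixedTypeAddSketch
{
    // Two-step approach: materialize a widened copy of the short values,
    // then add. Allocates a temporary array the size of the column.
    public static int[] AddViaTemporaryCopy(ReadOnlySpan<int> left, ReadOnlySpan<short> right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Columns must have the same length.");

        var widened = new int[right.Length];
        for (int i = 0; i < right.Length; i++)
            widened[i] = right[i];

        var result = new int[left.Length];
        for (int i = 0; i < left.Length; i++)
            result[i] = left[i] + widened[i];
        return result;
    }

    // Single pass: each short is widened as it is read,
    // so only the result array is allocated.
    public static int[] AddInSinglePass(ReadOnlySpan<int> left, ReadOnlySpan<short> right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Columns must have the same length.");

        var result = new int[left.Length];
        for (int i = 0; i < left.Length; i++)
            result[i] = left[i] + right[i];
        return result;
    }
}
```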
Increase the speed of PrimitiveDataFrameColumn initialization by fixing the AppendMany(value, count) method, which is used in all PrimitiveDataFrameColumn constructors.
PR: PrimitiveDataFrameColumn.Clone method crashes when is used with IEnumerable mapIndices argument #6822
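The issue does not say what exactly is slow in AppendMany, so the following is only a generic sketch of one-by-one appending versus a bulk fill. `AppendManySketch` and its method names are hypothetical, and a `List<double>` and an array stand in for the column's internal buffers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Microsoft.Data.Analysis.
public static class AppendManySketch
{
    // Appends one value at a time; the backing storage may be
    // resized several times as the list grows.
    public static List<double> AppendOneByOne(double value, int count)
    {
        var buffer = new List<double>();
        for (int i = 0; i < count; i++)
            buffer.Add(value);
        return buffer;
    }

    // Allocates the final size once and fills it with a single bulk call.
    public static double[] AppendInBulk(double value, int count)
    {
        var buffer = new double[count];
        buffer.AsSpan().Fill(value);
        return buffer;
    }
}
```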
Improve Nullable support during arithmetic operations
Issue: Improve Nullable support during dataframe arithmetic operations #6825
Consider how to implement Nullable support in Elementwise operations without any decrease in performance
Issue: All DataFrame Elementwise methods uncorrectly work with NULL values #6820
Use SIMD vectorization
Issue: Optimize binary operations on DataFrame #5695
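A minimal sketch of SIMD-style addition using `System.Numerics.Vector<T>` is shown below. It assumes a runtime where the span-based `Vector<T>` constructor and `CopyTo(Span<T>)` are available (for example .NET Core 2.1 or later); `SimdAddSketch` is a hypothetical name, and the code ignores null handling entirely.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of Microsoft.Data.Analysis.
public static class SimdAddSketch
{
    public static void Add(ReadOnlySpan<int> left, ReadOnlySpan<int> right, Span<int> result)
    {
        if (left.Length != right.Length || left.Length != result.Length)
            throw new ArgumentException("All spans must have the same length.");

        int width = Vector<int>.Count; // number of ints per hardware vector
        int i = 0;

        // Vectorized part: process `width` elements per iteration.
        for (; i <= left.Length - width; i += width)
        {
            var sum = new Vector<int>(left.Slice(i, width))
                    + new Vector<int>(right.Slice(i, width));
            sum.CopyTo(result.Slice(i, width));
        }

        // Scalar tail for the remaining elements.
        for (; i < left.Length; i++)
            result[i] = left[i] + right[i];
    }
}
```

The scalar tail keeps the method correct for lengths that are not a multiple of the vector width.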
Add performance benchmarks
Issue: Add performance benchmarks for dataframe arithmetic operations #6826
Improve Performance of Filtering
Faster way to Filter
Issue: Q: Faster way to Filter DataView #6164
Improve Performance of Indexing
Accessing PrimitiveDataFrameColumn elements by index involves converting Memory to Span on each operation, which is very slow. We can consider using unmanaged memory in DataFrameBuffer instead. This would also address the issues with converting to and from Apache Arrow and the heavy load on the GC.
Issue: Accessing data in a DataFrameColumn is insanely slow. #5966
Issue: DataFrame GetMutableBuffer method and ReadOnlyBuffer issues #6715
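The cost of the per-access Memory to Span conversion can be sketched as follows. This does not implement the unmanaged-memory proposal; it only contrasts calling `.Span` for every element with obtaining the span once. `IndexingSketch` and its methods are hypothetical, and a `ReadOnlyMemory<int>` stands in for a column buffer.

```csharp
using System;

// Hypothetical helper, not part of Microsoft.Data.Analysis.
public static class IndexingSketch
{
    // Converts Memory to Span on every element access,
    // as an indexer built on top of Memory<T> would.
    public static long SumWithPerElementSpan(ReadOnlyMemory<int> buffer)
    {
        long sum = 0;
        for (int i = 0; i < buffer.Length; i++)
            sum += buffer.Span[i];
        return sum;
    }

    // Obtains the span once and indexes into it directly.
    public static long SumWithHoistedSpan(ReadOnlyMemory<int> buffer)
    {
        ReadOnlySpan<int> span = buffer.Span;
        long sum = 0;
        for (int i = 0; i < span.Length; i++)
            sum += span[i];
        return sum;
    }
}
```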