Improve Nullable support during dataframe arithmetic operations #6825

Closed
asmirnov82 opened this issue Sep 14, 2023 · 2 comments · Fixed by #6846
Labels: enhancement (New feature or request)

asmirnov82 (Contributor) commented Sep 14, 2023:

During arithmetic operations, DataFrame clones the left-hand column into the result (so the result inherits its validity bitmap) and then checks the right-hand column's validity bitmap for NULL values.

For example, Multiply clones the left column when the inPlace parameter is false (the default behavior):

```csharp
PrimitiveDataFrameColumn<U> newColumn = inPlace ? primitiveColumn : primitiveColumn.Clone();
newColumn._columnContainer.Multiply(column._columnContainer);
```

Then, inside the container, validity is checked for each value:

```csharp
for (int i = 0; i < span.Length; i++)
{
    if (BitmapHelper.IsValid(right.NullBitMapBuffers[b].ReadOnlySpan, i))
    {
        span[i] = (double)(span[i] * otherSpan[i]);
    }
    else
    {
        left[index] = null;
    }

    index++;
}
```

The per-element validity check is a very slow operation. Instead, it's possible to calculate the raw values first and then compute the result's validity bitmap with bitwise AND logic, processing a whole byte at a time:

```csharp
// Calculate raw values (no per-element validity check)
for (int i = 0; i < span.Length; i++)
{
    resultSpan[i] = (double)(span[i] * otherSpan[i]);
}

// Calculate validity (nulls) for the whole bitmap with a bitwise AND
resultValidityBitmap = Bitmap.ElementWiseAnd(validityBitmap, otherValidityBitmap);
```
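The two-pass idea can be illustrated with a minimal, self-contained sketch. It assumes a nullable `double` column is stored as a plain value array plus an Arrow-style validity bitmap, where bit `i % 8` of byte `i / 8` is 1 when element `i` is valid. `MultiplyNullable` and `IsValid` are hypothetical helpers written for illustration. They are not the `Microsoft.Data.Analysis` API. The first pass multiplies every element without branching on validity. The second pass ANDs the two bitmaps one byte (eight elements) at a time.

```csharp
using System;

// Sketch only: MultiplyNullable and IsValid are hypothetical helpers, not the
// Microsoft.Data.Analysis API. Assumes an Arrow-style validity bitmap:
// bit (i % 8) of byte (i / 8) is 1 when element i is valid.
static (double[] Values, byte[] Validity) MultiplyNullable(
    ReadOnlySpan<double> left, ReadOnlySpan<byte> leftValidity,
    ReadOnlySpan<double> right, ReadOnlySpan<byte> rightValidity)
{
    if (left.Length != right.Length)
        throw new ArgumentException("Columns must have the same length.");

    // Pass 1: compute raw values for every element, null or not,
    // with no per-element validity branch.
    var values = new double[left.Length];
    for (int i = 0; i < values.Length; i++)
        values[i] = left[i] * right[i];

    // Pass 2: a result is valid only if both inputs are valid,
    // so AND the bitmaps a whole byte (eight elements) at a time.
    int byteCount = (left.Length + 7) / 8;
    var validity = new byte[byteCount];
    for (int b = 0; b < byteCount; b++)
        validity[b] = (byte)(leftValidity[b] & rightValidity[b]);

    return (values, validity);
}

// Per-element bit test, used here only to inspect the result.
static bool IsValid(ReadOnlySpan<byte> bitmap, int index) =>
    (bitmap[index / 8] & (1 << (index % 8))) != 0;

// Usage: element 1 is null on the left, element 8 is null on the right.
double[] a = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
double[] c = { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 };
byte[] aValid = { 0b1111_1101, 0b0000_0011 };
byte[] cValid = { 0b1111_1111, 0b0000_0010 };

var (product, productValid) = MultiplyNullable(a, aValid, c, cValid);
for (int i = 0; i < product.Length; i++)
    Console.WriteLine(IsValid(productValid, i) ? product[i].ToString() : "null");
```

Null slots still get a raw value computed (`product[1]` holds 4). That value is ignored because its validity bit is 0, so the loop prints `null` at positions 1 and 8.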

asmirnov82 added the enhancement label Sep 14, 2023
ghost added the untriaged label Sep 14, 2023
asmirnov82 changed the title from "Improve Nullable support during dataframe arithmetic operations and avoid excessive cloning of the left side" to "Improve Nullable support during dataframe arithmetic operations" Sep 21, 2023
asmirnov82 (Contributor, Author) commented Sep 21, 2023:

It's also possible to get rid of cloning the left side and instead create a new, empty column for the results. However, my investigation shows no dramatic performance improvement from avoiding the clone. On the other hand, it requires quite a lot of code changes in both PrimitiveDataFrameColumn.BinaryOperations.tt and PrimitiveDataFrameColumn.BinaryOperationImplementations.Exploded.tt (the current DataFrame implementation provides two different implementations of arithmetic: one for PrimitiveDataFrameColumn and another for its inheritors, such as Int32DataFrameColumn). So I decided to postpone removing the left-side clone until the template files are simplified and the duplicate implementations are removed.
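The two options for the non-in-place path can be contrasted in a minimal sketch. It assumes the same simplified model: a plain value array plus an Arrow-style validity bitmap. `MultiplyViaClone` and `MultiplyIntoNew` are hypothetical names for illustration, not DataFrame methods. The first mirrors the current clone-then-update pattern. The second mirrors the postponed alternative of writing into a freshly allocated result. In this sketch, the only work the second one saves is copying the left values and bitmap once.

```csharp
// Sketch only: a nullable double column is modeled as a value array plus an
// Arrow-style validity bitmap (bit i % 8 of byte i / 8 set = element i valid).
// Both functions are hypothetical and are not the Microsoft.Data.Analysis API.

// Current approach: clone the left side, then update the clone in place.
static (double[] Values, byte[] Validity) MultiplyViaClone(
    double[] left, byte[] leftValidity, double[] right, byte[] rightValidity)
{
    var values = (double[])left.Clone();          // extra copy of every left value
    var validity = (byte[])leftValidity.Clone();  // extra copy of the left bitmap
    for (int i = 0; i < values.Length; i++)
        values[i] *= right[i];
    for (int b = 0; b < validity.Length; b++)
        validity[b] &= rightValidity[b];
    return (values, validity);
}

// Postponed alternative: allocate an empty result and write into it directly,
// so the left side is read but never copied.
static (double[] Values, byte[] Validity) MultiplyIntoNew(
    double[] left, byte[] leftValidity, double[] right, byte[] rightValidity)
{
    var values = new double[left.Length];
    var validity = new byte[leftValidity.Length];
    for (int i = 0; i < values.Length; i++)
        values[i] = left[i] * right[i];
    for (int b = 0; b < validity.Length; b++)
        validity[b] = (byte)(leftValidity[b] & rightValidity[b]);
    return (values, validity);
}

// Usage: element 1 is null on the left side.
double[] x = { 1.5, 2.0, 3.0 };
double[] y = { 2.0, 4.0, 1.0 };
byte[] xValid = { 0b0000_0101 };
byte[] yValid = { 0b0000_0111 };

var viaClone = MultiplyViaClone(x, xValid, y, yValid);
var intoNew = MultiplyIntoNew(x, xValid, y, yValid);
```

Both calls leave the inputs untouched and return identical values and bitmaps. The choice between them is purely about allocation and copying, not about the result.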

Here are the results of my experiments (the first column shows speed with only the enhanced nullable support; the second shows enhanced nullable support plus avoiding the clone):
[image: benchmark results]

asmirnov82 (Contributor, Author) commented:

Final results after the PR was implemented:

[image: final benchmark results]

ghost removed the in-pr and untriaged labels Oct 3, 2023
ghost locked the issue as resolved and limited conversation to collaborators Nov 2, 2023