Optimize decimal: reduce validation when constructing a decimal array or casting to a decimal array #2313
Comments
In arrow-rs, there are many places that generate a decimal array. For example, when reading decimal(n,0) from a Parquet int64 column where n is greater than or equal to 18, we don't need to validate the resulting decimal array, because a value coming from int64 will not overflow the target precision. With this approach, we use less CPU for the decimal data type.
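A minimal sketch of the bound check described above, using only the standard library. The helper names (`max_for_precision`, `i64_always_fits`, `fits`) are hypothetical and are not arrow-rs APIs. One detail worth checking: `i64::MAX` has 19 decimal digits, so in this sketch the check can be skipped for every possible i64 only from precision 19 upward; at precision 18 skipping it also relies on the writer having stored in-range values.

```rust
/// Largest unscaled value representable with `precision` decimal digits,
/// i.e. 10^precision - 1. Hypothetical helper, not an arrow-rs API.
fn max_for_precision(precision: u32) -> i128 {
    10_i128.pow(precision) - 1
}

/// True when every possible i64 fits in `precision` digits, so per-value
/// validation could be skipped for an int64 source.
fn i64_always_fits(precision: u32) -> bool {
    let max = max_for_precision(precision);
    (i64::MAX as i128) <= max && (i64::MIN as i128) >= -max
}

/// Per-value check, needed only when the bound above does not hold.
fn fits(value: i64, precision: u32) -> bool {
    let max = max_for_precision(precision);
    let v = value as i128;
    v >= -max && v <= max
}

fn main() {
    for p in [9u32, 18, 19, 38].iter() {
        println!("precision {}: skip validation = {}", p, i64_always_fits(*p));
    }
    // A single value can still be checked individually when needed.
    println!("fits(10^18, 18) = {}", fits(1_000_000_000_000_000_000, 18));
}
```

The bound is computed once per column, so the per-value loop disappears entirely whenever the source type cannot exceed the target precision.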
Also potentially related to this: the way decimal data is currently read from Parquet is hopelessly inefficient (#2318). I keep meaning to fix it, but I haven't got to it yet. Perhaps I can find some time this weekend... 🤔
Your optimization will improve the performance of reading decimal data.
Decimal can also be deserialized from the Parquet INT32 or INT64 types.
PR #2383 may have improved performance as well.
We no longer perform validation except where it is explicitly opted into.
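To illustrate the opt-in pattern, here is a rough sketch with a hypothetical `DecimalColumn` type. It is a stand-in written for this example, not the arrow-rs decimal array, and the method names are assumptions. Construction does no per-value work, and the caller runs validation only when the data comes from a source it does not trust.

```rust
/// Hypothetical stand-in for a decimal array; not the arrow-rs type.
struct DecimalColumn {
    values: Vec<i128>,
    precision: u32,
}

impl DecimalColumn {
    /// Construction does no per-value work.
    fn new_unchecked(values: Vec<i128>, precision: u32) -> Self {
        DecimalColumn { values, precision }
    }

    /// Explicit, opt-in validation: returns the index of the first value
    /// that needs more than `precision` digits, if there is one.
    fn validate(&self) -> Result<(), usize> {
        let max = 10_i128.pow(self.precision) - 1;
        match self.values.iter().position(|v| *v > max || *v < -max) {
            Some(i) => Err(i),
            None => Ok(()),
        }
    }
}

fn main() {
    // Trusted source: build the column and skip validation entirely.
    let trusted = DecimalColumn::new_unchecked(vec![123, 99_999], 5);
    println!("trusted column holds {} values", trusted.values.len());

    // Untrusted source: the caller opts in to the check.
    let untrusted = DecimalColumn::new_unchecked(vec![123, 99_999, 100_000], 5);
    println!("{:?}", untrusted.validate());
}
```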
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When we use Ballista/DataFusion to execute queries on a table that has many decimal columns and profile the CPU, the validation of decimal values costs a lot of CPU: about 5%, according to the profile attached to the issue.
Describe the solution you'd like
Describe alternatives you've considered
Additional context
DecimalArray validation #2447