Reduce pandas dataframe overhead. #11058

Merged Dec 5, 2024 (4 commits)

@trivialfis (Member) commented Dec 4, 2024

Related: #10882

Performance is not yet back to the 1.7 level, but the regression is now much less severe than on master. It is still far from the performance with numpy input. The results below were obtained with the script in the referenced issue.

  • Lazily import some of the pandas modules.
  • Merge the type check into the type conversion.
  • Return early from the nullable type check.
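
The three changes above can be illustrated with a minimal, standalone sketch. This is not the code from this PR: the helper names (`lazy_module`, `convert_dtypes`, `has_nullable`), the dtype table, and the use of plain dtype-name strings in place of real pandas dtypes are all assumptions made for illustration.

```python
import importlib
from functools import lru_cache
from types import ModuleType
from typing import Iterable, List

# Hypothetical mapping from dtype names to internal type codes.
_DTYPE_TO_CODE = {
    "int32": "int",
    "int64": "int",
    "float32": "float",
    "float64": "float",
    "Int64": "int",
    "Float64": "float",
}
# Hypothetical set of nullable extension dtype names.
_NULLABLE = frozenset({"Int64", "Float64", "boolean"})


@lru_cache(maxsize=None)
def lazy_module(name: str) -> ModuleType:
    """Import a module on first use and cache the result for later calls."""
    return importlib.import_module(name)


def convert_dtypes(dtypes: Iterable[str]) -> List[str]:
    """Validate and convert in one pass: a failed lookup is the type check."""
    codes = []
    for dt in dtypes:
        try:
            codes.append(_DTYPE_TO_CODE[dt])
        except KeyError:
            raise ValueError(f"Unsupported dtype: {dt}") from None
    return codes


def has_nullable(dtypes: Iterable[str]) -> bool:
    """Return as soon as the first nullable dtype is found."""
    for dt in dtypes:
        if dt in _NULLABLE:
            return True
    return False
```

In this sketch the columns are walked once for conversion rather than once to validate and again to convert, and the nullable check stops at the first match instead of scanning every column.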

1.7

DataFrame iter_time:  0.2178664207458496 ms
DataFrame iter_time:  0.21868157386779785 ms
DataFrame iter_time:  0.23699164390563965 ms

PR

DataFrame iter_time:  0.27150797843933105 ms
DataFrame iter_time:  0.28162598609924316 ms
DataFrame iter_time:  0.278597354888916 ms

Master

DataFrame iter_time:  0.8856163024902344 ms
DataFrame iter_time:  0.9370462894439697 ms
DataFrame iter_time:  0.9501245021820068 ms
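
To put the three sets of timings side by side, the short script below averages each group and computes the ratios. The numbers are copied from the results above; the dictionary keys and variable names are illustrative, and with only three samples per configuration the ratios are a rough comparison at best.

```python
from statistics import mean

# Timings copied from the results above (units as reported: ms).
timings_ms = {
    "1.7": [0.2178664207458496, 0.21868157386779785, 0.23699164390563965],
    "PR": [0.27150797843933105, 0.28162598609924316, 0.278597354888916],
    "master": [0.8856163024902344, 0.9370462894439697, 0.9501245021820068],
}

means = {name: mean(values) for name, values in timings_ms.items()}

# How much faster the PR is than master, and how much slower than 1.7.
speedup_vs_master = means["master"] / means["PR"]  # roughly 3.3x
slowdown_vs_17 = means["PR"] / means["1.7"]  # roughly 1.2x

for name, value in means.items():
    print(f"{name:>6}: {value:.4f} ms")
print(f"PR vs master: {speedup_vs_master:.2f}x faster")
print(f"PR vs 1.7:    {slowdown_vs_17:.2f}x slower")
```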

@trivialfis requested a review from hcho3 on December 4, 2024 19:34
@trivialfis merged commit 96952fc into dmlc:master on Dec 5, 2024
28 of 30 checks passed
@trivialfis deleted the pd-overhead branch on December 5, 2024 07:58