[WIP] Don't use gc in pandas join benchmark. #84

Closed
wants to merge 2 commits into from

trivialfis
Contributor

  • gc is not guaranteed to free memory: each package can hold additional
    references in Python or in C/C++ as it sees fit.
  • Remove duplicated code in running benchmarks.

@jangorecki
Contributor

jangorecki commented May 3, 2019

The pandas join benchmark is out of date. Initially, in 2016, this benchmark mostly tested distributed tools plus pandas, data.table, and dplyr. If you are interested in those, you may want to check this talk: https://www.youtube.com/watch?v=5X7h1rZGVs0
The join scripts are left over from those old times, except for data.table, where I am drafting a new join benchmark now. The old one was very limited. Up-to-date info on join is in #18.
I am aware of the gc limitation; it is the same in R, but gc is included in the various tools for consistency.
Using wrapper functions in benchmark scripts was discussed in #41.
I am closing this. Please discuss the features/changes you want to push first, to avoid unnecessary work. Thanks for contributing!

@jangorecki jangorecki closed this May 3, 2019
@trivialfis
Contributor Author

@jangorecki Thanks for the explanation.

Tmonster added a commit to Tmonster/db-benchmark that referenced this pull request Jul 2, 2024
…h within the time limit for the smaller scale factors, but on the larger ones, I don't like waiting 6 hours only for a timeout to be reached (h2oai#84)