Produce representative dataframes for benchmarking #15911

Open
mrocklin opened this issue Apr 5, 2017 · 0 comments
Labels
Performance Memory or execution speed performance

Comments

@mrocklin
Contributor

mrocklin commented Apr 5, 2017

It would be convenient to have a canonical set of dataframes for use in testing and/or benchmarking. Ideally this would be a set of named dataframes that represent common forms of data, like the following:

  1. Random floating point data
  2. Random integer data
  3. Strings with low entropy
  4. Strings with high entropy
  5. Mostly sorted datetimes
  6. ...
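
As a rough illustration of the list above, here is a minimal sketch of what such a named collection might look like. The function name `make_benchmark_frames`, the dictionary keys, the single column name `x`, and the specific sizes and distributions are all assumptions made for this example; nothing like this exists in pandas as far as this issue states.

```python
import numpy as np
import pandas as pd


def make_benchmark_frames(n=10_000, seed=0):
    """Return a dict of named one-column dataframes (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # seeded so every run sees the same data

    # Low entropy: only three distinct values, repeated many times.
    low = rng.choice(["apple", "banana", "cherry"], size=n)

    # High entropy: 16-character hex strings that are almost surely unique.
    high = np.array([rng.bytes(8).hex() for _ in range(n)], dtype=object)

    # Mostly sorted datetimes: one-second steps with ~1% of positions shuffled.
    stamps = np.datetime64("2000-01-01", "ns") + np.arange(n) * np.timedelta64(1, "s")
    idx = rng.choice(n, size=max(1, n // 100), replace=False)
    stamps[idx] = stamps[rng.permutation(idx)]

    return {
        "random_float": pd.DataFrame({"x": rng.standard_normal(n)}),
        "random_int": pd.DataFrame({"x": rng.integers(0, 1_000_000, size=n)}),
        "low_entropy_str": pd.DataFrame({"x": low}),
        "high_entropy_str": pd.DataFrame({"x": high}),
        "mostly_sorted_datetime": pd.DataFrame({"x": stamps}),
    }
```

A benchmark could then look frames up by name, for example `make_benchmark_frames()["low_entropy_str"]`, and a fixed seed keeps the data identical across runs and libraries.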

These could then be used either within pandas or in other libraries for benchmarks. Having a shared set of dataframes would probably make benchmark results easier to compare across projects.

Additionally, if these were also packaged as pytest fixtures, we could imagine setting things up and tearing them down in a way that makes benchmarking more consistent (for example, by controlling garbage collection), though this may be a separate endeavor. It would be nice to have access to the dataframes outside of the context of pytest as well.
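
To make the garbage-collection idea concrete, here is a minimal sketch using only the standard library. The names `gc_paused` and `time_call` are hypothetical and chosen for this example; the approach (collect once, disable the collector while timing, restore the previous state afterwards) is one possible way to do the setup and teardown, not an established pandas convention.

```python
import gc
import time
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Collect garbage up front, then keep the collector off inside the block."""
    was_enabled = gc.isenabled()
    gc.collect()   # setup: start from a clean heap
    gc.disable()   # avoid collection pauses during the timed code
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()  # teardown: restore the previous collector state


def time_call(func, *args, repeat=5):
    """Best wall-clock time of func(*args) over `repeat` runs, with GC paused."""
    best = float("inf")
    with gc_paused():
        for _ in range(repeat):
            start = time.perf_counter()
            func(*args)
            best = min(best, time.perf_counter() - start)
    return best
```

Because `gc_paused` is a plain context manager, it works outside of pytest; a pytest fixture could presumably wrap it by entering the context, yielding to the test, and exiting afterwards.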

cc @jreback @wesm @cpcloud

@jreback jreback added the Performance Memory or execution speed performance label Apr 5, 2017
@jreback jreback modified the milestones: 0.20.0, Next Minor Release Apr 5, 2017
@jreback jreback modified the milestones: Interesting Issues, Next Major Release Nov 26, 2017
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

3 participants