ROOT-C++-Python - Benchmarking, comparing, best practices #4

25 changes: 25 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,25 @@
# ROOT-C++-Python - Benchmarking, comparing, best practices

## problem to solve
More and more people see the benefits of ML techniques and, alongside or independently of that, of the large data science ecosystem (scipy, numpy, pandas, matplotlib and many more) as a complement to their ROOT-based analyses. Yet instead of adopting these tools, many remain cautious, mainly because:

- people are not necessarily aware of easy ways to connect ROOT-based data with Python data science tools
- people fear that using Python will be _significantly_ slower than the ROOT-based approach
- strongly connected to that: people are not necessarily aware of how to parallelise in Python
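
As a starting point for the last bullet, here is a minimal sketch of chunk-wise parallelism using only the Python standard library. It is an illustration and not part of the proposal: the workload `sum_of_squares` and the helpers `split_into_chunks` and `parallel_sum_of_squares` are hypothetical names, and a plain list stands in for data that would normally come from a ROOT file.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def sum_of_squares(chunk):
    """Hypothetical per-chunk analysis step (stand-in for real event processing)."""
    return sum(x * x for x in chunk)


def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of similar size."""
    size = max(1, math.ceil(len(values) / n_chunks))
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum_of_squares(values, n_chunks=4, executor_cls=ProcessPoolExecutor):
    """Process each chunk in a separate worker and combine the partial results.

    executor_cls is an assumption made for flexibility: any class following the
    concurrent.futures.Executor interface (e.g. ThreadPoolExecutor) can be used.
    """
    chunks = split_into_chunks(values, n_chunks)
    with executor_cls(max_workers=n_chunks) as pool:
        # pool.map returns the partial results in the same order as the chunks
        return sum(pool.map(sum_of_squares, chunks))
```

Calling `parallel_sum_of_squares(list(range(1_000_000)))` distributes the work over four processes. On platforms that start workers with `spawn` (Windows, macOS), the call should sit under an `if __name__ == "__main__":` guard. Whether processes actually beat a single-threaded loop depends on the workload, which is exactly the kind of question the benchmarks are meant to answer.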

## desired outcome
The best-case scenario would be to come out of this hackathon with a comprehensive but simple presentation (mini-tutorial) that shows best practices for integrating non-ROOT tools into an overall ROOT-based analysis, explains how to transfer data between the ecosystems, and includes some performance comparisons between the different approaches.
In short: a talk that can be used to ease the fear of stepping outside a purely ROOT-based analysis and that gives concrete starting points for doing so.

I think that the workload of this project would be threefold:
- actually compute some performance comparisons
- search for performance comparisons, tutorials and talks on the topic that already exist, and add them to the repository (for the latter see e.g. https://github.com/ChristosChristofidis/awesome-deep-learning)
- compile a talk (maybe a notebook, maybe something else) with a high pedagogical value :)
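
For the first item, a performance comparison could start from a small timing harness like the sketch below. It uses only the standard-library `timeit` module. The two candidates (`loop_sum`, `builtin_sum`) are hypothetical placeholders for the approaches that would really be compared (for example a ROOT-based loop versus a numpy-based one), and the helper names `best_time` and `compare` are assumptions made for illustration.

```python
import timeit


def loop_sum(values):
    """Placeholder candidate 1: explicit Python loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def builtin_sum(values):
    """Placeholder candidate 2: built-in sum."""
    return sum(values)


def best_time(func, *args, repeat=5, number=10):
    """Return the best observed time in seconds for one call of func(*args).

    Taking the minimum over several repeats reduces noise from other
    processes running on the machine.
    """
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


def compare(candidates, *args, **kwargs):
    """Time every candidate on the same input; returns {name: seconds per call}."""
    return {name: best_time(func, *args, **kwargs) for name, func in candidates.items()}
```

A call such as `compare({"loop": loop_sum, "builtin": builtin_sum}, list(range(100_000)))` yields one number per approach. Timing every candidate on identical input with the same repeat settings keeps the comparison fair.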

## skills / knowledge needed (for the project, not per person)
- didactic skills
- literature research skills
- some programming skills
- ROOT
- other data storage solutions