This repository has been archived by the owner on Apr 28, 2023. It is now read-only.

related to #347 - Examples And Performance Results #419

Open
keightyfive opened this issue May 11, 2018 · 3 comments



keightyfive commented May 11, 2018

Hi,
so I had to slightly modify the autotuner_parallel.sh script, and I still have a few questions:

  1. Does the script work in srun mode as well?
  2. What exactly does the script do? Does it tune the mapping for all the benchmarks and then run all of these 1000 times as described in the paper?
  3. Is there a way to run the kernels individually without autotuning?
  4. Do they run only on NVidia GPUs, or can one run them on CPUs as well?
  5. I installed TC using the conda package with pytorch integration (https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/getting_started.html#installation), since building from source didn't work for me, which I reported in a separate issue [Build] issues finding Cuda, incomplete config #407. Will I need to install the conda package for caffe2 as well to run the benchmarks?

Cheers, Kevin

@keightyfive
Author

Has this site gone dead already??

@nicolasvasilache
Contributor

> Has this site gone dead already??

During conference submission crunch time, yes unfortunately :) but we're now back

Reg. benchmarking, we have been making progress in #423 this week; best is to wait until we land it early next week. I had to resort to some build hacks after the caffe2 source of truth moved to pytorch, which I will need to clean up.

Reg. your previous questions:

> Does the script work in srun mode as well?

I have only run it myself in sbatch mode. It should run in srun mode with minor modifications; there is nothing magical about it. The current script uses SLURM_ARRAY_JOB_ID to set an output path, but you can easily adapt it and set the path to whatever you like.
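For illustration, a minimal sketch of the kind of path logic involved (the variable and directory names here are my assumptions, not the actual contents of autotuner_parallel.sh): sbatch sets SLURM_ARRAY_JOB_ID for array jobs, while under srun or a plain local run it may be absent, so a fallback keeps the same logic usable in both modes.

```shell
#!/bin/sh
# Hypothetical sketch -- NOT the real autotuner_parallel.sh.
# Under sbatch, SLURM sets SLURM_ARRAY_JOB_ID for array jobs; under
# srun or a local run it may be unset, so fall back to a fixed name.
JOB_ID="${SLURM_ARRAY_JOB_ID:-manual}"
OUT_DIR="tuner_output/${JOB_ID}"
mkdir -p "${OUT_DIR}"
echo "tuner results go to ${OUT_DIR}"
```

With the variable unset (e.g. an interactive srun session), results land in tuner_output/manual instead of failing on an empty path.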

> What exactly does the script do? Does it tune the mapping for all the benchmarks and then run all of these 1000 times as described in the paper?

Essentially, yes. We have reduced the default to 100 iterations, but otherwise yes. If you want to run 1000 times, pass --benchmark_iterations=1000.

> Is there a way to run the kernels individually without autotuning?

Yes. If you check out the branch from #423 you can just run ./build/tc/benchmarks/benchmark_xxx --gtest_filter="*P100*" for the Pascal benchmarks (use V100 for the Volta benchmarks). We have saved the best options (see tc/benchmarks/*.h), so the numbers are easily reproducible.
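As a concrete sketch of that invocation (benchmark_xxx is the placeholder from the comment above, not a real binary name; substitute one of the binaries under build/tc/benchmarks/), the command line for a Pascal card, including the iteration override from the previous answer, could be assembled like this:

```shell
#!/bin/sh
# Sketch only: "benchmark_xxx" is a placeholder, not an actual binary.
GPU_ARCH="P100"                # use V100 on a Volta card
BIN="./build/tc/benchmarks/benchmark_xxx"
# --gtest_filter selects the saved-options test for this GPU;
# --benchmark_iterations overrides the default of 100 runs.
cmd="${BIN} --gtest_filter=*${GPU_ARCH}* --benchmark_iterations=1000"
echo "${cmd}"
```

The filter pattern matches any test name containing P100, which is how the per-GPU saved options are selected in the gtest-based benchmark binaries.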

> Do they run only on NVidia GPUs, or can one run them on CPUs as well?

GPU only for now. I'm prioritizing CPU support starting this week; I would say it will take about a month to get things into a decent state.

> I installed TC using the conda package with pytorch integration (https://facebookresearch.github.io/TensorComprehensions/framework/pytorch_integration/getting_started.html#installation), since building from source didn't work, which I reported in a separate issue #407. Will I need to install the conda package for caffe2 as well to run the benchmarks?

TC + pytorch is still in a very alpha state right now; I haven't had the chance to benchmark it myself yet. For the benchmarks we report performance on, it's C++ only atm.
Reg. the build system, we're definitely not happy about the user experience there, but given the resources we have, it is what it is (i.e. it works for the core dev team). I'd be happy to help you set it up if you are interested in working at that level. We also welcome contributions from the community, since this is still an early research project with extremely scarce resources.

@keightyfive
Author

Thanks very much for your detailed answers. I'll try to install with caffe2 and see if I can run the kernels. I assume overall performance will be better after building from source, though. If you are willing to help me with that, it would be greatly appreciated. Hopefully I can contribute in some shape or form.
Cheers, Kevin
