
machine learning pipeline #41

Open

UniqueFool opened this issue Jun 9, 2016 · 9 comments
@UniqueFool

This is related to another discussion currently taking place here: jrprice/Oclgrind#109 (comment)

The idea is to emulate an OpenCL kernel using oclgrind, use this to gather kernel-specific runtime information (think dataflow, variable lifetime), and use this information in the ML pipeline to do more sophisticated transformations based on much more comprehensive, and better, information about the kernel's runtime behavior.

To pull this off, some kind of interface would need to be established between the kernel virtualization and tuner components, even if that just means serializing kernel-specific data to a file on disk and using that for the ML pipeline.
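For concreteness, a minimal sketch of what that file-on-disk interface could look like. The metric names here are purely illustrative assumptions, not anything oclgrind actually emits; the point is just the round trip from emulator output to ML features:

```python
import json

# Hypothetical runtime metrics that an emulator-side plugin could collect
# for one kernel invocation (names are made up for illustration).
metrics = {
    "kernel": "matvec",
    "global_memory_loads": 1048576,
    "global_memory_stores": 4096,
    "branch_divergence_ratio": 0.12,
    "max_live_variables": 18,
}

# Emulator side: dump the gathered data to disk ...
with open("matvec_profile.json", "w") as f:
    json.dump(metrics, f, indent=2)

# ... tuner side: read it back and turn it into a feature vector for the model.
with open("matvec_profile.json") as f:
    profile = json.load(f)

features = [profile["global_memory_loads"],
            profile["branch_divergence_ratio"],
            profile["max_live_variables"]]
print(features)  # → [1048576, 0.12, 18]
```

A flat JSON file is obviously the crudest possible coupling, but it keeps the emulator and the tuner fully decoupled, which matches the "even if that just means serializing to disk" idea above.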

@CNugteren
Owner

Interesting. I'll take a look at oclgrind to get a better understanding of what you want, and I'll get back to you soon. Perhaps the Collective Knowledge framework might also be of some help: https://github.com/ctuning/ck

@UniqueFool
Author

Wow, I wasn't even aware that something like CK existed ... I'll have to do some reading now.

@CNugteren
Owner

CNugteren commented Jun 13, 2016

I found some time and I think I understand your idea. By the way, in your first post you meant "emulate an OpenCL device", not "emulate an OpenCL kernel", right?

I am not sure if CLTune is what you are looking for though. What is your use-case exactly? I can interpret your goal in two ways:

  1. You are trying to machine-learn a kernel optimiser/tuner based on previous kernels it has seen. So you'll need a lot of kernels and some static and run-time information (that's where oclgrind comes into play). Then you can learn what optimisations are a good choice given static and run-time information of a previously unseen kernel.
  2. You are trying to optimise a single kernel but the optimisation-space is too vast. In that case you'll hope that some static and run-time information (oclgrind again here) can help you guide a machine-learned model faster towards a good (or the best) solution.

In the first case CLTune is really not the right choice: it can only perform 'optimisations' that are pre-programmed into a kernel using pre-processor variables. CLTune is a tool to help you explore those options, optionally using machine learning to guide you faster towards a decision. Better to hook this up in the compiler itself, I would say.
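To make "pre-programmed using pre-processor variables" concrete, here is a rough sketch of that style of tuning, independent of CLTune itself. The parameter names are invented for illustration, and this is not the CLTune API; each configuration simply becomes a set of -D flags passed to the OpenCL kernel compiler:

```python
import itertools

# Hypothetical tunable pre-processor parameters baked into a kernel
# (names are made up; a real kernel would use them in #if/#define logic).
parameters = {
    "TILE_SIZE": [8, 16, 32],
    "VECTOR_WIDTH": [1, 2, 4],
    "UNROLL": [0, 1],
}

def build_options(config):
    # Each configuration maps to -D defines for clBuildProgram's options string.
    return " ".join(f"-D{name}={value}" for name, value in config.items())

# The tuner's search space is the cross product of all parameter values.
configs = [dict(zip(parameters, values))
           for values in itertools.product(*parameters.values())]

print(len(configs))               # 3 * 3 * 2 = 18 configurations
print(build_options(configs[0]))  # → -DTILE_SIZE=8 -DVECTOR_WIDTH=1 -DUNROLL=0
```

This also shows why the space explodes quickly: every added parameter multiplies the number of configurations, which is where a model-guided search starts to pay off.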

For the second case it might be a better fit, but I am not so sure whether this extra information will be helpful to train a model. With the extra data we might also need to look at larger models that can capture this new information. Keep in mind that I am currently not even using the static data that is readily available (number of instructions of some sort, number of branches, vector width, architecture details); I am only using the current user-defined 'configuration'. So perhaps it is better to start there, instead of using run-time information from device emulation?

@UniqueFool
Author

Yes, I meant "device", as you said. Your 2) describes the idea pretty well, i.e. it has more to do with kernel-specific runtime information and using that to come up with/guide different transformations.

I will have to do some reading to see if this is really feasible, for all the reasons you mentioned. However, I did reference a few papers that basically describe doing this sort of thing.

So it really is more about narrowing-down and guiding the search space based on kernel-specific information that can be gathered via emulated execution.

@CNugteren
Owner

OK! Which papers are those? I'm interested as well to see what's possible.

@UniqueFool
Author

I basically worked through the referenced paper and its references section: http://arxiv.org/pdf/1506.00842v1.pdf

We have developed and validated a machine learning based auto-tuning framework for OpenCL. The framework measures the performance of several candidate implementations from a parameter configuration space and uses this result to build an artificial neural network, which works as a performance model. This model is then used to find interesting parts of the configuration space, which are explored exhaustively to find good candidate implementations. Our neural network model achieves a mean relative error as low as 6.1% for three different benchmarks executed on three different devices, an Intel i7 3770 CPU, an Nvidia K40 GPU and an AMD Radeon HD 7970. The autotuner is able to find good configurations, at best only 1.3% slower than the best configuration.

Future work includes enhancing the performance of the model, in particular with regard to invalid configurations, evaluating the model on novel hardware architectures beyond just CPUs and GPUs, and integrating problem parameters into the performance model. Incorporating advanced new features specific to a given architecture [39] will remain challenging. However, studying multi-GPU systems [40] and looking into multi-variate analysis [41] may also be interesting avenues of inquiry.
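The workflow the abstract describes — measure a sample, fit a performance model, then exhaustively explore only the region the model flags as promising — can be sketched in miniature. Everything below is a toy stand-in: the cost surface is synthetic (a real tuner would compile and time the kernel), and a nearest-neighbour lookup replaces the paper's neural network:

```python
import random

random.seed(0)

def runtime(tile, unroll):
    # Synthetic cost surface with a minimum near (16, 2) plus some noise;
    # stands in for actually benchmarking a kernel configuration.
    return (tile - 16) ** 2 + 4 * (unroll - 2) ** 2 + random.random()

# Full configuration space: 5 tile sizes x 4 unroll factors = 20 configs.
space = [(t, u) for t in (4, 8, 16, 32, 64) for u in (0, 1, 2, 3)]

# 1) Measure a random sample of the space.
sample = random.sample(space, 8)
measured = {cfg: runtime(*cfg) for cfg in sample}

# 2) Cheap surrogate model: predict a config's runtime from its
#    nearest measured neighbour (a stand-in for the paper's ANN).
def predict(cfg):
    nearest = min(measured,
                  key=lambda m: (m[0] - cfg[0]) ** 2 + (m[1] - cfg[1]) ** 2)
    return measured[nearest]

# 3) Exhaustively benchmark only the most promising quarter of the space.
promising = sorted(space, key=predict)[:len(space) // 4]
best = min(promising, key=lambda cfg: runtime(*cfg))
print(best)
```

The payoff is that only the sample plus the shortlisted quarter ever gets benchmarked for real, which is exactly the "find interesting parts, then explore them exhaustively" idea from the quote.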

@CNugteren
Owner

CNugteren commented Jun 15, 2016

This is exactly what CLTune is also doing. I mainly wrote CLTune because the authors of the other paper did not make any tool available. However, I did not evaluate the machine learning part much; the paper actually doesn't include it at all (http://www.cedricnugteren.nl/downloads/Nugteren2015a.pdf). Perhaps someone should do some more experiments using CLTune and a small neural network?

Or are you perhaps referring to the future work part of the paper:

and integrating problem parameters into the performance model

I am not sure exactly what the authors mean by this, but it could be that this is what you are referring to? In that case I would also contact the authors and see if they haven't already done this.

@bhack

bhack commented Jul 19, 2016

A really interesting thread. /cc @hughperkins
