


Introduction

We expect that you have read Getting Started Guide Part 1 and Part 2! Here we describe how to add your own workloads to participate in CK-powered crowd-benchmarking and crowd-tuning!

CK allows you to reuse and extend existing components and their APIs as templates for new code and data (thus making research look more like Wikipedia or open-source software development). Here we describe how to add and customize your own data sets, benchmarks, compiler descriptions and libraries, while taking advantage of shared scenarios for collaborative autotuning and crowd-benchmarking combined with predictive analytics.

We expect that you already have the ck-autotuning repository installed via

 $ ck pull repo:ck-autotuning

If you want to participate in crowd-tuning, you should also install the ck-crowdtuning repository via

 $ ck pull repo:ck-crowdtuning

Adding new data sets

We have already shared thousands of data sets in the CK format for shared benchmarks. For example, you can find a minimal set on GitHub:

 $ ck pull repo:ctuning-datasets-min
or a full set from our past research and current collaborative projects on our Google Drive (images, video, audio, texts, encrypted files, etc.).

You can see the currently available data sets with their repositories in CK via:

 $ ck list dataset --all

Naturally, you may want to add your own data sets locally or even share them with the community.

You may use a new CK entry as a holder (container) either for one data set or for a group of files and directories. For simplicity, let's consider one file per CK entry (as currently supported for CK-based autotuning and crowd-benchmarking).

As an example, let's consider that you would like to add an image my_image.jpg as a CK data set. You can create a new CK entry for this file (substitute my_image_alias with your own alias) via:

 $ ck add dataset:my_image_alias @@dict

CK will ask you to enter the meta description of this entry in JSON format. You should enter the following:

 {
  "dataset_files": [
    "my_image.jpg"
  ],
  "tags": [
    "dataset",
    "image",
    "jpeg",
    "jpg"
  ]
 }

Note that if you use a repository shared via Git(Hub), you can add the flag --share to also add the new CK entry to Git.

Then you can find the newly created entry and copy your file there via

 $ ck find dataset:my_image_alias
 $ cp my_image.jpg <above_path>

Again, if you use a repository shared via Git, you should later go to this directory and manually add the file to Git via

 $ git add my_image.jpg
 $ git commit

Note that already shared benchmarks use data set tags to find all available data sets during compilation, execution, autotuning and crowd-benchmarking. Hence, you should first try to find the data sets closest to yours via

 $ ck list dataset

Alternatively, our idea is that users will exchange information about available data sets and their tags with the community via our related mailing lists, LinkedIn groups, Wikipedia and other media.

Then you can print the meta description of the closest data set to reuse its tags in your new CK data set entry via

 $ ck load dataset:image-jpeg-0001
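Since every ck command-line call is a thin front-end to the unified ck.kernel.access() Python API, you can also query data sets by tags programmatically. Below is a minimal sketch (assuming the standard search and load actions of the CK kernel; the tag list is just an example):

 import ck.kernel as ck

 # Find all registered data sets that carry the given tags
 # (the same kind of query that shared benchmarks use to pick suitable inputs).
 r = ck.access({'action': 'search',
                'module_uoa': 'dataset',
                'tags': 'dataset,image,jpeg'})
 if r['return'] > 0:
     print('CK error: ' + r['error'])
 else:
     for entry in r['lst']:
         print(entry['data_uoa'])

 # Load the meta description of one entry to see which tags it uses.
 r = ck.access({'action': 'load',
                'module_uoa': 'dataset',
                'data_uoa': 'image-jpeg-0001'})
 if r['return'] == 0:
     print(r['dict'].get('tags', []))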

Note that by default your CK data set entry is created in the local CK repository. You can add it to any existing repository, including your own, via

 $ ck list repo
 $ ck add some_existing_repo:dataset:my_image_alias @@dict

Also note that if you want to add many data sets (particularly in batch mode), you may skip the alias and just add a CK entry with a generated UID via:

 $ ck add dataset:
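For example, here is a minimal sketch of such a batch mode using the CK kernel API in Python. The image-<name> aliases and the tag list are our own convention for this illustration; omit the data_uoa key if you prefer CK to generate UIDs:

 import ck.kernel as ck
 import glob, os, shutil

 for path in glob.glob('*.jpg'):
     name = os.path.splitext(os.path.basename(path))[0]

     # Create a new dataset entry (in the local repository by default)
     # with a basic meta description similar to the example above.
     r = ck.access({'action': 'add',
                    'module_uoa': 'dataset',
                    'data_uoa': 'image-' + name,   # omit to let CK generate a UID
                    'dict': {'dataset_files': [os.path.basename(path)],
                             'tags': ['dataset', 'image', 'jpeg', 'jpg']}})
     if r['return'] > 0:
         print('CK error: ' + r['error'])
         continue

     # Find the entry on disk and copy the image into it.
     r = ck.access({'action': 'find',
                    'module_uoa': 'dataset',
                    'data_uoa': 'image-' + name})
     if r['return'] == 0:
         shutil.copy(path, r['path'])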

Interestingly, Grigori converted most of his own photos and videos from the past 15 years to this format in his local repositories to be able to use them for his research and experimentation on machine-learning-based autotuning and run-time adaptation.

Extracting data set features

For our research on statically enabling adaptive applications, we need various data set features to be correlated with the most profitable optimizations and hardware configurations.

Therefore, we added preliminary support to automatically extract common features of known data set types (currently images).

You can extract features of your data set via:

 $ ck extract dataset.features:my_image_alias
 $ ck find dataset.features:my_image_alias

This command will create a new entry with the same alias but under the dataset.features container, including a features key in its meta description. For example, check out the features of the shared image-jpeg-0001 data set:

 $ ck load dataset.features:image-jpeg-0001 --min

  ...
  "features": {
    "compression": "",
    "format": "JPEG",
    "height": 208,
    "mode": "RGB",
    "raw_info": {
      "dpi": [
        72,
        72
      ],
      "jfif": 257,
      "jfif_density": [
        72,
        72
      ],
      "jfif_unit": 1,
      "jfif_version": [
        1,
        1
      ]
    },
    "total_size": 8689,
    "width": 162,
    "xdpi": 72,
    "ydpi": 72
  }
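If you want to use such features programmatically (for example, to select data sets of a certain size for an experiment), you can load this entry via the same Python API. A minimal sketch:

 import ck.kernel as ck

 r = ck.access({'action': 'load',
                'module_uoa': 'dataset.features',
                'data_uoa': 'image-jpeg-0001'})
 if r['return'] > 0:
     print('CK error: ' + r['error'])
 else:
     features = r['dict'].get('features', {})
     print('width x height:', features.get('width'), 'x', features.get('height'))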

Again, our idea is that users can edit this meta description and add, in free format, any extra features needed for their research, and later share them with the community (as conceptually described in our related publications).

Also, when shared, you can easily add such data sets (for example, images) to your HTML-based interactive reports and Digital Libraries, or just view them in the CK web browser.

Adding new benchmarks

The benchmark JSON description in CK has grown considerably in the past few years to accommodate all our research needs on developing self-optimizing and self-adaptive software and hardware (including self-tuning compilers and crowd-benchmarking). Hence, rather than adding a new benchmark as a CK entry from scratch, we strongly suggest that you find the most similar program and use it as a template.

You can check available benchmarks via

 $ ck pull repo:ctuning-programs
 $ ck list program

Let's consider that the benchmark closest to yours is the simple C-based cbench-automotive-susan image processing application. You can then simply create a copy of this entry in your local repository via

 $ ck cp program:cbench-automotive-susan :my-app

Alternatively, you can create a copy of this entry in a given repository via

 $ ck list repo
 $ ck cp program:cbench-automotive-susan some-repo::my-app 

You can now check that your template was created successfully by compiling it via CK as follows:

 $ ck compile program:my-app

If compilation succeeded, you should see text similar to the following:

Compilation time: 5.860 sec.; Object size: 70916; MD5: 9b58b9d55585ca5bd19a5e75f448bb14

From this moment, you can start customizing this entry to add the files of your own benchmark. For this purpose, find the new entry, remove the original files there and copy yours. You can do it as follows on Linux/macOS (the procedure on Windows is similar):

 $ ck find program:my-app
 $ rm <above_path>/*
 $ cp my_files <above_path>

Now you can edit the benchmark JSON meta description to list the above files, customize command lines, etc.:

 $ ck find program:my-app
 $ mcedit <above_path>/.cm/meta.json
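Before changing anything, you may also want to print the current values of the keys discussed below via the Python API. A minimal sketch:

 import ck.kernel as ck
 import json

 r = ck.access({'action': 'load',
                'module_uoa': 'program',
                'data_uoa': 'my-app'})
 if r['return'] > 0:
     print('CK error: ' + r['error'])
 else:
     for key in ['source_files', 'tags', 'compile_deps', 'run_cmds']:
         print(key, '=', json.dumps(r['dict'].get(key), indent=2))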

You should change the following keys:

  • List your new source files:
  "source_files": [
    "susan.c"
  ], 
  • Update the user-friendly tags, particularly:
  "tags": [
    "lang-c",        # specifies that it's a C program

    "cbench",        # can be changed
    "susan",         # can be changed
    "automotive",    # can be changed

    "benchmark",     # specifies that it is converted to benchmark, 
                     #  i.e. output does not change across iterations
    "program",       # specifies that it is a real and not synthetic application

    "small",         # low execution times (execution time within a minute)
    "crowd-tuning"   # can be used for crowd-tuning and crowd-benchmarking
                     #  (i.e. crowdsourcing autotuning and benchmarking
                     #   across shared computational resources such as
                     #   cloud, mobile phones, laptops, supercomputers, 
                     #   data centers, etc)
  ], 
  • Check compile dependencies (they will be automatically resolved by the CK program compilation and execution workflow):
  "compile_deps": {              
    "compiler": {                # specifies that CK should find CK environment with a C compiler
                                 # by default, we provided pre-installed environments for GCC
      "local": "yes", 
      "sort": 10, 
      "tags": "compiler,lang-c"  
    }, 

    "xopenme": {                 # specifies dependency on our xOpenME library (stripped plugin based framework)
                                 #  which is used to instrument programs and expose
                                 #  various run-time features in JSON format.
                                 # You may keep this dependency even if you don't use 
                                 #  this library (since we use it in most of CK benchmarks and 
                                 #  you may want to use and extend it in the future)
                                 #  or you can remove this dependency.
      "local": "yes", 
      "sort": 20, 
      "tags": "lib,xopenme"
    }
  }, 
  • Check extra linker libraries in CK format ($<< VAR >>$ is substituted by CK from the compiler environment meta description):
  "extra_ld_vars": "$<<CK_EXTRA_LIB_M>>$", 
  • Add default build variables (with GCC they will be prefixed with -D, with the Intel and Microsoft compilers with /D, etc.):
  "build_compiler_vars": {
    "XOPENME": ""
  }, 
  • Prepare the execution command lines (there can be several, as in the susan benchmark example, which can invoke various algorithms to detect corners, edges, etc.):
  "run_cmds": {                

    "corners": {               # User key describing a given execution command line

      "dataset_tags": [        # Data set tags - will be used to query CK
        "image",               # and automatically find related entries such as images
        "pgm", 
        "dataset"
      ], 

      "run_time": {            # Next is the execution command line format
                               # $#BIN_FILE#$ will be automatically substituted with the compiled binary
                               # $#dataset_path#$$#dataset_filename#$ will be substituted with
                               # the first file from the CK data set entry (see above example
                               # of adding new data sets to CK).
                               # tmp-output.tmp is an output file with the processed image.
                               # Basically, you can rearrange the words below to set your own CMD

        "run_cmd_main": "$#BIN_FILE#$ $#dataset_path#$$#dataset_filename#$ tmp-output.tmp -c", 

        "run_cmd_out1": "tmp-output1.tmp",  # If !='', add redirection of the stdout to this file
        "run_cmd_out2": "tmp-output2.tmp",  # If !='', add redirection of the stderr to this file

        "run_output_files": [               # Lists files that are produced during
                                            # benchmark execution. Useful when program
                                            # is executed on remote device (such as
                                            # Android mobile) to pull necessary
                                            # files to host after execution
          "tmp-output.tmp", 
          "tmp-ck-timer.json"
        ],


        "run_correctness_output_files": [   # List files that should be used to check
                                            # that program executed correctly.
                                            # For example, useful to check benchmark correctness
                                            # during automatic compiler/hardware bug detection
          "tmp-output.tmp", 
          "tmp-output2.tmp"
        ], 

        "fine_grain_timer_file": "tmp-ck-timer.json"  # If XOpenME library is used, it dumps run-time state
                                                      # and various run-time parameters (features) to tmp-ck-timer.json.
                                                      # This key lists JSON files to be added to unified 
                                                      # JSON program workflow output
      },

      "hot_functions": [                 # Specify hot functions of this program
        {                                # to analyze only these functions during profiling
          "name": "susan_corners",       # or during standalone kernel extraction
          "percent": "95"                # with run-time memory state (see "codelets"
                                         #  shared in CK repository from the MILEPOST project
                                         #  and our recent papers for more info)
        }
      ],

      "ignore_return_code": "no"         # Some programs have return code >0 even during
                                         # successful program execution. We use this return code
                                         # to check if benchmark failed particularly during
                                         # auto-tuning or compiler/hardware bug detection
                                         #  (when randomly or semi-randomly altering code,
                                         #   for example, see Grigori Fursin's PhD thesis with a technique
                                         #   to break assembler instructions to detect 
                                         #   memory performance bugs) 
    }, 
    ...
  }, 
  • Update run-time environment variables:
  "run_vars": {
    "CT_REPEAT_MAIN": "1"
  }, 
  • Classify the benchmark as belonging to some computational species (either manually or automatically during crowd-benchmarking). See the available CK entries for program species via:
 $ ck list program.species

To some extent this is similar to the Berkeley Dwarfs, except that we allow the community to share numerous benchmarks, continuously classify them, and find representative benchmarks and species using predictive analytics (see our related papers for more details).

You can add the UIDs of the above species to this list in the benchmark meta description:
  "species": [
    "c84ac2ab43ad1400"
  ], 
  • Provide a backup alias name and UID for this entry. You can find this info via
 $ ck info program:my-app
and then update the following keys:
  "backup_data_uid": "ffbf51c23a91343c", 
  "data_name": "cbench-automotive-susan", 

We found this useful when benchmarks are shared across workgroups and someone accidentally changes an entry's UID, making it impossible to find related experiments. Backing up the UID (which does not change during low-level CK operations) allows users to recover from such errors and mistakes.

  • Check other auxiliary keys (for now they can be left unchanged):
  "compiler_env": "CK_CC", 

  "main_language": "c", 

  "process_in_tmp": "yes", 

  "program": "yes", 

  "target_file": "a"

Customizing OpenCL HOG C++ benchmark

A more complex example involves customizing an OpenCL benchmark. You may get and customize one from our CARP project (a HOG image processing application using machine learning):

 $ ck pull repo:reproduce-carp-project
 $ ck find program:realeyes-hog-opencl-tbb

This C++ benchmark has more dependencies on libraries that should be registered via CK environment entries, including OpenCL, OpenCV and Intel TBB:

  "compile_deps": {
    "compiler": {
      "local": "yes", 
      "sort": 10, 
      "tags": "compiler,lang-cpp"
    }, 
    "lib.xopenme": {
      "local": "yes", 
      "sort": 40, 
      "tags": "lib,xopenme"
    }, 
    "lib_opencl": {
      "local": "yes", 
      "sort": 30, 
      "tags": "lib,opencl"
    }, 
    "lib_opencv": {
      "extra_libs": [
        "opencv_imgproc", 
        "opencv_ocl", 
        "opencv_highgui"
      ], 
      "local": "yes", 
      "sort": 20, 
      "tags": "lib,opencv"
    }, 
    "lib_tbb": {
      "local": "yes", 
      "sort": 50, 
      "tags": "lib,tbb"
    }
  } 

You can find details about how to register environments for the above libraries and tools (and thus elegantly support multiple versions of the same library or tool installed on a user platform) in the Getting Started Guide for this benchmark.
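To check which of these library environments are already registered on your machine, you can query the env module by tags. A minimal sketch (the tag names mirror the dependency tags above):

 import ck.kernel as ck

 for tags in ['opencl', 'opencv', 'tbb']:
     r = ck.access({'action': 'search',
                    'module_uoa': 'env',
                    'tags': tags})
     if r['return'] == 0:
         print(tags, ':', len(r['lst']), 'registered environment(s)')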

It has extra build variables:

  "build_compiler_vars": {
    "HOG_OPENCL_CL": "\\\"../hog.opencl.cl\\\"", 
    "NO_PENCIL": "", 
    "RUN_ONLY_ONE_EXPERIMENT": "", 
    "SELECT_CL_DEVICE_TYPE": "CL_DEVICE_TYPE_GPU", 
    "WITH_TBB": "", 
    "XOPENME": "1", 
    "XOPENME_DUMP_IMAGES": ""
  } 

Various keys provide extra compilation and linking flags:

  "compiler_add_include_as_env_from_deps": [
    "CK_ENV_LIB_STDCPP_INCLUDE", 
    "CK_ENV_LIB_STDCPP_INCLUDE_EXTRA"
  ], 

  "compiler_flags_as_env": "$<<CK_COMPILER_FLAG_CPP11>>$", 

  "include_dirs": [
    "include"
  ], 

  "linker_add_lib_as_env": [
    "CK_CXX_EXTRA", 
    "CK_ENV_LIB_STDCPP_STATIC"
  ], 

Note that the execution description has a post-processing script in Python that converts the raw JSON from OpenME and calculates new characteristics and features, including energy (on Odroid) and frames per second.

  "run_cmds": {
    "default": {
      "run_time": {
        "post_process_cmd": "python $#src_path_local#$convert_timers_to_ck_format.py", 

Here $#src_path_local#$ is automatically substituted with the path to the CK program entry.
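As a rough illustration only, such a post-processing script typically loads the raw JSON dump, derives extra characteristics and writes them back so that the CK workflow can pick them up. The key names run_time_ms and frames below are hypothetical; the actual format is defined by convert_timers_to_ck_format.py in the reproduce-carp-project repository:

 import json

 # Hypothetical sketch of a post-processing step: read the raw dump produced
 # during execution, derive frames per second and store the result back.
 with open('tmp-ck-timer.json') as f:
     d = json.load(f)

 run_time_ms = d.get('run_time_ms', 0)   # hypothetical key in the raw dump
 frames      = d.get('frames', 1)        # hypothetical key in the raw dump

 if run_time_ms > 0:
     d['frames_per_second'] = 1000.0 * frames / run_time_ms

 with open('tmp-ck-timer.json', 'w') as f:
     json.dump(d, f, indent=2)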

If all the steps were correct, you should be able to compile and run your new application:

 $ ck compile program:my-app
 $ ck run program:my-app

Adding new compiler descriptions

We briefly described how to add and modify your own compiler descriptions in CK in this Getting Started example on autotuning.
