
Home

Brief motivation

Are you an individual researcher or organization performing many experiments on a regular basis? You may find the Collective Knowledge framework (CK) useful if you suffer from one or more of the following problems:

  • instead of innovating, you spend weeks or months preparing ad-hoc experimental workflows, which you either throw away when your ideas are not validated or have to keep maintaining (adapting them to ever-changing software, hardware, interfaces and data formats);
  • you have trouble sharing whole experimental workflows and results with your colleagues since they use different operating systems, tools, libraries and hardware (and they do need to use their latest environment rather than possibly outdated Docker or VM images);
  • you have trouble managing and reusing your own scripts, tools, data sets and reproducing your own results from past projects;
  • you have trouble retrieving data from your own or someone else's "black-box" database (particularly if you do not know the schema);
  • you spend lots of time updating your reports and papers whenever you obtain new results;
  • you do not have enough realistic workloads, benchmarks and data sets for your research;
  • you face an ever-increasing number of experimental choices to explore in complex design and optimization spaces;
  • you accumulate vast amounts of raw experimental data but do not know what the data is telling you ("big data" problem);
  • you want to extract knowledge from raw data in the form of models but never find time to master powerful predictive analytics techniques;
  • your organization pays dearly for its computational needs (in particular, for hardware and energy used in data centers and supercomputers) while you suspect they could be met at a fraction of the cost (if, for example, your deep learning algorithms could run 10 times faster).

Over the past 15 years, we have suffered from all the above problems, which intolerably slowed down our own research (on developing faster, smaller, more energy-efficient and reliable computer systems via multi-objective autotuning, machine learning and run-time adaptation). Eventually, we realized that the above problems can only be tackled collaboratively by bringing together an interdisciplinary community.

Hence, we designed Collective Knowledge (CK) as just a small and highly customizable Python wrapper framework with a unified JSON API, command line, web services and meta-descriptions. This allows researchers to gradually wrap and glue together any existing software, hardware and data, share and reuse these wrappers via Git, unify the information flow between them, quickly prototype experimental workflows from shared artifacts, apply predictive analytics and enable interactive articles.
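
To give a flavour of this unified interface, here is a minimal sketch of calling CK from Python. It assumes a local CK installation; the module names, entry names and tags below are purely illustrative, and the exact set of actions is described in the CK documentation.

```python
# Minimal sketch: every CK action takes a dict (JSON) and returns a dict (JSON).
# By convention, 'return' == 0 means success, otherwise 'error' holds the message.
import ck.kernel as ck

# List locally registered programs
# (roughly equivalent to "ck list program" on the command line).
r = ck.access({'action': 'list', 'module_uoa': 'program'})
if r['return'] > 0:
    raise RuntimeError(r['error'])
for entry in r.get('lst', []):
    print(entry['data_uoa'])

# Search entries by tags in their JSON meta information
# (roughly equivalent to "ck search dataset --tags=image";
#  the module name and tag are just examples).
r = ck.access({'action': 'search', 'module_uoa': 'dataset', 'tags': 'image'})
if r['return'] > 0:
    raise RuntimeError(r['error'])
print('Found', len(r.get('lst', [])), 'matching entries')
```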

CK is an open-source (under a permissive license), lightweight (less than 1 MB) and very portable research SDK. It has minimal dependencies and simple interfaces with software written in C, C++, Fortran, Java, PHP and other languages. Please check out the CK documentation and Getting Started Guide for more details: http://github.com/ctuning/ck/wiki

Though seemingly simple, such an agile approach has already proved powerful enough to help scientists and research engineers:

  • abstract and unify access to their software, hardware and data via CK modules (wrappers) with a simple JSON API while protecting users from continuous low-level changes and exposing only minimal information needed for research and experimentation (this, in turn, enables simple co-existence of multiple tools and libraries such as different versions of compilers including LLVM, GCC and ICC);
  • provide a simple and user-friendly directory structure (CK repositories) to gradually convert all local artifacts (scripts, benchmarks, data sets, tools, results, predictive models, graphs, articles) into searchable, reusable and interconnected CK entries (components) with unique IDs and open JSON-based meta information while getting rid of all hardwired paths (see the sketch of such an entry after this list);
  • quickly prototype research ideas from shared components as if from LEGO(TM) bricks, unify the exchange of results in a schema-free JSON format and focus on knowledge discovery (only once an idea is validated should you spend extra time on adding proper types, descriptions and tests, and not vice versa);
  • easily share CK repositories containing whole experimental setups and templates with the community via popular public services such as GitHub and Bitbucket while keeping track of the full development history;
  • speed up search across all your local artifacts by their JSON meta information using the popular ElasticSearch engine (optional);
  • involve the community or workgroups to share realistic workloads, benchmarks, data sets, tools, predictive models and features in a unified and customizable format;
  • reproduce empirical experimental results in a different environment and under different conditions, applying statistical analysis (as in physics) rather than just replicating them - useful for analyzing and validating varying results (such as performance and energy);
  • use built-in CK web server to view interactive graphs and articles while easily crowdsourcing experiments using spare computational resources (mobile devices, data centers, supercomputers) and reporting back unexpected behavior;
  • obtain help from an interdisciplinary community to explain unexpected behavior when reproducing experiments, solve it by improving related CK modules and entries, and immediately push changes back to the community (similar to Wikipedia);
  • simplify the use of statistical analysis and predictive analytics techniques for non-specialists via CK modules and help you process large amounts of experimental results (possibly on the fly via active learning), share and improve predictive models and features (knowledge), and effectively compact "big data".
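
To illustrate the repository structure mentioned above, the sketch below reads the open JSON meta information of a hypothetical CK entry. The directory layout follows the convention that each entry keeps its meta under a `.cm` subdirectory; the repository, module, entry and file names are invented for illustration only.

```python
# Sketch of reading the JSON meta of a (hypothetical) CK entry.
# Assumed on-disk layout of a CK repository (all names illustrative):
#
#   my-ck-repo/
#     dataset/                  <- CK module (wrapper) name
#       traffic-images/         <- CK entry (component) name
#         .cm/meta.json         <- open, schema-free meta description
#         .cm/info.json         <- unique ID, timestamps, etc.
#         image-0001.png        <- the artifact itself
import json
from pathlib import Path

entry = Path('my-ck-repo/dataset/traffic-images')   # hypothetical path
meta = json.loads((entry / '.cm' / 'meta.json').read_text())

# The meta is plain JSON, so it can be searched, extended and shared freely.
print(meta.get('tags', []))   # e.g. ["dataset", "image", "traffic"]
```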

For example, our colleagues successfully use CK to accelerate computer systems research and tackle issues that have remained unsolved for more than 15 years. They have started enabling customizable, extensible and multi-objective software/hardware optimization, run-time adaptation and co-design in practice as a CK experimental workflow shared via GitHub. The community can now gradually expose various tuning choices (algorithm and OpenCL/CUDA/MPI parameters, compiler flags, polyhedral transformations, CPU/GPU frequency, etc.) and objectives (execution time, code size, compilation time, energy, processor size, accuracy, reliability). The community can also reuse shared autotuning and machine learning plugins to speed up the exploration of large and non-linear optimization spaces and even enable run-time adaptation (self-tuning computer systems).
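
To make the idea of exploring such optimization spaces more concrete, here is a deliberately simplified sketch of the underlying principle (random sampling of compiler flag combinations while measuring several objectives). It is not a CK plugin: a real CK workflow would describe these choices and results via shared modules and JSON meta, and would repeat measurements for statistical analysis. The benchmark source file name is hypothetical.

```python
# Toy multi-objective autotuning loop: randomly sample GCC flag combinations
# and record execution time and binary size for each variant.
import os
import random
import subprocess
import time

FLAGS = ['-O1', '-O2', '-O3', '-funroll-loops', '-ffast-math', '-flto']
SOURCE = 'benchmark.c'   # hypothetical benchmark source

results = []
for _ in range(10):
    chosen = random.sample(FLAGS, k=random.randint(1, len(FLAGS)))
    subprocess.run(['gcc', *chosen, SOURCE, '-o', 'bench'], check=True)

    start = time.time()
    subprocess.run(['./bench'], check=True)
    elapsed = time.time() - start

    results.append({'flags': chosen,
                    'time_s': elapsed,
                    'size_bytes': os.path.getsize('bench')})

# Inspect the trade-off: fastest variants first, smallest binaries as a tie-break.
for r in sorted(results, key=lambda x: (x['time_s'], x['size_bytes'])):
    print(r)
```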

For example, you can check out public GCC/LLVM optimization results for various shared workloads across diverse hardware, including mobile devices provided by volunteers.

Furthermore, our colleagues have managed to speed up their real-world applications across the latest platforms (from mobile phones to cloud servers) by 10x with the same numerical accuracy, while reducing energy use by 30% and code size by 50%. Such CK templates can be easily reused in other research scenarios, allowing students and researchers to start new experiments or reproduce others' results in minutes rather than days or weeks.

Our long-term mission is to help the research community dramatically accelerate knowledge discovery via open, agile, collaborative and reproducible research, experimentation and knowledge sharing while keeping it as simple as GitHub and Wikipedia - do join us!

For more details, please check our example of a CK-based reproducible and interactive article.