MetaCenterCloudPuppet/cesnet-pig

Apache Pig Puppet Module

####Table of Contents

  1. Module Description - What the module does and why it is useful
  2. Setup - The basics of getting started with pig
  3. Usage - Configuration options and additional functionality
  4. Reference - An under-the-hood peek at what the module is doing and how
  5. Development - Guide for contributing to the module

##Module Description

This module installs Apache Pig, a platform for analyzing large data sets. By default, Pig expects a Hadoop client to be set up locally.

Supported platforms:

  • Debian 7/wheezy: Cloudera distribution (tested on CDH 5.3.0, Pig 0.12.0)
  • Ubuntu 14/trusty: Cloudera distribution (tested on CDH 5.3.0, Pig 0.12.0)
  • RHEL 6 and clones: Cloudera distribution (tested on CDH 5.4.2, Pig 0.12.0)

##Setup

###What the cesnet-pig module affects

  • Packages: installs pig packages
  • Files: files with environment settings

###Setup Requirements

Be aware of:

###Beginning with pig

Example:

include pig

##Usage

By default, Pig runs its jobs on Hadoop, the same as when launched with -x mapreduce:

pig -x mapreduce

To run Pig in local mode instead:

pig -x local

Using Pig with HBase: add the following to your Pig scripts (replace <ZooKeeper_version> and <HBase_version> with the installed versions):

register /usr/lib/zookeeper/zookeeper-<ZooKeeper_version>.jar
register /usr/lib/hbase/hbase-<HBase_version>-security.jar

Using Pig with DataFu: add the following to your Pig scripts (replace <DataFu_version> with the installed version):

REGISTER /usr/lib/pig/datafu-<DataFu_version>.jar

##Reference

###Classes

  • pig: Pig setup
  • pig::config
  • pig::install
  • pig::params

###Module Parameters (pig class)

####datafu_enabled

Also installs DataFu, a collection of Pig user-defined functions. Default: false.

The package is no longer available as of CDH 6.
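
A minimal sketch of enabling this parameter, assuming the class is declared resource-style instead of with a plain include, and that the target platform still ships the DataFu package:

```puppet
# Declare the pig class with the DataFu UDF collection enabled.
# Assumes a CDH release that still provides the DataFu package.
class { 'pig':
  datafu_enabled => true,
}
```

With the parameter left at its default, the plain include pig form is enough.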

##Development