Hadrian MR

Jim Pivarski edited this page Nov 18, 2015 · 7 revisions

TODO...

Help message...

% hadoop jar target/hadrian-mr-TRUNK-jar-with-dependencies.jar --help
Usage: hadoop jar hadrian-mr.jar [options] input output

  input
        input path specification
  output
        output directory, must not yet exist
  -m <value> | --mapper <value>
        location of mapper PFA
  -r <value> | --reducer <value>
        location of reducer PFA
  -i | --identity-reducer
        use an identity reducer (key-grouping and possibly secondary sort, but no reducer action)
  -n <value> | --num-reducers <value>
        number of reducers (must be at least 1 if --reducer or --identity-reducer is used)
  -s <value> | --snapshot <value>
        output a snapshot of a reducer cell/pool after processing each key, rather than the reducer engine's output (pools take precedence over cells in case of name conflicts)
  --help
        print this help message

Hadrian-MR in "score" mode runs a PFA-encoded scoring engine as a
mapper and a PFA-encoded scoring engine as a reducer.

The output type of the mapper must be a record with two fields: "key"
and "value".  The key must either be a string or a record containing a
string-valued "groupby" field.  If the key is a string, that string
will be used for grouping with no secondary sort.  If the key is a
record, its groupby field is used for grouping and the whole record is
used for secondary sort (according to the normal record-sorting Avro
rules).
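
The grouping and secondary-sort behavior described above can be imitated in plain Python. This is only a sketch of the semantics, not Hadrian-MR code: the "timestamp" field, the sample values, and the names FIELD_ORDER and sort_key are hypothetical, and it assumes "groupby" is declared first in the key record so that sorting by the whole record keeps each group contiguous.

```python
from itertools import groupby

# Assumed declaration order of the fields in the key record.
# Avro compares records field by field in declared order.
FIELD_ORDER = ["groupby", "timestamp"]  # "timestamp" is a made-up field


def sort_key(pair):
    """Tuple that orders mapper outputs the way a whole-record sort would."""
    key = pair["key"]
    return tuple(key[name] for name in FIELD_ORDER)


# Hypothetical mapper outputs: records with "key" and "value" fields.
pairs = [
    {"key": {"groupby": "b", "timestamp": 2}, "value": 10.0},
    {"key": {"groupby": "a", "timestamp": 5}, "value": 20.0},
    {"key": {"groupby": "b", "timestamp": 1}, "value": 30.0},
    {"key": {"groupby": "a", "timestamp": 3}, "value": 40.0},
]

# Secondary sort: order by the whole key record.
ordered = sorted(pairs, key=sort_key)

# Grouping: only the "groupby" string decides which group a pair joins.
groups = {
    name: [p["value"] for p in members]
    for name, members in groupby(ordered, key=lambda p: p["key"]["groupby"])
}

print(groups)  # {'a': [40.0, 20.0], 'b': [30.0, 10.0]}
```

Within each group the values arrive in timestamp order, which is what a reducer would rely on when the key is a record rather than a bare string.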

The input type of the reducer must be a record with the same structure
as the mapper output.
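
As an illustration of the two accepted key shapes, the sketch below builds Avro-style schemas as Python dictionaries and classifies them. The record names, the "timestamp" field, the "double" value type, and the helpers mapper_output_schema and key_mode are assumptions made for this example; they are not part of Hadrian-MR.

```python
def mapper_output_schema(key_schema, value_schema="double"):
    """Build a record schema with the two required fields, "key" and "value"."""
    return {
        "type": "record",
        "name": "MapperOutput",  # hypothetical record name
        "fields": [
            {"name": "key", "type": key_schema},
            {"name": "value", "type": value_schema},
        ],
    }


def key_mode(schema):
    """Classify the key: plain string, or record with a string "groupby" field."""
    fields = {f["name"]: f["type"] for f in schema["fields"]}
    if set(fields) != {"key", "value"}:
        raise ValueError('expected exactly the fields "key" and "value"')
    key = fields["key"]
    if key == "string":
        return "group-only"  # grouping with no secondary sort
    if isinstance(key, dict) and key.get("type") == "record":
        key_fields = {f["name"]: f["type"] for f in key["fields"]}
        if key_fields.get("groupby") == "string":
            return "group-and-sort"  # groupby groups, whole record sorts
    raise ValueError('key must be a string or a record with a string "groupby"')


# Key is a bare string.
simple = mapper_output_schema("string")

# Key is a record; "timestamp" is a made-up extra field for secondary sort.
composite = mapper_output_schema({
    "type": "record",
    "name": "Key",
    "fields": [
        {"name": "groupby", "type": "string"},
        {"name": "timestamp", "type": "long"},
    ],
})

print(key_mode(simple))     # group-only
print(key_mode(composite))  # group-and-sort
```

Since the reducer input must have the same structure as the mapper output, the same dictionary could serve as the reducer's input type in this sketch.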