Analyze data using Hadoop tools (e.g. MapReduce) from within the R environment.
Documentation: http://saptarshiguha.github.com/RHIPE/
Binary downloads: http://ml.stat.purdue.edu/rhipebin/
Many changes, with better documentation to come soon. Most importantly, rhmr is gone and has been replaced by rhwatch.
Though the warning says ‘use the same arguments’, this is not true. Here is a quick conversion guide.
The parameters ifolder, ofolder and inout have been removed.
- For sequence input and sequence output,
  rhwatch(, input=path-to-input, output=path-to-output)
- For text input and sequence output,
  rhwatch(, input=rhfmt(path-to-input, type='text'), output=path-to-output)
- For text output and sequence input,
  rhwatch(, input=path-to-input, output=rhfmt(path-to-output, type='text'))
- For the MapReduce equivalent of lapply(1:N, F), do
  rhwatch(map=rhmap({ rhcollect(k, F(k)) }), input=N)

For documentation regarding these, ask the Google Groups mailing list.
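As a rough illustration of the last conversion, here is a minimal sketch of a complete job. It assumes a working RHIPE installation and a reachable Hadoop cluster. The function F, the value of N, and the output path are hypothetical placeholders, and passing F through the parameters argument is an assumption about how to make it visible on the workers; check your RHIPE version if this differs.

```r
library(Rhipe)
rhinit()                               # connect this R session to Hadoop

N <- 10                                # hypothetical number of keys (1..N)
F <- function(k) k^2                   # hypothetical function applied to each key

out <- "/tmp/rhipe-lapply-demo"        # hypothetical HDFS output path

# MapReduce equivalent of lapply(1:N, F): each key k emits the pair (k, F(k)).
rhwatch(
  map        = rhmap({ rhcollect(k, F(k)) }),
  input      = N,
  output     = out,
  parameters = list(F = F)             # assumption: ship F to the map tasks
)

# Read the key-value pairs back from HDFS into the R session.
res <- rhread(out)
```

Each element of res should be a key-value pair, so the values can be pulled out with something like lapply(res, "[[", 2). The order of the pairs is not guaranteed to match 1:N.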
There is also experimental support for reading from and writing to HBase.