Ondřej Moravčík edited this page Mar 14, 2015 · 16 revisions

For the complete list of methods, check

Examples

Sum of numbers

$sc.parallelize(0..10).sum
# => 55

Word count

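# PATH is a placeholder for the path of the input text file.
rdd = sc.text_file(PATH)

# Split each line into words, pair every word with 1,
# then add up the 1s of equal words.
rdd = rdd.flat_map(lambda{|line| line.split})
         .map(lambda{|word| [word, 1]})
         .reduce_by_key(lambda{|a, b| a+b})

# Collect the [word, count] pairs as a Hash.
rdd.collect_as_hash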
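
For readers who want to check the expected shape of the result without a Spark context, the same three steps can be sketched in plain Ruby on an in-memory array. This is only an illustration: `count_words` and the sample lines are hypothetical names introduced here, and nothing in it uses the Spark API.

```ruby
# Plain-Ruby sketch of the word count pipeline (no Spark involved).
# count_words is a hypothetical helper, not part of the library.
def count_words(lines)
  pairs = lines.flat_map { |line| line.split } # one flat array of words
               .map { |word| [word, 1] }       # pair each word with 1

  # Merge the values of equal keys with a + b,
  # which is the role reduce_by_key plays in the example above.
  pairs.each_with_object({}) do |(word, n), counts|
    counts[word] = counts.key?(word) ? counts[word] + n : n
  end
end

word_counts = count_words(["a b a", "b c"])
puts word_counts.inspect
```

The result is a Hash keyed by word, the same shape one would expect from `collect_as_hash`.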

To string

rdd = sc.parallelize(0..10)
rdd.map(:to_s).collect

Bind object

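# Value that the method below should use as the replacement.
replace_to = '***'

# replace_to is not a local variable inside this method;
# it is made available through rdd.bind below.
def replacing(word)
  if word =~ /[0-9]+/
    replace_to
  else
    word
  end
end

rdd = sc.text_file('text.txt')
rdd = rdd.flat_map(lambda{|line| line.split})
rdd = rdd.map(method(:replacing))
# Bind the value under the name the method refers to.
rdd = rdd.bind(replace_to: replace_to)
rdd.collect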
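
A likely reason the example needs `bind` is Ruby's scoping: a method defined with `def` does not see local variables from the surrounding code, so `replace_to` would be undefined inside `replacing` unless something supplies it. The sketch below shows that behaviour in plain Ruby, without Spark; `replacing_unbound` is a hypothetical name used only here, and how `bind` provides the value internally is not covered by this page.

```ruby
# Plain-Ruby sketch (no Spark): a def body cannot see outer local variables.
replace_to = '***'

# Hypothetical copy of the method from the example, with nothing bound.
def replacing_unbound(word)
  if word =~ /[0-9]+/
    replace_to # looked up as a method call here, not as the outer local
  else
    word
  end
end

# A word without digits never touches replace_to, so it passes through.
puts replacing_unbound('abc')

# A word with digits reaches replace_to and fails.
error_class = nil
begin
  replacing_unbound('abc123')
rescue NameError => e
  error_class = e.class
end
puts error_class
```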