Commit

Update emailaddr_agg.py
scotthaleen committed Nov 4, 2015
1 parent c66f5f4, commit 6d13b3c
Showing 1 changed file with 0 additions and 1 deletion.
spark/emailaddr_agg.py (0 additions, 1 deletion)
@@ -97,7 +97,6 @@ def sender_receiver(o):
sc = SparkContext(conf=conf)
rdd_raw_emails = sc.textFile(args.input_path_emails).cache()
rdd_addr_to_emails = rdd_raw_emails.flatMap(email_to_addrs).keyBy(lambda x: x['addr']).groupByKey().map(in_out).cache()
#{"recepient": [], "addr": "loraleivp@erols.com", "sender": []}
#rdd_addr_to_emails.saveAsTextFile(args.output_path_email_address)

rdd_edges = rdd_raw_emails.flatMap(sender_receiver).reduceByKey(lambda a,b: a+b).map(lambda x: (x[0][0], x[0][1], x[1])).cache()
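
The hunk shows two Spark pipelines: one groups per-address records with `keyBy(...).groupByKey().map(in_out)`, and one builds weighted sender/recipient edges with `reduceByKey(lambda a,b: a+b)`. As a rough, non-authoritative illustration of what that transformation chain computes, the sketch below replays it with plain Python lists instead of RDDs. The bodies of `email_to_addrs`, `in_out` and `sender_receiver` are not part of this diff, so the versions here are hypothetical: each raw email is assumed to be a JSON line with `id`, `senders` and `tos` fields, `email_to_addrs` is assumed to emit one record per (address, role), and `sender_receiver` is assumed to emit `((sender, recipient), 1)` pairs so that the reduce step yields a message count per directed edge. The `recepient` key spelling is kept to match the comment in the diff.

```python
import json
from collections import defaultdict
from functools import reduce

# Hypothetical input: one JSON object per line, as sc.textFile() would yield.
RAW_EMAILS = [
    '{"id": "m1", "senders": ["a@x.com"], "tos": ["b@x.com", "c@x.com"]}',
    '{"id": "m2", "senders": ["a@x.com"], "tos": ["b@x.com"]}',
    '{"id": "m3", "senders": ["b@x.com"], "tos": ["a@x.com"]}',
]

def email_to_addrs(line):
    """Hypothetical: emit one record per (address, role) found in an email."""
    o = json.loads(line)
    for s in o["senders"]:
        yield {"addr": s, "role": "sender", "id": o["id"]}
    for r in o["tos"]:
        yield {"addr": r, "role": "recepient", "id": o["id"]}

def in_out(kv):
    """Hypothetical: fold one address's grouped records into the shape shown
    in the diff comment: {"recepient": [], "addr": ..., "sender": []}."""
    addr, records = kv
    out = {"recepient": [], "addr": addr, "sender": []}
    for rec in records:
        out[rec["role"]].append(rec["id"])
    return out

def sender_receiver(line):
    """Hypothetical: emit ((sender, recipient), 1) for every directed pair."""
    o = json.loads(line)
    for s in o["senders"]:
        for r in o["tos"]:
            yield ((s, r), 1)

def group_by_key(pairs):
    """Plain-Python stand-in for RDD.groupByKey(); keeps first-seen key order."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return list(groups.items())

def reduce_by_key(pairs, fn):
    """Plain-Python stand-in for RDD.reduceByKey(fn)."""
    return [(k, reduce(fn, vs)) for k, vs in group_by_key(pairs)]

# flatMap(email_to_addrs).keyBy(lambda x: x['addr']).groupByKey().map(in_out)
records = [rec for line in RAW_EMAILS for rec in email_to_addrs(line)]
addr_to_emails = [in_out(kv) for kv in group_by_key((r["addr"], r) for r in records)]

# flatMap(sender_receiver).reduceByKey(lambda a,b: a+b).map(lambda x: (x[0][0], x[0][1], x[1]))
pairs = [p for line in RAW_EMAILS for p in sender_receiver(line)]
edges = [(x[0][0], x[0][1], x[1]) for x in reduce_by_key(pairs, lambda a, b: a + b)]

if __name__ == "__main__":
    for row in addr_to_emails:
        print(json.dumps(row))
    for e in edges:
        print(e)
```

Under these assumptions, the final `map` in the edge pipeline only flattens `((sender, recipient), count)` into a `(sender, recipient, count)` triple, which is a convenient row shape for writing an edge list.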
