Using the Disco framework (Chango Search dataset & Million Song Dataset)

Seppala edited this page Jul 8, 2011 · 2 revisions

1. Search query analysis using Disco (Chango Search Dataset)

https://github.com/ashchristopher/HackReduceToronto

Bartek Ciszkowski (@bartek), Ash Christopher (@ashchristopher)

(Hack/Reduce 2 Toronto)

Bartek and Ash analyzed the search queries made over the course of a single day. They grouped the queries into four categories (travel, sex, nerd and cooking) and then examined how the popularity of each category varied throughout the day.
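The grouping described above maps naturally onto a map/reduce job. What follows is only a minimal sketch in plain Python, not the code from the linked repository: the input format (a tab-separated `HH:MM:SS` timestamp and query per line), the keyword lists, and the names `map_query`, `reduce_counts` and `run_local` are all assumptions made for illustration, and only three of the four categories are shown. The map and reduce functions are written as generators taking `(input, params)`, the shape used in Disco's classic word-count example, but they are run here with a small local driver rather than on a Disco cluster.

```python
from itertools import groupby

# Hypothetical keyword lists; the real project's lists are not given in this page.
CATEGORIES = {
    "travel": {"flight", "hotel", "vacation"},
    "cooking": {"recipe", "bake", "dinner"},
    "nerd": {"python", "linux", "starcraft"},
}


def map_query(line, params):
    """Emit ((hour, category), 1) for every category a query matches."""
    # Assumed input format: "HH:MM:SS<TAB>query text"
    timestamp, query = line.rstrip("\n").split("\t", 1)
    hour = int(timestamp.split(":")[0])
    words = set(query.lower().split())
    for category, keywords in params.items():
        if words & keywords:  # at least one keyword appears in the query
            yield (hour, category), 1


def reduce_counts(pairs, params):
    """Sum the counts for each (hour, category) key."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_local(lines, params):
    """Stand-in for a cluster run: map every line, then reduce once."""
    mapped = [kv for line in lines for kv in map_query(line, params)]
    return dict(reduce_counts(mapped, params))


# Made-up sample queries, for illustration only.
sample = [
    "08:15:02\tcheap flight to lisbon",
    "08:47:10\thotel near airport",
    "19:03:44\teasy dinner recipe",
    "23:30:00\tlinux kernel panic",
]

hourly = run_local(sample, CATEGORIES)
for (hour, category), count in sorted(hourly.items()):
    print(f"{hour:02d}:00  {category:<8} {count}")
```

Keying the map output on `(hour, category)` means the reduce step needs no knowledge of the categories; plotting the popularity over the day is then a matter of reading the counts back per hour.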

2. Million Song Dataset analysis with Disco and Python

https://github.com/joeyrobert/hackreduce

(Hack/Reduce 2 Toronto)

Joel, Johan, Joey and Ian used a 10,000-song subset of the Million Song Dataset, processing it with the Disco distributed computing framework and Python.

They analyzed:

The most romantic year, found by looking for the word "love" in song titles

The variation of words in song titles (only 100 words are used in song titles)

Average song tempo per year

Song lengths per year

Saddest tones (turns out D is really sad)

Recording locations
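Two of these analyses, the "love" count and the average tempo per year, can be sketched as a single map/reduce pass. This is an illustrative sketch in plain Python rather than the team's code: the record layout (a dict with `title`, `year` and `tempo`), the function names, and the sample songs are all hypothetical, and treating a year of 0 as "unknown" is an assumption about how missing years are encoded. As in Disco's word-count example, the map and reduce functions are generators taking `(input, params)`, but here they are run locally.

```python
from itertools import groupby


def map_song(song, params):
    """Emit (year, (love_flag, tempo, 1)) for each song with a known year."""
    year = song["year"]
    if year == 0:  # assumption: 0 marks a missing year
        return
    # Whole-word match only; titles with punctuation attached to "love" are missed.
    has_love = 1 if "love" in song["title"].lower().split() else 0
    yield year, (has_love, song["tempo"], 1)


def reduce_years(pairs, params):
    """Combine the per-song tuples into per-year statistics."""
    for year, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        love_titles = 0
        tempo_sum = 0.0
        songs = 0
        for _, (has_love, tempo, n) in group:
            love_titles += has_love
            tempo_sum += tempo
            songs += n
        yield year, {"love_titles": love_titles, "avg_tempo": tempo_sum / songs}


# Made-up records, for illustration only.
songs = [
    {"title": "Love in the Rain", "year": 1985, "tempo": 100.0},
    {"title": "Night Drive", "year": 1985, "tempo": 120.0},
    {"title": "My Love", "year": 1985, "tempo": 110.0},
    {"title": "Endless Love Song", "year": 1999, "tempo": 90.0},
    {"title": "Unknown Demo", "year": 0, "tempo": 80.0},
]

mapped = [kv for song in songs for kv in map_song(song, None)]
stats = dict(reduce_years(mapped, None))

# The "most romantic year" is the one with the most love titles.
most_romantic = max(stats, key=lambda y: stats[y]["love_titles"])
print(most_romantic, stats[most_romantic])
```

Carrying a `(sum, count)` pair through the reduce step, rather than averaging in the map phase, keeps the average correct however the input is split across workers.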