YPlan coding challenge
Hi
Thank you for applying to YPlan! Following a review earlier this week, the Data Scientist role was updated to list experience with Python as an essential requirement.
We liked your application and would like to invite you to complete a coding challenge. It should take you less than an hour. There is no set deadline, but the sooner you return your answers to me, the sooner we can progress your application (and the lower the risk of losing out to other candidates).
Instructions:
- Download the "MovieLens 1M" data set from http://grouplens.org/datasets/movielens/ and extract the three data files, which contain the movies, the users, and the users' ratings of the movies. The README explains their format.
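All three files use a simple layout: one record per line, with fields separated by "::" (ratings.dat, for example, holds UserID::MovieID::Rating::Timestamp). Below is a minimal reading sketch. It assumes the files are latin-1 encoded, since some movie titles contain accented characters, and the helper names parse_records and read_dat are hypothetical. io.open is used because it accepts an encoding argument on both Python 2.7 and 3.x.

```python
import io


def parse_records(lines):
    """Split each non-blank MovieLens 1M line on the '::' separator."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip():  # skip blank lines, e.g. a trailing empty line
            yield line.split("::")


def read_dat(path):
    """Read movies.dat, users.dat or ratings.dat into lists of string fields.

    Assumption: the files are latin-1 encoded.
    """
    with io.open(path, encoding="latin-1") as handle:
        return list(parse_records(handle))
```

Assuming the archive was extracted to ml-1m/, read_dat("ml-1m/ratings.dat") would return one list of four strings per rating. Converting IDs and ratings to int is left to the caller.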
- Create a Python program, "analyze_movies.py", that runs as a CLI and outputs a short analysis of the data: the names of the top movies by average rating within each group. The program should take arguments in this format (both are required):
  ./analyze_movies.py (gender|agegroup) <number>
  - The first argument is the grouping. For example, if gender is selected, two lists should be output: the top movies as rated by men and the top movies as rated by women. If agegroup is selected, 7 lists should be output, one per age group specified in the dataset README.
  - <number> is how many movies to output. If there are fewer than <number> movies, output them all.
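The argument format maps directly onto argparse from the standard library. Here, choices restricts the grouping to the two allowed values, and a small type function validates <number>. This is a sketch. Rejecting zero or negative counts is an assumption, because the brief does not say how such values should be handled, and the names positive_int and parse_args are hypothetical.

```python
import argparse


def positive_int(text):
    """argparse type: accept only integers >= 1 for <number>."""
    value = int(text)  # a ValueError here is reported by argparse as invalid input
    if value < 1:
        raise argparse.ArgumentTypeError("number must be at least 1")
    return value


def parse_args(argv=None):
    """Parse '(gender|agegroup) <number>'; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Show the top movies by average rating for each group.")
    parser.add_argument("grouping", choices=["gender", "agegroup"],
                        help="how to split users into groups")
    parser.add_argument("number", type=positive_int,
                        help="how many movies to list per group")
    return parser.parse_args(argv)
```

An unrecognised grouping or a non-numeric count makes argparse print a usage message and exit, so the rest of the program can rely on args.grouping and args.number being valid.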
An example invocation would therefore be "./analyze_movies.py agegroup 20".
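The core computation is a grouped average. A single pass over the ratings accumulates a sum and a count for each (group, movie) pair, and then each group's movies are sorted by their mean. The following is a minimal sketch. It assumes the data has already been loaded into plain Python structures, and the function name and argument shapes are hypothetical. The min_ratings parameter is an extra not in the brief.

```python
from collections import defaultdict


def top_movies_by_group(ratings, users, movies, grouping, number, min_ratings=1):
    """Return {group: [(title, average), ...]} with up to `number` movies each.

    ratings: iterable of (user_id, movie_id, rating) tuples
    users:   {user_id: (gender, age_code)}
    movies:  {movie_id: title}
    min_ratings is a hypothetical filter (not part of the brief) that drops
    movies with too few ratings in a group; 1 keeps everything.
    """
    index = 0 if grouping == "gender" else 1
    # (group, movie_id) -> [sum of ratings, number of ratings]
    totals = defaultdict(lambda: [0, 0])
    for user_id, movie_id, rating in ratings:
        group = users[user_id][index]
        entry = totals[(group, movie_id)]
        entry[0] += rating
        entry[1] += 1

    per_group = defaultdict(list)
    for (group, movie_id), (total, count) in totals.items():
        if count >= min_ratings:
            per_group[group].append((movies[movie_id], float(total) / count))

    result = {}
    for group, scored in per_group.items():
        # Highest average first; ties broken alphabetically by title.
        scored.sort(key=lambda item: (-item[1], item[0]))
        result[group] = scored[:number]  # slicing copes with fewer than `number` movies
    return result
```

Memory grows with the number of distinct (group, movie) pairs rather than with the number of ratings. Without a min_ratings threshold above 1, a movie rated 5 by a single user ties for first place, which is worth keeping in mind when reading the output.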
Other details:
Use Python 2.7 or 3.4, as you prefer, and only the standard library. If speed is an issue, feel free to restrict the number of data rows you use. Output via print, using a simple format of your choosing. Send your code to us however you like, though GitHub is preferred if you have an account. Code will be marked for correctness, legibility, and efficiency. Let me know if you have any questions!
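Restricting the data rows and printing plain-text output can both be handled with a few lines of code. The sketch below assumes a hypothetical row limit and heading format. itertools.islice stops consuming rows once the limit is reached. A formatting helper builds plain-text lines, and these can be emitted with a single print() call, which behaves the same on Python 2.7 and 3.4 when given one argument. The age labels follow the MovieLens 1M README, and the gender codes M and F are taken from users.dat.

```python
import itertools

# Age codes and labels as listed in the MovieLens 1M README.
AGE_LABELS = {1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
              45: "45-49", 50: "50-55", 56: "56+"}


def first_rows(rows, limit=None):
    """Iterate over at most `limit` rows; None means no limit."""
    return itertools.islice(rows, limit)


def group_label(grouping, key):
    """Turn a gender code or an age code into a readable heading."""
    if grouping == "agegroup":
        return "age " + AGE_LABELS[key]
    return {"M": "men", "F": "women"}[key]


def format_group(label, top):
    """Render one group's (title, average) pairs as plain-text lines."""
    lines = ["Top {0} movies for {1}:".format(len(top), label)]
    for rank, (title, average) in enumerate(top, 1):
        lines.append("{0:>3}. {1} ({2:.2f})".format(rank, title, average))
    return "\n".join(lines)


# One print() call per group; "Movie A" and "Movie B" are placeholder titles.
print(format_group(group_label("gender", "F"), [("Movie A", 5.0), ("Movie B", 3.0)]))
```

Passing an open file to first_rows, for example first_rows(handle, 100000), caps how many lines are parsed. This is one simple way to trade completeness for speed.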
I look forward to hearing from you :)
Best,
Talent Coordinator