Movielens 100k Benchmark using Raccoon

Movielens (http://grouplens.org/datasets/movielens/100k/) is a popular data set for testing recommendation engines. Users were selected at random for inclusion. All users selected had rated at least 20 movies. Unlike previous MovieLens data sets, no demographic information is included. Each user is represented by an id, and no other information is provided.

Raccoon (https://github.com/guymorita/recommendationRaccoon) is an easy-to-use recommendation engine and NPM module based on collaborative filtering, built on top of Node.js and Redis. The engine uses the Jaccard coefficient to determine the similarity between users and k-nearest-neighbors to create recommendations.
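
As a rough illustration of the similarity measure named above, the Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below is not taken from Raccoon's source; the function name `jaccard` and the sample movie ids are made up for the example.

```javascript
// Illustrative only: plain Jaccard coefficient between two lists of item ids.
// This is a sketch of the general technique, not Raccoon's implementation.
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  for (const item of setA) {
    if (setB.has(item)) shared += 1;
  }
  const unionSize = setA.size + setB.size - shared;
  // Two empty sets carry no overlap information; treat them as dissimilar.
  return unionSize === 0 ? 0 : shared / unionSize;
}

// Hypothetical users and the movie ids they liked.
const aliceLikes = ['m1', 'm2', 'm3'];
const bobLikes = ['m2', 'm3', 'm4', 'm5'];
console.log(jaccard(aliceLikes, bobLikes)); // 2 shared out of 5 distinct = 0.4
```

A k-nearest-neighbors step would then rank other users by this value and keep the top k, though how Raccoon does that in detail is not shown here.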

This repository is a test of raccoon using the Movielens 100k data set. The full description of how to run the test and the results are below. There are some pretty clear areas for optimization. I would love any help investigating:

Bottlenecks in the raccoon algorithms
How to lower the run time by an order of magnitude. It currently takes 25.3 minutes to finish all 100k entries and predictions
How to get a more accurate RMSE. Raccoon is designed for predictions between -1 and 1 (often very close to zero), while Movielens ratings range from 1 to 5.
How to make it consume less memory. Currently, at its peak, node requires up to 5GB and Redis is running at over 100k instantaneous_ops_per_sec

Primary Statistics:

Prediction Accuracy: 71.06% -- 14,189 out of 19,968 correct
Total time: 25.3 min
Redis total_commands_processed: 26,591,038
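
For readers who want to check the arithmetic, the sketch below derives the accuracy percentage and an average Redis command rate from the three figures above. The variable names are illustrative, and the command rate is simply the total divided by the reported run time, so it is an average over the whole run rather than a measured value.

```javascript
// Figures copied from the primary statistics above.
const correct = 14189;
const compared = 19968;
const totalMinutes = 25.3;
const redisCommands = 26591038;

// Accuracy as a percentage of compared predictions.
const accuracyPct = (100 * correct / compared).toFixed(2);

// Average Redis commands per second over the whole run.
const avgCommandsPerSec = Math.round(redisCommands / (totalMinutes * 60));

console.log(accuracyPct + '% accuracy');              // 71.06% accuracy
console.log(avgCommandsPerSec + ' commands/sec avg'); // 17517 commands/sec avg
```

The average of roughly 17.5k commands per second is well below the 100k peak mentioned earlier, which suggests the load is bursty rather than constant.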

How to run test

Clone / cd into directory

npm install
brew install redis
redis-server

Run node with increased memory allocation

node --max_old_space_size=8100 --optimize_for_size --max_executable_size=8100 --stack_size=8100 test.js

If you want to customize which test pool to use, you can change these lines in config.js

this.baseDataPool = 'u1.base.yaml';
this.testDataPool = 'u1.test.yaml';

If you want to change the number of users to make predictions for, you can change this line in config.js

this.numUsersToTest = 943;

If you want heapdumps, you can uncomment

// generateHeapDumpAndStats();

// setInterval(generateHeapDumpAndStats, 2000); //Do garbage collection and heap dump every 2 seconds
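
If a full heap dump is more than you need, a lighter option is to log Node's built-in `process.memoryUsage()` on a similar interval. This is only a sketch: the helper name `memorySnapshotMB` is hypothetical and is not part of test.js.

```javascript
// Hypothetical helper (not part of test.js): report Node memory usage in MB.
function memorySnapshotMB() {
  const usage = process.memoryUsage();
  const toMB = (bytes) => Math.round((bytes / 1024 / 1024) * 100) / 100;
  return {
    rss: toMB(usage.rss),             // total memory held by the process
    heapTotal: toMB(usage.heapTotal), // heap reserved by V8
    heapUsed: toMB(usage.heapUsed),   // heap currently in use
  };
}

console.log(memorySnapshotMB());

// To sample every 2 seconds during a run, uncomment:
// const timer = setInterval(() => console.log(memorySnapshotMB()), 2000);
// timer.unref(); // do not keep the process alive just for logging
```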

Other helpful tools to run with test

Redis shell

redis-cli
monitor

RDM

https://redisdesktop.com/

Redis stat

https://github.com/junegunn/redis-stat

Current benchmarks

For the test including all 100,000 reviews, the stats were as follows. Keep in mind that for this test, raccoon's recommendations were turned into binary 0s or 1s, and the ratings were changed from 1-3 to 0 and from 4-5 to 1. This means that when raccoon makes a guess, it is either exactly right or exactly wrong.

Compared 943 users

RMSE = 0.5712
Prediction Accuracy: 71.06% -- 14189 out of 19968 correct
Unrated: 0.16% -- 32 out of 20000 total
Guessed high: 65.60% -- 3791 high out of 5779 wrong
Total time: 25.3 min
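
The binary conversion described at the start of this section can be sketched as follows. The rating thresholds (1-3 becomes 0, 4-5 becomes 1) come from the description above; the function names and the sample data are made up, and the sketch does not attempt to reproduce the exact figures reported, since the benchmark's own scoring code may differ.

```javascript
// Map a Movielens rating (1-5) to the binary label described above:
// 1-3 becomes 0, 4-5 becomes 1.
function binarizeRating(rating) {
  return rating >= 4 ? 1 : 0;
}

// Compare binary predictions with binary labels.
// With 0/1 values each squared error is 0 or 1, so for fully rated pairs
// RMSE reduces to sqrt(fraction wrong).
function scoreBinary(predictions, labels) {
  let right = 0;
  for (let i = 0; i < labels.length; i += 1) {
    if (predictions[i] === labels[i]) right += 1;
  }
  const accuracy = right / labels.length;
  return { right, accuracy, rmse: Math.sqrt(1 - accuracy) };
}

// Made-up sample data, not taken from the benchmark run.
const sampleRatings = [5, 3, 4, 1];
const sampleLabels = sampleRatings.map(binarizeRating); // [1, 0, 1, 0]
const samplePredictions = [1, 1, 1, 0];                 // hypothetical guesses
console.log(scoreBinary(samplePredictions, sampleLabels));
// { right: 3, accuracy: 0.75, rmse: 0.5 }
```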

Redis Info

Server

redis_version:3.2.6
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:1ef1a7bf74a705e5
redis_mode:standalone
os:Darwin 14.5.0 x86_64
arch_bits:64
multiplexing_api:kqueue
gcc_version:4.2.1
process_id:2278
run_id:ce00cc046008eee0b706f891f840c40a9167b7be
tcp_port:6379
uptime_in_seconds:2626
uptime_in_days:0
hz:10
lru_clock:7582645
executable:/Users/guymorita/redis-server
config_file:

Clients

connected_clients:2
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

Memory

used_memory:84214816
used_memory_human:80.31M
used_memory_rss:287903744
used_memory_rss_human:274.57M
used_memory_peak:428091184
used_memory_peak_human:408.26M
total_system_memory:17179869184
total_system_memory_human:16.00G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:3.42
mem_allocator:libc

Persistence

loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1483977563
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

Stats

total_connections_received:2
total_commands_processed:26591038
instantaneous_ops_per_sec:0
total_net_input_bytes:1662496069
total_net_output_bytes:119113895540
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
evicted_keys:0
keyspace_hits:28725883
keyspace_misses:95512
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:1885
migrate_cached_sockets:0

Replication

role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

CPU

used_cpu_sys:81.84
used_cpu_user:131.83
used_cpu_sys_children:0.17
used_cpu_user_children:1.78

Cluster

cluster_enabled:0

Keyspace

db0:keys=6786,expires=0,avg_ttl=0
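
The INFO dump above is plain `key:value` text, so a couple of derived figures are easy to compute from it. The sketch below parses a short excerpt copied from the output above; `parseRedisInfo` is a hypothetical helper and not part of this repository.

```javascript
// Hypothetical helper: turn Redis INFO text ("key:value" per line) into an object.
function parseRedisInfo(text) {
  const info = {};
  for (const line of text.split('\n')) {
    const idx = line.indexOf(':');
    if (idx === -1 || line.startsWith('#')) continue; // skip headers and blanks
    info[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return info;
}

// Excerpt copied from the INFO output above.
const infoExcerpt = [
  'used_memory:84214816',
  'used_memory_rss:287903744',
  'keyspace_hits:28725883',
  'keyspace_misses:95512',
].join('\n');

const redisInfo = parseRedisInfo(infoExcerpt);
const hits = Number(redisInfo.keyspace_hits);
const misses = Number(redisInfo.keyspace_misses);

// Share of key lookups that found an existing key.
const hitRatePct = (100 * hits / (hits + misses)).toFixed(2);

// RSS divided by used memory; should agree with mem_fragmentation_ratio.
const fragmentation = (Number(redisInfo.used_memory_rss) / Number(redisInfo.used_memory)).toFixed(2);

console.log(hitRatePct + '% keyspace hit rate'); // 99.67% keyspace hit rate
console.log(fragmentation + ' fragmentation');   // 3.42 fragmentation
```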
