publicgoodsw/go_simhash_experiment

Go Simhash Experiment

To Build

  1. Use Go v1.11 or later. This experiment uses Go modules, so make sure GO111MODULE is set to on in your build environment.
  2. Check out this repo into any directory outside your GOPATH.
  3. Inside the repo, run go build ./...

To Prepare

First, be sure you have a tunnel open to the ContentID database. You will need the PGS AWS Salt Test SSH key.

ssh -i PATH_TO_PGS_SALT_TEST_KEY -L 54321:cluster01-content-id.cluster-c4uwuietvovl.us-east-1.rds.amazonaws.com:5432 ec2-user@cid-jump.pgs.io -N -f

Test your tunnel by connecting with psql:

psql -h localhost -p 54321 -U pgs -d content_id -W

psql should prompt you for the password. If you don't have it, ask EKINGERY or MSM.

Export the following variables into your environment:

export SHE_dbhost=localhost
export SHE_dbpass=PUT_THE_DB_PASSWORD_HERE
export SHE_dbname=content_id
export SHE_dbport=54321
export SHE_dbuser=pgs
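
As an illustration of how these variables might be consumed, here is a minimal sketch that assembles a libpq-style connection string from the SHE_* variables and reports which one is missing. The function name dsnFromEnv and the key/value string format are assumptions made for this example; the experiment's own code may read the variables differently.

```go
package main

import (
	"fmt"
	"os"
)

// dsnFromEnv builds a key/value Postgres connection string from the SHE_*
// variables. The lookup function is injected so the logic can be exercised
// without touching the real environment. (Hypothetical helper, not taken
// from this repo.)
func dsnFromEnv(getenv func(string) string) (string, error) {
	keys := []string{"dbhost", "dbport", "dbuser", "dbpass", "dbname"}
	vals := make(map[string]string, len(keys))
	for _, k := range keys {
		v := getenv("SHE_" + k)
		if v == "" {
			return "", fmt.Errorf("environment variable SHE_%s is not set", k)
		}
		vals[k] = v
	}
	// Values containing spaces or quotes would need escaping; this sketch
	// assumes simple values like the ones shown above.
	return fmt.Sprintf("host=%s port=%s user=%s password=%s dbname=%s",
		vals["dbhost"], vals["dbport"], vals["dbuser"], vals["dbpass"], vals["dbname"]), nil
}

func main() {
	dsn, err := dsnFromEnv(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Avoid printing the string itself, since it contains the password.
	fmt.Println("all SHE_* variables are set; connection string length:", len(dsn))
}
```

Running it before the experiment is a quick way to catch a forgotten export.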

To Run

simhash -n 18436187482098755291 -s 90 -d 7

Looking for hash 18436187482098755291 (ffda7ed5faffe6db), distance 7, since 90 days ago
533440 hashes returned
load took 5.86575223s
no matches found
hash search took 463.037µs

The experiment takes the following command line arguments:

-n: The hash, as a decimal integer, that you're looking for.

-s: The window backwards from now, in days. The default is 10000, which should pick up every hash ever stored in the database.

-d: The Hamming distance to search for.
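
The three arguments above map naturally onto Go's standard flag package. The sketch below shows one way to parse them; the options struct, the parseArgs helper, and the zero defaults for -n and -d are assumptions made for illustration (only the -s default of 10000 comes from the description above).

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"time"
)

// options holds the parsed command line. (Hypothetical type for this sketch.)
type options struct {
	hash      uint64 // -n
	sinceDays int    // -s
	distance  int    // -d
}

// parseArgs parses -n, -s and -d from args (without the program name).
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("simhash", flag.ContinueOnError)
	fs.Uint64Var(&o.hash, "n", 0, "hash to look for, as a decimal integer")
	fs.IntVar(&o.sinceDays, "s", 10000, "window backwards from now, in days")
	fs.IntVar(&o.distance, "d", 0, "Hamming distance to search for")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	// Convert the day window into an absolute cutoff for the query.
	cutoff := time.Now().AddDate(0, 0, -o.sinceDays)
	fmt.Printf("Looking for hash %d (%x), distance %d, since %s\n",
		o.hash, o.hash, o.distance, cutoff.Format("2006-01-02"))
}
```

Parsing -n as an unsigned 64-bit value matters here: a hash such as 18436187482098755291 does not fit in a signed int64.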

To Populate

We want to test this on a much larger corpus of simhashes than is currently available. The populate directory contains a small utility that creates random documents, simhashes them, and inserts the resulting simhash records into the database.

To add 1,000,000 records to the DB:

populate -c 1000000

Expect this to take quite some time.
