GitHub - Shaked/rg-stream-sampler: Stream Sampler

Stream Sampler - Description

In many scenarios it is hard to know the size of a list of items in advance. The task is therefore to randomly choose k samples from a list of n items, where n is either very large or unknown.

There are probably more ways to solve this problem, but I have focused on 2 common algorithms:

The first algorithm is simple but less efficient, especially for a large number of samples k. Its complexity is O(k^2) as a result of an in-loop search.
The second algorithm is called Reservoir Sampling, a well-known solution to this type of problem. Having said that, it can be tricky to use with multithreading and a large k. Its complexity is O(n).
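To make the second algorithm concrete, here is a minimal sketch of the textbook form of Reservoir Sampling (often called Algorithm R). It is written in Python for illustration only and is not the repository's PHP implementation; the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Pick k items uniformly at random from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1):
            # draw a slot in [0, i] and replace only if it falls inside
            # the reservoir.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample("THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG", 5)
print("".join(sample))
```

The stream is read once and only k items are held in memory, which is why n does not need to be known up front. If the stream has fewer than k items, this sketch simply returns all of them.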

Usage

First install dependencies:

make install

If the dependencies are already installed, you can run make update later on to update them.

The sampler supports 3 input methods:

Piped command line input (STDIN)

./bin/rg-stream-sampler -i THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG

Self randomized strings

Use the -r option to set the size of the randomly generated text:

./bin/rg-stream-sampler -r 10

Loading values remotely

./bin/rg-stream-sampler -i "https://www.random.org/strings/?num=1000&len=20&digits=on&upperalpha=on&loweralpha=on&unique=on&format=plain&rnd=new"

The default algorithm is Reservoir Sampling. To use the sequence algorithm instead, add -a seq.

Testing

To run the tests, use the following commands:

make install # if not installed already
make test

Coverage is also available using make cover, which opens the report in the browser once it finishes. This won't work on PHP 7, as xdebug was not available for it at the time of writing.

Benchmarking

As I have decided to use 2 algorithms, it is worth showing the benchmark results (Average Time is in seconds, so Ops/second is its reciprocal):

Sampler\Benchmarks\ReservoirEvent
    Method Name             Iterations    Average Time      Ops/second
    ---------------------  ------------  --------------    -------------
    benchReservoir10k    : [1,000     ] [0.2484058105946] [4.02567]
    benchReservoir100k   : [1,000     ] [0.2644526109695] [3.78140]
    benchReservoir1000k  : [1,000     ] [0.2710286958218] [3.68965]
    benchReservoir10000k : [500       ] [0.2710664744377] [3.68913]
    benchReservoir100000k: [10        ] [0.2284299612045] [4.37771]


Sampler\Benchmarks\SequenceEvent
    Method Name            Iterations    Average Time      Ops/second
    --------------------  ------------  --------------    -------------
    benchSequence10k    : [1,000     ] [0.1142126736641] [8.75560]
    benchSequence100k   : [1,000     ] [0.1222488436699] [8.18004]
    benchSequence1000k  : [1,000     ] [0.1420348870754] [7.04052]
    benchSequence10000k : [10        ] [2.5817480802536] [0.38733]
    benchSequence100000k: [1         ] [298.1928839683533] [0.00335]

You can run the benchmarks yourself using make bench.

Note that the benchmarks take a while to run. If you wish to change that, go to the benchmark directory and adjust the annotations accordingly.

TODO

Unicode Support

At the moment there is no Unicode support. One clear example is the use of str_split, about which the PHP manual says:

str_split() will split into bytes, rather than characters when dealing with a multi-byte encoded string. (php.net)
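The effect of splitting by bytes rather than by characters can be illustrated with a short sketch. It is written in Python as a stand-in and does not call PHP's str_split; the sample word and variable names are assumptions chosen for the example.

```python
# "naïve" written with an explicit escape so the ï is a single code point.
text = "na\u00efve"

# Splitting by character: one element per code point.
chars = list(text)

# Splitting by byte, which is what a byte-oriented split effectively does
# on a UTF-8 string: the two-byte "ï" ends up in two separate pieces.
raw = text.encode("utf-8")
byte_chunks = [raw[i:i + 1] for i in range(len(raw))]

print(len(chars))        # 5
print(len(byte_chunks))  # 6
```

A sampler that draws from the byte-level pieces can return half of a multi-byte character, which is no longer valid text on its own.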
