This repository has been archived by the owner on Jun 20, 2023. It is now read-only.

Implement buzhash #16

Merged: 8 commits, Oct 7, 2019

@Kubuxu Kubuxu commented Oct 6, 2019

It has the same properties as Rabin.

Benchmark results:

```
name             time/op
Buzhash2/1K-4       601ns ± 4%
Buzhash2/1M-4       910µs ± 2%
Buzhash2/16M-4     18.8ms ± 1%
Buzhash2/100M-4     116ms ± 2%
Rabin/1K-4         74.3µs ± 3%
Rabin/1M-4         6.12ms ± 3%
Rabin/16M-4        92.1ms ± 4%
Rabin/100M-4        576ms ± 3%
Default/1K-4        590ns ± 5%
Default/1M-4        326µs ± 2%
Default/16M-4      4.32ms ± 2%
Default/100M-4     24.9ms ± 6%

name             speed
Buzhash2/1K-4    1.71GB/s ± 4%
Buzhash2/1M-4    1.15GB/s ± 2%
Buzhash2/16M-4    892MB/s ± 1%
Buzhash2/100M-4   904MB/s ± 2%
Rabin/1K-4       13.8MB/s ± 3%
Rabin/1M-4        171MB/s ± 3%
Rabin/16M-4       182MB/s ± 4%
Rabin/100M-4      182MB/s ± 3%
Default/1K-4     1.74GB/s ± 4%
Default/1M-4     3.22GB/s ± 2%
Default/16M-4    3.88GB/s ± 2%
Default/100M-4   4.21GB/s ± 6%

name             alloc/op
Buzhash2/1K-4      1.17kB ± 0%
Buzhash2/1M-4      1.08MB ± 1%
Buzhash2/16M-4     17.1MB ± 0%
Buzhash2/100M-4     106MB ± 0%
Rabin/1K-4          402kB ± 1%
Rabin/1M-4         2.25MB ± 0%
Rabin/16M-4        19.2MB ± 0%
Rabin/100M-4        108MB ± 0%
Default/1K-4       1.14kB ± 0%
Default/1M-4       1.05MB ± 0%
Default/16M-4      16.8MB ± 0%
Default/100M-4      105MB ± 0%

name             allocs/op
Buzhash2/1K-4        3.00 ± 0%
Buzhash2/1M-4        7.00 ± 0%
Buzhash2/16M-4       71.0 ± 0%
Buzhash2/100M-4       406 ± 0%
Rabin/1K-4           8.00 ± 0%
Rabin/1M-4           22.0 ± 0%
Rabin/16M-4           204 ± 0%
Rabin/100M-4        1.21k ± 0%
Default/1K-4         3.00 ± 0%
Default/1M-4         7.00 ± 0%
Default/16M-4        70.0 ± 0%
Default/100M-4        406 ± 0%
```

License: MIT
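
For readers unfamiliar with buzhash, the property it shares with Rabin fingerprinting is that the hash of a sliding window can be updated in constant time as the window advances one byte. The following is a minimal sketch of that property, not the code from this PR: the window length, the `table` contents (filled by a small xorshift generator purely for illustration) and all function names are assumptions.

```go
package main

import (
	"fmt"
	"math/bits"
)

// windowSize is an assumed window length for this sketch.
const windowSize = 32

// table maps every byte value to a pseudo-random 32-bit word.
// It is generated with xorshift32 here only for illustration;
// a real implementation would normally ship a fixed table.
var table = func() [256]uint32 {
	var t [256]uint32
	x := uint32(2463534242)
	for i := range t {
		x ^= x << 13
		x ^= x >> 17
		x ^= x << 5
		t[i] = x
	}
	return t
}()

// hashWindow computes the buzhash of w from scratch.
func hashWindow(w []byte) uint32 {
	var h uint32
	for _, b := range w {
		h = bits.RotateLeft32(h, 1) ^ table[b]
	}
	return h
}

// roll updates h in O(1) when byte out leaves a windowSize-byte
// window and byte in enters it.
func roll(h uint32, out, in byte) uint32 {
	return bits.RotateLeft32(h, 1) ^
		bits.RotateLeft32(table[out], windowSize) ^
		table[in]
}

// rollingMatchesDirect reports whether rolling the hash across data
// gives the same value as rehashing every window from scratch.
func rollingMatchesDirect(data []byte) bool {
	if len(data) < windowSize {
		return true
	}
	h := hashWindow(data[:windowSize])
	for i := windowSize; i < len(data); i++ {
		h = roll(h, data[i-windowSize], data[i])
		if h != hashWindow(data[i-windowSize+1:i+1]) {
			return false
		}
	}
	return true
}

func main() {
	data := make([]byte, 4096)
	for i := range data {
		data[i] = byte(i*31 + i/7)
	}
	fmt.Println("rolling hash matches direct hash:", rollingMatchesDirect(data))
}
```

Because the hash depends only on the last `windowSize` bytes, chunk boundaries chosen from it depend on local content rather than on absolute offsets, which is what makes it usable for content-defined chunking.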

It has the same properties as Rabin.

Benchmark results:
```
name       time/op
Buzhash-4    14.3ms ± 7%
Rabin-4      94.1ms ± 3%
Default-4    1.74ms ± 7%

name       speed
Buzhash-4  1.18GB/s ± 7%
Rabin-4     178MB/s ± 3%
Default-4  9.63GB/s ± 6%

name       alloc/op
Buzhash-4    14.0kB ±48%
Rabin-4      19.2MB ± 0%
Default-4      474B ± 6%

name       allocs/op
Buzhash-4      1.00 ± 0%
Rabin-4         196 ±12%
Default-4      2.00 ± 0%
```

License: MIT
Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>
@Stebalien Stebalien (Member) left a comment

This does look a lot simpler. My main concern is configurability. We don't need arbitrary block sizes but having options is still useful.


```go
const (
	buzMin = 128 << 10
	buzMax = 512 << 10
```
Stebalien (Member):

Can we configure these? Can we configure the expected/average chunk size?

Kubuxu (Member Author):

I will check what the performance penalty is for making these configurable.

Kubuxu (Member Author):

6%:

```
name       old time/op    new time/op    delta
Buzhash-4    14.2ms ± 7%    15.1ms ± 6%   +6.36%  (p=0.000 n=20+19)

name       old speed      new speed      delta
Buzhash-4  1.18GB/s ± 6%  1.11GB/s ±14%   -6.64%  (p=0.000 n=20+20)
```

Your call

Stebalien (Member):

That's annoying. I'm fine leaving that off for now.
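
For context on what configuring the expected chunk size could involve, here is a rough sketch under stated assumptions. It assumes a boundary is declared whenever the low k bits of the hash are zero, so that, for roughly uniform hash values and ignoring the cap at the maximum size, the expected chunk size is about the minimum plus 2^k bytes. The `Options` type, `NewOptions` and `maskFor` are hypothetical names and are not part of this PR, which keeps the sizes as constants.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Options is a hypothetical configuration struct for this sketch.
type Options struct {
	Min  uint32 // smallest chunk size in bytes
	Max  uint32 // largest chunk size in bytes
	Mask uint32 // a boundary is declared when hash&Mask == 0
}

// maskFor returns a mask with k low bits set, where 2^k is the
// largest power of two that does not exceed span.
func maskFor(span uint32) uint32 {
	if span == 0 {
		return 0
	}
	k := uint(bits.Len32(span) - 1) // floor(log2(span))
	return uint32(1)<<k - 1
}

// NewOptions derives a mask so that the expected chunk size is
// roughly avgSize, under the assumptions stated above.
func NewOptions(minSize, avgSize, maxSize uint32) (Options, error) {
	if !(minSize < avgSize && avgSize < maxSize) {
		return Options{}, fmt.Errorf("need min < avg < max, got %d, %d, %d",
			minSize, avgSize, maxSize)
	}
	return Options{Min: minSize, Max: maxSize, Mask: maskFor(avgSize - minSize)}, nil
}

func main() {
	// 128 KiB and 512 KiB mirror buzMin and buzMax from the diff;
	// the 256 KiB average target is an assumption.
	opts, err := NewOptions(128<<10, 256<<10, 512<<10)
	if err != nil {
		panic(err)
	}
	fmt.Printf("min=%d max=%d mask=%#x\n", opts.Min, opts.Max, opts.Mask)
	// prints "min=131072 max=524288 mask=0x1ffff"
}
```

Reading these values from a struct instead of compile-time constants is the kind of change whose cost is being weighed in this thread.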

Jakub Sztandera added 2 commits October 6, 2019 13:53
License: MIT
Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>
License: MIT
Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>

Kubuxu commented Oct 6, 2019

By eliminating the Reader interface and implementing buzhash chunking directly on a []byte, I was able to get 1.3GB/s, so this is the upper limit for it:

```
name         time/op
BuzNoRead-4    12.3ms ±10%

name         speed
BuzNoRead-4  1.37GB/s ± 9%

name         alloc/op
BuzNoRead-4    3.97kB ± 0%

name         allocs/op
BuzNoRead-4      2.00 ± 0%
```
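
To illustrate what chunking directly on a []byte might look like, here is a minimal sketch; it is not the benchmarked code. `minSize` and `maxSize` mirror `buzMin` and `buzMax` from the diff, while `mask`, `window`, the xorshift-generated `table` and every function name are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	minSize = 128 << 10 // mirrors buzMin from the diff
	maxSize = 512 << 10 // mirrors buzMax from the diff
	mask    = 1<<17 - 1 // assumed boundary mask
	window  = 32        // assumed rolling window length
)

// table is filled with xorshift32 output for illustration only.
var table = func() [256]uint32 {
	var t [256]uint32
	x := uint32(2463534242)
	for i := range t {
		x ^= x << 13
		x ^= x >> 17
		x ^= x << 5
		t[i] = x
	}
	return t
}()

// nextCut returns the length of the first chunk of data.
func nextCut(data []byte) int {
	if len(data) <= minSize {
		return len(data)
	}
	limit := len(data)
	if limit > maxSize {
		limit = maxSize
	}
	// Prime the hash with the window that ends at minSize.
	var h uint32
	for _, b := range data[minSize-window : minSize] {
		h = bits.RotateLeft32(h, 1) ^ table[b]
	}
	for i := minSize; i < limit; i++ {
		if h&mask == 0 {
			return i // boundary found within [minSize, limit)
		}
		h = bits.RotateLeft32(h, 1) ^
			bits.RotateLeft32(table[data[i-window]], window) ^
			table[data[i]]
	}
	return limit // no boundary found: cut at the cap or at end of input
}

// split slices data into chunks without copying or reading through io.Reader.
func split(data []byte) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := nextCut(data)
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

// pseudoRandom returns n deterministic pseudo-random bytes as test input.
func pseudoRandom(n int) []byte {
	out := make([]byte, n)
	x := uint32(88172645)
	for i := range out {
		x ^= x << 13
		x ^= x >> 17
		x ^= x << 5
		out[i] = byte(x >> 24)
	}
	return out
}

func main() {
	chunks := split(pseudoRandom(4 << 20))
	for i, c := range chunks {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```

Since the chunks are subslices of the input, this form avoids the intermediate buffer and read calls that a Reader-based chunker needs, which is presumably where the difference measured above comes from.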

Don't return them either in benchmarks

License: MIT
Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>

Kubuxu commented Oct 6, 2019

buzhash.go (outdated):
```go
		return nil, b.err
	}

	buf := b.buf
```
Stebalien (Member):

nit: buf is always b.buf. IMO, we should just use b.buf directly.

Kubuxu (Member Author):

That is a good nit; it didn't use to be like that.

License: MIT
Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>
Jakub Sztandera added 3 commits October 7, 2019 12:02
License: MIT
Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>

License: MIT
Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>
License: MIT
Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>
License: MIT
Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>
@Kubuxu Kubuxu merged commit 21b0c06 into master Oct 7, 2019
@ribasushi ribasushi deleted the feat/buzhash branch March 26, 2020 09:22
Jorropo pushed a commit to ipfs/go-libipfs-rapide that referenced this pull request Mar 23, 2023