Add chunked encoding of large directories #16

Merged: 3 commits, Aug 19, 2024

Collaborator

@klauspost klauspost commented Jul 15, 2024

For directories above 25K entries, entries are encoded in separate chunks so that FindSerialized can search without decompressing the full index, giving it a linear time characteristic. There is a minor compression loss (about 1 byte/entry in an artificial test set).

BEFORE:

BenchmarkFindSerialized/100-32         	   84632	     14027 ns/op	        24.28 b/file	         84632 files/s	    5062 B/op	       4 allocs/op
BenchmarkFindSerialized/1000-32        	   13371	     88087 ns/op	        23.32 b/file	         13371 files/s	   49398 B/op	       4 allocs/op
BenchmarkFindSerialized/10000-32       	    1491	    769408 ns/op	        20.66 b/file	          1491 files/s	  451106 B/op	       4 allocs/op
BenchmarkFindSerialized/100000-32      	     163	   7350855 ns/op	        20.25 b/file	         163.0 files/s	 4499216 B/op	       5 allocs/op
BenchmarkFindSerialized/1000000-32     	      15	 175437187 ns/op	        20.16 b/file	         7.500 files/s	44978624 B/op	       6 allocs/op

AFTER:
BenchmarkFindSerialized/100-32         	   88186	     13926 ns/op	        24.28 b/file	     88186 files/s	     216 B/op	       4 allocs/op
BenchmarkFindSerialized/1000-32        	   14157	     82875 ns/op	        23.32 b/file	     14157 files/s	     220 B/op	       4 allocs/op
BenchmarkFindSerialized/10000-32       	    1562	    757547 ns/op	        20.66 b/file	      1562 files/s	     225 B/op	       4 allocs/op
BenchmarkFindSerialized/100000-32      	     567	   2101872 ns/op	        21.66 b/file	       567.0 files/s	 1164419 B/op	      10 allocs/op
BenchmarkFindSerialized/1000000-32     	     583	   2035905 ns/op	        21.73 b/file	       583.0 files/s	 1181472 B/op	      47 allocs/op
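
To illustrate the chunking idea described above, here is a minimal sketch, not the actual zipindex format. It assumes sorted names without newlines, uses `compress/flate` from the standard library as a stand-in compressor, and the names `chunk`, `buildChunks` and `find` are hypothetical. Each chunk keeps its first name uncompressed, so a lookup binary-searches the chunk headers and decompresses at most one chunk.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"sort"
	"strings"
)

// chunk is one independently compressed run of sorted names.
// Hypothetical layout for illustration; not the zipindex wire format.
type chunk struct {
	first string // smallest name in the chunk, stored uncompressed
	data  []byte // deflate-compressed, newline-separated names
}

// buildChunks sorts names and compresses them in groups of chunkSize.
func buildChunks(names []string, chunkSize int) ([]chunk, error) {
	if chunkSize <= 0 {
		return nil, fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	var out []chunk
	for start := 0; start < len(sorted); start += chunkSize {
		end := start + chunkSize
		if end > len(sorted) {
			end = len(sorted)
		}
		var buf bytes.Buffer
		w, err := flate.NewWriter(&buf, flate.DefaultCompression)
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(w, strings.Join(sorted[start:end], "\n")); err != nil {
			return nil, err
		}
		if err := w.Close(); err != nil {
			return nil, err
		}
		out = append(out, chunk{first: sorted[start], data: buf.Bytes()})
	}
	return out, nil
}

// find decompresses at most one chunk: the last one whose first name <= name.
func find(chunks []chunk, name string) (bool, error) {
	// i is the index of the first chunk whose first name sorts after name.
	i := sort.Search(len(chunks), func(i int) bool { return chunks[i].first > name })
	if i == 0 {
		return false, nil // name sorts before every chunk
	}
	r := flate.NewReader(bytes.NewReader(chunks[i-1].data))
	defer r.Close()
	raw, err := io.ReadAll(r)
	if err != nil {
		return false, err
	}
	for _, n := range strings.Split(string(raw), "\n") {
		if n == name {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	names := make([]string, 0, 100)
	for i := 0; i < 100; i++ {
		names = append(names, fmt.Sprintf("file-%03d.txt", i))
	}
	chunks, err := buildChunks(names, 25)
	if err != nil {
		panic(err)
	}
	ok, err := find(chunks, "file-042.txt")
	fmt.Println(len(chunks), ok, err)
}
```

The trade-off matches the one reported here: compressing chunks independently costs a little compression ratio, but the work per lookup is bounded by the chunk size instead of the directory size.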

When warm it provides a nice speedup, about 2.6x at 1M entries (175 ms/op down to 68 ms/op):

```
BEFORE:

BenchmarkFindSerialized/100-32         	   84632	     14027 ns/op	        24.28 b/file	         84632 files/s	    5062 B/op	       4 allocs/op
BenchmarkFindSerialized/1000-32        	   13371	     88087 ns/op	        23.32 b/file	         13371 files/s	   49398 B/op	       4 allocs/op
BenchmarkFindSerialized/10000-32       	    1491	    769408 ns/op	        20.66 b/file	          1491 files/s	  451106 B/op	       4 allocs/op
BenchmarkFindSerialized/100000-32      	     163	   7350855 ns/op	        20.25 b/file	         163.0 files/s	 4499216 B/op	       5 allocs/op
BenchmarkFindSerialized/1000000-32     	      15	 175437187 ns/op	        20.16 b/file	         7.500 files/s	44978624 B/op	       6 allocs/op

AFTER:
BenchmarkFindSerialized/100-32         	   89673	     13144 ns/op	        24.28 b/file	      89673 files/s	     216 B/op	       4 allocs/op
BenchmarkFindSerialized/1000-32        	   15356	     78862 ns/op	        23.32 b/file	      15356 files/s	     220 B/op	       4 allocs/op
BenchmarkFindSerialized/10000-32       	    1663	    719424 ns/op	        20.66 b/file	       1663 files/s	     228 B/op	       4 allocs/op
BenchmarkFindSerialized/100000-32      	     170	   6894286 ns/op	        20.25 b/file	       170.0 files/s	 26760 B/op	       4 allocs/op
BenchmarkFindSerialized/1000000-32     	      16	  67916169 ns/op	        20.16 b/file	       16.00 files/s	 2812044 B/op	   4 allocs/op
```
@klauspost klauspost force-pushed the internal-decomp-buffer branch 2 times, most recently from 9da8402 to 8e036b3 Compare August 16, 2024 08:14
@klauspost klauspost force-pushed the internal-decomp-buffer branch from 8e036b3 to bda8c4d Compare August 16, 2024 08:17
Chunked encoding kicks in at 25000 directory entries. This ensures that performance stays somewhat linear.

Bumps index limit to 1B.

```
BenchmarkFindSerialized/100-32         	   88186	     13926 ns/op	        24.28 b/file	     88186 files/s	     216 B/op	       4 allocs/op
BenchmarkFindSerialized/1000-32        	   14157	     82875 ns/op	        23.32 b/file	     14157 files/s	     220 B/op	       4 allocs/op
BenchmarkFindSerialized/10000-32       	    1562	    757547 ns/op	        20.66 b/file	      1562 files/s	     225 B/op	       4 allocs/op
BenchmarkFindSerialized/100000-32      	     567	   2101872 ns/op	        21.66 b/file	       567.0 files/s	 1164419 B/op	      10 allocs/op
BenchmarkFindSerialized/1000000-32     	     583	   2035905 ns/op	        21.73 b/file	       583.0 files/s	 1181472 B/op	      47 allocs/op
```
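For readers unfamiliar with the extra `b/file` and `files/s` columns: Go benchmarks can attach custom metrics with `testing.B.ReportMetric`. The sketch below shows one way such columns could be produced. It is not the benchmark from this PR; `benchFind`, the `serializedSize` parameter, the binary-search stand-in for the lookup, and the way both metrics are computed are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"testing"
	"time"
)

// benchFind runs a stand-in lookup benchmark over n sorted names and
// reports two custom metrics next to the standard ns/op column.
// serializedSize is a hypothetical input; a real benchmark would
// measure the size of its own serialized index.
func benchFind(n, serializedSize int) testing.BenchmarkResult {
	names := make([]string, n)
	for i := range names {
		names[i] = fmt.Sprintf("file-%08d", i) // zero-padded, so already sorted
	}
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		start := time.Now()
		for i := 0; i < b.N; i++ {
			target := names[i%n]
			// Stand-in for the real lookup: binary search over sorted names.
			if j := sort.SearchStrings(names, target); j >= n || names[j] != target {
				b.Fatal("lookup failed")
			}
		}
		elapsed := time.Since(start).Seconds()
		// Assumed definitions: serialized bytes per entry, lookups per second.
		b.ReportMetric(float64(serializedSize)/float64(n), "b/file")
		if elapsed > 0 {
			b.ReportMetric(float64(b.N)/elapsed, "files/s")
		}
	})
}

func main() {
	res := benchFind(1000, 23320)
	// Custom metrics are available in res.Extra, keyed by unit.
	fmt.Println(res.String(), res.MemString())
	fmt.Println(res.Extra)
}
```

Run as a regular `go test -bench` benchmark, the same `ReportMetric` calls would add the custom columns to each result line.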
@klauspost klauspost changed the title Add internal decompression buffers Add chunked encoding of large directories Aug 19, 2024
@harshavardhana harshavardhana merged commit 44c138d into minio:main Aug 19, 2024
4 checks passed
@klauspost klauspost deleted the internal-decomp-buffer branch August 20, 2024 07:56