feat!: sharded key-value store for fix-length blobs #2685

Merged
merged 22 commits into from
Feb 23, 2022

acud
Member

@acud acud commented Nov 24, 2021

Checklist

  • Chunk size with span (plus SOC preamble)
  • Remove buffer on read-op optimization
  • My change requires a documentation update and I have done it
  • I have added tests to cover my changes.

Description



@sonarqubecloud
Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)

No Coverage information
0.0% Duplication

Contributor

@mrekucci mrekucci left a comment


Reviewed 2 of 3 files at r2, all commit messages.
Reviewable status: 2 of 3 files reviewed, 5 unresolved discussions (waiting on @acud and @mrekucci)


pkg/sharky/shard.go, line 13 at r2 (raw file):

)

// location models the location <shard, offset, length> of a chunk

Comment should start with Location.


pkg/sharky/shard.go, line 36 at r2 (raw file):

	index    uint8           // index of the shard
	limit    int64           // max number of items in the shard
	fh       *os.File        // the file handle the shard is writing data to

I'd suggest using some more meaningful names instead of the fh (maybe file) and ffh (maybe fileFree).


pkg/sharky/shards.go, line 31 at r2 (raw file):

	ErrTooLong         = errors.New("data too long")
	ErrCapacityReached = errors.New("capacity reached")

This variable is unused.


pkg/sharky/shards.go, line 34 at r2 (raw file):

)

// models the sharded chunkdb

The doc comment should start with Shards.


pkg/sharky/shards.go, line 66 at r2 (raw file):

func (s *Shards) Close() error {
	close(s.quit)
	errs := []string{}

I'd suggest using the multierror package we're already using.
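
A minimal sketch of the error-aggregation idea, not the code from this PR. It swaps in the standard library's errors.Join (Go 1.20+) as a stand-in for the multierror package mentioned above, and the names closeAll, healthy and failing are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// healthy is a hypothetical closer that always succeeds.
type healthy struct{}

func (healthy) Close() error { return nil }

// failing is a hypothetical closer that always fails.
type failing struct{ name string }

func (f failing) Close() error { return fmt.Errorf("close %s: failed", f.name) }

// closeAll closes every closer and combines the failures into one error
// instead of collecting error strings by hand. errors.Join returns nil
// when there is nothing to report.
func closeAll(closers ...io.Closer) error {
	var errs []error
	for _, c := range closers {
		if err := c.Close(); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

func main() {
	err := closeAll(healthy{}, failing{"shard 1"}, failing{"shard 2"})
	fmt.Println(err)
}
```

Every closer is attempted even if an earlier one fails, which is usually what a Close over several shards wants.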


pkg/sharky/shards.go, line 109 at r2 (raw file):

	wg := &sync.WaitGroup{}
	if ffi.Size() > 0 {
		frees, err := ioutil.ReadAll(ffh)

The ioutil is deprecated, use io instead: https://go.dev/doc/go1.16
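
For reference, the replacement is a drop-in one: io.ReadAll has the same signature as ioutil.ReadAll. A small sketch, with readFrees and sampleLen as hypothetical names and an in-memory reader standing in for the free-slots file:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readFrees loads the full contents of r with io.ReadAll, which replaced
// ioutil.ReadAll in Go 1.16. The function name is hypothetical.
func readFrees(r io.Reader) ([]byte, error) {
	return io.ReadAll(r)
}

// sampleLen reads a fixed three-byte input and reports how many bytes
// came back, or -1 on error.
func sampleLen() int {
	frees, err := readFrees(strings.NewReader("\x00\x01\x02"))
	if err != nil {
		return -1
	}
	return len(frees)
}

func main() {
	fmt.Println("bytes read:", sampleLen())
}
```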


pkg/sharky/shards.go, line 188 at r2 (raw file):

}

func (s *Shards) Release(ctx context.Context, loc Location) {

If the context.Context is not used, it shouldn't be passed as a param.


pkg/sharky/sharky_test.go, line 121 at r2 (raw file):

	}

	// check location and data consisency

consisency -> consistency

@mrekucci mrekucci self-requested a review November 25, 2021 13:05
Member

@zelig zelig left a comment


Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: all files reviewed, 16 unresolved discussions (waiting on @acud and @mrekucci)


pkg/sharky/shard.go, line 36 at r2 (raw file):

Previously, mrekucci wrote…

I'd suggest using some more meaningful names instead of the fh (maybe file) and ffh (maybe fileFree).

also possible to abstract this as a ReaderAt/WriterAt
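
A rough sketch of what that abstraction could look like, assuming fixed-size slots addressed by index. The names blobFile, memFile, write and read are hypothetical and not taken from the PR; *os.File already satisfies both interfaces, so the real file handle would fit the same field:

```go
package main

import (
	"fmt"
	"io"
)

// blobFile is a hypothetical storage abstraction: anything that supports
// positioned reads and writes, such as *os.File.
type blobFile interface {
	io.ReaderAt
	io.WriterAt
}

// memFile is an in-memory blobFile, handy for tests.
type memFile struct{ buf []byte }

func (m *memFile) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, fmt.Errorf("negative offset %d", off)
	}
	end := int(off) + len(p)
	if end > len(m.buf) {
		m.buf = append(m.buf, make([]byte, end-len(m.buf))...)
	}
	return copy(m.buf[off:], p), nil
}

func (m *memFile) ReadAt(p []byte, off int64) (int, error) {
	if off < 0 || off >= int64(len(m.buf)) {
		return 0, io.EOF
	}
	n := copy(p, m.buf[off:])
	if n < len(p) {
		return n, io.EOF
	}
	return n, nil
}

// shard stores fixed-size blobs in numbered slots of a blobFile.
type shard struct {
	file     blobFile
	dataSize int64
}

// write pads data to dataSize and stores it in the given slot.
func (sh *shard) write(slot int64, data []byte) error {
	if int64(len(data)) > sh.dataSize {
		return fmt.Errorf("data too long: %d > %d", len(data), sh.dataSize)
	}
	buf := make([]byte, sh.dataSize)
	copy(buf, data)
	_, err := sh.file.WriteAt(buf, slot*sh.dataSize)
	return err
}

// read returns the first length bytes stored in the given slot.
func (sh *shard) read(slot int64, length int) ([]byte, error) {
	if int64(length) > sh.dataSize {
		return nil, fmt.Errorf("length %d exceeds slot size %d", length, sh.dataSize)
	}
	buf := make([]byte, length)
	_, err := sh.file.ReadAt(buf, slot*sh.dataSize)
	return buf, err
}

func main() {
	sh := &shard{file: &memFile{}, dataSize: 8}
	if err := sh.write(2, []byte("abc")); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	got, err := sh.read(2, 3)
	fmt.Println(string(got), err)
}
```

The main benefit is testability: the shard logic can be exercised without touching the filesystem.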


pkg/sharky/shard.go, line 43 at r2 (raw file):

// forever loop processing
func (sh *shard) offer(size int64) {

the free slots could be passed here as args and managed as a FIFO queue on disc


pkg/sharky/shard.go, line 72 at r2 (raw file):

		case <-sh.quit:
			// remember free offset in limbo
			sh.freed <- offset

double allocation issue here


pkg/sharky/shard.go, line 92 at r2 (raw file):

			// this condition checks if an offset is in limbo (popped but not used for write op)
			if writeOps != nil {
				sh.freed <- offset

double allocation issue here


pkg/sharky/shard.go, line 118 at r2 (raw file):

				// again put back offset in limbo
				if writeOps != nil {
					sh.freed <- offset

double allocation issue here


pkg/sharky/shard.go, line 134 at r2 (raw file):

	free := []int64{}
	for offset := range sh.freed {
		free = append(free, offset)

at the places marked with "double allocation issue here", a last-position location reference is remembered as free; it is saved here and offered the next time we start the node.
possible solutions: check if offset == size and, if so, either

  • do not put it in the freed channel (caveat: needs to be applied at 5 places)
  • do not save it (caveat: size is not available here)
  • filter it out at bootup when the free slots file is read (caveat: the slot is stored in the free slots list, even though we know it cannot win)

pkg/sharky/shard.go, line 147 at r2 (raw file):

	}
	if _, err := sh.ffh.Write(frees); err != nil {
		return err

we should read the whole file in at once.


pkg/sharky/shard.go, line 170 at r2 (raw file):

	if err != nil {
		go func() {
			sh.freed <- offset

double allocation issue here


pkg/sharky/shards.go, line 95 at r2 (raw file):

		return nil, err
	}
	size := fi.Size() / DataSize

@acud here size is initialised to a value based on how many blobs actually fit in the file.


pkg/sharky/shards.go, line 120 at r2 (raw file):

		for _, offset := range free {
			offset := offset
			if offset/DataSize >= size {

this is actually solution 3 to the double allocation issue, so it was a false alarm. still, this should be commented here; otherwise it is not clear why this check exists.
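
A commented sketch of that bootup filter, assuming offsets are byte offsets and size is the number of blobs that fit in the shard file, as in the quoted condition. The constant value and the name filterFree are hypothetical:

```go
package main

import "fmt"

// dataSize is a hypothetical fixed blob size in bytes.
const dataSize = 4096

// filterFree drops free offsets that point at or beyond the end of the
// shard file. Such a slot was popped but never written (it was "in limbo"
// at shutdown). Because slots from size up to limit are generated anyway,
// keeping it in the free list would offer the same slot twice.
func filterFree(free []int64, size int64) []int64 {
	kept := make([]int64, 0, len(free))
	for _, offset := range free {
		if offset/dataSize >= size {
			continue // beyond the file end: not a reusable slot
		}
		kept = append(kept, offset)
	}
	return kept
}

func main() {
	// The file holds 3 blobs, so the offset of slot 3 is filtered out.
	free := []int64{0, 2 * dataSize, 3 * dataSize}
	fmt.Println(filterFree(free, 3)) // prints [0 8192]
}
```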


pkg/sharky/shards.go, line 124 at r2 (raw file):

			}
			wg.Add(1)
			go func() {

we should probably just pass the variable free to the offer function


pkg/sharky/shards.go, line 143 at r2 (raw file):

	}
	sh.wg.Add(2) // initialisation requires so that s.wg.Wait() does not hold prematurely
	go sh.offer(size)

@acud so here it is. after an unclean shutdown, a shard's slots will count as taken and offer will make sure to generate slots from size up to limit when needed.
as long as there is no deletion during the migration, only the one free slot in limbo can be lost.
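
The behaviour described here can be sketched as a simple allocator. This is a simplified, synchronous illustration, not the PR's implementation (which runs offer as a goroutine loop over channels); the type slots and its methods are hypothetical:

```go
package main

import "fmt"

// slots hands out slot numbers for one shard.
type slots struct {
	free  []int64 // released slots available for reuse
	size  int64   // number of slots the shard file currently spans
	limit int64   // maximum number of slots in the shard
}

// next prefers a previously released slot; if none is left, it extends
// the shard by handing out slot number size, as long as size < limit.
func (s *slots) next() (int64, bool) {
	if n := len(s.free); n > 0 {
		slot := s.free[n-1]
		s.free = s.free[:n-1]
		return slot, true
	}
	if s.size < s.limit {
		slot := s.size
		s.size++
		return slot, true
	}
	return 0, false // capacity reached
}

// release marks a slot as reusable.
func (s *slots) release(slot int64) {
	s.free = append(s.free, slot)
}

func main() {
	// After an unclean shutdown the free list is empty, so every slot
	// below size counts as taken and new slots start at size.
	s := &slots{size: 2, limit: 3}
	fmt.Println(s.next()) // prints 2 true
	fmt.Println(s.next()) // prints 0 false
	s.release(0)
	fmt.Println(s.next()) // prints 0 true
}
```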

@zelig zelig changed the title sharky: key-value DB fix-length blobstore sharky: sharded key-value store for fix-length blobs Nov 28, 2021
@acud acud added this to the 1.5.0 milestone Dec 7, 2021
@ldeffenb
Collaborator

ldeffenb commented Jan 5, 2022

Looking forward to building this into my testnet nodes! Any reason not to do this once it merges? Anything to watch out for specifically?

@aloknerurkar aloknerurkar force-pushed the sharky-pkg branch 2 times, most recently from 060da23 to 26f14b7 Compare January 17, 2022 08:11
@mrekucci mrekucci force-pushed the sharky-pkg branch 2 times, most recently from 17d88a2 to 5bce3c8 Compare January 17, 2022 10:07
pkg/sharky/shard.go (outdated, resolved)
pkg/swarm/swarm.go (outdated, resolved)
Member Author

@acud acud left a comment


Reviewed 17 of 29 files at r13, 1 of 2 files at r14, 4 of 8 files at r15, 1 of 1 files at r16, 2 of 7 files at r17.
Reviewable status: 25 of 39 files reviewed, 18 unresolved discussions (waiting on @acud, @aloknerurkar, @mrekucci, and @zelig)


pkg/sharky/shard_slots_test.go, line 1 at r17 (raw file):

package sharky

code license missing

Member Author

@acud acud left a comment


some work still needed on this one

pkg/api/soc_test.go (outdated, resolved)
pkg/localstore/disaster_recovery.go (outdated, resolved)
pkg/localstore/gc.go (outdated, resolved)
pkg/localstore/localstore.go (outdated, resolved)
pkg/localstore/localstore.go (outdated, resolved)
pkg/sharky/store.go (outdated, resolved)
pkg/sharky/store.go (outdated, resolved)
pkg/sharky/store.go (outdated, resolved)
pkg/sharky/store.go (resolved)
pkg/soc/soc.go (outdated, resolved)
Contributor

@mrekucci mrekucci left a comment


Reviewed 1 of 7 files at r11, 15 of 29 files at r13, 1 of 2 files at r14, 3 of 8 files at r15, 1 of 1 files at r16, 1 of 7 files at r17, 17 of 17 files at r18, all commit messages.
Reviewable status: all files reviewed, 49 unresolved discussions (waiting on @acud, @aloknerurkar, @mrekucci, and @zelig)


pkg/localstore/disaster_recovery.go, line 28 at r18 (raw file):

	// first define the index instance
	headerSize := 16 + postage.StampSize

This can be a constant.


pkg/localstore/disaster_recovery_test.go, line 16 at r18 (raw file):

func TestRecovery(t *testing.T) {
	chunkCount := 150

This can be a constant.


pkg/localstore/export.go, line 60 at r18 (raw file):

	}

	ctx := context.Background()

Since the only usage of the context is in db.sharky.Read, I'd suggest moving it directly there without creating a new variable. Also consider using context.TODO() instead.

@mrekucci mrekucci self-requested a review February 1, 2022 19:27
@acud acud removed the request for review from zelig February 3, 2022 20:15
Member Author

@acud acud left a comment


very nice. minor comments

pkg/localstore/mode_put.go (outdated, resolved)
pkg/localstore/localstore.go (outdated, resolved)
pkg/sharky/store.go (resolved)
	s.metrics.TotalReleaseCalls.Inc()
	if err == nil {
		shard := strconv.Itoa(int(sh.index))
		s.metrics.CurrentShardSize.WithLabelValues(shard).Sub(1)
Member Author


so this should hold the size of shard in actual stored chunks... but it might be also useful to somehow keep track of the actual size of the shard on the disk (i'm not sure if this can be done in some way which is not too ugly since you might need to leak stuff from the slots component). this would allow us to know for example if there's any leakage of the shard (i.e. shard keeps growing while there are free slots), etc

@acud
Member Author

acud commented Feb 22, 2022

/approve

@acud acud changed the title sharky: sharded key-value store for fix-length blobs feat!: sharded key-value store for fix-length blobs Feb 23, 2022
@aloknerurkar aloknerurkar merged commit ebe988e into master Feb 23, 2022
@aloknerurkar aloknerurkar deleted the sharky-pkg branch February 23, 2022 15:15
6 participants