
localstore: reduce critical section size on gc #1435

Merged · 5 commits · Mar 19, 2021

Conversation

@acud (Member) commented Mar 15, 2021

This PR changes the way we do GC in localstore. The previous approach locked the database for quite a long time, causing other protocols to slow down significantly. This changeset introduces a flag that indicates whether gc is running; while it is, chunk addresses that are changed in the database are recorded in a slice. Once the gc has collected enough candidates, it checks the slice for dirty candidates, leaves the dirty ones out of the GC round, and then commits the result to leveldb.

Tested on our staging cluster; it shows an order-of-magnitude improvement in the time we spend under lock while garbage collection is running.

The slice of dirty addresses can later be optimized into a cuckoo filter or a Bloom filter.
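
For illustration, here is a minimal sketch of the bookkeeping described above. Only dirtyAddresses corresponds to a field that appears in the diff below; the other names and the layout are assumptions, not the actual localstore code.

// Sketch only: dirty-address bookkeeping while a GC round is running.
type gcTracker struct {
	mu             sync.Mutex
	gcRunning      bool            // set while a GC round is collecting candidates
	dirtyAddresses []swarm.Address // addresses written while gcRunning is true
}

// markDirty is called from the write paths; it only records the address and
// does not hold the lock for the duration of the GC round.
func (t *gcTracker) markDirty(addr swarm.Address) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.gcRunning {
		t.dirtyAddresses = append(t.dirtyAddresses, addr)
	}
}

// isDirty is consulted by GC before a candidate is committed to the leveldb batch.
func (t *gcTracker) isDirty(addr swarm.Address) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, a := range t.dirtyAddresses {
		if a.Equal(addr) {
			return true
		}
	}
	return false
}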

@acud acud requested a review from janos March 15, 2021 14:37
@acud acud self-assigned this Mar 15, 2021
@acud acud requested a review from zelig March 15, 2021 14:37
@acud acud added the "ready for review" label Mar 15, 2021
@acud acud force-pushed the localstore-gc-less-lock branch from 78ebe84 to 3541c0b on March 15, 2021 14:37
@janos (Member) left a comment

Very nice, thanks!

@acud acud mentioned this pull request Mar 16, 2021
@@ -109,6 +117,7 @@ func (db *DB) collectGarbage() (collectedCount uint64, done bool, err error) {
 	done = true
 	first := true
 	start := time.Now()
+	candidates := make([]shed.Item, 0)
Member

candidates array should be package level preallocated with gcBatchSize length
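
For reference, a rough sketch of that suggestion (the variable name is illustrative; it preallocates capacity rather than length so the slice can still be appended to):

// sketch: one package-level slice reused across GC rounds
var gcCandidates = make([]shed.Item, 0, gcBatchSize)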

@acud (Member, Author)

I agree this will save on allocations, but then the cleanup would be ugly: since we'd want to get rid of entries from the previous iteration, in order to actually save the allocation we'd have to iterate over every item in the slice and zero it out, otherwise we'd have dangling shed.Items in memory.

@acud (Member, Author)

cc @janos

Member

:) just remember how many you fill in, no need to go through it

Member

It could work nicely; just make sure that every time, before the slice is re-sliced to zero length, it is iterated up to its length and every element is set to a "zero" Item. A zero Item is one with all slices nil (important for memory usage) and all integers 0 (not important for memory usage). That way, no dangling Items will exist. Basically, this would be a type defined as []shed.Item with a reset method and a contains method that would replace swarm.Address.MemberOf, as per @zelig's other comment, which I also agree with. This container naturally calls for a new type, which also helps with future improvements, since the data structure may change, as suggested in the description.
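
A rough sketch of the container type described here, assuming the existing shed.Item and swarm.Address types (the type and method names are illustrative, not a final API):

// candidates is a reusable GC-candidate container, as suggested above (sketch only).
type candidates []shed.Item

// reset zeroes every filled element so the byte slices they reference
// (Address, Data, ...) can be garbage collected, then truncates for reuse.
func (c *candidates) reset() {
	for i := range *c {
		(*c)[i] = shed.Item{}
	}
	*c = (*c)[:0]
}

// contains reports whether the container holds an item with the given address,
// taking the place of the swarm.Address.MemberOf check.
func (c candidates) contains(addr swarm.Address) bool {
	for _, item := range c {
		if swarm.NewAddress(item.Address).Equal(addr) {
			return true
		}
	}
	return false
}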

@acud (Member, Author)

so... not sure if I understand, but is this a blocker?

@acud (Member, Author) Mar 17, 2021

> :) just remember how many you fill in, no need to go through it

This is incorrect, since you would have dangling items after the last gc run, once the gc target has been met, leaving dangling shed.Items in the slice. We therefore have to iterate over the whole slice whenever we finish GCing (either in the last iteration, or on every iteration) so that those items are no longer referenced and become available for garbage collection.

Member

@acud For the first iteration it is also fine as it is, for me.


// get rid of dirty entries
for _, item := range candidates {
	if swarm.NewAddress(item.Address).MemberOf(db.dirtyAddresses) {
Member

the other way round: the container is usually the receiver struct

@acud (Member, Author)

but then we'd need to introduce a new named type for []swarm.Address, which is something I'd like to avoid
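
For illustration, the shape being discussed would look roughly like this (sketch only; the names are not from the PR):

// a defined type is needed so that the container can be the receiver
type addressSet []swarm.Address

func (s addressSet) Contains(addr swarm.Address) bool {
	for _, a := range s {
		if a.Equal(addr) {
			return true
		}
	}
	return false
}

// the call site would then read
//   db.dirtyAddresses.Contains(swarm.NewAddress(item.Address))
// instead of
//   swarm.NewAddress(item.Address).MemberOf(db.dirtyAddresses)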

pkg/localstore/gc.go (review thread resolved)
pkg/localstore/gc_test.go (outdated; review thread resolved)
addrs := make([]swarm.Address, 0)
mtx := new(sync.Mutex)
online := make(chan struct{})
go func() {
Member

why the need for a different goroutine? you're waiting for it to terminate anyway

@acud (Member, Author)

I'm waiting for it to be scheduled. There's no defer call on the close. Otherwise we might get test flakes
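
The synchronization pattern being described, sketched out (exactly where close(online) happens in the real test may differ):

online := make(chan struct{})
go func() {
	close(online) // signal that the goroutine has been scheduled
	// ... observe GC rounds, collect errors, etc. ...
}()
<-online // do not proceed until the goroutine is actually running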

	t.Error(err)
}
i++
time.Sleep(100 * time.Millisecond)
Member

why sleep here?

@acud (Member, Author)

because Gets update the gc, and this happens in a different goroutine. So we sleep to guarantee that the update happens, resulting in dirtyAddresses being updated.

Comment on lines +779 to +794
for i := 0; i < chunkCount; i++ {
	ch := generateTestRandomChunk()

	_, err := db.Put(context.Background(), storage.ModePutUpload, ch)
	if err != nil {
		t.Fatal(err)
	}

	err = db.Set(context.Background(), storage.ModeSetSync, ch.Address())
	if err != nil {
		t.Fatal(err)
	}
	mtx.Lock()
	addrs = append(addrs, ch.Address())
	mtx.Unlock()
}
Member

chunks := generateTestRandomChunks(chunkCount)
_, err := db.Put(context.Background(), storage.ModePutSync, chunks...)
if err != nil {
	t.Fatal(err)
}

@acud (Member, Author)

but then we'd still need a for loop over the individual chunks to get their addresses, so that we could Set them and append to the addrs slice
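
In other words, even with the batch helper the test would still look roughly like this (a sketch; it keeps the original ModePutUpload/ModeSetSync semantics):

chunks := generateTestRandomChunks(chunkCount)
_, err := db.Put(context.Background(), storage.ModePutUpload, chunks...)
if err != nil {
	t.Fatal(err)
}
for _, ch := range chunks {
	if err := db.Set(context.Background(), storage.ModeSetSync, ch.Address()); err != nil {
		t.Fatal(err)
	}
	mtx.Lock()
	addrs = append(addrs, ch.Address())
	mtx.Unlock()
}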

@acud acud requested a review from zelig March 16, 2021 16:28
@acud acud merged commit b90bd7c into master Mar 19, 2021
@acud acud deleted the localstore-gc-less-lock branch March 19, 2021 07:59
Labels: pull-request, ready for review (The PR is ready to be reviewed)

4 participants