Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid panic from releasing unacquired semaphore #363

Merged
merged 1 commit into from
Dec 6, 2022

Conversation

abursavich
Copy link
Contributor

When using this pattern:

eg, egCtx := errgroup.WithContext(ctx)
for /* ... */ {
	limiter.Acquire(ctx, 1)
	eg.Go(func() error {
		defer limiter.Release(1)
		// ...
		return nil
	})
}
if err := eg.Wait(); err != nil {
	return err
}

If ctx is cancelled then Acquire may return an error, which will then cause Release to panic after it's called more times than the semaphore was successfully acquired.

This change uses the SetLimit functionality which has been added to errgroup.Group and provides a small wrapper around it in the internal/syncutil package to improve the ergonomics of its usage.

@abursavich abursavich force-pushed the semerr branch 3 times, most recently from a7188d3 to b19b776 Compare November 21, 2022 19:20
@shizhMSFT shizhMSFT requested a review from Wwwsylvia November 22, 2022 14:19
@codecov-commenter
Copy link

Codecov Report

Merging #363 (b19b776) into main (621970f) will decrease coverage by 0.05%.
The diff coverage is 18.75%.

@@            Coverage Diff             @@
##             main     #363      +/-   ##
==========================================
- Coverage   72.16%   72.10%   -0.06%     
==========================================
  Files          42       43       +1     
  Lines        4092     4094       +2     
==========================================
- Hits         2953     2952       -1     
- Misses        857      862       +5     
+ Partials      282      280       -2     
Impacted Files Coverage Δ
internal/syncutil/limitgroup.go 0.00% <0.00%> (ø)
content.go 74.55% <100.00%> (-1.20%) ⬇️
internal/syncutil/merge.go 87.50% <0.00%> (+17.85%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@shizhMSFT
Copy link
Contributor

@abursavich Thank you for your contribution.

I thought about using SetLimit instead of semaphore. However, when I look into the implementation of SetLimit(), I find

func (g *Group) SetLimit(n int) {
	if n < 0 {
		g.sem = nil
		return
	}
	if len(g.sem) != 0 {
		panic(fmt.Errorf("errgroup: modify limit while %v goroutines in the group are still active", len(g.sem)))
	}
	g.sem = make(chan token, n)
}

The statement g.sem = make(chan token, n) makes sync.Group with SetLimit(n int) having a space complexity of O(n) while the semaphore.Weighted has a space complexity of O(1). Therefore, I think the original implementation is better.

However, your concern is still valid. The pattern

eg, egCtx := errgroup.WithContext(ctx)
for /* ... */ {
	limiter.Acquire(ctx, 1)
	eg.Go(func() error {
		defer limiter.Release(1)
		// ...
		return nil
	})
}
if err := eg.Wait(); err != nil {
	return err
}

is flawed since the error of limiter.Acquire(ctx, 1) is never checked. I think the fix will be as simple as checking the error as follows.

eg, egCtx := errgroup.WithContext(ctx)
for /* ... */ {
	if err := limiter.Acquire(ctx, 1); err != nil {
		return err
	}
	eg.Go(func() error {
		defer limiter.Release(1)
		// ...
		return nil
	})
}
if err := eg.Wait(); err != nil {
	return err
}

@shizhMSFT
Copy link
Contributor

BTW, thank you for informing golang.org/x/sync has a tagged version v0.1.0.

@abursavich
Copy link
Contributor Author

abursavich commented Nov 22, 2022

The statement g.sem = make(chan token, n) makes sync.Group with SetLimit(n int) having a space complexity of O(n) while the semaphore.Weighted has a space complexity of O(1). Therefore, I think the original implementation is better.

The type used in the channel is actually 0 bytes: https://go.dev/play/p/ctojkieS8zW

type token struct{}

// ...

func (g *Group) SetLimit(n int) {
	// ...
	g.sem = make(chan token, n)
}

So its space complexity ends up being O(1) too :)

I think the fix will be as simple as checking the error as follows.

limiter := semaphore.NewWeighted(opts.Concurrency)
eg, egCtx := errgroup.WithContext(ctx)
for /* ... */ {
	if err := limiter.Acquire(ctx, 1); err != nil {
		return err
	}
	eg.Go(func() error {
		defer limiter.Release(1)
		// ...
		return nil
	})
}
if err := eg.Wait(); err != nil {
	return err
}

This was also my initial fix, but I ended up changing it because:

  • If you use ctx to acquire the limiter, it will keep going after an error has occurred and egCtx has been cancelled.
  • If you use egCtx to acquire the limiter to fix that problem, it will return a "context cancelled" error instead of the underlying error that triggered the cancellation.

Then I did this, which fixed all of those problems:

limiter := semaphore.NewWeighted(opts.Concurrency)
eg, egCtx := errgroup.WithContext(ctx)
for /* ... */ {
	if err := limiter.Acquire(ctx, 1); err != nil {
		return err
	}
	eg.Go(func() error {
		defer limiter.Release(1)
		select {
		case <-egCtx.Done():
			return egCtx.Err()
		default:
		}
		// ...
		return nil
	})
}
if err := eg.Wait(); err != nil {
	return err
}

But I felt like that was too complicated to be repeating everywhere, so I switched to SetLimit since its built into errgroup and got rid of the limiter, which helped some.

eg, egCtx := errgroup.WithContext(ctx)
eg.SetLimit(int(opts.Concurrency))
for /* ... */ {
	eg.Go(func() error {
		select {
		case <-egCtx.Done():
			return egCtx.Err()
		default:
		}
		// ...
		return nil
	})
}
if err := eg.Wait(); err != nil {
	return err
}

But that still felt a little too complicated to be repeating everywhere, so I encapsulated all the grunge into LimitGroup, leaving behind:

eg, egCtx := syncutil.NewLimitGroup(ctx, int(opts.Concurrency))
for /* ... */ {
	eg.Go(func() error {
		// ...
		return nil
	})
}
if err := eg.Wait(); err != nil {
	return err
}

@shizhMSFT
Copy link
Contributor

@abursavich You are right. The struct{} is indeed special and has zero cost in the go channel implementation.

c.buf = mallocgc(mem, elem, true)

content.go Outdated
@@ -119,13 +118,10 @@ func TagN(ctx context.Context, target Target, srcReference string, dstReferences
return err
}

limiter := semaphore.NewWeighted(opts.Concurrency)
eg, egCtx := errgroup.WithContext(ctx)
eg, egCtx := syncutil.NewLimitGroup(ctx, int(opts.Concurrency))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an issue on casting int64 to int. What if we lose precisions?

@Wwwsylvia Should we change the type of all occurrences of Concurrency to int from int64?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah int should be enough.

Copy link
Member

@Wwwsylvia Wwwsylvia Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent a PR for this. See #376.


// A LimitGroup is a collection of goroutines working on subtasks that are part of
// the same overall task.
type LimitGroup struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of LimitGroup / NewLimitGroup() pair, it is better to align the name to LimitedGroup / LimitGroup(), just like io.LimitedReader / io.LimitReader() and LimitedRegion / LimitRegion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with the change to the constructor's name, but I made the change as requested.

LimitReader is taking in a Reader and limiting it, so its name is like a command "limit this reader". I think NewLimitedGroup would be better than LimitGroup, since it's creating the LimitedGroup and not just wrapping an existing Group with limits.

@shizhMSFT
Copy link
Contributor

@abursavich @Wwwsylvia Do you also want to revisit internal/syncutil/limit.go for context cancellation?

@abursavich abursavich force-pushed the semerr branch 2 times, most recently from 78c5148 to 43c0ad6 Compare November 28, 2022 23:12
@abursavich
Copy link
Contributor Author

@abursavich @Wwwsylvia Do you also want to revisit internal/syncutil/limit.go for context cancellation?

I think that this type of concurrent code is tricky and you should probably use whatever you're most comfortable with. I'm more comfortable with for { select {} } loops that manage worker goroutines (kinda like this talk).

LimitedRegion has the ended field and guards all over the place to protect the Semaphore from being released when it isn't held. That's nice to avoid panics, but it makes me concerned that the code using it may be invalid.

Could probably tweak the Go function a little to use the egCtx and short-circuit on error:

func Go[T any](ctx context.Context, limiter *semaphore.Weighted, fn GoFunc[T], items ...T) error {
	eg, egCtx := errgroup.WithContext(ctx)
	for _, item := range items {
		region := LimitRegion(egCtx, limiter)
		if err := region.Start(); err != nil {
			eg.Go(func() error { return err })
			break
		}
		eg.Go(func(t T) func() error {
			return func() error {
				defer region.End()
				return fn(egCtx, region, t)
			}
		}(item))
	}
	return eg.Wait()
}

@TerryHowe
Copy link
Member

rebsae?

Signed-off-by: Andy Bursavich <abursavich@gmail.com>
@abursavich
Copy link
Contributor Author

I've rebased this a couple times to keep it up to date with main. It's unclear to me if there are any more requested changes to this PR, or if the more recent discussion is about other related changes that should be made separately?

Copy link
Member

@Wwwsylvia Wwwsylvia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Wwwsylvia
Copy link
Member

Wwwsylvia commented Dec 6, 2022

I've rebased this a couple times to keep it up to date with main. It's unclear to me if there are any more requested changes to this PR, or if the more recent discussion is about other related changes that should be made separately?

LGTM. Let's wait for review from @shizhMSFT.

I disagree with the change to the constructor's name, but I made the change as requested.
LimitReader is taking in a Reader and limiting it, so its name is like a command "limit this reader". I think NewLimitedGroup would be better than LimitGroup, since it's creating the LimitedGroup and not just wrapping an existing Group with limits.

+1 for NewLimitedGroup.

Copy link
Contributor

@shizhMSFT shizhMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shizhMSFT shizhMSFT merged commit a5be49a into oras-project:main Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants