Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

receive:Tenant's readyS(*ReadyStorage) is not initialized when perform concurrent call getOrLoadTenant #4534

Closed
wants to merge 2 commits into from

Conversation

imneov
Copy link

@imneov imneov commented Aug 8, 2021

Prior to this fix, ThanosReceive would return a tenant with a uninitialized readyS(*ReadyStorage).
This would trigger a ErrNotReady("TSDB not ready") error. Like #4533.

There are two problems that need to be fixed:

  1. When getOrLoadTenant() is called concurrently, the lock is released prematurely. This will cause the readyS (*ReadyStorage) of the tenant we returned to not be initialized.

    t.mtx.Unlock()

  2. When we accept Write requests, we cannot use non-block requests, which will also cause the readyS (*ReadyStorage) of the tenant we returned to not be initialized.

This PR use defer t.mtx.Unlock() released lock to fix getOrLoadTenant()

This PR always call getOrLoadTenant in blocking mode.

Fixes #4533

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

Yes, I add a unittest

for i := 0; i < 3; i++ {
wg.Add(1)
go func(index int) {
func(tt *tenant, err error){
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SA4009: argument err is overwritten before first use
(at-me in a reply with help or ignore)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignore

@imneov
Copy link
Author

imneov commented Aug 8, 2021

Bug trigger conditions:

  1. Tenant's data files is not exist in the local data directory. This will not fill the MultiTSDB's tenants map.
  2. Concurrent Write. This will concurrent call getOrLoadTenant().

…concurrent getOrLoadTenant

Signed-off-by: imneov <grantliu@yunify.com>
… handle prometheus write request

Signed-off-by: imneov <grantliu@yunify.com>
@imneov
Copy link
Author

imneov commented Aug 8, 2021

@brancz @bwplotka @squat

@stale
Copy link

stale bot commented Oct 16, 2021

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Oct 16, 2021
@stale stale bot closed this Oct 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant