Skip to content

Conversation

@oknet
Copy link
Member

@oknet oknet commented Feb 2, 2019

No description provided.

@oknet oknet added the HostDB label Feb 2, 2019
@oknet oknet added this to the 9.0.0 milestone Feb 2, 2019
@oknet oknet self-assigned this Feb 2, 2019
loop = false; // Only loop on explicit set for retry.
// find the partition lock
//
// TODO: Could we reuse the "mutex" above safely? I think so but not sure.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should delete this comment, it's clearly not safe because the lock on line 619 is scoped and has been dropped at this point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted.

if (!aforce_dns) {
MUTEX_TRY_LOCK(lock, cont->mutex, thread);
if (!lock.is_locked()) {
goto Ldelay;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very much not fond of goto. In this case, I'd be tempted to change the do;while(loop) into a while (loop). E.g. something like

MUTEX_TRY_LOCK(...);
bool loop = lock.is_locked();
while (loop) {
  loop = false; // ...
  // ...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated per your suggestion.

@oknet oknet force-pushed the i4916 branch 2 times, most recently from 613d359 to 784caa6 Compare February 3, 2019 06:06
SolidWallOfCode
SolidWallOfCode previously approved these changes Feb 3, 2019
Copy link
Member

@SolidWallOfCode SolidWallOfCode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@scw00
Copy link
Member

scw00 commented Feb 4, 2019

Could it be dead lock ?

See

Lretry:
// Otherwise, create a continuation to do a deeper probe in the background
//
HostDBContinuation *c = hostDBContAllocator.alloc();
HostDBContinuation::Options opt;
opt.timeout = dns_lookup_timeout;
opt.force_dns = aforce_dns;
opt.cont = cont;
opt.host_res_style = host_res_style;
c->init(hash, opt);
SET_CONTINUATION_HANDLER(c, (HostDBContHandler)&HostDBContinuation::probeEvent);
thread->schedule_in(c, MUTEX_RETRY_DELAY);
return &c->action;

I'm not sure here, But the HostDBContinuation use the same bucket lock. And it grab the bucket lock first then get continuation's lock (which equals to action.mutex). I know that the the cont will run after getby but it would be better to comment it if my understanding is correctly.

@SolidWallOfCode
Copy link
Member

I think not. The locking is try, not hard locks, which prevents deadlock. Any continuation that gets one lock but not the other will drop the acquired lock rather than waiting on the second lock.

scw00
scw00 previously approved these changes Feb 4, 2019
Copy link
Member

@scw00 scw00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks +1 ..

@oknet
Copy link
Member Author

oknet commented Feb 11, 2019

@scw00 @SolidWallOfCode

Rebased to resolve merge conflicts with #4926.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants