Skip to content

Conversation

@oknet
Copy link
Member

@oknet oknet commented Feb 11, 2019

No description provided.

@oknet oknet self-assigned this Feb 11, 2019
@oknet oknet added the HostDB label Feb 11, 2019
@oknet oknet added this to the 9.0.0 milestone Feb 11, 2019
@oknet oknet force-pushed the i4961 branch 2 times, most recently from 8e9cc11 to bb23a5d Compare February 11, 2019 13:22
scw00
scw00 previously approved these changes Feb 12, 2019
Copy link
Member

@scw00 scw00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ..

@oknet
Copy link
Member Author

oknet commented Feb 12, 2019

The workflow after PR #4916
dns-hostdb-pr4916

The workflow with this PR #4961
dns-hostdb-pr4961

if (opt.flags & HOSTDB_FORCE_DNS_ALWAYS) {
force_dns = true;
} else if (opt.flags & HOSTDB_FORCE_DNS_RELOAD) {
force_dns = (hostdb_re_dns_on_reload ? true : false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of alternatives.

force_dns = host_db_re_dns_on_reload;

or

if (hostdb_re_dns_on_reload) {
  force_dns = true;
  HOST_DB_INCREMENT_DYN_STAT(..);
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


if (!lock2.is_locked() && !trylock) {
// Refer to: [TS-374](http://issues.apache.org/jira/browse/TS-374)
lock2.acquire(thread);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a deadlock concern, if the thread blocks on acquiring a lock.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just rewrote SCOPED_MUTEX_LOCK(lock, bucket_mutex, thread);.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the TS-374 commit via ba3b7d0.

It solves a performance issue of reverse proxy.

@SolidWallOfCode
Copy link
Member

Overall I like the cleanup, but I'm quite concerned about the blocking lock. I don't see why

(cont->*cb_process_result)(r.get())

cannot be bound in to the HostDBContinuation if the lock isn't acquired. The continuation can be given the bucket mutex as its mutex, then it need merely try for lock2 to call the function.

@oknet
Copy link
Member Author

oknet commented Feb 13, 2019

@SolidWallOfCode

IMO, the (cont->*cb_process_result)(r.get()) may be a fast path than reply_to_cont(cont, r).

@SolidWallOfCode
Copy link
Member

Clearly, which is the reason to do a try lock. But if that fails, hard locking and creating blockage from lock contention is the slow path.

@oknet
Copy link
Member Author

oknet commented Feb 14, 2019

@SolidWallOfCode
May I merge this PR first? Because it does not change the logic.
And then open a new issue to discuss the TS-374 (commit ba3b7d0) and (cont->*cb_process_result)(r.get()).

The SCOPED_MUTEX_LOCK and (cont->*cb_process_result)(r.get()) are the only differences between the getbyname_re and the getbyname_imm interfaces.

@oknet
Copy link
Member Author

oknet commented Feb 15, 2019

[approve ci autest]

@vmamidi
Copy link
Contributor

vmamidi commented Feb 15, 2019

@oknet I like the code cleanup but i think we might still encounter deadlock like this( #1335 ) even after the cleanup. #1756 avoids such scenarios but it doesn't prevent anyone from accidentally causing a dead lock similar to ( #1335 ).

@oknet
Copy link
Member Author

oknet commented Feb 21, 2019

@vmamidi

I will find a new solution to the TS-374 and replace the commit ba3b7d0.

It will use MUTEX_TRY_LOCK completely to avoid the deadlocks.

@oknet
Copy link
Member Author

oknet commented Feb 21, 2019

@SolidWallOfCode @vmamidi
I merged this and open an issue #5043 to discuss the deadlock problem and the solution.

@oknet oknet merged commit 2906f16 into apache:master Feb 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants