Skip to content

Conversation

@masaori335
Copy link
Contributor

@masaori335 masaori335 commented Jan 29, 2020

In Http2Stream.cc, below (bit simplified) code appear 5 times. However, this doesn't work as expected when the try lock is not locked.

MUTEX_TRY_LOCK(lock, read_vio.mutex, this_ethread());
if (lock.is_locked()) {
    read_vio.cont->handleEvent(event, &read_vio);
} else {
    this_ethread()->schedule_imm(read_vio.cont, event, &read_vio);
}

The third argument of schedule_imm() is void *cookie and it is assigned to Event::cookie. When the event is processed, the second argument of handleEvent() is Event *.
OTOH, the event handler of the read_vio.cont (HttpTunnel or HttpSM) assume the second void * is VIO *.
This means if we want to pass VIO *, we need to call handleEvent() directly.

if ((p = get_producer(static_cast<VIO *>(data))) != nullptr) {

vc_entry = vc_table.find_entry(static_cast<VIO *>(data));

When ATS falls into the case, the event is ignored and the stream might be stalled.

@masaori335 masaori335 added this to the 10.0.0 milestone Jan 29, 2020
@masaori335 masaori335 self-assigned this Jan 29, 2020
@maskit
Copy link
Member

maskit commented Jan 29, 2020

I understand what problem you are trying to solve, but I'm not sure we should solve it in this way. Doesn't it affect performance? Maybe it doesn't matter because the mechanism is broken already though.

I'd keep the asynchronous mechanism and find a way to pass data correctly.

@shinrich
Copy link
Member

shinrich commented Feb 6, 2020

Very interesting. I've looked at this code many times, but it never hit me that the scheduled handle event is passing in the event rather than the event->cookie.

It would be good to figure this out in general to avoid deadlocking. There are doubtless other places in the code making this same flawed assumption. Perhaps the scheduled handleEvent call should just always pass in the e->cookie. Of course would need to do a deeper analysis of the code to determine the impact of that change.

@shinrich
Copy link
Member

shinrich commented Feb 6, 2020

It seems that the schedule operations in Http2ClientSession have the same issue.

@vmamidi
Copy link
Contributor

vmamidi commented Feb 6, 2020

Although the fix makes sense, I am concerned that a thread might be blocked if we have any other cases doing the same thing.

Copy link
Contributor

@vmamidi vmamidi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, but we should also look at other places for similar issues to avoid any deadlocks.

@shinrich
Copy link
Member

shinrich commented Feb 7, 2020

I think doing using the TSHttpSsnCallback class in TSHttpSsnReenable() would be the safe thing to do if the lock cannot be immediately acquired. A new continuation is created using the same mutex as the original target continuation. The only purpose of the Callback continuation is to call the handler of the original target continuation with the original arguments and then clean itself up. With a reuse list of jemalloc and/or a proxy allocator list, the memory manipulation overhead should be minimal.

@maskit
Copy link
Member

maskit commented Feb 8, 2020

The code was introduced by 85c0211. All 7.x and 8.x releases should be affected.

Also, _switch_thread_if_not_on_right_thread might not safe neither. I'm not sure if there are components that call Http2Stream::handleEvent directly, but if there were, _switch_thread_if_not_on_right_thread might not work as expected. The function itself is relatively new but the logic came from the same commit above.

Always using handleEvent would work for HttpSM and HttpTunnel, but we cannot do the same for _switch_thread_if_not_on_right_thread.

@maskit
Copy link
Member

maskit commented Feb 8, 2020

ProxyTransaction::adjust_thread might cause the same issue. It's basically does the same thing as _switch_thread_if_not_on_right_thread.

@masaori335 masaori335 closed this Feb 12, 2020
@zwoop zwoop removed this from the 10.0.0 milestone Feb 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants