Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support inline Lua code in per filter config #12516

Merged
merged 10 commits into from
Sep 3, 2020

Conversation

wbpcode
Copy link
Member

@wbpcode wbpcode commented Aug 6, 2020

Signed-off-by: wbpcode comems@msn.com

Commit Message:
Additional Description:
This is a supplement to PR #11235. Check #12460 for more info.

Risk Level: Low
Testing:
Docs Changes:
Release Notes:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #12516 was opened by wbpcode.

see: more, trace.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is awesome. A few comments to get started.

/wait

source/extensions/filters/http/lua/config.cc Show resolved Hide resolved
PerLuaCodeSetup(const std::string& lua_code, ThreadLocal::SlotAllocator& tls);
PerLuaCodeSetup(const std::string& lua_code, ThreadLocal::SlotAllocator& tls,
Event::Dispatcher& dispatcher);
~PerLuaCodeSetup() { dispatcher_.deferredDelete(std::move(lua_state_)); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason you added deferred delete here? I think this part is a little tricky unfortunately.

Basically, I think it's possible for the route table to get removed from the main thread but still be stored by an active request. Thus, the request may hold that last reference to the route table and then when it's over it will destroy the route table and the per-lua code. I think this means that this may happen cross thread. Is deferred delete safe cross thread? (It may be I would need to check.) Anyway this is the part that needs very careful thought. Can you describe what you did here and add more comments so we can discuss?

Copy link
Member Author

@wbpcode wbpcode Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This 'dispatcher_' belongs to the main thread. And add this deferred delete will move TheadLocalState(lua_state_) and destroy it on the main thread.

In this case, no matter on which thread the route table is destroyed, we can guarantee that TheadLocalState(lua_state_) will only be destroyed in the main thread.

In this way, we can ensure that the state of TLS will not be modified on the worker thread.

Copy link
Member Author

@wbpcode wbpcode Aug 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rechecked the source code and found that deferredDelete itself is not thread-safe, and it cannot safe cross thread.

Perhaps we can simply use post callback to let the main thread destroy TheadLocalState(lua_state_).
Just like following code:


  ~PerLuaCodeSetup() {
    dispatcher_.post([state = std::move(lua_state_)]() {});
    // dispatcher_.deferredDelete(std::move(lua_state_));
  }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible this will work but the threading would need to be analyzed. Do you want to try this out and we can take a look?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I will update it this weekend. My own branch also needs to verify this function in the production environment immediately.

@wbpcode
Copy link
Member Author

wbpcode commented Aug 20, 2020

It took more time to complete this work than expected. Especially to resolve TSAN and ASAN alarms. Now, it can accept review. An integration test is added to verify thread safety issues in the case of route table updates. @mattklein123

if (lua_state_.getGlobalRef(request_function_slot_) == LUA_REFNIL) {
PerLuaCodeSetup::PerLuaCodeSetup(const std::string& lua_code, ThreadLocal::SlotAllocator& tls,
Event::Dispatcher& dispatcher)
: lua_state_(new Filters::Common::Lua::ThreadLocalState{lua_code, tls}),
Copy link
Member

@dio dio Aug 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question, for my own education, why do we need to have lua_state_ as a pointer here? Since this will be still owned by this class right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For thread safety, ThreadLocalState can only be destructured in the main thread. However, when the route table is dynamically updated, the old route configuration may be destroyed in a random worker thread. Its PerLuaCodeSetup will also be destructured in the corresponding worker thread. By converting lua_state_ to a pointer, we can push the destruction of lua_state_ into the main thread with a callback during PerLuaCodeSetup destructuring.

  virtual ~PerLuaCodeSetup() {
    if (dispatcher_.isThreadSafe()) {
      return;
    }
    dispatcher_.post([state = lua_state_.release()] { delete state; });
  }

Copy link
Member Author

@wbpcode wbpcode Aug 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! I have a new idea. Thank you very much for this question @dio . It reminded me that there's actually a different way to think about making the code a little cleaner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this is ready for review can you remove [WIP] from the title also? Thank you.

/wait

Copy link
Member Author

@wbpcode wbpcode Aug 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right. 🆗

Signed-off-by: wbpcode <comems@msn.com>
Signed-off-by: wbpcode <comems@msn.com>
wbpcode added 2 commits August 25, 2020 11:00
@wbpcode wbpcode changed the title [WIP] support inline Lua code in per filter config support inline Lua code in per filter config Aug 25, 2020
@wbpcode
Copy link
Member Author

wbpcode commented Aug 25, 2020

It's ready for a review. Thanks. @mattklein123 @dio

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is fantastic. Nicely done. Just a few comments.

/wait

// lua_State, some static variables are used to assist memory allocation. Creating lua_State in
// different threads at the same time may cause data race. Although 'LuaJIT' uses retries to
// ensure the final safety of the memory allocation, it may still trigger the TSAN alarm.
static Thread::MutexBasicLockable lua_thread_local_lock;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

potential static fiasco here. Please use CONSTRUCT_ON_FIRST_USE


~FilterConfigPerRoute() override {
if (per_lua_code_setup_ptr_ && !main_thread_dispatcher_.isThreadSafe()) {
main_thread_dispatcher_.post([setup = per_lua_code_setup_ptr_.release()] { delete setup; });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is what is needed, but can you add more comments on why this is necessary? Technically I think it's possible this might leak during shutdown but it probably doesn't matter. If you want to avoid a leak no matter what, create a shared_ptr wrapper, put that in the lambda, and then move the unique_ptr inside of it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I will add some short comment to explain the necessity and try to fix the possible memory leak.

wbpcode added 2 commits August 31, 2020 15:15
@mattklein123
Copy link
Member

Filed #12896 for the CI issue. Sorry I think we have a unicode issue that needs to be fixed.

/wait

@mattklein123
Copy link
Member

Can you merge main? CI should be fixed. Thank you @asraa!

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks looks great. One small comment/question for discussion.

/wait-any

Comment on lines 73 to 75
// different threads at the same time may cause data race. Although 'LuaJIT' uses retries to
// ensure the final safety of the memory allocation, it may still trigger the TSAN alarm.
Thread::LockGuard guard(luaThreadLocalLock());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to avoid the lock here, should we just disable the relevant tests under TSAN with a comment? I'm fine either way. cc @dio @lizan

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to avoid the lock here, should we just disable the relevant tests under TSAN with a comment? I'm fine either way. cc @dio @lizan

Yes. Only one test need to be disabled under TSAN. I think both would be acceptable, depending on what the community has to say.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah let's just do that. Please have a good comment on why we are disabling TSAN for the test.

/wait

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok 😄

Signed-off-by: wbpcode <comems@msn.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants