Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send a queue removed message to the queue manager #5391

Merged

Conversation

style95
Copy link
Member

@style95 style95 commented Mar 30, 2023

This is to handle the issue reported in #5388 in another way.

Description

Related issue and scope

  • I opened an issue to propose and discuss this change (#????)

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@codecov-commenter
Copy link

codecov-commenter commented Mar 30, 2023

Codecov Report

Merging #5391 (c1c8ca1) into master (60ca660) will decrease coverage by 0.27%.
The diff coverage is 100.00%.

❗ Current head c1c8ca1 differs from pull request most recent head 3e99442. Consider uploading reports for the commit 3e99442 to get more accurate results

@@            Coverage Diff             @@
##           master    #5391      +/-   ##
==========================================
- Coverage   76.91%   76.64%   -0.27%     
==========================================
  Files         240      240              
  Lines       14588    14595       +7     
  Branches      629      636       +7     
==========================================
- Hits        11221    11187      -34     
- Misses       3367     3408      +41     
Impacted Files Coverage Δ
...ntainerpool/v2/FunctionPullingContainerProxy.scala 78.65% <100.00%> (+0.17%) ⬆️
...e/openwhisk/core/scheduler/queue/MemoryQueue.scala 80.99% <100.00%> (-1.42%) ⬇️

... and 11 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@bdoyle0182
Copy link
Contributor

This LGTM we need this as well but I think it's important to have the additional safeguard from the state timeout never occurring in the removing state creating an activation cycle such that the queue is never shut down so will still go ahead with merging that as well if you approve it.

@bdoyle0182
Copy link
Contributor

The edge case also occurs from this event

    case Event(StateTimeout, _: NoActors) =>
      logging.info(this, s"[$invocationNamespace:$action:$stateName] The queue is timed out, stop the queue.")
      cleanUpDataAndGotoRemoved()

which this change wouldn't cover.

I think unit tests will need to be updated to make sure these changes don't have any side effects where sending queue removed to the parent occurs too early in the removal process

@style95
Copy link
Member Author

style95 commented Mar 31, 2023

@bdoyle0182
My bad, I added another missing part.

So both cleanUpDataAndGotoRemoved() and cleanUpActorsAndGotoRemoved() should send a queue removed message to the queue manager.

@bdoyle0182
Copy link
Contributor

Since this will be called from every shutdown case, I just want to make sure that there aren't any new race conditions created. Would it be possible for the queue removed message to be sent twice if it's called from here now and it was sent from another shutdown case such that the queue is removed from the manager map and then a new queue is created and this fsm sends another queue removed message removing it from the parent map after the new queue is created? I'm not sure if that could happen or matters, but want to check

@bdoyle0182
Copy link
Contributor

Will approve this so long as we're confident sending queue removed to the parent from these locations cannot create a new race condition.

@style95
Copy link
Member Author

style95 commented Apr 3, 2023

Once the queue manager receives a queue removed message, it will just remove the corresponding reference from queue pool and send an ack back to the sender.
If there is any logic to re-add the reference to the queue pool then there might be a race condition, but it always removes the reference and no race condition happens.
https://github.com/apache/openwhisk/blob/master/core/scheduler/src/main/scala/org/apache/openwhisk/core/scheduler/queue/QueueManager.scala#L204

One concern could be multiple acks sent from the queue manager, but it will also be handled by the wildcard case if there is no match for a certain state.
https://github.com/apache/openwhisk/blob/master/core/scheduler/src/main/scala/org/apache/openwhisk/core/scheduler/queue/MemoryQueue.scala#L670

@bdoyle0182
Copy link
Contributor

LGTM

@bdoyle0182 bdoyle0182 merged commit cb3b64f into apache:master Apr 4, 2023
mtt-merz pushed a commit to mtt-merz/openwhisk that referenced this pull request Oct 22, 2023
* Send a queue removed message to the queue manager when cleaning up the actor and data

* Add one more missing part

* Fix test cases

(cherry picked from commit cb3b64f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants