Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to address #3918, reuse a container on applicationError #3941

Conversation

tysonnorris
Copy link
Contributor

Reuse a container on applicationError (graceful error from action), only during /run (any error during /init still destroys the container)

Description

In the Future handling of ContainerProxy.initializeAndRun(), this adjustment will allow container to NOT be destroyed when applicationError is returned from the /run request.

This affects warm container reuse in the case where an anticipated error should produce an error result, but should have no impact on container reuse (container is still valid to process next activation).
Example use case is parameter validation where some sanity check is performed at the start of the action code, and immediately returns {error: "invalid parameter"}.

This will

  • NOT affect the API/client results (error is still seen on any error from container)
  • improve container reuse (since preemptively failing based on user input will no longer cause container destruction)

I considered, but did NOT implement, a change to WhiskActivation to add a field that indicates /init failure - since I think it can only be detected in activation logs whether the applicationError response was during /init vs /run, and it may be useful for action developers to have easier access to this bit of info. Sone of the code is convoluted to track the distinction between applicationError on /init and applicationError on /run (e.g. WhiskActivation becomes(WhiskActivation, Boolean)).

Related issue and scope

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • [] Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

case Left(error) => Future.failed(error)
case Right(act) => Future.successful(act)
//if non-successful, init failures should fail, and non-applicationErrors should also fail
case Right((act, initFailure)) if !act.response.isSuccess && (initFailure || !act.response.isApplicationError) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this a little bit more straightforward and reduce the size of the change, does it make sense to change https://github.com/apache/incubator-openwhisk/blob/master/common/scala/src/main/scala/whisk/core/containerpool/Container.scala#L114 to be a containerError? That way, I think all init failures are non-application errors and thus you can just check for isSuccess | isApplicationError here vs. having to thread the extra boolean flag through.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable! 👍

@codecov-io
Copy link

codecov-io commented Aug 3, 2018

Codecov Report

Merging #3941 into master will decrease coverage by 4.86%.
The diff coverage is 91.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3941      +/-   ##
==========================================
- Coverage   85.61%   80.75%   -4.87%     
==========================================
  Files         147      146       -1     
  Lines        7107     7058      -49     
  Branches      429      418      -11     
==========================================
- Hits         6085     5700     -385     
- Misses       1022     1358     +336
Impacted Files Coverage Δ
...ain/scala/whisk/core/containerpool/Container.scala 80.3% <100%> (ø) ⬆️
...cala/whisk/core/containerpool/ContainerProxy.scala 93.78% <100%> (-0.04%) ⬇️
...ain/scala/whisk/core/entity/ActivationResult.scala 96.92% <87.5%> (ø) ⬆️
...core/database/cosmosdb/RxObservableImplicits.scala 0% <0%> (-100%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0% <0%> (-95.1%) ⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala 0% <0%> (-92.6%) ⬇️
...whisk/core/database/cosmosdb/CosmosDBSupport.scala 0% <0%> (-81.82%) ⬇️
...abase/cosmosdb/CosmosDBArtifactStoreProvider.scala 0% <0%> (-58.83%) ⬇️
...scala/whisk/core/containerpool/ContainerPool.scala 89.41% <0%> (-10.59%) ⬇️
...src/main/scala/whisk/core/entity/Attachments.scala 83.33% <0%> (-5.56%) ⬇️
... and 32 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e9b245...6b9f24b. Read the comment docs.

@rabbah rabbah added invoker review Review for this PR has been requested and yet needs to be done. labels Aug 4, 2018
Copy link
Contributor

@markusthoemmes markusthoemmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for adding a test!

collector.calls should have size 2
container.suspendCount shouldBe 0
acker.calls should have size 2
store.calls should have size 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a remove check that equals 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! done

Copy link
Member

@dubee dubee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PG2 3484 ⏳

Copy link
Member

@rabbah rabbah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but have a question about the new test, I don't understand the active ack checks.

acker.calls should have size 2
store.calls should have size 2

val initErrorActivation = acker.calls(0)._2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are you testing here? I don't follow the rest of the checks here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the same (except the first run failure) assertions as in the run an action and continue with a next run without pausing the container test; I added a comment to indicate this

@rabbah rabbah mentioned this pull request Aug 8, 2018
21 tasks
@dubee
Copy link
Member

dubee commented Aug 9, 2018

@csantanapr any comments on this one?

@mgencur
Copy link
Contributor

mgencur commented Aug 10, 2018

Hi, I'm wondering what's the rationale behind throwing different types of errors when there's a timeout during init and run phases. Some thoughts:

  • how is timeout during initialization different from the timeout during run that we want to treat it differently?
  • can the timeout during init be caused by something else than a developer writing the function in a wrong way?
  • when I look at the code and the variables it makes sense: ContainerError during init phase and ApplicationError during run phase, but then later the ContainerError translates to "action developer error" which starts to get confusing
  • if we want to treat init and run timeouts differently should we also have an option to set different timeouts for these phases?
  • anyway, I'd vote for simplicity - one timeout, one type of error, it gets a little too complicated
    Thanks

@markusthoemmes
Copy link
Contributor

Good questions @mgencur.

I agree, a timeout on run should be made a container-error (aka action-developer-error) as well. We can see it as an uncaught exception as the action can and should be aware of its own time limits.

On the containerError -> application-developer-error confusion: I agree, feel free to rename the internal methods as you see fit (developerError maybe fo readability?). The translation is weird and non-obvious.

@rabbah
Copy link
Member

rabbah commented Aug 10, 2018

It’s better - would you want to reuse the container if it timed out? You’d end up with a concurrent activation for example. +1

@tysonnorris
Copy link
Contributor Author

Is there a test existing for timeout on run? I don't see one but will try to add it (please let me know if you think it already exists)

I can take a stab at renaming containerError -> developerError, but I think this is also used for docker and other host-related errors:

  • memory errors
  • container failed to start

I'm not sure these are strictly under control of developer - does that matter?

@rabbah
Copy link
Member

rabbah commented Aug 10, 2018

I think there is such a test I’ll try to find it.
For the second part I don’t think it matters.

@tysonnorris tysonnorris reopened this Aug 10, 2018
@tysonnorris
Copy link
Contributor Author

All the tests are updated; let me know if you have other comments?

@markusthoemmes
Copy link
Contributor

markusthoemmes commented Aug 23, 2018

Whoops this fell through the cracks, sorry. Any last words @rabbah?

PG1 3270 ⌛️

@markusthoemmes
Copy link
Contributor

@tysonnorris can you please rebase this to the latest and greatest?

@drcariel
Copy link
Contributor

drcariel commented Aug 28, 2018

@tysonnorris is this CLI PR the extent of the changes needed for the CLI to facilitate this incubator PR? or do I need to worry about all the pre-existing ApplicationError logic?
apache/openwhisk-cli#364

@tysonnorris
Copy link
Contributor Author

@drcariel AFAIK ApplicationError logic should remain as-is, I don't see any logic around ContainerError (now DeveloperError), so I think cli should be fine (aside from the tests, fixed in your PR). Thanks!

@markusthoemmes markusthoemmes merged commit 1515e41 into apache:master Sep 5, 2018
BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
Fixes apache#3918

Renamed `ActivationResponse.containerError` -> `ActivationResponse.developerError`

* generate ApplicationResponse.containerError during failed init (instead of ApplicationResponse.applicationError)

* timeout on run now produces `ActivationResponse.containerError`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
invoker review Review for this PR has been requested and yet needs to be done.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants