to address #3918, reuse a container on applicationError #3941

tysonnorris · 2018-08-02T21:46:46Z

Reuse a container on applicationError (graceful error from action), only during /run (any error during /init still destroys the container)

Description

In the handling of the Future returned by ContainerProxy.initializeAndRun(), this change allows the container to NOT be destroyed when an applicationError is returned from the /run request.

This affects warm container reuse in cases where an anticipated error should produce an error result but should not prevent reuse (the container is still valid to process the next activation).
An example use case is parameter validation, where the action performs a sanity check at the start of its code and immediately returns {error: "invalid parameter"}.

This will

NOT affect the API/client results (error is still seen on any error from container)
improve container reuse (since preemptively failing based on user input will no longer cause container destruction)

I considered, but did NOT implement, adding a field to WhiskActivation that indicates an /init failure. As far as I can tell, the activation logs are currently the only place to see whether an applicationError occurred during /init or during /run, and action developers might find easier access to that information useful. Some of the code is convoluted in order to track the distinction between an applicationError on /init and one on /run (e.g. WhiskActivation becomes (WhiskActivation, Boolean)).
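The destroy-or-reuse rule described above, which corresponds to the guard in the ContainerProxy diff, can be sketched as a pure predicate. This is a minimal sketch assuming a simplified three-valued status; `ReuseRule`, `Status` and `shouldDestroy` are hypothetical names, not part of the OpenWhisk code base. The `initFailure` parameter is the extra Boolean that forces the `(WhiskActivation, Boolean)` pairing.

```scala
object ReuseRule {
  // Hypothetical, simplified stand-in for an activation's response status.
  // This is NOT OpenWhisk's ActivationResponse; it only models the cases
  // that matter for the destroy-or-reuse decision.
  sealed trait Status
  case object Success extends Status
  case object ApplicationError extends Status
  case object ContainerError extends Status

  /** Destroy when the activation failed AND either the failure happened
    * during /init or it is not an application error. */
  def shouldDestroy(status: Status, initFailure: Boolean): Boolean =
    status != Success && (initFailure || status != ApplicationError)

  def main(args: Array[String]): Unit = {
    // Parameter validation failing in /run: the warm container is kept.
    println(shouldDestroy(ApplicationError, initFailure = false))
    // The same kind of error returned from /init: the container is destroyed.
    println(shouldDestroy(ApplicationError, initFailure = true))
    // Any non-application failure still destroys the container.
    println(shouldDestroy(ContainerError, initFailure = false))
  }
}
```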

Related issue and scope

I opened an issue to propose and discuss this change (on "application error", return success=false, but should not destroy the container #3918 )

My changes affect the following components

Types of changes

[] Bug fix (generally a non-breaking change which closes an issue).
Enhancement or new feature (adds new functionality).
Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

I signed an Apache CLA.
I reviewed the style guides and followed the recommendations (Travis CI will check :).
I added tests to cover my changes.
My changes require further changes to the documentation.
I updated the documentation where necessary.

markusthoemmes · 2018-08-03T06:29:03Z

core/invoker/src/main/scala/whisk/core/containerpool/ContainerProxy.scala

-      case Left(error)                           => Future.failed(error)
-      case Right(act)                            => Future.successful(act)
+      //if non-successful, init failures should fail, and non-applicationErrors should also fail
+      case Right((act, initFailure)) if !act.response.isSuccess && (initFailure || !act.response.isApplicationError) =>


To make this a little bit more straightforward and reduce the size of the change, does it make sense to change https://github.com/apache/incubator-openwhisk/blob/master/common/scala/src/main/scala/whisk/core/containerpool/Container.scala#L114 to be a containerError? That way, I think all init failures are non-application errors and thus you can just check for isSuccess | isApplicationError here vs. having to thread the extra boolean flag through.

WDYT?
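The suggestion amounts to normalizing every failed /init into a container error, after which the reuse check needs no extra Boolean. The following is a minimal sketch under that assumption; `SimplifiedReuse`, `reportInit` and `keepContainer` are hypothetical names, and the status type is a simplified stand-in rather than OpenWhisk's ActivationResponse.

```scala
object SimplifiedReuse {
  // Hypothetical, simplified status model (not OpenWhisk's ActivationResponse).
  sealed trait Status
  case object Success extends Status
  case object ApplicationError extends Status
  case object ContainerError extends Status

  /** Assumption taken from the review suggestion: a failed /init is always
    * reported as a container error, whatever the action itself returned. */
  def reportInit(initStatus: Status): Status =
    if (initStatus == Success) Success else ContainerError

  /** Once init failures are normalized, keep the container on success or on
    * an application error from /run; destroy it otherwise. */
  def keepContainer(status: Status): Boolean =
    status == Success || status == ApplicationError
}
```

With this normalization, `keepContainer(reportInit(ApplicationError))` is `false` (an /init failure still destroys the container), while an `ApplicationError` coming from /run keeps it.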

Sounds reasonable! 👍

codecov-io · 2018-08-03T19:12:42Z

Codecov Report

Merging #3941 into master will decrease coverage by 4.86%.
The diff coverage is 91.3%.

@@            Coverage Diff             @@
##           master    #3941      +/-   ##
==========================================
- Coverage   85.61%   80.75%   -4.87%     
==========================================
  Files         147      146       -1     
  Lines        7107     7058      -49     
  Branches      429      418      -11     
==========================================
- Hits         6085     5700     -385     
- Misses       1022     1358     +336

Impacted Files	Coverage Δ
...ain/scala/whisk/core/containerpool/Container.scala	`80.3% <100%> (ø)`	⬆️
...cala/whisk/core/containerpool/ContainerProxy.scala	`93.78% <100%> (-0.04%)`	⬇️
...ain/scala/whisk/core/entity/ActivationResult.scala	`96.92% <87.5%> (ø)`	⬆️
...core/database/cosmosdb/RxObservableImplicits.scala	`0% <0%> (-100%)`	⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala	`0% <0%> (-95.1%)`	⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala	`0% <0%> (-92.6%)`	⬇️
...whisk/core/database/cosmosdb/CosmosDBSupport.scala	`0% <0%> (-81.82%)`	⬇️
...abase/cosmosdb/CosmosDBArtifactStoreProvider.scala	`0% <0%> (-58.83%)`	⬇️
...scala/whisk/core/containerpool/ContainerPool.scala	`89.41% <0%> (-10.59%)`	⬇️
...src/main/scala/whisk/core/entity/Attachments.scala	`83.33% <0%> (-5.56%)`	⬇️
... and 32 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 4e9b245...6b9f24b.

markusthoemmes

LGTM, thanks for adding a test!

markusthoemmes · 2018-08-06T08:12:17Z

tests/src/test/scala/whisk/core/containerpool/test/ContainerProxyTests.scala

+      collector.calls should have size 2
+      container.suspendCount shouldBe 0
+      acker.calls should have size 2
+      store.calls should have size 2


Should there be a remove check that equals 0?

dubee

PG2 3484 ⏳

rabbah

LGTM but have a question about the new test, I don't understand the active ack checks.

rabbah · 2018-08-07T19:13:08Z

tests/src/test/scala/whisk/core/containerpool/test/ContainerProxyTests.scala

+      acker.calls should have size 2
+      store.calls should have size 2
+
+      val initErrorActivation = acker.calls(0)._2


what are you testing here? I don't follow the rest of the checks here.

These are the same assertions as in the "run an action and continue with a next run without pausing the container" test (except for the first run failing); I added a comment to indicate this.
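The scenario the new test covers (an applicationError on the first run, then a normal second run on the same, never-paused container) can be modelled with a counting test double. This is a minimal sketch, not the actual ContainerProxyTests code; `FakeContainer`, `runAll` and its counters are hypothetical and only loosely echo the `suspendCount` and `destroyCount` checks discussed here.

```scala
object ReuseScenario {
  // Hypothetical test double: counts lifecycle calls instead of driving Docker.
  final class FakeContainer {
    var initCount = 0
    var runCount = 0
    var suspendCount = 0
    var destroyCount = 0
  }

  sealed trait Status
  case object Success extends Status
  case object ApplicationError extends Status
  case object DeveloperError extends Status

  /** Initializes once, then runs each activation back to back without pausing.
    * The container is destroyed (and no further runs happen) only on a
    * non-reusable failure. Returns the statuses that were acknowledged. */
  def runAll(c: FakeContainer, results: List[Status]): List[Status] = {
    c.initCount += 1
    var acked = List.empty[Status]
    var destroyed = false
    results.foreach { status =>
      if (!destroyed) {
        c.runCount += 1
        acked = acked :+ status
        if (status == DeveloperError) {
          c.destroyCount += 1
          destroyed = true
        }
      }
    }
    acked
  }
}
```

Running `runAll` with `List(ApplicationError, Success)` acknowledges both activations with one init, two runs, and zero suspends or destroys, which is the shape of the assertions in the new test.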

dubee · 2018-08-09T19:24:00Z

@csantanapr any comments on this one?

mgencur · 2018-08-10T07:28:45Z

Hi, I'm wondering what's the rationale behind throwing different types of errors when there's a timeout during init and run phases. Some thoughts:

how is a timeout during initialization different from a timeout during run, such that we want to treat them differently?
can a timeout during init be caused by anything other than the developer writing the function incorrectly?
when I look at the code and the variables it makes sense: ContainerError during the init phase and ApplicationError during the run phase, but later the ContainerError translates to "action developer error", which starts to get confusing
if we want to treat init and run timeouts differently, should we also have an option to set different timeouts for these phases?
anyway, I'd vote for simplicity: one timeout, one type of error; otherwise it gets a little too complicated
Thanks

markusthoemmes · 2018-08-10T09:01:39Z

Good questions @mgencur.

I agree, a timeout on run should be made a container-error (aka action-developer-error) as well. We can see it as an uncaught exception as the action can and should be aware of its own time limits.

On the containerError -> application-developer-error confusion: I agree, feel free to rename the internal methods as you see fit (developerError maybe, for readability?). The translation is weird and non-obvious.

rabbah · 2018-08-10T09:06:15Z

It’s better: would you want to reuse the container if it timed out? You’d end up with a concurrent activation, for example. +1
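The argument for classifying a run timeout as a developer error can be sketched as a classification step followed by a reuse check. This is a minimal sketch; `RunTimeoutClassification`, `Returned`, `TimedOut` and `reusable` are hypothetical names, and the time limit is carried only for illustration.

```scala
import scala.concurrent.duration._

object RunTimeoutClassification {
  // Hypothetical model of what the invoker observes from a /run request.
  sealed trait RunOutcome
  /** The action returned; `returnedError` is true for results like {error: ...}. */
  final case class Returned(returnedError: Boolean) extends RunOutcome
  /** The action did not respond within its time limit. */
  final case class TimedOut(limit: FiniteDuration) extends RunOutcome

  sealed trait Status
  case object Success extends Status
  case object ApplicationError extends Status
  case object DeveloperError extends Status

  def classify(outcome: RunOutcome): Status = outcome match {
    case Returned(returnedError) => if (returnedError) ApplicationError else Success
    // Treated like an uncaught exception: the action may still be running,
    // so reusing the container could start a concurrent activation.
    case TimedOut(_) => DeveloperError
  }

  /** Only developer errors rule out reuse in this simplified model. */
  def reusable(status: Status): Boolean = status != DeveloperError
}
```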

tysonnorris · 2018-08-10T19:51:34Z

Is there an existing test for timeout on run? I don't see one but will try to add it (please let me know if you think it already exists).

I can take a stab at renaming containerError -> developerError, but I think this is also used for docker and other host-related errors:

memory errors
container failed to start

I'm not sure these are strictly under the control of the developer; does that matter?

rabbah · 2018-08-10T19:52:50Z

I think there is such a test I’ll try to find it.
For the second part I don’t think it matters.

tysonnorris · 2018-08-10T19:56:15Z

I guess this is the test: https://github.com/apache/incubator-openwhisk/blob/8f1dc2e9c848deb9f72da5b636895cbc5c565adb/tests/src/test/scala/whisk/core/containerpool/docker/test/DockerContainerTests.scala#L477

rabbah · 2018-08-10T19:56:29Z

See
https://github.com/apache/incubator-openwhisk/blob/f6046721acd801ea1777cb5cb040dfaa6922a18e/tests/src/test/scala/whisk/core/limits/MaxActionDurationTests.scala

tysonnorris · 2018-08-13T22:28:39Z

All the tests are updated; let me know if you have other comments?

markusthoemmes · 2018-08-23T09:13:40Z

Whoops this fell through the cracks, sorry. Any last words @rabbah?

PG1 3270 ⌛️

markusthoemmes · 2018-08-24T10:17:04Z

@tysonnorris can you please rebase this to the latest and greatest?


drcariel · 2018-08-28T18:17:50Z

@tysonnorris is this CLI PR the extent of the changes needed for the CLI to facilitate this incubator PR? or do I need to worry about all the pre-existing ApplicationError logic?
apache/openwhisk-cli#364

tysonnorris · 2018-08-28T18:37:54Z

@drcariel AFAIK ApplicationError logic should remain as-is, I don't see any logic around ContainerError (now DeveloperError), so I think cli should be fine (aside from the tests, fixed in your PR). Thanks!

Fixes apache#3918
* Renamed `ActivationResponse.containerError` -> `ActivationResponse.developerError`
* generate ApplicationResponse.containerError during failed init (instead of ApplicationResponse.applicationError)
* timeout on run now produces `ActivationResponse.containerError`

tysonnorris requested a review from markusthoemmes August 2, 2018 21:46

markusthoemmes reviewed Aug 3, 2018

rabbah assigned markusthoemmes Aug 4, 2018

rabbah added invoker and review labels Aug 4, 2018

markusthoemmes approved these changes Aug 6, 2018

dubee reviewed Aug 7, 2018

rabbah reviewed Aug 7, 2018

rabbah mentioned this pull request Aug 8, 2018

Docs fixes for action invocations #3951

Merged

tysonnorris closed this Aug 10, 2018

tysonnorris reopened this Aug 10, 2018

rabbah approved these changes Aug 24, 2018

tysonnorris added 6 commits August 24, 2018 11:48

to address apache#3918, reuse a container when applicationError is re…

54bbbbf

…turned during /run

generate ApplicationResponse.containerError during failed init (inste…

4e690de

…ad of ApplicationResponse.applicationError)

update tests to check for ContainerError

6961c99

update tests to check for ContainerError

0309233

update tests to check for ContainerError

99e39d2

assert destroyCount == 0 (for ApplicationError)

06d1748

tysonnorris added 6 commits August 24, 2018 11:49

added comment for new test assertions

3ede7cb

timeout on run now produces ActivationResponse.containerError

9b6e4b4

rename ActivationResponse.containerError -> `ActivationResponse.dev…

5cf9106

…eloperError`

timeout on run now produces ActivationResponse.containerError (Acti…

9407d47

…vationResponse.developerError)

timeout on run now produces ActivationResponse.containerError (Acti…

94a6f48

…vationResponse.developerError)

timeout on run now produces ActivationResponse.containerError (Acti…

6b9f24b

…vationResponse.developerError)

tysonnorris force-pushed the reuse-container-on-application-error branch from 544d94b to 6b9f24b August 24, 2018 18:49

drcariel mentioned this pull request Aug 28, 2018

use DeveloperError in place of ContainerError apache/openwhisk-cli#364

Merged

markusthoemmes merged commit 1515e41 into apache:master Sep 5, 2018
