Design proposal: Event notification support for build and application status for IDE integration for devfile scenarios (#2550) #3177

Merged
143 changes: 143 additions & 0 deletions docs/proposals/event-notification.md
# odo build and application status notification

## Summary

With this proposal and [its related issue](https://github.com/openshift/odo/issues/2550) we examine how to consume build/run event output from odo in order to allow external tools (such as IDEs) to determine the application/build status of an odo-managed application.

In short, the proposal is:
- New flag for push command, `odo push -o json`: Outputs JSON events that correspond to what devfile actions/commands are being executed by push.
- New odo command `odo component status -o json`: Pings the application URLs, and checks the container/pod status every X seconds (and/or uses a watch, for the Kubernetes case). The result of both is output as JSON to be used by external tools to determine if the application is running.

Member

What if the application doesn't have any URLs?
Also, what would be considered as pinging a URL? Just checking if the port is open, or sending HTTP commands like GET or HEAD?

Wouldn't it be enough to just check if the application process is running?
That could be done easily, as it will run as a supervisord service. There could be additional information in the status message, like whether the URL is alive or not (using HTTP HEAD), but the primary source for the status would be whether the process is running or not.

Contributor Author

@jgwest jgwest May 14, 2020

Good questions:

If the user is developing a web application, often the first metric of "is my application running?" is "can I see it in my browser when I refresh?". I attempt to capture this idea with this part of the proposal.

The algorithm I had considered would be:

  • For each odo-managed url:
    • issue a GET/HEAD request to the app domain root, for example, a component 'my-app' might be pinged at https://my-app.my-cluster-domain.com:8080
    • the odo component status command outputs a success to the console if ANY HTTP response code (2xx/3xx/4xx/5xx) is received, rather than only success HTTP codes, since what we are testing is whether the app is connectable (a failure would be no response before timeout).

Now this raises a few good questions:

  • Is it always safe (eg safe, non-destructive, immutable, idempotent) to issue GET/HEAD requests to the root of any odo-managed application without the user's explicit acknowledgement? I had assumed yes (eg low risk), but am definitely open to further discussion here.
  • Will Kubernetes ingress controllers ALWAYS respond with an HTTP response, even if sent to an invalid cluster URL? If so, this would make this check not especially useful for the Kubernetes case...
  • Could we just remove this entire algorithm from the proposal and leave it up to the IDE to ping the URL if appropriate? After all, the IDE does have full knowledge over which URLs are exposed via odo... at the expense of slightly more work on the IDE-side.

As for the case where no application URLs are exposed (and there are many such applications): if no odo-managed URLs are defined, the algorithm will be a no-op, and no URL success/failure events will be output. This proposal does not require applications to define a URL.

RE: checking if the application process is running, I like the idea, and it is definitely preferable (and perhaps this could still be complemented by URL checking); I will discuss this further on the other comment where you mentioned this.

- A standardized markup format for communicating detailed build status, something like: `#devfile-status# {"buildStatus":"Compiling application"}`, to be optionally included in devfile scripts that wish to provide detailed build status to consuming tools.

Member

👍 simple and powerful solution. I like it


## Terminology

With this issue, we are looking at adding odo support for allowing external tools to gather build status and application status. We further divide both statuses into detailed and non-detailed, with detailed principally being related to statuses that can be determined by looking at container logs.

**Build status**: A simple build status indicating whether the build is running (y/n), and whether the last build succeeded or failed. This can be determined based on whether odo is running a devfile action/command, and the process error code (0 = success, non-0 = failed) of that action/command.

**Detailed build status**: An indication of which step in the build process the build is in. For example: are we 'Compiling application', or are we 'Running unit tests'?

**App status**: Whether an application is running, determined using container status (for both local and Kubernetes; various statuses: containercreating, started, restarting, etc.), or by whether an HTTP/S request to the root application URL returns an HTTP response with any status code.

Member

Due to the use of SupervisorD, the application might not be running (or might be failing) while the container is still running. It needs to check the application process to determine if the application is running or not.

Contributor Author

@jgwest jgwest May 14, 2020

Agreed: the use of container status here is an imperfect representation of the app status for that exact reason. It's still a useful data point (eg if the container ISN'T running, then the app is obviously not running), but if the container IS running, it doesn't fully/necessarily imply the app is running (due to SupervisorD).

I like the idea of checking the application process itself, and my understanding is that it is straightforward to check using `supervisorctl status`; let me look into that further...

Member

Thinking out loud: can we make use of k8s health checks for this and/or other status info that we intend to come up with?

Contributor Author

K8s health checks appear not to be defined in either odo devfile v1 or v2 (correct me if I'm wrong, I didn't see it), so this would be something to consider if/when these are added there. (And these would not apply if odo was running with docker target)

Member

Not a bad idea, but I'm afraid that using k8s health checks might have unwanted consequences. I wouldn't want to use a LivenessProbe, as once it starts failing Kubernetes will restart the container. Using a ReadinessProbe might work, but if the ReadinessProbe starts to fail, Kubernetes will stop sending traffic to the Pod through the Service. That might not be what you want when developing an application.

Member

@kadel that leaves us with startup probes which, from the documentation, sound like what might fit in. However, it would only make sense to use those if integrating those won't be a problem. Otherwise, coming up with our own solution for that shouldn't be a hassle, I guess.

Contributor Author

A followup: as discussed earlier in the thread, `supervisorctl status` can be used to verify that long-running processes inside the container are running as expected.

I have prototyped it and it works well, so I've updated the proposal to include these data as another event type produced by component status.


**Detailed application status**:
- While app status works well for standalone application frameworks (Go, Node Express, Spring Boot), it works less well for full server runtimes such as Java EE application servers like OpenLiberty/WildFly that may begin responding to Web requests before the user's deployed WAR/EAR application has finished startup.
- Since these application servers are built to serve multiple concurrently-deployed applications, it is more difficult to determine the status of any specific application running on them. The lifecycle of the application server differs from the lifecycle of the application running inside the application server.
- Fortunately, in these cases the IDE/consuming tool can use the console logs (from `odo log`) from the runtime container to determine a more detailed application status.
- For example, OpenLiberty (as an example of an application-server-style container) prints a specific code when an application is starting (`CWWKZ0018I: Starting application {0}.`) and another when it has started (`CWWKZ0001I: Application {0} started in {1} seconds.`).

Member

How would odo know what each log line means? For example, that `CWWKZ0018I: Starting application {0}.` means that the application is starting initialization. This is framework-specific knowledge, and as such it should be somewhere in the devfile or in a stack, but not inside odo.

Contributor Author

@jgwest jgwest May 14, 2020

Yah, we definitely don't want odo to know anything about these codes/strings; it looks like the language I used in the proposal is imprecise re: who would use these data.

I have updated the proposal to clarify that it is the consuming tool's responsibility (for example, the IDE) to know what these codes mean. I added these lines:

> Odo itself should NOT know anything about these specific application codes; knowing how these translate into a detailed application status would be the responsibility of the IDE/consuming tool. Odo's role here is only to provide the console output log.
>
> In the future, we could add these codes into the devfile to give Odo itself some agency over determining the detailed application status, but for this initial proposal I am leaving this responsibility with the consuming tool.

Member

@kadel kadel May 15, 2020

> Yah, we definitely don't want odo to know anything about these codes/strings; it looks like the language I used in the proposal is imprecise re: who would use these data.
>
> I have updated the proposal to clarify that it is the consuming tool's responsibility (for example, IDE) to know what these codes mean. I added these lines:

Maybe it was just my bad interpretation. What you added makes this perfectly clear, thx.

- Odo itself should NOT know anything about these specific application codes; knowing how these translate into a detailed application status would be the responsibility of the IDE/consuming tool. Odo's role here is only to provide the console output log.
- In the future, we could add these codes into the devfile to give Odo itself some agency over determining the detailed application status, but for this proposal I am leaving this responsibility with the consuming tool.
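
To make this division of responsibility concrete, here is a minimal Go sketch of what a consuming tool (not odo) might do with the `odo log --follow` output of an OpenLiberty container. Only the two message codes come from the example above; the `AppState` type, the `classify` function, and the `[AUDIT   ]` line prefix used in the demo are illustrative assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// AppState is a hypothetical detailed application status kept by the IDE.
type AppState string

const (
	AppUnknown  AppState = "unknown"
	AppStarting AppState = "starting"
	AppStarted  AppState = "started"
)

// These patterns encode runtime-specific knowledge (the OpenLiberty message
// codes quoted above). They live in the consuming tool, not in odo.
var (
	startingRe = regexp.MustCompile(`CWWKZ0018I: Starting application (\S+)\.`)
	startedRe  = regexp.MustCompile(`CWWKZ0001I: Application (\S+) started in ([0-9.]+) seconds\.`)
)

// classify returns the state implied by one line of `odo log --follow`
// output, or prev if the line contains no recognised message code.
func classify(prev AppState, line string) AppState {
	switch {
	case startedRe.MatchString(line):
		return AppStarted
	case startingRe.MatchString(line):
		return AppStarting
	default:
		return prev
	}
}

func main() {
	state := AppUnknown
	// Illustrative log lines; the "[AUDIT   ]" prefix is an assumed layout.
	for _, line := range []string{
		"[AUDIT   ] CWWKZ0018I: Starting application myapp.",
		"[INFO    ] some unrelated message",
		"[AUDIT   ] CWWKZ0001I: Application myapp started in 2.345 seconds.",
	} {
		state = classify(state, line)
		fmt.Println(state)
	}
}
```

Run as-is, this prints `starting`, `starting`, `started`: lines without a known code leave the previous state untouched.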

**Devfile writer**: A devfile writer may be a runtime developer (for example, a Red Hatter working on WildFly or an IBMer working on OpenLiberty) creating a devfile for their organization's runtime (for example, an 'OpenLiberty w/ Maven' devfile), or an application developer creating/customizing a devfile for use with their own application. In either case, the devfile writer must be familiar with the semantics of both odo and the devfile.

## JSON-based odo command behaviours to detect app and build status

New odo commands and flags:
- `odo push -o json`
- `odo component status -o json`

With these two additions, an IDE or similar external tool can detect build running/succeeded/failed, application starting/started/stopping/stopped, and (in many cases) get a 'detailed app status' and/or 'detailed build status'.

### Build status notification via `odo push -o json`

`odo push -o json` is like standard `odo push`, but it outputs JSON events (including action console output) instead of text. This allows the internal state of the `odo push` process to be more easily consumed by external tools.

Several different types of JSON-formatted events would be output, which correspond to odo container command/action executions:
- Devfile command execution begun *(command name, start timestamp)*
- Devfile command execution completed *(error code, end timestamp)*
- Devfile action execution begun *(action name, parent command name, start timestamp)*
- Devfile action execution completed *(action name, error code, end timestamp)*
- Log/console stdout from the actions, one line at a time, for example `mvn compile` output *(timestamp)*

(Exact details for which fields are included with events are TBD)

In addition, `odo push -o json` should return a non-zero error code if one of the actions returned a non-zero error code, otherwise zero is returned.

### `odo push -o json` example output

This is what an `odo push -o json` command invocation would look like:
```
odo push -o json
{ "devFileCommandExecutionBegun": { "commandName" : "build", "timestamp" : "(UTC unix epoch seconds.microseconds)" } }
{ "devFileActionExecutionBegun" : { "commandName" : "build", "actionName" : "run-build-script", "timestamp" : "(...)" } }
{ "logText" : { "text" : "one line of text received\n another line of text received", "timestamp" : "(...)" } }
{ "devFileActionExecutionComplete" : { "errorCode" : 1, (same as above) } }
{ "logText" : { "text" : (...), "timestamp" : "(...)" } } # Log text is interleaved with events
{ "devFileCommandExecutionComplete": { "success" : false, (same as above) } }
```

(Exact details on event name, and JSON format are TBD; feedback welcome!)

Member

Each line has a different JSON structure (different keys); this might make parsing in some languages (like Go) a little bit trickier. Not a huge problem, but just something to be aware of.

Contributor Author

@jgwest jgwest May 14, 2020

This is true; here's my draft PR of part of this feature (note the PR has gone through zero review, so is subject to change) that contains the Go structs used to marshal/unmarshal these entries: https://github.com/openshift/odo/blob/e8040649268c0e7624c5662e364fdb8607837b41/pkg/machineoutput/types_event_logging.go#L24

These events allow an external odo-consuming tool to determine the build status of an application (build succeeded, build failed, build not running).
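
As a sketch of how a consumer might turn these events into the coarse build status (running / succeeded / failed), the Go code below first decodes each line into a `map[string]json.RawMessage`, so it can branch on the single top-level key before decoding the body; this is one way to cope with every line having a different JSON structure. The event and field names are copied from the draft output, which is explicitly TBD; `BuildStatus` and `apply` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BuildStatus is a hypothetical coarse status shown by an IDE.
type BuildStatus struct {
	Running   bool
	LastRunOK *bool // nil until the first command has completed
}

// apply updates s from one line of `odo push -o json` output. Event and field
// names follow the draft example and are assumptions (format is TBD).
func apply(s *BuildStatus, line []byte) error {
	var ev map[string]json.RawMessage
	if err := json.Unmarshal(line, &ev); err != nil {
		return err
	}
	for kind, body := range ev { // each event carries a single top-level key
		switch kind {
		case "devFileCommandExecutionBegun":
			s.Running = true
		case "devFileCommandExecutionComplete":
			var c struct {
				Success bool `json:"success"`
			}
			if err := json.Unmarshal(body, &c); err != nil {
				return err
			}
			s.Running = false
			s.LastRunOK = &c.Success
		default:
			// logText, action events and unknown future event types do not
			// change the coarse build status.
		}
	}
	return nil
}

func main() {
	var s BuildStatus
	for _, l := range []string{
		`{"devFileCommandExecutionBegun": {"commandName": "build", "timestamp": "1589500000.000000"}}`,
		`{"logText": {"text": "compiling...", "timestamp": "1589500001.000000"}}`,
		`{"devFileCommandExecutionComplete": {"success": false, "timestamp": "1589500002.000000"}}`,
	} {
		if err := apply(&s, []byte(l)); err != nil {
			fmt.Println("bad event:", err)
			continue
		}
		fmt.Println("running:", s.Running)
	}
	fmt.Println("succeeded:", *s.LastRunOK)
}
```

Ignoring unknown event kinds, rather than failing on them, lets an older consumer keep working if odo later adds new event types.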

Note that, unlike other machine-readable outputs in odo, each individual line is a complete, parseable JSON document. This allows the consuming tool to process events one at a time as they are streamed, rather than waiting for all the events to be received before anything is parseable; the latter would be required if the entire console output were one single JSON document, as is the case for other odo machine-readable outputs.

### Detailed build status via JSON+custom markup

For detailed build status, it is proposed that devfile writers may *optionally* include custom markup in their devfile actions which indicate a detailed build status:
- If a devfile writer wanted to communicate that the current command/action is compiling the application, they would output a line that begins with a specific markup string (`#devfile-status#`), followed by a JSON object with a single field, `buildStatus`:
- For example: `#devfile-status# {"buildStatus":"Compiling application"}` would then communicate that the detailed build status should be set to `Compiling application`.
- Since this line would be output as container stdout, it would be included as a `logText` JSON event, and the consuming tool can look for this markup string and parse the simple JSON to extract the detailed build status.
- Feedback welcome around exact markup text format.
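
On the consuming side, extracting the detailed build status could look like the Go sketch below. It assumes the marker text proposed here (still open for feedback) and that a single `logText` event may carry several lines, as in the example output; `detailedBuildStatus` is a hypothetical name. Strict JSON parsing rejects single-quoted objects such as `{'buildStatus':'...'}`, so devfile scripts need to emit double-quoted JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// statusMarker is the markup string proposed above (exact text still TBD).
const statusMarker = "#devfile-status#"

// detailedBuildStatus scans the text of one logText event (which may hold
// several lines) and returns the last buildStatus announced in it, if any.
func detailedBuildStatus(text string) (string, bool) {
	status, found := "", false
	for _, line := range strings.Split(text, "\n") {
		if !strings.HasPrefix(line, statusMarker) {
			continue
		}
		var m struct {
			BuildStatus string `json:"buildStatus"`
		}
		payload := strings.TrimPrefix(line, statusMarker)
		if err := json.Unmarshal([]byte(payload), &m); err != nil || m.BuildStatus == "" {
			continue // not valid JSON (e.g. single-quoted): ignore the line
		}
		status, found = m.BuildStatus, true
	}
	return status, found
}

func main() {
	text := "[INFO] Scanning for projects...\n#devfile-status# {\"buildStatus\":\"Compiling application\"}"
	if s, ok := detailedBuildStatus(text); ok {
		fmt.Println("detailed build status:", s)
	}
}
```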

The build step (running as a bash script, for example, invoked via an action) of a devfile might then look like this:
```
#!/bin/bash
# Exit with the failing step's non-zero code, so odo can report the build as failed
set -e
# (...)
echo '#devfile-status# {"buildStatus":"Compiling application"}'
mvn compile
echo '#devfile-status# {"buildStatus":"Running unit tests"}'
mvn test
```

This 'detailed build status' markup text is *entirely optional*: if it is not present, a consuming tool can still determine build succeeded/failed and build running/not-running using the other `odo push -o json` JSON events.

### App status notification via `odo component status -o json` and `odo log --follow`

Member

This is a nitpick, but the term "App status" has occurred enough times to make me bring it up: app means a different thing in odo terminology. I understand that we're referring to the application building/running inside a container in the component. However, when we say app, it could refer to the set of commands exposed via `odo app`. For the sake of this proposal, it's just an FYI, but when we write the docs for this feature, we need to be unambiguous.

Contributor Author

@jgwest jgwest May 20, 2020

Agreed! Now, what I'm about to say may be a distinction without a useful difference, but:

'App status' is definitely an imperfect term for the reasons you described, but I needed a term that also refers to whether the application's OS process is running inside the container or not. In contrast, the term component status is more general; it would refer to whether the pod/container is running, but not (necessarily) to whether the app is running.

These two lifecycle terms are mostly the same, except in the following cases:

  • The user's application fails to start, or dies after starting: in this case the component status is up (eg the pod/container are still running, the odo init volume is mapped in, etc.), but the app is down.
  • During a devfile build, the app running in the container may be temporarily stopped in order to allow the build step to deploy new files; in this scenario the app would be down but the component would be up.

In any case, like I said, using 'app status' is still problematic, so I'll think a bit more if there's a better term I can use here to refer to both ideas, so that there is no confusion with the odo application/component model terminology 👍 .

Member

How about the term "developer application" when we're talking about the application inside the pod/container that a dev cares about? Two words, I know, but unambiguous to a good extent, IMO.

However, broader discussion around the terminology is happening at #3076 & #3190.

Member

> However, broader discussion around the terminology is happening at #3076 & #3190.

I would keep the discussion about terminology separate in #3076. We will need to change and redefine some of the terms used in odo across all odo commands.


In general, within the execution context that odo operates, there are a few ways for us to determine the application status:
1) Can the application be pinged at its exposed URL?
2) What state is the container in? (running/container creating/restarting/etc -- different statuses between local and Kube but same general idea)
3) In the application log, specific hardcoded text strings can be searched for (for example, OpenLiberty outputs defined status codes to its log to indicate that an app started.)

Ideally, we would like for odo to provide consuming tools with all 3 sets of data. Thus, as proposed:
- 1 and 2 are handled by a new `odo component status -o json` command, described here.
- 3 is handled by the existing unmodified `odo log --follow` command.

The new proposed `odo component status -o json` command will:
- Be a *long-running* command that will continue outputting status until it is aborted by the parent process.

Member

IMO, this sounds like something we'd do with log --follow (example: odo log --follow) or --watch (example: kubectl get pods -w) commands. status commands are more like one-time, no? They show the current status of the application when the user asks for it.

Status could keep changing over time. For example, if odo watch is running in another terminal and the user modifies something, it would trigger an automated odo push and the build/compile/run cycle would be repeated. If odo component status -o json would stream the events in that case as well, how is it different from what odo log --follow is doing except providing the output in json format? Shouldn't we do odo log --follow -o json instead in that case?

Contributor Author

@jgwest jgwest May 20, 2020

Good point, and actually, in an earlier version of this design proposal this was called odo component status -o json --follow, but then I simplified it to remove --follow.

Would you prefer something like odo component status -o json --follow? This definitely makes the long-running behaviour of the command more clear. (Or if you have another noun/noun-phrase we could use instead of status, that better conveys this idea, that would work too.)

The other item it sounds like you may be proposing in your above comment is combining the odo log and odo component status commands? I had considered this, but ultimately realized there is a case where the lifecycle of the two commands should differ: when the user adds a new URL to an existing component, the odo component status process needs to be restarted (eg to pick up the new URL), but the odo log --follow process does not need to be restarted (because the container/pod remain the same).

Member

> Would you prefer something like odo component status -o json --follow? This definitely makes the long-running behaviour of the command more clear.

I agree that this makes the long-running behaviour more obvious to the user. However, I was thinking of `odo log --follow -o json`, but your later point about `odo log --follow` being something that shouldn't need a restart makes sense as well. So, I guess, `odo component status -o json --follow` would be more appropriate.

- Every X seconds, ping the URLs/routes of the application as they existed when the command was first executed. Output the result as a JSON string.
- Every X seconds (or using a Kubernetes watch, where appropriate), check the container status for the application, based on the application data that was present when the command was first issued. Output the result as a JSON string.

**Note**: This command will NOT, by design, respond to route/application changes that occur during or after it is first invoked. It is up to consuming tools to ensure that the `odo component status` command is stopped/restarted as needed.

Member

If we do not design it to respond to changes that occur in route/app since first run of odo component status, this command would report failure in accessing the route/app, right? For example:

  • I pushed the component and executed odo component status -o json.
  • Now,
    • I deleted the URL and created another one instead, or
    • I change the port (less likely than URL deletion+creation scenario), or
    • I feel like adding another endpoint to the application /status that displays the status of my app.
  • If the user has set up some action to be performed in case of a URL/app being inaccessible, wouldn't this give a false positive?

This might be irrelevant right now, but it might become more relevant once #2756 starts moving ahead.

Contributor Author

@jgwest jgwest May 20, 2020

Yep, the scenario you describe is exactly the problem with concurrent modification of the application while the status command is running, and I included this line to indicate this Note: This command will NOT, by design, respond to route/application changes that occur during or after it is first invoked. It is up to consuming tools to ensure that the odo component status command is stopped/restarted as needed.

How to handle this exact problem is a fundamental architectural constraint in the problem space of this proposal: we need a way to get always up-to-date status, and always up-to-date logs, but we also need the ability to deploy changes to the application via a different odo command (odo push, running in a different OS process). And we need to do this (ideally) without turning the odo tool into either a server, or an LSP-style process, or a multi-process application that needs to do IPC between various running instances of the odo tool. I'll go into more details in your next comment where you have added additional details.

Member

I think we agree upon this being a problem, since it's the end-user who has to perform an action. And as you said elsewhere, this leads to a scenario of the problem being solved in many places instead of being solved once in odo. But that could be taken up in the next phase, I guess.

- For example, if the user tells the IDE to delete their application with the IDE UI, the IDE will call `odo delete (component)`; at the same time, the IDE should also abort any existing `odo component status` commands that are running (as these are no longer guaranteed to return a valid status now that the application itself no longer exists). `odo component status` will not automatically abort when the application is deleted (because it has no reliable way to detect this in all cases).
- Another example: if the IDE adds a new URL via `odo url create [...]`, any existing `odo component status` commands that are running should be aborted, as these commands would still only be checking the URLs that existed when the command was first invoked (eg there is intentionally no cross-process notification mechanism for created/updated/deleted URLs implemented as part of this command.)

This is what an `odo component status -o json` command invocation would look like:
```
odo component status -o json

{ "applicationPing" : { "url" : "https://(...)", "response" : true, "responseCode" : 200, "timestamp" : "(UTC unix epoch seconds.microseconds)" } }
{ "applicationPing" : { "url" : "https://(...)", "response" : false, "error" : "host unreachable", "timestamp" : "(...)" } }
{ "containerStatus" : { "status" : "containercreating", "timestamp" : "(...)" } }
{ "containerStatus" : { "status" : "running", "timestamp" : "(...)" } }
(...)
```

(Exact details on event name, and JSON format are TBD; feedback welcome!)

To keep from overwhelming the output, only state changes would be printed (after an initial state output), rather than every specific event.

## Consumption of odo via external tools, such as IDEs

Based on our existing knowledge from previously building similar application/build-status tracking systems in Eclipse Codewind, we believe the above described commands should allow any external tool to provide a detailed status for odo-managed applications.

The proposed changes ensure that the high-level logic around tracking application changes across time can be managed by external tools (such as IDEs) as desired, without the need to leak/"pollute" odo with any of these details. These changes give consuming tools all the data they need to ensure fast, reliable, up-to-date and (where possible) detailed build/application status.


### What happens if the network connection is lost while executing these commands?

One potential challenge is how to handle network connection instability when the push/log/status commands are actively running. Both odo, and any external consuming tools, should be able to ensure that the odo-managed application can be returned to a stable state once the connection is re-established.

We can look at how each command should handle a temporary network disconnection:
- If the network connection is dropped during *push*: the consuming tool can restart the push command from scratch. Well-written devfiles should kill any existing build processes; for example, when running a 'build' action, that action should look for any old Maven processes that are still running and kill them. Said another way, it is up to the build action of a devfile to ensure that the container state is consistent before starting a new build.
- If connection is dropped during *logs*: start a new tail, and then do a full 'get logs' to make sure we didn't miss anything; match up the two (the full log and the tail) as best as possible, to prevent duplicates. (The Kubernetes API may already have a better way of handling this; this is the "naive" algorithm)
- If connection is dropped during *status*: no special behaviour is needed here.