JOBS shouldn't be a sub-resource of Processes #69

Closed
matthias-mueller opened this issue May 11, 2020 · 33 comments
Labels
1.0-draft.5 (Draft version for after the public review), change request

Comments

@matthias-mueller

The current proposal is to expose jobs as a sub-resource of a process:

GET /processes/{process-id}/jobs/{job-id}/results

I think jobs should be on the same level as processes, e.g.:

GET /jobs/{job-id}/results

Why?

  • Algorithms (= processes) should be managed independently from actual computations (= jobs)
  • The process description might be updated after job execution or completion; keeping the job as a sub-resource would then be misleading, because the process definition has changed
  • The process might be deleted from the server, but the job's results are still stored
  • Jobs are usually pooled at the service level, not per-process/algorithm
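
A minimal sketch of the decoupling argued for in the list above, assuming a toy in-memory server. The `Server` and `Job` names and the sample data are hypothetical and not part of the draft; the point is only that a job pooled at the service level survives the deletion of the process that created it.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    process_id: str  # a plain reference; the process may later change or vanish
    results: dict = field(default_factory=dict)


class Server:
    """Toy server with two independent top-level resources."""

    def __init__(self):
        self.processes = {}  # stands in for /processes/{process-id}
        self.jobs = {}       # stands in for /jobs/{job-id}

    def delete_process(self, process_id):
        # Removing the algorithm does not touch any stored job.
        self.processes.pop(process_id, None)

    def get_results(self, job_id):
        # Stands in for GET /jobs/{job-id}/results: no process lookup needed.
        return self.jobs[job_id].results


server = Server()
server.processes["buffer"] = {"title": "Buffer"}
server.jobs["job-1"] = Job("job-1", "buffer", {"output": "result.tif"})
server.delete_process("buffer")
print(server.get_results("job-1"))
```

With the nested layout, the same request would have to go through a path containing a process id that no longer resolves.
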
@bpross-52n
Contributor

Hi Matthias, I see your point. The idea was that you send a POST request to /processes/{process-id}/jobs to create a new job. Are you suggesting that we change this execute endpoint, too?

@matthias-mueller
Author

I would probably post new jobs to /jobs and expect a link to the newly created job (/jobs/{job-id}) in the response.
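
A rough sketch of that flow, under stated assumptions: the `201` status, the `Location` header, the `jobID` and `status` field names, and the in-memory `JOBS` pool are all illustrative choices, not something the draft prescribes.

```python
import uuid

JOBS = {}  # hypothetical in-memory job pool, keyed by job id


def post_job(body):
    """Handle POST /jobs and return (status, headers, payload)."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "accepted", "request": body}
    location = f"/jobs/{job_id}"
    # Assumed convention: 201 Created plus a Location header for the new job.
    return 201, {"Location": location}, {"jobID": job_id, "status": "accepted"}


status, headers, payload = post_job({"inputs": {"distance": 10}})
print(status, headers["Location"])
```

The client then monitors the job and fetches results from the returned link, without needing any process id in the path.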

@bpross-52n
Contributor

Got it. This would require us to re-introduce the process id in the execute JSON. I would postpone discussion of this until after the release of the draft documents.

@ghobona
Contributor

ghobona commented Jun 15, 2020

SWG decision on 2020-06-15: this will be discussed after the draft document has been released for Public Comment, consistent with the suggestion in #69 (comment).

@m-mohr

m-mohr commented Jul 28, 2020

I agree with this; we do it very similarly in openEO: https://api.openeo.org/#tag/Batch-Jobs
Once you go into process chaining (#47), per-process jobs will be a problem anyway.

@bpross-52n added the 1.0-draft.5 (Draft version for after the public review) label and removed the draft.4 label on Aug 17, 2020
@fmigneault
Contributor

Both variants to retrieve jobs and their underlying results are also supported by CRIM's ADES/EMS.

GET /jobs/{jobId}
GET /processes/{id}/jobs/{jobId}

Adding the POST directly on jobs would only require providing the process ID via the body rather than via the path.

Contrary to @m-mohr, I don't feel per-process job references are a problem during chaining, as the chain should already know which processes are being executed one after the other anyway (we are executing workflows in this manner without issue). Each job can easily maintain a reference to the process that created it, so both links should be equivalent.

@christophenoel
Contributor

christophenoel commented Sep 16, 2020

I'm open to this different approach, but an argument in favour of the process sub-resources is that the Job POST request itself depends on the specific process.

Nevertheless "The process might be deleted from the server, but the job's results are still stored" is also a good point :)

@bpross-52n
Contributor

We discussed this in the SWG telecon on Monday, October 19th.

We propose to add an additional endpoint /processes/jobs that lists the jobs independent of the process identifier.

@matthias-mueller would you accept this solution?

@matthias-mueller
Author

matthias-mueller commented Oct 30, 2020

Listing job IDs in some place is only a small part of this issue (see comments 1 and 3 in full). More important is the overall workflow: job creation, job monitoring, result retrieval, and job dismissal. Which endpoints are involved in these steps, and where do you POST new jobs?

@pvretano
Contributor

pvretano commented Oct 30, 2020

Hmm ... perhaps two top-level resources, one for process management and one for job management, as @matthias-mueller proposes, would not be bad. Just thinking out loud here ...

For the /processes resource:

  • GET /processes gets the list of processes
  • GET /processes/{processId} gets the process description
  • POST /processes creates a new process (aka Transactional WPS)
  • PUT /processes/{processId} updates a process definition
  • DELETE /processes/{processId} deletes the process

For the /jobs resource:

  • GET /jobs gets a list of all jobs
  • GET /jobs/{processId} gets a list of jobs for a specific {processId}
  • GET /jobs/{processId}/{jobId} gets the status of a specific job
  • GET /jobs/{processId}/{jobId}/results gets the results of a job
  • POST /jobs/{processId} creates a new job
  • DELETE /jobs/{processId}/{jobId} deletes or cancels a job

For the /jobs resource, the {processId} identifier must be from the list of identifiers obtained by accessing the /processes resource.

A side effect of this organization is that a process can be invoked naturally with a GET, in addition to invoking it with a POST.

Example: http://www.someserver.com/ogcapi/jobs/MyProcess?mode=async&input01=val01&input02=val02&bbox=1,2,3,4&output=tiffImage

Just thinking out loud based on @matthias-mueller's comment above. Comments?
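
To make the GET invocation concrete, here is a small sketch of how a server might take the example URL apart. It assumes, purely for illustration, that `mode` and `output` are control parameters and that every other query parameter is a process input; the thread does not define that split.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "http://www.someserver.com/ogcapi/jobs/MyProcess"
    "?mode=async&input01=val01&input02=val02&bbox=1,2,3,4&output=tiffImage"
)

parts = urlsplit(url)
# The last path segment is the process id in this layout.
process_id = parts.path.rsplit("/", 1)[-1]
# parse_qs returns lists of values; keep the first value of each key.
query = {key: values[0] for key, values in parse_qs(parts.query).items()}

# Assumption: "mode" and "output" steer execution, the rest are inputs.
mode = query.pop("mode", "sync")
output = query.pop("output", None)
inputs = query

print(process_id, mode, output, inputs)
```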

@sptillma
Contributor

sptillma commented Oct 30, 2020 via email

@pvretano
Contributor

@sptillma OGC API does not impose any requirement on the path ahead of the / so one could arrange to put all the OGC-specific resources in one sub-tree and other non-OGC stuff in other sub-trees. For example, at CubeWerx, we segregate the OGC API resource from other resources by placing an "ogcapi/" in the path: https://eratosthenes.pvretano.com/cubewerx/cubeserv/default/ogcapi/wpstest/processes
The "wpstest" path element maps to the dataset / data store / distribution as per DCAT so /ogcapi// is the landing page as per OGC API Common and after that ... well we all know the story! ;)

@fmigneault
Contributor

I find /jobs/{processId}/... extremely counter-intuitive: since jobs is placed first, one would expect a jobId to follow, not a process reference.
I would much rather have something like /jobs/{jobId}?processId={processId} if a root /jobs was absolutely needed while also providing some processId in the URL.

@matthias-mueller
Author

I think there are two distinct types of scenarios/setups to consider:

  1. If you operate a WPS service with a static set of immutable process definitions, you and your clients probably have some sympathy for associating each job with the corresponding processId, because it is a permanent relation in this kind of setup and it seems natural to craft the API in this direction.

  2. If you acknowledge concepts like WPS-T, or operate a WPS service that changes its process offerings frequently (in particular removal of process ids or modification of the related process descriptions and I/O parameters), or just do not communicate the update policy of your process offerings to your clients, you and your clients will probably prefer a more generic layout that does not suggest a permanent relationship between processIds and jobs. In that case it seems more natural to have separate endpoints for processes and jobs, to separate concerns and avoid false expectations on the client side.

@christophenoel
Contributor

christophenoel commented Nov 2, 2020 via email

@pvretano
Contributor

pvretano commented Nov 2, 2020

@spacebel Yes, DELETE /jobs/{jobId}; I mistyped that.
GET /jobs/{jobId} ... sure.
GET /jobs/{jobId}/results ... sure.
However, I am not seeing POST /jobs. How does the server know which process is being invoked? Is the process id encoded in the execute request? As far as I remember, that is not currently the case. I think it needs to be POST /jobs/{processId}, no?

@christophenoel
Contributor

christophenoel commented Nov 2, 2020 via email

@pvretano
Contributor

pvretano commented Nov 2, 2020

@spacebel yeah I think you are right.
So either POST /jobs with the process id contained in the execute request, or POST /jobs?processId=MyProcess.

@matthias-mueller
Author

... or put it in the body of the POST request? (But maybe you already ruled that one out in previous discussions)

@pvretano
Contributor

pvretano commented Nov 2, 2020

@matthias-mueller yes, that is what I meant by "... the execute request contains the process id ...".

@bpross-52n
Contributor

> ... or put it in the body of the POST request? (But maybe you already ruled that one out in previous discussions)

I would not say it was ruled out. In fact, the process id was part of the execute JSON earlier. It was removed because it duplicated the process id already present in the execute endpoint path. It would not be a big change to re-add the id to the execute JSON.
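
As an illustration only, a server-side check for a process id carried in the body could look like the sketch below. The field name `id`, the `KNOWN_PROCESSES` set, and the `resolve_process` helper are assumptions made for the example, not the actual execute schema.

```python
import json

KNOWN_PROCESSES = {"MyProcess"}  # hypothetical ids listed under /processes


def resolve_process(raw_body):
    """Return the process id named in an execute request posted to /jobs."""
    body = json.loads(raw_body)
    process_id = body.get("id")  # assumed field name
    if process_id is None:
        raise ValueError("execute request does not name a process")
    if process_id not in KNOWN_PROCESSES:
        raise LookupError(f"unknown process: {process_id}")
    return process_id


print(resolve_process('{"id": "MyProcess", "inputs": {"input01": "val01"}}'))
```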

@matthias-mueller
Author

Got it. The nice thing about a self-sufficient jobs endpoint would be that it could be used as a means to add asynchronous processing to other parts of the OGC API (e.g. long-running WFS/WCS operations). Not sure if that one is still on the agenda, but it used to pop up randomly when we discussed WPS with other SWGs.

@fmigneault
Contributor

@spacebel
I find all of the routes in #69 (comment) are valid except:

  • GET /jobs/{processId} get a list of jobs for a specific {processId}

There is no really safe way to tell this route apart from GET /jobs/{jobId}, since a processId and a jobId cannot be distinguished in that position, especially if they are both UUIDs.

I also see POST /jobs as a completely valid use case. Whether processId is provided via query parameter or body is equivalent, but I think the body is better since the inputs of the process are provided this way anyway. It makes sense to specify the process ID together with the inputs it matches.
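
The ambiguity can be shown with a toy router, sketched here with plain regular expressions rather than any particular web framework. The route names are made up for the example.

```python
import re
import uuid

# Two route templates from the proposal; both reduce to "/jobs/<one segment>".
ROUTES = [
    ("jobs of a process", re.compile(r"^/jobs/(?P<processId>[^/]+)$")),
    ("status of a job", re.compile(r"^/jobs/(?P<jobId>[^/]+)$")),
]


def matching_routes(path):
    """Return the names of every route template that matches the path."""
    return [name for name, pattern in ROUTES if pattern.match(path)]


ambiguous = matching_routes(f"/jobs/{uuid.uuid4()}")
print(ambiguous)
```

Both templates match any single-segment path, so the server would have to look the identifier up in both registries to decide which resource was meant.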

@christophenoel
Contributor

christophenoel commented Nov 3, 2020

Hi Francis,

I see your point, even if processId is not a UUID.

What do you think about adding an optional filter parameter to GET /jobs?

/jobs?parentProcess={processId}

@fmigneault
Contributor

Yes. That would work. Maybe simply /jobs?processId={processId}, or maybe even just identifier (as in older WPS).
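
A minimal sketch of such a filter, assuming the `processId` spelling and an in-memory job list; the `jobID` field name and the sample jobs are placeholders.

```python
from urllib.parse import urlsplit, parse_qs

JOBS = [  # hypothetical job pool
    {"jobID": "j1", "processId": "buffer"},
    {"jobID": "j2", "processId": "ndvi"},
    {"jobID": "j3", "processId": "buffer"},
]


def list_jobs(url):
    """Handle GET /jobs with an optional processId query filter."""
    query = parse_qs(urlsplit(url).query)
    wanted = query.get("processId")
    if not wanted:
        return list(JOBS)  # no filter given: list every job
    return [job for job in JOBS if job["processId"] in wanted]


print([job["jobID"] for job in list_jobs("/jobs?processId=buffer")])
```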

@bpross-52n
Contributor

In our last telecon, we agreed that it would be ok to remove the process id from the /jobs endpoint.

The question is whether the /jobs endpoint should be moved

(1) to the root level, i.e. the same level as /processes

{root}/processes
{root}/jobs

or (2) under /processes

{root}/processes
{root}/processes/jobs

Some discussion points:
In case of (1), the {root}/jobs endpoint could be used with other OGC APIs that offer asynchronous functionality. Eventually this could be moved to API Common. On the other hand, the /jobs endpoint at root level could interfere with already existing endpoints of vendor-specific APIs that are used for job control.
Approach (2) would couple the /jobs endpoint more tightly to API - Processes and could be used when no other OGC API uses the jobs concept.

@christophenoel
Contributor

It is not clear to me why 2 could be used with other OGC APIs offering asynchronous functionality. Could you please elaborate a little?

@bpross-52n
Contributor

Sorry, but I am not sure that I understand the question correctly. The baseline is that

(1) {root}/jobs could also be used by other OGC APIs that offer asynchronous functionality

whereas

(2) {root}/processes/jobs could be used when the other OGC APIs do not offer asynchronous functionality, or at least do not make use of the jobs concept or URL endpoint

@christophenoel
Contributor

Ok, I got it.

@pvretano
Contributor

@bpross-52n I don't think that interference with vendor-specific endpoints is an issue. That argument could be made for any endpoint off the root (conformance, collections, etc.). My feeling is that vendors will segregate APIs using upstream path elements. For example, for CubeWerx all ogcapi endpoints live under ".../ogcapi/{datastore}/".
@bpross-52n with your answer to @spacebel are you suggesting that we support both /jobs and /processes/jobs? Generally I find adding too many degrees of freedom like that gets confusing.

@bpross-52n
Contributor

@pvretano I agree with you. I do not suggest to use both approaches. I merely wanted to start a discussion here about which one we should choose.

@sptillma
Contributor

@pvretano I'm not sure it is valid to dismiss interference with vendor-specific endpoints by saying "because it doesn't affect things the way we do it" :). I brought this argument up to resolve conflicts that "might" happen in the future. However, during the last meeting, you gave the best argument for /jobs when you said it might be moved into Common and apply across other standards as well. In that case, we define a standard that can be pointed to in order to justify a top-level definition. But if it is self-contained and only applies to OGC API Processes, then I would argue it should stay under /processes. I'm good either way as long as we have merit behind the decision.

@bpross-52n
Contributor

The respective changes should be merged now.

8 participants