Centralized Scheduled Job Identity Management using IdentityPlugin #394

cwperks · 2023-06-01T18:15:45Z

Description

Companion PR in core: opensearch-project/OpenSearch#7573
Companion PR in security repo: https://github.com/opensearch-project/security

This PR uses the IdentityService from core to invoke the ScheduledJobIdentityManager when the IdentityService provides an implementation. The main idea is to centralize the identities that jobs are associated with in a single secure index owned by the security plugin. Today, identity management is decentralized: each plugin stores the identities of the users associated with a job and injects the user's roles into the thread context at job execution. That is not viable in an ecosystem with 3rd-party code, where extensions cannot be implicitly trusted to inject the proper roles for privilege evaluation. The current concept of roles injection will not have an analogue in the extensions ecosystem.

In its place, I am proposing to centralize the management of identities associated with scheduled jobs and then call on the security plugin to perform the roles injection (for plugins) or issue an access token (for extensions). This is more secure because a plugin/extension cannot overwrite the identity. Management could still be performed by a super admin, or UX could be developed in the security-dashboards-plugin repo to change the identities associated with jobs by modifying the index in which those identities are stored.

This PR takes advantage of the fact that JobSweeper is an IndexOperationListener and hooks into preIndex and postDelete to create and delete entries in the centralized scheduled jobs identity index respectively.

Job Creation

In the security plugin, the implementation of scheduled job identity management introduces a convention: on job creation, a plugin/extension supplies the identity it would like to associate with a scheduled job inside the job details document being indexed. Many plugins currently store job details and associated identities in system indices owned by the plugins. To write to its system index, a plugin stashes the thread context and then indexes a document. Because of that stashing, there is no User in the thread context when preIndex is called in JobSweeper for the document containing the job details. To solve this, the new convention lets the plugin supply an operator field in the document.

For example:

PUT /.hello-world-jobs
{
    "enabled_time": 1685640975348,
    "last_update_time": 1685640975348,
    "name": "hw",
    "lock_duration_seconds": 60,
    "enabled": true,
    "schedule":
    {
        "interval":
        {
            "unit": "Minutes",
            "period": 1,
            "start_time": 1685640975348
        }
    },
    "operator":
    {
        "user":
        {
            "username": "Craig",
            "roles": "all_access",
            "backend_roles": "admin"
        }
    }
}

Or, with a token in place of user details:

{
    "enabled_time": 1685640975348,
    "last_update_time": 1685640975348,
    "name": "hw",
    "lock_duration_seconds": 60,
    "enabled": true,
    "schedule":
    {
        "interval":
        {
            "unit": "Minutes",
            "period": 1,
            "start_time": 1685640975348
        }
    },
    "operator":
    {
        "token": "<encodedOnBehalfOfJwt>"
    }
}

Inside preIndex, the security plugin reads the operator field from the document and modifies the document before it is indexed, so that the operator is not stored in the .hello-world-jobs index. The operator is instead stored in the .opendistro-security-scheduled-job-identity index, which has referential integrity with all plugin jobs indices and uses job_index and job_id as a compound key.

See below for an example entry in the .opendistro-security-scheduled-job-identity index:

{
    "job_id": "xlkLeIgBMkyX7K74-JX3",
    "job_index": ".hello-world-jobs",
    "created_time": 1685641231900,
    "last_update_time": 1685641231900,
    "user":
    {
        "name": "Craig",
        "backend_roles":
        [
            "admin"
        ],
        "roles":
        [
            "all_access"
        ],
        "custom_attribute_names":
        [],
        "user_requested_tenant": null
    }
}

After the operator is removed from the document that the plugin/extension is trying to index, the resulting document stored in the job details index is:

.hello-world-jobs

{
    "enabled_time": 1685640975348,
    "last_update_time": 1685640975348,
    "name": "hw",
    "lock_duration_seconds": 60,
    "enabled": true,
    "schedule":
    {
        "interval":
        {
            "unit": "Minutes",
            "period": 1,
            "start_time": 1685640975348
        }
    }
}
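The strip-and-store step described above can be sketched in plain Java. This is only an illustration under stated assumptions: OperatorExtractionSketch, JobKey, IdentityStore and stripOperator are hypothetical names, the document is modelled as a Map rather than a ParsedDocument, and an in-memory map stands in for the .opendistro-security-scheduled-job-identity index.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical illustration; these types are not part of Job Scheduler or the security plugin.
final class OperatorExtractionSketch {

    /** Compound key of the identity index: job index name plus job id. */
    static final class JobKey {
        final String jobIndex;
        final String jobId;

        JobKey(String jobIndex, String jobId) {
            this.jobIndex = jobIndex;
            this.jobId = jobId;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof JobKey)) {
                return false;
            }
            JobKey that = (JobKey) other;
            return jobIndex.equals(that.jobIndex) && jobId.equals(that.jobId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(jobIndex, jobId);
        }
    }

    /** In-memory stand-in for the centralized scheduled job identity index. */
    static final class IdentityStore {
        private final Map<JobKey, Object> identities = new HashMap<>();

        void save(JobKey key, Object operator) {
            identities.put(key, operator);
        }

        Optional<Object> find(JobKey key) {
            return Optional.ofNullable(identities.get(key));
        }
    }

    /**
     * Mimics the preIndex step: copies the incoming document, removes the
     * "operator" field, stores it under (jobIndex, jobId), and returns the
     * stripped copy that would be written to the jobs index.
     */
    static Map<String, Object> stripOperator(
        String jobIndex, String jobId, Map<String, Object> source, IdentityStore store) {
        Map<String, Object> stripped = new LinkedHashMap<>(source);
        Object operator = stripped.remove("operator");
        if (operator != null) {
            store.save(new JobKey(jobIndex, jobId), operator);
        }
        return stripped;
    }

    public static void main(String[] args) {
        IdentityStore store = new IdentityStore();
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "hw");
        doc.put("enabled", true);
        doc.put("operator", Map.of("token", "<encodedOnBehalfOfJwt>"));

        Map<String, Object> indexed = stripOperator(".hello-world-jobs", "job-1", doc, store);
        System.out.println("stored in jobs index: " + indexed);
        System.out.println("identity present: "
            + store.find(new JobKey(".hello-world-jobs", "job-1")).isPresent());
    }
}
```

The sketch copies the document instead of mutating it, so the caller's map is left unchanged; the PR itself re-parses the modified source into a new ParsedDocument.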

Job Execution

When the job scheduler determines that a job is due to run, it invokes the runJob method of the Job Runner. The runJob method consumes a JobExecutionContext, and this PR modifies that context so that Job Scheduler requests an access token from the identity system and passes it to an extension as part of the execution context. The AuthToken is a NamedWriteable, meaning that it is serializable and can be securely passed via transport to the extension running the job. The extension (using the SDK) will extract the access token from that context and pass it along with the REST requests that the job performs on behalf of the user associated with the job.
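As a rough sketch of the extension side, the snippet below models an execution context that may or may not carry an access token and shows how a job runner might turn it into a request header. Everything here is an assumption for illustration: ExecutionContextSketch, Context and requestHeaders are hypothetical names, the token is reduced to a String rather than an AuthToken, and sending it as a Bearer Authorization header is assumed rather than taken from this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration; the real JobExecutionContext carries more fields.
final class ExecutionContextSketch {

    /** Simplified stand-in for the execution context handed to runJob. */
    static final class Context {
        private final String jobIndexName;
        private final String jobId;
        private final String accessToken; // null when no identity manager issued a token

        Context(String jobIndexName, String jobId, String accessToken) {
            this.jobIndexName = jobIndexName;
            this.jobId = jobId;
            this.accessToken = accessToken;
        }

        String jobIndexName() {
            return jobIndexName;
        }

        String jobId() {
            return jobId;
        }

        Optional<String> accessToken() {
            return Optional.ofNullable(accessToken);
        }
    }

    /** Builds the headers a job runner could attach to REST requests made for the job's user. */
    static Map<String, String> requestHeaders(Context context) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Assumption: the token is sent as a Bearer token in the Authorization header.
        context.accessToken().ifPresent(token -> headers.put("Authorization", "Bearer " + token));
        return headers;
    }

    public static void main(String[] args) {
        Context withToken = new Context(".hello-world-jobs", "job-1", "example-token");
        Context withoutToken = new Context(".hello-world-jobs", "job-2", null);
        System.out.println(requestHeaders(withToken));
        System.out.println(requestHeaders(withoutToken));
    }
}
```

A single constructor with a nullable token keeps the no-identity case (no ScheduledJobIdentityManager installed) on the same code path as the token case.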

Job Deletion/De-Scheduled

When a job is deleted from the plugin/extension's job index, Job Scheduler will also hook into the event using postDelete of the IndexOperationListener. In this method, the jobId and jobIndex are used to call on the identity system to locate and delete the identity associated with the scheduled job in the centralized scheduled job identity index. I am currently looking into ways to ensure that this operation is atomic.

Refresh Access Token During Job Execution

It is possible that a job takes longer to run than the access token is valid for. In these circumstances a mechanism will be developed for a job runner to request a new access token. To do this, the job runner will call on a Job Scheduler REST API (RestRefreshTokenAction) using the service account token of the extension along with the jobId and jobIndex of the currently executing job.

The handler of this REST Request will verify:

A lock exists for this job - meaning the job is currently being executed
The service account token is valid
The extension associated with the service account owns that job index
An identity is associated with the jobId and jobIndex and if so issue a new access token
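The four checks above could be ordered as guard clauses, roughly as in the sketch below. It is not the RestRefreshTokenAction handler from this PR: RefreshTokenCheckSketch, State and refresh are hypothetical names, in-memory collections stand in for the lock index, the token validation and the identity index, and the returned string is a placeholder rather than a real access token.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration of the refresh checks; not the actual REST handler.
final class RefreshTokenCheckSketch {

    /** In-memory stand-in for the state the handler would consult. */
    static final class State {
        final Set<String> lockedJobs = new HashSet<>();            // jobs currently executing
        final Map<String, String> serviceTokens = new HashMap<>(); // service account token -> extension id
        final Map<String, String> indexOwners = new HashMap<>();   // job index -> owning extension id
        final Set<String> jobsWithIdentity = new HashSet<>();      // jobs that have a stored identity
    }

    static String key(String jobIndex, String jobId) {
        return jobIndex + "/" + jobId;
    }

    /** Returns a placeholder token only if all four checks pass; otherwise an empty Optional. */
    static Optional<String> refresh(State state, String serviceAccountToken, String jobIndex, String jobId) {
        String job = key(jobIndex, jobId);
        // 1. A lock exists, meaning the job is currently being executed.
        if (!state.lockedJobs.contains(job)) {
            return Optional.empty();
        }
        // 2. The service account token is valid (here: known to the state).
        String extensionId = state.serviceTokens.get(serviceAccountToken);
        if (extensionId == null) {
            return Optional.empty();
        }
        // 3. The extension behind the service account owns the job index.
        if (!extensionId.equals(state.indexOwners.get(jobIndex))) {
            return Optional.empty();
        }
        // 4. An identity is associated with the jobId and jobIndex.
        if (!state.jobsWithIdentity.contains(job)) {
            return Optional.empty();
        }
        return Optional.of("new-access-token-for-" + job); // placeholder, not a real token
    }

    public static void main(String[] args) {
        State state = new State();
        state.lockedJobs.add(key(".hello-world-jobs", "job-1"));
        state.serviceTokens.put("service-token", "hello-world");
        state.indexOwners.put(".hello-world-jobs", "hello-world");
        state.jobsWithIdentity.add(key(".hello-world-jobs", "job-1"));
        System.out.println(refresh(state, "service-token", ".hello-world-jobs", "job-1").isPresent());
    }
}
```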

Issues Resolved

Related to: opensearch-project/security#2528

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Craig Perkins <cwperx@amazon.com>

cwperks · 2023-06-01T18:22:38Z

@saratvemulapalli @joshpalis @vibrantvarun Would love to know your thoughts on this approach to securing scheduled jobs.

peternied · 2023-06-05T20:30:42Z

spi/src/main/java/org/opensearch/jobscheduler/spi/JobSchedulerNamedWriteableRegistry.java

+                    List.of(
+                        new NamedWriteableRegistry.Entry(AuthToken.class, BasicAuthToken.NAME, BasicAuthToken::new),
+                        new NamedWriteableRegistry.Entry(AuthToken.class, BearerAuthToken.NAME, BearerAuthToken::new)


Instead of implementing this in the JobScheduler plugin could we pull in a static list defined in Core that knows what all of these tokens are so as new tokens are added it automatically supports them?

That's a good idea. Putting this in core.

peternied · 2023-06-05T21:17:25Z

src/main/java/org/opensearch/jobscheduler/scheduler/JobScheduler.java

+                    accessToken
+                );
+            } else {
+                // invoke job runner


I think this comment should be deduped and moved down or just erased

peternied · 2023-06-05T21:19:28Z

spi/src/main/java/org/opensearch/jobscheduler/spi/JobExecutionContext.java

+        this.accessToken = null;
+    }
+
+    public JobExecutionContext(


Could we keep a single constructor since auth token can be passed in as null, or if you'd prefer Optional?

peternied · 2023-06-05T21:21:49Z

src/main/java/org/opensearch/jobscheduler/sweeper/JobSweeper.java

@@ -186,6 +197,82 @@ public void beforeClose() {
        this.fullSweepExecutor.shutdown();
    }

+    @Override
+    public Engine.Index preIndex(ShardId shardId, Engine.Index operation) {
+        if (JobSchedulerPlugin.GuiceHolder.getIndicesService() != null


Can you invert the condition so that we exit early if any of these are not true? That makes it easier to read.

peternied · 2023-06-05T21:25:53Z

src/main/java/org/opensearch/jobscheduler/sweeper/JobSweeper.java

+                && JobSchedulerPlugin.GuiceHolder.getIdentityService().getScheduledJobIdentityManager() != null) {
+                JobSchedulerPlugin.GuiceHolder.getIdentityService()
+                    .getScheduledJobIdentityManager()
+                    .deleteUserDetails(delete.id(), shardId.getIndexName());


What will happen if the deleteUserDetails completes, but the deschedule fails?

peternied · 2023-06-05T21:38:59Z

src/main/java/org/opensearch/jobscheduler/sweeper/JobSweeper.java

+            ParsedDocument parsedDoc = operation.parsedDoc();
+
+            try {
+                XContentParser parser = JsonXContent.jsonXContent.createParser(


Could this parsing logic be pulled out into another class?

Yes, it could. This was a bit difficult to figure out at first, but I learned a lot about XContent along the way.

peternied · 2023-06-05T21:39:22Z

src/main/java/org/opensearch/jobscheduler/sweeper/JobSweeper.java

+                    XContentType.JSON
+                );
+
+                ParsedDocument docMinusOperator = mapperService.documentMapper().parse(toParse);


Maybe I'm having trouble reading this, but I'm concerned: the implication of this code is that the operator information, including the token, is accessible by the plugin, since the data is stored within the existing job index.

I'm glad you pointed this out because this is where I want the most feedback.

For plugins, plugins will stash their threadcontext before writing to their jobs index so at the time the job details are indexed the user info is not available in the threadcontext.

For extensions, extensions will use their service account token when writing to their jobs index so the identity of the requester (service account) is different from the identity of what to associate for the scheduled job.

This is where this convention comes into play. In this model, when writing the job details document to the plugin/extension's job index, they would include a field called operator to tell the identity system which identity to store alongside the job details. This code parses that information from the document using the preIndex hook of the IndexOperationListener, and modifies the document before it's stored in the plugin/extension's job index to remove the operator. Instead, that information would be stored in the centralized scheduled job identity index owned by the security plugin.

See this comment in the companion Core PR for more details: opensearch-project/OpenSearch#7573 (comment)

Signed-off-by: Craig Perkins <cwperx@amazon.com>

peternied · 2023-06-06T16:27:40Z

@cwperks Are data flow diagrams / trust boundary diagrams associated with this design?

Signed-off-by: Craig Perkins <cwperx@amazon.com>

cwperks · 2023-06-06T20:19:34Z

Below is a diagram of the 1) Job Creation workflow to show how the Identity System intercepts the index operation when a plugin/extension creates a job, in order to store the identity info in the scheduled job identity index.

I have created two others as well, for:

Job Execution
Job Deletion/De-scheduling

peternied · 2023-06-06T21:07:49Z

In this diagram the authz information is being stored in a different location. While it's slightly more secure to hold this information in a different spot that has limited access, it doesn't expand how AuthN/AuthZ information can be handled.

What would it take to augment this design to support external refresh? Allowing jobs to be assigned to operate as principals other than the current principal would be a huge upgrade to the manageability of these jobs.

cwperks · 2023-06-06T21:17:15Z

@peternied The operator on the left hand side of the diagram is up to the plugin to supply when indexing the job details document. That operator does not necessarily need to be the creator of the job. For an extension, as long as that operator is a valid token then you can create the job for a user other than the creator.

When the identities associated with scheduled jobs are centralized in an index owned by the security plugin, it is possible to create APIs to manage the identities associated with the jobs.

Edit: This design covers how an access token could be issued on job invocation, but if the access token is not issued for a long enough window then an extension may still need to ask for a new one. In that case, the extension would request a new access token by calling a Job Scheduler API to refresh its token, given a jobId and jobIndex.

Plugins wouldn't require a refresh because the threadcontext headers/transient headers do not expire for the life of the thread.

cwperks · 2023-06-07T13:31:37Z

If an access token expires while a job is still running, then I am thinking about providing a Job Scheduler API that a Job Runner on an Extension can call to refresh their access token. The API would consume the service account token of the extension, jobId and jobIndex.

In the handler of the API, the Job Scheduler would verify that the job 1) exists, 2) is owned by the extension and 3) is currently running. If those conditions are met, then it can issue a new access token for a job in the middle of execution.

cwperks · 2023-06-08T20:59:05Z

@peternied To capture the existing problems with storing authz outside of the security index, I filed a campaign issue here: opensearch-project/security#2846

IMO I would love to solve the problems around dangling references and come up with a strategic way of fetching the most up-to-date authz at job execution time going forward, but I think it's unrealistic to expect that to be ready for the targeted experimental release of 2.9.0. I think this design of centralizing the identities, so that they are not stored in indices owned by extensions, is a good move forward for extensions security and creates a clearer path to removing dangling references to static user authz. It will include the authz initially (same as plugins do now), but in this model the security plugin holds the references in a single place, which makes it possible to update this index of scheduled job identities when a user changes or a role is deleted.

Signed-off-by: Craig Perkins <cwperx@amazon.com>

cwperks · 2023-07-27T21:13:30Z

Closing for now. Will re-open at a later date with updates.

cwperks added 10 commits May 15, 2023 14:16

Use IdentityService.saveUserDetails on postIndex of job details

605c9d1

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Merge branch 'main' into scheduled-job-identity-manager

087d251

Show index modification working in preIndex

1dea579

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Use ScheduledJobOperator from core

0f7f6a8

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Add access token to JobExecutionContext

e02a20a

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Use NamedWriteableRegistry

d163537

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Remove sysouts

7682246

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Delete scheduled job identity on postDelete

c467658

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Update preIndex

5d6e533

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Remove log

5fa41f5

Signed-off-by: Craig Perkins <cwperx@amazon.com>

This was referenced Jun 1, 2023

Create ScheduledJobIdentityManager interface in IdentityPlugin opensearch-project/OpenSearch#7573

Closed

Store Scheduled Job User Information in an index owned by the Security plugin opensearch-project/security#2773

Closed

peternied reviewed Jun 5, 2023

cwperks added 4 commits June 6, 2023 10:27

Use NamedWriteableRegistry from core

4f9ce89

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Single constructor

6976f9f

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Return early

61b7d78

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Create JobOperatorParser

788fa58

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Add test for JobOperatorParser

c4776a2

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Merge branch 'main' into scheduled-job-identity-manager

f20b8e7

cwperks added 4 commits June 20, 2023 10:56

Add RestRefreshTokenAction

abbdfe1

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Merge branch 'main' into scheduled-job-identity-manager

8e90c49

Run spotlessApply

c7250ba

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Update extension point names

6a487ce

Signed-off-by: Craig Perkins <cwperx@amazon.com>

Merge branch 'main' into scheduled-job-identity-manager

43a7c11

cwperks closed this Jul 27, 2023
