Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralized Scheduled Job Identity Management using IdentityPlugin #394

Closed

Conversation

cwperks
Copy link
Member

@cwperks cwperks commented Jun 1, 2023

Description

Companion PR in core: opensearch-project/OpenSearch#7573
Companion PR in security repo: https://github.com/opensearch-project/security

This PR uses the IdentityService from core to invoke the ScheduledJobIdentityManager if an implementation is provided via the IdentityService. The main idea is to centralize the identities that jobs are associated with in a single secure index owned by the security plugin. The current de-centralized management where plugins store identities of users associated with a job and inject the user's roles into the thread context at job execution is not a viable solution for an ecosystem with 3rd-party code where extensions cannot be implicitly trusted to inject the proper roles for privilege evaluation. The current concept of roles injection will not have an analogy in the extensions ecosystem.

In its place I am proposing a solution to centralize the management of identities associated with scheduled jobs and then call on the security plugin to perform the roles injection (in case of plugins) or issue an access token (in case of extensions). This is a more secure system where the identity cannot be overwritten by a plugin/extension, but management could be performed by a super admin or UX could be developed in the security-dashboards-plugin repo to change identities associated with jobs by modifying the index that identities associated with scheduled jobs are stored in.

This PR takes advantage of the fact that JobSweeper is an IndexOperationListener and hooks into preIndex and postDelete to create and delete entries in the centralized scheduled jobs identity index respectively.

Job Creation

In the security plugin, the implementation for scheduled job identity management introduces a convention for a plugin/extension to supply the identity they would like to associate with a scheduled job on job creation inside the job details document being indexed. Many plugins currently store jobs details and associated identities in system indices owned by the plugins. In order for a plugin to write to its system index it will stash the thread context and then index a document. Because of the thread context stashing, when indexing the document containing jobs details in the plugin's jobs index, there will be no User in the threadcontext during the time preIndex is called in JobSweeper. To solve for this issue, there is a new convention where the plugin can supply an operator field in the document.

i.e.

PUT /.hello-world-jobs
{
    "enabled_time": 1685640975348,
    "last_update_time": 1685640975348,
    "name": "hw",
    "lock_duration_seconds": 60,
    "enabled": true,
    "schedule":
    {
        "interval":
        {
            "unit": "Minutes",
            "period": 1,
            "start_time": 1685640975348
        }
    },
    "operator":
    {
        "user":
        {
            "username": "Craig",
            "roles": "all_access",
            "backend_roles": "admin"
        }
    }
}

OR 

{
    "enabled_time": 1685640975348,
    "last_update_time": 1685640975348,
    "name": "hw",
    "lock_duration_seconds": 60,
    "enabled": true,
    "schedule":
    {
        "interval":
        {
            "unit": "Minutes",
            "period": 1,
            "start_time": 1685640975348
        }
    },
    "operator":
    {
        "token": "<encodedOnBehalfOfJwt>"
    }
}

Inside of preIndex the security plugin will read the operator field from the document and modify the document prior to being indexed to remove the operator being stored in the .hello-world-jobs index and instead store it in the .opendistro-security-scheduled-job-identity where there is referential integrity with all plugin jobs indices and the compound key is job_index and job_id.

See below for an example entry in the .opendistro-security-scheduled-job-identity index:

{
    "job_id": "xlkLeIgBMkyX7K74-JX3",
    "job_index": ".hello-world-jobs",
    "created_time": 1685641231900,
    "last_update_time": 1685641231900,
    "user":
    {
        "name": "Craig",
        "backend_roles":
        [
            "admin"
        ],
        "roles":
        [
            "all_access"
        ],
        "custom_attribute_names":
        [],
        "user_requested_tenant": null
    }
}

After the operator is removed from the document the plugin/extension is trying to index the resulting document stored in the jobs details index is:

.hello-world-jobs

{
    "enabled_time": 1685640975348,
    "last_update_time": 1685640975348,
    "name": "hw",
    "lock_duration_seconds": 60,
    "enabled": true,
    "schedule":
    {
        "interval":
        {
            "unit": "Minutes",
            "period": 1,
            "start_time": 1685640975348
        }
    }
}

Job Execution

When the job scheduler determines that a job is due to be run it will invoke the runJob method of the Job Runner. The runJob method consumes JobExecutionContext and in this PR it modifies this context for Job Scheduler to request an access token from the identity system and pass it to an extension as part of the execution context. The AuthToken is a NamedWriteable meaning that it is serializable and can be securely passed via transport to the extension running the job. The extension (using the SDK) will extract the access token from that context and pass it along with REST Requests that the job would perform on behalf of the user associated with the job.

Job Deletion/De-Scheduled

When a job is deleted from the plugin/extension's job index, Job Scheduler will also hook into the event utilizing postDelete or the IndexOperationListener. In this method, the jobId and jobIndex are used call on the identity system to locate and delete the identity associated with the scheduled job in the centralized scheduled job identity index. I am currently looking into ways to ensure that this operation is atomic.

Refresh Access Token During Job Execution

It is possible that a job takes longer to run than the access token is valid for. In these circumstances a mechanism will be developed for a job runner to request a new access token. To do this, the job runner will call on a Job Scheduler REST API (RestRefreshTokenAction) using the service account token of the extension along with the jobId and jobIndex of the currently executing job.

The handler of this REST Request will verify:

  1. A lock exists for this job - meaning the job is currently being executed
  2. The service account token is valid
  3. The extension associated with the service account owns that job index
  4. An identity is associated with the jobId and jobIndex and if so issue a new access token

Issues Resolved

Related to: opensearch-project/security#2528

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

cwperks added 10 commits May 15, 2023 14:16
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
@cwperks
Copy link
Member Author

cwperks commented Jun 1, 2023

@saratvemulapalli @joshpalis @vibrantvarun Would love to know your thoughts on this approach to securing scheduled jobs.

Comment on lines 30 to 32
List.of(
new NamedWriteableRegistry.Entry(AuthToken.class, BasicAuthToken.NAME, BasicAuthToken::new),
new NamedWriteableRegistry.Entry(AuthToken.class, BearerAuthToken.NAME, BearerAuthToken::new)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of implementing this in the JobScheduler plugin could we pull in a static list defined in Core that knows what all of these tokens are so as new tokens are added it automatically supports them?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. Putting this in core.

accessToken
);
} else {
// invoke job runner
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment should be deduped and moved down or just erased

this.accessToken = null;
}

public JobExecutionContext(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we keep a single constructor since auth token can be passed in as null, or if you'd prefer Optional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -186,6 +197,82 @@ public void beforeClose() {
this.fullSweepExecutor.shutdown();
}

@Override
public Engine.Index preIndex(ShardId shardId, Engine.Index operation) {
if (JobSchedulerPlugin.GuiceHolder.getIndicesService() != null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you invert the condition so that we exit early if any of these are not true? Makes it easer to read

&& JobSchedulerPlugin.GuiceHolder.getIdentityService().getScheduledJobIdentityManager() != null) {
JobSchedulerPlugin.GuiceHolder.getIdentityService()
.getScheduledJobIdentityManager()
.deleteUserDetails(delete.id(), shardId.getIndexName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if the deleteUserDetails completes, but the deschedule fails?

ParsedDocument parsedDoc = operation.parsedDoc();

try {
XContentParser parser = JsonXContent.jsonXContent.createParser(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this parsing logic be pulled out into another class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it could, this was a bit difficult to figure out at first but I learned a lot about XContent trying to figure this out

XContentType.JSON
);

ParsedDocument docMinusOperator = mapperService.documentMapper().parse(toParse);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm having trouble reading this, I'm concerned about this, from this code the implication is that the operator information, including the token are accessible by the plugin, since the data is stored within the existing job index.

Copy link
Member Author

@cwperks cwperks Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you pointed this out because this is where I want the most feedback.

  1. For plugins, plugins will stash their threadcontext before writing to their jobs index so at the time the job details are indexed the user info is not available in the threadcontext.
  2. For extensions, extensions will use their service account token when writing to their jobs index so the identity of the requester (service account) is different from the identity of what to associate for the scheduled job.

This is where this convention comes into play. In this model, when writing the job details document to the plugin/extension's job index they would include a field called operator to tell the identity system the identity to store alongside the job details. This parses that information from the document. Using the preIndex hook of the IndexOperatorListener and modifies the document before its stored in the plugin/extensions job index and removes it from the document. Instead, that information would be stored in the centralized scheduled job identity index owned by the security plugin.

See this comment in the companion Core PR for more details: opensearch-project/OpenSearch#7573 (comment)

cwperks added 4 commits June 6, 2023 10:27
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
@peternied
Copy link
Member

@cwperks Are data flow diagrams / trust boundary diagrams associated with this design?

Signed-off-by: Craig Perkins <cwperx@amazon.com>
@cwperks
Copy link
Member Author

cwperks commented Jun 6, 2023

Below is a diagram of the 1) Job Creation workflow to show how the Identity System intercepts the index operation when a plugin/extensions creates a job to store the identity info in the scheduled job identity index.

Centralized Scheduled Job Identity Management - Job Creation (1)

I am created 2 others as well for:

  1. Job Execution
  2. Job Deletion/De-scheduling

@peternied
Copy link
Member

In this diagram the authz information is being stored in a different location. While its slightly more secure to hold this information in a different spot that has limited access, it doesn't expand how AuthN/AuthZ information can be handled.

What would it take to augment this design to support external refresh? By allowing assigning jobs to operate as principals other than the current principal this would be a huge upgrade to the manageability of these jobs.

@cwperks
Copy link
Member Author

cwperks commented Jun 6, 2023

@peternied The operator on the left hand side of the diagram is up to the plugin to supply when indexing the job details document. That operator does not necessarily need to be the creator of the job. For an extension, as long as that operator is a valid token then you can create the job for a user other than the creator.

When centralizing the identities associated with scheduled jobs in an index owned by the security plugin, its possible to create APIs to manage identities associated with the jobs.

Edit: This design covers how an access token could be issued on job invocation, but if the access token is not issued for a long enough window then an extension may still need to ask for a new one. In that case, the extension would request a new access token by calling a JS API to refresh their token provided a jobId and jobIndex.

plugins wouldn't require a refresh because the threadcontext headers/transient headers do not expire for the life of the thread

@cwperks
Copy link
Member Author

cwperks commented Jun 7, 2023

If an access token expires while a job is still running, then I am thinking about providing a Job Scheduler API that a Job Runner on an Extension can call to refresh their access token. The API would consume the service account token of the extension, jobId and jobIndex.

On the handler of the API, the Job Scheduler would verify that the job 1) exists, 2) is owned by the extension and 3) is currently running. If those conditions are met than it can issue a new access token for a job in the middle of execution.

@cwperks
Copy link
Member Author

cwperks commented Jun 8, 2023

@peternied To capture the problems with existing problems of storing authz outside of the security index I filed a campaign issue here: opensearch-project/security#2846

IMO I would love to solve the problems around dangling references and come up with a strategic way of fetching most up-to-date authz at job execution time going forward but I think its unrealistic to expect that that would be ready for the targeted experimental release of 2.9.0. I think that this design of centralizing the identities so that they are not stored in indices owned by extensions is a good move forward for extensions security and does create a clearer path to removing dangling references to static user authz. It will include the authz initially (same as plugins do now), but in this model the security plugin will have the references in a single place which can then make it possible to update the information on changes to a user or deletion of a role by also updating this index storing the identities associated with scheduled jobs.

cwperks added 4 commits June 20, 2023 10:56
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
@cwperks
Copy link
Member Author

cwperks commented Jul 27, 2023

Closing for now. Will re-open at a later date with updates.

@cwperks cwperks closed this Jul 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants