-
Notifications
You must be signed in to change notification settings - Fork 869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[object_store] support azure managed and workload identities #3581
Conversation
I intend to review this first thing tomorrow |
assert_eq!(req.uri().path(), format!("/{}/oauth2/v2.0/token", tenant)); | ||
assert_eq!(req.method(), &Method::POST); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I really wanted to check here is if the right content from the federated token file is included in the client_assertion
. However getting the body bytes is an async operation. We could probably get a tokio runtime internally, but I was wondering if there is a better way to achieve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use futures::block_on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just some minor nits, looks good thank you 👍
@@ -294,45 +306,241 @@ impl ClientSecretOAuthProvider { | |||
pub fn new( | |||
client_id: String, | |||
client_secret: String, | |||
tenant_id: String, | |||
tenant_id: impl AsRef<str>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
object_store/src/azure/credential.rs
Outdated
/// Fetch a token | ||
pub async fn fetch_token( | ||
async fn fetch_token(&self, client: &Client, retry: &RetryConfig) -> Result<String> { | ||
self.cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about moving the TokenCache onto CredentialProvider::TokenProvider
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes absolute sense - done.
object_store/src/azure/credential.rs
Outdated
#[derive(serde::Deserialize, Debug)] | ||
#[async_trait::async_trait] | ||
pub trait TokenCredential: std::fmt::Debug + Send + Sync + 'static { | ||
async fn fetch_token(&self, client: &Client, retry: &RetryConfig) -> Result<String>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the client should be passed at credential construction, this avoids issues with clients not permitting https
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I passed in a client for the managed identity, but I was unsure about the other credentials as they always use ssl. So we just ignore the client for the managed identity. If however the "savings" of not having a redundant instance of the client are not relevant, I'm happy to move it for all credentials.
object_store/src/azure/credential.rs
Outdated
(None, None, Some(msi_res_id)) => { | ||
query_items.push(("msi_res_id", msi_res_id)) | ||
} | ||
_ => (), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this silently ignores parameters if multiple are specified, I can't see why this would be the case. If this is intentional I think we should return an error for this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True - I went for just having a hierarchy. Since object id and msi_res_id are more specific (only used by managed identity) and less likely to just "end up" in the environment, I thought they would get precedence.
.header("metadata", "true") | ||
.query(&query_items); | ||
|
||
if let Ok(val) = std::env::var(MSI_SECRET_ENV_KEY) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to provide this in the constructor, and delegate to the config key system
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not sure about this value. From what I understood, its rotated at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm mistaken, environment variables can't reliably be altered at runtime by an external process. So I think it should be fine to move it outside
object_store/src/azure/credential.rs
Outdated
let fs = LocalFileSystem::new_with_prefix(root.path()).unwrap(); | ||
let tenant = "tenant"; | ||
let tokenfile = root.path().join("tokenfile"); | ||
fs.put( | ||
&crate::path::Path::from("tokenfile"), | ||
bytes::Bytes::from("federated-token"), | ||
) | ||
.await | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just use NamedTempFile
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point :)
assert_eq!(req.uri().path(), format!("/{}/oauth2/v2.0/token", tenant)); | ||
assert_eq!(req.method(), &Method::POST); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use futures::block_on
msi_endpoint: Option<String>, | ||
) -> Self { | ||
let msi_endpoint = msi_endpoint.unwrap_or_else(|| { | ||
"http://169.254.169.254/metadata/identity/oauth2/token".to_owned() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A problem we had to workaround with the AWS provider, is ensuring that the client used for this is not https only. In practice this meant creating a separate client for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
completely missed that - using a dedicated client here as well now.
object_store/src/azure/mod.rs
Outdated
@@ -824,18 +947,40 @@ impl MicrosoftAzureBuilder { | |||
Ok(credential::CredentialProvider::AccessKey(bearer_token)) | |||
} else if let Some(access_key) = self.access_key { | |||
Ok(credential::CredentialProvider::AccessKey(access_key)) | |||
} else if self.use_managed_identity { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with Azure SDKs, but the AWS and Google SDKs default to using managed identity if nothing else is specified. I'm not sure if this is something we want to do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default azure credential does a lot of opinionated things that I think we do not want to replicate here :). That said, it makes sense to use the metadata as default since it may work without any configuration.
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
I don't have the means to test this, so I am going to take it on faith that this works as advertised 😄 |
Benchmark runs are scheduled for baseline = f0be9da and contender = 73ce076. 73ce076 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
@tustvold - I also have to set up a real-world test and am in the process of doing this right now. Will open up a follow up PR if I see that changes need to be made. Also wanted to move that last variable :). Sorry for not mentioning that sooner! |
Which issue does this PR close?
Closes #3580
Rationale for this change
see linked issue
What changes are included in this PR?
added implementations for a managed identity and federated workload credential for azure along with corresponding configuration on the azure builder.
Are there any user-facing changes?
users can now use managed identities and workload credentials to authenticate object store requests.