[BUG/Feature Request]: DefaultAzureCredentials for Blob Storage #3269

Open
1 task done
lukas-reining opened this issue Dec 18, 2024 · 3 comments

@lukas-reining

lukas-reining commented Dec 18, 2024

Contact Details [Optional]

lukas.reining@codecentric.de

System Information

ZENML_LOCAL_VERSION: 0.70.0
ZENML_SERVER_VERSION: 0.71.0
ZENML_SERVER_DATABASE: mysql
ZENML_SERVER_DEPLOYMENT_TYPE: kubernetes
ZENML_CONFIG_DIR: /Users/lukasreining/Library/Application Support/zenml
ZENML_LOCAL_STORE_DIR: /Users/lukasreining/Library/Application Support/zenml/local_stores
ZENML_SERVER_URL: https://....com
ZENML_ACTIVE_REPOSITORY_ROOT: None
PYTHON_VERSION: 3.9.20
ENVIRONMENT: native
SYSTEM_INFO: {'os': 'mac', 'mac_version': '15.1.1'}
ACTIVE_WORKSPACE: default
ACTIVE_STACK: default
ACTIVE_USER: zenml
TELEMETRY_STATUS: enabled
ANALYTICS_CLIENT_ID: 0bfa0527-ad75-46eb-b2b5-15bcbc282bd7
ANALYTICS_USER_ID: 64e39d11-76e3-408f-8673-f2ade3bb9330
ANALYTICS_SERVER_ID: 2422aa09-bbed-4fb9-adb2-3f22f789d7d2
INTEGRATIONS: ['airflow', 'bitbucket', 'kaniko', 'pigeon']
PACKAGES: {'markdown': '3.3.7', 'markupsafe': '2.1.5', 'pyyaml': '6.0.2', 'babel': '2.16.0', 'bracex': '2.5', 'certifi': '2024.7.4', 'charset-normalizer': '3.3.2', 'colorama': '0.4.6', 'ghp-import': '2.1.0', 'idna': '3.7', 'jinja2': '3.1.4', 'markdown-inline-graphviz-extension': '1.1.2', 'mdx-truly-sane-lists':
'1.3', 'mergedeep': '1.3.4', 'mkdocs': '1.6.0', 'mkdocs-awesome-pages-plugin': '2.9.3', 'mkdocs-get-deps': '0.2.0', 'mkdocs-material': '9.5.27', 'mkdocs-material-extensions': '1.3.1', 'mkdocs-monorepo-plugin': '1.1.0', 'mkdocs-techdocs-core': '1.4.0', 'natsort': '8.4.0', 'paginate': '0.5.6', 'pathspec': '0.12.1',
'plantuml-markdown': '3.9.7', 'pygments': '2.17.2', 'pymdown-extensions': '10.3.1', 'python-dateutil': '2.9.0.post0', 'python-slugify': '8.0.4', 'pyyaml-env-tag': '0.1', 'regex': '2024.7.24', 'requests': '2.32.3', 'six': '1.16.0', 'text-unidecode': '1.3', 'urllib3': '2.2.2', 'watchdog': '4.0.2', 'wcmatch': '9.0',
'zipp': '3.20.0', 'gitpython': '3.1.43', 'mako': '1.3.7', 'pyjwt': '2.10.0', 'pymysql': '1.1.1', 'sqlalchemy': '2.0.36', 'sqlalchemy-utils': '0.41.2', 'alembic': '1.8.1', 'altgraph': '0.17.4', 'annotated-types': '0.7.0', 'anyio': '4.6.2.post1', 'astroid': '3.2.4', 'asttokens': '3.0.0', 'attrs': '24.2.0',
'backports.tarfile': '1.2.0', 'bcrypt': '4.0.1', 'black': '24.8.0', 'cffi': '1.17.1', 'chardet': '5.2.0', 'click': '8.1.3', 'cloudpickle': '2.2.1', 'comm': '0.2.2', 'coverage': '7.6.4', 'cryptography': '43.0.3', 'decorator': '5.1.1', 'dill': '0.3.8', 'distlib': '0.3.8', 'distro': '1.9.0', 'dnspython': '2.6.1',
'docker': '7.1.0', 'email-validator': '2.2.0', 'exceptiongroup': '1.2.2', 'executing': '2.1.0', 'filelock': '3.15.4', 'flake8': '7.1.1', 'flake8-black': '0.3.6', 'flake8-pyproject': '1.2.3', 'fritzconnection': '1.12.2', 'gitdb': '4.0.11', 'h11': '0.14.0', 'hatch': '1.13.0', 'hatchling': '1.26.3', 'hera': '5.16.2',
'httpcore': '1.0.7', 'httpx': '0.27.2', 'hyperlink': '21.0.0', 'importlib-metadata': '7.0.0', 'iniconfig': '2.0.0', 'ipython': '8.18.1', 'ipywidgets': '8.1.5', 'isort': '5.13.2', 'jaraco.classes': '3.4.0', 'jaraco.context': '6.0.1', 'jaraco.functools': '4.1.0', 'jedi': '0.19.2', 'jupyterlab-widgets': '3.0.13',
'keyring': '25.5.0', 'lark': '1.1.9', 'macholib': '1.16.3', 'markdown-it-py': '3.0.0', 'matplotlib-inline': '0.1.7', 'mccabe': '0.7.0', 'mdurl': '0.1.2', 'mkdocs-autorefs': '1.2.0', 'mkdocs-kroki-plugin': '0.7.0', 'mkdocs-redirects': '1.2.1', 'more-itertools': '10.5.0', 'msal': '1.31.1', 'mypy-extensions': '1.0.0',
'olca-ipc': '0.0.11', 'packaging': '24.2', 'parso': '0.8.4', 'passlib': '1.7.4', 'pexpect': '4.9.0', 'pip': '24.3.1', 'platformdirs': '4.2.2', 'pluggy': '1.5.0', 'prompt-toolkit': '3.0.48', 'psutil': '6.1.0', 'ptyprocess': '0.7.0', 'pure-eval': '0.2.3', 'pycodestyle': '2.12.1', 'pycparser': '2.22', 'pydantic':
'2.8.2', 'pydantic-core': '2.20.1', 'pydantic-settings': '2.6.1', 'pyflakes': '3.2.0', 'pyinstaller-hooks-contrib': '2024.4', 'pylint': '3.2.6', 'pytest': '8.3.3', 'pytest-cov': '6.0.0', 'pytest-dependency': '0.6.0', 'pytest-mock': '3.14.0', 'pytest-subtests': '0.13.1', 'python-dotenv': '1.0.1', 'rich': '13.9.4',
'setuptools': '74.1.2', 'shellingham': '1.5.4', 'smmap': '5.0.1', 'sniffio': '1.3.1', 'sqlmodel': '0.0.18', 'stack-data': '0.6.3', 'tomli': '2.0.1', 'tomli-w': '1.1.0', 'tomlkit': '0.13.2', 'traitlets': '5.14.3', 'trove-classifiers': '2024.10.21.16', 'typing-extensions': '4.12.2', 'userpath': '1.9.2', 'uv':
'0.5.2', 'video-processing': '0.0.1', 'virtualenv': '20.26.3', 'wcwidth': '0.2.13', 'wheel': '0.44.0', 'widgetsnbextension': '4.0.13', 'zenml': '0.70.0', 'zstandard': '0.23.0', 'autocommand': '2.2.2', 'importlib-resources': '6.4.0', 'inflect': '7.3.1', 'jaraco.text': '3.12.1', 'typeguard': '4.3.0'}

CURRENT STACK

Name: default
ID: 013c76cc-0bf7-48c6-b9ee-560c7ec97b47
Workspace: default / 747cca81-9e4c-4899-b947-1e1a02f2dc34

ORCHESTRATOR: default

Name: default
ID: 5bfdd975-2e70-4e98-ae58-2d84240c6649
Type: orchestrator
Flavor: local
Configuration: {}
Workspace: default / 747cca81-9e4c-4899-b947-1e1a02f2dc34

ARTIFACT_STORE: default

Name: default
ID: 43004c2b-2eb0-4ad0-bbab-fc6344bf3c7c
Type: artifact_store
Flavor: local
Configuration: {'path': ''}
Workspace: default / 747cca81-9e4c-4899-b947-1e1a02f2dc34

What happened?

We want to use ZenML in an AKS cluster with workload identities enabled.
For this, we use implicit authentication via workload identities for all services where possible.
This works great for the secret store, the Kubernetes Orchestrator and the Kubernetes Step Operator.
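
For reference, here is a small sketch of how one could check from inside a pod that workload identity is wired up. It assumes the Azure Workload Identity webhook injects the four environment variables listed below; the helper name is hypothetical and not part of ZenML.

```python
import os
from typing import List, Mapping

# Environment variables that the Azure Workload Identity webhook is
# assumed to inject into pods that use an annotated service account.
WORKLOAD_IDENTITY_VARS = (
    "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID",
    "AZURE_FEDERATED_TOKEN_FILE",
    "AZURE_AUTHORITY_HOST",
)


def missing_workload_identity_vars(
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the expected variables that are unset or empty (hypothetical helper)."""
    return [name for name in WORKLOAD_IDENTITY_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_workload_identity_vars()
    if missing:
        print("Workload identity looks incomplete, missing:", ", ".join(missing))
    else:
        print("Workload identity variables are present.")
```

If the list comes back empty, DefaultAzureCredential should be able to pick up the workload identity without any static secret.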

As the documentation says, this does not work for Azure Blob Storage:

The only Azure authentication method that works with Azure blob storage resources is the service principal authentication method.
[1]

It also does not work for ACR unless the admin account is enabled.

If an authentication method other than the Azure service principal is used for authentication, the admin account must be enabled for the registry, otherwise, clients will not be able to authenticate to the registry. See the official Azure documentation on the admin account for more information.
[2]

We could use the service principal, but we would like to avoid that to keep static credentials out of our system.

Technically, supporting DefaultAzureCredential does not look problematic at first glance.
The credentials are first checked to make sure they are of the correct type:

Then they are passed to adlfs.AzureBlobFileSystem:

def filesystem(self) -> adlfs.AzureBlobFileSystem:
    """The adlfs filesystem to access this artifact store.

    Returns:
        The adlfs filesystem to access this artifact store.
    """
    if not self._filesystem:
        secret = self.get_credentials()
        credentials = secret.get_values() if secret else {}
        self._filesystem = adlfs.AzureBlobFileSystem(
            **credentials,
            anon=False,
            use_listings_cache=False,
        )
    return self._filesystem

adlfs itself seems to have no problem consuming DefaultAzureCredential [3]:

The filesystem can be instantiated for different use cases based on a variety of storage_options combinations. The following list describes some common use cases utilizing AzureBlobFileSystem, i.e. protocols abfs or az. Note that all cases require the account_name argument to be provided:

1. Anonymous connection to public container: storage_options={'account_name': ACCOUNT_NAME, 'anon': True} will assume the ACCOUNT_NAME points to a public container, and attempt to use an anonymous login. Note, the default value for anon is True.
2. Auto credential solving using Azure's DefaultAzureCredential() library: storage_options={'account_name': ACCOUNT_NAME, 'anon': False} will use [DefaultAzureCredential](https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) to get valid credentials to the container ACCOUNT_NAME. DefaultAzureCredential attempts to authenticate via the [mechanisms and order visualized here](https://learn.microsoft.com/en-us/python/api/overview/azure/identity-readme?view=azure-python#defaultazurecredential).
....
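
To illustrate the second case, here is a minimal sketch (not ZenML code) of how the keyword arguments for adlfs.AzureBlobFileSystem could be built so that a missing secret falls back to DefaultAzureCredential. The function names and the explicit account_name parameter are my assumptions; the keyword names follow the adlfs README quoted above and the ZenML snippet.

```python
from typing import Any, Dict, Optional


def build_adlfs_kwargs(
    account_name: str,
    secret_values: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for adlfs.AzureBlobFileSystem (hypothetical helper).

    With explicit secret values (e.g. service principal fields) they are
    passed through. Without them, only account_name and anon=False are set,
    which per the adlfs README makes adlfs use DefaultAzureCredential.
    """
    kwargs: Dict[str, Any] = dict(secret_values or {})
    # Keep an account name from the secret if present, else use the argument.
    kwargs.setdefault("account_name", account_name)
    # Never fall back to anonymous access, matching the ZenML snippet above.
    kwargs["anon"] = False
    kwargs["use_listings_cache"] = False
    return kwargs


def open_filesystem(
    account_name: str,
    secret_values: Optional[Dict[str, Any]] = None,
) -> Any:
    """Create the filesystem; requires the third-party adlfs package."""
    import adlfs  # imported lazily so build_adlfs_kwargs works without it

    return adlfs.AzureBlobFileSystem(
        **build_adlfs_kwargs(account_name, secret_values)
    )
```

Calling open_filesystem("ACCOUNT_NAME") without a secret would then rely on whatever DefaultAzureCredential can resolve, which on AKS would be the workload identity.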

I might be overlooking something that makes this infeasible, but support for workload identities wherever possible would be great for us.
Happy to get feedback from you regarding this :)

[1]: https://docs.zenml.io/how-to/infrastructure-deployment/auth-management/azure-service-connector#azure-blob-storage-container
[2]: https://docs.zenml.io/how-to/infrastructure-deployment/auth-management/azure-service-connector#acr-container-registry
[3]: https://pypi.org/project/adlfs/

Reproduction steps

No response

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@lukas-reining added the bug (Something isn't working) label Dec 18, 2024
@htahir1
Contributor

htahir1 commented Dec 19, 2024

@lukas-reining Good one - I also think we might be able to enable this in a future release. We'd welcome a contribution if you already have a clear idea how to add the DefaultAzureCredential as well!

@lukas-reining
Author

> I also think we might be able to enable this in a future release

Cool, this sounds good @htahir1!

> We'd welcome a contribution if you already have a clear idea how to add the DefaultAzureCredential as well!

I will have a look, maybe there will be some time over the holidays, but I can't promise :)

Just for my understanding: do you know why there is an explicit check that only allows the service principal?
Was there a restriction in the past?
If you cannot tell, I will try to find out myself. I just want to avoid stepping into a pitfall that you might already have seen.

@htahir1
Contributor

htahir1 commented Dec 19, 2024

I think @stefannica can help here
