operator-sdk 1.20.0 breaks k8s_status in FIPS enabled OpenShift cluster #5723

Closed
efussi opened this issue May 3, 2022 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. language/ansible Issue is related to an Ansible operator project

efussi commented May 3, 2022

Bug Report

What did you do?

I have an Ansible operator image based on quay.io/operator-framework/ansible-operator:v1.19.1 which adds the kubernetes.core:2.3.0 and operator_sdk.util:0.4.0 collections in requirements.yaml. One of the playbook tasks sets the status of a CR like so:

- name: Set status to {{ status }} for {{ ansible_operator_meta.name }} in {{ ansible_operator_meta.namespace }}
  k8s_status:
    api_version: "acme.com/v1beta1"
    kind: AcmeThing
    name: "{{ ansible_operator_meta.name }}"
    namespace: "{{ ansible_operator_meta.namespace }}"
    status:
      acmeStatus: "{{ status }}"
      acmeVersion: "{{ version | default(omit) }}"
  register: set_cr_status
  retries: 3
  delay: 5
  until: set_cr_status is not failed

This works just fine on my FIPS-enabled OCP 4.8 cluster.

What did you expect to see?

When I change the base image to ansible-operator:v1.20.0, it continues to work.

What did you see instead? Under which circumstances?

When I change the base image to ansible-operator:v1.20.0, the k8s_status task fails:

fatal: [localhost]: FAILED! => {"attempts": 3, "changed": false, "error": "[digital envelope routines: EVP_DigestInit_ex] disabled for FIPS", "msg": "Failed to get client due to %s"}

Environment

Operator type:

/language ansible

Kubernetes cluster type:

OpenShift 4.8.39

$ operator-sdk version

operator-sdk version: "v1.20.0", commit: "deb3531ae20a5805b7ee30b71f13792b80bd49b1", kubernetes version: "1.23", go version: "go1.17.9", GOOS: "linux", GOARCH: "amd64"

$ go version (if language is Go)

$ kubectl version

$ oc version
Client Version: 4.8.36
Server Version: 4.8.39
Kubernetes Version: v1.21.8+ed4d8fd

Possible Solution

The problem seems to be related to the use of MD5 hashes, which are restricted in FIPS mode. Compare s3tools/s3cmd#1005 (comment).
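One way to check this suspicion from inside the operator container is to probe MD5 directly. The following is only a minimal sketch, not part of operator-sdk or the kubernetes client. The helper names `kernel_fips_enabled` and `md5_allowed` are made up for illustration, and it assumes a Linux host that exposes the kernel FIPS flag at /proc/sys/crypto/fips_enabled.

```python
import hashlib
from pathlib import Path


def kernel_fips_enabled(flag_path="/proc/sys/crypto/fips_enabled"):
    """Return True if the kernel FIPS flag file exists and contains '1'."""
    try:
        return Path(flag_path).read_text().strip() == "1"
    except OSError:
        # File missing or unreadable: treat as "not in FIPS mode".
        return False


def md5_allowed():
    """Return True if a plain hashlib.md5() call succeeds in this interpreter."""
    try:
        hashlib.md5(b"probe")
        return True
    except ValueError:
        # OpenSSL rejected the digest, e.g. "disabled for FIPS".
        return False


if __name__ == "__main__":
    print("kernel FIPS flag:", kernel_fips_enabled())
    print("plain md5 allowed:", md5_allowed())
```

If the kernel flag is set and the plain MD5 call is rejected, any library that calls `hashlib.md5(...)` without further arguments would be expected to fail the same way.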

Additional context

@openshift-ci openshift-ci bot added the language/ansible Issue is related to an Ansible operator project label May 3, 2022
@rashmigottipati rashmigottipati added this to the v1.21.0 milestone May 9, 2022
@varshaprasad96 varshaprasad96 modified the milestones: v1.21.0, v1.22.0 May 18, 2022
@varshaprasad96 varshaprasad96 added the kind/bug Categorizes issue or PR as related to a bug. label Jun 8, 2022
@varshaprasad96 varshaprasad96 modified the milestones: v1.22.0, v1.23.0 Jun 8, 2022
@asmacdo asmacdo modified the milestones: v1.23.0, Backlog Jun 29, 2022

efussi commented Jul 6, 2022

I patched my operator to run with ANSIBLE_VERBOSITY=3 and was able to gather the stack trace:

The full traceback is:
  File "/tmp/ansible_k8s_status_payload_bi0wnjm8/ansible_k8s_status_payload.zip/ansible_collections/operator_sdk/util/plugins/module_utils/api_utils.py", line 86, in get_api_client
    client = DynamicClient(kubernetes.client.ApiClient(configuration))
  File "/usr/local/lib/python3.8/site-packages/openshift/dynamic/client.py", line 40, in __init__
    K8sDynamicClient.__init__(self, client, cache_file=cache_file, discoverer=discoverer)
  File "/usr/local/lib/python3.8/site-packages/kubernetes/dynamic/client.py", line 84, in __init__
    self.__discoverer = discoverer(self, cache_file)
  File "/usr/local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py", line 224, in __init__
    Discoverer.__init__(self, client, cache_file)
  File "/usr/local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py", line 48, in __init__
    default_cachefile_name = 'osrcp-{0}.json'.format(hashlib.md5(default_cache_id).hexdigest())
fatal: [localhost]: FAILED! => {
    "attempts": 3,
    "changed": false,
    "error": "[digital envelope routines: EVP_DigestInit_ex] disabled for FIPS",

Comparing the pip freeze output for ansible-operator:v1.19.1 and ansible-operator:v1.20.0 shows that the kubernetes package version changed from 12.0.1 to 23.3.0. However, both versions seem to contain the same code:

$ grep md5 /usr/local/lib/python3.8/site-packages/kubernetes/dynamic/discovery.py
        default_cachefile_name = 'osrcp-{0}.json'.format(hashlib.md5(default_cache_id).hexdigest())

When I patch discovery.py in my operator's Dockerfile, it works:

 && ansible-galaxy collection install -r ${HOME}/requirements.yml \
 && site_packages=/usr/local/lib/python3.8/site-packages \
 && sed -i -e 's/hashlib.md5(default_cache_id)/hashlib.md5(default_cache_id, usedforsecurity=False)/' ${site_packages}/kubernetes/dynamic/discovery.py \

While it's still not clear to me which of the python package updates from 1.19.1 to 1.20.0 caused this, I think the proper fix here involves two steps:

  1. Update package kubernetes (tracked through kubernetes-client/python#1851, "K8sDynamicClient fails on FIPS-enabled OpenShift cluster")
  2. Pull the updated package into operator-sdk (can be tracked through the subject issue)
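In Python terms, the sed patch above amounts to the change sketched below. This is an illustration under stated assumptions, not necessarily how the upstream fix is written: `cachefile_name` is a hypothetical helper, and the `TypeError` fallback is an addition for interpreters whose `hashlib.md5` does not accept `usedforsecurity` (the keyword is documented for Python 3.9+, though some vendor-patched older builds accept it too).

```python
import hashlib


def cachefile_name(cache_id: bytes) -> str:
    """Build a cache file name from an MD5 digest used only as an identifier."""
    try:
        # Tells OpenSSL the digest is not used for security purposes,
        # so a FIPS-enabled build can still allow it.
        digest = hashlib.md5(cache_id, usedforsecurity=False)
    except TypeError:
        # Interpreter without the keyword; this call still fails under FIPS.
        digest = hashlib.md5(cache_id)
    return "osrcp-{0}.json".format(digest.hexdigest())


if __name__ == "__main__":
    # The cache id here is an arbitrary example value.
    print(cachefile_name(b"example-cache-id"))
```

The digest only names a discovery cache file, which is why marking it as not used for security seems reasonable here.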


@venkataramanam

I had observed a FIPS issue with python openshift package version 0.13.1
openshift/openshift-restclient-python#427 (comment)

It looks like the Ansible operator now uses openshift version 0.13.1 (commit 9bb14cc):
https://github.com/operator-framework/operator-sdk/blob/master/images/ansible-operator/Pipfile.lock#L265


efussi commented Oct 26, 2022

With https://github.com/kubernetes-client/python/releases/tag/v25.3.0 released, the above patch in the operator's Dockerfile can be changed to:

 && pip3 install --no-cache-dir kubernetes~=25.3.0 \
 && ansible-galaxy collection install -r ${HOME}/requirements.yml \

@venkataramanam

@efussi

Thank you. I did a quick test by installing openshift==0.13.1, and it pulled in kubernetes==25.3.0 as a dependency, which has the fix you committed.

daezaa added a commit to daezaa/operator-sdk that referenced this issue Nov 14, 2022
upgrading kubernetes dependency to pull fixes due to failures on FIPS enabled clusters (operator-framework#5723)

Closes operator-framework#6169

Signed-off-by: daezaa <dschoi92@gmail.com>
everettraven pushed a commit that referenced this issue Nov 22, 2022
upgrading kubernetes dependency to pull fixes due to failures on FIPS enabled clusters (#5723)

Closes #6169

Signed-off-by: daezaa <dschoi92@gmail.com>


efussi commented Dec 10, 2022

https://github.com/operator-framework/operator-sdk/releases/tag/v1.26.0 contains kubernetes 25.3.0 which has the fix.
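To check whether a given operator image already ships a kubernetes package with the fix, a small script along these lines could be run inside the container. It is a rough sketch: `FIXED_IN`, `version_tuple`, `has_fips_fix` and `installed_kubernetes_version` are illustrative names, the version parsing is deliberately naive, and treating 25.3.0 as the first fixed release is an assumption based on this thread.

```python
import re
from importlib import metadata

# Assumption from this thread: kubernetes 25.3.0 is the first release with the fix.
FIXED_IN = (25, 3, 0)


def version_tuple(version):
    """Turn a version string such as '25.3.0' into (25, 3, 0)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def has_fips_fix(version):
    """Return True if the given kubernetes package version is >= FIXED_IN."""
    return version_tuple(version) >= FIXED_IN


def installed_kubernetes_version():
    """Return the installed kubernetes package version, or None if absent."""
    try:
        return metadata.version("kubernetes")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_kubernetes_version()
    if found is None:
        print("kubernetes package not installed")
    else:
        print(found, "has fix:", has_fips_fix(found))
```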
