
podvm: remove cdh,attestation-agent units #1499

Merged

Conversation

Contributor

@mkulke mkulke commented Oct 5, 2023

fixes #1495

Since we build kata with SEALED_SECRET=yes by default, kata-agent will attempt to spawn attestation-agent, cdh and api-server-rest itself. Together with the existing systemd units, we end up with duplicate processes and contention over the sockets they need to create.

We need to keep api-server-rest as a systemd unit, since it has to run in the podns network namespace; and because it exposes a TCP socket, there is no contention.
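For illustration, a unit confined to the pod network namespace could look roughly like the sketch below. This is an assumption-laden example, not the actual unit shipped in the repo: the binary path, the `podns` namespace name, and the use of `ip netns exec` are illustrative.

```ini
; /etc/systemd/system/api-server-rest.service (illustrative sketch)
[Unit]
Description=CoCo restful API server, running in the pod network namespace
After=network.target

[Service]
; enter the podns network namespace so the TCP listener is reachable from the pod
ExecStart=/usr/bin/ip netns exec podns /usr/local/bin/api-server-rest
Restart=on-failure

[Install]
WantedBy=multi-user.target
```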

Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
@mkulke mkulke added bug Something isn't working core Issues related to the core adaptor code labels Oct 5, 2023
Member

@stevenhorsman stevenhorsman left a comment


LGTM. Thanks

Member

@bpradipt bpradipt left a comment


/lgtm

Contributor Author

mkulke commented Oct 5, 2023

For context, this is how the processes look on a podvm:

root@podvm-busybox-caa-7fcb8d777f-d7tth-f6dd9e03:/home/azureuser# pstree
systemd─┬─agent-protocol-───6*[{agent-protocol-}]
        ├─2*[agetty]
        ├─api-server-rest───2*[{api-server-rest}]
        ├─chronyd───chronyd
        ├─cron
        ├─dbus-daemon
        ├─irqbalance───{irqbalance}
        ├─kata-agent─┬─attestation-age───2*[{attestation-age}]
        │            ├─confidential-da───2*[{confidential-da}]
        │            ├─pause
        │            ├─sleep
        │            └─5*[{kata-agent}]
        ├─multipathd───6*[{multipathd}]
        ├─networkd-dispat
        ├─python3───python3───4*[{python3}]
        ├─rsyslogd───3*[{rsyslogd}]
        ├─sshd───sshd───sshd───bash───sudo───sudo───su───bash───pstree
        ├─systemd───(sd-pam)
        ├─systemd-journal
        ├─systemd-logind
        ├─systemd-network
        ├─systemd-resolve
        ├─systemd-udevd
        └─unattended-upgr───{unattended-upgr}
root@podvm-busybox-caa-7fcb8d777f-d7tth-f6dd9e03:/home/azureuser# systemctl status
● podvm-busybox-caa-7fcb8d777f-d7tth-f6dd9e03
    State: running
     Jobs: 0 queued
   Failed: 0 unit

The pod can then retrieve a passport token via remote attestation:

$ k exec -it deploy/busybox-caa -- wget -qO- http://127.0.0.1:8006/aa/token\?token_type\=kbs | jq .
{
  "token": "eyJhbGciOiJSUzM4NCIsInR5cCI6IkpXVCJ9.eyJldmFsdWF0aW9uLXJlcG9ydCI6IntcImFsbG93XCI6dHJ1ZX0iLCJleHAiOjE2OTY1MDA2OTcsImlzcyI6IkNvQ28tQXR0ZXN0YXRpb24tU2VydmljZSIsImp3ayI6eyJhbGciOiJSUzM4NCIsImUiOiJBUUFCIiwia3R5IjoiUlNBIiwibiI6InNrak1vNXA2WDdnWVFGcUZTZWc1cTRaMVZLaDl0VTBGeUJxNWRsU2tOVFNBR3g2RkRWaTlpZjRMdmpRdHVGSTJ2VjNCT2Etdnc5emNkblFpdk9JbVRQaXY3dEdhdVdfcFlxX3k2U2VRb3VGbW5YSWVUNUw4cDNFQmQ3ZEU5YXNQNTdJNmVFRjNNMWl4RlZ3ektUVF9WN05sSWRwTDN4bFpaNEFsTXBvWG83ZWdKNHVwYXBZMU5XTktzZWlWX1ZDbEZjQ0dvb1pVTkFxS19wVjZJSVhfbnROSGI4cU9ia2FvN1Q0aExkZmMzOFVNSEhNWjl1SVJRTmpJTkp2MjhUWG0yTkduUmp3WnczbzRsSmNlOVBXdm1EZUJMWnNrUnk5ak9Ga0MzNXIxeEdKNUR1OXhsc1dMOW1odjlEMjlfcXBPOEl1bHlIb0RoZmNFNnoxcUF5VFdZUSJ9LCJuYmYiOjE2OTY1MDAzOTcsInRjYi1zdGF0dXMiOnsiYXpzbnB2dHBtLm1lYXN1cmVtZW50IjoiVm5WZEkxVnRvZTFpdzBzRWIvVUpzSUdUK3lkK3JYM2pxTUxRS0lWL1Frek1UYVZla3FoaldnSVN1RVlNbVZOSyIsImF6c25wdnRwbS5wbGF0Zm9ybV9zbXRfZW5hYmxlZCI6IjAiLCJhenNucHZ0cG0ucGxhdGZvcm1fdHNtZV9lbmFibGVkIjoiMSIsImF6c25wdnRwbS5wb2xpY3lfYWJpX21ham9yIjoiMCIsImF6c25wdnRwbS5wb2xpY3lfYWJpX21pbm9yIjoiMzEiLCJhenNucHZ0cG0ucG9saWN5X2RlYnVnX2FsbG93ZWQiOiIwIiwiYXpzbnB2dHBtLnBvbGljeV9taWdyYXRlX21hIjoiMCIsImF6c25wdnRwbS5wb2xpY3lfc2luZ2xlX3NvY2tldCI6IjAiLCJhenNucHZ0cG0ucG9saWN5X3NtdF9hbGxvd2VkIjoiMSIsImF6c25wdnRwbS5yZXBvcnRlZF90Y2JfYm9vdGxvYWRlciI6IjMiLCJhenNucHZ0cG0ucmVwb3J0ZWRfdGNiX21pY3JvY29kZSI6IjExNSIsImF6c25wdnRwbS5yZXBvcnRlZF90Y2Jfc25wIjoiOCIsImF6c25wdnRwbS5yZXBvcnRlZF90Y2JfdGVlIjoiMCJ9LCJ0ZWUtcHVia2V5Ijp7ImFsZyI6IlJTQTFfNSIsImUiOiJBUUFCIiwia3R5IjoiUlNBIiwibiI6InpCeWwyMm5jYnFCRHR5cUV0OXpWYWFrczB2RHRkYlNHLVgydFh1UGcyQzJfSGtjdk5JMUo5OUk4UFdrR0FYNjFtRmFXd3k0OFpQQTRKVzEyUW93MXRvVUlSbFJhUVAycUxMbTYwRlVIMjBFTVhZQ2NwRDMzQWFnVXVUcTFVWTUtMmRjTkZiamhWNFVsOUhzOE9LZS1fYWg4SDR1Qm10Q2xmd0ZTMWNCZWR5QWNqNVhVMXdIaUFRY3hlNXJyaHpEYm1XZHZxSzFOQWJmXy13U085OHdfT0NxSzlYVDJuaklsTVZKRGtUcXhvY1BDVmRDaldXcXNjSkZVX0VOenBDS0MxZkFMdTM3eEtZRGpzbENEbV9vX0h2RlJpeklvXzdqZUJKN1Ridm9vRVFTWncxOGUzOGE2dmt0ZnFjTmxhZXJFeUJ3c0pyeDliWUNmc3FtS3FRRjFjUSJ9fQ.ebgD2QsnzPjsxwI2qJah7u53lgPc
hfrqIrkLiMn2IBlLBgzHQcJ3Z5jGLpM6kNTlZa5ZpKXzX-88lXGnQTCOzFAviEgDhzeFtDloHAIoT5PGIWHFQuzUXYDSXazmwKaUN3dTHmaRVU-SqEp_F4GAbh6HCob26JXgUPnlZh5BRq2q4qix7N7a9qtxfthTsSALwrjweR6CRkObU1bYgGEADGk8fx3OOpinUaZjRkiXSZctSUegUE-YIqByC6_aQmOUgKXZs1zmOoHtBtYqY-VmlXa6Leb8wmwPZPQsoTEvTXbs-ZipWPyxA2CwK6Z1PuH8CTDiFjB97rzzoHkm7Ouyhw",
  "tee_keypair": "-----BEGIN RSA PRIVATE KEY-----\nMIIEowIBAAKCAQEAzByl22ncbqBDtyqEt9zVaaks0vDtdbSG+X2tXuPg2C2/Hkcv\nNI1J99I8PWkGAX61mFaWwy48ZPA4JW12Qow1toUIRlRaQP2qLLm60FUH20EMXYCc\npD33AagUuTq1UY5+2dcNFbjhV4Ul9Hs8OKe+/ah8H4uBmtClfwFS1cBedyAcj5XU\n1wHiAQcxe5rrhzDbmWdvqK1NAbf/+wSO98w/OCqK9XT2njIlMVJDkTqxocPCVdCj\nWWqscJFU/ENzpCKC1fALu37xKYDjslCDm/o/HvFRizIo/7jeBJ7TbvooEQSZw18e\n38a6vktfqcNlaerEyBwsJrx9bYCfsqmKqQF1cQIDAQABAoIBAD6k5DqNKPxC78V9\npTIQ8ub05y7uhtLDT1GvQtCGu/FdSPTwAAru+i63NYnbe95llzJkEO1ieWK5X2IN\nUGhoQ+v6tGlxZingMKR9dFqQXlLqifMAkBLQecjmX0XiQNgBFemh2QA7t912ngmE\n8RyqTzHmzgGYfXSYaNKsA1JbMiL5CbpxArT/3K5wcs1wDMWLZFqUbWSQLeraERZs\nN/C6uF9u/a5iWaF4r8Tohn1LlzVITHlFpdeJlZIYbETrdd1IXne3LRfLXhYGNEeN\ng5Oy3IOVOZFzA5ULutpDyuVjSwblb370E2nhwA6w9o9pkOMeI6e7aExpHRjfFZiw\n7uj6WeECgYEA5ELyenxw8rUjg+CmCB9ovVAeXyffAG23XEtRlilrkBRZCa3Pqm//\nmvkdizwEEJbtsWH2h9xfUUHsXst5IEGNrnV5UTmcYpVI06BCdn7l3nss7bjpRlO4\nIGXU8/iqh0ZiwiTbA2VU5/uPNIhLvujNpNNV4+hQRZRoQJydhNrPLDUCgYEA5Ops\ndSHYaouWr8u+KFeqqxPlgrtvF3dQpRsGnLLALUrf0mmgqiMEyBsH5tDYy76a4vyH\n0qweJzuZ29PiJDuu7I7SWx2Ps+dTWLjQB1hozwgwqRNdkTChuAUveV95wCzOTpcq\nPzErq5r8p8DQQU/5BRpVG/rl+alw+ViAw5Kqs80CgYB4+fplbHq4R8SQ6olUmMD8\nRPAz4n/QTFX39ntBKKa3b/FYreP4Iu/HhOxhlOdam4NSlecBToy+FkBeZVzG+bdL\nlTs9D1mQ7inw72kKQGs4JPRE8dHA0jIuCYp523sVwvoohzwEaro7URou72WlwuDq\n0I8fAUs59VPjmp3pgcZ3WQKBgE2lzsA0iMorKyPaQlhA1F1PVGxx047sI+i9MBL6\n9wDmAuHGfn73fem6cYWzlbYWo0cXTaMCSwAX0WqlhnGv5PfMwkGx10q4zqarmbTE\nIlkHeCoBrZ1QF6rp516OKigripdR4zyoGx4MZmMonftpexhmBDSHeHalKPMLODIe\nj9SJAoGBAM5/6/xX/suVpTLu5UCIIRJqQthtSHhWaAD40AXDorRUD8bD756t4vHk\nUlx6rBWMM/ZR8fU14kXYGM0c6Khar12rJK3EcAYMSe1/m3JpjmdHhunjAEvr/Kmx\nu/hbNlbYraHR3OciwAuqGBVCd1YNlrl99gJjkVIRSTkshk0RQfB5\n-----END RSA PRIVATE KEY-----\n"
}
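The `token` field above is a standard JWT, so its claims (such as `iss` and the `tcb-status` entries) can be peeked at without a library. A minimal sketch using only the Python standard library — this decodes the payload without verifying the signature, so it is for inspection only:

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload = token.split(".")[1]
    # base64url payloads are usually unpadded; restore the padding first
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


# e.g. jwt_claims(response["token"])["iss"] yields "CoCo-Attestation-Service"
```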

@mkulke mkulke merged commit a4d57fd into confidential-containers:main Oct 5, 2023
27 checks passed
@mkulke mkulke deleted the mkulke/remove-aa-and-cdh-units branch October 5, 2023 10:09
Member

surajssd commented Oct 6, 2023

From an image that was built with the changes merged to main, I don't see that the CDH or AA-rest are working services.

Here is the process tree for kata-agent:

root         921  0.1  0.8  90524 64216 ?        Ssl  23:32   0:02 /usr/local/bin/kata-agent --config /etc/agent-config.toml
root        1024  0.0  0.1 153912  8600 ?        Sl   23:32   0:00  \_ /usr/local/bin/attestation-agent --keyprovider_sock unix:///run/confidential-containers/attestation-agent/keyprovider.sock --getresource_sock unix:///run/confidential-containers/attestation-agent/getresource.sock --attestation_sock unix:///run/c
65535       1160  0.0  0.0    996     4 ?        S    23:33   0:00  \_ /pause
root        1214  0.0  0.0  11380  7540 ?        S    23:34   0:00  \_ nginx: master process nginx -g daemon off;
systemd+    1241  0.0  0.0  11844  2808 ?        S    23:34   0:00  |   \_ nginx: worker process
systemd+    1242  0.0  0.0  11844  2808 ?        S    23:34   0:00  |   \_ nginx: worker process
root        1266  0.0  0.0   4188  3456 pts/0    Ss+  23:35   0:00  \_ bash

You can find the image here: /CommunityGalleries/cocopodvm-d0e4f35f-5530-4b9c-8596-112487cdea85/Images/podvm_image0/Versions/10.05.224738.

Also, I have added the kustomization file, generated as follows; notice that AA_KBC_PARAMS points to the KBS service:

cat <<EOF >install/overlays/azure/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../yamls
images:
- name: cloud-api-adaptor
  newName: "${registry}/cloud-api-adaptor"
  newTag: latest
generatorOptions:
  disableNameSuffixHash: true
configMapGenerator:
- name: peer-pods-cm
  namespace: confidential-containers-system
  literals:
  - CLOUD_PROVIDER="azure"
  - AZURE_SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}"
  - AZURE_REGION="${AZURE_REGION}"
  - AZURE_INSTANCE_SIZE="Standard_DC2as_v5"
  - AZURE_RESOURCE_GROUP="${AZURE_RESOURCE_GROUP}"
  - AZURE_SUBNET_ID="${AZURE_SUBNET_ID}"
  - AZURE_IMAGE_ID="${AZURE_IMAGE_ID}"
  - AA_KBC_PARAMS="cc_kbc::http://10.0.211.55:8080"
secretGenerator:
- name: peer-pods-secret
  namespace: confidential-containers-system
  literals: []
- name: ssh-key-secret
  namespace: confidential-containers-system
  files:
  - id_rsa.pub
patchesStrategicMerge:
- workload-identity.yaml
EOF

Contributor Author

mkulke commented Oct 6, 2023

From an image that was built with the changes merged to main, I don't see that the CDH or AA-rest are working services.

I have a suspicion that this is due to the kata-agent binary being cached. In versions.yaml we have set CCv0 as a ref, and this is our cache key too, so we are building the podvm with an outdated agent that doesn't spawn the CDH processes. I'll delete the kata-agent cache and test a rebuild, but we have to rethink the caching logic to accommodate major changes in kata. Some options:

  • "resolve" the CCv0 ref into a commit sha and then use this one as the cache key; the potential downside being that, with some churn on the CCv0 branch, we probably won't cache much.

  • rebuild the agent asynchronously (a scheduled job that checks for updates on CCv0 and retriggers a kata-agent rebuild)

  • use upstream-built binaries

  • don't bother, because we don't expect more breaking changes on kata-agent CCv0 in the near future 🤞
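The first option could be sketched as a small shell helper that pins the cache key to the commit the branch currently points at, so the cache is invalidated whenever the branch moves. The repo URL and key prefix are illustrative:

```shell
#!/bin/sh
# Derive a cache key from the commit SHA a branch currently points to.
resolve_cache_key() {
    repo="$1"   # repo URL or local path
    ref="$2"    # branch name, e.g. CCv0
    sha=$(git ls-remote "$repo" "refs/heads/$ref" | cut -f1)
    echo "kata-agent-$sha"
}

# e.g. resolve_cache_key https://github.com/kata-containers/kata-containers CCv0
```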

Contributor Author

mkulke commented Oct 6, 2023

Apparently that worked. I rebuilt an image on this repo. Using the resulting /CommunityGalleries/cocopodvm-d0e4f35f-5530-4b9c-8596-112487cdea85/images/podvm_image0/versions/0.0.71 image, the CDH processes on the PodVM look fine to me.

Member

surajssd commented Oct 6, 2023

Thanks @mkulke, this worked for me. But now on to the next roadblock: some parsing fails with the following error:

# curl http://127.0.0.1:8006/aa/token\?token_type\=kbs
rpc status: Status { code: INTERNAL, message: "[ERROR:attestation-agent] AA-KBC get token failed: RCAR handshake failed: KBS attest unauthorized, Error Info: ErrorInformation { error_type: \"https://github.com/confidential-containers/kbs/errors/AttestationFailed\", detail: \"Attestation failed: Verifier evaluate failed: json parse error\\n\\nCaused by:\\n    trailing characters at line 1 column 420\" }", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }

Things look fine on the KBS side. Could it be because the AA did not pick up the latest change? I will try to build the image on my own and then verify this again.

Contributor Author

mkulke commented Oct 6, 2023

Things look fine on the KBS side. Could it be because the AA did not pick up the latest change? I will try to build the image on my own and then verify this again.

No, this is a KBS error (from the Attestation Service, actually; I assume you use the as-builtin option). Verification was broken due to changes in the HCL report; those were fixed in the az-snp-vtpm crate. You need to rebuild/redeploy KBS and make sure Cargo.lock is recreated, so it won't reuse an older revision of AS#main.

Member

surajssd commented Oct 6, 2023

Gotcha, found your PR: confidential-containers/trustee#165

Member

surajssd commented Oct 6, 2023

Thanks Magnus, it all worked!

Development

Successfully merging this pull request may close these issues:

CDH: there are 2 process instances of AA, CDH and api-server-rest