Run kola test on Azure failed #2445

Closed
HuijingHei opened this issue Sep 17, 2021 · 13 comments · Fixed by #2555

Comments

@HuijingHei
Member

HuijingHei commented Sep 17, 2021

I ran a kola test on Azure, but it failed. Could someone help resolve it? Thanks!

  1. Install the Azure CLI; see https://www.edureka.co/community/93254/how-to-install-azure-cli-with-yum
  2. Set up credentials; see https://coreos.github.io/coreos-assembler/mantle/credentials/#azure
  3. Run kola list to find a test that is available on Azure, for example coreos.ignition.ssh.key, then run it:
    $ kola run -d -b rhcos -p azure --azure-auth ~/.azure/osServicePrincipal.json --azure-size Standard_D1_v2 --azure-location centralus coreos.ignition.ssh.key

2021-09-17T08:36:21Z kola: Flight failed: failed to resolve image: unable to fetch release bucket alpha version: Get "https://alpha.release.core-os.net/amd64-usr/current/version.txt": dial tcp: lookup alpha.release.core-os.net: no such host

@miabbott
Member

miabbott commented Sep 20, 2021

> 2021-09-17T08:36:21Z kola: Flight failed: failed to resolve image: unable to fetch release bucket alpha version: Get "https://alpha.release.core-os.net/amd64-usr/current/version.txt": dial tcp: lookup alpha.release.core-os.net: no such host

Since mantle and kola were originally designed to run + test Container Linux images, it is defaulting to trying to find a CL image to use in Azure. Most of the CL infrastructure has been decommissioned, so this error is not surprising.

I think the more interesting error is the one you get if you try providing the location of a Red Hat/Fedora CoreOS image in Azure (e.g. the RHCOS image that we upload as part of the build process).

@HuijingHei
Member Author

> I think the more interesting error is if you try providing a location of a Red Hat/Fedora CoreOS image in Azure (i.e. like the RHCOS image that we upload as part of the build process).

Thanks @miabbott
I tried that, but it failed with the error below. I am still asking the OCP team for help.
harness.go:1124: Cluster failed starting machines: compute.VirtualMachinesClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="LinkedInvalidPropertyId" Message="Property id 'https://hheios4storage.blob.core.windows.net/vhd/rhcos-49.84.202109172039-0-azure.x86_64.vhd' at path 'properties.storageProfile.imageReference.id' is invalid. Expect fully qualified resource Id that start with '/subscriptions/{subscriptionId}' or '/providers/{resourceProviderNamespace}/'."
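
The error above says the image reference must start with '/subscriptions/{subscriptionId}' or '/providers/{resourceProviderNamespace}/', which a blob storage URL does not. As a rough illustration only, the prefix rule quoted in that message could be checked like this. The function name `looksLikeResourceID` is hypothetical and is not part of kola or the Azure SDK, and the check covers the prefix only, not the rest of the ID.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeResourceID reports whether id starts with one of the two prefixes
// named in the Azure error message. It is a hypothetical helper: it checks
// the prefix only and does not validate the remainder of the ID.
func looksLikeResourceID(id string) bool {
	return strings.HasPrefix(id, "/subscriptions/") ||
		strings.HasPrefix(id, "/providers/")
}

func main() {
	candidates := []string{
		// The blob URL that Azure rejected in the error above.
		"https://hheios4storage.blob.core.windows.net/vhd/rhcos-49.84.202109172039-0-azure.x86_64.vhd",
		// A placeholder ID; the values are made up for illustration.
		"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-rg/providers/Microsoft.Compute/images/example-image",
	}
	for _, id := range candidates {
		fmt.Printf("%v\t%s\n", looksLikeResourceID(id), id)
	}
}
```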

@HuijingHei
Member Author

HuijingHei commented Sep 27, 2021

Test command:
kola run -d -b rhcos -p azure --azure-auth ~/.azure/osServicePrincipal.json --azure-size Standard_DS1_v2 --azure-location centralus --azure-disk-uri /subscriptions/{subscription ID}/resourceGroups/{resource group}/providers/Microsoft.Compute/images/{imagename} --azure-version xxxx coreos.ignition.ssh.key
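
For reference, the value passed to --azure-disk-uri in the command above follows a fixed pattern built from three pieces. The sketch below assembles that string from caller-supplied values; `managedImageID` is a hypothetical helper name, the sample values are placeholders, and nothing here talks to Azure or checks that the image exists.

```go
package main

import "fmt"

// managedImageID assembles an image resource ID in the shape used by the
// --azure-disk-uri argument in the command above. It is a hypothetical
// helper: the arguments are used as given and nothing is looked up in Azure.
func managedImageID(subscriptionID, resourceGroup, imageName string) string {
	return fmt.Sprintf(
		"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/images/%s",
		subscriptionID, resourceGroup, imageName)
}

func main() {
	// Placeholder values, not real resources.
	fmt.Println(managedImageID(
		"00000000-0000-0000-0000-000000000000", "example-rg", "example-image"))
}
```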

Test results:

  • passed with coreos.ignition.ssh.key

  • failed with ext.fedora-coreos-config.ignition.remote, with this error:

harness.go:1124: Cluster failed starting machines: compute.VirtualMachinesClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidParameter" Message="The value of parameter linuxConfiguration.ssh.publicKeys.keyData is invalid."

@miabbott
Member

I wonder if the NoSSHKeyInMetadata flag on the coreos.ignition.ssh.key test has something to do with this failure?

We don't have a mechanism for setting that flag on the ext tests, so perhaps the harness is trying to inject the SSH key via platform metadata?

I'm kind of grasping at straws as I don't have a great understanding of the mantle code in this space. @bgilbert do you have any ideas?

@bgilbert
Contributor

> I wonder if the NoSSHKeyInMetadata flag on the coreos.ignition.ssh.key test has something to do with this failure?

coreos.ignition.ssh.key is the unusual case here. kola normally passes the SSH key through both the Ignition config and the Azure API, but that test is an exception (because it specifically tests the Ignition SSH key path). There might be a broader problem with passing keys via the API. @HuijingHei, do other kola tests work on Azure?

@HuijingHei
Member Author

HuijingHei commented Sep 30, 2021

> > I wonder if the NoSSHKeyInMetadata flag on the coreos.ignition.ssh.key test has something to do with this failure?
>
> coreos.ignition.ssh.key is the unusual case here. kola normally passes the SSH key through both the Ignition config and the Azure API, but that test is an exception (because it specifically tests the Ignition SSH key path). There might be a broader problem with passing keys via the API.

Thanks!
Yes, I checked: the coreos.ignition.ssh.key test has the NoSSHKeyInMetadata flag set.

> @HuijingHei, do other kola tests work on Azure?

No. I tested with rpmostree.status and got the same error: The value of parameter linuxConfiguration.ssh.publicKeys.keyData is invalid.

This means kola passed the SSH key to the Azure API, which rejected it as an invalid parameter. coreos.ignition.ssh.key passed because it does not pass the key to the Azure API. Is there any way, or any log, to check what kola passed to the Azure API?

@bgilbert
Contributor

bgilbert commented Nov 9, 2021

We're currently passing ECDSA-format SSH keys to Azure, and Azure doesn't support those. kola will need code changes to work on Azure again. See #1772 for details.
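
To make the distinction concrete: the key type is the first field of an authorized_keys-style public key line, so it can be read off before the key is handed to any API. The following is only a sketch of that idea and is not kola code; `keyAlgorithm` and `isRSAKey` are hypothetical names, and the parsing assumes the line has no leading options field. The sample lines are the truncated ones printed later in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// keyAlgorithm returns the first whitespace-separated field of an
// authorized_keys-style public key line, for example "ssh-rsa". It assumes
// the line does not begin with an options field.
func keyAlgorithm(line string) string {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

// isRSAKey reports whether the line declares the ssh-rsa key type, the only
// type this thread reports Azure accepting in keyData.
func isRSAKey(line string) bool {
	return keyAlgorithm(line) == "ssh-rsa"
}

func main() {
	lines := []string{
		"ecdsa-sha2-nistp256 AAAA... core@default",
		"ssh-rsa AAAA...",
	}
	for _, line := range lines {
		fmt.Printf("%s: RSA=%v\n", keyAlgorithm(line), isRSAKey(line))
	}
}
```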

@HuijingHei
Member Author

> We're currently passing ECDSA-format SSH keys to Azure, and Azure doesn't support those. kola will need code changes to work on Azure again. See #1772 for details.

Thanks @bgilbert for pointing this out!

I ran the kola test rpmostree.status on Azure. By default kola uses an ecdsa-sha2 key, and the test fails with "The value of parameter linuxConfiguration.ssh.publicKeys.keyData is invalid". If sshKey is changed to an RSA key, the test passes.

coreos-assembler/blob/main/mantle/platform/machine/azure/flight.go

	if !rconf.NoSSHKeyInMetadata {
		ac.sshKey = af.SSHKey
	} else {
		ac.sshKey = af.FakeSSHKey
	}
	fmt.Printf("======final = %v\n", ac.sshKey)
  1. By default, the run fails with an ecdsa-sha2-nistp256 key:
======final = ecdsa-sha2-nistp256 AAAA... core@default
--- FAIL: rpmostree.status (108.80s)
        harness.go:1213: Cluster failed starting machines: compute.VirtualMachinesClient#CreateOrUpdate: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidParameter" Message="The value of parameter linuxConfiguration.ssh.publicKeys.keyData is invalid."
  2. With sshKey changed to the RSA key, the test passes:
	// Debug change: use the RSA FakeSSHKey in both branches.
	if !rconf.NoSSHKeyInMetadata {
		ac.sshKey = af.FakeSSHKey
	} else {
		ac.sshKey = af.FakeSSHKey
	}
	fmt.Printf("======final = %v\n", ac.sshKey)
======final = ssh-rsa AAAA...
--- PASS: rpmostree.status (409.65s)

@dustymabe
Member

Yep. I think @bgilbert is trying to say this should be fixed if we implement the proposed solution in #1772 (comment), which I think @jmarrero has plans to work on.

@bgilbert
Contributor

Well, since Azure only supports RSA, we'd need to generate a throwaway RSA key, same as we're doing on AWS.
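
As a rough sketch of what generating a throwaway RSA key can look like in Go, the example below creates a key pair, discards the private half, and prints the public half as an ssh-rsa line. This is not the code from kola or from the eventual fix. `throwawayRSAKeyLine`, `writeString`, and `mpint` are hypothetical helpers, and the public key blob is encoded by hand (the "ssh-rsa" string, then the exponent, then the modulus) so that the example needs only the standard library. A package such as golang.org/x/crypto/ssh would normally handle this encoding.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"crypto/rsa"
	"encoding/base64"
	"encoding/binary"
	"fmt"
	"math/big"
)

// writeString appends an SSH wire-format string to buf: a 4-byte big-endian
// length followed by the bytes themselves.
func writeString(buf *bytes.Buffer, b []byte) {
	var length [4]byte
	binary.BigEndian.PutUint32(length[:], uint32(len(b)))
	buf.Write(length[:])
	buf.Write(b)
}

// mpint returns the SSH mpint payload for a non-negative integer: big-endian
// bytes, with a leading zero byte when the high bit would otherwise be set.
func mpint(n *big.Int) []byte {
	b := n.Bytes()
	if len(b) > 0 && b[0]&0x80 != 0 {
		b = append([]byte{0}, b...)
	}
	return b
}

// throwawayRSAKeyLine generates a fresh RSA key of the given size and returns
// only its public half as an "ssh-rsa <base64>" line. The private key is
// dropped, so nobody can log in with the result.
func throwawayRSAKeyLine(bits int) (string, error) {
	key, err := rsa.GenerateKey(rand.Reader, bits)
	if err != nil {
		return "", err
	}
	var blob bytes.Buffer
	writeString(&blob, []byte("ssh-rsa"))
	writeString(&blob, mpint(big.NewInt(int64(key.PublicKey.E))))
	writeString(&blob, mpint(key.PublicKey.N))
	return "ssh-rsa " + base64.StdEncoding.EncodeToString(blob.Bytes()), nil
}

func main() {
	line, err := throwawayRSAKeyLine(2048)
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

Because the private key is never kept, a key like this can only serve as a placeholder for an API that insists on RSA; tests that need to log in must get their real key some other way, such as through the Ignition config.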

@bgilbert
Contributor

Fix in #2555. @HuijingHei, could you test it?

@dustymabe
Member

> Well, since Azure only supports RSA, we'd need to generate a throwaway RSA key, same as we're doing on AWS.

oh fun.. didn't realize they only supported RSA

@HuijingHei
Member Author

> Fix in #2555. @HuijingHei, could you test it?

I ran the kola test rpmostree.status on Azure with the patch, and it passed; see #2555 (comment).

bgilbert added a commit that referenced this issue Nov 11, 2021
Azure only supports RSA keys in instance metadata, but we currently don't
use those because our SSH library only supports the old RSA SHA-1
handshake which is no longer supported by the OS.  Until the SSH library
is fixed, always pass a fake RSA SSH key to the instance, and disable
tests that would try to use it.

Fixes #2445.
dustymabe pushed a commit to dustymabe/coreos-assembler that referenced this issue Dec 2, 2022
Azure only supports RSA keys in instance metadata, but we currently don't
use those because our SSH library only supports the old RSA SHA-1
handshake which is no longer supported by the OS.  Until the SSH library
is fixed, always pass a fake RSA SSH key to the instance, and disable
tests that would try to use it.

Fixes coreos#2445.

(cherry picked from commit a2066db)
jlebon pushed a commit that referenced this issue Dec 2, 2022
Azure only supports RSA keys in instance metadata, but we currently don't
use those because our SSH library only supports the old RSA SHA-1
handshake which is no longer supported by the OS.  Until the SSH library
is fixed, always pass a fake RSA SSH key to the instance, and disable
tests that would try to use it.

Fixes #2445.

(cherry picked from commit a2066db)