Intermittent issue with -Install switch #157

Closed
joshooaj opened this issue Jul 11, 2019 · 14 comments

@joshooaj
Contributor

After generating a certificate, I need to install it into Cert:\LocalMachine\My, which the -Install flag does appear to do without fail.

However, for my application, I need to then do a netsh http add|update sslcert which frequently fails with the following message:

SSL Certificate update failed, Error: 1312 A specified logon session does not exist. It may already have been terminated.

After further investigation, I noticed that when this error happens, certutil -Store My returns the following information about the certificate

================ Certificate 9 ================
Serial Number: fa4e3fbab3233f766ae01309389d34d8e075
Issuer: CN=Let's Encrypt Authority X3, O=Let's Encrypt, C=US
 NotBefore: 6/21/2019 1:14 PM
 NotAfter: 9/19/2019 1:14 PM
Subject: CN=host.example.com
Non-root Certificate
Cert Hash(sha1): 6304015fd0f78ed5ae56d3850940b62055e65b3b
  Key Container =
  Simple container name: DefaultKeys
  Unique container name: fc1e3851f429ea606d6ff1e01a5229f1_aac1ff8f-089f-4a50-9ad1-45c6050250aa
  ERROR: Container name inconsistent:
  Provider = Microsoft Enhanced Cryptographic Provider v1.0
  Simple container name: DefaultKeys
  Unique container name: fc1e3851f429ea606d6ff1e01a5229f1_aac1ff8f-089f-4a50-9ad1-45c6050250aa
  ERROR: Container name inconsistent:
Encryption test passed

And when I do not get an error while using netsh http add|update sslcert, the output of certutil -Store My looks like this:

================ Certificate 9 ================
Serial Number: fa4e3fbab3233f766ae01309389d34d8e075
Issuer: CN=Let's Encrypt Authority X3, O=Let's Encrypt, C=US
 NotBefore: 6/21/2019 1:14 PM
 NotAfter: 9/19/2019 1:14 PM
Subject: CN=host.example.com
Non-root Certificate
Cert Hash(sha1): cf66345d299c0dd136738174c9cfa7c436e13f0c
  Key Container = {DFDA1D87-B29F-494C-A795-FC35865E03B5}
  Unique container name: 2746465e656820b8ac8ce725f06ff086_aac1ff8f-089f-4a50-9ad1-45c6050250aa
  Provider = Microsoft Enhanced Cryptographic Provider v1.0
Encryption test passed

I'm not sure if this has anything to do with Posh-ACME or Let's Encrypt, but I've seen the behavior myself on two computers and seen reports of this behavior intermittently from customers. I don't know enough at the moment about the "Key Container" property in the output of certutil to know where to start looking.
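For anyone who wants to detect this condition from a script, here is a minimal sketch. `Test-CertUtilContainerError` is a hypothetical helper name, not part of Posh-ACME, and it assumes the only reliable marker is the `ERROR: Container name inconsistent` line seen in the failing output above.

```powershell
# Hypothetical helper: returns $true when certutil output contains the
# "Container name inconsistent" marker shown in the failing case above.
function Test-CertUtilContainerError {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]      # certutil output may contain blank lines
        [string[]]$CertUtilOutput
    )

    # -match against an array returns the matching lines; an empty result casts to $false.
    return [bool]($CertUtilOutput -match 'ERROR: Container name inconsistent')
}
```

Usage would look something like `Test-CertUtilContainerError -CertUtilOutput (certutil -Store My $thumbprint)`, where `$thumbprint` is assumed to hold the thumbprint of the installed certificate.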

@joshooaj
Contributor Author

I am able to delete the "corrupt" cert from the cert store in Windows, then manually install the fullchain.pfx file, and following this, the netsh http add|update sslcert command completes without error.

@joshooaj
Contributor Author

In an attempt to reproduce this I enumerated through all the staging and production certs on my system, called Import-PfxCertInternal, and checked certutil for an error many times using the script below. So far I haven't been able to reproduce it.

The Key Container property shown in the output of certutil -Store My doesn't look like something that is set by Posh-ACME. I'm not sure whether it's something .NET/Windows generates when you import the certificate, or whether it's based on data in the certificate received from LE.

$badPfxs = @()
$goodPfxs = @()
foreach ($server in @("LE_STAGE", "LE_PROD")) {
    Write-Host "Selecting LE server $server"
    Set-PAServer $server
    foreach ($cert in Get-PACertificate -List) {
        Write-Host "Testing certificate $($cert.Subject)"
        Import-PfxCertInternal -PfxFile $cert.PfxFullChain -PfxPass $cert.PfxPass
        $certUtilResult = [string]::Join([System.Environment]::NewLine, (certutil -Store My $cert.Thumbprint))
        
        if ($certUtilResult.Contains("ERROR")) {
            $certUtilResult
            $badPfxs += "BAD: $server - $($cert.Subject)"
        } else {
            $goodPfxs += "GOOD: $server - $($cert.Subject)"
        }

        Get-ChildItem Cert:\LocalMachine\My | ? Thumbprint -eq $cert.Thumbprint | Remove-Item
    }
}

$badPfxs
$goodPfxs

@rmbolger
Owner

This sounds vaguely similar to issue #122, which was ultimately caused by a (still unresolved) bug in PowerShell's native Import-PfxCertificate cmdlet. At the time, I solved it by using the lower-level .NET cert store libraries everywhere instead of only on OSes that didn't have Import-PfxCertificate.

When was the bad cert generated? The fix for that bug was included in version 3.2.1, which was released back in March. So if the cert was generated before you upgraded to 3.2.1, it might just be a legacy problem.

@rmbolger rmbolger self-assigned this Jul 12, 2019
@rmbolger rmbolger added the bug Something isn't working label Jul 12, 2019
@joshooaj
Contributor Author

It would have been within the last month; I think I only discovered Posh-ACME around two weeks ago.

I'll peek at that bug and see if I should revise my testing. I've been exclusively testing Import-PfxCertInternal, on Windows 10, Server 2016 and Server 2019.

@joshooaj
Contributor Author

I did some more testing and, although I'm unsure why, it looks like Windows always gets it wrong the first time, while the second New-PACertificate -Install invocation does not result in an incorrectly imported certificate.

Until I can understand this better, I have updated my script to always do a New-PACertificate -Install against the LE_STAGE server, then remove that certificate from the cert store, switch to LE_PROD, and re-run the same New-PACertificate command. This seems to be a reliable workaround for now. It definitely looks to be a Windows bug.

@rmbolger
Owner

That's super weird. I'll see if I can reproduce this on my end as well. I'm on vacation for the next couple weeks. But I might have some time to poke around with it.

@rmbolger
Owner

So I tried to do some testing on my end, but I can't get this to reproduce so far. Out of curiosity, what version of .NET do you currently have on the affected machine? Since .NET is ultimately what's being used to import the cert, it's feasible minor version differences could have different behavior that I'm unaware of.

The easiest way to check is via the registry. Here's a one-liner to get the build number.

(gp "HKLM:SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full").Release

The laptop I'm testing on is Win10 (1903) with build 461814 which according to the table in the link is .NET 4.7.2.
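To turn that Release number into a version name without consulting the table by hand, a lookup along these lines could be used. This is only a sketch: `ConvertTo-DotNetVersionName` is a hypothetical helper name, and the thresholds are the minimum Release values from Microsoft's .NET Framework version table as best I recall them, so verify them against the current documentation before relying on this.

```powershell
# Hypothetical helper: maps a .NET Framework 4.x "Release" registry value
# to a version name using minimum-release thresholds (assumed values).
function ConvertTo-DotNetVersionName {
    param(
        [Parameter(Mandatory)]
        [int]$Release
    )

    # Ordered highest first so the first threshold that fits wins.
    $table = @(
        @{ Min = 533320; Name = '4.8.1' }
        @{ Min = 528040; Name = '4.8' }
        @{ Min = 461808; Name = '4.7.2' }
        @{ Min = 461308; Name = '4.7.1' }
        @{ Min = 460798; Name = '4.7' }
        @{ Min = 394802; Name = '4.6.2' }
        @{ Min = 394254; Name = '4.6.1' }
        @{ Min = 393295; Name = '4.6' }
        @{ Min = 379893; Name = '4.5.2' }
        @{ Min = 378675; Name = '4.5.1' }
        @{ Min = 378389; Name = '4.5' }
    )

    foreach ($row in $table) {
        if ($Release -ge $row.Min) { return $row.Name }
    }
    return 'earlier than 4.5'
}
```

Passing the value returned by the registry one-liner above, for example `ConvertTo-DotNetVersionName -Release 461814`, should give the same 4.7.2 answer as the table.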

@joshooaj
Contributor Author

I've been testing on fresh Server 2019 Standard version 1809 VMs with .NET build 461814, but ever since I modified my script to run through the process using a LE_STAGE cert first, the LE_PROD certificate works the first time, every time. I probably won't change that behavior, since this will be used mostly by customers who are likely to be inexperienced with PowerShell / ACME / certificates in general, so it's better to get it wrong on LE_STAGE first.

If I get some time this evening I'll do another run through on a fresh VM without doing an import of the LE_STAGE cert first and get a screen cap.

@rmbolger
Owner

Hey @jhendricks123. I've got a test VM that I can reliably reproduce this on now. As far as I can tell, there's nothing special about getting a cert from LE_STAGE first. It seems any certificate you import on a fresh install via .NET, like the code is doing, will suffer from this problem. I'm still looking into how the .NET code differs from importing via the GUI. But short term, here's a workaround that should be a lot simpler than getting a staging cert first.

You're basically just letting New-PACertificate install it as usual, then deleting it and re-installing it with Install-PACertificate (new in 3.6.0).

$cert = New-PACertificate <arguments as usual> -Install
if ($cert) {
    Remove-Item "Cert:\LocalMachine\My\$($cert.Thumbprint)"
    $cert | Install-PACertificate
}

@rmbolger
Owner

Ok, I've got what seems to be a reliable workaround in place now. Though I'm still not exactly sure why it works. Currently if you don't specify the -FriendlyName parameter, the module defaults to an empty string which gets passed down and used for various cert properties including the private key alias in the generated PFX file. This empty private key alias is ultimately what Windows seems to be freaking out over on a fresh install. I have no clue why it doesn't freak out when you just try it again using the same file and the same code.

In any case, if you want to try it out for yourself, just include a non-empty -FriendlyName parameter and see if that fixes your problem. I'm going to add a fix that stops defaulting to an empty string and instead defaults to the primary domain value.
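The defaulting behavior described above can be sketched as follows. `Resolve-FriendlyName` is a hypothetical helper for illustration only and is not the module's actual implementation; it simply assumes that the first domain in the list is the primary one.

```powershell
# Hypothetical helper: falls back to the primary (first) domain when no
# non-empty friendly name is supplied, so the PFX never gets an empty alias.
function Resolve-FriendlyName {
    param(
        [Parameter(Mandatory)]
        [string[]]$Domain,

        [string]$FriendlyName
    )

    if ([string]::IsNullOrWhiteSpace($FriendlyName)) {
        return $Domain[0]
    }
    return $FriendlyName
}

# Example use with the workaround (commented out because it requests a real cert):
# $domains = 'host.example.com'
# New-PACertificate $domains -FriendlyName (Resolve-FriendlyName -Domain $domains) -Install
```

The point of the sketch is only that the value passed to -FriendlyName is never an empty string; any non-empty name should serve equally well for testing the workaround.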

@webprofusion-chrisc
Contributor

Sorry I didn't notice this bug before. Just as a random aside, Certify The Web had a similar issue, and the solution for me was to double install the cert. I took it to be some sort of ephemeral key issue to do with how Windows stores private keys as part of user profiles (and we are installing as System).

@rmbolger
Owner

Doh! The explicit -FriendlyName workaround may not actually work yet. In the process of implementing the change to a non-empty default FriendlyName I think I stumbled on a bug where the FriendlyName value never actually makes it to the exported PFX files.

@rmbolger
Owner

False alarm. While there is technically a bug in the FriendlyName processing code, it turns out that it works anyway despite the bug because PowerShell variable scoping is weird. So @jhendricks123, you should definitely try the -FriendlyName workaround when you get a chance and see if it fixes the issue with installing a cert on a fresh system.

rmbolger added a commit that referenced this issue Sep 12, 2019
…ty string to work around Windows edge case issues with cert installing (#157)
@rmbolger
Owner

Version 3.7.0 now defaults the FriendlyName to the cert's primary name to avoid this bug. I'm going to close the issue for now. But let me know if you still run into it after upgrading.
