-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTPS requests fail on Ubuntu 24.04 Noble ARM32 due to bundled certs "NotTimeValid" error #101444
Comments
I can take a look (and finally have an excuse to take that Raspberry Pi out of the box!) |
Firstly
This will take a bit of additional time to troubleshoot, whatever is going on is not immediately obvious (to me) |
@vcsjones |
Yep. I am trying to narrow it down to the smallest possible place to start looking for ABI mismatches. It does reproduce with Right now I believe this code to be problematic under the ABI change. runtime/src/native/libs/System.Security.Cryptography.Native/openssl.c Lines 953 to 968 in 5661009
We always set the verification time when building the chain: Line 120 in 5661009
Next steps is I need to figure out how to get a debugger going here. |
@AndyAyersMS On ARM32, if time_t is a 64-bit value then it gets passed using the stack pointer offset convention rather than as a register, right? That suggests to me that we need to build/distribute an entirely different runtime for arm32+time_t64 and arm32+time_t32. Unless our call to X509_VERIFY_PARAM_set_time is literally the only place in all of .NET that a time_t is passed by value, in which case I think we can use the uint64_t timeOrBigger = 0xFFFFFFFFFFFFFFFF;
time((time_t*)&timeOrBigger);
if (timeOrBigger >> 32 == 0xFFFFFFFF)
{
// Either a: the top 32 bits didn't get written, or
// b: it's approximately the year 128,000,002,000 and all times start with that pattern.
//
// Assume a, and be happy
// normal mktime, normal call.
}
else
{
uint64_t (*mktime_ptr)(struct tm*) = (whatever the cast looks like)(void*)mktime;
uint64_t timeval = mktime_ptr(&tm);
void (*hacky)(X509_VERIFY_PARAM*, uint64_t) = (cast)(void*)X509_VERIFY_PARAM_set_time;
hacky(verifyParams, timeval);
} But I highly suspect that this isn't the only thing broken right now... just the one people see first. (And I'm concerned that it means any function called that wants a time_t is going to be reading arbitrary stack state from the caller, which seems... bad). |
runtime/src/coreclr/pal/src/file/filetime.cpp Lines 229 to 265 in ce0ae49
gmtime /gmtime_r , which means (assuming this file is used on arm32) it's getting the 32 bits we assigned, and the 32 bits declared after (before?) it on the stack. Which might be our conveniently zero value for UnixSystemTime , or might be part of FileTime , or might just be garbage.
|
Basically we are sending un-initialized data to
We return a time_t just like |
Seems likely, but I would have to read up on the arm32 ABI. Perhaps @jakobbotsch knows as he's been working on this area recently... |
ARM32 never passes parameters by reference, they are always passed by value. What is the definition of |
I don't think the problem is the (ARM32) ABI per-se but the fact whether Effectively, if the system libraries start to use a different |
When we built the arm32 portable build: typedef int32_t time_t; The particular flavors of Linux: typedef int64_t time_t; When thinking this might just be limited to us calling into OpenSSL, I was trying to figure out whether we could just force the data to work out; so I was trying to figure out the calling convention for a 64-bit integer being passed around; and I waffled a bit between whether it was "adjacent registers" or the magical thing with relative stack (which might be the same as pass by reference); particularly when the If I understand the Godbolt output correctly, it does look like we could get away with always treating time_t as int64_t on arm32 for calling It looks like an But, since this isn't the only place that we use |
Another answer, of course, could be that we don't support the portable build on Linux arm32-with-64-bit-time... but that's a business decision vs a technical one, so less interesting to discuss (and not our decision to make). |
Not saying you should do that... but setting (I'd much rather see OpenSSL fixed to follow the same convention as glibc, ie. export two versions of |
@richlander We might be looking at a new RID right now, and/or a hole in our OS support story. https://github.com/dotnet/core/blob/main/release-notes/8.0/supported-os.md says that .NET 8 is supported on Ubunutu 20.04+ on ARM32; but the 32-vs-64-bit time thing breaks it (X509Chain is the most visible break since it happens during Because this has created another place where binaries are incompatible, I think that we a) need a new portable build, and b) need to define a RID for it (e.g. With less rambling:
I picked you because you've done the most recent work on RIDs (to my knowledge). Not sure who gets to (has to) decide between A and B, though. (But maybe you know who 😄) |
I don't think that's the case. As I previously mentioned, glibc normally exports different versions of the functions and only headers switch the (Essentially, it's a similar model to A/W/TCHAR in Windows API where the consumer decides what they want; the problem only affects shared libraries other than glibc that don't use the glibc model, and that use |
I was writing up an explanation of why I still think fstat is broken, but I think it ends up working out because we don't call fstat, we call fstat64 (assuming it is defined already for our Linux/arm32 build). The problem with fstat is that the caller allocates the fstat64 doesn't have that problem since it pre-promoted to time64_t. As for other (actual) time_t-based calls... unless I'm happy to be shown what I'm missing, though. Like, maybe Ubuntu built glibc with 32-bit time but then set the headers to 64-bit time afterwards... (though I think that'd still make |
Glibc builds support both 32-bit and 64-bit |
There's not a single Then the header magic is driven by |
It is worth noting that Debian is also making the same transition for the next release (anticipated sometime next year) so going with Plan B would mean that our forward-looking arm32 story is musl-only (already transitioned). |
Based on the assumption that distros will be changing to 64-bit times on arm32 to address the year 2038 problem, a future version of .NET could choose for portable arm32 to swap to support only 64-bit time_t OpenSSL.
Definitely a hack. I don't know how .NET could determine what Besides the already mentioned X509_VERIFY_PARAM_set_time there is: X509_cmp_time. I didn't find others. |
Stating the obvious ... the RID graph was never intended to describe this level of differentiation / ABI difference. We'll be very hesitant to go down that path and to produce builds of any of our packages for both Arm32 RIDs. |
It should be possible to fix both of these calls by backporting #100461 that @filipnavara suggested above. It is a hack that depends on Arm ABI details, but anything else would be a hack too - just more complicated one. |
You're referring to: #101444 (comment). This is for the call to
|
@richlander If we're declaring that we don't support t64 on arm32 with .NET 8, should we just close this issue now? |
The OP was about .NET 8 and 9. It was effectively saying "we have nothing to ship for Ubuntu 24.04 Arm32". That's still relevant. We're also close to being able to resolve this. |
Failed in: runtime-coreclr libraries-jitstress 20240514.1 Failed tests:
Error message:
Stack trace:
|
Failed in: runtime-coreclr libraries-jitstress 20240515.1 Failed tests:
Error message:
Stack trace:
|
This is now happening on Debian 12 images as well and causing some test failures. |
Reading further on this, looks like this was caused by #102059 Based on dotnet/core#9285, I assume we want to remove Debian 12 ARM32 from our CI and have only Debian 13 for main? cc: @sbomer, @richlander |
I'm reverting that change while I investigate the failures. We are running tests with a product that has |
We believe that there is a path to supporting Debian 12 and 13 (but not 11). Theory tells us that this should work. Practice is turning out to be challenging. There is something we don't yet understanding. We'll be sure to document our findings since this little "3 hour tour" will certainly be useful for someone else in future. |
…102324) This reverts commit 0129837. Reverts #102059 We need to address the failures mentioned in #101444 (comment) before upgrading to a Y2038-compatible glibc.
Failed in: runtime-coreclr libraries-pgo 20240516.1 Failed tests:
Error message:
Stack trace:
|
This updates our linux arm32 build to build against a more recent glibc that supports _TIME_BITS (which we set to 64). Since openssl may be using either 32-bit or 64-bit time_t, this includes detection logic to determine which case we are in, and avoid passing time values that don't fit in 32 bits to openssl. The arm build image is updated to the latest version of the images added in dotnet/dotnet-buildtools-prereqs-docker#1037. The helix test images are updated to debian images added in dotnet/dotnet-buildtools-prereqs-docker#1041. Additional context: Additional context: Reintroduces the fix for Y2038 support on arm32 linux (#102059), which was reverted due to problems running against openssl built with _TIME_BITS=32. Fixes #101444 (both the originally reported issue, and the test failures mentioned in #101444 (comment)). Supports: #91826
This updates our linux arm32 build to build against a more recent glibc that supports _TIME_BITS (which we set to 64). Since openssl may be using either 32-bit or 64-bit time_t, this includes detection logic to determine which case we are in, and avoid passing time values that don't fit in 32 bits to openssl. The arm build image is updated to the latest version of the images added in dotnet/dotnet-buildtools-prereqs-docker#1037. The helix test images are updated to debian images added in dotnet/dotnet-buildtools-prereqs-docker#1041. Additional context: Additional context: Reintroduces the fix for Y2038 support on arm32 linux (dotnet#102059), which was reverted due to problems running against openssl built with _TIME_BITS=32. Fixes dotnet#101444 (both the originally reported issue, and the test failures mentioned in dotnet#101444 (comment)). Supports: dotnet#91826
)" (dotnet#102324) This reverts commit 0129837. Reverts dotnet#102059 We need to address the failures mentioned in dotnet#101444 (comment) before upgrading to a Y2038-compatible glibc.
This updates our linux arm32 build to build against a more recent glibc that supports _TIME_BITS (which we set to 64). Since openssl may be using either 32-bit or 64-bit time_t, this includes detection logic to determine which case we are in, and avoid passing time values that don't fit in 32 bits to openssl. The arm build image is updated to the latest version of the images added in dotnet/dotnet-buildtools-prereqs-docker#1037. The helix test images are updated to debian images added in dotnet/dotnet-buildtools-prereqs-docker#1041. Additional context: Additional context: Reintroduces the fix for Y2038 support on arm32 linux (dotnet#102059), which was reverted due to problems running against openssl built with _TIME_BITS=32. Fixes dotnet#101444 (both the originally reported issue, and the test failures mentioned in dotnet#101444 (comment)). Supports: dotnet#91826
Update: A final resolution to this issue has been posted at dotnet/core#9285.
Description
HTTPS requests from .NET are failing on Ubuntu 24.04 Noble on ARM32 due to bundled certs "NotTimeValid" errors.
I believe this may be because 24.04 has migrated to 64-bit time. The OpenSSL package has changed from
libssl3
tolibssl3t64
.Reproduction Steps
dotnet new console
dotnet add package System.Text.Json
I created this repro Dockerfile. It may or may not work since I had issues with it on my AMD64 dev machine. No easy way for me to test this currently outside of our official .NET Container image builds. With some trial and error you can probably get the issue to repro on a real arm32 machine using Docker. https://gist.github.com/lbussell/52e0ac904108d238d0e511f8b6ec89e1
Expected behavior
The .NET CLI should successfully access NuGet.org.
Actual behavior
From our PR validation in .NET Docker: https://dev.azure.com/dnceng-public/public/_build/results?buildId=652911&view=logs&j=7bc65791-3246-5ca2-874f-59d2e579cf6b&t=08651ed6-ba3f-5f8a-52df-50083ac157c2&l=942
Regression?
No response
Known Workarounds
No response
Configuration
Other information
Discovered in dotnet/dotnet-docker#5241
Known build error template
Build Information
Build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=674655&view=results
Build error leg or test failing:
Pull request:
Error Message
Fill the error message using step by step known issues guidance.
Known issue validation
Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=674655
Error message validated:
[arm32v7 The remote certificate is invalid because of errors in the certificate chain: NotTimeValid
]Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 5/16/2024 3:23:33 PM UTC
Report
Summary
The text was updated successfully, but these errors were encountered: