-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix compilation without HAVE_GSS_KRB5_CRED_NO_CI_FLAGS_X #70723
Conversation
Tagging subscribers to this area: @dotnet/ncl, @vcsjones Issue DetailsMostly theoretical fix since the only supported system that doesn't have Ref: #70447 (comment)
|
can you try it @EgorBo? Waiting for CI may not be worth as it passed on the original change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I don't think it's a stale CMake cache - it's a WSL I've just created (Ubuntu 20.04) and followed all the steps to configure it for dotnet/runtime Might be important that I use cross-compilation to build dotnet/runtime for linux-arm64 there (rootfs stuff) |
I've just verified the fix and it works for me, thanks! |
Test failures seems relevant @filipnavara. There is some weird SIGSEGV.... |
I have seen that same crash locally on the same test suites. It happened even with clean checkout but due to some of the tests not being run on debug builds (eg. the whole System.Net.Http.FunctionalTests) it could have easily been lurking here for a while. When run locally the crash happened only on certain runs. I'll look at the core dumps. |
Hitting this issue in the CI - https://dev.azure.com/dnceng/9ee6d478-d288-47f7-aacc-f6e6d082ae6d/_apis/build/builds/1823854/logs/506.
|
what OS @AaronRobinsonMSFT? The original PR had all legs clean. |
Agreed. Unsure why it is failing. This is on CoreCLR Product Build Linux x86 checked. See #70685 |
should be |
This is going to be fun, the crash is inside The managed stack leading to it:
|
I'm somewhat convinced that the crashing code is null dereference here: I have no hard evidence though. Unless I find something quick I am fine with dropping the whole HAVE_GSS_KRB5_CRED_NO_CI_FLAGS_X support instead of trying to fix it. |
We can roll-back the original change to get to stable point so we have more time to investigate. It seems to be specific to particular versions and that should help. |
I think that's reasonable at this point. |
Hmm, both of the failures were on RedHat 7 machines. Those are the only ones that should be missing |
I can look as well @filipnavara. maybe NULL after the load/lookup fails? |
My expectation is that the build is done on a machine that has For this build case we should get There may be some flaw in that logic that I am not seeing, or RedHat is shipping something that I didn't expect. The dumps showed the crash actually happened inside the |
Is there any way to figure out what exact RHEL version is running in the RedHat.7.Amd64.Open pool? (and ideally the krb5-libs package version) /cc @MattGal |
Also, in PR #70447 one of the intermediate runs already failed on the Red Hat pool with SIGSEGV. https://dev.azure.com/dnceng/public/_build/results?buildId=1814435&view=ms.vss-test-web.build-test-results-tab&runId=48210036&resultId=199331&paneView=dotnet-dnceng.dnceng-anon-build-release-tasks.helix-anon-test-information-tab |
Same in #70740 (comment) |
If you have access to the dnceng/internal project you can check out the image definition yamls (I don't think you do?) but currently it's running an image derived from this Azure gallery image:
I can try to get the /etc/os-release value off this image if it's important too. |
I will try to grab repro machine so we can do more testing... |
Thanks, that's good enough for me! |
I created a machine from the Azure image and downloaded the crashing Helix payload on it. I run it couple of times and it didn't crash. I added the UPD: Eventually it crashed on 7th run or so with the NTLM SSP installed, so I guess it may repro on this image.
Correct, I don't have access to that repository. |
I managed to get somewhat more useful stack trace:
|
Reduced repro of the SIGSEGV crash on RHEL7 w/ NTLM SSP: #include <gssapi/gssapi_ext.h>
#include <gssapi/gssapi_krb5.h>
#include <assert.h>
#include <stdio.h>
static char gss_spnego_oid_value[] = "\x2b\x06\x01\x05\x05\x02"; // Binary representation of SPNEGO Oid (RFC 4178)
static gss_OID_desc gss_mech_spnego_OID_desc = {.length = 6, .elements = gss_spnego_oid_value};
int main()
{
uint32_t majorStatus;
uint32_t minorStatus;
gss_cred_id_t credHandle = NULL;
gss_name_t name = NULL;
gss_buffer_desc inputNameBuffer = {.length = 4, .value = "user"};
majorStatus = gss_import_name(&minorStatus, &inputNameBuffer, GSS_C_NT_USER_NAME, &name);
assert(majorStatus == GSS_S_COMPLETE);
gss_OID_desc gss_mech_OID_desc = gss_mech_spnego_OID_desc;
gss_OID_set_desc gss_mech_OID_set_desc = {.count = 1, .elements = &gss_mech_OID_desc};
gss_OID_set desiredMech = &gss_mech_OID_set_desc;
gss_buffer_desc passwordBuffer = {.length = 8, .value = "password"};
majorStatus = gss_acquire_cred_with_password(
&minorStatus, name, &passwordBuffer, 0, desiredMech, GSS_C_INITIATE, &credHandle, NULL, NULL);
assert(majorStatus == GSS_S_COMPLETE);
gss_buffer_desc emptyBuffer = GSS_C_EMPTY_BUFFER;
majorStatus = gss_set_cred_option(&minorStatus, &credHandle, GSS_KRB5_CRED_NO_CI_FLAGS_X, &emptyBuffer);
assert(majorStatus == GSS_S_UNAVAILABLE || majorStatus == GSS_S_COMPLETE);
return 0;
} |
Funnily enough, the bug is still present even in latest krb5 packages. It just happens that the structures align in such a way that it doesn't crash. |
Mostly theoretical fix since the only supported system that doesn't have
GSS_KRB5_CRED_NO_CI_FLAGS_X
is RHEL 7. It could still happen with stale CMake cache though.Ref: #70447 (comment)