Skip to content

Add bootstrapping build mechanism and enable for SourceBuild #114285

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 28 commits into
base: main
Choose a base branch
from

Conversation

jkoritzinsky
Copy link
Member

This PR removes the "StageOne/StageTwo" build concept and replaces it with a bootstrapping-build model with the following design:

  • The bootstrap subset builds the following components:
    • Microsoft.NETCore.App ref assemblies
    • Microsoft.NETCore.App runtime assemblies
    • (as part of the above) an updated RID graph with any additional RIDs specified by the build command
    • apphost
    • (if NativeAOT is supported): NativeAOT runtime libraries, managed and native
    • (if NativeAOT is not supported): Singlefilehost

It then lays out the produced files under the artifacts/bootstrap directory. This way we can use the bootstrapped artifacts but still do a "full" product build when we build the product if the artifacts/bin and artifacts/obj directories are deleted.

Two new options are introduced for build.sh:

  • --use-bootstrap: Use the artifacts in artifacts/bootstrap as the "live" artifacts for any bootstrapped components
  • --bootstrap: Build the bootstrap subset, delete artifacts/bin and artifacts/obj, build with --use-bootstrap.

The following projects are "bootstrapped components":

  • crossgen2_publish
  • ILCompiler_publish

I've validated the following scenarios:

  • Building a non-portable build NativeAOTs crossgen2_publish and ILCompiler_publish with the live assets for the OutputRID.
  • Building for FreeBSD produces single-file-host-based crossgen2_publish and ILCompiler_publish with FreeBSD-targeting ELF executables.

This experience is enabled by default for SourceBuild legs that don't force the Mono runtime (as those legs introduce new RIDs).

Windows build script support and support in the VMR to toggle this (do we even want this togglable?) is still TODO.

Built on top of #113765

Copy link
Contributor

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

@tmds
Copy link
Member

tmds commented Apr 5, 2025

This PR removes the "StageOne/StageTwo" build

Does source build build these stages separately? I thought they were a CI concept.

a bootstrapping-build

@jkoritzinsky @am11 @ViktorHofer

I can understand HOW this works. I would have also liked to understand WHY it does it this way.

Currently, I only understand the need to make the SDK aware OutputRID is a valid targetable rid (that supports R2R, ILC, ...) by updating the rid graph.

Built on top of #113765

I have a 1 week break coming up and I have to finish some things before that in the next week. I imagined having enough time to run #113765 through some scenarios. I assume this PR is not as ready, so I'm not going to spend time validating it yet. I'm not going to validate #113765 separately either.

I don't think this should block you. This validation was mostly for (my) rest assurance.

@am11
Copy link
Member

am11 commented Apr 5, 2025

WHY

To bootstrap components which require LKG packages during publishing. Those packages include; apphost, singlefilehost, crossgen2 and ilc.

Does source build build these stages separately?

Currently SB/VMR support is broken on community platforms which support full coreclr and require publishing components filipnavara/dotnet-riscv#4

I thought they were a CI concept.

This is the only way to build runtime on community platforms. In .NET 9 and earlier, we were using live-built publishing components during the runtime build. 3f28b1a changed it to use LKG packages. We don't publish apphost, crossgen2 and ilc packages for community platforms, so LKG option does not work.

It would be nice if we also start publishing those packages to nuget feed so non-bootstrap option also works for community platforms; either on the same nuget feed where official platform packages are published or a separate one with, say, -community suffix in its name.

@tmds
Copy link
Member

tmds commented Apr 5, 2025

which require LKG packages during publishing

What is LKG?

Currently SB/VMR support is broken on community platforms which support full coreclr and require publishing components filipnavara/dotnet-riscv#4

This is a restore failure. It should not be trying to restore these packages, should it? These are meant to be produced by the build.

I can see how a bootstrap build can be used to address a restore issue by first doing a build that builds packages are causing the restore problem.

Afaik the runtime build already does some building of "live" artifacts which are then consumed by the build. I don't know why the same mechanism can not be used?

Does the bootstrap build depend on these packages existing for NETCoreSdkRuntimeIdentifier?

This is the only way to build runtime on community platforms. In .NET 9 and earlier, we were using live-built publishing components during the runtime build. 3f28b1a changed it to use LKG packages.

It seems these changes are drive by the change to use LKG packages for .NET 10.
I had missed that completely.

@am11
Copy link
Member

am11 commented Apr 5, 2025

What is LKG?

Last known good configuration. The fixed/immutable versions of runtime packs tied to the SDK version from global.json (they are restored from nuget feed).

This is a restore failure. It should not be trying to restore these packages, should it?

Some components perform dotnet publish during the build, which looks for runtime packs of target. Since community platforms runtime packs are not published to nuget feed, we can't completely build everything.

These are meant to be produced by the build.

Yes, it is the ordering issue and it can be fixed solely in Subsets.porps. We tried three approaches in #105004 and ended up with TwoStage build. The first approach was fixing the order, it worked but it turned out to be twisted and hard to maintain.

Afaik the runtime build already does some building of "live" artifacts which are then consumed by the build. I don't know why the same mechanism can not be used?

It still exists, but since the mainstream platforms are now moved to LKG plan, it is mainly used for community platforms.

Does the bootstrap build depend on these packages existing for NETCoreSdkRuntimeIdentifier?

CrossBuild uses the host runner (dotnet.exe, crossgen2.exe, ilc.exe with Rid=NETCoreSdkRuntimeIdentifier) with runtime pack for the target.

In TwoStage mode, the first stage builds everything which doesn't require publishing, that includes building (constituents of) runtime packs (without the actual nupkg). The second stage builds the publishable components using runtime packs (constituents) built by stage 1; skips restore of package and instead use deterministic paths under artifacts/bin dir to find the runtime pack.

This PR is replacing the TwoStage mode with bootstrap which a bit different. It will build everything-except-publishable-stuff with --use-bootstrap (just like stage 1 of TwoStage mode), but there is no counterpart for stage 2, i.e. 'only' build the publishable stuff skipped by stage 1. With stage 1 / --use-bootstrap, it builds enough parts of the repo that can run most of the src/tests (except host, R2R and AOT). To get the full build with published components, it is providing --bootsrap which first builds with --use-bootstrap, moves artifacts/bin to artifacts/bootstrap, cleans up artifacts/ (modulo artifacts/bootstrap) and rebuilds everything without skipping anything.

It has increased the build time by 56% compared to TwoStage:
https://github.com/am11/CrossRepoCITesting/actions/runs/14276225403
image

but I think it is more cleaner and less error prone: TwoStage required extra care that we are not rebuilding any msbuild target (which needs eviction of cache under obj/ otherwise the target gets skipped resulting in strange unpredictable outcome).

@tmds
Copy link
Member

tmds commented Apr 6, 2025

Those packages include; apphost, singlefilehost, crossgen2 and ilc.

It sounds like these packages (for OutputRID) have become prebuilds of runtime, and the bootstrap build is responsible for building them when they don't exist?

Since community platforms runtime packs are not published to nuget feed, we can't completely build everything.

This includes all (non-Microsoft) non-portable builds?

It has increased the build time by 56% compared to TwoStage:

I find this a large increase.

since the mainstream platforms are now moved to LKG plan

What is the benefit of moving to LKG instead of doing it the ".NET 9 way"?

@jkoritzinsky
Copy link
Member Author

jkoritzinsky commented Apr 6, 2025

We moved to the LKG way as it provides a significantly better developer experience due to slowdowns with debug builds. Our plan when moving to LKG was to always move to an optional bootstrap phase to remove LKG usage without significantly tanking the performance of the dev innerloop by more than this bootstrap build currently does.

@jkotas
Copy link
Member

jkotas commented Apr 6, 2025

.NET 9 way

.NET 9 way was a mix of live and LKG builds. We standardized on using LKG builds by default since it makes dev interloop more efficient.

@am11
Copy link
Member

am11 commented Apr 6, 2025

Does this include all non-Microsoft, non-portable builds?

I'm not sure what the top-level VMR flow will be, but there'll likely be a --bootstrap-runtime option. It's only needed if the required runtime pack version isn’t already available. e.g., the first fedora.42-riscv64 build would need it, but later ones wouldn't if a matching version exists.

I find this a large increase.

Yes, but this is more predictable than TwoStage, we don't reuse obj/ dir to avoid all kinds of caching issues.

@tmds
Copy link
Member

tmds commented Apr 7, 2025

better developer experience

Thank you for the discussion, and helping me understand that this is the driver for these changes!

I'm not sure what the top-level VMR flow will be, but there'll likely be a --bootstrap-runtime option. It's only needed if the required runtime pack version isn’t already available. e.g., the first fedora.42-riscv64 build would need it, but later ones wouldn't if a matching version exists.

We should avoid requiring an option to get a working build.

It sounds like the relative build time increase on the complete vmr build is going to be small, so using it in some cases where it may not be needed shouldn't be too much of a problem. It may even provide extra consistency and make it simpler to understand when bootstrap is used and when not.

Let's try to make the vmr build "smart" enough to know when it should use the bootstrap without user interaction.

Comment on lines 48 to 49
<!-- Source-build will use non-portable RIDs. TO build for these non-portable RID scenarios, we must do a boostrapped build. -->
<InnerBuildArgs Condition="'$(DotNetBuildSourceOnly) == 'true' and '$(DotNetBuildUseMonoRuntime)' != 'true'">$(InnerBuildArgs) --bootstrap</InnerBuildArgs>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<!-- Source-build will use non-portable RIDs. TO build for these non-portable RID scenarios, we must do a boostrapped build. -->
<InnerBuildArgs Condition="'$(DotNetBuildSourceOnly) == 'true' and '$(DotNetBuildUseMonoRuntime)' != 'true'">$(InnerBuildArgs) --bootstrap</InnerBuildArgs>
<!-- Source-build will use non-portable RIDs. To build for these non-portable RID scenarios, we must do a boostrapped build. -->
<InnerBuildArgs Condition="'$(DotNetBuildSourceOnly)' == 'true' and '$(DotNetBuildUseMonoRuntime)' != 'true'">$(InnerBuildArgs) --bootstrap</InnerBuildArgs>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a previously built SDK with ilc/cg2 runtime packs available, should it skip the --bootstrap? I think https://github.com/dotnet/dotnet/blob/main/prep-source-build.sh determines that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bootstrap is skippable in that case. We are considering using --bootstrap for VMR as well as SB (still needs to be decided), so this condition is subject to change.

Copy link
Member

@am11 am11 Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are considering using --bootstrap for VMR as well as SB (still needs to be decided), so this condition is subject to change.

We can infer it in SB so distro maintainer don't need to determine when to add the bootstrap argument and when to skip it. But if it's not feasible to infer, then explicit argument is fine.

prep-source-build.sh has --no-bootstrap arg which has different meaning, so it maybe better to qualify it --bootstrap-runtime (if we end up with explicit option).

@arrowd
Copy link
Contributor

arrowd commented Apr 10, 2025

Just a note from a downstream packager - the ability to produce a bootstrap and then use it to build the full version of the software package is just great. Right now in FreeBSD Ports I'm using a (most likely) hacky way to produce the bootstrap by passing /p:PortableBuild=true to build.sh that I dug in project files.

So, having a high-level switches like --bootstrap and --use-bootstrap makes life much easier for us packagers.

It's used to add OutputRID in the graph if the parent can't be detected. -->
<InnerBuildArgs>$(InnerBuildArgs) /p:AdditionalRuntimeIdentifierParent=$(BaseOS)</InnerBuildArgs>
<!-- Source-build will use non-portable RIDs. To build for these non-portable RID scenarios, we must do a boostrapped build. -->
<InnerBuildArgs Condition="'$(DotNetBuildSourceOnly)' == 'true' and '$(DotNetBuildUseMonoRuntime)' != 'true'">$(InnerBuildArgs) --bootstrap</InnerBuildArgs>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'$(DotNetBuildSourceOnly)' == 'true'

This is good for me. I wonder if TargetRid != NETCoreSdkRuntimeIdentifier might be an alternative?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use that by itself because that would cause this to trigger on some (but not all) Unified Build legs.

For Unified Build, we'll want this on either for 100% of the verticals or never.

@tmds
Copy link
Member

tmds commented Apr 10, 2025

@arrowd can you create an issue in https://github.com/dotnet/source-build and describe your workflow?

I hope for the specific change made in this PR, we can avoid extra flags for distro maintainers and let the build take care of it on its own.


<PropertyGroup Condition="'$(UseLocalNativeAotPack)' == 'true'">
<IlcToolsPath>$(CoreCLRCrossILCompilerDir)</IlcToolsPath>
<SysRoot Condition="('$(CrossBuild)' == 'true' or '$(BuildArchitecture)' != '$(TargetArchitecture)') and '$(HostOS)' != 'windows'">$(ROOTFS_DIR)</SysRoot>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toolAot.targets line 38 sets SysRoot (basically overwrites it), do we need it here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its needed here to support NativeAOTing the libraries tests (where this block came from).

There's likely more places we can clean up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If libraries tests include toolAot.targets, then technically it shouldn't need it. I agree that there are some mysteries around toolAot. e.g. src/tests requires these again on community platforms for some reason: https://github.com/am11/CrossRepoCITesting/blob/f327dfbd066ca551df71cda479935b9355c24342/.github/workflows/linux-riscv64-aot.yml#L48

SysRoot=$ROOTFS_DIR LinkerFlavor=lld /runtime/src/tests/build.sh \
  -riscv64 -cross -nativeaot -tree:nativeaot -Checked -p:TasksConfiguration=Debug \
  -p:LibrariesConfiguration=Debug -keepnativesymbols -p:UseLocalAppHostPack=true \
  -p:UseNativeAotForComponents=true -p:TestNativeAot=true

Would be nice if we don't have to specify UseNativeAotForComponents as property and LinkerFlavor & SysRoot as environment variables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to leave messing with the src/tests tree for later as that's always a mess and this PR already has a lot of nuance.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you!

support in the VMR to toggle this (do we even want this togglable?)

I think we want VMR to have it turned on by default.

Having a VMR switch to turn it off is nice to have. I think we can live without it until it proves to be necessary.

@jkoritzinsky
Copy link
Member Author

I'll add support for Windows and turn it on by default in VMR builds as well in a follow-up PR (so as to avoid messing with Preview 4).

Copy link
Member

@am11 am11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks a lot! 👍

@tmds
Copy link
Member

tmds commented Apr 22, 2025

I tried to build this locally and it seemed to fail for me. Unfortunately I got sidetracked and didn't have time to continue. I'm going to try again tomorrow and share my findings.

@tmds
Copy link
Member

tmds commented Apr 23, 2025

When I build using the following:

./build.sh /p:NoPgoOptimize=true --portablebuild false /p:DotNetBuildFromSource=true /p:DotNetBuildSourceOnly=true /p:DotNetBuildTests=true --cmakeargs -DCLR_CMAKE_USE_SYSTEM_BROTLI=true --cmakeargs -DCLR_CMAKE_USE_SYSTEM_ZLIB=true --runtimeconfiguration Release --librariesConfiguration Debug --subset clr+libs+host+packs+libs.tests --test --bootstrap

The bootstrap build works, but the successive build fails to recognize the non-portable rid.

Build log:

...
Build succeeded.
    0 Warning(s)
    0 Error(s)

Time Elapsed 00:05:12.48
Cleaning up artifacts from bootstrap build...
  Determining projects to restore...
  All projects are up-to-date for restore.
  Determining projects to restore...
/home/tester/workdir/runtime/.dotnet/sdk/10.0.100-preview.3.25201.16/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.FrameworkReferenceResolution.targets(108,5): error NETSDK1083: The specified RuntimeIdentifier 'rhel.8.10-x64' is not recognized. See https://aka.ms/netsdk1083 for more information. [/home/tester/workdir/runtime/Build.proj]
/home/tester/workdir/runtime/.dotnet/sdk/10.0.100-preview.3.25201.16/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.FrameworkReferenceResolution.targets(108,5): error NETSDK1203: Ahead-of-time compilation is not supported for the target runtime identifier 'rhel.8.10-x64'. [/home/tester/workdir/runtime/Build.proj]
/home/tester/workdir/runtime/.dotnet/sdk/10.0.100-preview.3.25201.16/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.FrameworkReferenceResolution.targets(178,5): error NETSDK1083: The specified RuntimeIdentifier 'rhel.8.10-x64' is not recognized. See https://aka.ms/netsdk1083 for more information. [/home/tester/workdir/runtime/Build.proj]
/home/tester/workdir/runtime/.dotnet/sdk/10.0.100-preview.3.25201.16/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.FrameworkReferenceResolution.targets(178,5): error NETSDK1083: The specified RuntimeIdentifier 'rhel.8.10-x64' is not recognized. See https://aka.ms/netsdk1083 for more information. [/home/tester/workdir/runtime/Build.proj]
/home/tester/workdir/runtime/.dotnet/sdk/10.0.100-preview.3.25201.16/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.FrameworkReferenceResolution.targets(178,5): error NETSDK1083: The specified RuntimeIdentifier 'rhel.8.10-x64' is not recognized. See https://aka.ms/netsdk1083 for more information. [/home/tester/workdir/runtime/Build.proj]
/home/tester/workdir/runtime/.dotnet/sdk/10.0.100-preview.3.25201.16/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.FrameworkReferenceResolution.targets(178,5): error NETSDK1083: The specified RuntimeIdentifier 'rhel.8.10-x64' is not recognized. See https://aka.ms/netsdk1083 for more information. [/home/tester/workdir/runtime/Build.proj]
/home/tester/workdir/runtime/.dotnet/sdk/10.0.100-preview.3.25201.16/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.FrameworkReferenceResolution.targets(108,5): error NETSDK1083: The specified RuntimeIdentifier 'rhel.8.10-x64' is not recognized. See https://aka.ms/netsdk1083 for more information. [/home/tester/workdir/runtime/Build.proj]
/home/tester/workdir/runtime/.dotnet/sdk/10.0.100-preview.3.25201.16/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.FrameworkReferenceResolution.targets(108,5): error NETSDK1203: Ahead-of-time compilation is not supported for the target runtime identifier 'rhel.8.10-x64'. [/home/tester/workdir/runtime/Build.proj]
/home/tester/workdir/runtime/.dotnet/sdk/10.0.100-preview.3.25201.16/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.FrameworkReferenceResolution.targets(108,5): error NETSDK1083: The specified RuntimeIdentifier 'rhel.8.10-x64' is not recognized. See https://aka.ms/netsdk1083 for more information. [/home/tester/workdir/runtime/Build.proj]
/home/tester/workdir/runtime/.dotnet/sdk/10.0.100-preview.3.25201.16/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.Sdk.FrameworkReferenceResolution.targets(108,5): error NETSDK1203: Ahead-of-time compilation is not supported for the target runtime identifier 'rhel.8.10-x64'. [/home/tester/workdir/runtime/Build.proj]
  Nothing to do. None of the projects specified contain packages to restore.

Build FAILED.

@jkoritzinsky
Copy link
Member Author

I'll take a look. The restore binlog should have which project is failing to restore and the RID graph that's being used.

@ViktorHofer
Copy link
Member

I only saw the last two comments but in case it helps, I filed #114069 recently. Maybe related.

@jkoritzinsky
Copy link
Member Author

@tmds fixed failures from your usage scenario

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Infrastructure source-build Issues relating to dotnet/source-build
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

6 participants