
Evaluate and determine feasibility of Linux Distro partner VMR branches and Microsoft VMR branches being identical #3738


Closed
mmitche opened this issue Nov 15, 2023 · 17 comments

Comments

@mmitche
Member

mmitche commented Nov 15, 2023

One of the foundational rules of Unified Build is that our distro partners should be building the same commits as Microsoft builds. See https://github.com/dotnet/arcade/blob/main/Documentation/UnifiedBuild/Foundational-Concepts.md#public-open-source-net-releases-must-be-buildable-by-net-distro-maintainers-from-a-single-commit-in-the-upstream-repository. This does not mean that Microsoft builds the exact same product as distro partners. Build behavior may be different (restoring different packages, inclusion of non-OSS components in the Windows Build, etc.). It simply means that the ideal is that the commits we build will be the same as what our distro partners can build. There should be no permanent delta between the public upstream branches and what Microsoft builds.

It is time to test this in practice. Our distro partners have more restrictive rules about what may appear in the VMR with respect to binaries and source code licenses. As a result, we currently cloak files that do not abide by their rules but are not required to build the source-built Linux .NET products. Some of these cloaked files will be required to release the Microsoft-built product. My general sense is that most cloaking is a "bad" code smell w.r.t. licensing and checked-in binaries, and most of it should be removable.

We need to evaluate all the currently cloaked items and determine the following:

  • Is the cloaked item required for Microsoft's build?
  • If it is not, then the cloaking entry can stay, but should the file be in the source repo at all? What purpose does it serve?
  • If the item is required for Microsoft's build, then what will it take to get rid of the cloaking? Example resolutions might include:
    • Re-evaluation of a license file (is the license problematic? Is it an installation artifact, or talking about actual source code)
    • Moving non-OSS compatible licensed code to another repo (should the code have been in the repo in the first place?)
    • Removal of binaries via restoring them from a package. Test assets commonly in this category.
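
The questions above can be read as a small decision procedure. The following is only a sketch: the field names, the `Action` values, and the order in which resolutions are tried are all hypothetical and not part of any existing VMR tooling.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    KEEP_CLOAK = auto()            # not needed by Microsoft's build; ask whether the file belongs in the repo at all
    RESTORE_FROM_PACKAGE = auto()  # checked-in binary (e.g. a test asset) restored from a package instead
    REEVALUATE_LICENSE = auto()    # license has not yet been reviewed
    MOVE_TO_OTHER_REPO = autoo = None  # placeholder replaced below


# Rebuild the enum cleanly (kept explicit so each member carries its comment).
class Action(Enum):
    KEEP_CLOAK = auto()
    RESTORE_FROM_PACKAGE = auto()
    REEVALUATE_LICENSE = auto()
    MOVE_TO_OTHER_REPO = auto()    # reviewed, still not OSS-compatible


@dataclass
class CloakedItem:
    path: str
    required_for_microsoft_build: bool
    is_binary: bool
    license_reviewed: bool


def triage(item: CloakedItem) -> Action:
    """Map one cloaked item to a candidate resolution (assumed ordering)."""
    if not item.required_for_microsoft_build:
        return Action.KEEP_CLOAK
    if item.is_binary:
        return Action.RESTORE_FROM_PACKAGE
    if not item.license_reviewed:
        return Action.REEVALUATE_LICENSE
    return Action.MOVE_TO_OTHER_REPO
```

In practice each item would likely need human review; the function only records which question applies first.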

T-shirt size: L. There is quite a bit of unknown here. This may be a largely mechanical, easy change, or it may uncover some serious issues.


@mthalman
Member

/cc @dotnet/distro-maintainers

@mmitche
Member Author

mmitche commented Jan 3, 2024

T-shirt size: L or XL

@omajid
Member

omajid commented Jan 18, 2024

Removal of binaries via restoring them from a package. Test assets commonly in this category.

With my distro-owner hat on:

Generally, we frown on binaries because we don't know what license they are under and how to rebuild them. It's a very slippery slope to including effectively non-open source stuff in a repo.

However, if we know the binaries are under an open source license, and there are clearly documented steps on how to generate/modify the binary (either by hand or by a specialized tool), then including the binary itself isn't necessarily disallowed.

@mmitche
Member Author

mmitche commented Jan 18, 2024

Thanks @omajid.

Wanted to summarize the discussion we had on this today and the proposed solution:

  • As Microsoft seeks to build its product from the VMR, we do not believe it is feasible in the near or mid term to remove all the binaries, non-compliant licenses for tests, etc. from existing repos that contribute to the VMR.
  • Many of the problematic components aren't even interesting from the standpoint of a Linux distro maintainer (winforms, wpf, windowsdesktop).
  • As an alternative approach:
    • We will change the VMR sync such that it will sync into a branch without cloaking. Let's call this the VMR-All branch.
    • We will provide a distro-maintainer-centric branch which will comply with your requirements like the existing branch does.
    • This branch will be a subset of the VMR-All branch, related in git history. Let's call this a VMR-SourceOnly branch. It will have the cloaked set of files removed.
    • These cloaked files may also include repos that are not interesting for distro maintainers, e.g. winforms.
    • PRs that update a VMR-All branch will also test updates to the corresponding VMR-SourceOnly branch.
    • Upon checkin, the VMR-SourceOnly branch will be immediately updated (no lag) via a merge process.
    • On release day, we will create at least two tags and two releases. One tag will point to the VMR-All content for the release, and the other to the source only compatible content.
    • To improve UX, we may restrict source-only builds to source-only branches, and vice versa.
    • The naming of these releases and tags is TBD, suggestions would be welcome.
    • At some point in the future we may seek to unify the sources, but for now, no.
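
As a rough illustration of the subset relationship described above, the sketch below derives a VMR-SourceOnly file list from a VMR-All file list by applying cloak patterns. The glob-style pattern syntax, the function names, and the example paths are assumptions for illustration; they are not the actual VMR sync implementation.

```python
from fnmatch import fnmatchcase


def is_cloaked(path: str, cloak_patterns: list[str]) -> bool:
    """True if the repo-relative path matches any cloak pattern.

    fnmatchcase's '*' also matches '/', so 'src/winforms/*' covers the
    whole subtree.
    """
    return any(fnmatchcase(path, pattern) for pattern in cloak_patterns)


def source_only_subset(all_files: list[str], cloak_patterns: list[str]) -> list[str]:
    """Return the files that would remain on a VMR-SourceOnly branch."""
    return [f for f in all_files if not is_cloaked(f, cloak_patterns)]


def is_subset_branch(source_only_files: list[str], all_files: list[str]) -> bool:
    """Check the invariant that VMR-SourceOnly never adds files beyond VMR-All."""
    return set(source_only_files) <= set(all_files)
```

A PR check on a VMR-All branch could, under these assumptions, recompute the subset and verify the invariant before the corresponding VMR-SourceOnly branch is updated.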

/cc @dotnet/distro-maintainers.

Unless there are further objections to this plan, I'll open issues for the work.

/cc @jkotas @dotnet/source-build-internal

@ViktorHofer
Member

ViktorHofer commented Jan 19, 2024

Thanks @omajid for the clarification. I assume that a pre-build process that removes the unwanted binaries and licenses from the source tree wouldn't be an acceptable solution here?

@mateusrodrigues
Member

I believe that unifying sources is fine, as long as we also have a list of non-compliant files contained in the source tree. That way, we can script the removal of these files before packing the source into a source package for upload.

On our side (Canonical), the process of downloading and packaging the source from the VMR is already scripted, so adding this extra step in between would be a no-brainer.

@ViktorHofer is that what you meant?
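
A minimal sketch of the kind of scripted step described above: delete every path named in a list of non-compliant files, then pack the remaining tree. The function names and the list format (one repo-relative path per entry) are hypothetical; this is not Canonical's actual packaging script.

```python
import tarfile
from pathlib import Path


def strip_listed_files(tree: Path, listed: list[str]) -> list[str]:
    """Delete each listed repo-relative file under tree; return those removed."""
    root = tree.resolve()
    removed = []
    for rel in listed:
        target = (root / rel).resolve()
        # Skip entries that would escape the source tree (e.g. '../x').
        if root not in target.parents:
            continue
        if target.is_file():
            target.unlink()
            removed.append(rel)
    return removed


def pack_source(tree: Path, archive: Path) -> None:
    """Write the (already stripped) tree into a gzip-compressed tarball."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(tree, arcname=tree.name)
```

Calling `strip_listed_files` before `pack_source` means the uploaded source package never contains the listed files.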

@MichaelSimons
Member

MichaelSimons commented Jan 19, 2024

I assume that a pre-build process that removes the unwanted binaries and licenses from the source tree wouldn't be an acceptable solution here?

I don't think this is the bar we want. We want to treat source-building as something that has first class support rather than something that requires a number of hoops to jump through.

We've heard from a set of distro maintainers that they want to take a sha or tarball and be able to feed it directly into their build systems without having to do any preprocessing. Yes some can but others can't.

@jkotas
Member

jkotas commented Jan 19, 2024

Yes some can but others can't

What are the distros that can't and that have a strict requirement on licenses of all source files in the repo?

@omajid
Member

omajid commented Jan 19, 2024

We've heard from a set of distro maintainers that they want to take a sha or tarball and be able to feed it directly into their build systems without having to do any preprocessing.

Yes, this is the goal for me, if possible. That minimizes the chances of an attack on a developer (eg, me) or their machine compromising .NET itself.

I assume that a pre-build process that removes the unwanted binaries and licenses from the source tree wouldn't be an acceptable solution here?

Removing disallowed licenses during a pre-build step is not workable for us. We would be sharing/uploading the original source-code archive publicly, and that may already conflict with the license and its terms.

However, if all the licenses are permissible, we can delete (open source licensed) binaries in a pre-build step without any issues.

@omajid
Member

omajid commented Jan 19, 2024

What are the distros that can't and that have a strict requirement on licenses of all source files in the repo?

Debian, Fedora and RHEL, to name a few.

Edit: Unless, of course, the maintainers for these distros want to sanitize things and generate the sanitized source-code archive themselves. That's something I have done in the past but am really trying to move away from.

@jkotas
Member

jkotas commented Jan 19, 2024

Right, I have seen a number of these pre-processing scripts. For example, https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/-/blob/snapshot/debian/orig-tar.sh looks like a preprocessing script for the LLVM project for Debian.

I understand that no preprocessing is easier for distro maintainers, but I do not think there are hard rules about it from what I have seen.

@mmitche
Member Author

mmitche commented Jan 19, 2024

It sounds to me like maintaining a subset branch is a more complete solution, at a similar cost. We need to maintain an exclusion list in either solution, it's just a matter of where and when it is applied. Having the subset branch also allows for last minute fixes to be checked into that branch without changing the original. This is valuable if source-only build specific issues are discovered after we have locked down and completed the Microsoft build of .NET. Uncommon, and it shouldn't happen, but it does happen.

If we don't have that branch, we need to maintain a patching mechanism.

@ViktorHofer
Member

If we don't have that branch, we need to maintain a patching mechanism

That's only true when we treat branches as the HEAD of what to build instead of SHAs. It sounds like we want to pin/tag a SHA and then hand that off to distro partners. I wonder why we couldn't just add commits to the release branch? Shouldn't the Microsoft build also be driven based off a SHA and not a branch?

We need to maintain an exclusion list in either solution, it's just a matter of where and when it is applied

The downside that I see with that is that we effectively split our product in two which requires separate tags, GH releases and impacts the developer workflow (i.e. can I build the SB product from the non SB branch?). Those are just concerns that I want to raise before we start going down one path vs the other.

@mmitche
Member Author

mmitche commented Jan 19, 2024

That's only true when we treat branches as the HEAD of what to build instead of SHAs. It sounds like we want to pin/tag a SHA and then hand that off to distro partners. I wonder why we couldn't just add commits to the release branch? Shouldn't the Microsoft build also be driven based off a SHA and not a branch?

That's fair. It wouldn't be prohibitive of additional changes to not have two branches.

The downside that I see with that is that we effectively split our product in two which requires separate tags, GH releases and impacts the developer workflow (i.e. can I build the SB product from the non SB branch?). Those are just concerns that I want to raise before we start going down one path vs the other.

Also fair. There are definitely dev UX issues to work out on both sides. Building SB from the main branch should work, though not vice-versa. If it doesn't work, then what we're also saying is that if you have a single branch, then you should be deleting files from the dev's machine prior to build when building SB. That has generally poor UX. In addition, the prep script has a greater burden on distro maintainers.

@mmitche
Member Author

mmitche commented Jan 19, 2024

Epic: #3989

@mmitche
Member Author

mmitche commented Jan 19, 2024

I'm closing this work now in favor of the new epic.

Projects
Status: Done

7 participants