Skip to content

Finally formalise our defacto line-ending policy #86318

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 17, 2024
Merged

Conversation

ldrumm
Copy link
Contributor

@ldrumm ldrumm commented Mar 22, 2024

Historically, we've not automatically enforced how git tracks line endings, but there are many, many commits that "undo" unintended CRLFs getting into history.

git log --pretty=oneline --grep=CRLF shows nearly 100 commits involving reverts of CRLF making its way into the index and then history. As far as I can tell, there are none the other way round except for specific cases like .bat files or tests for parsers that need to accept such sequences.

Of note, one of the earliest of those listed in that output is:

commit 9795860
Author: NAKAMURA Takumi geek4civic@gmail.com
Date: Thu Feb 3 11:41:27 2011 +0000

  cmake/*: Add svn:eol-style=native and fix CRLF.

  llvm-svn: 124793

...which introduced such a defacto policy for subversion.

With old versions of git, it's been a bit of a crapshoot whether enforcing storing line endings in the history will upset checkouts on machines where such line endings are the norm. Indeed many users have enforced that git checks out the working copy according to a global or per-user config via core crlf, or core autocrlf.

However, for ~8 years now[1], however, git has supported the ability to "do as the Romans do" on checkout, but internally store subsets of text files with line-endings specified via a system of patterns in the gitattributes file. Since we now have this ability, and we've been specifying attributes for various binary files, I think it makes sense to rid us of all that work converting things "back", and just let git handle the local checkout. Thus the new toplevel policy here is

  • text=auto

In simple terms this means "unless otherwise specified, convert all files considered "text" files to LF in the project history, but checkout them out as expected on the local machine. What is "expected on the local machine" is dependent on configuration and default.

For those files in the repository that do need CRLF endings, I've adopted a policy of eol=crlf which means that git will store them in history with LF, but regardless of user config, they'll be checked out in tree with CRLF.

Finally, existing files have been "corrected" in history via git add --renormalize .

[1]: git 2.10 was released with fixed support for fine-grained line-ending tracking that respects user-config and repo policy. This can be considered the point at which git will respect both the user's local working tree preference and the history as specified by the maintainers. See
https://github.com/git/git/blob/master/Documentation/RelNotes/2.10.0.txt#L248 for the release note.

@llvmbot llvmbot added the HLSL HLSL Language Support label Oct 17, 2024
Line ending policies were changed in the parent, dccebdd. To make
it easier to resolve downstream merge conflicts after line-ending
policies are adjusted this is a separate whitespace-only commit. If you
have merge conflicts as a result, you can simply `git add --renormalize
-u && git merge --continue` or `git add --renormalize -u && git rebase
--continue` - depending on your workflow.
@ldrumm ldrumm merged commit 9d98acb into llvm:main Oct 17, 2024
3 of 5 checks passed
@ldrumm ldrumm deleted the text=auto branch October 18, 2024 10:03
@mstorsjo
Copy link
Member

mstorsjo commented Oct 18, 2024

This breaks a number of tests on Windows.

Previously, to have tests working on Windows, one would do git config --global core.autocrlf false or similar, before checking out llvm-project - a number of test input files need to be in LF form to work. This was brought up earlier already by @llvm-beanz in #86318 (comment).

Now after this change, due to the added .gitattributes which overrides the core.autocrlf setting, these files get checked out with CRLF newlines (as the native form for the platform).

Based on the comment in the .gitattributes file, it seems like this is both known and intentional behaviour:

# Checkout as native, commit as LF except in specific circumstances
* text=auto

While it was already stated that blanket checkouts with CRLF will fail.

Additionally, the old mechanism of getting working newlines in the files is suddenly broken.


To make things worse, you won't notice this thing if you're updating an existing workdir - files that aren't touched aren't rewritten. (To trigger re-checkout of files to get .gitattributes applied, one can do something like git rm -r subdir && git reset && git checkout subdir.) So I guess most buildbots will keep chugging along fine, until someone pushes changes that touch those files. If running with a fresh checkout of llvm-project, this has a much bigger impact.


For compiler-rt tests, a handful of the profile tests depend on the right line endings - I can push a local .gitattributes file to fix that (see mstorsjo@99bec81).

But the Clang, clang-tools-extra and LLVM testsuites have many tests that are broken - there are around 80 tests failing due to this; sorting that out is a much bigger task that I'm not volunteering to take on right now.

CC @AaronBallman as the majority of those failing tests are in Clang.

Can we revert this until we figure out these bits? Do we really want checkouts to default to having the majority of files with CRLF on Windows? I would expect that most people working on Windows don't really want this (this wasn't the case so far anyway)? And other than that, we do need to tag the files that rely on being in LF form so that things work on Windows with either setting.

@ldrumm
Copy link
Contributor Author

ldrumm commented Oct 18, 2024

There are a couple of things to unpack here

a number of test input files need to be in LF form to work

Which ones? Either there's a bug in a parser somewhere, or I missed some test files. In either case I'd like to fix the issue. I watched the buildbots quite closely last night and only noticed failures in ARM frame lowering - which isn't this, I think.

Now after this change, due to the added .gitattributes which overrides the core.autocrlf setting, these files get checked out with CRLF newlines (as the native form for the platform).

It's my understanding that text=auto does not override core.autocrlf. As far as I can tell from the documentation it honours the user's configuration for core.eol in combination with core.autocrlf - from git config --help:

core.eol
Sets the line ending type to use in the working directory for files that are marked as text (either by having the text attribute set, or by having text=auto and
Git auto-detecting the contents as text). Alternatives are lf, crlf and native, which uses the platform’s native line ending. The default value is native. See
gitattributes(5) for more information on end-of-line conversion. Note that this value is ignored if core.autocrlf is set to true or input.

and

core.autocrlf
Setting this variable to "true" is the same as setting the text attribute to "auto" on all files and core.eol to "crlf". Set to true if you want to have CRLF line
endings in your working directory and the repository has LF line endings. This variable can be set to input, in which case no output conversion is performed.

Can we revert this until we figure out these bits

Sure, but like I said, I'm happy to fix these broken cases. The old configuration was broken, but not in a controllable way, so I think it's reasonable to fix up the broken tests and move forward. Perhaps we also need clearer documentation?

@AaronBallman
Copy link
Collaborator

This seems to have broken precommit CI on Windows: https://buildkite.com/llvm-project/github-pull-requests/builds/111165#0192a01b-d3ac-44ad-abff-e53ac4a206ab all of the failures look related to line endings, and I noticed that I got a ton of command line messages of the form:

warning: in the working copy of 'clang/include/clang/Basic/Attr.td', LF will be replaced by CRLF the next time Git touches it

Can we revert this until we figure out these bits?

Yes, please.

Do we really want checkouts to default to having the majority of files with CRLF on Windows? I would expect that most people working on Windows don't really want this (this wasn't the case so far anyway)?

It certainly came as a surprise to me; my editors handle LF line endings just fine on Windows; it was very jarring to get hundreds of warnings from git that all seemed unactionable.

@mstorsjo
Copy link
Member

a number of test input files need to be in LF form to work

Which ones?

A whole bunch of them. @AaronBallman's link to https://buildkite.com/llvm-project/github-pull-requests/builds/111165#0192a01b-d3ac-44ad-abff-e53ac4a206ab shows mostly what I saw. If including clang-tools-extra and llvm in the set of tests to run, I have failures in the following tests as well:

   Clang Tools :: clang-move/move-used-helper-decls.cpp
   LLVM :: MC/ELF/warn-newline-in-escaped-string.s
   LLVM :: TableGen/x86-fold-tables.td
   LLVM :: tools/llvm-rc/tag-html.test
   lit :: shtest-shell.py

Either there's a bug in a parser somewhere,

In the vast majority of cases, it's not a bug in a parser, but a test that relies on the exact contents of the source files. E.g. for tools/llvm-rc/tag-html.test I would guess that the issue is that the test takes a text file (which now has variable line endings) and includes it in a binary resource file, and checks the output to match bitwise the expected output.

In my nightly run of compiler-rt tests https://github.com/mstorsjo/llvm-mingw/actions/runs/11395494568/job/31717433112, I had the following failures:

Failed Tests (5):
  Profile-x86_64 :: instrprof-gcov-exceptions.test
  Profile-x86_64 :: instrprof-gcov-multiple-bbs-single-line.test
  Profile-x86_64 :: instrprof-gcov-one-line-function.test
  Profile-x86_64 :: instrprof-gcov-switch.test
  Profile-x86_64 :: instrprof-gcov-two-objects.test

I think the issue here may be something around testing the exact amount of whitespace somewhere or so.

Mostly "brittle" tests that don't expect the source files to vary.

or I missed some test files. In either case I'd like to fix the issue. I watched the buildbots quite closely last night and only noticed failures in ARM frame lowering - which isn't this, I think.

Did you do a test run on Windows? Even on Linux, I guess it should be possible to check out the code, forcing Git to prefer checking out text files as CRLF, so you could experience the same amount of fallout.

Now after this change, due to the added .gitattributes which overrides the core.autocrlf setting, these files get checked out with CRLF newlines (as the native form for the platform).

It's my understanding that text=auto does not override core.autocrlf. As far as I can tell from the documentation it honours the user's configuration for core.eol in combination with core.autocrlf - from git config --help:

This doesn't match my experience.

See mstorsjo@inspect-newlines for a test github actions job that shows checking out the repo on Windows. First we check out an old branch, with default settings - we get CRLF. Then we set core.autocrlf=false and retry the above, we get a file with LF newlines. Then we check out the new version with gitattributes, and we notice how we now suddenly are getting CRLF again, despite our setting. See https://github.com/mstorsjo/llvm-project/actions/runs/11407224268 for the actual run log.

Feel free to play around with such an action, to come up with a better way of checking it out while still getting LF newlines - but this is the way I have been doing it (which is scripted in a number of places), and I would expect that many others have the same setup.

@mstorsjo
Copy link
Member

This seems to have broken precommit CI on Windows: https://buildkite.com/llvm-project/github-pull-requests/builds/111165#0192a01b-d3ac-44ad-abff-e53ac4a206ab all of the failures look related to line endings, and I noticed that I got a ton of command line messages of the form:

warning: in the working copy of 'clang/include/clang/Basic/Attr.td', LF will be replaced by CRLF the next time Git touches it

Right, that probably relates to the odd situation where files are checked out in one form, but the git attributes indicate that they should be treated differently. Getting git to rewrite the files on disk to match it is kinda messy actually - the best way I've found today is git rm -r . && git reset && git checkout ..

Do we really want checkouts to default to having the majority of files with CRLF on Windows? I would expect that most people working on Windows don't really want this (this wasn't the case so far anyway)?

It certainly came as a surprise to me; my editors handle LF line endings just fine on Windows;

Exactly - people haven't had a problem with having LF newlines so far. The main problem probably has been to set up Git to get it in that form.

I obviously don't mind fixing as many tests as possible to pass with any form of newlines, but having all checkouts of the repo actually have files in identical form, rather than in any fuzzy form, feels like a feature to me.

So keeping the gitattributes, but in a form where it dictates checking out files in LF form (which has required setting core.autocrlf=false so far) would IMO be preferrable.

Would we get there by setting the wildcard rule in .gitattrubtes to * text eof=lf, or something along those lines?

@aeubanks
Copy link
Contributor

I believe Chrome is also seeing many test failures due to this (https://crbug.com/374115887), although I haven't yet confirmed it's due to this specific commit.

@AaronBallman
Copy link
Collaborator

I just had someone in my office hours also running into problems from this commit. I went to revert the changes myself and I cannot because of merge conflicts... due to line endings.

@ldrumm -- can you revert these changes ASAP? They're causing significant problems in practice, so best to get us back to green rather than fix forward. Thanks!

@ldrumm
Copy link
Contributor Author

ldrumm commented Oct 18, 2024 via email

@zmodem
Copy link
Collaborator

zmodem commented Oct 21, 2024

Thanks for reverting!

I must have missed this PR originally. I oppose letting Git change any line endings. It always ends like this.

@ldrumm
Copy link
Contributor Author

ldrumm commented Oct 22, 2024

It's my understanding that text=auto does not override core.autocrlf. As far as I can tell from the documentation it honours the user's configuration for core.eol in combination with core.autocrlf - from git config --help:

This doesn't match my experience.

I think this is due to a subtly of config. Setting core.autocrlf to false doesn't actually do anything since it's the default. In that case git is still in "no opinion" mode - which means it stores the input line endings and does no conversion.

However, once eol=auto is set in a .gitattributes, it forces git to use the configured eol config:

static int text_eol_is_crlf(void)
{
	if (auto_crlf == AUTO_CRLF_TRUE)
		return 1;
	else if (auto_crlf == AUTO_CRLF_INPUT)
		return 0;
	if (core_eol == EOL_CRLF)
		return 1;
	if (core_eol == EOL_UNSET && EOL_NATIVE == EOL_CRLF)
		return 1;
	return 0;
}

static enum eol output_eol(enum convert_crlf_action crlf_action)
{
	switch (crlf_action) {
	case CRLF_BINARY:
		return EOL_UNSET;
	case CRLF_TEXT_CRLF:
		return EOL_CRLF;
	case CRLF_TEXT_INPUT:
		return EOL_LF;
	case CRLF_UNDEFINED:
	case CRLF_AUTO_CRLF:
		return EOL_CRLF;
	case CRLF_AUTO_INPUT:
		return EOL_LF;
	case CRLF_TEXT:
	case CRLF_AUTO:
		/* fall through */
		return text_eol_is_crlf() ? EOL_CRLF : EOL_LF;
	}
	warning(_("illegal crlf_action %d"), (int)crlf_action);
	return core_eol;
}

output_eol is the git function that decides to write out a file with CRLF or LF endings

Notice that now we hit the CRLF_AUTO case so it's text_eol_is_crlf() ? EOL_CRLF : EOL_LF;

text_eol_is_crlf() checks against core.autocrlf. Since you've stated that it's false, then it then it checks core.eol.

So if I've read that correctly, core.autocrlf=false is a red herring and you should really set core.eol=lf if you want git to use lf on windows.

Would we get there by setting the wildcard rule in .gitattrubtes to * text eof=lf, or something along those lines?

This patch is about respecting local config, which is the exact opposite of that suggestion. It would be a way to solve the line-ending issue by fiat, not by co-operation, so I'm against it on principle. To be clear I very much don't like CRLF, but I also very much don't like it when someone forces me to use wrong-handed tools. Windows users would be forced to use wrong handed tools if we force line-endings one way or another

@ldrumm
Copy link
Contributor Author

ldrumm commented Oct 22, 2024

It always ends like this.

Ends Like what?
As far as I can see all this has done has exposed latent bugs in our testing and in clang's parser

@mstorsjo
Copy link
Member

It's my understanding that text=auto does not override core.autocrlf. As far as I can tell from the documentation it honours the user's configuration for core.eol in combination with core.autocrlf - from git config --help:

This doesn't match my experience.

I think this is due to a subtly of config. Setting core.autocrlf to false doesn't actually do anything since it's the default.

It most definitely does something. Please have another look at mstorsjo@inspect-newlines and the output of the log at https://github.com/mstorsjo/llvm-project/actions/runs/11407224268/job/31742748818.

First we do a checkout without setting anything. We get files with CRLF. Then we set core.autocrlf to false, which is the common way of dealing with this, then we do another checkout, and we get files with LF.

Do you dispute the above?

@mstorsjo
Copy link
Member

So if I've read that correctly, core.autocrlf=false is a red herring and you should really set core.eol=lf if you want git to use lf on windows.

That perhaps may be the case, but all common docs and all common practices around this revolve around setting core.autocrlf, not setting core.eol. Before this discussion, I have never seen a guide recommending setting core.eol.

While so far, all docs related to this say that it is core.autocrlf one should set: https://github.com/llvm/llvm-project/blob/llvmorg-19.1.2/clang/www/get_started.html#L151-L154 and https://github.com/llvm/llvm-project/blob/llvmorg-19.1.2/llvm/docs/GettingStarted.rst?plain=1#L37.

Would we get there by setting the wildcard rule in .gitattrubtes to * text eof=lf, or something along those lines?

This patch is about respecting local config, which is the exact opposite of that suggestion. It would be a way to solve the line-ending issue by fiat, not by co-operation, so I'm against it on principle. To be clear I very much don't like CRLF, but I also very much don't like it when someone forces me to use wrong-handed tools. Windows users would be forced to use wrong handed tools if we force line-endings one way or another

But if every single Windows developer involved here say that they want LF, and they don't want any ambiguity about it? Is it more important to give hypothetical users the choice to pick what they like, at the cost of every single current developer who do not want that, and breaking every established setup routine?

@mstorsjo
Copy link
Member

I must have missed this PR originally. I oppose letting Git change any line endings. It always ends like this.

Also just for context - the Clang precommit CI is allegedly still broken, because those buildbots happened to be restarted when we had these gitattributes in place, so all files are checked out with CRLF right now, and any incremental update on top doesn't change that, as long as those files aren't touched: https://discourse.llvm.org/t/windows-premerge-buildbot-broken-for-5-days/82571/6

@mstorsjo
Copy link
Member

This patch is about respecting local config, which is the exact opposite of that suggestion. It would be a way to solve the line-ending issue by fiat, not by co-operation, so I'm against it on principle. To be clear I very much don't like CRLF, but I also very much don't like it when someone forces me to use wrong-handed tools. Windows users would be forced to use wrong handed tools if we force line-endings one way or another

Also FWIW, I wouldn't that much mind letting users pick whichever form of line endings they would like, if all tests would have been cleaned up before this, so that they pass regardless of the user choice or tool defaults - but alas, there still are >70 Clang tests failing.

@mstorsjo
Copy link
Member

I think this is due to a subtly of config. Setting core.autocrlf to false doesn't actually do anything since it's the default.

In Git for Windows, the default actually is core.autocrlf set to true. When manually installing, the installer wizard used to ask the user which way they want their defaults to be set, not sure if this still is the case. But in e.g. Github Actions runners, you get it set to true by default - see https://github.com/mstorsjo/llvm-project/actions/runs/11459095411/job/31882864878.

@ldrumm
Copy link
Contributor Author

ldrumm commented Oct 22, 2024

if all tests would have been cleaned up before this

That was most certainly my intention, and I saw green before merging, so I must've looked in the wrong place

@mstorsjo
Copy link
Member

if all tests would have been cleaned up before this

That was most certainly my intention, and I saw green before merging, so I must've looked in the wrong place

Ah, right - as we've seen that the CI runner normally only updates an existing checkout, where changes to gitattributes like these don't really take effect, I guess this can be understood.

(Plus there are a number of tests in less frequently executed testsuites, like compiler-rt, clang-tools-extra, and in llvm/utils/lit/tests, that don't necessarily get included in each normal run in CI.)

On that topic - some of the scripts that orchestrate that premerge testing lives in .ci/generate-buildkite-pipeline-premerge. By editing that script, it should be possible to trigger it to run all possible testsuites, not just the ones currently touched. And I'm wondering if there's anything we could add there temporarily (both right now, to flush the current checkouts on the premerge cluster, and when testing PRs like this one), to force it to check files out again?

@AaronBallman
Copy link
Collaborator

But if every single Windows developer involved here say that they want LF, and they don't want any ambiguity about it? Is it more important to give hypothetical users the choice to pick what they like, at the cost of every single current developer who do not want that, and breaking every established setup routine?

+1; I was on the fence about the changes because what happened here has happened before.

The amount of churn is already pretty high -- please make sure the original commit, fixes, and reverts get added to https://github.com/llvm/llvm-project/blob/main/.git-blame-ignore-revs. At the end of the day, we have a number of tests and files which are sensitive to line endings and we have a lot of existing clones of the repo which have been set up to work properly with the current setup, so this is a risky change.

Has there been a recent RFC asking if the community wants to go down this path? If not, we should run one before attempting further changes (aside from fixing anything up that still needs fixing, if anything).

@ldrumm
Copy link
Contributor Author

ldrumm commented Oct 22, 2024

Yes. An RFC makes sense. None of us here speak for every windows developer. I will submit one to discourse once I iron out the kinks and am ready to try again

@mstorsjo
Copy link
Member

The amount of churn is already pretty high -- please make sure the original commit, fixes, and reverts get added to https://github.com/llvm/llvm-project/blob/main/.git-blame-ignore-revs. At the end of the day, we have a number of tests and files which are sensitive to line endings and we have a lot of existing clones of the repo which have been set up to work properly with the current setup, so this is a risky change.

FWIW, while I'm not a fan of checking things out with CRLF per se (or making core.autocrlf no longer have precedence), I totally don't mind fixing tests (or in most cases, adding some individual files to .gitattributes to mark them as needing to be checked out with LF newlines always), and I don't consider that part churn. We have a number of such files marked that way from before, but we clearly do need to add a few more.

But I haven't checked all the commits that have gone in to try to fix up tests that were failing due to CRLF - in case some of them really should be reverted once we're back to checking things out with LF.

@ldrumm
Copy link
Contributor Author

ldrumm commented Oct 22, 2024

@AaronBallman you said this has happened before, but I don't see this in history. Can you link to the commit to which you're referring?

I only see one other commit (9783f28) that touches the root .gitattributes

@AaronBallman
Copy link
Collaborator

@AaronBallman you said this has happened before, but I don't see this in history. Can you link to the commit to which you're referring?

I only see one other commit (9783f28) that touches the root .gitattributes

https://reviews.llvm.org/D124563 and https://reviews.llvm.org/D124606 where there was a lot of back and forth on the changes after they landed, but I may have missed some discussions too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.