Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SF.12: Rewrite to eliminate the notions of "project" and "component"; also copyedit #1666

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Quuxplusone
Copy link
Contributor

@Quuxplusone Quuxplusone commented Aug 17, 2020

The major change here is simply to delete parentheticals that muddy the waters
by talking about files belonging to "projects" or "components". My experience
has been that in industry we have "codebases," which are sometimes split up into
"repositories" and/or "libraries" (which may be installed in e.g. /usr/local/include
or may be used straight from their source tree), and everyone works on everything
or at least knows someone who works on the other thing. Figuring out what's
"in the same project" and what's "in a different project" is often fuzzy.

I would try to set up the build system so that its target_include_directories
(or equivalent) could find every header via <>, and then use <> everywhere:
even for files in the "same" component. (This does require that you follow SF.12's
advice for "Library creators", though, which I admit I don't for just about all
of my hobby projects.)

Strangely, the "Enforcement" section seems to have been written by a <> partisan
like me, even though the rule itself is ""-endian.
I copyedited the "Enforcement" section for grammar; did I unconsciously misinterpret
what it was trying to say? What was it trying to say? Should it just be deleted?


Btw, I think there's an English style question here: what is a "header search path"? Is it the ordered set of directories that the compiler traverses when searching for headers (similar to $PATH on a POSIX system)? or is it one individual pathname, such as "/usr/include" or "/usr/local/include"? If the latter, then we need a different name for the ordered set of directories manipulated via g++ -I. In my copyedits, I assumed the former: there is one "header search path", searched by <>; and "" searches the same set plus ..

@apenn-msft
Copy link
Contributor

apenn-msft commented Aug 19, 2020

I agree with this; there should not be mention of 'project' or other names for arbitrary structure we impose on our code, but note that @gdr-at-ms in #1596 wanted to see "structure" referenced in a way that would prevent the case where some_library/foo.h, installs as usr/include/foo.h and then the guidance is retro-actively interpreted as preferring foo.h to ""-include headers in usr/include.

Personally I don't think such an interpretation is reasonable and it seems obvious the guidance applies to the code where/as it is being authored and not where it is installed to later (the author applying the guidance cannot reason about the future location and relativity of where their code is installed).

what is a "header search path"

@Quuxplusone I thought you added this term in this PR?
The original text is "the alternate search path" which is interpreted from 19.2.2:

"...searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence
between the < and > delimiters, and causes the replacement of that directive by the entire contents of the
header. How the places are specified or the header identified is implementation-defined."

A more precise term then might be "implementation-defined places", reading instead as: "... or a header from an implementation-defined place"

the "Enforcement" section... What was it trying to say?

It enforces that headers under "" are relative, but only when those same headers are not also added to the <> search. So it is missing a case (e.g. the case you mentioned where a codebase may add all headers to the <> search) and it should probably instead read something like "a tool to check that files included via "" exist at a relative location to the including file, and those included via <> do not exist relatively to the including file".

emphasis on relative because the filesystem definition only may not be appropriate and it may need to take into consideration things like policy (e.g. a path like "../../foo/bar.h" is relative, but is ".." permitted?) and the implementation-defined places (e.g. <foo/bar.h> could be relative if it was installed under user code)

@Quuxplusone
Copy link
Contributor Author

I thought you added this term in this PR?

Well, yes :) but only as a replacement for the phrase "the alternate search path" which has the same question ("what does path mean in this context?") plus the additional question "alternative to, or alternating with, what?"

A more precise term then might be "implementation-defined places", reading instead as: "... or a header from an implementation-defined place"

That would at least be greppable back to the Standard's "a sequence of implementation-defined places," but I think anything as user-facing as the Core Guidelines should avoid phrases like "implementation-defined." What we mean is the thing manipulated with -I; we should look up what that's actually called and call it by name. GCC calls it the "search path" (or "search directory list"); Clang calls it the "include search path" (but also refers to a file being "found within a header search path" which sounds like the alternative definition of path I mentioned).

@apenn-msft
Copy link
Contributor

I'm ok with standardese where the alternative means using terms partial to an implementation; we do already mention "implementation defined" in the guidelines, e.g.:

and be aware of constructs with implementation defined meaning (e.g., sizeof(int)).

We also mention the term "compiler" elsewhere; it's a more approachable term, but also less accurate.
I think we can fairly expect any C++ reader to be familiar with what "implementation" means since it's critical to understanding common concepts like "implementation defined" and "undefined" behavior which are used quite often without explanation or euphemism)

To add a few more terms to the pile:

  • Apple calls it "header search path" and "user header search path"
  • MSVC refers to "Additional include directories"
    You've noted what GCC and clang call it.

If we do pick a term rooted in something quotable, I'd prefer it quote the standard than any specific compiler.

@gdr-at-ms
Copy link
Contributor

@apenn-msft

[...] the author applying the guidance cannot reason about the future location and relativity of where their code is installed

In fact, reasoning about where the library/header is installed is not uncommon in the world outside of Windows platforms, and to some degree made very easy with build tools :-) We have to exercise some caution about the hypotheses we make in the formulation of the guidelines.

@Quuxplusone : I think include search path is a reasonable term to use in the Guidelines.

@apenn-msft
Copy link
Contributor

apenn-msft commented Aug 20, 2020

@gdr-at-ms "include search path" is ambiguous; 19.2.2 and 19.2.3 both refer to a "search" being conducted for both <> and "" forms. I think the term could work if we qualify it as "angle-bracket include search path", which is wordier than "implementation-defined places" but probably more understandable (at the cost of being less 'grep'-able).

In terms of reasoning about install locations, I guess I can only provide my interpretation, but if it were not uncommon to interpret it as applying to the install location, is there a way to have the guidance distinguish the code as it exists in the install location versus the authored location, without introducing a term that refers to the code's structure (i.e. 'project' or 'library')?

@Quuxplusone
Copy link
Contributor Author

"include search path" is ambiguous; 19.2.2 and 19.2.3 both refer to a "search" being conducted for both <> and "" forms. I think the term could work if we qualify it as "angle-bracket include search path"

My mental model is that the "quoted include search path" is simply the "angle-bracket include search path" with . prepended. So, if <foo.h> is searched for in ../mylib/include, /usr/local/include and /usr/include, then "foo.h" is searched for in ., ../mylib/include, /usr/local/include and /usr/include. So there's still only one path; it's just used in slightly different ways for the two lookups. (In fact, if Clang fails to find <foo.h> in the angle-bracket include search path, Clang will look it up in . as well, and emit a warning if it is found in . recommending that you switch from <> to "".)

If my mental model is wrong, I'd be interested to know how to adjust it.

@apenn-msft
Copy link
Contributor

It's partially correct, except:

there's still only one path;

as you noted, there are two paths:

for the two lookups

this is why the term "include search path" is ambiguous unless we know which form is being used.
In the case of your earlier comment, you're looking for a term to represent:

the thing manipulated with -I

which would accrue into the search path used by the angle-bracket form (although I realize now, this (-I) is what I was referring to by the term "alternate", see below).

When used in the original sentence, things becomes clearer why it needs to be qualified:

It makes it easy to understand at a glance whether a header is being included from a local relative file versus a standard library header or a header from the _angle-bracket include search path _

if it were just "include search path", the distinction would be redundant since the header always comes from the include search path (depending on which form is used).
I think I also realize now why I used the term "alternate" and that was a nod to:

an implementation may provide a mechanism for making arbitrary source files available to
the < > search,

"alternate" was my attempt to name the "mechanism", which you also named "the thing manipulated with -I". I'm not sure what name we could give it that wouldn't raise questions, except for using verbose standardese like "the mechanism by which arbitrary source files are made available to the <> search", or just not naming it all. It means we lose the third distinction, but I think that is fine versus introducing a new term or cumbersome language. e.g. just the two cases is fine too: "relative file versus angle-bracket include search path"
(also you can copy-edit remove any other instances of "local", it's already covered by "relative")

@gdr-at-ms
Copy link
Contributor

@apenn-msft

@gdr-at-ms "include search path" is ambiguous; 19.2.2 and 19.2.3 both refer to a "search" being conducted for both <> and "" forms. I think the term could work if we qualify it as "angle-bracket include search path", which is wordier than "implementation-defined places" but probably more understandable (at the cost of being less 'grep'-able)

I am sorry, I don't see the ambiguity here. The term designates a set of paths consulted by two similar algorithms.

The Core Guidelines, deliberately, are not trying to emulate the a Bourbaki-level of precision so that they can communicate the right level of information to the majority of C++ programmers, giving enough leads to the experts to further specialized interests. So, by necessity it won't be able to satisfy every possible imaginable nuance.

@Quuxplusone
Copy link
Contributor Author

@apenn-msft: "...as you noted, there are two paths: for the two lookups this is why the term "include search path" is ambiguous unless we know which form is being used. In the case of your earlier comment, you're looking for a term to represent: the thing manipulated with -I which would accrue into the search path used by the angle-bracket form..."

Well, in Clang and GCC at least, AFAIK, there is only one search path; it's just used in slightly different ways for the two lookups. The thing manipulated with -I/-isystem is that single search path. Lookup for <foo> will look only in the directories in the header search path. Lookup for "foo" will look first in . and then in the directories in the header search path.

Does MSVC have separately manipulable search paths for <foo> versus "foo"?

(Clang and GCC also have a notion of "system headers" versus "[non-system] headers," but they use that notion only for suppressing diagnostics in system headers based on where the file is physically found; I've verified that it is 100% orthogonal to whether an inclusion was syntactically triggered via <stdio.h> or "stdio.h".)

@apenn-msft
Copy link
Contributor

Not MSVC, but Apple does:
https://stackoverflow.com/questions/3429031/header-search-paths-vs-user-header-search-paths-in-xcode

that's a concrete example although I was more referring to the abstract standard language, which mentions both "" and <> as performing a "search" (and I guess Apple did take that part of the standard to heart)

The part I mean about "include search path" being ambiguous is that if someone asks "is foo.h in the include search path?" the question would have to clarified as "it depends, are you including it using "foo.h" or <foo.h>?"

@Quuxplusone
Copy link
Contributor Author

Ah, I see that Xcode (and Clang via -iquote) support putting more things in the "user header search path" than just . (and in fact Clang has a dead codepath for taking . out of the "user header search path," too). Basically Clang has one path plus an AngledDirIdx into it, where AngledDirIdx says "if you're including via <>, start at this point in the path, instead of at index 0." (And if you're including via #include_next, start at an index that varies dynamically.)

// clang++ -iquote foo -I bar test.cc
#include "x.h"  // finds foo/x.h before bar/x.h
#include <x.h>  // finds only bar/x.h

Okay, that's close enough to "two search paths" that I'll stop arguing for the moment, at least. ;) I do think it will be good to stay away from the unadorned term "alternat[iv]e search path," though, since it's not at all obvious which of the two paths (the longer one used by "" or its shorter suffix used by <>) should be considered "alternative."

#include <some_library/common.h> // A file that is not locally relative, included from another library: use the <> form
#include "foo.h" // A file located relative to foo.cpp: use the "" form
#include "foo_utils/utils.h" // A file located relative to foo.cpp: use the "" form
#include <component_b/bar.h> // A file located via the header search path: use the <> form
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users commonly believe that if an include is from the same library (or codebase or project or whatever you want to call it), then it must use the "" form. However, could still have been found via the search path so that belief is wrong.

If there's one thing I'm trying to do with all of these comments, it's to fix this misconception. I fear that this PR pulls us back towards reinforcing that misconception. It's not quite clear enough that component_b is part of the same whatchamacallit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggled with this too; the whatchamacallit is, well... a project. which means we have to choose between defining what a project is (pro: being more crisp about how to include things from a project) or leaving it intentionally up to the user to define (pro: steering clear of defining what a project is in the guidelines, which is by nature a very liberal concept).

I think I'm ok erring on the side of not defining what a project is here, and letting users figure out how best to include things from within one. The guideline is clear about how "" should be used, but if for some reason they're still adamant that "" must be used within their project, they'll have to align that with their policy about relatively accessing everything within their project (this should be intentionally left up to the user: some are ok with "..\bar\bar.h", some might want <foo/bar/bar.h>)

As a funny anecdote, I'm actually trying to dispel the opposite myth in my codebase: a lot of developers are under the impression that <> must always be used from within a project (from what I can tell it's a myth created because so many have been bitten by differences in the "" vs <> they inevitably decide to stick with one)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a lot of developers are under the impression that <> must always be used from within a project (from what I can tell it's a myth created because so many have been bitten by differences in the "" vs <> they inevitably decide to stick with one)

That pretty much sums up my attitude, except: s/must always/should always/ and s/myth/guideline/. 🙂

"A lot of cats are under the impression that stoves are not for sitting on (from what I can tell it's a myth created by once sitting on a hot stove)" 😉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"project" is mostly a Microsoft IDE-ism. We need to look beyond the particular workflow of a specific environment and implementation at one point of software organization.

@hsutter
Copy link
Contributor

hsutter commented Sep 3, 2020

Editors call: Thanks! This looks good, just waiting to see if there's a last reply from @apenn-msft and then this is ready to merge.

@apenn-msft
Copy link
Contributor

apenn-msft commented Sep 3, 2020

To summarize the current state of the various comments:
apenn-msft:
I'm approved with comments; if we can slip in some of the copy-edits @Quuxplusone brought up, that would be great, here they are again:

  • "alternate search path" - I think we all agree it's not descriptive enough and we should refer to either the "angle-bracket search path" or the "quoted search path" (or <> and "" respectively)
  • re-reading the Enforcement section, I'm walking back my earlier comment and I think it is actually ok as-is: "A tool could identify headers referenced via "" that could have been referenced with <>." (we just need to enforce that if "" is used, it is relative; we don't need the converse since one could use <> even where a header is relative if they actually intended the header from the <>-search)

These aren't new to this PR so they don't have to be addressed here. If we can fit them in great, otherwise minor enough we can also do it on another PR.

@johnmcfarlane:
wants to see an example that illustrates including a file from the same project but using the <>-form (to dispel any myth that "" must always be used from within the same project). I'm not sure this can be done without giving a formal treatment of the term "project" or whatever we decide to call the cohesive organization of a collection of code.

@gdr-at-ms:
wants this notion (of a project / organization of code) to address the case where someone might interpret it to mean ""-include could be used if the including file will be relative at the location of installation (even though it is not relative at the location where the file is authored). For example, someone does #include "system.h" because they know their header will be installed in usr/include alongside file system.h. My take on this is that I don't think the guideline is likely to be interpreted in this way and it naturally has to apply to the code at the location it is being authored (and we don't need to give this location a formal treatment) because the guidelines are applied while authoring code. I think it would be nice to address any possible misinterpretation, but if it is an unlikely one, and addressing it comes at the price of the guidelines attempting to define what is going to be such a contested and liberally-defined concept like a "project" or "structure of code", I think it's not worth it (or at least it would be a large undertaking for another PR).

@Quuxplusone
Copy link
Contributor Author

These aren't new to this PR so they don't have to be addressed here. If we can fit them in great

IIUC, this sentence referred only to the two bullets in your list above? which were:

  • "alternate search path" - I think we all agree it's not descriptive enough — I wouldn't object to a PR that changes my text "a header from the header search path" to "a header from the search path used for angle-bracket headers". However, as I was making that commit, I realized that it's kind of tautological in context: "use the <> syntax as an indication to the reader that you want to use the <> search path"? I mean, duh. 🙂 Thus I'm not confident enough to just roll that addition into this PR: it might be seen as unnecessary tautology, thus slightly increasing controversy and delaying the merging of this otherwise-converging-to-consensus PR. So, "someone make a separate PR to add the words 'angle-bracket'" sounds good to me.

  • re-reading the Enforcement section — IIUC, this isn't a proposed change at all; you're saying that this PR's text is good as-is! So, good. 🙂

… also copyedit.

The major change here is simply to delete parentheticals that muddy the waters
by talking about files belonging to "projects" or "components". My experience
has been that in industry we have "codebases," which are sometimes split up into
"repositories" and/or "libraries" (which may be installed in e.g. /usr/local/include
or may be used straight from their source tree), and everyone works on everything
or at least knows someone who works on the other thing. Figuring out what's
"in the same project" and what's "in a different project" is often fuzzy.

I would try to set up the build system so that its `target_include_directories`
(or equivalent) could find every header via `<>`, and then use `<>` everywhere:
even for files in the "same" component. (This does require that you follow SF.12's
advice for "Library creators", though, which I admit I don't for just about all
of my hobby projects.)

Strangely, the "Enforcement" section seems to have been written by a `<>` partisan
like me, even though the rule itself is `""`-endian.
I copyedited the "Enforcement" section for grammar; did I unconsciously misinterpret
what it was trying to say? What *was* it trying to say? Should it just be deleted?
@apenn-msft
Copy link
Contributor

apenn-msft commented Sep 5, 2020

kind of tautological in context:

here's how it is used:

from a locally relative file, versus a standard library header or a header from the header search path (e.g. a header from another library or from a common set of includes).

the example tries to draw a distinction between 3 types of include locations:

  1. locally relative (edit: just "relative")
  2. standard library
  3. another library, common includes

We know 1 is the ""-form, and 2 is the <>-form, but we are struggling to find a name for 3, which verbosely I would call "the angle-bracket search path configurable by the user".
If we refer to it as "header search path" it's ambiguous whether this is the "" search or the <> search path (and this matters, because the guideline is attempting to espouse that headers in 3 be located from the <> search path)

Like coming up with a term for "project", this may be more trouble than it's worth; feel free to side-step having to name this thing by just enumerating the example:

from a relative file, versus a standard library header or a header from another library or from a common set of includes.

of the file relative to files that include it, or scenarios where the different search algorithm is
required. It makes it easy to understand at a glance whether a header is being included from a locally
relative file, versus a standard library header or a header from the header search path (e.g. a header
from another library or from a common set of includes).
Copy link
Contributor Author

@Quuxplusone Quuxplusone Sep 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Moving my reply inline, so that we can thread replies if needed)

The example tries to draw a distinction between 3 types of include locations: locally relative (edit: just "relative"); standard library; another library, common includes.

We know 1 is the ""-form, and 2 is the <>-form, but we are struggling to find a name for 3, which verbosely I would call "the angle-bracket search path configurable by the user". If we refer to it as "header search path" it's ambiguous whether this is the "" search or the <> search path (and this matters, because the guideline is attempting to espouse that headers in 3 be located from the <> search path)

Well, that's the tautological part. If the guideline intends to say merely "You should use "" for headers you intend to find via the "" search path, and <> for headers you intend to find via the <> search path," then that's completely tautological. I think the useful guideline in this area would say something about which kinds of headers you should try to find via each of those search paths. E.g., I'm sure we agree "You should always intend to find iostream via the <> search path." IMHO it should also be acknowledged that the build system is likely to have vastly greater control over the <> search path than over the "" search path, and so <> should be preferred by default because it'll be easier for the human builder to configure.

Btw, I agree that "locally relative" is a bit unclear and/or tautological, but consider that #include <sys/types.h> provides a pathname relative to somewhere. Essentially nobody ever uses #include on an absolute path. So the (ill-defined) "local" part of the phrase is actually much more important to the meaning than the "relative" part. (Unfortunately when we try to pin it down we end up trying to define "project" or "codebase" again, which I want to push out of scope for this PR if at all humanly possible.)

Copy link
Contributor

@apenn-msft apenn-msft Sep 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the guideline intends to say merely "You should use "" for headers you intend to find via the "" search path, and <> for headers you intend to find via the <> search path,"

I think the intent is clear from the title, verbatim:

"Prefer the quoted form of #include for files relative to the including file and the angle bracket form everywhere else"

"" is for relative files (not "" files) and <> is for everything else (not <> files)
That's important because there can be non-relative files in "" (e.g. Apple's user-header-search-path) and there can be relative files in <> (e.g. codebases like the one you mention that exclusively use <>).
The guideline attempts to give examples of files that should be included using each type of search, and the phrase "a header from the header search path" is currently ambiguous; it is important to clarify for the given examples "another library or common includes" which header search path should be used (e.g. the <>-form)

#include <sys/types.h> provides a pathname relative to somewhere

is also local to somewhere too; "relative" and "local" are both terms that require context. The implied context is "to the including file", and this is explicit in the title (and at least a few other occurrences in the text). I think using "relative" from there on reasonably implies "to the including file", but if that's not clear it's also fine to explicitly say "relative to the including file" for every occurrence of "relative" (verbose, but if there's confusion and it dispels it, then it's worth it).

I found another usage of "alternate search path":

the different search algorithm is required

We should probably change any occurrence of "different" or "alternate" search path to a term we come up with for the <>-search that is configurable by the user. Or we don't want to term it, then another option is to rephrase it to just 2 distinctions "relative" and "non-relative" (but this also isn't specific to this PR, so it only needs to be addressed if we're also addressing usage of "alternate").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants