Namespaces and imports #718

Closed
5 of 9 tasks
jafingerhut opened this issue Jan 7, 2019 · 20 comments

Comments

@jafingerhut
Collaborator

jafingerhut commented Jan 7, 2019

Personnel

Design

Implementation

Process

  • LDWG discussed: repeatedly
  • LDWG approved:
  • Merged into p4-spec:
  • Merged into p4c:

Apologies if this is a duplicate of some other issue. Happy to close this and replace it with something else if that is preferred for long-term tracking.

I just wanted to record one idea mentioned in the language design WG earlier today, in case it turns out to be of interest for a P4_16 module / namespace system.

I will give syntax examples from Python, simply because I am more familiar with that and less with others, but hopefully the ideas are clear.

Python allows several ways to import a package, of which I will possibly leave some out since I may not know them all:
(a) import <package_name> enables the "caller" to use names inside of <package_name> with a syntax like <package_name>.<name_inside_package>. You cannot use <name_inside_package> without the prefix.
(b) import <package_name> as <alias_name> enables the caller to use names inside of <package_name> with the syntax <alias_name>.<name_inside_package>. The reason it exists in addition to the previous option is that the caller can choose to make <alias_name> much shorter than <package_name>. The caller is responsible for ensuring that multiple <alias_name>s do not collide with each other.
(c) from <package_name> import * enables the caller to use names inside of <package_name> with the syntax <name_inside_package>, with no prefix required. Any local names that happen to be the same as a name inside of the package are subject to the normal language rules for shadowing or redefinition.
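For anyone who wants to see exactly which names each form binds, here is a small sketch using the standard-library math module. Each form is run in its own scratch dictionary (form_a, form_b, form_c and bound_names are names I made up for this illustration) so the result can be inspected directly.

```python
# Each import form runs in its own scratch namespace so we can see what it binds.
form_a, form_b, form_c = {}, {}, {}
exec("import math", form_a)            # (a) prefix is the package name
exec("import math as m", form_b)       # (b) prefix is a caller-chosen alias
exec("from math import *", form_c)     # (c) no prefix at all

def bound_names(namespace):
    """Return the names an import added, ignoring dunder entries such as __builtins__."""
    return {name for name in namespace if not name.startswith("__")}

names_a = bound_names(form_a)
names_b = bound_names(form_b)
names_c = bound_names(form_c)

print(sorted(names_a))        # only the package name itself
print(sorted(names_b))        # only the alias
print(len(names_c))           # many unprefixed names, one per public name in math
```

Forms (a) and (b) each add exactly one name to the importing scope, while (c) adds however many public names the package happens to define at the time.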

If P4_16 enabled operations like (a) and (b), but not (c), then it has the property that either <package_name>. or <alias_name>. prefixes are always required when referring to a name inside of another package. They can be short because <alias_name>'s can be short, but they must be present.

Some people consider this an advantage, in that it makes it clear on a casual reading which names are from other packages, and which are local. If you allow option (c), then it is not clear when you read a name foo whether it is in this package, or one of 1 up to N other packages for which you have done from <package_name> import *.

Requiring the prefixes also means that later versions of packages can introduce new names, and be guaranteed that they will not collide with names in the code that imports the package.
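To make the versioning point concrete, here is a rough sketch in Python. The modules lib_a and lib_b are hypothetical and built in memory purely for the demonstration, and make_module and run are helper names I invented. The same two programs are run before and after lib_b gains a new name.

```python
import sys
import types

def make_module(name, **names):
    """Build an in-memory module and register it so `import` can find it."""
    mod = types.ModuleType(name)
    mod.__dict__.update(names)
    sys.modules[name] = mod
    return mod

# Two importing programs that are never edited between library versions.
PROGRAM_STAR = "from lib_a import *\nfrom lib_b import *\nresult = parse()\n"
PROGRAM_PREFIXED = "import lib_a as a\nimport lib_b as b\nresult = a.parse()\n"

def run(program):
    """Execute a program in a fresh namespace and return its `result`."""
    namespace = {}
    exec(program, namespace)
    return namespace["result"]

# Version 1: only lib_a defines parse.
make_module("lib_a", parse=lambda: "lib_a.parse")
make_module("lib_b", emit=lambda: "lib_b.emit")
star_v1, prefixed_v1 = run(PROGRAM_STAR), run(PROGRAM_PREFIXED)

# Version 2: lib_b adds its own parse.
make_module("lib_b", emit=lambda: "lib_b.emit", parse=lambda: "lib_b.parse")
star_v2, prefixed_v2 = run(PROGRAM_STAR), run(PROGRAM_PREFIXED)

print(star_v1, "->", star_v2)
print(prefixed_v1, "->", prefixed_v2)
```

The star-import program silently switches to the other library's parse, while the prefixed program keeps calling the one it always did.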

Anyway, nothing earth-shattering here, and I know some people probably dislike having to type prefixes, but I personally like requiring them simply so that a code reader can easily determine which package a name comes from.

@antoninbas
Member

@jafingerhut there is also (d) from <package_name> import <name_inside_package>. This is less dangerous than (c) and doesn't require <package_name>. / <alias_name>. when using the imported name.

@jafingerhut
Collaborator Author

Yes, glad you mentioned that variant, too, since I forgot it, and because from <package_name> import <name_inside_package1>, <name_inside_package2>, ... also has the property that future changes inside of <package_name> should not break the code that does the import, and it also gives an explicit occurrence of <name_inside_package> that readers can search for and find must be defined inside <package_name>, so it has that nice property of the other variants, too.

@mihaibudiu
Contributor

Yes, Python is a good model.
The only tricky issue is handling nested modules correctly.
Given our design so far we most likely won't support recursive modules.
I'd like to have a design that can be implemented simply as an early front-end pass.

@jnfoster
Collaborator

jnfoster commented Jan 8, 2019

Should we pick up the Modularity WG sub-group and bang out a quick proposal together? (Maybe just the three of us?)

I'll even be in CA much of the next week if we want to do it in person... :-)

@jafingerhut
Collaborator Author

Apologies for missing you in person, given the invitation. I would be open to doing a sub-group meeting focused on this topic. Given my lack of knowledge of more sophisticated module systems from other programming languages, my initial proposal here is focused on what I understand better, i.e. visibility or non-visibility of object names (object here meant to include things like P4_16 function definitions, extern objects, extern functions, etc.), which seems like it at least covers concerns about name conflicts.

@mbudiu-vmw Your concern with nested modules isn't clear to me yet, perhaps because I'm not sure what you mean by the term.

Perhaps you mean a scenario like the following?

Someone writes a module M2.

Someone else writes a module M1 that imports some names from module M2.

Yet another person writes a P4 program that imports M1, and thus also imports M2. You want it to be possible to allow some selected names from M2 to thus be visible from within the P4 program, without having to explicitly import M2 as well?

If that is the scenario you are considering, it seems to me that perhaps the following code sketch would enable this?

Skeleton of the definition of module M2:

from M1 import A1, A2, A3;
import M3 as my_alias;

// code for M2 here can use A1, A2, A3 as if they were locally defined names, with no prefix.
// code for M2 can also use my_alias.<name_from_M3>, but only with a prefix.

// code for M2 defines its own names B1, B2

Skeleton of the definition of the P4 program:

from M2 import A1, B1, B2;

// P4 program code can use names A1, B1, B2 as if they were locally defined names, with no prefix.

// the following line in the P4 program would be an error, if A4 is defined in M1, but not imported into M2
// from M2 import A4;

The above examples presume that all names in a module are "public", i.e. are allowed to be imported. One could also of course define a mechanism for a module to explicitly list which of its names can be imported by others, and define it to be an error to attempt to import any other names from that module (i.e. they would be "private" by default).
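The same arrangement can be tried out in Python today, which may help check the intuition. This is only a sketch: the modules are built in memory, define_module is a helper name I made up, and the values are placeholders. It follows the code sketch above, where M2 imports from M1 and the program imports only from M2.

```python
import sys
import types

def define_module(name, source):
    """Execute `source` as the body of an in-memory module called `name`."""
    mod = types.ModuleType(name)
    sys.modules[name] = mod
    exec(source, mod.__dict__)
    return mod

define_module("M1", "A1, A2, A3, A4 = 'a1', 'a2', 'a3', 'a4'")
define_module("M2", "from M1 import A1, A2, A3\nB1, B2 = 'b1', 'b2'")

# The "P4 program": it sees A1 only because M2 imported it, and never names M1.
program = {}
exec("from M2 import A1, B1, B2", program)

# A4 exists in M1 but was never imported into M2, so this import fails.
try:
    exec("from M2 import A4", {})
    a4_error = None
except ImportError as exc:
    a4_error = exc

print(program["A1"], program["B1"], program["B2"])
print(type(a4_error).__name__)
```

Note that Python treats every name bound in M2 as importable, so this matches the "all names are public" assumption. A private-by-default rule would need an explicit export list that the import statement checks, which this sketch does not model.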

@mihaibudiu
Contributor

You are right.

A few of us met in person and brainstormed, and we reached the conclusion that we have to continue to support a significant part of the preprocessor functionality, including #ifdef. The module system may replace #include with import. But we have to devise a solution that interacts well with a preprocessor-like system, maybe a restricted version of the preprocessor. C# does this to some degree.

@jafingerhut
Collaborator Author

Sorry I missed the brainstorming meetings. I may be treading old ground with this comment, or missing requirements that you have identified but that are not yet stated explicitly in the comments here. Please feel free to add comments describing any such requirements that this comment's proposal misses.

One way to add a 'namespace' mechanism to P4 that still allows the use of #include preprocessor directives, is to imagine that for the time being, P4 programs after running through the preprocessor would still be "one big source file", then define namespace/module syntax that would have meaning within that one big source file, to separate namespaces from each other. I think that C++ for example allows multiple namespaces to be defined, and declarations/definitions made inside of each one, inside of one source file, too.

As a concrete example of some possible syntax, see this sample P4 program that includes a file "header_lengths.p4" that defines a namespace named "header_lengths": https://github.com/jafingerhut/p4-guide/blob/master/namespaces/sample1.p4#L3-L30

Here are the contents of header_lengths.p4: https://github.com/jafingerhut/p4-guide/blob/master/namespaces/header_lengths.p4

After the preprocessor runs, all of those #include's in sample1.p4 are expanded, and you have one big source file, which may optionally define any number of "namespace" blocks inside of it. Later code can refer to earlier namespaces (unless someone thinks it is important to allow cycles of reference between namespaces, but hopefully that isn't necessary for the idea to be useful).
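As a rough model of the "one big source file" rule, the sketch below checks qualified references against an ordered list of namespace blocks. It is not how p4c works and not a proposal for the front end. The resolve function, the tuple layout, and the declared names (ETHERNET_BYTES, IPV4_BYTES, MyParser) are all hypothetical, and letting a namespace be reopened later in the file is my own assumption.

```python
def resolve(blocks):
    """Check qualified references in an ordered list of namespace blocks.

    Each block is (namespace_name, declared_names, references), where a
    reference is a (namespace_name, name) pair. Illustrative model only.
    """
    seen = {}      # namespace name -> set of names declared so far
    errors = []
    for ns_name, declared, references in blocks:
        for target_ns, name in references:
            if target_ns not in seen:
                errors.append(f"{ns_name}: namespace {target_ns} not defined yet")
            elif name not in seen[target_ns]:
                errors.append(f"{ns_name}: {target_ns}.{name} not declared")
        # Assumption: a namespace may be reopened later, so merge its names.
        seen.setdefault(ns_name, set()).update(declared)
    return errors

# The flattened file after #include expansion, in source order.
flattened = [
    ("header_lengths", {"ETHERNET_BYTES", "IPV4_BYTES"}, []),
    ("main", {"MyParser"}, [("header_lengths", "IPV4_BYTES")]),
]
backwards = list(reversed(flattened))

print(resolve(flattened))
print(resolve(backwards))
```

Because references are only checked against namespaces already seen, cycles between namespaces are rejected by construction, which is the restriction suggested above.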

@jafingerhut
Collaborator Author

Carrot: Just imagine if we never again had to have a discussion in an LDWG meeting about whether adding a new function/action/etc. could break existing user programs that use that name! Of course, we would have to bikeshed over the names of namespaces, and whether something should be added to an existing namespace, or a new one. Still, it seems like progress.

@mihaibudiu
Contributor

I don't think modules need to declare namespaces. The ones who import the modules can declare namespaces, confining them into a namespace of their choice. In this way there are no conflicts, because the users choose the namespaces, and can use different namespaces for different modules. This is better than C++, and more like Python.

@jafingerhut
Collaborator Author

jafingerhut commented Mar 19, 2019

What features or capabilities do people want from a P4_16 module system?

Is adding namespaces orthogonal to a module system, or intertwined in some way?

If they are considered independent, is adding namespaces useful enough on its own as a separate step?

@mihaibudiu
Contributor

I actually think that namespaces as existing in C++ are a bad design. We don't really need them.

@jafingerhut
Collaborator Author

Mihai, do you expect that a P4_16 module system would provide some kind of isolation of newly defined names, so that the discussions we have in LDWG meetings about whether new standard function names can break previous programs could be a thing of the past?

If a module system does not provide such benefits, why not some form of namespaces that would make them a thing of the past? I should be cautious in saying that despite my code examples using the syntax "namespace { ... }", I know very little about the namespace feature in C++ and am not trying to imply by that syntax that we should copy the C++ design.

@mihaibudiu
Contributor

Yes, absolutely. The Python model seems useful. When you import something you can move it to its own namespace. You don't construct namespaces, they are constructed when you import something else. This way the one who imports can control all names, not the one who declares them. You can also rename symbols when you import.

@mihaibudiu
Contributor

Perhaps I should not really call them "namespaces". What happens is that you give a new name to something when you import it. You can, for example, add a prefix to all names that you import.

@mihaibudiu
Contributor

import * from file.p4 into X

All top-level declarations from file.p4 would now be renamed with the prefix X. (or something similar).
I think that this is the core of what we need.
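A minimal sketch of that renaming step, written in Python only to pin down the idea. The import_into function, the dictionary representation of a file's top-level declarations, and the declaration names are all hypothetical, and rejecting a duplicate qualified name is my own assumption about what should happen.

```python
def import_into(declarations, prefix, scope):
    """Copy every top-level declaration into `scope` under `prefix.name`."""
    for name, value in declarations.items():
        qualified = f"{prefix}.{name}"
        if qualified in scope:
            raise NameError(f"{qualified} is already defined")
        scope[qualified] = value
    return scope

# Two files that both declare parse_ipv4; the values are placeholders.
file_a = {"parse_ipv4": "parser from file_a.p4", "MAX_HOPS": 64}
file_b = {"parse_ipv4": "parser from file_b.p4"}

scope = {}
import_into(file_a, "X", scope)   # like: import * from file_a.p4 into X
import_into(file_b, "Y", scope)   # like: import * from file_b.p4 into Y

print(sorted(scope))
```

Since the importer picks X and Y, the two parse_ipv4 declarations never collide, and a collision can only arise if the importer reuses a prefix.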

The hard part is keeping all the existing constructs #define, #if, #include. I don't think we can get rid of them. But we have to blend them with the modules somehow.

@jafingerhut
Collaborator Author

So my proposal with namespace <name> { ... } is intended not to be C++ namespaces, but simply a way to declare multiple namespaces in one file (or none). That way, no statements in the P4_16 language would refer to file names (#include directives still can, but if we think of them as outside the P4_16 language proper, and part of the preprocessor, then they are not in P4_16).

The main reason I ask is whether someone has the idea that "a P4_16 module system" has some other features they want to come with it, e.g. module interfaces, contracts, etc.

If not, i.e. if the primary feature we want from it is to let people define whatever names they want in their own namespace, and anyone else declaring that same name in another namespace can be independent of that, then I think we get something useful out of it.

There is still the issue that namespace names themselves can conflict. The best solution I know of for that is Java's package naming convention, where people are suggested to create globally unique namespace names using reversed DNS names, e.g. org.p4.psa.v1 or com.cisco.csa
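The convention is mechanical enough to write down. Here is a tiny sketch, where reverse_dns_namespace is a made-up helper and the domains are simply the ones implied by the two example names above.

```python
def reverse_dns_namespace(domain, *path):
    """Build a namespace name from a domain the author controls, plus sub-names."""
    labels = domain.lower().split(".")
    return ".".join(list(reversed(labels)) + list(path))

psa_namespace = reverse_dns_namespace("p4.org", "psa", "v1")
csa_namespace = reverse_dns_namespace("cisco.com", "csa")

print(psa_namespace)
print(csa_namespace)
```

Uniqueness here rests entirely on the convention that authors only use domains they control; nothing in the language would enforce it.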

@mihaibudiu
Contributor

With the "import" solution in Python, names will never conflict, because you import them with any name you like.

@jafingerhut
Collaborator Author

If you try to copy Python's import features exactly, then you can make names conflict, but the important thing is that as long as you avoid implementing Python's from package_foo import *, all such conflicts are self-inflicted and easy to detect at the point of the import statements.

Python import package1: requires names inside of package1 to be qualified with package1. prefix after the import, so no unexpected name conflicts possible. The primary down side is that if package names are long, then so are the prefixes you must use in your code.

Python import package1 as X: requires names inside of package1 to be qualified with X. prefix after the import, so no unexpected name conflicts possible. I believe this is the Python syntax equivalent of the behavior you describe for import * from file.p4 into X. It is good for allowing package/module/namespace names to be long, and thus easily globally unique, but the 'alias' names X can be short, or whatever the P4 programmer wants them to be.

Python's from package1 import name1, name2, name3 makes name1, name2 and name3 from inside of package1 accessible as those names, without a prefix. Thus importing the same name1 like that from two different packages probably leaves name1 referring to the one from the last import statement. The good news is that the repeat of name1 in your source code makes that much easier to detect locally, without needing to know anything about the complete set of names defined in those packages. Note: As you have pointed out, this option could be considered an unnecessary luxury. I only mention it because it is still safe as far as conflicting names go, because the explicit occurrence of the imported names in the statement makes conflicts or redefinitions easy to spot at the import site.

The one I would definitely anti-recommend from Python is from package1 import *. Why? Because it works like the from package1 import name1, name2, name3 above, except that it imports every name from package1 so that it can be used locally without a prefix. This is just as bad for future possible name conflicts, after your program changes, or after package1 changes in its future versions, as if you did not have modules/packages/namespaces at all. It also makes it difficult in one's program to determine where names were originally defined.
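The "last import wins" behavior for explicit imports is easy to confirm in Python itself. The sketch below uses two standard-library modules that both happen to define join; first_join, second_join and rebinding_happened are names chosen for this illustration, and shlex.join needs Python 3.8 or later.

```python
from posixpath import join   # join(*paths) builds a POSIX path string
first_join = join

from shlex import join       # join(tokens) builds a shell-quoted command line
second_join = join

import posixpath
import shlex

# The unprefixed name now refers to the most recent import only.
rebinding_happened = join is shlex.join and join is not posixpath.join

print(first_join("a", "b"))
print(second_join(["echo", "hello world"]))
print(rebinding_happened)
```

Both import lines spell out join, so a reader or a simple lint check can spot the redefinition without knowing anything else about either module. With from package1 import * the same rebinding would happen with no textual trace.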

@mihaibudiu
Contributor

We should also have something like from package import X as X1, which imports a symbol and renames it.
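Python already spells this as from package import X as X1, so the intended behavior can be shown directly. A small sketch, with path_join and shell_join being importer-chosen names and the two standard-library modules picked only because both define a function called join:

```python
# Two different functions that are both called join in their home modules.
from posixpath import join as path_join
from shlex import join as shell_join

path = path_join("usr", "lib")
command = shell_join(["ls", "my dir"])

print(path)
print(command)
```

Both symbols stay usable without a prefix, and the importer rather than the package author decides what each one is called.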

@jnfoster changed the title from "P4_16 Module system" to "Namespaces and imports" on Jun 11, 2021
@jnfoster
Collaborator

In the interest of tidying up the set of active issues on the P4 specification repository, I'm marking this as "stalled" and closing it. Of course, we can always re-open it in the future if there is interest in resurrecting it.
