Draft module system #798

Closed
wants to merge 2 commits into from

mihaibudiu
Contributor

No description provided.

@mihaibudiu
Contributor Author

This is a simple start; we can expand it if you like the approach.
For example, we could add selective imports, exporting of imports, etc. But this is the core proposal; if this mechanism works, the rest will be embellishments.

of the `import` statement is to include all the definitions in the
imported module in place of the `import` statement. An import is
followed by an optional path separated by slashes and by a file name.
Only *relative* paths are allowed.
Contributor

@ChrisDodd ChrisDodd Dec 11, 2019

Relative to what? What exactly does an imported module inherit from the context where it is imported? Command-line arguments? Anything else?

How do nested imports work? If I have `import A into X` and A contains `import B into Y`, does B end up in `Y` or `X.Y`?

Contributor Author

The path is relative in that it cannot start with `/`.
It will be resolved relative to the specified search paths (probably given using command-line arguments).

Nested imports are not visible outside the importer, so the problem does not arise at all.
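To make the lookup rule concrete, here is a minimal Python sketch of how a compiler might resolve an import path under the rule described above: reject paths that start with `/`, then try each configured search directory in order. The function name `resolve_import`, the example directories, and the first-match-wins order are assumptions for illustration, not part of the draft.

```python
import os
import posixpath
from typing import Callable, Iterable


def resolve_import(path: str,
                   search_dirs: Iterable[str],
                   exists: Callable[[str], bool] = os.path.isfile) -> str:
    """Hypothetical resolver: map a relative import path to a file."""
    # The draft only allows relative paths, i.e. paths that do not start with "/".
    if path.startswith("/"):
        raise ValueError(f"import path must be relative: {path!r}")
    # Assumed rule: try the search directories in order; the first match wins.
    for directory in search_dirs:
        candidate = posixpath.join(directory, path)
        if exists(candidate):
            return candidate
    raise FileNotFoundError(f"cannot find module {path!r}")


# Usage with a fake file system (a set of existing paths) instead of real files.
files = {"/opt/p4/lib/net/ipv4.p4"}
print(resolve_import("net/ipv4.p4", ["/usr/p4", "/opt/p4/lib"], files.__contains__))
# /opt/p4/lib/net/ipv4.p4
```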


If the top-level P4 program imports module `a`, which in turn imports
module `b`, none of the declarations in module `b` are visible at the
top level; they are only visible in the `a` module itself.
Contributor

It might make sense to have `private import`, combining the two keywords?

Contributor Author

I was thinking of the converse: all imports are private, and if you want to re-export something, we can add an `export` keyword that lets you move something imported into your own namespace and export it again. Then we don't have to say what happens for transitive imports.
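A small Python model of the non-transitive rule in the draft: if the top-level program imports `a` and `a` imports `b`, the top level sees `a`'s declarations but not `b`'s. The `Module` class and `visible_names` function are hypothetical names used only for this sketch, which ignores `private` and namespaces.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    decls: set[str] = field(default_factory=set)      # names declared in this module
    imports: list[str] = field(default_factory=list)  # modules imported directly


def visible_names(mod: Module, modules: dict[str, Module]) -> set[str]:
    # A module sees its own declarations...
    names = set(mod.decls)
    # ...plus the declarations of the modules it imports directly. The imported
    # modules' own imports are not followed: visibility is not transitive.
    for imported in mod.imports:
        names |= modules[imported].decls
    return names


modules = {
    "b": Module("b", {"B_t"}),
    "a": Module("a", {"A_t"}, ["b"]),
    "top": Module("top", {"main"}, ["a"]),
}
print(sorted(visible_names(modules["top"], modules)))  # ['A_t', 'main']
print(sorted(visible_names(modules["a"], modules)))    # ['A_t', 'B_t']
```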

A declaration in a module can be prefixed with the `private` keyword.
This will cause the declaration not to be visible in the importing
programs; however, the declaration is still accessible in the imported
code.
Contributor

Maybe everything should be private by default: add a `public` or `export` keyword and get rid of `private`.

Contributor Author

I thought about that; the downside is that people may need to modify the libraries they have written so far for `#include`, adding `export` to each declaration. So I thought that public by default is simpler. But for a clean design I would prefer private by default.
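To compare the two defaults discussed here, a Python sketch of which names an importer would see. `private` is the draft's keyword; the `public` modifier stands in for the `public`/`export` keyword suggested above and, like the names `Decl` and `names_exported`, is hypothetical. With `public_by_default=False`, the unmarked `Parser_t` disappears from the exported set, which is the migration cost for existing `#include` libraries mentioned in this comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decl:
    name: str
    modifier: Optional[str] = None  # "private", "public", or None if unmarked


def names_inside(decls: list[Decl]) -> set[str]:
    # Code inside the module can use every declaration, private or not.
    return {d.name for d in decls}


def names_exported(decls: list[Decl], public_by_default: bool = True) -> set[str]:
    # Importers see explicitly public declarations, plus unmarked ones when
    # declarations are public by default (the draft's choice).
    return {d.name for d in decls
            if d.modifier == "public" or (d.modifier is None and public_by_default)}


lib = [Decl("Parser_t"), Decl("api", "public"), Decl("helper", "private")]
print(sorted(names_inside(lib)))                             # ['Parser_t', 'api', 'helper']
print(sorted(names_exported(lib)))                           # ['Parser_t', 'api']
print(sorted(names_exported(lib, public_by_default=False)))  # ['api']
```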

a *new* preprocessor is applied to its source file (no preprocessor
definitions are inherited from previous preprocessor invocations). If
a module is imported multiple times it is preprocessed anew each time.
Modules offer a functionality similar to `#include` directives:
Contributor

@ChrisDodd ChrisDodd Dec 11, 2019

What if module A imports B and module B imports A? Is that allowed?

What if multiple modules A, B, and C all import X? Is it permissible to parse X once and import it into all of them, or do we have to reparse it each time (getting different values for macros like `__TIME__`)?

Contributor

What if a module instantiates some extern, and the module is then imported into multiple other modules? Does that result in a separate instance for every import, or just one instance that is shared?

Contributor Author

Technically, the spec does not promise any CPP macros except the ones defined by the user without arguments, so `__TIME__` is not guaranteed to work even with just the preprocessor. It would be nice to get rid of the preprocessor; then these questions would be much easier to answer, because there would be no macros in modules, only in the preprocessor.

But I kept the preprocessor for three reasons:

  • backwards compatibility
  • more expressivity in the preprocessor, some of which may be useful
  • we don't have a replacement for #ifdef/#else

The module system is not really designed for performance, so I personally don't mind parsing X three times. We should produce the most natural result users would expect. Circular imports should probably be reported as an error.
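The draft's rule that each import is run through a *new* preprocessor can be illustrated with a toy Python preprocessor that understands only `#define NAME`, `#ifdef`, `#else` and `#endif` (a real `cpp` does far more). Each call starts from a fresh macro table, so a `#define` in the importing file cannot change what the module sees. Whether command-line definitions should still reach modules is an open question earlier in this thread; passing them in through `cmdline_defines` is an assumption, as is the function name.

```python
def preprocess(lines: list[str], cmdline_defines: frozenset = frozenset()) -> list[str]:
    """Toy preprocessor: handles only #define NAME, #ifdef NAME, #else, #endif."""
    defines = set(cmdline_defines)  # fresh macro table for every invocation
    active = [True]                 # stack of "is this #ifdef branch kept?" flags
    out = []
    for line in lines:
        tok = line.split()
        directive = tok[0] if tok else ""
        if directive == "#define":
            if all(active):
                defines.add(tok[1])
        elif directive == "#ifdef":
            active.append(tok[1] in defines)
        elif directive == "#else":
            active[-1] = not active[-1]
        elif directive == "#endif":
            active.pop()
        elif all(active):
            out.append(line)
    return out


module_src = ["#ifdef USE_V6", "header ipv6_t {}", "#else", "header ipv4_t {}", "#endif"]
importer_src = ["#define USE_V6", "// the import of module_src would go here"]

preprocess(importer_src)       # defines USE_V6, but only within this invocation
print(preprocess(module_src))  # ['header ipv4_t {}']: the importer's USE_V6 is not seen
print(preprocess(module_src, frozenset({"USE_V6"})))  # ['header ipv6_t {}']
```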

This could indeed be a problem if two libraries each have a private module named "a" that they import. With the import scheme proposed in this draft, the second library would break the first one. This suggests that import pathnames should be searched starting with the "current importing directory" rather than the global import path.
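The collision described above, and the suggested fix of searching the importing module's own directory first, can be sketched in Python as follows. The file layout (`/lib1`, `/lib2`) and the function name `resolve_from_importer` are hypothetical; with only the global path, `main2.p4` would silently pick up `/lib1/a.p4`.

```python
import posixpath
from typing import Callable, Iterable


def resolve_from_importer(path: str, importer_file: str,
                          search_dirs: Iterable[str],
                          exists: Callable[[str], bool]) -> str:
    """Hypothetical rule: look next to the importing file before the global path."""
    local = posixpath.join(posixpath.dirname(importer_file), path)
    if exists(local):
        return local
    for directory in search_dirs:  # global search path, in order
        candidate = posixpath.join(directory, path)
        if exists(candidate):
            return candidate
    raise FileNotFoundError(path)


# Two libraries, each with its own private helper module named a.p4.
files = {"/lib1/a.p4", "/lib1/main1.p4", "/lib2/a.p4", "/lib2/main2.p4"}
global_path = ["/lib1", "/lib2"]
print(resolve_from_importer("a.p4", "/lib1/main1.p4", global_path, files.__contains__))  # /lib1/a.p4
print(resolve_from_importer("a.p4", "/lib2/main2.p4", global_path, files.__contains__))  # /lib2/a.p4
```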

Contributor Author

Yes, the extern is a trickier question. This is only important for top-level externs, really.
The extern will not be visible from the outside, so none of its methods can ever be called by user code. I am tempted to say that in this case the behavior is platform-dependent. I don't know if there are good uses for this capability.

@jafingerhut
Collaborator

jafingerhut commented Dec 12, 2019

What happens if you do `import filepath1` followed by `import filepath2`, and the name `foo` is defined and exported/public/whatever in both filepath1 and filepath2? Does the one in filepath2 shadow the one in filepath1? Is it an error?

Either way, it seems to me that `import filepath1` is like Python's `from modulename1 import *`, which leads to all of the issues of conflicting names in independently written modules that a module system can help prevent. Did you consider simply leaving out such an option completely, leaving only something like `import filepath1 into namespace1`?

A similar problem can happen if you do `import filepath1` and it defines something named `foo`, and you later define something named `foo` in your own P4 code. Is that an error? A warning? Does the later one shadow the earlier one? Again, it seems to me a reason to consider disallowing `import filepath1`, and these questions go away.

@mihaibudiu
Contributor Author

Duplicate declarations will be flagged the same way as they are within a single file.
The current shadowing rules will continue to apply: you can shadow something in a different scope, but in the same scope you cannot have ambiguous declarations (i.e., functions can be overloaded, but not many other things).
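A minimal Python sketch of that rule, treating the names brought in by a no-namespace import as if they had been declared in the same scope as the importer's own. It assumes, purely for illustration, that two functions with the same name form a legal overload when their parameter counts differ; any other same-named pair in one scope is reported as a duplicate. `Decl` and `check_scope` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decl:
    name: str
    kind: str                    # e.g. "function", "struct", "control"
    arity: Optional[int] = None  # number of parameters; only used for functions


def check_scope(decls: list[Decl]) -> list[str]:
    """Report conflicting declarations that end up in one scope."""
    errors: list[str] = []
    seen: dict[str, list[Decl]] = {}
    for d in decls:
        for prev in seen.get(d.name, []):
            # Assumed rule: same-named functions with different arity are overloads.
            is_overload = d.kind == prev.kind == "function" and d.arity != prev.arity
            if not is_overload:
                errors.append(f"duplicate declaration of {d.name!r}")
                break
        seen.setdefault(d.name, []).append(d)
    return errors


# Declarations from a no-namespace import, followed by the importer's own.
imported = [Decl("foo", "function", 1), Decl("Hdr_t", "struct")]
local = [Decl("foo", "function", 2), Decl("Hdr_t", "struct")]
print(check_scope(imported + local))  # ["duplicate declaration of 'Hdr_t'"]
```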

@jafingerhut
Collaborator

For questions that are brought up like "should top level things declared in a module be public by default, or private by default?", it seems worth having some kind of list of advantages and disadvantages of each approach. Such a document need not be part of the spec, but having it as a record of why certain decisions are made seems useful for future reference, especially if some of those advantages/disadvantages are subtle. They are also easy to forget across discussions one month or further apart.

Other language design questions that seem to be in this same category, to me:

"Should import of B from A, and import of C from B, cause import of things in C into A?" That is, should imports be "transitive" or not?

Should import be allowed for match_kind? If yes, should a match_kind named "foo" in module B, after being imported into module A, be referred to as "B.foo", i.e. a table key field declaration would look like "hdr.ethernet.etherType : B.foo;" ?

Should import be allowed for error declarations? If yes, how should they be named in the importing code? For example, if error "foo" is declared in module B, and A imports it, should A refer to it via the name "error.B.foo"? "B.error.foo"? Something else?

Should import be allowed for enum declarations? If yes, and A imports module B containing "enum MyEnum_t { A, B }", should A refer to the possible member values as "B.MyEnum_t.A", or something else?

Should import be allowed for types declared via `type`? Via `typedef`? For `struct` type names? Should they all be referred to using "B.name_of_type" in the importing module?

Should it be allowed to have the same module imported at multiple places in the "import graph"? For example, should it be allowed to have a top level program A such that A imports B and C, B imports module D, and C also imports module D? Why or why not?

Should top level instantiations be allowed in a module? If yes, and if the answer to the previous question is "yes", should there be two instantiations constructed for such a module that is imported at multiple places in the import graph, or one? If one, what should its name be in the control plane API? Should such instantiations be allowed for all things that can be instantiated, or only some of them? I am not sure, but a complete list might be controls, parsers, packages, and extern objects. My first reaction is that it seems significantly simpler to not support top level instantiations in an imported module. I also am not imagining a good use case where allowing it is much more useful than not supporting it.

@jafingerhut
Collaborator

I have attempted to collect together all of the questions mentioned in earlier comments into one Google Doc here: https://docs.google.com/document/d/1W1JnecoTCcgPjSObb7ZaCvPTubO5Vmbalain7NROSNo/edit?usp=sharing

If you have any trouble viewing or editing it, let me know and I can try to correct that. People interested in this issue are welcome to edit it, preferably by adding more questions, or adding lists of advantages/disadvantages they can think of for different possible answers to the questions.

@jafingerhut
Collaborator

I could have taken better notes on some of these during the 2020-Mar-02 language design work group meeting, but I have recorded answers to various questions in the Google Doc at https://docs.google.com/document/d/1W1JnecoTCcgPjSObb7ZaCvPTubO5Vmbalain7NROSNo, and you can quickly find them by searching for the string "2020-Mar-02". Here is a brief summary, labeling each answer with the "Q" label from the Google Doc:

Q0: Do allow no-namespace imports as well as namespace imports. No-namespace imports seem especially useful for a possible future module like 'core'.

Q1: Modules should be able to import other modules, and in the initial design it seems reasonable for importing of names never to be 'transitive', i.e. if module B imports C, and A imports B, A will never see names from C.

Q2: Do not permit cycles in the graph of import relationships in the initial design, because of technical difficulties that might arise, and because there is no known strong use case for supporting them.

Q3: In the graph of import relationships, it should definitely be possible to reach a module X via multiple paths from the 'root program'. Disallowing this would be a draconian restriction, especially for a module like 'core'.

Q4: I do not know an answer for this one.

Q5: It is certainly simplest in an initial module system design to disallow top level instantiations inside of a module. It seems like something that could be added later if there were a use case for it and the questions in the Google Doc were answered to people's satisfaction.

Q6: There was some support for, and no objections that I recall to, making top level entities declared inside of a module public by default. A module developer can explicitly mark each top level declaration with a new 'private' keyword to make the name invisible to code that imports the module.

Q7: Use 'my_alias.entity_name' to refer to any top level name imported from a module via a namespace import, e.g. 'import filepath/to/mymodule.p4 into my_alias;', including match_kind values, type names, etc. One wrinkle is for error declarations in such a module, which at the moment seem most reasonable to refer to using the syntax 'error.my_alias.SomeParserErrorName'.

Q8: No known tricky cases for using @name annotations inside of modules, but I would not be surprised if someone directing their attention in that direction could discover some.
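Q2 and Q3 together mean that the import graph must be acyclic but may contain "diamonds", as in the A, B, C, D example earlier in the thread. A Python sketch of that check, using a standard depth-first search that reports a cycle when it reaches a module still on the current path. The graph representation and the name `import_order` are assumptions; whether a module reached along several paths is parsed once or once per path is a separate open question, and this only checks the shape of the graph.

```python
def import_order(graph: dict[str, list[str]], root: str) -> list[str]:
    """Return modules reachable from `root`, each listed after the modules it imports.

    Raises ValueError if the import graph contains a cycle (Q2). A module reached
    along several paths, such as D below, is allowed (Q3) and listed once.
    """
    order: list[str] = []
    state: dict[str, str] = {}  # "visiting" while on the DFS path, then "done"

    def visit(mod: str, path: list[str]) -> None:
        if state.get(mod) == "done":
            return  # already reached through another import path
        if state.get(mod) == "visiting":
            raise ValueError("import cycle: " + " -> ".join(path + [mod]))
        state[mod] = "visiting"
        for dep in graph.get(mod, []):
            visit(dep, path + [mod])
        state[mod] = "done"
        order.append(mod)

    visit(root, [])
    return order


diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
print(import_order(diamond, "A"))  # ['D', 'B', 'C', 'A']

try:
    import_order({"A": ["B"], "B": ["A"]}, "A")
except ValueError as err:
    print(err)  # import cycle: A -> B -> A
```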

@jafingerhut
Collaborator

Mihai asked during the last LDWG meeting if I could make additional commits on this PR. Does anyone know how I can do that? I guess one way is if I had permission to push commits to Mihai's clone of the p4-spec git repo?

@jafingerhut
Collaborator

@jnfoster Sorry, not trying to pick on you, but during the last LDWG meeting, when I asked whether there ought to be an option for a P4 module developer to declare "it is an error to import this module using a no-namespace import", you mentioned that in another language (probably OCaml?) handling name conflicts was considered the burden of a module's users, not of the module's owner.

I understand that point of view, but I do not see how to reconcile it with the amount of conversation we have in the LDWG meeting when adding a new function/extern/etc. name to core, e.g. minSizeInBytes instead of sizeof, or making things look like method calls on an instance (e.g. .miss) in order to avoid name conflicts. I know we should not be trying to create name conflicts, and 'core' might be an exception to the "users are responsible for handling conflicts when they upgrade" common practice in OCaml.

If we say that P4 core should be very careful about introducing new names, but it is OK for PSA or v1model architectures, if they were later written as P4 modules, to have the attitude that users are responsible for handling name conflicts, then can you (or anyone?) explain what advantage this proposed P4 module system gives over just using #include? I am guessing someone has in mind some advantages that I haven't thought of yet in my Google Doc, because the ones I have written there seem pretty thin for the work this will require.

@jnfoster
Collaborator

jnfoster commented Mar 5, 2020

First, one is not obliged to use the no-namespace import; you can use it when it is safe and convenient. If there are clashes, then you write

```
import MyMegaModule as M
```

which avoids clobbering an existing definition of `x`, but does mean you have to write `M.x` to refer to the imported value. If the compiler also warned when definitions were shadowed, as it does now, then I think this could work quite reasonably.
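The difference between a no-namespace import and `import MyMegaModule as M` can be sketched in Python by adding a module's exported names to a scope dictionary. With an alias, names become `M.x` and error members become `error.M.x`, following the Q7 naming recorded earlier in the thread. Reporting clobbered names as warnings rather than errors follows the suggestion in this comment and is an assumption, as are all function and parameter names.

```python
from typing import Optional


def import_names(scope: dict[str, str], module: str,
                 exported: dict[str, str],
                 alias: Optional[str] = None) -> list[str]:
    """Hypothetical sketch: add a module's exported names to `scope`.

    `scope` maps each visible name to the module that defined it;
    `exported` maps each exported name to its kind ("type", "error", ...).
    alias=None models a no-namespace import. Returns shadowing warnings.
    """
    prefix = "" if alias is None else alias + "."
    warnings = []
    for name, kind in sorted(exported.items()):
        # Error members are written error.<name> or error.<alias>.<name> (Q7).
        key = f"error.{prefix}{name}" if kind == "error" else f"{prefix}{name}"
        if key in scope:
            warnings.append(f"{key} from {module} shadows an existing definition")
        scope[key] = module  # the imported entry replaces (clobbers) any old one
    return warnings


scope1 = {"x": "main.p4"}
print(import_names(scope1, "MyMegaModule", {"x": "type", "y": "match_kind"}))
# ['x from MyMegaModule shadows an existing definition']

scope2 = {"x": "main.p4"}
print(import_names(scope2, "MyMegaModule", {"x": "type", "Oops": "error"}, alias="M"))  # []
print(sorted(scope2))  # ['M.x', 'error.M.Oops', 'x']
```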

Second, besides the benefits for programmers of being able to more flexibly split things into files and manage name spaces, I think it will be a win for compilers and tools to be able to see the actual compilation units instead of the giant string produced by cpp. I've personally written P4 tools that do unholy things to try to recover the compilation unit structure of a program for various reasons.

Having said all that, I'm not adamantly opposed to the idea that a module author could require it to be imported with a name. But it does seem a bit like a nanny-state programming construct. I'd sort of prefer to give the programmers tools for importing modules and opening or including their definitions in the current scope that they can use to manage name conflicts.

@jafingerhut
Collaborator

Thanks for the answer. I have added to the Google Doc a section near the beginning "Advantages of proposed module system" that attempts to quickly summarize the advantages of the proposal. Feel free to comment on that, or add more. I have attempted to describe the ones you mention in your comment, near the end of that new section.

@mihaibudiu
Contributor Author

@jafingerhut: it seems that it's enough to check out my branch:

```shell
git fetch origin
git checkout -b modules origin/modules
```

These are the instructions shown at the bottom of the PR, under "see command-line instructions".
Then you just commit something, and when you push, it shows up in the list of commits.

@vgurevich
Contributor

@jafingerhut -- it might be useful to describe more specific goals and non-goals of the system.

For example, I remember we discussed previously that the main goal was to be able to easily define "derivative" architectures, was it not? If that is still the goal (meaning that people will not be writing modules in the normal course of development), then stating it would make it easier to evaluate the proposal.

If the goal is to allow people to get rid of #include in their own programs, then some answers (e.g. to the question about @name annotation) might be different.

If the goal is to allow independent developers to create modules, reusable by others, then it is a completely different project and a lot more features will be required.

@jafingerhut
Collaborator

jafingerhut commented Mar 6, 2020

@vgurevich Can you give a couple of examples of the many more features you would expect to be required if the goal is to allow independent developers to create modules? Yes, I am fishing for ideas here :-) It sounds like you must have some, at least in outline form, if you mention "a lot more".

It might be the goal of some people to eliminate #include. Me personally, I'd be happy to provide a better alternative. I have added a section of what I consider to be advantages of import over #include in the Google doc, for my own clarity of thinking on the issue, if no one else's, because I wasn't clear on what they were without writing it. Developers will decide whether they think it is better or not, for their own personal use, and #include will still be there.

Food for thought: no one can pry the C preprocessor away from you if you want to use it. If worst comes to worst, you run it on your own, independently of the dev tools of whatever language you are using, and then strip the remaining #line directives before feeding the result to the compiler.

@vgurevich
Contributor

vgurevich commented Mar 6, 2020

@jafingerhut

Here are some of the typical things I expect would be needed for a reusable module, which mostly fall in the category of "amend something" or "fill in the blanks":

Ability to add new fields to the existing structs or header_unions. Imagine a typical module (L4) defining ICMP, IGMP, TCP and UDP and providing a header_union for those. Now, you want to add your own L4 headers to that header_union, while preserving its name, so that the other components can still use it. Same thing with structs, e.g. a simple program defines hdr struct with ethernet, then imports the L3 module and gets this struct amended with IPv4/IPv6 headers, etc, etc. (you might also want to keep it flat as a separate feature).

Ability to add new transitions to a select() statement. This is necessary to extend a generic parser.

Ability to define empty (default) controls and parsers (or parser states) in the module to be seamlessly overridden (shadowed) by the user-defined parsers/controls/etc. with the same names

Ability to create table derivatives, by taking a table from a module and adding a new key field, a new action, an additional extern (counter, meter, etc.), etc.

Ability to create a derivative action (with the same name), but with extra parameters (or just with extra statements inside).

I am sure there is more, but these are minimal things without which reusable modules can't be combined.

@jafingerhut
Collaborator

jafingerhut commented Mar 6, 2020

Looking at that list (thanks for that), I see where you are coming from there. My first reaction is that I could spend 2 years full time trying to define and specify precisely language features that would enable that, and it would still be missing half of the features you would wish for, and make the language spec about 3 times longer. I am probably being a little bit pessimistic there, but not by more than 1 year :-)

What would you point at, existing in other languages, that is comparable?

For example, languages with higher-order functions or function pointers let you 'fill in the blanks' when calling another function, by calling a function/procedure/whatever A, and one or more of the parameters is a function/closure/function-pointer B that will be called at documented places by A. The closest we have to that in P4_16 is defining a control A that takes a compile-time constructor parameter of another control B, then when instantiating A, pass it a constructor parameter of whatever compatible control B' that you want A to use.

One might be able to extend that idea to passing not only a control at constructor time, but also an action, or a function.

@vgurevich
Contributor

@jafingerhut ,

I understand that this might be a lot, but that's precisely why I asked what is the desired use case :)

I think that if we look at the other languages, real code reusability comes with typical OO approaches. For example, the ability to amend a struct or a header_union is pretty much the same as object inheritance. Having namespaces allows the module to refer to the parent and then define the new struct/header_union with the same name (since it will be in the new namespace). Ability to amend an enum (including a serializable one) is also a very useful feature in that respect. We do have it, but only for the error type.

You described other methods, although more at the implementation level. In many cases, a typical implementation in P4 is just a text transformation, nothing more. Currently, a number of these things can be achieved via preprocessor hacks, but that usually results in code that is not very readable, and if so, what's the point?

I'd be happy to work more on these features or we can split the problem and go only for (1) and maybe (2), while leaving (3) out of scope for now.

@jnfoster
Collaborator

jnfoster commented Mar 6, 2020

P4 is not an object-oriented language though, so it's not clear that borrowing ideas about code reuse from that paradigm is best.

I think we should not slow down the current effort, even if it only accomplishes the ability to split a P4 program into separate compilation units (i.e., files).

@vgurevich
Contributor

@jnfoster -- as I said, it would probably be easier if we define the scope explicitly, which is what prompted the discussion.

Speaking of limiting the scope, splitting the code into multiple files and making a file a compilation unit are two very different things. I believe that the first one is easily achievable, but the second one is much more difficult (unless I misunderstand what a compilation unit is).

Again, all these goals make sense only when we have clear use cases. I think we discussed some of them above.

@jafingerhut
Collaborator

I think one way to summarize the goals of the current proposal is: "provide fine grained controls of what names are visible from an imported module". It is as little as that, and as much as that.

@jafingerhut
Collaborator

I have added a section "Scope of this proposed module system" to the Google doc, linked again here for convenience: https://docs.google.com/document/d/1W1JnecoTCcgPjSObb7ZaCvPTubO5Vmbalain7NROSNo

@jafingerhut
Collaborator

I have been adding a few more notes/questions as I think of them to the Google doc, still in the same place: https://docs.google.com/document/d/1W1JnecoTCcgPjSObb7ZaCvPTubO5Vmbalain7NROSNo

Q1 through Q8 were written before the 2020-Mar-02 LDWG meeting. Q9 through Q14 were written after that meeting.

@hesingh
Contributor

hesingh commented Nov 28, 2022

Quoting @vgurevich's list above of features needed for reusable modules (adding fields to existing structs and header_unions, adding transitions to select() statements, default controls and parsers that user code can override, table derivatives, and derivative actions):

My company has already addressed such modularity using two new P4 keywords, `override` and `super`. See the "Quick testing..." section at these links: https://github.com/hesingh/mnkcg/tree/master/p4-ansible and https://github.com/hesingh/mnkcg/blob/master/p4-code-reuse/ansible.md

@hesingh
Contributor

hesingh commented Nov 28, 2022

If name mangling is used, there is no conflict between included code and code in a new P4 program. Right now, with P4-16, I can define several controls in file1.p4 and call them in file2.p4 just fine, because file2.p4 includes file1.p4. So why bother putting P4 parsers and controls in a namespace?

@mihaibudiu
Contributor Author

What @hesingh describes are various constructs for supporting modular programming, and they are very useful, but they are not part of what the module system is designed to provide. The module system is just a way to divide code into multiple files. Extension mechanisms are independent of the module system, and can be used with or without one.

@hesingh
Contributor

hesingh commented Nov 28, 2022

As a P4 programmer, if I #include a file in my P4 program, I can directly use a struct or header from the included file, without the dot notation I would need when using a module. I know Python's import has a flavor that lets me use an object as is, but if A includes B and B includes C, how does A access an object in C without dot notation?

Regarding "how to test if two modules are the same", why not compute an MD5 checksum of the module's content?
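A minimal Python version of that suggestion, using `hashlib` from the standard library; the function names are hypothetical. Identical bytes do not necessarily mean identical modules after preprocessing, since the draft preprocesses each import anew, and MD5 is fine for spotting accidental duplicates but is not collision resistant, so a direct byte comparison (or `hashlib.sha256`) is safer if adversarial inputs matter.

```python
import hashlib


def module_fingerprint(source: bytes) -> str:
    """Hex MD5 digest of a module's raw file contents."""
    return hashlib.md5(source).hexdigest()


def same_module(a: bytes, b: bytes) -> bool:
    # Treat two files as "the same module" when their contents hash identically.
    return module_fingerprint(a) == module_fingerprint(b)


print(module_fingerprint(b""))  # d41d8cd98f00b204e9800998ecf8427e
print(same_module(b"header h_t { bit<8> f; }", b"header h_t { bit<8> f; }"))  # True
```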

@hesingh
Contributor

hesingh commented Nov 30, 2022


@mbudiu-vmw I was replying to @vgurevich's comment, which asked for reusable modules and use cases with incremental merging. Those use cases are solved by the P4-Ansible design. The design doesn't care whether two P4 programs or two modules are merged.

@hesingh
Contributor

hesingh commented Nov 30, 2022

If a user does not use the new module feature, the user should be allowed to use the old #include. I suspect the reason C++ did not do as good a job as Python's module import is that C++ wanted to keep supporting the #include inherited from C. This could be why C++ relies on namespaces to support modularity. p4c could do the same in a proprietary implementation if the community does not want to support namespaces in p4c.

@jnfoster
Collaborator

In the interest of tidying up the set of active issues on the P4 specification repository, I'm marking this as "stalled" and closing it. Of course, we can always re-open it in the future if there is interest in resurrecting it.

6 participants