Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preprocessing does not handle #define correctly #646

Open
hanw opened this issue May 19, 2017 · 28 comments
Open

Preprocessing does not handle #define correctly #646

hanw opened this issue May 19, 2017 · 28 comments
Labels
bug This behavior is unintended and should be fixed. core Topics concerning the core segments of the compiler (frontend, midend, parser)

Comments

@hanw
Copy link
Contributor

hanw commented May 19, 2017

I wrote a test case based on psa.p4 and ran into a preprocessing issue.

Initially, the test case defines the unspecified typedefs in psa.p4 at the top of the test program.
(psa.p4 is at https://github.com/p4lang/p4c/blob/master/p4include/psa.p4)

#define PSA_CORE_TYPES

typedef bit<5> PortId_t;
typedef bit<5> MulticastGroup_t;
typedef bit<16> PacketLength_t;
typedef bit<16> EgressInstance_t;
typedef bit<11> InstanceType_t;
typedef bit<9>  ParserStatus_t;
typedef bit<9>  ParserErrorLocation_t;
typedef bit<12> entry_key;

#include "psa.p4"

... rest of the p4 code ...

Running the test case gives the following error.

>> p4/testdata/p4_16_samples/psa_package.p4.test
./p4test --pp ./tmpG4GxQe/psa_package.p4 --dump ./tmpG4GxQe --top4 FrontEndLast,FrontEndDump,MidEndLast --testJson ../testdata/p4_16_samples/psa_package.p4
./p4test -I. --pp ./tmpG4GxQe/psa_package.p4-x --p4-16 ./tmpG4GxQe/psa_package.p4
/home/hwang/p4c/build/p4include/psa.p4(29):syntax error, unexpected IDENTIFIER, expecting INTEGER
typedef bit<unspecified
            ^^^^^^^^^^^
error: 1 errors encountered, aborting compilation

However, if I define the typedefs in a separate file, say psa_config.p4 and include both psa_config.p4 and psa.p4 in the test case. The error does not happen.

#include "psa_config.p4"
#include "psa.p4"

... rest of the p4 code ...
@sethfowler
Copy link
Contributor

That's quite bizarre considering we just invoke the C preprocessor.

If you invoke the preprocessor manually and look at the output, maybe that will help track down the issue.

@hanw
Copy link
Contributor Author

hanw commented May 20, 2017

Oh well, I lied, it seems neither is working.

./p4test --pp test.p4 ../testdata/p4_16_samples/psa_package.p4 generates the following code:

typedef bit<5> PortId_t;
typedef bit<5> MulticastGroup_t;
typedef bit<16> PacketLength_t;
typedef bit<16> EgressInstance_t;
typedef bit<11> InstanceType_t;
typedef bit<9> ParserStatus_t;
typedef bit<9> ParserErrorLocation_t;
typedef bit<12> entry_key;
#include <core.p4>
#include <psa.p4>

...

The #define is missing.

@ChrisDodd
Copy link
Contributor

The preprocessor removes all defines. It looks like the problem is that the P4 output code does not output the define, so the resulting code that relies on it in psa.h won't work properly the second time through. The P4 output code special-cases stuff in p4include, outputting an #include directive instead of any definitions that come from the file. see toP4.cpp around line 150

@ChrisDodd
Copy link
Contributor

As a workaround, as long as the header with the #define and the definitions is a 'system' file (in the p4include_path), it should be recognized and replaced with an #include directive of the original file, thus preserving the #define...

@mihaibudiu
Copy link
Contributor

Another solution is to change the way tests are done for these files.

@mihaibudiu
Copy link
Contributor

Alternatively, we can add a compiler flag so that the compiler does not generate #include statements anymore, it just dumps the internal P4 representation. Maybe that's the right solution.

@mihaibudiu mihaibudiu added the bug This behavior is unintended and should be fixed. label Aug 17, 2017
@hesingh
Copy link
Contributor

hesingh commented Feb 19, 2019

Alternatively, we can add a compiler flag so that the compiler does not generate #include statements anymore, it just dumps the internal P4 representation. Maybe that's the right solution.

Scoping out the work with a little deeper dive because I think this is where I am leaning toward.

Adding a new compiler flag is less than one day of work.
Changing toP4.cpp functionality to not generate #include should be quick.
Then, it's a matter of finding the top of the IR tree for the include file and dumping that tree. Not to hard to do.

@mihaibudiu
Copy link
Contributor

Top4 already has a boolean flag to avoid includes.

@hesingh
Copy link
Contributor

hesingh commented Feb 24, 2019

@mbudiu-vmw First you said "we can add a compiler flag so that the compiler does not generate #include statements". Subsequently, you said above that toP4 already includes a boolean flag to avoid includes - is this an internal flag using p4includePath? I am trying to understand what externally-visible flag exists in p4c for the user to enable the following code changes? I see the "-v" flag that dumps IR - is this what you meant? If none exists, we need to add a new flag.

diff --git a/frontends/p4/toP4/toP4.cpp b/frontends/p4/toP4/toP4.cpp
index 21a037b4..3101f112 100644
--- a/frontends/p4/toP4/toP4.cpp
+++ b/frontends/p4/toP4/toP4.cpp
@@ -152,37 +152,6 @@ bool ToP4::preorder(const IR::P4Program* program) {
 -   bool first = true;   
     for (auto a : program->objects) {
-        // Check where this declaration originates
-        cstring sourceFile = ifSystemFile(a);
-        if (!a->is<IR::Type_Error>() &&  // errors can come from multiple files
-            sourceFile != nullptr) {
-            /* FIXME -- when including a user header file (sourceFile != mainFile), do we want
-             * to emit an #include of it or not?  Probably not when translating from P4-14, as
-             * that would create a P4-16 file that tries to include a P4-14 header.  Unless we
-             * want to allow converting headers independently (is that even possible?).  For now
-             * we ignore mainFile and don't emit #includes for any non-system header */
-
-            if (includesEmitted.find(sourceFile) == includesEmitted.end()) {
-                builder.append("#include ");
-                if (sourceFile.startsWith(p4includePath)) {
-                    const char *p = sourceFile.c_str() + strlen(p4includePath);
-                    if (*p == '/') p++;
-                    builder.append("<");
-                    builder.append(p);
-                    builder.appendLine(">");
-                } else {
-                    builder.append("\"");
-                    builder.append(sourceFile);
-                    builder.appendLine("\"");
-                }
-                includesEmitted.emplace(sourceFile);
-            }
-            first = false;
-            continue;
-        }
-        if (!first)
-            builder.newline();
-        first = false;
         visit(a);
     }
     if (!program->objects.empty())
@@ -641,9 +610,6 @@ bool ToP4::preorder(const IR::Type_Error* d) {
     dump(1);
     bool first = true;
     for (auto a : *d->getDeclarations()) {
-        if (ifSystemFile(a->getNode()))
-            // only print if not from a system file
-            continue;
         if (!first) {
             builder.append(",\n");
         } else {

@ChrisDodd
Copy link
Contributor

Currently there's no specific flag -- ifSystemFile checks to see if the file comes from a directory in the p4includePath (for system headers). I think there's a way of changing/clearing that (so no files would be identified as system files), however that seems undesirable -- we WANT to omit system header files from the dumped code as they are lots of (uninteresting) cruft that doesn't depend on the source file. The problem in this bug seems to be that #defines are not dumped in the dumped code (as they are stripped out by the preprocessor), so the dumped .p4 code cannot be directly recompiled?

@ChrisDodd
Copy link
Contributor

That said, adding a flag to disable the generation of #include directives in the dumped code would be trivial -- just need tests of that flag in these two toP4::preorder methods.

@hanw
Copy link
Contributor Author

hanw commented Feb 25, 2019

Another issue is we have midend pass, e.g, elimNewType, that transforms both user program AND system header files. For example, elimNewType replaces newtype in psa.p4 with a typedef. If we omit system header file from the dumped code, we will ends up with transformed user program + not transformed system header file, which makes the resulting dumped code incorrect. Currently, any dumped code after elimNewType pass in the midend is incorrect and cannot be compiled, which is pretty much all passes in midend, because elimNewType is at the very beginning of the midend.

@jafingerhut
Copy link
Contributor

jafingerhut commented Feb 25, 2019

Does the original problem disappear if we use the latest psa.p4 #include file checked into p4include/psa.p4 as is, without doing any #define's that affect any #ifdef statements inside of psa.p4? I believe the latest version should do what we want for the BMv2 back end, at least, without having to have any code before the #include <psa.p4> line

EDIT: You can probably disregard my question here. If I understand correctly from the rest of the discussion, a problem is that p4c's mid-end is replacing type declarations with typedef in the user program, but not in the psa.p4 #include file, and this leads to incorrect P4 code after that substitution, if you continue #include'ing the original psa.p4 source file.

The C preprocessor doesn't let you do this kind of "preserve the #include directives" after running, for pretty good reason, yes? I know it is a lot of cruft to wade through, but it does produce #line directives with file names and line numbers, that can let you search for where in the post-preprocessed source that the lines are from original source file foo.p4 instead of bar.p4.

@mihaibudiu
Copy link
Contributor

As @hanw says,the problem is that during compilation the compiler transforms the code included, so when dumping it should dump the transformed code. If you include the original code it doesn't typecheck anymore. The toP4 pass already has a flag to skip the includes; we just need to expose it from the command-line. Then, when compiling PSA programs, we will not dump includes in the intermediate passes. Hopefully that will be enough.

We don't need to do this for v1model, because none of the transformations affect it much (e.g., it does not have any type declarations).

@ChrisDodd
Copy link
Contributor

If you don't replace things with #includes, it makes the resulting dumped code mostly useless -- it could only be recompiled for the exact same target and is not useful for any other target with the same architecture. So If we want to be able to use the compiler as a P4_14->16 translator in any useful way, we must preserve it.

Note that when we replace things from the system header files with other definitions, those other definitions will not be in the system header file, so will still be dumped. So to make the resulting dumped code recompilable, we need to make sure that if any definitions from the original system file remain, other definitions that have been replaced will not conflict with their replacements (eg, the replacement process should involve some renaming to avoid name conflicts). Even then, it will still only work with the identical target (and likely the identical compiler version).

@mihaibudiu
Copy link
Contributor

It depends when you dump the code. If you dump the code early in the front-end most likely it will still work with includes.

Currently the tests dump the code after the mid-end, and there the code already has target-dependent features. You can dump code even in the back-end, and this you certainly don't expect to be target-independent.

We are just exposing a knob to the user to be able to do it both ways.

The way we decide whether something is part of the system header is based on line numbers, which are often preserved when things are changed. So that's why some changed objects do not show up in the dumps. Moreover, sometimes we may change the structure of a type but not its name; if we dump both versions of the type we will have conflicts in the generated code.

The only promise we made in the compiler is that when using the --pp flag you get a compilable p4-16 program. We never promised that the result of a dump in the middle will be a correct legal program which is target-independent.

@jafingerhut
Copy link
Contributor

Perhaps there could be a compiler pass that "really expands #include's", meaning that #include statements will no longer appear in the intermediate dumped representations from that point and later, and this pass should be run somewhere before a later compiler pass would break if it has not yet been done?

e.g. ReallyExpandIncludes pass before elimNewType pass?

@ChrisDodd
Copy link
Contributor

ChrisDodd commented Feb 25, 2019

#includes never appear in the IR anywhere. We're just reconstructing them based on the source code locations in the code when dumping p4 code. So there's nothing for ReallyExpandIncludes to do.

For the most part, dumping midend (or backend) code is for debugging the compiler, so we don't care too much if the result is compilable. Having it be compact and understandable is more important, so using #includes rather than dumping lots of cruft is still preferable (IMO)

@mihaibudiu
Copy link
Contributor

Yes, includes are removed very early by our preprocessor. We just use some heuristics to reverse-engineer the includes and put them back in the dumped code. One good reason is that the included code is in many tests very large, and it would blow up the size of the intermediate files, and also make it harder to read them for humans.

@jafingerhut
Copy link
Contributor

So change how elimNewType pass works?

Remove it, because it is incompatible with re-inserting #include of psa.p4?

Something else?

@mihaibudiu
Copy link
Contributor

No, we just do not dump includes, we dump the whole code.

@ChrisDodd
Copy link
Contributor

Something else?

do nothing? I don't actually see any problem here.

@jafingerhut
Copy link
Contributor

Backing up a second, is the only problem here that the intermediate dumped P4_16 source code gives errors if you attempt to feed it as input to the compiler?

If that is the only problem, then yeah, it seems like doing nothing is an option.

@mihaibudiu
Copy link
Contributor

The python testing scrips compile the intermediate code. So we have to change these scripts for psa.

@hesingh
Copy link
Contributor

hesingh commented Feb 25, 2019

Mihai stole my thought. Yes, this is what I was suggesting but got interrupted. Just for the 4-5 PSA 1.1 P4 programs, we change the python scripts. The two steps included below for another test are the ones to skip for PSA 1.1 P4 programs. Thanks all for replying.

/p4test -I. --pp ./tmpd58kLB/psa-test.p4-x --std p4-16 ./tmpd58kLB/psa-test.p4
Exit code  0
./p4test ./tmpd58kLB/psa-test-midend.p4
./p4test -I. --pp ./tmpd58kLB/psa-test-midend.p4-x --std p4-16 ./tmpd58kLB/psa-test-midend.p4

@hesingh
Copy link
Contributor

hesingh commented Feb 25, 2019

This code change changes the test for 4 files and retains the test for others. The file list is created in run-p4-sample.py. Maybe, one would like to provide this list from elsewhere to this script or is the change ok? We can make the same change for any other backend such as bmv2.

diff --git a/backends/p4test/run-p4-sample.py b/backends/p4test/run-p4-sample.py
index c3e9cd8a..1bd99897 100755
--- a/backends/p4test/run-p4-sample.py
+++ b/backends/p4test/run-p4-sample.py
@@ -171,6 +171,11 @@ def file_name(tmpfolder, base, suffix, ext):
 def process_file(options, argv):
     assert isinstance(options, Options)
 
+    psa11_list = ["psa-example-digest-bmv2.p4",
+                  "p4rt_digest_complex.p4",
+                  "psa-example-register2-bmv2.p4",
+                  "psa-example-counters-bmv2.p4"]
+
     tmpdir = tempfile.mkdtemp(dir=".")
     basename = os.path.basename(options.p4filename)
     base, ext = os.path.splitext(basename)
@@ -277,7 +282,7 @@ def process_file(options, argv):
         result = check_generated_files(options, tmpdir, expected_dirname);
     if (result == SUCCESS) and (not expected_error):
         result = recompile_file(options, ppfile, False)
-    if (result == SUCCESS) and (not expected_error) and (lastFile is not None):
+    if (result == SUCCESS) and (not expected_error) and (lastFile is not None) and (not basename in psa11_list):
         # Unfortunately compilation and pretty-printing of lastFile is
         # not idempotent: For example a constant such as 8s128 is
         # converted by the compiler to -8s128.

@ChrisDodd
Copy link
Contributor

Maybe base it on the includes in the .p4 file rather than having a fixed list? See the getArch code here that does that already...

@hesingh
Copy link
Contributor

hesingh commented Feb 25, 2019

Excellent suggestion, Chris! Changed the code and testing passed with psa and non-psa P4 program.

diff --git a/backends/p4test/run-p4-sample.py b/backends/p4test/run-p4-sample.py
index c3e9cd8a..ed759a95 100755
--- a/backends/p4test/run-p4-sample.py
+++ b/backends/p4test/run-p4-sample.py
@@ -277,7 +277,7 @@ def process_file(options, argv):
         result = check_generated_files(options, tmpdir, expected_dirname);
     if (result == SUCCESS) and (not expected_error):
         result = recompile_file(options, ppfile, False)
-    if (result == SUCCESS) and (not expected_error) and (lastFile is not None):
+    if (result == SUCCESS) and (not expected_error) and (lastFile is not None) and (arch is not "psa"):
         # Unfortunately compilation and pretty-printing of lastFile is
         # not idempotent: For example a constant such as 8s128 is
         # converted by the compiler to -8s128.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This behavior is unintended and should be fixed. core Topics concerning the core segments of the compiler (frontend, midend, parser)
Projects
None yet
Development

No branches or pull requests

7 participants