Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More questions on setInvalid #2323

Open
fruffy opened this issue Apr 17, 2020 · 19 comments
Open

More questions on setInvalid #2323

fruffy opened this issue Apr 17, 2020 · 19 comments
Labels
enhancement This topic discusses an improvement to existing compiler code.

Comments

@fruffy
Copy link
Collaborator

fruffy commented Apr 17, 2020

Hello.
I have another clarification question on setInvalid, this is a follow-up to #2212.
This issue is quite esoteric but has given me some trouble recently.

So for a program like this:

control ingress(inout Headers h, inout Meta m, inout standard_metadata_t sm) {
    apply {
        h.h.setInvalid();
        h.h.a = 16w2;
        h.eth_hdr.eth_type = h.h.a * h.eth_hdr.eth_type;
    }
}

the LocalCopyPropagation pass collapses the values into

control ingress(inout Headers h, inout Meta m, inout standard_metadata_t sm) {
    apply {
        h.h.setInvalid();
        h.eth_hdr.eth_type = 16w2 * h.eth_hdr.eth_type;
    }
}

h.eth_hdr is a valid header and its value is now always 16w2 * h.eth_hdr.eth_type after the pass instead of being undefined. However, since the alternative is undefined behaviour does it matter? The only case I can think of where this may affect the output is something like

control ingress(inout Headers h, inout Meta m, inout standard_metadata_t sm) {
    apply {
        h.h.setInvalid();
        h.h.a = 1;
        h.eth_hdr.src_addr = h.h.a;
        if (h.eth_hdr.src_addr != 1) {
            h.h.setValid();
            h.h.a = 1;
        }
    }
}

which is eventually turned into

control ingress(inout Headers h, inout Meta m, inout standard_metadata_t sm) {
    apply {
        h.h.setInvalid();
        h.h.a = 48w1;
        h.eth_hdr.src_addr = 48w1;
    }
}

because the assumption is that h.eth_hdr.src_addr is always 1.
There may be undefined cases where the inequality is true, however. Or can this be treated as a case where all bets are off?

strange_set_valid.p4.txt
strange_set_valid.stf.txt

@fruffy fruffy changed the title More questions on setInvalid() More questions on setInvalid Apr 17, 2020
@jnfoster
Copy link
Contributor

I agree this is unsound.

It is only safe to propagate an equality h.f = n if h is valid. So the analysis needs to also track that...

@jafingerhut
Copy link
Contributor

When an input program has some undefined values/behavior involved, is it a goal of a compiler to "preserve" as much of the undefined values/behavior as exist in the input program?

I would have thought that it is acceptable for a compiler to eliminate some of the undefined behavior, as long as the resulting possible behaviors of the output program are a subset of the possible behaviors of the input program.

At least, it seems from the examples in the original comment that the behaviors of the compiler's resulting program are a subset of the behaviors of the input program.

@jnfoster
Copy link
Contributor

jnfoster commented Apr 17, 2020

In general, you're right. End-to-end, the entire compiler is free to generate an implementation that is a refinement the semantics of the input program. And there are lots of places in the spec where we give significant freedom to architectures and specific implementation to make decisions -- e.g., about what happens with reads and writes to invalid headers.

On the other hand, it seems problematic for the front-end of the p4c reference compiler to pick a specific semantics for reads and writes to invalid headers. For example, suppose I want to build a P4 implementation in which hdr.setInvalid() and hdr.setValid() are realized by zeroing out all fields. If the front-end has the behavior shown above, then I cannot it to build my compiler -- it will produce an incorrect IR, as this example shows, and there is no way for me to recover the original program...

@mihaibudiu
Copy link
Contributor

As long as this behavior is one of the "undefined" ones mandated by the spec I think it's correct.
But we should probably give warnings for writes to invalid header fields.

@mihaibudiu mihaibudiu added the enhancement This topic discusses an improvement to existing compiler code. label Apr 17, 2020
@jnfoster
Copy link
Contributor

Sure. The spec says that reading a field in an undefined header can return any value, and even a different value from one read to the next.

I wrote unsound above and maybe that's too strong. If I were a backend compiler writer, I might expect the front-end to not tie my hands. Anyway, hopefully we agree it will be difficult-to-impossible for any compilers built on p4c to make any promises to programmers on the more specific behaviors they might ensure unless those promises are "I make no promises; anything goes" or "I promise to do the same thing as the p4c front-end"

@fruffy
Copy link
Collaborator Author

fruffy commented Apr 18, 2020

Thanks! So, to summarize that I get the semantics right.

When setInvalid is called all the values of the header in question are now undefined. However, it is still possible to write and read from an invalid location and the compiler optimizes or transforms the program accordingly.

On the other hand, what happens when `setValid´ is called? I am seeing two cases:

  1. What I observe from simple_switch is that the header is reallocated, so setValid
    effectively erases all values written after the setInvalid call and the header is zeroed out. Is this considered a backend-specific operation?
  2. What I observe from the compiler front-end is that, when setValid is called on the header, no previous assignment to the invalid header is removed. The implicit assumption is that the reactivated header may contain values that were written after the setInvalid call. This means it is up to the programmer or the backend developer to make sure the values after the setValid call are consistent. After all, setValid and setInvalid may be no-ops.

Which case is more in line with the spec? Case 2?

@jnfoster
Copy link
Contributor

jnfoster commented Apr 18, 2020

The spec says that reads to a field in an invalid header or where the field has not yet been initialized may return an arbitrary value.

Writing to an invalid header may not change any of the defined state in the program -- e.g., making invalid headers valid or changing fields in valid headers.

So both semantics are in line with the spec. The front-end is optimizing with a particular semantics in various places.

@fruffy
Copy link
Collaborator Author

fruffy commented Apr 18, 2020

Okay, thank you, that makes sense. I will go with the first option then.

@jafingerhut
Copy link
Contributor

If simple_switch zeroes out all fields on a call to setValid, that is definitely behavior that is specific to simple_switch, and not required by the language specification. It seems like perfectly reasonable behavior for simple_switch to work this way, but it definitely is not trying to be "undefined as much as the P4 language specification allows", or anything like that.

If p4c could guarantee that a header was invalid at the time of an assignment to one of its fields, it seems perfectly reasonable to me to (a) warn the user that this is a no-op according to the language spec, and on some non-spec-conforming implementations may do much worse things, and (b) eliminate the assignment.

One can easily write a P4 program containing assignments to header fields where it is impossible to determine at compile time whether the header is valid at that time, or not, because that valid/invalid status can only be determined at run time (e.g. the control flow before that time could have branches where one leaves the header valid, and other branches that leave the header invalid, and the condition of the branch is a run-time input to the program, e.g. another field of an input packet, or the contents of a control-plane-written table entry).

For an assignment where it cannot be determined at compile time whether the header is valid or invalid (or that the compiler did not implement a complete enough check to determine it, even if there was enough information at compile time to do so), it isn't clear to me what a P4 compiler front end is allowed to do as far as transformations or optimizations, if you want to maximally preserve undefined behavior. Maybe no such transformations should be allowed, if that is the goal?

I am not clear what you mean in your point #2 "it is up to the programmer or the backend developer to make sure the values after the setValid call are consistent".

@fruffy
Copy link
Collaborator Author

fruffy commented Apr 18, 2020

In essence, I had a little bit of trouble consolidating the transformations in the front-end with target-specific interpretations of setInvalid or setValid.

For example, looking at this sequence of operations:

control ingress(inout Headers h, inout Meta m, inout standard_metadata_t sm) {
    apply {
        h.h.a = 8w1;
        h.h.setInvalid();
        h.h.c = 8w1;
        h.h.setValid();
        h.h.b = 8w2;
    }
}

the outcome may either be

  1. (a=1, b=2, c=1), when setInvalid or setValid are no-ops
  2. (a=1, b=2, c=undef), if assignments on a invalid header are void
  3. (a=undef, b=2, c=1), if setInvalid sets header values as undefined
  4. (a=undef, b=2, c=undef), if setValid sets header values as uninitialized

The front-end assumes the third case and removes all assignments preceding setInvalid but leaves subsequent assignments. Simple switch on the other hand relies on the fourth case and also resets the header completely with a setValid call.

I just had to pick semantics that are consistent across each compiler pass and do not produce false positives, which rules out interpretation 1 and 2. Because I would also like to perform automatic test case generation for bmv2, I can not use interpretation 3, which leaves interpretation 4 (also okay as @jnfoster said).
I just wanted to make sure I understand the meaning of these two functions correctly, hope this explanation makes sense.

@hesingh
Copy link
Contributor

hesingh commented Apr 18, 2020

The frontend is assuming "soon as a field in an invalid header is written to, the header becomes valid". I think this is incorrect because unless all fields of the header are written to or initialized to zero, the header should not change to valid. Should the spec make such a clarification?

@jafingerhut
Copy link
Contributor

jafingerhut commented Apr 18, 2020

According to the spec, interpretation #4 is correct, I believe, but as a reason I would not give "setValid sets header values as uninitialized". Why not? If a header is already valid, and all of its fields have defined values, e.g. {a=1, b=2, c=3}, then doing setValid again leaves the header unchanged. It does not make the fields become undefined.

Rather, I would say that initially all headers have undefined field values, and have valid=false. If you ever do setInvalid on them, then all fields become undefined and valid becomes false. Doing setValid on a header makes its valid bit true, and leaves all fields in their current state.

bmv2 simple_switch does not represent "undefined" as a specific value. One could imagine a modified version of simple_switch that did have an explicit "undefined" value for a field, but that isn't what it does today. It seems the best one can do is if you have some kind of validation tool that expects a field value to be undefined, do not do any kind of checking on its bmv2 simple_switch value at all, since it might be anything.

@hesingh
Copy link
Contributor

hesingh commented Apr 18, 2020

Regarding a backend for an asic, no value should be undefined for hardware. I have worked with the MIPS Tensilica compiler for a router asic where the compiler initializes any C code var, if not initialized, to zero. Why doesn't the bmv2 simple_switch backend mimic the asic hardware which does not allow any un-initiazlied var?

@jafingerhut
Copy link
Contributor

jafingerhut commented Apr 18, 2020

@hesingh In the context of the language specification, a header value being undefined does not mean that there must be some special magical "undefined" value that is different than the other values for a field with type bit<8>, for example. In a hardware implementation, and most software implementations, a field with type bit<8> always has one of 2^8 possible values when you read it.

In the context of the spec, undefined means the implementation can return any of those 256 possible values when you read it, and you cannot predict which one, unless you know something about the specific implementation that you work with. An implementation is allowed to return a pseudo-random value each time you read an undefined value, or always 0, or the value you most dread it will read.

@hesingh
Copy link
Contributor

hesingh commented Apr 18, 2020

@jafingerhut I see, thanks for the clarification.

@fruffy
Copy link
Collaborator Author

fruffy commented Apr 18, 2020

@jafingerhut
Yes, the way you described it is essentially the model I have right now! I was a bit imprecise there. If an outcome is undefined, I use don't-care bits for the expected output. As you said, I do not think there is a better way to handle it.
For example:

header ethernet_t {
    bit<48> dst_addr;
    bit<48> src_addr;
    bit<16> eth_type;
}

header H {
    bit<8> a;
    bit<8> b;
    bit<8> c;
}

struct Headers {
    ethernet_t eth_hdr;
    H h;
}
control ingress(inout Headers h, inout Meta m, inout standard_metadata_t sm) {
    apply {
        h.h.a = 8w1;
        h.h.setInvalid();
        h.h.c = 8w1;
        h.h.setValid();
        h.h.b = 8w2;
        h.h.setValid();
    }
}

will generate the following stf test:

packet 0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
expect 0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ** 02 **

@antoninbas
Copy link
Member

A note of explanation on the simple_switch behavior. p4c translates setValid calls into calls to the add_header primitive action, which is implemented by simple_switch. This primitive action comes from P4_14, and the specification for it reads as follows:

The indicated header instance is set valid. If the header instance was invalid, all its fields are initialized to 0, and if the header instance was already valid, it is not changed.

Ideally, we should have a set_valid in the bmv2 core primitive actions, which could implement a different behavior if needed. We could even use it for simple_switch and deprecate add_header altogether, but the compiler would need to generate the zero assignments when translating from P4_14 to P4_16 (which I do not think it does today - @mbudiu-vmw) .

@jafingerhut
Copy link
Contributor

@fruffy Is it reasonable to close this issue now?

@fruffy
Copy link
Collaborator Author

fruffy commented Nov 24, 2022

I left this open because of Antonin's comment but I do not think we will ever implement this primitive. So this can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement This topic discusses an improvement to existing compiler code.
Projects
None yet
Development

No branches or pull requests

6 participants