Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PSA] Restrictions on use of Checksum extern? #385

Closed
jafingerhut opened this issue Aug 16, 2017 · 18 comments
Closed

[PSA] Restrictions on use of Checksum extern? #385

jafingerhut opened this issue Aug 16, 2017 · 18 comments

Comments

@jafingerhut
Copy link
Collaborator

The current draft of the Checksum extern has clear(), update(), remove(), and get() methods proposed.

Some externs explicitly maintain state across processing of different packets. There doesn't appear to be any desirable reason why the Checksum extern should. Is it considered that every time a new packet begins processing, this extern has automatically had clear() called on it already? Or is it required that for each new packet you first call clear(), because if you do not, it might have internal checksum state left over from a previous packet?

If we restrict our attention to the 16-bit one's complement sum used for IPv4 header checksums for a moment, the primary, and maybe only (?), use case for this extern, a few more questions:

Is it considered an error to call update() or remove() with a collection of data that is anything other than a multiple of 16 bits?

Is it expected that if a list of fields is given, that their order is significant, because the extern effectively takes the list of fields, concatenates them, breaks it up into consecutive multiples of 16 bits long, and does the one's complement sum on these 16-bit pieces?

If all of that is true, then an example showing how to correctly use this to first calculate an IPv4 header checksum, then modify only 1 byte, e.g. the TTL, then use remove() followed by update() to recalculate the new checksum with minimal work, would be good, including pointing out that the remove() and update() must both contain the "2-byte aligned" group of 16 bits in the header that contains the TTL field.

@jafingerhut
Copy link
Collaborator Author

Also, I am guessing that it is not practical to try to implement a checksum other than one like the IPv4 header checksum using the Checksum extern? That one is commutative and associative in how it is calculated, over 16-bit "words". Things like CRCs are not commutative, so implementing update() of say 20 bytes of data, then trying to remove() on some 4-byte piece in the middle, then update() to add back in a different 4-byte piece, would not be possible unless you knew exactly where that 4-byte piece occurred in the middle of the 20-byte original. Even if you augmented the remove() and update() methods to indicate such a position in the data, I'm not sure it is practical to implement such changes to a CRC with such an API.

@DGeorgeNetro
Copy link

I agree with @jafingerhut, having explicit incremental checksum updates is a really nice idea but they are only applicable to checksum algorithms. The penalty for recomputing UDP/TCP checksums is so high, that having a specific incremental checksum extern is worth while.

In P4-14 days it was a lot easier to detect when an incremental checksum update could be done and it was feasible to have the backend handle the updates. I don't think this that easy in P4-16.

@mihaibudiu
Copy link
Contributor

I expect this will be a difficult design decision. Some architectures may support incremental checksum computation, while other may not. Perhaps the right choice is to offer both and to require that at least one of them is supported.

#if INCREMENTAL 
extern IncrementalUnit { ... }
#else
extern ChecksumUnit { .. }
#endif

@mihaibudiu
Copy link
Contributor

BTW: there is nothing in P4-16 about checksum units; you can design a checksum unit which looks very much like the P4-14 version; the actual API is up to the architecture.

@jnfoster
Copy link
Collaborator

jnfoster commented Aug 16, 2017 via email

@mihaibudiu
Copy link
Contributor

If your programmable platform has only support for the "wrong" kind of checksum unit, then it won't ever be able to implement PSA.

@jnfoster
Copy link
Collaborator

I understand.

@jafingerhut
Copy link
Collaborator Author

Something to keep in mind about whatever checksum extern we end up with in PSA:

It would be nice if the checksum extern API made it reasoanble to implement 'fixing up' a TCP header checksum for packets experiencing changes to their header fields due to NAT. A reasonable common way to do this, I believe, is to assume that the TCP header checksum is correct, and 'subtract out' the effects of the old TCP pseudo-header (containing IP source and destination addresses), and then 'add in' the effects of the new TCP pseudo-header. The remove() extern method in the latest PSA draft was created with this use case in mind, I would bet.

@jnfoster
Copy link
Collaborator

jnfoster commented Sep 7, 2017

It seems like there are three main issues related to checksums.

  1. Should the PSA architecture expose separate control blocks for computing and verifying checksums?

The current PSA source in master includes the following declarations:

control ComputeChecksum<H, M>(inout H hdr, inout M user_meta);
package PSA_Switch<IH, IM, EH, EM>(IngressParser<IH, IM> ip,
                                   Ingress<IH, IM> ig,
                                   ComputeChecksum<IH, IM> ic,
                                   Deparser<IH> id,
                                   EgressParser<EH, EM> ep,
                                   Egress<EH, EM> eg,
                                   ComputeChecksum<EH, EM> ec,
                                   Deparser<EH> ed);

Note that in issue #360, we already decided to eliminate the first ComputeChecksum block and perform all checksum verification in the parse. As part of that chance, we will also extend the ParserError_t type to include data for indicating that checksum verification failed.

However, it's natural to wonder whether the ComputeChecksum controls should be retained or if that functionality should be folded into the Ingress and Egress controls. An argument for eliminating it is that the programmer can just use the checksum extern to compute the updated checksum in the Egress block. An argument against eliminating it is that some targets may only provide the checksum extern at the location in the pipeline just before the deparser.

  1. Assuming that checksum functionality is realized in an extern function, what should it look like?

The V1Model architecture recently proposed to adopt a new extern function (here specialized to 16-bits):

/**
Verifies the 16-bit checksum of the supplied data.
T must be a list expression where all the fields are bit-fields or varbits
and where the total dynamic length of the fields is a multiple of 16 bits.
If this method detects that a checksum of the data is not correct it
sets an internal error flag.
@param condition  If 'false' the verification always succeeds.
@param data       Data whose checksum is verified.
@param checksum   Expected checksum of the data.
*/
extern void verify_checksum<T>(in bool condition, in T data, in bit<16> checksum);
/**
Computes the 16-bit checksum of the supplied data.
@param condition  If 'false' the checksum is not changed
@param data       Data whose checksum is computed.
@param checksum   Checksum of the data.
T must be a list expression where all the fields are bit-fields or varbits
and where the total dynamic length of the fields is a multiple of 16 bits.
*/
extern void update_checksum<T>(in bool condition, in T data, inout bit<16> checksum);

There are a few issues with this definition we might consider in defining the PSA extern:

  • Should the verify_checksum function have an effect on global state (the "internal error flag" mentioned in the comment) or should it return a boolean value instead?
  • Given that P4_16 already has conditionals in several forms (e.g.,if-then-else, select, and switch), should we introduce yet another primitive that includes conditional functionality? Alternatively, if the main use of condition is to check header validity, would it be simpler to define the behavior on invalid headers as a no-op?

As a strawman, suppose that PSA provided the following function:

extern O checksum<O,T>(in T data, HashAlgorithm hash);

Rationale:

  • Since we have already relocated checksum verification it may be the case that the condition argument is not so useful -- as far as I know the most common use of this is to check for header validity, but in the parser any headers extracted will already be guaranteed to be valid.
  • The existing verify function provides a mechanism for error-handling in the parser. Can we use this?

For example, I could imagine writing the following code in the parser:

state parse_ipv4 {
  extract(hdrs.ipv4);
  verify(hdr.ipv4.checksum == checksum({ hdr.ipv4.src, ... }, HashAlgorithm.crc16 ), error.ChecksumError);
  transition ...
}

Note that if I didn't want to bail out of the parser, instead of using verify I could have instead OR'd the result of the boolean test with the appropriate bit in ParserError_t metadata.

In the same vein, I could imagine writing the following code in the egress control (or update checksum control if we keep that):

  if (hdr.ipv4.isValid()) 
    hdr.ipv4.checksum = checksum({hdr.ipv4.src, ...}, HashAlgorithm.crc16);

I realize this proposal slightly complicates compilation, but I would argue not in a particularly harmful way. A compiler for the V1Model functions already has to keep track of boolean conditions and references to state in parsed headers and metadata. It should be straightforward to rewrite code that uses (only) verify, conditionals, and the checksum extern into the V1Model functions and vice versa.

  1. Should the checksum functionality be realized in an object instead of a function to support incremental computation of checksums? And what should we require of PSA-compliant targets: support for incremental and batch mode checksums or just one?

Here's one proposal for a extern object:

extern checksum<O> {
    checksum(HashAlgorithm hash);
    void clear();
    void add<T>(in T data);
    void remove<T>(in T data);
    O get();
    bool verify(); // equivalent to get() == 0
}

Comments? @jafingerhut @cc10512 @mbudiu-vmw @hanw @antoninbas

@jafingerhut
Copy link
Collaborator Author

jafingerhut commented Sep 7, 2017

In the latest PSA draft, there are two ComputeChecksum blocks not because one is after parser and other before deparser, but because one is after ingress, and before ingress deparser (currently a separate control block in the PSA draft), and the other is after egress, and before egress deparser. There is currently no 'verifyChecksum' control block in the latest PSA draft, in favor of performing checksum verification of received packets in the parser.

@jafingerhut
Copy link
Collaborator Author

jafingerhut commented Sep 7, 2017

From some of the discussion on this p4lang/p4c issue p4lang/p4c#775 I had the impression that verify_checksum() and update_checksum() were introduced primarily to ease automatic conversion from P4_14 to P4_16 programs.

If that is so, perhaps those extern functions could be preferred for use by that automatic conversion process, but the checksum extern object at the end of @jnfoster's comment above [1], with both add() and remove() methods, might be preferable for use when writing P4_16 PSA programs by hand?

[1] #385 (comment)

@jnfoster
Copy link
Collaborator

jnfoster commented Sep 7, 2017

In the latest PSA draft, there are two ComputeChecksum blocks not because one is after parser and other before deparser, but because one is after ingress, and before ingress deparser (currently a separate control block in the PSA draft), and the other is after egress, and before egress deparser. There is currently no 'verifyChecksum' control block in the latest PSA draft, in favor of performing checksum verification of received packets in the parser.

That's my fault. I think I missed the meeting where that was discussed in detail and in my conversations this week, there was some confusion. I've updated my comment to reflect that.

@jnfoster
Copy link
Collaborator

jnfoster commented Sep 7, 2017

From some of the discussion on this p4lang/p4c issue p4lang/p4c#775 I had the impression that verify_checksum() and update_checksum() were introduced primarily to ease automatic conversion from P4_14 to P4_16 programs.

That's right.

If that is so, perhaps those extern functions could be preferred for use by that automatic conversion process, but the checksum extern object at the end of @jnfoster's comment above [1], with both add() and remove() methods, might be preferable for use when writing P4_16 PSA programs by hand?

It's fine with me. However, I'd be curious to hear whether we want to require this functionality -- e.g., are there targets that might provide an extern function for checksumming but not an incremental unit that could implement the object?

@mihaibudiu
Copy link
Contributor

@jafingerhut is right: the changes about *_checksum functions which were made recently were entirely confined to v1model and were made to more accurately reflect restrictions of the BMv2 simple_switch implementation of these functions. These restrictions are directly inherited from the P4-14 spec, so we cannot work around them.

If your target is very restricted in what it can do, it may be actually very difficult to write compiler transformations to massage all feasible programs into the restricted forms allowed by the target. I expect that the checksum unit will be a perfect example. You should consider what is needed to do this for an exotic target such as Tofino.

The discussion is whether PSA has to support incremental checksums. I believe it should. I don't know exactly what should happen when someone implements support for PSA on a target device which actually does not have incremental checksums.

@jnfoster
Copy link
Collaborator

jnfoster commented Sep 7, 2017

As an addendum, without thinking too carefully about resource implications for any particular target, I think I would be in favor of a solution that eliminated the ComputeChecksum block.

Suppose that we are on a target where all checksumming must happen after the ingress/egress pipelines, but the programmer has written a program that computes a checksum in the middle of one of the ingress/egress pipelines. It follows that the compiler will either have to implement the checksum computation using ALUs or other resources -- which might be expensive but should be feasible -- or it will have to relocate the checksum computation to the end of the block.

  • If the new checksum is assigned into some header or metadata, but is not actually read in the rest ingress/egress pipeline (this should be the common case), then I claim the compiler transformation to relocate the computation to the end of the pipeline is relatively simple. All the compiler has to do is store the values of the headers that were read to compute the checksum in some metadata. And as an optimization, any fields used in the checksum that are read but not modified later in the pipeline would not need to be stored -- in the limit, if none of the fields are modified, the entire checksum computation can be relocated without any extra state.

  • If the new checksum is read in the rest of the ingress/egress pipeline (which should be an uncommon case) then the compiler has to somehow implement the checksum using other hash externs or ALU operations. That might be expensive, and even prohibitive on certain targets, but it would be okay in my view for the compiler to reject programs that require too many resources.

This story is relatively simple for a batch-mode checksum function. It's more complicated for an object, which has state that might persist across calls. But, waving my hands slightly, it should be possible for the compiler to extract that state from the call site and reconstruct it later to emulate the semantics of the original program.

@jnfoster
Copy link
Collaborator

jnfoster commented Sep 7, 2017

@mbudiu-vmw

@jafingerhut is right: the changes about *_checksum functions which were made recently were entirely confined to v1model and were made to more accurately reflect restrictions of the BMv2 simple_switch implementation of these functions. These restrictions are directly inherited from the P4-14 spec, so we cannot work around them.

Right. Note that the vagaries of the P4_14 conversion need to have any effect on PSA. We are starting from a clean slate and can design an elegant solution.

If your target is very restricted in what it can do, it may be actually very difficult to write compiler transformations to massage all feasible programs into the restricted forms allowed by the target. I expect that the checksum unit will be a perfect example. You should consider what is needed to do this for an exotic target such as Tofino.

I think that for the common case, the transformation will be not too difficult to implement and will lead to the same efficient implementations as one would write with a ComputeChecksum block even on a target like Barefoot's Tofino.

@jafingerhut
Copy link
Collaborator Author

I believe that the parts of this issue related to the Internet checksum for IPv4, TCP, and UDP headers have been fully addressed by the addition of the InternetChecksum extern to the PSA, which supports incremental checksums and 'calculated from scratch' checksums, at least for headers (by design, it does not support performing checkums over packet payloads).

Unless there is some other aspect of this issue that remains, I think it can be closed now.

@cc10512
Copy link
Contributor

cc10512 commented Nov 6, 2017

Indeed, we have considered most aspects.

@cc10512 cc10512 closed this as completed Nov 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants