Skip to content

Compiler Feature Requests

Anna Kornfeld Simpson edited this page Jul 26, 2018 · 2 revisions

Compiler Feature Requests

This document contains a prioritized list of feature requests for the compiler with some example code snippets. By definition, it won't mention all the features that worked smoothly or all the typos found via the compiler's extra checking.

Unsigned Integer Comparisons

One of the most common types of bounds-related warnings looks like the following example:

free(example_string);

**warning: cannot prove argument meets declared bounds for 1st parameter**
expected argument bounds are bounds(example_string, example_string + 0)
inferred bounds are bounds(example_string, example_string + example_len).

example_len is of type size_t, which is an unsigned type, thus it must always be >= 0. It would be very helpful if the compiler could prove the bounds satisfied without requiring any additional dynamic annotation.

I have found nearly 10 instances of the precise example above with size_t typed string lengths not found to be >= 0.

Another example - for the following code:

int is_decimal(const char *string : itype(_Nt_array_ptr<const char>) count(length), size_t length) {
if (length > 2 && !strncmp(string, "-0", 2) && string[2] != '.') ...
}

the compiler produces the following warning about bounds:

warning: cannot prove argument meets declared bounds for 1st parameter  [-Wcheck-bounds-decls-checked-scope]
                            strncmp(string, "-0", 2)
note: (expanded) expected argument bounds are 'bounds(string, string + 0)'
note: (expanded) inferred bounds are 'bounds(string, string + length)'

The clause to the left of the strncmp call establishes that the unsigned length > 2, so it must be greater than 0.

Arithmetic and Multiplicative Identities

Many calls to malloc for strings will have arguments of the form (length * (size_t)1) which is how the compiler simplifies (length * sizeof(char)). In order for the bounds checks to go through without warning, all bounds declarations for such a malloc'd string must also have the * (size_t)1, which makes the code harder to read. If the compiler could simplify identity functions for multiplication (and perhaps addition as well), it would mean less work for a developer converting code.

Reasoning about invariants from callers of a method (via where clauses)

Several bounds statements cannot be proved only from information inside the current function. Instead they require examining logic at the call site. The following code has two examples of this, pointed out in the comments.

size_t string_length = strlen(*unprocessed);
  // This function has only one caller: process_string. The second argument is malloc'd to
  // have len + 1 of the first argument in that function. Therefore assigning this as bounds
  // is brittle but correct.
  _Nt_array_ptr<char> processed_ptr : count(string_length + 1) = _Dynamic_bounds_cast<_Nt_array_ptr<char>>(*processed, count(string_length + 1));
_Nt_array_ptr<const char> unprocessed_ptr = *unprocessed;
parse_succeeded = parse_utf16_hex(unprocessed_ptr, &cp);
  if (!parse_succeeded) {
    return JSONFailure;
  }
  // Note: If parse_succeeded, that means there are at least 4 elements in unprocessed.
  // By the logic of this function's only caller, processed is the same size as unprocessed.
  _Dynamic_check(string_length >= 4); 
  
  if (cp < 0x80) {
    processed_ptr[0] = (char)cp; /* 0xxxxxxx */
  } else if (cp < 0x800) {
    processed_ptr[0] = ((cp >> 6) & 0x1F) | 0xC0; /* 110xxxxx */
    processed_ptr[1] = ((cp)      & 0x3F) | 0x80; /* 10xxxxxx */
    processed_ptr += 1;
    ...

In this case we have a length we can use to express an invariant and check it dynamically. In other cases, we may be dealing with items that do not have a known length. To express invariants such as those requires the where clauses from the checkedc spec.