This document describes our C and C++ programming style. While it’s a good idea to conform to the project style, there may be exceptions where departing from the style produces more readable code.
In brief, we use C99 and C++11 (no RTTI or exceptions) on POSIX, lines are no more than 77 columns long, indentation is made with four spaces and curly brackets appear at the end of the opening line except for functions.
For general information about contributing to Plasma please see our contributors' documentation.
We follow a pattern in C that lets us emulate (poorly) the modules of languages such as Ada and Modula-3.
-
Every .c/.cpp file has a corresponding .h file with the same base name. For example, list.c and list.h. The exceptions are:
  -
  The alternative interpreters share the same header, pz_interp.h, but have different implementations.
  -
  Each interpreter’s implementation files begin with the same prefix, such as pz_generic_*.c for the generic interpreter’s files.
  -
  pz_main.cpp only exports main(), which needs no declaration.
  -
  pz_gc_layout.h provides the class declarations for the GC’s layout, while other files contain the implementation organised by function. This organisation groups functions with related behaviours, which makes more sense than grouping by class.
-
Not all .h files have a corresponding .c/.cpp file.
-
We consider the .c/.cpp file to be the module’s implementation and the .h file to be the module’s interface. We’ll just use the terms ‘source file’ and ‘header’. C++ templates are an exception since their implementation must be in a header file; these headers have special names ending in template.h.
-
All items exported from a source file must be declared in the header. Declarations for variables (although rare) must use the extern keyword, otherwise storage for the variable will be allocated in every source file that includes the header containing the variable definition.
-
All items not exported from a module must be declared static.
-
We import a module by including its header. Never give extern or forward declarations for imported functions in source files. Always include the header of the module instead. When C++ classes form cycles, forward declare one of the class names to break the cycle immediately before its use.
-
Each header must include any other headers on which it depends. Hence it’s imperative every header be protected against multiple inclusion. Also, take care to avoid circular dependencies where possible.
-
Always include system headers using the angle brackets syntax, rather than double quotes. That is, #include <stdio.h>. Plasma-specific headers should be included using the double quotes syntax. That is, #include "pz_run.h". Do not put root-relative or ‘..’-relative directories in #includes.
-
Includes should be organised into 4 groups, separated by a blank line: pz_common.h, system includes, this module’s header file, other Plasma includes. Each group should be sorted alphabetically where possible.
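The module rules above can be sketched roughly as follows. The pz_counter module, its variable and its functions are hypothetical names invented for this illustration, and the header and the source file are shown in a single listing only so that the sketch is self-contained.

```c
/* ---- pz_counter.h (hypothetical module interface) ---- */
#ifndef PZ_COUNTER_H
#define PZ_COUNTER_H

/* Exported variable: declared with extern, so no storage is allocated here. */
extern int pz_counter_step;

/* Exported function: advances the counter and returns its new value. */
int
pz_counter_next(void);

#endif // ! PZ_COUNTER_H

/* ---- pz_counter.c (hypothetical module implementation) ---- */

/* The single definition of the exported variable. */
int pz_counter_step = 1;

/* Not exported, so it is file-static. */
static int counter_value = 0;

/* Prototype for a local (file-static) function. */
static int
add_step(int value);

int
pz_counter_next(void)
{
    counter_value = add_step(counter_value);
    return counter_value;
}

static int
add_step(int value)
{
    return value + pz_counter_step;
}
```

A client would include pz_counter.h and call pz_counter_next(); it would never declare either symbol itself.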
C/C++ language source and header files should begin with the prefix pz_.
The C language does not have a namespace concept; prefixing C symbols with pz_ can make linking, and debugging linked programs, easier. In C++ use the pz namespace.
Sometimes a file (header or source file) will cover multiple concepts. In
these cases the order above may be broken in order to keep things with the
same concept together. For example, this may mean placing a struct
followed by the functions that operate on it, followed by a global variable,
and the functions that operate on it.
In some cases the environment may force a different order. For example C preprocessor macros may need to be placed in a specific order.
Generally items within a file should be organised as follows:
Items in source files should in general be in this order:
-
Prologue comment describing the module.
-
#includes
-
Any local #defines.
-
Definitions of any local (that is, file-static) global variables.
-
Prototypes for any local (that is, file-static) functions.
-
Definitions of functions.
Within each section, items should generally be listed in top-down order, not bottom-up. That is, if foo() calls bar(), then the definition of foo() should precede the definition of bar().
Items in headers should in general be in this order: typedefs, structs, unions, enums, extern variable declarations, function prototypes, then #defines.
Every header should be protected against multiple inclusion using the following idiom:
#ifndef MODULE_H
#define MODULE_H
/* body of module.h */
#endif // ! MODULE_H
Update headers to use the new style comment
-
Files should be saved as ASCII or UTF-8 and must use Unix style (LF) line endings.
-
Lines must not be more than 77 columns long.
-
Indentation is to be made with spaces, usually four spaces.
-
One line of vertical whitespace should usually be used to separate top-level items and sections within an item. Two lines may be used at the top level to create more separation when desired.
TODO editor hint for vim.
If a statement is too long, continue it on the next line indented two levels deeper (but less or more is okay depending on the situation).
Break the line after an operator:
int var = really really long expression +
        more of this expression;
And usually at an outer element if possible; this could be the assignment operator itself.
int var = (expr1 + expr2) *
        (expr3 + expr4);
Sometimes line-breaking can be done nicely by naming a sub-expression; give it a meaningful name:
int sub_expr = some rather complex but separate expression;
int var = foo(a + b, sub_expr);
You may choose to align sub-expressions during breaking. This is recommended when an expression is broken over several lines. Even though name is short we give it its own line because the other expressions are long.
int var = printf("%s: %d, %s\n",
                 name,
                 some detailed and rather long expression,
                 a comment);
When things that may need wrapping occur at different depths within an expression then different levels of indentation can help convey that depth:
int var = printf("%s: %d, %s\n",
                 name,
                 foo(some detailed and long expression,
                     another detailed and long expression),
                 a comment);
These two sub-expressions are aligned, but they don’t have to be (see Tables below).
Sometimes breaking early can allow you to align things towards the left and give them more room. For example we prefer:
static PZ_Proc_Symbol builtin_setenv = {
    PZ_BUILTIN_C_FUNC,
    { .c_func = builtin_setenv_func },
    false
};
While clang-format prefers:
static PZ_Proc_Symbol builtin_setenv = { PZ_BUILTIN_C_FUNC,
                                         {.c_func = builtin_setenv_func},
                                         false };
Use all lowercase with underscores to separate words. For instance, soul_machine.
Use all uppercase with underscores to separate words. For instance, MAX_HEADROOM.
TODO: Maybe make function-like macros belong here.
Use first letter uppercase for each word, other letters lowercase and underscores to separate words. For instance, Directory_Entry.
Note: this is rarely used and might become the same as classes and structs.
If something is both a struct and a typedef, the name for the struct should be formed by appending ‘_S’ to the typedef name. This overrides the style for typedefs above:
typedef struct DirectoryEntry_S {
    ...
} DirectoryEntry;
For unions, append ‘_U’ to the typedef name.
Fields of classes (but not structs) should begin with m_, static data members should begin with s_.
Our minimum requirements from the C and C++ environment are C99 (may move to C11 in the future) and C++11 on a POSIX.1-2008 environment. This may change as dependencies are added in this early stage of development; however, those changes should be carefully reviewed and, if possible, they should be optional.
Differences between operating systems and the use of a tool like autoconf should be handled by having different configurations available via different Makefiles and header files. We will revisit this when development reaches that stage. Autoconf should be avoided, it brings only pain.
While it’s best to keep things portable, if you need a non-standard API, or an API that’s different on each operating system, you should make it available via a macro or protect it with #ifdefs.
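As a minimal sketch of keeping such differences in one place, the wrapper below hides which sysconf key is available behind a single function. The names pz_example_page_size and PZ_EXAMPLE_PAGE_SIZE_KEY are hypothetical, and the 4096 fallback is an assumption made only for the illustration.

```c
#include <unistd.h>

/*
 * Hypothetical wrapper: callers use pz_example_page_size() and never see
 * which system-specific name was chosen.
 */
#if defined(_SC_PAGESIZE)
#define PZ_EXAMPLE_PAGE_SIZE_KEY _SC_PAGESIZE
#elif defined(_SC_PAGE_SIZE)
#define PZ_EXAMPLE_PAGE_SIZE_KEY _SC_PAGE_SIZE
#endif

long
pz_example_page_size(void)
{
#ifdef PZ_EXAMPLE_PAGE_SIZE_KEY
    long size = sysconf(PZ_EXAMPLE_PAGE_SIZE_KEY);
    if (size > 0) return size;
#endif // PZ_EXAMPLE_PAGE_SIZE_KEY

    // Assumed fallback, used only when the system cannot tell us.
    return 4096;
}
```

The rest of the code base then calls the wrapper, so the #ifdefs do not spread beyond this one file.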
C99 provides many basic data types, char, short, int etc., all defined to be at least a certain size. These should be used when the exact size doesn’t matter. For example use bool for booleans and int or unsigned when you’re counting a normal amount of something - you should not need to use macros such as INT_MAX.
When size matters the inttypes.h types are strongly recommended, including the fast types, e.g. uint_fast32_t, and their macros.
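A small sketch of how the fixed-size types, the fast types and the formatting macros fit together. The function names are hypothetical and are not taken from Plasma’s sources.

```c
#include <inttypes.h>
#include <stdio.h>

/*
 * Sum the bytes of a buffer into a 32-bit checksum.  The result is exactly
 * 32 bits because its size matters; the loop counter only needs to be at
 * least 32 bits wide and fast.
 */
uint32_t
example_checksum(const uint8_t * bytes, uint_fast32_t len)
{
    uint32_t sum = 0;

    for (uint_fast32_t i = 0; i < len; i++) {
        sum += bytes[i];
    }
    return sum;
}

/* Format a checksum using the matching inttypes.h macro. */
int
example_format_checksum(char * buf, size_t buf_size, uint32_t sum)
{
    return snprintf(buf, buf_size, "%" PRIu32, sum);
}
```

Using PRIu32 rather than a hard-coded "%u" keeps the format string correct on platforms where uint32_t is not unsigned int.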
float should be used in preference to double, which is seldom necessary and uses more memory.
Don’t rely on exact IEEE-754 semantics.
Since C99 does not specify the representation of signed values, we will assume 2’s complement arithmetic (we’re not exactly C99 pure).
Endianness and alignment must not be assumed. If laying out a structure manually align each member based on its size.
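One way to avoid assuming the host’s byte order or alignment is to assemble multi-byte values one byte at a time. The sketch below assumes the data is stored little-endian; example_read_le32 is a hypothetical helper, not a Plasma function.

```c
#include <stdint.h>

/*
 * Read a 32-bit little-endian value one byte at a time.  This gives the same
 * result on any host byte order and does not require the buffer to be
 * aligned.
 */
uint32_t
example_read_le32(const uint8_t * bytes)
{
    return (uint32_t)bytes[0] |
           ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) |
           ((uint32_t)bytes[3] << 24);
}
```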
Operating system APIs differ from platform to platform. Although most support standard POSIX calls such as read, write and unlink, you cannot rely on the presence of, for instance, System V shared memory. Adhere to POSIX-supported operating system calls whenever possible since they are widely supported, even by Windows. The CFLAGS variable in the Makefile will request that modern C compilers fail to compile Plasma if it uses non-POSIX APIs.
CFLAGS=-std=c99 -D_POSIX_C_SOURCE=200809L -Wall -Werror
When POSIX doesn’t provide the required functionality, ensure that the operating system specific calls are localised.
We require a C99 compiler. However many compilers provide non-standard extensions. Ensure that any use of compiler extensions is localised and protected by #ifdefs. Don’t rely on features whose behaviour is undefined according to the C99 standard. For that matter, don’t rely on C arcana even if they are defined. For instance, setjmp/longjmp and ANSI signals often have subtle differences in behaviour between platforms.
If you write threaded code, make sure any non-reentrant code is appropriately protected via mutual exclusion. The biggest cause of non-reentrant (non-thread-safe) code is function or module-static data. Note that some C library functions may be non-reentrant. This may or may not be documented in the man pages.
In addition to sticking to C++11 (which is the minimum required for "modern C++"), we also forbid the use of exceptions and RTTI; they’re unnecessary and add too much magic. You should also be frugal with templates and vtables.
You may follow guidelines for "good C++" from other sources; I’ve been reading the Essential C++ series and found it helpful.
If you need a feature from a newer version of one of these standards, but we don’t need to upgrade our minimum dependencies and the new feature is something you can easily add as a utility function, then add it to pz_cxx_future.h/cpp (or create a new future file for other libraries), and indicate in a comment which version of the standard it’s from. Then when we do update our dependencies we can look in these files to easily find what workarounds we can remove. This also applies to things that haven’t been added to a standard but might be someday.
This is one of the most important sections in the coding standard. Here we mention what other tools Plasma may depend on.
In order to build Plasma you need:
-
A POSIX (1-2008) system/environment.
-
A shell compatible with Bourne shell (sh).
-
GNU make.
-
A C99/C++11 compiler.
-
Mercury 14.01.1 or newer.
Basic layout (line length, indentation etc) is covered above in File encoding.
Clang-format has been configured and mostly does the right thing, but often doesn’t. You could check "what would clang-format do", but it is not to be relied on.
Curly brackets should be placed with the opening bracket at the end of the opening line, and the closing bracket on a new line of its own, not indented:
if (condition) {
    ...
}
Except for functions and classes, which should have the opening curly on a new line.
int
foo(int arg)
{
    ...
}
If the opening line is split between multiple lines, such as a long condition in an if-then-else, then place the opening curly on a new line to clearly separate the condition from the body:
if (this_is_a_somewhat_long_conditional_test(
        in_the_condition_of_an +
        if_then))
{
    ...
}
There should be a space between statement keywords like if, while, for and return and the next token. The return value should not be parenthesised. There should also be a space around an operator. There should be no space between function-like keywords like sizeof and their argument list. There should also be no space between a cast and its argument.
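The spacing rules above might look like this in practice. Both functions are hypothetical and exist only to show the spacing around keywords, operators, sizeof and casts.

```c
#include <stddef.h>

/* Average of len values, or 0 when there are none. */
float
example_average(const int * values, size_t len)
{
    long total = 0;

    if (0 == len) return 0.0f;

    // Space after "for" and around operators.
    for (size_t i = 0; i < len; i++) {
        total += values[i];
    }

    // No space between a cast and its argument, no parentheses on return.
    return (float)total / (float)len;
}

/* Number of bytes needed to store len ints. */
size_t
example_array_bytes(size_t len)
{
    // No space between sizeof and its argument list.
    return sizeof(int) * len;
}
```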
Place the pointer or reference qualifier between the type and the variable name.
char * str1, * str2;
This avoids confusion that might occur when the pointer qualifier is attached to the type.
char* str1, not_really_a_str;
TODO: find out if the same trap exists for C++ references.
And makes the symbol easier to notice.
Use one statement per line.
Use an // end comment if the if statement, switch or loop is quite large, particularly if there are multiple nested structures. It may be helpful to describe the condition of the branch in this comment.
if (blah) {
    // Use curlies, even when there's only one statement in the block.
    ...
    // Imagine dozens of lines here.
    ...
} // end if
An exception to the above rule about always using curlies is that an if statement may omit the curlies if its body is a single return or goto instruction and is placed on the same line.
file = fopen("file.txt", "r");
if (NULL == file) goto error;
or
file = fopen("file.txt", "r");
if (NULL == file) {
    goto error;
}
but not:
file = fopen("file.txt", "r");
if (NULL == file)
    goto error;
and not:
if (a_condition)
    do_action();
Additionally, if one branch uses curlies then all must use curlies. Do not mix styles such as:
if (a_condition) goto error;
else {
    do_something();
}
And if the condition covers multiple lines, then the body must always appear within curlies (with the opening curly on its own line as noted above).
if (0 == read_proc(file, imported, module, code_bytes,
        proc->code_offset, &block_offsets[i]))
{
    goto end;
}
TODO: Consider removing this rule.
To make your intentions clear, do not rely on the zero / non-zero boolean behaviour of C. This means explicitly comparing a value:
if (NULL == file) goto error;
If using the equality operator ==, use a non-lvalue on the left-hand side if possible. This way the comparison cannot be mistaken for an assignment.
if (0 == result) {
    ...
}
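A short, self-contained sketch of explicit comparisons with the non-lvalue on the left. The function example_is_plasma is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when name is non-NULL and equal to "plasma". */
bool
example_is_plasma(const char * name)
{
    if (NULL == name) return false;

    // Writing "0 = strcmp(...)" by mistake would not compile.
    if (0 == strcmp(name, "plasma")) {
        return true;
    }
    return false;
}
```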
Case labels should be indented one level, which will indent the body by two levels.
Switch statements should usually have a default case, even if it just calls abort(). If the switched-on value is an enum, the default may be omitted since the compiler will check that all the possible values are covered.
If a switch case falls through, add a comment to say that this is deliberately intended.
switch (var) {
    case A:
        ...
        break;
    case B:
        ...
        // fall-through
    case C:
        ...
        break;
}
If a case requires local variable declarations, place the curlies like this:
    ...
    case A: {
        int foo;
        ...
        break;
    }
    case B:
    ...
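Putting the switch rules together, a complete example might look like the sketch below. The enum, its constants and the timings are all made up for the illustration.

```c
#include <stdlib.h>

typedef enum {
    EXAMPLE_RED,
    EXAMPLE_AMBER,
    EXAMPLE_GREEN
} Example_Colour;

/* Seconds to wait before the light changes (made-up numbers). */
int
example_wait_seconds(Example_Colour colour)
{
    int seconds = 0;

    switch (colour) {
        case EXAMPLE_RED:
            seconds += 20;
            // fall-through
        case EXAMPLE_AMBER:
            seconds += 5;
            break;
        case EXAMPLE_GREEN: {
            int base = 30;
            seconds = base;
            break;
        }
        default:
            abort();
    }
    return seconds;
}
```

The default case is kept here even though the switched-on value is an enum; per the rule above it could also be omitted.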
Loops that end in a non-obvious way, such as infinite while loops that use break to end the loop, should be documented. You’ll need to use judgement about when this is needed.
// Note that the loop will exit when ...
while (true) {
    ...
    if (some condition) {
        break;
    }
    ...
}
or
while (everything_is_okay) {
    ...
    if (some condition) {
        // Exit the loop on the next iteration.
        everything_is_okay = false;
    }
    ...
}
In argument lists, put space after commas. Include parameter names in the declaration as this can aid in documentation.
Unlike other code blocks, the open-curly for a function should be placed on a new line.
int rhododendron(int a, float b, double c)
{
    ...
}
If the parameter list is long or complex, you may wish to place each parameter on a new line and align them. Aligning names as in variable definition lists is also suggested but not required.
int rhododendron(int a_long_parameter,
                 struct AComplexType * b,
                 double c)
{
    ...
}
Variable declarations shouldn’t be flush left, however.
    int x = 0,
        y = 3,
        z;
    int a[] = { 1, 2, 3, 4, 5 };
When defining multiple variables, structure or union fields, or in some cases function parameters, lining up their names is recommended.
There should be one line of vertical space between the definition list and the next statement.
char        * some_string;
int           x;
MyStructure * my_struct;

if (...) {
Prefer enums to lists of #defines. Note that enum constants are of type int, hence if you want an enumeration of chars or shorts, then you must use lists of #defines.
Nested #ifdefs, #ifndefs and #ifs should be indented by two spaces for each level of nesting. For example:
#ifdef GUAVA
  #ifndef PAPAYA
  #else // PAPAYA
  #endif // PAPAYA
#else // not GUAVA
#endif // not GUAVA
If a thing will have methods that act on instances, it is a class and should begin with the "class" keyword, and keep its data members private. Otherwise it is a struct and shall begin with the "struct" keyword.
Bare pointers aren’t "modern C++". However in Plasma’s runtime system they show that the lifetime of the object is handled elsewhere. Either it is known to live a very long time, living in static data or on the C++ heap and destroyed when the program ends, or it is a GC allocated object and we additionally guarantee that while it’s live (passed around) it’s impossible for a GC to occur (there’s also a NoGCScope present).
TODO: Describe how we root GC pointers within runtime code.
C++ exposes implementation details of classes in their declarations as private members. This means that changes to these internal details can cause unnecessary recompilation. On the other hand it allows the compiler to inline functions defined in the class definition that do access private members.
When the latter need is not great it can be good to avoid creating the former problem by hiding these details. There are a few different techniques.
In the pImpl pattern the class contains a pointer to a class that contains the actual implementation. This pointer should be a std::shared_ptr and the outer class is expected to be passed by value rather than by reference.
While this still allows callers to use object.method() style calls (which then forward), it breaks the normal expectation that "most objects should be passed by reference". Of course you can pass them by reference but doing so creates an extra pointer indirection. Passing by value isn’t great either, causing extra work in the std::shared_ptr to maintain its reference count.
There’s another pattern where an abstract base class contains a virtual public interface and a private derived class containing the actual implementation. We avoid this because we want to avoid vtables when we can.
Therefore the pattern we use in Plasma’s runtime (when we choose to hide implementation details at all) is to forward declare the class, and define it in an implementation file or implementation-only header file. The public interface is defined as non-member forwarding functions. This pattern can be seen in pz_gc.h and pz_gc.impl.h.
Use your judgement for whether a function should be commented. Sometimes the function name and parameter names will provide a lot of information. However for more complex functions a comment will be necessary. Comments are strongly recommended when:
-
They have side-effects
-
They require an input to be sorted, non-null or similar.
-
They have different semantics when an input has a different value (they should be separate functions if they do a different function).
-
They allocate memory that the caller is now responsible for.
-
They return statically allocated memory (try to avoid this).
-
They free memory.
-
They return certain values (non-zero, -1 etc) for errors.
-
They aren’t thread-safe or reentrant.
Each non-trivial macro should be documented just as for functions (see above). It is also a good idea to document the types of macro arguments and return values, e.g. by including a function declaration in a comment.
Parameters to macros should be in parentheses.
#define STREQ(s1,s2) (strcmp((s1),(s2)) == 0)
This ensures that when a complex expression is passed as a parameter, operator precedence does not change the meaning of the macro.
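To see why the parentheses matter, compare two hypothetical macros that differ only in whether the parameter is parenthesised:

```c
/* Hypothetical macros, for illustration only. */
#define EXAMPLE_DOUBLE_BAD(x) (x * 2)
#define EXAMPLE_DOUBLE(x)     ((x) * 2)

/* Expands to (1 + 2 * 2), which is 5. */
int
example_double_bad(void)
{
    return EXAMPLE_DOUBLE_BAD(1 + 2);
}

/* Expands to ((1 + 2) * 2), which is 6. */
int
example_double_good(void)
{
    return EXAMPLE_DOUBLE(1 + 2);
}
```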
Such function comments should be present in header files for each function exported from a source file. Ideally, a client of the module should not have to look at the implementation, only the interface. In C terminology, the header should suffice for working out how an exported function works.
Every source file should have a prologue comment which includes:
-
Copyright notice.
-
License info
-
Short description of the purpose of the module.
-
Any design information or other details required to understand and maintain the module (may be links to other documents).
Describe the exact format in use and ensure that all the C code conforms to this.
Use comments of this form:
/*
* This is a block comment,
* it uses multiple lines.
* It should have a blank line before it and it comments the declaration,
* definition, block or group of statements immediately following it.
*/
For annotations to a single line of code:
i += 3; // Add 3.
Note that the // comment is standard in C99, which we are using.
If the comment fits on one line, even if it describes multiple lines, a
single line comment is okay:
// Add 3.
i += 3;
However if the comment is important, or the thing it documents is significant, then use a block comment.
Any code that needs to be revisited because it is a temporary hack (or some other expediency) must have a comment of the form:
/*
* XXX: <reason for revisit>
* - <Author name>
*/
The <reason for revisit> should explain the problem in a way that can be understood by developers other than the author of the comment. Also include the author of this comment so that a reader will know who to ask if they need further information.
"TODO" and "Note" are also common revisit labels. Only "XXX" requires the author’s name.
The #ifdef constructs should be commented like so if they extend for more than a few lines of code:
#ifdef SOME_VAR
...
#else // ! SOME_VAR
...
#endif // ! SOME_VAR
Similarly for #ifndef.
Use the GNU convention of comments that indicate whether the variable is true in the #if and #else parts of an #ifdef or #ifndef. For instance:
#ifdef SOME_VAR
#endif // SOME_VAR
#ifdef SOME_VAR
...
#else // ! SOME_VAR
...
#endif // ! SOME_VAR
#ifndef SOME_VAR
...
#else // SOME_VAR
...
#endif // SOME_VAR
Typing make format will run clang-format-10 on the C/C++ code. It mis-formats quite a few things so we don’t yet use it automatically, though we may do so on a file-by-file basis at some point.
When code or data is tabular then using a tabular layout makes the most sense. This may be something formatters cannot handle, some will allow you to describe excisions.
We don’t have a good example of this in the code base, however the data in pz_builtin.c could probably be set out in a table. If it were it might look like:
static PZ_Proc_Symbol builtins[] = {
    { PZ_BUILTIN_C_FUNC, {.c_func = builtin_setenv_func}, false },
    { PZ_BUILTIN_C_FUNC, {.c_func = builtin_free_func},   false }
};
Macros should either be expressions (they have a value) or statements (they do not); this must always be clear. If necessary make a single statement using a block. The do {} while (0) pattern is not necessary since bodies of if statements may not be macros without their own curly brackets.
#define PZ_WRITE_INSTR_1(code, w1, tok)         \
    if (opcode == (code) && width1 == (w1)) {   \
        token = (tok);                          \
        goto write_opcode;                      \
    }
C expressions may have side-effects; this is okay most of the time but can lead to confusion with macros. A macro can evaluate its parameters more than once. Avoid doing this in your macros, and if you must, add a comment explaining that this can happen.
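A small sketch of the problem, using a hypothetical EXAMPLE_MAX macro. One possible alternative, shown alongside it, is a static inline function, which evaluates each argument exactly once; this is a suggestion for illustration rather than an existing Plasma convention.

```c
/* Evaluates its parameters more than once: beware of side-effects. */
#define EXAMPLE_MAX(a, b) ((a) > (b) ? (a) : (b))

/* A function evaluates each argument exactly once. */
static inline int
example_max(int a, int b)
{
    return a > b ? a : b;
}
```

With i equal to 5, EXAMPLE_MAX(i++, 3) increments i twice and yields 6, whereas example_max(i++, 3) increments i once and yields 5.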
It’s very easy for C++ compilers to perform type conversions for you. This is frequently done via conversion operators and constructors that take a single argument. The latter are easy to provide by mistake, therefore 1-arg constructors should be declared as explicit, which will prevent the compiler from using them automatically.
explicit MyType(const OtherType &other);
When implicit conversion is desired, add a comment to tell anyone reading your code that you didn’t forget, that you want it to be implicit.
// Implicit constructor
Optional(T &other);
C++ will create implicit copy constructors. These don’t always do the right thing so it is best to either create them explicitly or tell C++ you don’t want them. The same is true for copy assignment operators.
MyClass(const MyClass &) = delete;
void operator=(const MyClass &) = delete;
-
Limit module exports to the absolute essentials. Make as much static (that is, local) as possible since this keeps interfaces to modules simpler.
-
Use typedefs to make code self-documenting. They are especially useful on structs, unions, and enums. Use them on the struct or union’s forward declaration or header declaration when the definition is provided elsewhere.