C Lessons

clang, gcc and msvc

Quick start

This is not really C Lessons at all :). I programmed in C a long time ago, and sometimes I want to pick something up again, but I cannot find the piece of code anywhere, or I cannot run code that was written on another machine.

Sadly, an old dog always needs to learn something new.

  • Access the code from anywhere: GitHub is a good one
  • Run or write code anywhere: Linux, Darwin, Windows, or a Docker box
  • Easy to try and learn

Now we have Nore: some things changed and some did not.

Let’s start …

# bootstrap Nore
curl https://raw.githubusercontent.com/junjiemars/nore/master/bootstrap.sh -sSfL | sh

# configure -> make -> test -> install
./configure --has-hi
make
make test
make install

Language

Run the example under src/lang.

./configure --has-lang
make clean test

Preprocessor

The preprocessor runs first, as the name implies. It performs some text manipulations, such as:

  • stripping comments
  • resolving #include directives and replacing them with the contents of the included file
  • #include_next directives do not distinguish between <file> and "file" inclusion; they just search for the file in the search path, starting after the directory where the current file was found
  • evaluating #if and #ifdef directives
  • evaluating #define
  • expanding the macros found in the rest of the code according to those #define
./configure --has-lang
make clean lang_preprocessor_test

#ident

#include

The #include directive instructs the preprocessor to paste the text of the given file into the current file. Generally, it is necessary to tell the preprocessor where to look for header files if they are not placed in the current directory or a standard system directory.

#define

The #define directive takes two forms: defining a constant or creating a macro.

  • Defining a constant
#define identifier [value]

When defining a constant, you may optionally omit the value. In this case the identifier is replaced with nothing, but it still counts as “defined” for the purposes of #ifdef and #ifndef. If a value is provided, the identifier is replaced literally with the remainder of the text on the line. Be careful when using #define in this way: the replacement is purely textual, so an unparenthesized expression can change meaning at the place where it is expanded.

  • Defining a parameterized macro
#define identifier(<arg> [, <arg> ...]) statement
#define max(a, b) ((a) > (b) ? (a) : (b))

#undef

#undef identifier

The #undef directive undefines a constant or macro that was previously defined using #define.

For example:

#define E 2.71828
double e_squared = E * E;
#ifdef E
#  undef E
#endif

Usually, #undef is used to scope a preprocessor constant into a very limited region: this is done to avoid leaking the constant. #undef is the only way to create this scope since the preprocessor does not understand block scope.

#if vs. #ifdef

#if checks the value of a symbol (an undefined symbol evaluates to 0), while #ifdef only checks whether the symbol exists.

Prefer #if defined(...): it is more flexible because conditions can be combined.

#if defined(LINUX) || defined(DARWIN)
/* code: when on LINUX or DARWIN platform */
#endif

#if defined(CLANG) && (1 == NM_CPU_LITTLE_ENDIAN)
/* code: when using clang compiler and on a little endian machine */
#endif
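
The difference between the two shows up when a symbol is defined with the value 0. Here is a minimal sketch; FEATURE_X is a hypothetical flag made up for this example, not something Nore defines:

```c
/* FEATURE_X is a hypothetical flag: defined, but set to 0. */
#define FEATURE_X 0

/* Returns 1: #ifdef only asks whether FEATURE_X exists. */
int feature_x_is_defined(void) {
#ifdef FEATURE_X
  return 1;
#else
  return 0;
#endif
}

/* Returns 0: #if evaluates the value of FEATURE_X. */
int feature_x_is_enabled(void) {
#if FEATURE_X
  return 1;
#else
  return 0;
#endif
}
```

So a flag that is switched off with 0 still passes an #ifdef test, which is usually not what was meant.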

#ifndef

#ifndef identifier
/* code: when the identifier has not been defined */
#endif

#ifndef checks whether the given identifier has been #defined earlier in the file or in an included file; if not, it includes the code between it and the matching #else or, if no #else is present, the #endif. #ifndef is often used to make header files idempotent: the header checks at its top that a guard identifier is not set, and defines that identifier once the file has been included.

#ifndef    _LANG_H_
#  define  _LANG_H_
#endif

#if !defined(identifier) is equivalent to #ifndef identifier

#if !defined(min)
#  define min(a, b) ((a) < (b) ? (a) : (b))
#endif

#error

#error "[description]"

The #error directive makes compilation fail and issues a message that will appear in the list of compilation errors. It is most useful when combined with #if/#elif/#else to fail compilation if some condition is not true. For example:

#if (1 == _ERROR_)
#  error "compile failed: because _ERROR_ == 1 is true"
#endif

#pragma

The #pragma directive is used to access compiler-specific preprocessor extensions.

A common use of #pragma is the #pragma once directive, a non-standard but widely supported extension that asks the compiler to include a header file only a single time, no matter how many times it has been included.

#pragma once
/* header file code */

/* #pragma once is equivalent to */
#ifndef    _FILE_NAME_H_
#  define  _FILE_NAME_H_
/* header file code */
#endif

The #pragma directive can also be used for other compiler-specific purposes. #pragma is commonly used to suppress warnings.

#if (MSVC)
#  pragma warning(disable:4706) /* assignment within conditional expression */
#  pragma comment(lib, "Ws2_32.lib") /* link to Ws2_32.lib */
#elif (GCC)
#  pragma GCC diagnostic ignored "-Wstrict-aliasing" /* (unsigned*) &x */
#elif (CLANG)
#  pragma clang diagnostic ignored "-Wparentheses"
#endif

__FILE__

  • __FILE__ expands to the name of the current source file as a string, as it was presented to the compiler (often, but not always, a full path)
  • __LINE__ expands to the current line number in the source file, as an integer
  • __DATE__ expands to the current date at compile time in the form Mmm dd yyyy as a string, such as “Oct 26 2021”
  • __TIME__ expands to the current time at compile time in the form hh:mm:ss in 24 hour time as a string, such as “16:08:17”
  • __TIMESTAMP__ is a compiler extension (not ISO C) that expands to the last modification time of the current source file in the form Ddd Mmm Date hh:mm:ss yyyy as a string, where the time is in 24 hour time, Ddd is the abbreviated day, Mmm is the abbreviated month, Date is the day of the month (1-31), and yyyy is the four digit year, such as “Tue Oct 26 12:42:21 2021”
  • __func__ is a predefined identifier (not a macro) that holds the name of the enclosing function, available since C99
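
These names are mostly used together for logging. Below is a small sketch; WHERE and describe_here are hypothetical names invented for this example:

```c
#include <stdio.h>

/* WHERE is a hypothetical helper macro: it formats "file:line (function)"
   for the place where the macro is expanded. */
#define WHERE(buf, size) \
  snprintf((buf), (size), "%s:%d (%s)", __FILE__, __LINE__, __func__)

/* Writes the location of the WHERE call inside this function into buf.
   Returns what snprintf returns. */
int describe_here(char *buf, size_t size) {
  return WHERE(buf, size);
}
```

Because WHERE is a macro, __LINE__ and __func__ describe the place where it is used, not the place where it is defined.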

main

exit

Most C programs call the library routine exit, which flushes buffers, closes streams, unlinks temporary files, etc., before calling _exit.

assert

No, there’s nothing wrong with assert as long as you use it as intended.

  • assert: a failure in the program’s logic itself.
  • error: an erroneous input or system state not due to a bug in the program.

Assertions are primarily intended for use during debugging and are generally turned off before code is deployed by defining the NDEBUG macro.

# with assert
./configure --has-lang
make clean lang_assert_test

# erase assertions: simple way
./configure --has-lang --with-release=yes
make clean lang_assert_test

An assertion specifies that a program satisfies certain conditions at particular points in its execution. There are three types of assertion:

  • preconditions: specify conditions at the start of a function.
  • postconditions: specify conditions at the end of a function.
  • invariants: specify conditions over a defined region of a program.

The static_assert macro, defined in <assert.h>, expands to _Static_assert, a keyword added in C11 to provide compile-time assertions.
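
A short sketch of preconditions, a postcondition and a compile-time assertion together. It assumes a C11 compiler; average_percent is a hypothetical function made up for this example:

```c
#include <assert.h>

/* Compile-time check (C11): a failure stops the build, not the run. */
static_assert(sizeof(int) >= 2, "int must be at least 16 bits");

/* Hypothetical helper: average of two percentages. */
int average_percent(int a, int b) {
  /* preconditions: a violation is a bug in the caller */
  assert(a >= 0 && a <= 100);
  assert(b >= 0 && b <= 100);

  int avg = (a + b) / 2;

  /* postcondition: a violation is a bug in this function */
  assert(avg >= 0 && avg <= 100);
  return avg;
}
```

With NDEBUG defined the three assert calls disappear, while the static_assert is still checked.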

enum

enum [identifier] { enumerator-list };

enumerator = constant-expression;

enumerator-list is a comma-separated list (a trailing comma is permitted since C99), and identifier is optional. If an enumerator is followed by = constant-expression, its value is the value of that constant expression. Otherwise, its value is one greater than the value of the previous enumerator in the same enumeration. The first enumerator has the value zero if it has no constant-expression.

Unlike struct and union, there is no forward-declared enum in C.
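
The numbering rules can be seen in a small sketch (enum level and its enumerators are made-up names for this example):

```c
/* LOW starts at 0, MID continues from LOW, HIGH is set explicitly,
   and TOP continues from HIGH. */
enum level {
  LOW,        /* 0 */
  MID,        /* 1 */
  HIGH = 10,  /* 10 */
  TOP,        /* 11; the trailing comma is allowed since C99 */
};

/* Enumerators are plain int constants. */
int level_value(enum level l) { return (int)l; }
```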

Error

  • fail safe: pertaining to a system or component that automatically places itself in a safe operating mode in the event of a failure, such as a traffic light that reverts to blinking red in all directions when normal operation fails.
  • fail soft: pertaining to a system or component that continues to provide partial operational capability in the event of certain failures, such as a traffic light that continues to alternate between red and green if the yellow light fails. A static variable such as errno, which indicates the error status of a function call or object, is a fail soft indicator.
  • fail hard: aka fail fast or fail stop. The reaction to a detected fault is to immediately halt the system. Termination is fail hard.

errno

Before C11, errno was a global variable, with all the inherent disadvantages:

  • later calls overwrite the value set by earlier calls;
  • global map of values to error conditions (ENOMEM, ERANGE, etc);
  • behavior is underspecified in ISO C and POSIX;
  • technically errno is a modifiable lvalue rather than a global variable, so expressions like &errno may not be well-defined;
  • thread-unsafe;

In C11, errno is thread-local, so it is thread-safe.

Disadvantages of Function Return Value:

  • functions that return error indicators cannot use return value for other uses;
  • checking every function call for an error condition increases code volume by 30%-40%;
  • impossible for library function to enforce that callers check for error condition.

strerror

char * strerror(int errnum);

Interprets the value of errnum, generating a string with a message that describes the error condition as if set to errno by a function of the library. The returned pointer points to a statically allocated string, which shall not be modified by the program. Further calls to this function may overwrite its content (particular library implementations are not required to avoid data races). The error strings produced by strerror may be specific to each system and library implementation.

perror

void perror(const char *str);

Interprets the value of errno as an error message, and prints it to stderr (the standard error output stream, usually the console), optionally preceding it with the custom message specified in str. If the parameter str is not a null pointer, str is printed followed by a colon : and a space. Then, whether str was a null pointer or not, the generated error description is printed followed by a newline character '\n'. perror should be called right after the error was produced, otherwise it can be overwritten by calls to other functions.
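
A sketch of the usual errno pattern: clear it, make the call, save the value right away, then look it up with strerror. parse_long and parse_error_text are hypothetical helpers written for this example, and EINVAL comes from POSIX rather than ISO C:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* parse_long is a hypothetical wrapper around strtol.
   Returns 0 on success, or an errno value on failure. */
int parse_long(const char *text, long *out) {
  char *end;

  errno = 0;                        /* clear any stale value first */
  long v = strtol(text, &end, 10);
  int saved = errno;                /* save it before another call overwrites it */

  if (saved != 0) return saved;     /* e.g. ERANGE on overflow */
  if (end == text || *end != '\0')  /* nothing parsed, or trailing junk */
    return EINVAL;                  /* assumption: EINVAL is POSIX, not ISO C */

  *out = v;
  return 0;
}

/* Message for a parse_long failure; the caller must not modify it. */
const char *parse_error_text(int err) {
  return strerror(err);
}
```

Saving errno into a local first matters because any later library call may change it.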

Function

main

C90 main() declarations:

int main(void);

int main(int argc, char **argv);

/* same as above */
int main(int argc, char *argv[]);

/* classically, Unix systems support a third variant */
int main(int argc, char **argv, char **envp);

C99 rules for the value returned from main():

  • the int return type may not be omitted.
  • the return statement may be omitted, if so and main() finished, there is an implicit return 0.

In arguments:

  • argc >= 0 (the standard only requires a nonnegative value; in practice it is at least 1 on most systems)
  • argv[argc] == 0
  • argv[0] through argv[argc-1] are pointers to strings whose meaning is determined by the program.
  • argv[0] will be a string containing the program’s name, or an empty string if that is not available.
  • envp is not specified by POSIX but is widely supported. getenv is the only one specified by the C standard; putenv and extern char **environ are POSIX-specific.

Forward declaration

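A forward declaration is needed when:

  • the call graph is cyclic (functions that call each other)
  • a function or object is used across more than one translation unit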
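
The cyclic case can be sketched with two mutually recursive functions (is_even and is_odd are illustrative names; this is not an efficient way to test parity):

```c
/* Forward declaration: is_odd is called before its definition below. */
int is_odd(unsigned n);

int is_even(unsigned n) {
  return n == 0 ? 1 : is_odd(n - 1);
}

int is_odd(unsigned n) {
  return n == 0 ? 0 : is_even(n - 1);
}
```

Without the first line, the call to is_odd inside is_even would refer to a function the compiler has not seen yet.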

Macro

# macro operator

Prefixing a macro parameter with # turns the corresponding argument into a string literal. This allows you to turn bare words in your source code into text. This can be particularly useful for writing a macro that converts a member of an enum from int into a string.

enum COLOR { RED, GREEN, BLUE };
#define COLOR_STR(x) #x
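
Note that # works on the spelling of the argument, not on its runtime value, so a lookup is still needed to go from a value to its name. A minimal sketch, where color_name is a hypothetical helper:

```c
#include <stddef.h>

enum COLOR { RED, GREEN, BLUE };
#define COLOR_STR(x) #x

/* Maps each enumerator to its spelling; returns NULL for unknown values. */
const char *color_name(enum COLOR c) {
  switch (c) {
  case RED:   return COLOR_STR(RED);    /* the string "RED" */
  case GREEN: return COLOR_STR(GREEN);
  case BLUE:  return COLOR_STR(BLUE);
  }
  return NULL;
}
```

Writing COLOR_STR(c) instead would always produce the string "c".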

## macro operator

The ## operator takes two separate tokens and pastes them together to form a single identifier. The resulting identifier could be a variable name, or any other identifier.

#define DEFVAR(type, var, val) type var_##var = val

DEFVAR(int, x, 1); /* expand to: int var_x = 1; */
DEFVAR(float, y, 2.718); /* expand to: float var_y = 2.718; */

Expression

An expression-type macro expands to an expression, such as the following macro definition

#define double_v1(x) 2*x

But double_v1 has a drawback: the call double_v1(1+1)*8 expands to the wrong expression 2*1+1*8.

Use parentheses around the input arguments and around the whole expression:

#define double_v2(x) (2*(x))

Now, it expands to (2*(1+1))*8

But the max macro has a side effect: it evaluates one of its arguments twice

#define max(a, b) ((a) > (b) ? (a) : (b))

when it is called as max(a, b++).
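
The double evaluation can be observed with a small sketch (max_with_side_effect is a hypothetical function written for this example):

```c
#define max(a, b) ((a) > (b) ? (a) : (b))

/* Stores max(a, b++) in *result and returns the final value of b.
   b++ runs once in the comparison, and once more if b is the chosen branch. */
int max_with_side_effect(int a, int b, int *result) {
  *result = max(a, b++);
  return b;
}
```

Called with a = 1 and b = 5, the result is 6 and b ends up as 7, because b++ is evaluated twice. Called with a = 10 and b = 5, the result is 10 and b ends up as 6.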

Block

If the macro definition contains ; (the statement terminator), we need to wrap it in a block.

#define incr(a, b)   \
    (a)++;           \
    (b)++;

Call it with

int a=2, b=3;
if (a > b) incr(a, b);

Here only b is incremented: the if guards (a)++ alone, so (b)++ runs even though a > b is false. We can wrap the body in braces to turn it into a block-type macro.

#define incr(a, b) { \
   (a)++; (b)++;     \
}

But the block macro above is not good enough: omitting the ; is not intuitive, and a trailing ; is wrong in some cases, such as

int a = 2, b = 3;
if (a < b)
  incr(a, b); /* trailing ; */
else
  a *= 10;

/* expanded code, which should fail to compile */
if (a < b)
  { (a)++; (b)++; };
else
  a *= 10;

do { ... } while (0) resolves those issues.

#define incr(a, b) do { \
   (a)++; (b)++;        \
} while (0) /* no trailing ; */

/* expanded code */
if (a < b)
  do { (a)++; (b)++; } while (0); /* append ; */
else
  a *= 10;

Name clash

We can use the same mechanism as Lisp’s (gensym) to rebind the input arguments to new symbols.
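
Standard C has no gensym, but a similar effect can be approximated by pasting __LINE__ onto a prefix. This is only a sketch under that assumption; all macro names here (CONCAT, UNIQUE, SWAP_INT) are made up for the example:

```c
#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)   /* extra level so __LINE__ expands before pasting */
#define UNIQUE(prefix) CONCAT(prefix, __LINE__)

/* t receives a generated name of the form swap_tmp_<line number>. */
#define SWAP_INT_(x, y, t) do { int t = (x); (x) = (y); (y) = t; } while (0)
#define SWAP_INT(x, y) SWAP_INT_(x, y, UNIQUE(swap_tmp_))

/* Swaps *a and *b through two locals. One local is deliberately called tmp,
   a name that a naive macro declaring "int tmp" would clash with. */
void swap_demo(int *a, int *b) {
  int tmp = *a;
  int other = *b;
  SWAP_INT(tmp, other);
  *a = tmp;
  *b = other;
}
```

The generated name is unique per source line only, so two uses on the same line would still clash. GCC and Clang also provide a non-standard __COUNTER__ that avoids this.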

Nested macro

Using a macro name within another macro is called nesting of macros.

#define SQUARE(x) ((x)*(x))
#define CUBE(x) (SQUARE(x)*(x))

Check expansion

cc -E <source-file>

Pointer

& and *

The & operator yields the address of its operand.

The * has two distinct meanings within C in relation to pointers, depending on where it’s used. When used within a variable declaration, it declares a pointer, and the value on the right-hand side of the equals sign should be a pointer value, that is, an address in memory. When used with an already declared variable, the * will dereference the pointer value, following it to the pointed-to place in memory, and allowing the value stored there to be assigned or retrieved.
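
Both meanings of * and the use of & fit in a few lines (pointer_round_trip is a hypothetical function for illustration):

```c
/* Stores value through a pointer and reads it back. */
int pointer_round_trip(int value) {
  int v = 0;
  int *p = &v;   /* declaration: p is a pointer to int, holding the address of v */
  *p = value;    /* dereference: store into the int that p points to (v) */
  return *p;     /* dereference: load the int that p points to */
}
```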

sizeof Pointer

The size depends on the compiler and the machine. For a given compiler and machine, object pointers generally all have the same size, usually one machine word; the C standard does not guarantee this, and function pointers may differ.

const Pointer

There is a technique known as the Clockwise/Spiral Rule that enables any C programmer to parse any C declaration in their head.

The first const can be either side of the type.

const int * == int const *; /* pointer to const int */
const int * const == int const * const; /* const pointer to const int  */
  • pointer to const object
    int v = 0x11223344;
    const int *p = &v;
        
  • const pointer to object
    int v1=0x11223344;
    int *const p1 = &v1;
        
  • const pointer to const object
    int v1=0x11223344;
    const int *const p = &v1;
        
  • pointer to pointer to const object
    const int **p;
        
  • pointer to const pointer to object
    int *const *p;
        
  • const pointer to pointer to object
    int* *const p;
        
  • pointer to const pointer to const object
    const int *const *p;
        
  • const pointer to pointer to const object
    const int **const p;
        
  • const pointer to const pointer to object
    int *const *const p;
        

Run example:

./configure --has-lang
make clean lang_ptr_const_test

volatile Pointer

The volatile qualifier tells the compiler not to optimize accesses to the object, so that every read or write performs a real memory access instead of using a value cached in a register.

volatile int v1;
int *p_v1 = &v1; /* bad */
volatile int *p_v1 = &v1; /* better */

restrict Pointer

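  • the restrict keyword was introduced in C99
  • It is a promise from the programmer that, for the lifetime of the pointer, the object it points to is accessed only through that pointer; this allows optimizations that the compiler could not otherwise prove safe.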
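
A minimal sketch of a function that makes this promise (add_arrays is a made-up name, not a library function):

```c
#include <stddef.h>

/* restrict is a promise by the caller: during the call, dst, a and b
   do not overlap. Breaking the promise is undefined behavior. */
void add_arrays(size_t n, float *restrict dst,
                const float *restrict a, const float *restrict b) {
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] + b[i];
}
```

Since a store to dst[i] cannot change a or b, the compiler is free to load several elements ahead or vectorize the loop.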

function Pointer

return_type_of_fn (*fn)(type_of_arg1 arg1, type_of_arg2 arg2 ...);
  • void Pointer

The void * is a catch-all type for pointers to object types; via a void pointer we can get some polymorphic behavior. See qsort in stdlib.h.
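
qsort shows both a function pointer and void pointers at work. A short sketch, where compare_int and sort_ints are names chosen for this example:

```c
#include <stdlib.h>

/* Comparison callback for qsort: it receives two void pointers that
   actually point to int elements. */
static int compare_int(const void *lhs, const void *rhs) {
  int a = *(const int *)lhs;
  int b = *(const int *)rhs;
  return (a > b) - (a < b);   /* avoids the overflow that a - b could cause */
}

/* Sorts count ints in ascending order. */
void sort_ints(int *values, size_t count) {
  qsort(values, count, sizeof *values, compare_int);
}
```

qsort knows nothing about int: the element size and the callback carry all the type-specific knowledge.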

Dangling Pointer

Pointers that point to invalid addresses are sometimes called dangling pointers.

Pointer decay

Decay refers to the implicit conversion of an expression from an array type to a pointer type. In most contexts, when the compiler sees an array expression it converts the type of the expression from N-element array of T to pointer to T and sets the value of the expression to the address of the first element of the array. The exceptions to this rule are when an array is an operand of either the sizeof or & operators, or the array is a string literal being used as an initializer in a declaration. More importantly, the term decay signifies loss of type and dimension.
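
The loss of dimension is easy to see with sizeof. A small sketch (the function names are illustrative only):

```c
#include <stddef.h>

/* By the time the array arrives here it is just a pointer. */
size_t size_after_decay(const int *p) {
  return sizeof p;              /* size of a pointer, whatever the array length was */
}

/* Under sizeof the array does not decay, so the full size is visible. */
size_t size_before_decay(void) {
  int a[4] = { 1, 2, 3, 4 };
  return sizeof a;              /* 4 * sizeof(int) */
}

/* Passing a to a function converts it to &a[0]. */
size_t size_seen_by_callee(void) {
  int a[4] = { 1, 2, 3, 4 };
  return size_after_decay(a);
}
```

This is why functions that take an array usually take its length as a separate parameter.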

Pointer aliasing

In computer programming, aliasing refers to the situation where the same memory location can be accessed using different names.

Storage

The storage class in C decides which part of storage is allocated for a variable; it also determines the scope of the variable. Memory and CPU registers are the kinds of locations where a variable’s value can be stored. There are four storage classes in C: automatic, register, static, and external.

Each declaration can have only one of five storage class specifiers: static, extern, auto, register and typedef.

The typedef storage class specifier does not reserve storage and is called a storage class specifier only for syntactic convenience.

The general form of a declaration that uses a storage class is shown here: <storage-class-specifier> <type> <identifier>

Living example:

./configure --has-lang
make clean lang_storage_test

Automatic storage class

The auto storage class specifier denotes that an identifier has automatic duration. This means once the scope in which the identifier was defined ends, the object denoted by the identifier is no longer valid.

Since all objects that do not live in global scope and are not declared static have automatic duration by default when defined, this keyword is mostly of historical interest and should not be used. auto cannot be applied to parameter declarations. It is the default for variables declared inside a function body, and is in fact a historic leftover from C’s predecessor, B.

Register storage class

Hints to the compiler that access to an object should be as fast as possible. Whether the compiler actually uses the hint is implementation-defined; it may simply treat it as equivalent to auto.

The compiler does make sure that you do not take the address of a variable with the register storage class.

The only property that is definitively different for all objects that are declared with register is that they cannot have their address computed. Thereby register can be a good tool to ensure certain optimizations:

/* error: address of register variable requested */
register int i = 0x10;
int *p = &i;

Here i can never be aliased, because no code can pass its address to another function where it might be changed unexpectedly.

This property also implies that an array

void decay(char *a);
register char a[] = { 0x11, 0x22, 0x33, 0x44, };
decay(a);

cannot decay into a pointer to its first element (i.e. turning into &a[0]). This means that the elements of such an array cannot be accessed and the array itself cannot be passed to a function.

In fact, the only legal usage of an array declared with a register storage class is the sizeof operator; any other operator would require the address of the first element of the array. For that reason, arrays generally should not be declared with the register keyword, since it makes them useless for anything other than computing the size of the entire array, which can be done just as easily without the register keyword.

The register storage class is more appropriate for variables that are defined inside a block and are accessed with high frequency.

Static storage class

The static storage class serves different purposes, depending on the location of the declaration in the file. Since C99, static can also be used in an array parameter declaration to state that the argument is non-null and points to at least the given number of elements.

  • scope: file scope (confine the identifier to that translation unit only) or function scope (save data for use with the next call of a function)
  • duration: static
  • default initial value: 0
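
Both uses of static can be sketched in one small file (call_count, next_id and calls_so_far are hypothetical names):

```c
/* File scope static: visible only inside this translation unit. */
static int call_count;   /* default initial value: 0 */

/* Block scope static: initialized once, keeps its value between calls. */
int next_id(void) {
  static int id;         /* default initial value: 0 */
  call_count++;
  return ++id;
}

/* Other translation units can reach call_count only through this function. */
int calls_so_far(void) { return call_count; }
```

The first call to next_id returns 1, the second returns 2, and so on, because id is not re-created on each call.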

External storage class

extern keyword used to declare an object or function that is defined elsewhere (and that has external linkage). In general, it is used to declare an object or function to be used in a module that is not the one in which the corresponding object or function is defined.

  • scope: global
  • duration: static
  • default initial value: 0

Scope

In C, all identifiers are lexically (or statically) scoped.

The scope of a declaration is the part of the code where the declaration is seen and can be used. Note that this says nothing about whether the object associated to the declaration can be accessed from some other part of the code via another declaration. We uniquely identify an object by its memory: the storage for a variable or the function code.

Finally, note that a declaration in a nested scope can hide a declaration in an outer scope; but only if one of two has no linkage.

Declarations and Definitions

If neither the extern keyword nor an initializer are present, the statement can be either a declaration or a definition. It is up to the compiler to analyse the modules of the program and decide.

  • All declarations with no linkage are also definitions. Other declarations are definitions if they have an initializer.
  • A file scope variable declaration without the external linkage storage class specifier or initializer is a tentative definition.
  • All definitions are declarations but not vice-versa.
  • A definition of an identifier is a declaration for that identifier that: for an object, causes storage to be reserved for that object.

A declaration specifies the interpretation and attributes of a set of identifiers. A definition of an identifier is a declaration for that identifier that:

  • for an object, causes storage to be reserved for that object;
  • for a function, includes the function body;
  • for an enumeration constant or typedef name, is the only declaration of the identifier.

In the following example we declare a function. Using the extern keyword is optional when declaring a function. If we don’t write the extern keyword, the compiler treats the declaration as if it were present.

int add(int, int);

Block scope

Every variable or function declaration that appears inside a block has block scope. The scope goes from the declaration to the end of the innermost block in which the declaration appears. Function parameter declarations in function definitions (but not in prototypes) also have block scope. The scope of a parameter declaration therefore includes the parameter declarations that appear after it.

Function scope

Labels used by goto <label> are a bit special: they are implicitly declared at the place where they appear, but they are visible throughout the function, even if they appear inside a block.

Function prototype scope is the scope for function parameters that appear inside a function prototype. It extends until the end of the prototype. This scope exists to ensure that function parameters have distinct names.

File scope

All variables and functions defined outside functions have file scope. They are visible from their declaration until the end of the file. Here, the term file should be understood as the source file being compiled, after all includes have been resolved.

Duration

Indicates whether the object associated to the declaration persists throughout the program’s execution (static) or whether it is allocated dynamically when the declaration’s scope is entered (automatic).

There are two kinds of duration:

  • automatic
  • static

Within functions at block scope, declarations without extern or static have automatic duration. Any other declaration at file scope has static duration.

Linkage

Linkage describes how identifiers can or can not refer to the same entity throughout the whole program or one single translation unit.

Living example:

./configure --has-lang
make clean lang_linkage_test

Translation unit

A translation unit is the ultimate input to a C compiler from which an object file is generated. In casual usage it is sometimes referred to as a compilation unit. A translation unit roughly consists of a source file after it has been processed by the C preprocessor, meaning that header files listed in #include directives are literally included, sections of code within #ifdef may be included, and macros have been expanded.

No linkage

A declaration with no linkage is associated to an object that is not shared with any other declaration. All declarations with no linkage happen at block scope: all block scope declarations without the extern storage class specifier have no linkage.

Internal linkage

Internal linkage means that the identifier is private to the translation unit: it must be defined within that translation unit, either in the source file itself or in one of the files it includes. Within the translation unit, all declarations with internal linkage for the same identifier refer to the same object.

External linkage

External linkage means that the variable could be defined somewhere else outside the file you are working on, which means you can define it inside any other translation unit rather than your current one. Within the whole program, all declarations with external linkage for the same identifier refer to the same object.

Size type and Pointer difference types

The C language specification includes the typedefs size_t and ptrdiff_t to represent memory-related quantities. Their size is defined according to the target processor’s arithmetic capabilities, not the memory capabilities, such as available address space. Both of these types are defined in the <stddef.h> header.

  • size_t is an unsigned integer type used to represent the size of any object in the particular implementation. The sizeof operator yields a value of the type size_t. The maximum value of size_t is provided via SIZE_MAX, a macro constant which is defined in the <stdint.h> header.
  • ptrdiff_t is a signed integer type used to represent the difference between pointers. The result is only guaranteed to be valid for pointers of the same type that point into the same array object.
  • ssize_t is POSIX standard not C standard.
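
A small sketch of where each type shows up (elements_between and int_array_bytes are illustrative names):

```c
#include <stddef.h>

/* Number of elements from one pointer to another; both must point
   into the same array. The result is negative if to comes before from. */
ptrdiff_t elements_between(const int *from, const int *to) {
  return to - from;
}

/* sizeof yields a size_t, so sizes are computed in size_t. */
size_t int_array_bytes(size_t count) {
  return count * sizeof(int);
}
```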

Literal suffix

  • l or L for long on integer constants, such as 123l, and for long double on floating-point constants, such as 3.14L
  • f for float, such as 2.718f

struct

A struct is a type consisting of a sequence of members whose storage is allocated in the order in which the members were defined.

struct optional_name { declaration_list; };
struct name;

Initialization, sizeof and the = (assignment) operator ignore the flexible array member.
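
Because sizeof ignores the flexible array member, its storage has to be added by hand when allocating. A sketch assuming C99 or later; struct message and message_new are hypothetical names:

```c
#include <stdlib.h>
#include <string.h>

struct message {
  size_t length;
  char text[];        /* flexible array member: must be the last member */
};

/* Allocates a message holding a copy of s. The caller frees it.
   Returns NULL if the allocation fails. */
struct message *message_new(const char *s) {
  size_t n = strlen(s);
  /* sizeof *m does not count text, so add room for the string and its '\0'. */
  struct message *m = malloc(sizeof *m + n + 1);
  if (m == NULL) return NULL;
  m->length = n;
  memcpy(m->text, s, n + 1);
  return m;
}
```

For the same reason, copying such a struct with = copies only the fixed part.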

Run example

./configure --has-lang
make clean lang_struct_test

Padding

There may be unnamed padding between any two members of a struct or after the last member, but not before the first member. The size of a struct is at least as large as the sum of the sizes of its members.

extern int a[]; /* the type of a is incomplete */
char a[4];      /* the type of a is now complete */

struct node {
  struct node *next; /* struct node is incomplete type at this point */
}; /* struct node is now complete at this point */

union

A union is a type consisting of a sequence of members whose storage overlaps.

union optional_name { declaration_list; };
union name;
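
Overlapping storage lets the same bytes be viewed in two ways. This sketch assumes the implementation provides uint32_t; union word and is_little_endian are made-up names:

```c
#include <stdint.h>

/* value and bytes share the same storage. */
union word {
  uint32_t value;
  unsigned char bytes[4];
};

/* Returns 1 if the lowest-addressed byte holds the least significant
   part of value (little endian), else 0. */
int is_little_endian(void) {
  union word w;
  w.value = 1;
  return w.bytes[0] == 1;
}
```

The size of the union is at least the size of its largest member, not the sum of all members.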

Type

Basic types

Integer

All C types are represented as binary numbers in memory; the type determines how those numbers are interpreted.

C provides the four basic arithmetic type specifiers char, int, float and double, and the modifiers signed, unsigned, short and long.

long and long int are identical. So are long long and long long int. In both cases, the int is optional.

| specifier     | type          |
|---------------+---------------|
| long long int | long long int |
| long long     | long long int |
| long          | long int      |

Incomplete type

An incomplete type is an object type that lacks sufficient information to determine the size of objects of that type; an incomplete type may be completed at some point in the translation unit.

  • void cannot be completed.
  • [] array type of unknown size, it can be completed by a later declaration that specifies the size.

typedef

typedef type_specifier declarator;
typedef type_specifier declarator1, *declarator2, (*declarator3)(void);

typedef is used to create an alias name for another type. As such, it is often used to simplify the syntax of declaring complex data structures consisting of struct and union types, but it is just as common in providing specific descriptive type names for integer types of varying lengths. The C standard library and POSIX reserve the suffix _t, for example as in size_t and time_t.

#define is a preprocessor directive which can also be used to define aliases for data types, similar to typedef, but with the following differences:

  • typedef is limited to giving symbolic names to types only, whereas #define can be used to define aliases for values as well.
  • typedef interpretation is performed by the compiler, whereas #define statements are processed by the preprocessor.
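
The second difference matters as soon as one declaration names two variables. A minimal sketch, with str_t and STR_M as made-up names:

```c
#include <stddef.h>

typedef char *str_t;     /* a real type alias, handled by the compiler */
#define STR_M char *     /* plain text substitution by the preprocessor */

/* Size of the second variable when the typedef is used. */
size_t typedef_second_size(void) {
  str_t a = NULL, b = NULL;   /* both a and b are char * */
  (void)a;
  return sizeof b;
}

/* Size of the second variable when the macro is used. */
size_t macro_second_size(void) {
  STR_M a = NULL, b = 0;      /* expands to: char * a = NULL, b = 0; so b is a char */
  (void)a;
  return sizeof b;
}
```

With the macro, only the first variable is a pointer, because the * binds to the declarator and not to the type.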

Using typedef to hide struct is considered a bad idea in Linux kernel coding style

Run typedef example

./configure --has-lang
make clean lang_typedef_test

typeof

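The typeof operator was not part of the C standard before C23; in earlier versions it is available only as a compiler extension, for example in GCC and Clang.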

Run typeof example

./configure --has-lang
make clean lang_typeof_test

cdecl

A declaration can have exactly one basic type. The basic types are augmented with derived types, and C has three of them:

  • function [(decl-list)] returning: ()
  • array [number] of: []
  • [const | volatile | restrict] pointer to: *

The array of [] and function returning () type operators have higher precedence than pointer to *.
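
The precedence rule decides how the following declarations read. The names are arbitrary and exist only for this sketch:

```c
int *ptrs[3];             /* ptrs is array 3 of pointer to int: [] binds before * */
int (*row)[3];            /* row is pointer to array 3 of int */
int *make(void);          /* make is function returning pointer to int */
int (*callback)(void);    /* callback is pointer to function returning int */
int (*handlers[2])(void); /* handlers is array 2 of pointer to function returning int */
```

Parentheses around (*name) are what force the pointer to be read first.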

alloc

malloc

Don’t cast the result of malloc. It is unnecessary, as void * is automatically and safely converted to any other object pointer type in this case. It adds clutter to the code, and casts are not very easy to read (especially if the pointer type is long). It makes you repeat yourself, which is generally bad. It can hide an error if you forgot to include <stdlib.h>. This can cause crashes (or, worse, not cause a crash until way later in some totally different part of the code). Consider what happens if pointers and integers are differently sized; then you’re hiding a warning by casting and might lose bits of your returned address. Note: as of C99 implicit function declarations are gone from C, and this point is no longer relevant since there’s no automatic assumption that undeclared functions return int.

To add further, your code needlessly repeats the type information (int), which can cause errors. It’s better to dereference the pointer being used to store the return value, to lock the two together: int *x = malloc(length * sizeof *x); This also moves the length to the front for increased visibility, and drops the redundant parentheses with sizeof(); they are only needed when the argument is a type name. Many people seem to not know or ignore this, which makes their code more verbose. Remember: sizeof is not a function!

While moving length to the front may increase visibility in some rare cases, one should also pay attention that in the general case it is better to write the expression as: int *x = malloc(sizeof *x * length); Keeping sizeof first ensures that the multiplication is done in at least size_t arithmetic. Compare malloc(sizeof *x * length * width) with malloc(length * width * sizeof *x): the second may overflow in length * width when length and width are of smaller types than size_t.
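
Putting the advice together gives something like the following sketch. grid_new is a hypothetical helper, and the overflow check is an addition for this example, not something the paragraphs above require:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates length * width ints. The caller frees the result.
   Returns NULL on allocation failure or if the size would overflow size_t. */
int *grid_new(size_t length, size_t width) {
  /* Reject sizes whose product would not fit in size_t. */
  if (width != 0 && length > SIZE_MAX / sizeof(int) / width)
    return NULL;

  /* No cast, and sizeof *x comes first so the arithmetic is done in size_t. */
  int *x = malloc(sizeof *x * length * width);
  return x;
}
```

If the type of x changes later, the sizeof *x expression follows along without any other edit.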

calloc

calloc zero-initializes the allocated memory. Calling calloc is not necessarily more expensive than calling malloc.

realloc

libc

The C standard library is a standardized collection of header files and library routines used to implement common operations.

std

There is a good answer to What is the difference between C, C99, ANSI C and GNU C:

  • Everything before standardization is generally called “K&R C”, after the famous book, with Dennis Ritchie, the inventor of the C language, as one of the authors. This was “the C language” from 1972-1989.
  • The first C standard was released 1989 nationally in USA, by their national standard institute ANSI. This release is called C89 or ANSI-C. From 1989-1990 this was “the C language”.
  • The year after, the American standard was accepted internationally and published by ISO (ISO 9899:1990). This release is called C90. Technically, it is the same standard as C89/ANSI-C. Formally, it replaced C89/ANSI-C, making them obsolete. From 1990-1999, C90 was “the C language”.
  • Please note that since 1989, ANSI haven’t had anything to do with the C language. Programmers still speaking about “ANSI C” generally haven’t got a clue about what it means. ISO “owns” the C language, through the standard ISO 9899.
  • In 1999, the C standard was revised, lots of things changed (ISO 9899:1999). This version of the standard is called C99. From 1999-2011, this was “the C language”. Most C compilers still follow this version.
  • In 2011, the C standard was again changed (ISO 9899:2011). This version is called C11. It is currently the definition of “the C language”.

headers

| name | std | intro |
|---+---+---|
| assert.h | C90 | conditionally compiled macro that compares its argument to zero |
| ctype.h | C90 | functions to determine the type contained in character data |
| errno.h | C90 | macros reporting error conditions |
| float.h | C90 | limits of float types |
| limits.h | C90 | sizes of basic types |
| locale.h | C90 | localization utilities |
| math.h | C90 | common mathematics functions |
| setjmp.h | C90 | nonlocal jumps |
| signal.h | C90 | signal handling |
| stdarg.h | C90 | variable arguments |
| stddef.h | C90 | common macro definitions |
| stdio.h | C90 | input/output |
| stdlib.h | C90 | general utilities: memory, program, string, random, algorithms |
| string.h | C90 | string handling |
| time.h | C90 | time/date utilities |
| iso646.h | C95 | alternative operator spellings |
| wchar.h | C95 | extended multibyte and wide character utilities |
| wctype.h | C95 | functions to determine the type contained in wide character data |
| complex.h | C99 | complex number arithmetic |
| fenv.h | C99 | floating-point environment |
| inttypes.h | C99 | format conversion of integer types |
| stdbool.h | C99 | macros for boolean types |
| stdint.h | C99 | fixed-width integer types |
| tgmath.h | C99 | type-generic math |
| stdalign.h | C11 | alignas and alignof convenience macros |
| stdatomic.h | C11 | atomic types |
| stdnoreturn.h | C11 | noreturn convenience macros |
| threads.h | C11 | thread library |
| uchar.h | C11 | UTF-16/32 character utilities |

References

Compiler

flex

References

x86

While memory stores the program and data, the Central Processing Unit does all the work. The CPU has two parts: registers and the Arithmetic Logic Unit (ALU). The ALU performs the actual computations such as addition and multiplication, along with comparison and other logical operations.

Load

Load instructions read bytes into a register. The source may be a constant value, another register, or a location in memory.

;; load the constant 23 into register 4
R4 = 23

;; copy the contents of register 2 into register 3
R3 = R2

;; load char (one byte) starting at memory address 244 into register 6
R6 = .1 M[244]

;; load R5 with the word whose memory address is in R1
R5 = M[R1]

;; load the word that begins 8 bytes after the address in R1.
;; this is known as constant offset mode and is about the fanciest
;; addressing mode a RISC processor will support
R4 = M[R1+8]

Store

Store instructions are basically the reverse of load instructions: they move values from registers back out to memory.

;; store the constant number 37 into the word beginning at 400
M[400] = 37

;; store the value in R6 into the word whose address is in R1
M[R1] = R6

;; store lower half-word from R2 into 2 bytes starting at address 1024
M[1024] = .2 R2

;; store R7 into the word whose address is 12 more than the address in R1
M[R1+12] = R7

ALU

;; add 6 to R3 and store the result in R1
R1 = 6 + R3

;; subtract R3 from R2 and store the result in R1
R1 = R2 - R3

Branching

By default, the CPU fetches and executes instructions from memory in order, working from low memory to high. Branch instructions alter this order. Branch instructions test a condition and possibly change which instruction should be executed next by changing the value of the PC register. The operands in the test of a branch statement must be in registers or constant values. Branches are used to implement control structures like if as well as loops like for and while.

;; begin executing at address 344 if R1 equals 0
BEQ R1, 0, 344

;; begin executing at address 8 past current instruction if R2 less than R3
BLT R2, R3, PC+8

;; The full set of branch variants:
BLT ... ;; branch if first argument is less than second
BLE ... ;; less than or equal
BGT ... ;; greater than
BGE ... ;; greater than or equal
BEQ ... ;; equal
BNE ... ;; not equal

;; unconditional jump that has no test, but just immediately
;; diverts execution to new address
;; begin executing at address 2000 unconditionally: like a goto
JMP 2000

;; begin executing at address 12 before current instruction
JMP PC-12
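To connect the branch instructions above back to C, here is a rough sketch of a while loop written with explicit tests and goto, which is close to what a compiler produces. The function name sum_to and the register names in the comments are made up for illustration.

```c
/* Sum 1..n using only a conditional test and jumps, mirroring how a
 * while loop is lowered into branch and jump instructions. */
int sum_to(int n)
{
    int i = 1;
    int total = 0;
loop:
    if (i > n)          /* like: BGT Ri, Rn, done */
        goto done;
    total += i;
    i++;
    goto loop;          /* like: JMP loop */
done:
    return total;
}
```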

Type Conversion

The types char, short, int, and long are all in the same family, and use the same binary polynomial representation. C allows you to freely assign between these types.

  • broaden: when assigning from a smaller-sized type to a larger one, there is no problem. All of the source bytes are copied and the remaining upper bytes in the destination are filled using what is called sign extension: the sign bit is extended across the extra bytes (for unsigned types the extra bytes are filled with zeros).
  • narrow: only the lower bytes are copied and the upper bytes are ignored.

Remember that a floating point 1.0 has a completely different arrangement of bits than the integer 1, so instructions are required to do those conversions.

;; take bits in R2 that represent integer, convert to float, store in R1
R1 = ItoF R2

;; take bits in R3, convert from float to int, and store in R4. Note
;; that converting in this direction loses information: the fractional
;; component is truncated and lost
R4 = FtoI R3
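The same conversions can be observed from C. The helpers below are a minimal sketch with hypothetical names (broaden, narrow, float_to_int, int_to_float); the fixed-width types from <stdint.h> are used so the sizes are explicit.

```c
#include <stdint.h>

/* Broaden: a signed 8-bit value is sign-extended to 32 bits. */
int32_t broaden(int8_t small) { return small; }

/* Narrow: keep only the low byte. Going through uint8_t makes the
 * truncation well defined (modulo 256). */
uint8_t narrow(int32_t big) { return (uint8_t)big; }

/* Float to int: the fractional part is discarded (truncation toward 0). */
int32_t float_to_int(float f) { return (int32_t)f; }

/* Int to float: the compiler emits a conversion, not a bit copy. */
float int_to_float(int32_t i) { return (float)i; }
```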

Typecast

A typecast is a compile-time entity that instructs the compiler to treat an expression differently than its declared type when generating code for that expression.

  • casting a pointer from one type to another could change how the offset is multiplied for pointer arithmetic or how many bytes are copied on a pointer dereference.
  • some typecasts are actually type conversions. A type conversion is required when the data needs to be converted from one representation to another, such as when changing an integer to floating point representation or vice versa.
  • most often, a cast does affect the generated code, since the compiler will be treating the expression as a different type.
int i;
((struct binky *)i)->b = 'A';

What does this code actually do at runtime? Why would you ever want to do such a thing? The typecast is one of the reasons C is a fundamentally unsafe language.

Data Sizes

| 16-bits | Size (bytes) | Size (bits) |
|---+---+---|
| Word | 2 | 16 |
| Doubleword | 4 | 32 |
| Quadword | 8 | 64 |
| Paragraph | 16 | 128 |
| Kilobyte | 1024 | 8192 |
| Megabyte | 1,048,576 | 8388608 |

In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized piece of data handled as a unit by the instruction set or the hardware of the processor. The number of bits in a word is an important characteristic of any specific processor design or computer architecture.

Registers

rsp

rbp

callq

pushq <address-of-after-callq>

retq

jmp <address-of-$rsp>

cmp

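cmp dst, src performs a subtraction but does not store the result; it only updates the flags, as sub dst, src would.

| cmp dst, src | flag set to 1 |
|---+---|
| unsigned src < unsigned dst | CF |
| low byte of (src-dst) has an even number of 1 bits | PF |
| borrow in the low nibble of (src-dst) | AF |
| (src-dst) is 0, i.e. src == dst | ZF |
| MSB of (src-dst) == 1 | SF |
| signed overflow: sign bit of src != sign bit of dst and sign bit of src != sign bit of (src-dst) | OF |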

jmp

| Jump | Description | signed-ness | Flags |
|---+---+---+---|
| je | jump if equal | | ZF = 1 |
| jg | jump if greater | signed | ZF = 0 and SF = OF |
| jge | jump if greater or equal | signed | SF = OF |
| jl | jump if less | signed | SF != OF |
| jle | jump if less or equal | signed | ZF = 1 or SF != OF |
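The flag rules in the two tables above can be modeled in C. This is a rough sketch, not a real API: struct flags, cmp_flags, jl_taken and jg_taken are hypothetical names, and the function takes the minuend and subtrahend explicitly so that it does not depend on AT&T versus Intel operand order.

```c
#include <stdint.h>

struct flags { int zf, sf, cf, of; };

/* Model the flags of a 32-bit compare that computes minuend - subtrahend. */
struct flags cmp_flags(int32_t minuend, int32_t subtrahend)
{
    uint32_t a = (uint32_t)minuend, b = (uint32_t)subtrahend;
    uint32_t r = a - b;                       /* wraps modulo 2^32 */
    struct flags f;
    f.zf = (r == 0);                          /* result is zero */
    f.sf = (int)(r >> 31);                    /* MSB of the result */
    f.cf = (a < b);                           /* unsigned borrow */
    f.of = (int)(((a ^ b) & (a ^ r)) >> 31);  /* signed overflow */
    return f;
}

/* jl: taken when SF != OF. */
int jl_taken(struct flags f) { return f.sf != f.of; }

/* jg: taken when ZF = 0 and SF = OF. */
int jg_taken(struct flags f) { return f.zf == 0 && f.sf == f.of; }
```

Comparing INT32_MIN with 1 is the interesting case: the subtraction overflows, SF ends up 0 and OF ends up 1, and SF != OF still reports "less".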

rflags

RFLAGS Register

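| Bit(s) | Label | Description |
|---+---+---|
| 0 | CF | Carry Flag |
| 1 | 1 | Reserved |
| 2 | PF | Parity Flag, set if the least significant byte of the result contains an even number of 1 bits |
| 3 | 0 | Reserved |
| 4 | AF | Auxiliary Carry Flag |
| 5 | 0 | Reserved |
| 6 | ZF | Zero Flag, set if result is zero |
| 7 | SF | Sign Flag, set to the MSB of the result |
| 8 | TF | Trap Flag |
| 9 | IF | Interrupt Enable Flag |
| 10 | DF | Direction Flag |
| 11 | OF | Overflow Flag |
| 12-13 | IOPL | I/O Privilege Level |
| 14 | NT | Nested Task |
| 15 | 0 | Reserved |
| 16 | RF | Resume Flag |
| 17 | VM | Virtual-8086 Mode |
| 18 | AC | Alignment Check / Access Control |
| 19 | VIF | Virtual Interrupt Flag |
| 20 | VIP | Virtual Interrupt Pending |
| 21 | ID | ID Flag |
| 22-63 | 0 | Reserved |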

Addressing

References

Memory

Run the examples under src/memory.

./configure --has-memory
make clean test

Bits and Bytes

Bits

The smallest unit of memory is the bit. A bit can be in one of two states: on vs. off, or alternately, 1 vs. 0.

Most computers don’t work with bits individually, but instead group eight bits together to form a byte. Each byte maintains one eight-bit pattern. A group of N bits can be arranged in 2^N different patterns.

Strictly speaking, a program can interpret a bit pattern any way it chooses.

Bytes

The byte is sometimes defined as the smallest addressable unit of memory. Most computers also support reading and writing larger units of memory: 2-byte half-words (sometimes known as short words) and 4-byte words.

Most computers restrict half-word and word accesses to be aligned: a half-word must start at an even address and a word must start at an address that is a multiple of 4.

Shift

A logical shift always fills the vacated bits with 0s, while an arithmetic shift fills them with 0s only for a left shift; for a right shift it copies the most significant bit, thereby preserving the sign of the operand.

Left shift on unsigned integers, x << y

  • shift bit-vector x by y positions
  • throw away extra bits on left
  • fill with 0s on right

Right shift on unsigned integers, x >> y

  • shift bit-vector x right by y positions
  • throw away extra bits on right
  • fill with 0s on left

Left shift, x << y

  • equivalent to multiplying by 2^y
  • if resulting value fits, no 1s are lost

Right shift, x >> y

  • logical shift for unsigned values, fill with 0s on left
  • arithmetic shift for signed values
    • replicate most significant bit on left
    • maintains sign of x
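  • equivalent to floor(x / 2^y)
    • this rounds toward negative infinity, while C integer division rounds towards 0, so getting the same result requires some care with signed numbers.
    • for a negative x, an arithmetic shift can be built from unsigned operations: (unsigned)x >> y | ~(~0u >> y)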
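The two kinds of right shift can be compared side by side. The sketch below works on unsigned bit patterns only, so it avoids the implementation-defined behaviour of shifting negative signed values in C; the names lsr and asr are hypothetical.

```c
#include <limits.h>

/* Logical right shift: zeros enter from the left. */
unsigned lsr(unsigned x, unsigned y) { return x >> y; }

/* Arithmetic right shift of the bit pattern in x, built from unsigned
 * operations only: when the top bit is set, the vacated high bits are
 * filled with 1s. Assumes y is less than the number of bits in unsigned. */
unsigned asr(unsigned x, unsigned y)
{
    unsigned top = 1u << (sizeof x * CHAR_BIT - 1);
    unsigned shifted = x >> y;
    if (x & top)
        shifted |= ~(~0u >> y);
    return shifted;
}
```

With the bit pattern of -7, asr by 1 gives the pattern of -4 (floor of -3.5), whereas -7 / 2 in C is -3.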

Basic Types

Character

The ASCII code defines 128 characters and a mapping of those characters onto the numbers 0..127. The letter ‘A’ is assigned 65 in the ASCII table. Expressed in binary, that’s 2^6 + 2^0 (64 + 1). All standard ASCII characters have zero in the uppermost bit (the most significant bit) since they only span the range 0..127.

Short Integer

2 bytes or 16 bits. 16 bits provide 2^16 = 65536 patterns. This number is known as 64k, where 1k of something is 2^10 = 1024. For non-negative numbers these patterns map to the numbers 0..65535. Systems that are big-endian store the most-significant byte at the lower address. A little-endian (Intel x86) system arranges the bytes in the opposite order. This means when exchanging data through files or over a network between different endian machines, there is often a substantial amount of byte-swapping required to rearrange the data.

Long Integer

4 bytes or 32 bits. 32 bits provide 2^32 = 4294967296 patterns. 4 bytes is the contemporary default size for an integer. Also known as a word.

Fixed-point

Floating-point

4, 8, or 16 bytes. Almost all computers use the standard IEEE-754 representation for floating point numbers, a system much more complex than the scheme for integers. The important thing to note is that the bit pattern for the floating point number 1.0 is not the same as the pattern for integer 1. IEEE floats are in a form of scientific notation. A 4-byte float uses 23 bits for the mantissa, 8 bits for the exponent, and 1 bit for the sign. Some processors have a special hardware Floating Point Unit, FPU, that substantially speeds up floating point operations. With separate integer and floating point processing units, it is often possible that an integer and a floating point computation can proceed in parallel to an extent. The exponent field contains 127 plus the true exponent for single precision, or 1023 plus the true exponent for double precision. The first bit of the mantissa is typically assumed to be 1, giving 1.f, where f is the field of fraction bits.

| | sign | exponent | mantissa |
|---+---+---+---|
| | | (base 2 + 127) | (base 2, 1/2, 1/4…) |
| | | (base 2 + 1023) | |
| single precision | 1 [31] | 8 [30-23] | 23 [22-00] |
| double precision | 1 [63] | 11 [62-52] | 52 [51-00] |
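The single-precision layout in the table can be checked by pulling a float apart. This is a sketch that assumes float is a 32-bit IEEE-754 value; struct float_fields and decompose are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

struct float_fields { uint32_t sign, exponent, mantissa; };

/* Assumes float is a 32-bit IEEE-754 single-precision value. */
struct float_fields decompose(float f)
{
    uint32_t bits;
    struct float_fields out;
    memcpy(&bits, &f, sizeof bits);      /* copy the bit pattern safely */
    out.sign     = bits >> 31;           /* bit 31 */
    out.exponent = (bits >> 23) & 0xFFu; /* bits 30-23, biased by 127 */
    out.mantissa = bits & 0x7FFFFFu;     /* bits 22-0, the fraction f */
    return out;
}
```

For 1.0f the exponent field is 127 (true exponent 0) and the fraction is 0; for -2.5f, which is -1.25 * 2^1, the sign is 1, the exponent field is 128 and only the 1/4 fraction bit is set.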

References

Record

The size of a record is at least the sum of the sizes of its component fields. The record is laid out by allocating the components sequentially in a contiguous block, working from low memory to high. Sometimes a compiler will add invisible pad fields in a record to comply with processor alignment restrictions.
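Padding is easy to observe with sizeof and offsetof. The record below is a made-up example (struct sample and sum_of_fields are hypothetical names); the exact amount of padding depends on the compiler and target.

```c
#include <stddef.h>

struct sample {         /* hypothetical record for illustration */
    char  tag;          /* 1 byte, usually followed by padding */
    int   count;        /* typically aligned to a 4-byte boundary */
    short code;
};

/* Sum of the field sizes, ignoring any padding. */
size_t sum_of_fields(void)
{
    return sizeof(char) + sizeof(int) + sizeof(short);
}
```

On many targets sizeof(struct sample) is 12 while the fields add up to 7, but only the "at least" relationship is guaranteed.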

Array

The size of an array is at least equal to the size of each element multiplied by the number of components. The elements in the array are laid out consecutively starting with the first element and working from low memory to high. Given the base address of the array, the compiler can generate constant-time code to figure the address of any element. As with records, there may be pad bytes added to the size of each element to comply with alignment restrictions.

Pointer

A pointer is an address. The size of the pointer depends on the range of addresses on the machine. A 32-bit machine uses 4 bytes to store an address, creating a 4GB addressable range; a 64-bit machine uses 8 bytes. There is actually very little distinction between a pointer and an unsigned integer of the same size. They both just store integers; the difference is in whether the number is interpreted as a number or as an address.

Instruction

Machine instructions themselves are also encoded using bit patterns, most often using the same 4-byte native word size. The different bits in the instruction encoding indicate things such as what type of instruction it is (load, store, multiply, etc) and registers involved.

Pointer Basics

Pointers and Pointees

We use the term pointee for the thing that the pointer points to, and we stick to the basic properties of the pointer/pointee relationship which are true in all languages.

Allocating a pointer and allocating a pointee for it to point to are two separate steps. You can think of the pointer/pointee structure as operating at two levels. Both levels must be set up for things to work.

Dereferencing

The dereference operation starts at the pointer and follows its arrow over to access its pointee. The goal may be to look at the pointee state or to change the state.

The dereference operation on a pointer only works if the pointer has a pointee: the pointee must be allocated and the pointer must be set to point to it.

Pointer Assignment

Pointer assignment between two pointers makes them point to the same pointee. Pointer assignment does not touch the pointees. It just changes one pointer to have the same reference as another pointer. After pointer assignment, the two pointers are said to be sharing the pointee.

C Array

A C array is formed by laying out all the elements contiguously in memory from low to high. The array as a whole is referred to by the address of the first element.

The programmer can refer to elements in the array with the simple [] syntax such as intArray[1]. This scheme works by combining the base address of the array with simple arithmetic. Each element takes up a fixed number of bytes known at compile-time. So the address of the nth element in the array (0-based indexing) will be at an offset of (n * element_size) bytes from the base address of the whole array.

[] Operator

The square bracket syntax [] deals with this address arithmetic for you, but it’s useful to know what it’s doing. The [] multiplies the integer index by the element size, adds the resulting offset to the array base address, and finally dereferences the resulting pointer to get to the desired element.

a[3] == *(a + 3);
a+3 == &a[3];

a[b] == b[a];

The C standard defines the [] operator as follows: a[b] => *(a+b), and b[a] => *(b+a) => *(a+b), so a[b] == b[a].

In a closely related piece of syntax, adding an integer to a pointer does the same offset computation, but leaves the result as a pointer. The square bracket syntax dereferences that pointer to access the nth element while the + syntax just computes the pointer to the nth element.

Any [] expression can be written with the + syntax instead. We just need to add in the pointer dereference. For most purposes, it’s easiest and most readable to use the [] syntax. Every once in a while the + is convenient if you need a pointer to the element instead of the element itself.
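The equivalence between [] and + can be spelled out in two tiny helpers. The names nth and nth_address are hypothetical and exist only for this sketch.

```c
#include <stddef.h>

/* Read element n with the + syntax plus an explicit dereference. */
int nth(const int *a, size_t n) { return *(a + n); }

/* Return a pointer to element n without dereferencing it. */
const int *nth_address(const int *a, size_t n) { return a + n; }
```

For an array int a[5], nth(a, 3) reads the same element as a[3] (or the unusual 3[a]), and nth_address(a, 3) equals &a[3].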

Pointer++

If p is a pointer to an element in an array, then (p+1) points to the next element in the array. Code can exploit this using the construct p++ to step a pointer over the elements in an array. It doesn’t help readability any.

Pointer Type Effects

Both [] and ++ implicitly use the compile-time type of the pointer to compute the element size, which affects the offset arithmetic.

	int *p;
	p = p + 12; /* p + (12 * sizeof(int)) */

	p = (int*) ((char*)p + 12); /* add 12 sizeof(char) */

Assuming each int takes 4 bytes, at runtime the first assignment effectively increments the address in p by 48, while the second one adds only 12. The compiler figures all this out based on the type of the pointer.
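The scaling can be measured rather than assumed. In this sketch the helper names int_step_bytes and char_step_bytes are hypothetical, and n is assumed to be at most 16 so the pointers stay inside the local buffers.

```c
#include <stddef.h>

/* Byte distance covered by adding n to an int pointer (n <= 16). */
size_t int_step_bytes(size_t n)
{
    int buf[16];
    int *p = buf;
    return (size_t)((char *)(p + n) - (char *)p);
}

/* Byte distance covered by adding n to a char pointer (n <= 16). */
size_t char_step_bytes(size_t n)
{
    char buf[16];
    char *p = buf;
    return (size_t)((p + n) - p);
}
```

int_step_bytes(12) yields 12 * sizeof(int), while char_step_bytes(12) yields exactly 12.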

Arithmetic on a void pointer

What is sizeof(void)? Unknown! Some compilers assume that a (void*) should be treated like a (char*), but if you were to depend on this you would be creating non-portable code.

Note that you do not need to cast the result back to (void*); a (void*) is the universal recipient of pointer types and can be freely assigned any type of pointer.

Arrays and Pointers

One effect of the C array scheme is that the compiler does not meaningfully distinguish between arrays and pointers.

Array Names are const

One subtle distinction between an array and a pointer, is that the pointer which represents the base address of an array cannot be changed in the code. Technically, the array base address is a const pointer. The constraint applies to the name of the array where it is declared in the code.

Dynamic Arrays

Since arrays are just contiguous areas of bytes, you can allocate your own arrays in the heap using malloc. And you can change the size of the malloced array at will at run time using realloc.
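A growable array built on realloc might look like the sketch below. struct int_vec and int_vec_push are hypothetical names, and doubling the capacity is just one common growth policy.

```c
#include <stdlib.h>

struct int_vec { int *data; size_t len, cap; };   /* hypothetical type */

/* Append value, doubling the capacity with realloc when full.
 * Returns 0 on success, -1 if realloc fails (the old data stays valid). */
int int_vec_push(struct int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *grown = realloc(v->data, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        v->data = grown;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}
```

Start from a zeroed struct (data == NULL, len == 0, cap == 0), since realloc(NULL, n) behaves like malloc(n), and call free(v.data) when done.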

Passing multidimensional arrays to a function

Iteration

C stores multidimensional arrays in row-major order, so loading a[0][0] will likely bring a[0][1] into the cache as well, while loading a[1][0] may cause a second cache miss.
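A small sketch of row-major traversal, which also shows one way to pass a two-dimensional array to a function. ROWS, COLS, sum_row_major and row_stride are hypothetical names chosen for this example.

```c
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };

/* Visit elements in the order they sit in memory: the column index
 * varies fastest, so consecutive accesses touch adjacent addresses. */
long sum_row_major(int a[ROWS][COLS])
{
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += a[r][c];
    return total;
}

/* Distance, in elements, between the start of row 1 and row 0. */
ptrdiff_t row_stride(int a[ROWS][COLS])
{
    return ((char *)a[1] - (char *)a[0]) / (ptrdiff_t)sizeof(int);
}
```

Swapping the two loops visits the same elements but jumps COLS elements between accesses, which is the cache-unfriendly order.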

Stack Implementation

Writing a generic container in pure C is hard, and it’s hard for two reasons:

The language doesn’t offer any real support for encapsulation or information hiding. That means that the data structures expose information about internal representation right there in the interface file for everyone to see and manipulate. The best we can do is document that the data structure should be treated as an abstract data type, and that the client shouldn’t directly manage the fields. Instead, the client should just rely on the functions provided to manage the internals.

C doesn’t allow data types to be passed as parameters. That means a generic container needs to manually manage memory in terms of the client element size, not client data type. This translates to a bunch of malloc, realloc, free, memcpy, and memmove calls involving void*.
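As a rough sketch of what such a container looks like, here is a minimal generic stack driven by an element size. It is not the implementation used in this repository: the type and function names (stack, stack_init, stack_push, stack_pop, stack_dispose) are assumptions for illustration, and elem_size is assumed to be greater than 0.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    void  *elems;       /* raw storage; clients should not touch it */
    size_t elem_size;   /* bytes per element, supplied by the client */
    size_t len, cap;
} stack;

/* Returns 0 on success, -1 if the initial allocation fails. */
int stack_init(stack *s, size_t elem_size)
{
    s->elem_size = elem_size;
    s->len = 0;
    s->cap = 4;
    s->elems = malloc(s->cap * elem_size);
    return s->elems ? 0 : -1;
}

void stack_dispose(stack *s)
{
    free(s->elems);
    s->elems = NULL;
}

/* Copy elem_size bytes from elem onto the top of the stack. */
int stack_push(stack *s, const void *elem)
{
    if (s->len == s->cap) {
        void *grown = realloc(s->elems, s->cap * 2 * s->elem_size);
        if (grown == NULL)
            return -1;
        s->elems = grown;
        s->cap *= 2;
    }
    memcpy((char *)s->elems + s->len * s->elem_size, elem, s->elem_size);
    s->len++;
    return 0;
}

/* Copy the top element into out; returns -1 when the stack is empty. */
int stack_pop(stack *s, void *out)
{
    if (s->len == 0)
        return -1;
    s->len--;
    memcpy(out, (char *)s->elems + s->len * s->elem_size, s->elem_size);
    return 0;
}
```

The casts to char * are there because arithmetic on void * is not portable; all offsets are computed in bytes from the client's element size.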

Endian

Endianness refers to the sequential order used to numerically interpret a range of bytes in computer memory as a larger, composed word value. It also describes the order of byte transmission over a digital link.

However, if you have a 32-bit register storing a 32-bit value, it makes no sense to talk about endianness. The rightmost bit is the least significant bit, and the leftmost bit is the most significant bit.

Big Endian

src/memory/big-endian.png

Little Endian

src/memory/little-endian.png

The little-endian system has the property that the same value can be read from memory at different lengths without using different addresses. For example, a 32-bit memory location with content 4A 00 00 00 can be read at the same address as either 8-bit (value = 4A), 16-bit (004A), 24-bit (00004A), or 32-bit (0000004A), all of which retain the same numeric value.

Byte Swapping

Some CPU instruction sets provide native support for endian swapping, such as bswap (x86, 486 and later) and rev (ARMv6 and later).
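When no such instruction or compiler intrinsic is at hand, a swap can be written portably with shifts and masks. The names swap32 and is_little_endian are hypothetical; this is a sketch rather than the code used in the examples.

```c
#include <stdint.h>

/* Portable 32-bit byte swap using shifts and masks. */
uint32_t swap32(uint32_t v)
{
    return  (v >> 24)
         | ((v >> 8)  & 0x0000FF00u)
         | ((v << 8)  & 0x00FF0000u)
         |  (v << 24);
}

/* Returns 1 on a little-endian host, 0 otherwise, by looking at the
 * byte stored at the lowest address of a 16-bit value. */
int is_little_endian(void)
{
    uint16_t probe = 1;
    return *(unsigned char *)&probe == 1;
}
```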

Unicode text can optionally start with a byte order mark (BOM) to signal the endianness of the file or stream. Its code point is U+FEFF. In UTF-32 for example, a big-endian file should start with 00 00 FE FF; a little endian should start with FF FE 00 00.

Endianness doesn’t apply to everything. If you do bitwise or bitshift operations on an int you don’t notice the endianness.

The TCP/IP protocols are defined to be big-endian. The multi-byte integer representation used by the TCP/IP protocols is sometimes called network byte order.

In <arpa/inet.h>:

  • htons() reorders the bytes of a 16-bit unsigned value from processor order to network order. The macro name can be read as “host to network short.”
  • htonl() reorders the bytes of a 32-bit unsigned value from processor order to network order. The macro name can be read as “host to network long.”
  • ntohs() reorders the bytes of a 16-bit unsigned value from network order to processor order. The macro name can be read as “network to host short.”
  • ntohl() reorders the bytes of a 32-bit unsigned value from network order to processor order. The macro name can be read as “network to host long.”
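A short sketch of how these calls are typically used when packing a value into a wire buffer. It assumes a POSIX system that provides <arpa/inet.h>; the helper names put_u32_network and get_u32_network are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Write v into out[0..3] in network byte order (most significant first). */
void put_u32_network(uint32_t v, unsigned char out[4])
{
    uint32_t wire = htonl(v);
    memcpy(out, &wire, sizeof wire);
}

/* Read a 32-bit value that arrived in network byte order. */
uint32_t get_u32_network(const unsigned char in[4])
{
    uint32_t wire;
    memcpy(&wire, in, sizeof wire);
    return ntohl(wire);
}
```

Whatever the host's endianness, the buffer always holds the most significant byte first, and the round trip returns the original value.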

Tools

  • hexdump on Unix-like system

Memory Model

The only thing that C must care about is the type of the object which a pointer addresses. Each pointer type is derived from another type, its base type, and each such derived type is a distinct new type.

Memory Copy

References

CPU

cpuid

Cache

Check cache line

  • Linux
ls -l /sys/devices/system/cpu/cpu0/cache/
cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
  • Windows
wmic cpu list
wmic cpu get
wmic cpu get L2CacheSize, L2CacheSpeed

References

Timing

time ls /tmp
# ...
# ls -G /tmp  0.00s user 0.00s system 73% cpu 0.003 total

real refers to actual elapsed time, user and sys refer to CPU time used only by the process.

  • real is wall clock time.
  • user is the amount of CPU time spent in user-mode code within the process.
  • sys is the amount of CPU time spent in the kernel within the process.

user+sys is the total CPU time the process actually used.

POSIX

Library

Static Library

Shared Library

Library References

ELF

References

OS

References

Flex & Bison

The asteroid to kill this dinosaur is still in orbit. – Lex Manual Page

References

Unicode

References

IO

Stream

Streams are a portable way of reading and writing data. They provide a flexible and efficient means of I/O.

A Stream is a file or a physical device (e.g. printer or monitor) which is manipulated with a pointer to the stream.

There exists an internal C data structure, FILE, which represents all streams and is defined in stdio.h.

Stream I/O is buffered: That is to say a fixed chunk is read from or written to a file via some temporary storage area (the buffer).

Predefined streams

There are stdin, stdout, and stderr predefined streams.

Redirection

  • >: redirect stdout to a file;
  • <: redirect stdin from a file to a program;
  • |: puts stdout from one program to stdin of another.

Buffered vs. Unbuffered

All stdio.h functions for reading from FILE may exhibit either buffered or unbuffered behavior, and either echoing or non-echoing behavior.

The standard library function setvbuf can be used to enable or disable buffering of IO by the C library. There are three possible modes: block buffered, line buffered, and unbuffered.

Buffered

Buffered output streams accumulate written data in an intermediate buffer, sending it to the OS file system only when enough data has accumulated (or fflush() is requested).

There are several layers of buffering: C runtime library (RTL) buffers, OS buffers, and disk buffers.

The function fflush() forces a write of all buffered data for the given output or update stream via the stream’s underlying write function. The open status of the stream is unaffected.
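The pieces above fit together roughly as follows. This is a minimal sketch, with write_flushed as a hypothetical helper name; it selects full (block) buffering with setvbuf and then forces the data out with fflush.

```c
#include <stdio.h>
#include <string.h>

/* Write msg to fp through a fully buffered stream, then flush it.
 * setvbuf must be called before any other operation on the stream.
 * Returns 0 on success, -1 on failure. */
int write_flushed(FILE *fp, const char *msg)
{
    if (setvbuf(fp, NULL, _IOFBF, BUFSIZ) != 0)
        return -1;
    if (fputs(msg, fp) == EOF)
        return -1;
    return fflush(fp) == 0 ? 0 : -1;
}
```

Passing _IOLBF or _IONBF instead of _IOFBF selects line buffering or no buffering.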

The function fpurge() erases any input or output buffered in the given stream. For output streams this discards any unwritten output. For input streams this discards any input read from the underlying object but not yet obtained via getc(); this includes any text pushed back via ungetc().

Unbuffered

Unbuffered output has nothing to do with ensuring your data reaches the disk; flushing is done by fflush(), which works on both buffered and unbuffered streams. Unbuffered IO writes don’t guarantee the data has reached the physical disk.

fclose() will call fflush().

The open system call is used for opening an unbuffered file.

ASCII vs. Binary

ASCII

Terminals, keyboards, and printers deal with character data. When you want to write a number like 1234 to the screen, it must be converted to four characters {'1', '2', '3', '4'} and written. Similarly, when you read a number from the keyboard, the data must be converted from characters to integers. This is done by the sscanf routine.
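Both directions of the conversion can be sketched with sscanf and snprintf. The helper names parse_int and format_int are hypothetical.

```c
#include <stdio.h>

/* Text to integer: returns 1 when exactly one int was parsed. */
int parse_int(const char *text, int *out)
{
    return sscanf(text, "%d", out) == 1;
}

/* Integer to text: returns the number of characters written,
 * not counting the terminating '\0'. */
int format_int(int value, char *buf, size_t size)
{
    return snprintf(buf, size, "%d", value);
}
```

The number 1234 becomes the four characters '1', '2', '3', '4' in text form, regardless of how many bytes an int occupies in memory.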

Binary

Binary files require no conversion. They also generally take up less space than ASCII files. The drawback is that they cannot be directly printed on a terminal or printer.

References

Network

DNS

simple.c uses the getaddrinfo() API call to query a name.

query.c uses the domain name protocol to query a name directly, without the -lresolv library.

TIL

  • getaddrinfo() is a POSIX.1g extension and is not available in pure C99 on Linux, so we need -D_GNU_SOURCE if -std=c99 is specified (see c99 does not define getaddrinfo).

HTTP

References

Parallel

OpenMP

References

Pthread

References

Algorithm

Hash

Algorithm References

Regex

In POSIX-Extended regular expressions, all characters match themselves except for the following special characters: .[{}()\*+?|^$
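A small sketch of matching with POSIX extended regular expressions through <regex.h>, assuming a POSIX system. The wrapper name ere_match is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if text matches the POSIX extended pattern, 0 if it does
 * not, and -1 if the pattern fails to compile. */
int ere_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

An unescaped . matches any character, so "a.b" matches "axb"; escaping it as "a\\.b" in a C string makes it match only a literal dot.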

WebAssembly

Run example in browser:

// directly call, shorten version
Module._sum(10, 0);
// ccall
Module.ccall('sum', 'number', ['number', 'number'], [10, 0]);

Tools

Display Dependents of Executable

| OS | name | command line |
|---+---+---|
| MacOS | otool | otool -L <bin> |
| Linux | objdump | objdump -p <bin> |
| | ldd | ldd <bin> |
| Windows | dumpbin | dumpbin -dependents <bin> |

Read ELF Format

readelf displays information about one or more ELF format object files.

This readelf program performs a similar function to objdump but it goes into more detail and it exists independently of the BFD library, so if there is a bug in BFD then readelf will not be affected.

On Darwin there is no readelf, but otool can do the trick.

| OS | name | command line |
|---+---+---|
| MacOS | otool | otool -l <bin> |
| Linux | readelf | readelf -a <bin> |
| Windows | | |

Metainformation about Libraries

pkg-config

Display Symbol Table

On Unix-like platforms, the nm program can view the symbol table of an executable.

| OS | name | command line |
|---+---+---|
| MacOS | nm | nm <bin> |
| | | nm -m <bin> |
| Linux | nm | nm <bin> |

Remove symbols

| OS | name | command line |
|---+---+---|
| MacOS | strip | strip <bin> |
| Linux | strip | strip <bin> |

Disassembly

| OS | name | command line |
|---+---+---|
| MacOS | otool | otool -tV <bin> |
| Linux | objdump | objdump -d <bin> |

Hex Dump

| OS | name | command line |
|---+---+---|
| MacOS | hexdump | hexdump <file> |
| Linux | hexdump | hexdump <file> |
| Windows | | |
| Emacs | hexl-mode | |

Trace System Call

| OS | name | command line |
|---+---+---|
| MacOS | dtruss | dtruss <bin> |
| Linux | strace | strace -o <out-file> -C <bin> |

Kernel Trace

  • MacOSX: ktrace

Memory Leak Detection

valgrind

sanitize

References

Debugger

Environment

| example | command |
|---+---|
| set working directory | (lldb) platform settings -w <pwd> |
| | (gdb) cd <pwd> |
| list env vars | (lldb) env |
| | (lldb) settings show target.env-vars |
| | (gdb) show env |
| set env var | (lldb) env XXX=zzz |
| | (lldb) settings set target.env-vars XXX=aa YYY=bb |
| | (gdb) set env XXX=zzz |
| unset env var | (lldb) settings remove target.env-vars XXX |
| | (gdb) unset env XXX |
| set argv for main entry | (lldb) r arg1 arg2 arg3 |
| | (lldb) settings set target.run-args arg1 arg2 |
| | (gdb) r arg1 arg2 arg3 |
| | (gdb) set args arg1 arg2 |
| | 0:000> .kill; .create <target> arg1 arg2 |
| | 0:000> .exepath+ <path> |

Process

| example | command |
|---+---|
| run process | (lldb) process launch |
| | (gdb) r |
| | 0:000> g |
| attach process with pid | (lldb) process attach --pid 123 |
| | (gdb) attach 123 |
| attach process with name | (lldb) process attach --name a.out |
| | (lldb) attach a.out |
| wait for process | (lldb) process attach --name a.out --wait-for |
| | (gdb) attach -waitfor a.out |

Image

| example | command |
|---+---|
| list dependents of executable | (lldb) image list |
| | (gdb) info sharedlibrary |
| | 0:000> lm |
| lookup main entry address in the executable | (lldb) image lookup -a main -v |
| | (gdb) info symbol main |
| lookup fn or symbol by regexp | (lldb) image lookup -r -n'[fsv]printf' |
| lookup type | (lldb) image lookup -t'FILE' |
| add module | (lldb) image add /opt/local/lib/libgeo.dyld |
| | 0:000> .reload -f -i libcffix.dll |
| unload module | 0:000> .reload -u libcffix.dll |

Breakpoint

| example | command |
|---+---|
| list breakpoint | (lldb) b |
| | (lldb) breakpoint list |
| | (gdb) info break |
| | 0:000> bl |
| breakpoint at fn | (lldb) b main |
| | (lldb) b -nmain |
| | (gdb) b main |
| | 0:000> bu <module>!main |
| breakpoint at line | (lldb) b -ftest.c -l32 |
| | (gdb) b test.c:32 |
| breakpoint at fn by regexp | (lldb) b -rm[a-z]in |
| breakpoint at source by regexp | (lldb) b -p'm[a-z]in' -ftest.c |
| conditional breakpoint | (lldb) breakpoint set -fvar.c -l23 -c'2 == argc' |
| delete breakpoint | (lldb) breakpoint delete 1.1 |
| | (lldb) breakpoint delete 2 |
| | 0:000> bc 1 2 |

Memory

| example | command |
|---+---|
| print argv in main entry | (lldb) p -Z`argc` -- argv |
| | (gdb) p -- argv[0]@argc |
| examine argv in main entry | (lldb) x -t'char*' -c`argc` argv |
| | 0:000> dp @@(argv) |
| examine array of char* of argv | (lldb) x -s`sizeof(char*)` -c`argc` -fx argv |
| examine &argc in main entry | (lldb) x -s`sizeof(int)` -fx -c1 &argc |
| | (gdb) x/1xw &argc |
| memory read | (lldb) memory read -o/tmp/x.out -s1 -fu -c10 -- &argv[0] |

Frame

| example | command |
|---+---|
| check stack frame | (lldb) frame info |
| | 0:000> k |
| list frame variable | (lldb) frame variable |
| | 0:000> dv |

Evaluate

| example | command |
|---+---|
| evaluate argc in main entry | (lldb) e -- argc |
| | (lldb) e -fx -- argc |
| | 0:000> ?? argc |
| | 0:000> .formats poi(argc) |

Disassemble

| example | command |
|---+---|
| disassemble | 0:000> u |
| disassemble function | 0:000> uf main |
| disassemble | (lldb) d |
| disassemble function | (lldb) d -nmain |
| disassemble favor | (lldb) d -Fatt |
| disassemble | (gdb) disassemble |

Step

| example | command |
|---+---|
| quit | (lldb) q |
| | (gdb) q |
| | 0:000> q |
| continue | (lldb) c |
| | 0:000> g |
| step over | (lldb) n |
| | 0:000> p |
| step into | (lldb) s |
| | (gdb) s |
| | 0:000> t |

Thread

| example | command |
|---+---|
| list threads | 0:000> ~ |

Tools References

CPU Features

  • Linux:
lscpu
  • Darwin:
sysctl -a | grep machdep.cpu.features

Making the Best Use of C