Programming is about manipulating data, and C/C++ handle data at a lower level, and more efficiently, than many other languages such as Java and Python. To make programs robust, developers need to understand the different data types and how they behave in arithmetic expressions. Let us start with integers.
There are several kinds of integer types, and int is probably the most commonly used one. Variables of type int can be declared as follows. Other types are similar.
int n; //declare a variable
The following line declares and initializes a variable.
int n = 10; //declare and initialize
The following two lines look very similar to the previous one, but the operations are different, and no initialization takes place. The first line declares a variable, and the second assigns a value to it. Assignment is a different operation from initialization. For an int, the two pieces of source code are equivalent: both declare a variable whose value ends up as 10. But if the data type is not a fundamental type (int, float, etc.) and is a compound type (such as a class type), the two operations, initialization and assignment, may behave differently. The reason is that the operations in an initialization function (a constructor) may be different from those in an assignment function (an assignment operator). You can find related information in the operator overloading part of this book.
int n; //declare a variable, its value may be a random one
n = 10; //assign (not initialize) a value to the variable
Variables can also be initialized in the following two manners. The second one is from the C++11 standard. It can only be compiled with the C++11 standard or a newer standard.
int n (10); //initialize
int n {10}; //initialize, C++11 standard
The danger of uninitialized variables must be emphasized here. If a variable is not initialized, a C/C++ compiler will not report an error, and by default it may not even give a warning. Java compilers report an error for an uninitialized variable; C/C++ compilers are not as strict. The value of an uninitialized variable is also not well defined in the C/C++ standards, so you may get a random value from it.
//init.cpp
#include <stdio.h>
int main()
{
int i; //bad: uninitialized variable i
int j; //bad: uninitialized variable j
printf("i = %d, j = %d\n", i, j);
return 0;
}
The example init.cpp has two uninitialized variables, i and j. They are printed out immediately after their declarations. I tried the example on two platforms, x86_64 and arm64, and got different results: random values on x86_64, but zeros on arm64.
$ file a.out
a.out: Mach-O 64-bit executable x86_64
$ ./a.out
i = 2, j = 13299749
$ file a.out
a.out: Mach-O 64-bit executable arm64
$ ./a.out
i = 0, j = 0
If you use uninitialized variables, sometimes your program works very well, as in the example on arm64, and sometimes it does not. Imagine a huge program that works well most of the time but shows bugs when you migrate it to another platform: debugging a random error in a huge program will be a nightmare. To avoid such situations, it is better to write each line of source code carefully and to initialize every variable.
The C and C++ standards do not specify the exact width of int. It is 32 bits on most platforms, and int is equivalent to signed int. The range of a 32-bit int is $[-2^{31}, 2^{31}-1]$. The maximal value $2^{31}-1$ (2,147,483,647) is not such a great number, and it is not hard to reach.
The example overflow.cpp multiplies two integers whose values are both 56789. The product c should be 3,224,990,521, but the program prints c = -1069976775. It is a negative number, and it is surely wrong.
//overflow.cpp
#include <iostream>
using namespace std;
int main()
{
int a = 56789;
int b = 56789;
int c = a * b;
cout << "c = " << c << endl;
return 0;
}
The compilation command and the output:
$ g++ overflow.cpp
$ ./a.out
c = -1069976775
The number 56789 is 0xDDD5 in hexadecimal and is 16 bits long. The product of the two numbers is the 32-bit number 0xC0397339, whose thirty-second bit is 1. When the result 0xC0397339 is stored in an int variable, its highest bit 1 is taken as the sign bit of the signed integer. That is why the output is a negative number. (Strictly speaking, signed integer overflow is undefined behavior in C and C++; the wrapped value shown here is what typical platforms produce.) If we use unsigned int for the variable c, the result 0xC0397339 will be taken as an unsigned 32-bit integer, which is the result that we expect.
int a = 56789;
int b = 56789;
unsigned int c = a * b; // on typical platforms the value of c is the positive number 0xC0397339 (3224990521)
It is unrealistic to expect an int variable to hold any integer. Before you choose the data type for a variable, you must carefully consider the range of its data.
The width of int in bits is not fixed by the C and C++ standards. The standards only require that int have at least 16 bits. It is 32 bits on most modern platforms. Besides int, there are short int, long int and long long int (long long int was added in C++11). On typical platforms, short int has 16 bits, int has 16 or 32 bits, long int has 32 or 64 bits, and long long int has 64 bits.
char is also a frequently used integer type. Some people may be confused and think char is for characters only. Since characters are encoded as integer values, char is indeed an integer type, and it has 8 bits. char is wide enough for English characters, but not for Chinese, Japanese, Korean and some other characters. char16_t and char32_t were introduced in C++11 for 16-bit and 32-bit character ranges respectively.
The following three lines of source code are equivalent.
char c = 'C'; // its ASCII code is 67
char c = 67; // in decimal
char c = 0x43; // in hexadecimal
The 16-bit and 32-bit character types can be declared and initialized as follows.
char16_t c1 = u'于'; //C++11
char32_t c2 = U'于'; //C++11
The data widths of different integers are listed in the following table. For more details please visit https://en.cppreference.com/w/cpp/language/types
Integer type | Width in bits |
---|---|
char | 8 |
short (short int) | 16 |
int | 16 or 32 |
long (long int) | 32 or 64 |
long long (long long int) | 64 |
signed or unsigned can be used before the integer type names to indicate whether the integer is signed or unsigned. When the integer is signed, the keyword signed can be omitted. It means int stands for signed int, and short stands for signed short. But there is an exception: char is not always signed, and it behaves like unsigned char on some platforms. I strongly suggest always using signed char or unsigned char, and not using char.
If the integer is a signed one, the highest bit (the 32nd bit for int) is its sign bit. The number is negative if the sign bit is 1, and non-negative if it is 0. The signed int and unsigned int are shown in the following figure.
The Boolean type in C++ (not in C) is bool
, and its value can be true
or false
. The integer value of true
is 1, and the value of false
is 0. The width of bool
is 1 byte, not 1 bit. It means that a bool
variable consumes 1 byte for data, but only uses the lowest 1 bit.
Since bool
indeed is a kind of integer in C/C++, bool
can be converted to other kinds of integers.
bool b = true;
int i = b; // the value of i is 1.
Integers can also be converted to bool
, and non-zero values (even floating-point numbers) will be converted to true
as shown in the following source code. But I do not recommend doing so since it is misleading.
bool b = -256; // unrecommended conversion. the value of b is true
bool b = (bool)0.4; // unrecommended conversion. the value of b is true
The following equivalent source code is easier to understand. The expression (-256 != 0) tests whether -256 is different from 0, and its result is true or false.
bool b = (-256 != 0);
There was no Boolean type in the C standard before C99. Some old programs use typedef and macros to create a customized Boolean type.
typedef char bool;
#define true 1
#define false 0
If you program in pure C and want to use bool, you can include a header file introduced in C99, which is a better choice than defining it with typedef. C99 adds _Bool as the Boolean type. In addition, the header file <stdbool.h>, also introduced by C99, uses macros to define bool as an alias of _Bool, true as 1 and false as 0.
#include <stdbool.h>
Another frequently used integer type is size_t. It is the type of the result of the sizeof operator, and it can store the maximum size of a theoretically possible object of any type. Computer memory has kept increasing over the past decades and will continue to increase in the future. We often need an integer variable to store the size of a specific piece of memory. A 32-bit unsigned int is not enough since its maximum value is $2^{32}-1$, about 4GB of memory. The malloc() function, which allocates size bytes of memory, takes an argument of type size_t. If its argument size were of type int, it could request at most 2GB of memory, and at most 4GB if it were unsigned int.
void* malloc( size_t size );
The width of size_t depends on the platform. It is 64 bits on most modern platforms. Since it can store the maximum size of a theoretically possible object of any type, you can safely use it for memory sizes.
sizeof yields the size in bytes of a type or an object/variable. The example size.cpp demonstrates how to use sizeof to get the width of different data types and variables. Besides the fundamental types, it can also take a compound type or any variable as its input.
//size.cpp
#include <iostream>
using namespace std;
int main()
{
int i = 0;
short s = 0;
cout << "sizeof(int)=" << sizeof(int) << endl;
cout << "sizeof(i)=" << sizeof(i) << endl;
cout << "sizeof(short)=" << sizeof(s) << endl;
cout << "sizeof(long)=" << sizeof(long) << endl;
cout << "sizeof(size_t)=" << sizeof(size_t) << endl;
return 0;
}
The output of the example on my computer is
sizeof(int)=4
sizeof(i)=4
sizeof(short)=2
sizeof(long)=8
sizeof(size_t)=8
Some people may think sizeof is a function since its syntax looks like a function call. But it is an operator, not a function. Functions cannot take a data type as an argument, while sizeof(int) takes the type int as its input. To yield the width of an expression (or a variable), it can also be used as sizeof expression, without parentheses. But sizeof type does not work. The following code shows which forms without parentheses compile and which do not.
//size2.cpp
#include <iostream>
using namespace std;
int main()
{
int i = 0;
short s = 0;
cout << "sizeof int =" << sizeof int << endl; // error
cout << "sizeof i =" << sizeof i << endl; // okay
cout << "sizeof short =" << sizeof s << endl; // okay
cout << "sizeof long =" << sizeof long << endl; // error
cout << "sizeof size_t =" << sizeof size_t << endl; // error
return 0;
}
Different widths for the same integer type may make a program difficult to port to different platforms. Some fixed-width integer types have been available in <cstdint> since C++11. They are int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, etc. There are also some useful macros, such as INT8_MAX and UINT8_MAX in <cstdint>, and INT_MAX and INT_MIN in <climits>. The fixed-width integer types declare the widths of variables explicitly.
Before introducing floating-point numbers, I would like to introduce the following example float.cpp. f1 is assigned 1.2, and then is multiplied by 1000000000000000 (15 zeros). We may think f2 should be 1200000000000000. But if we print f1 and f2 with very high precision, we will find a terrible truth. f2 is not what we expected, and even f1 is not exactly equal to 1.2.
//float.cpp
#include <iostream>
#include <iomanip>
using namespace std;
int main()
{
float f1 = 1.2f;
float f2 = f1 * 1000000000000000;
cout << std::fixed << std::setprecision(15) << f1 << endl;
cout << std::fixed << std::setprecision(15) << f2 << endl;
return 0;
}
The output is:
1.200000047683716
1200000038076416.000000000000000
We may think computers are always accurate. But they are not. Floating-point operations usually introduce some tiny errors, and those errors cannot be eliminated. What we can do is manage them so that they do not cause a problem.
Why cannot floating-point data be accurate? Let us go deeper into the floating-point format. The following figure shows an example of a 32-bit floating-point number. It has 1 sign bit, 8 exponent bits and 23 fraction bits.
The value of a normal 32-bit floating-point number is $(-1)^{b_{31}} \times 2^{(b_{30}b_{29} \dots b_{23})_2 - 127} \times (1.b_{22}b_{21} \dots b_0)_2$. With only 32 bits, the format can represent a limited set of values, so most real numbers have to be rounded to the nearest representable one. There is no exact 1.2 in its space, and only an approximation 1.200000047683716 is in its space.
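Since floating-point numbers carry precision errors, using == to compare two floating-point numbers is a bad choice. If the difference between two numbers is less than a very small number, such as FLT_EPSILON or DBL_EPSILON for float and double respectively, we can consider them equal. Note that these constants are the gap between 1 and the next representable number, so this absolute test suits values around 1; for much larger or much smaller values the tolerance should be scaled accordingly.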
if (f1 == f2) //bad
{
// ...
}
if (fabs(f1 - f2) < FLT_EPSILON) // good
{
// ...
}
The following example precision.cpp demonstrates that if a small number is added to a large one, the result can be the same as the large one. It is caused by the limited precision.
//precision.cpp
#include <iostream>
using namespace std;
int main()
{
float f1 = 2.34E+10f;
float f2 = f1 + 10;
cout.setf(ios_base::fixed, ios_base::floatfield); // fixed-point
cout << "f1 = " << f1 << endl;
cout << "f2 = " << f2 << endl;
cout << "f1 - f2 = " << f1 - f2 << endl;
cout << "(f1 - f2 == 0) = " << (f1 - f2 == 0) << endl;
return 0;
}
The output:
f1 = 23399999488.000000
f2 = 23399999488.000000
f1 - f2 = 0.000000
(f1 - f2 == 0) = 1
There are two typical floating-point types, float and double. They are for single-precision and double-precision floating-point numbers respectively. The former has 32 bits, and the latter has 64 bits. double has a wider range and better precision than float. But float operations are often faster than double ones, and float data take half the memory.
Please be careful with division operations. If the divisor is zero, the result of a floating-point division may be INF or NAN. The example nan.cpp demonstrates how to produce such invalid floating-point numbers.
//nan.cpp
#include <iostream>
using namespace std;
int main()
{
float f1 = 2.0f / 0.0f;
float f2 = 0.0f / 0.0f;
cout << f1 << endl;
cout << f2 << endl;
return 0;
}
Output:
inf
nan
Constant numbers can be in decimal, octal or hexadecimal. An integer number like 95 will be interpreted as an int. 95u will be an unsigned int, and 95ul will be an unsigned long. Floating-point numbers can be written as 3.141592 or 6.02e13, and both of these are of type double. A suffix f is needed for float numbers, such as 6.02e13f. Some more examples are listed as follows.
95 // decimal
0137 // octal
0x5F // hexadecimal
95 // int
95u // unsigned int
95l // long
95ul // unsigned long
95lu // unsigned long
3.14159 // 3.14159
6.02e13 // 6.02 x 10^13
1.6e-9 // 1.6 x 10^-9
3.0 // 3.0
6.02e13f // float
6.02e13 // double
6.02e13L // long double
If a variable/object is const-qualified, it cannot be modified. It must be initialized when you define it.
const float pi = 3.1415926f;
pi += 1; //error!
The arithmetic operators are listed in the following table.
Operator name | Syntax |
---|---|
unary plus | +a |
unary minus | -a |
addition | a + b |
subtraction | a - b |
multiplication | a * b |
division | a / b |
modulo | a % b |
bitwise NOT | ~a |
bitwise AND | a & b |
bitwise OR | a \| b |
bitwise XOR | a ^ b |
bitwise left shift | a << b |
bitwise right shift | a >> b |
The operators and their operands can be connected together to create an expression. Some are simple ones such as a + b, and some may be long ones with multiple operators such as a + b * c / (e + f). The result of an expression can be assigned to a variable.
int result = a + b * c / (e + f);
There are many data types in C and C++, but arithmetic operations are mainly carried out in four types: int, long, float and double (together with their unsigned and wider variants). If the operands are of smaller types, they will be implicitly converted to one of these types. This can be explained by the following code. The types of char1 and char2 are both unsigned char, whose maximal value is 255. If the operation char1 + char2 were carried out in unsigned char, the sum 256 would overflow since 256 cannot be represented in unsigned char. In the operation, char1 and char2 are first implicitly converted to int, and then the two int values are added together.
unsigned char char1 = 255;
unsigned char char2 = 1;
int num = char1 + char2; // why num = 256?
The following piece of source code is equivalent to the previous one.
unsigned char char1 = 255;
unsigned char char2 = 1;
int num = (int)char1 + (int)char2; //convert to int explicitly
In the following example, the float number 1.2f is converted to double first, and then the two numbers are added together. Their sum is 4.6 (a double, not the float 4.6f). The last step is to assign the double number 4.6 to the integer variable num. Compilers may give a warning message to indicate that the assignment may lose data.
int num = 1.2f + 3.4; // -> 1.2 + 3.4 -> 4.6 -> 4
The equivalent code is as follows. Compilers will not give warning messages since you convert the result to int explicitly, and they assume you know what you are doing. Explicit type conversions are recommended in most cases.
int num = (int)((double)1.2f + 3.4);
Programmers should be very careful with data type conversions because they may cause data loss. A typical case is that (int)3.6 becomes the integer 3. The fractional part of a floating-point number is lost.
The following code can also easily mislead us. Since 17 and 5 are both int, the operation is an int division, not a float division. The result of the expression 17 / 5 is the integer 3, not the floating-point number 3.4f. That is why float_num is 3.0f, not 3.4f.
float float_num = 17 / 5; // float_num = 3.0f, not 3.4f.
When we convert numbers in the direction char -> short -> int -> long -> float -> double -> long double, normally there is no data loss. If the conversion goes in the opposite direction, from long double towards char, it may cause data loss, and compilers will warn you most of the time. But the first rule is not always true. Some big integer numbers in int may lose precision when they are converted to float, as shown in the following code.
int num_int1 = 100000004;
float num_float = num_int1;
int num_int2 = num_float; // num_int2 = 100000000
The placeholder type specifier auto
is introduced in C++11. The real type of a variable with auto
is deduced by its initializer. We can declare and initialize some variables as follows.
auto a = 2; // type of a is int
auto b = 2.3; // type of b is double
auto c; //valid in C, but not in C++
auto d = a * 1.2; // type of d is double
Once the type of an auto variable is deduced, it is fixed and will not change. In the following source code, a is initialized as an int and then assigned the double value 2.3. The value 2.3 is first implicitly converted to the int value 2, and then assigned to the variable a. So the value of a is 2, not 2.3, since a is of type int. Please be careful with the real data type deduced by auto.
auto a = 2; // type of a is int
a = 2.3; // Will a be converted to a double type variable? NO!
Besides =, there are some compound-assignment operators, as shown in the following table. They are convenient when we want to update the left operand in place.
Assignment expression | Equivalent expression |
---|---|
a = b | |
a += b | a = a + b |
a -= b | a = a - b |
a *= b | a = a * b |
a /= b | a = a / b |
a %= b | a = a % b |
a &= b | a = a & b |
a \|= b | a = a \| b |
a ^= b | a = a ^ b |
a <<= b | a = a << b |
a >>= b | a = a >> b |
In C or C++ programming, developers are expected to understand all the details of the different data types. This is different from Python, where the value of a variable can grow beyond the boundary of int32 and the storage for the variable adapts automatically. In Java, more warnings and errors are given to protect you from overflow or precision problems. But in C or C++, there are far fewer warnings. You have to be very careful with different data types since your program can be wrong even when there are no compilation errors.
Besides, we also need to give up the idea that computers are always accurate. If a computation is carried out with floating-point numbers, tiny errors are to be expected. What we can do is control the errors, since they are difficult to eliminate.
To repeat, numbers may be out of range, computations may have errors, a division may be an integer division rather than a floating-point one, and so on. Please try to explore all possibilities and how the instructions work when you write a line of source code. More thinking and deeper understanding, fewer bugs.
Compile and run the following source code. Does the output exactly match what you expect? If not, explain why.
#include <iostream>
using std::cout;
using std::endl;
int main() {
int num1 = 1234567890;
int num2 = 1234567890;
int sum = num1 + num2;
cout << "sum = " << sum << endl;
float f1 = 1234567890.0f;
float f2 = 1.0f;
float fsum = f1 + f2;
cout << "fsum = " << fsum << endl;
cout << "(fsum == f1) is " << (fsum == f1) << endl;
float f = 0.1f;
float sum10x = f + f + f + f + f + f + f + f + f + f;
float mul10x = f * 10;
cout<<"sum10x = "<< sum10x << endl;
cout<<"mul10x = "<< mul10x << endl;
cout<<"(sum10x == 1) is "<< (sum10x == 1.0) << endl;
cout<<"(mul10x == 1) is "<< (mul10x == 1.0) << endl;
return 0;
}