Skip to content

Latest commit

 

History

History
119 lines (94 loc) · 7.01 KB

numbers.md

File metadata and controls

119 lines (94 loc) · 7.01 KB

Number Representation

A set of n bits can represent 2n values. This section examines how we use bits to represent numbers.

Unsigned Integers

An n-bit unsigned number represents values from 0 to 2n-1, and cannot represent negative or fractional numbers. Listed below are the ways of converting between unsigned values in various bases.

Decimal to Base-n

Given a base-10 number x, we can convert to base-n as follows:

  1. Find n', the largest power of n that is less than x.
  2. Record d, the number of multiples of n' that fit in x, as a digit of the base-n number.
  3. Subtract d × n' from x.
  4. Set n', as the next lowest power of n.
  5. Repeat steps 2-4 until n0 is reached.

Base-n to Decimal

If we consider di to be the ith digit of the base-n number beginning from the right (such that the rightmost digit is d0), we simply take the sum of di × ni for all i.

Binary to Hexadecimal

Given a base-2 number x, we can convert to base-16 as follows:

  1. Group x into 4-bit numbers, padding the left side with 0s as needed.
  2. Evaluate each group to a (base-10) number between 0 and 15.
  3. Substitute each group with the base-16 equivalent.

Hexadecimal to Binary

Similarly to the reverse, we can simply replace each base-16 digit with its 4-bit equivalent and remove leading 0s.


Signed Integers

There exist several different strategies for representing signed integers.

Sign and Magnitude

The simplest method for representing negative numbers uses the leftmost bit to indicate sign. In this representation, 0 represents positive and 1 represents negative.

This method is rather flawed, as it results in a "positive" and "negative" zero: 0000 and 1000. Furthermore, it complicates arithmetic as different steps must be taken depending on the signs of the integers. Finally, incrementing bits sometimes increases values (if the first bit is 0) and sometimes decreases values (if the first bit is 1), making the system less intuitive.

One's Compliment

Given a non-negative integer x, the one's compliment represents -x by flipping all the bits of x.

  • For example, 1101 (representing +13) would become 0010 (representing -13).
  • All negative integers are still preceded by a leading 1 in this system.

While incrementing bits now uniquely increases values, this method still results in two representations of 0: 0000 and 1111. Furthermore, arithmetic remains somewhat complicated.

Two's Compliment

In order to solve the problem of repeated 0s, the two's compliment shifts all negative integers by 1 such that 1111 corresponds to a value of -1.

  • To convert between positive and negative, flip the bits and add 1.
  • For example, 1101 (representing +13) would be flipped into 0010, and finally changed to 0011 (representing -13).
  • Positive numbers contain a leading 0 while negative numbers contain a leading 1.
  • An n-bit number can represent 2n-1 negative integers, 1 zero, and 2n-1 - 1 positive integers.

This representation is the most commonly used.

Bias Encoding

With bias encoding, a bias of b is chosen and all unsigned integers are simply shifted by b.

  • For an even distribution of positive and negative integers, a bias of -(2n-1 - 1) is typically chosen.
  • 0000 would represent 0 - (23 - 1) = -7
  • 1111 would represent 15 - 7 = 8.
  • With a bias of b, this representation has b negative integers, 1 zero, and 2n-1 - 1 - b positive integers.

Floating Point Numbers

Floating point numbers allow us to represent fractional values, as well as very large and very small numbers. They are declared using the float keyword in C.

Format

Single precision 32-bit IEEE floating point numbers are calculated as follows:

(-1)sign × (1 + significandtwo) × 2(exponenttwo - 127)

  • The first bit of the float represents the sign
    • 0 for positive, 1 for negative
  • The next 8 bits represent the value of the exponent.
    • From left to right, each bit represents non-negative powers of two from 27 to 20
  • The last 23 bits represent the significand.
    • From left to right, each bit represents negative powers of two from 2-1 to 2-23
    • The significand will always have a value between 0 and 1

Double precision numbers (and onwards) function identically, albeit with 11 bits for the exponent, 52 bits for the significand, and an exponent bias of 1023.

Special Cases

Floating point representations incorporate a number of special cases.

  • Zero is represented by an exponent value of 0 and a significand of 0. Positive and negative 0 both exist and are both valid.
  • Denormalized values are represented by an exponent of 0 and a nonzero significand.
    • In other words, an exponent of 0 represents a leading 0.
    • We consider there to be an implicit exponent value of -126.
  • Infinity is represented by the maximum exponent value with a significand of 0. As per usual, the sign determines whether it is positive or negative.
  • NaN is represented by the maximum exponent value with a nonzero significand.

Spacing

The smallest representable positive value of a normalized floating point number is 2-126, since exponent values of 0 are reserved for special numbers.

The next smallest number is 2-126 × (1 + 2-23) = 2-126 + 2-149. We notice then that numbers beyond 2-126 are spaced by 2-149ths, leaving a large gap between 0 and the next representable number.

We use denormalization to resolve this issue.

  • An exponent of 0 represents a leading value of 0 and implicitly results in a 2-126 exponential term.
  • The smallest representable denormalized value is 2-126 × 2-23 = 2-149.
  • Numbers between 0 and 2 are all evenly spaced by 2-149.

Discussion

Addition in floating point is not associative.

Precision is the number of bits used to represent a value. Accuracy is the difference between the representation of a number and its true value.

Floating point addition:

  • Denormalize numbers in order to match exponents
  • Add significands together
  • Keep the same exponents
  • Renormalize