Decimal Arithmetic

It is worth refreshing your memory of how arithmetic works with positional numbers, because at this point in your life, you probably just do it without thinking. Since binary representations are mostly based on the same principles though, the same algorithms mostly apply to them as well.

In decimal, there are several variations of the algorithm to add two numbers like 38+64, but they all boil down to something like this:

  • Start at the least-significant position, summing the 8 and 4. However, there are not enough symbols to represent that result, so the next position (the 10's place) increases, taking the amount of the base value (10), and the current position of the result holds the remaining 2. You can also view this as the least-significant position has a sum of 12, so the 10 gets carried to the next position and the 2 remains.
  • Moving to the next position, combine the 3 and the 6, and the 1 from the previous position. Once again, there are not enough symbols for the result, so the next position increases, and there is nothing remaining for the current position of the result (0).
  • At the next position, since there are implied leading 0s on numbers, the 1 from the previous position is combined with the 0s, creating a result of 1 for that position.

The typical algorithm for subtraction also starts with the least-significant position, but instead of the potential to "carry" to the next place value, there is the potential need to "borrow" from it. A subtraction like 84-38 goes something like:

  • To take 8 from 4, the next position is going to decrement. This can be thought of as taking 8 from 14, giving a result of 6, and then since 10 was taken from the next position, the 8 becomes a 7.
  • In the next position, 3 can be taken from 7 for a result of 4.

There is, of course, the risk that you will need to borrow from more than one position over (e.g. when doing 100-1), and also the risk that you will need to borrow but there is nothing to borrow from (e.g. when doing 3-5), which creates a negative result. That second situation in particular is going to come up when we discuss binary.

Unsigned Addition

Exactly the same algorithm can be used for base-2 numbers. Consider

 11011
+11010

We will consider the least-significant position to be the "0th" position, because it represents the base to the 0th power.

  • Starting at the least-significant position, adding 1 and 0 results in 1.
  • At the next position (position 1), adding 1 and 1 results in running out of symbols (since the highest-value symbol is 1), so the next position (position 2) is going to get incremented. You can view this as the sum of 1 and 1 being 10. (It can also be helpful to think in decimal since we are used to it, so in this position we have 1+1=2, but the "2" must be represented in base-2 as 10.)
  • At position 2, the carried 1 combined with the two zeros results in 1.
  • At position 3, 1+1=10: a carried 1 and a remaining digit of 0.
  • At position 4, the two 1s and the carry combine to 11 (decimal 3), so a carried 1 and a remaining digit of 1.
  • The implied leading 0s are then combined with the final carried 1.

Including the "carries" above the original numbers (as is traditionally done when performing long-addition), the result is

 11 1 
  11011
+ 11010
 ------
 110101

You (the human) can convert all the numbers to their decimal representations, which would be 27+26=53, to confirm that this has correctly performed addition.

Overflow

In the example above, there was a carry out of the most-significant digit, and that was included in the result to create a numerically-correct result. However, in the section on binary numbers, it was pointed out that in digital hardware, the number of physical wires is going to be limited, because you cannot add new hardware once the system is designed and built. Because there are finite bits available to hold the result of a calculation, it is possible that the result ends up outside the range of numbers that can be represented. This situation is called overflow.

Do not conflate "overflow" and "carry out." The definition of overflow is precisely "the result of an operation cannot be represented with the digits available for the result." The relationship between overflow and a carry out depends on the number representation (unsigned, two's complement, etc.).

In the previous example, since the operands are each written with 5 bits, that typically implies that the system is designed to operate on 5-bit values. In unsigned binary, a 5-bit number can represent values from 0 to 3110, so the expected result of 53 cannot be represented, and overflow occurred. The result would be missing that leading 1, making it 10101, or 2110.

Unsigned Subtraction

As you might expect, since unsigned binary is a simple positional number system, you can use exactly the same algorithm that you use for decimal numbers. You should try the following example.

 11001
-01011
------
 01110

With subtraction, if the second number has a larger magnitude than the first number, the result should be negative. However, unsigned numbers cannot represent negative numbers, so overflow will occur, and the result will be incorrect.

Two's Complement Addition

A nice property of two's complement is that the algorithm for addition is the same as for unsigned numbers, regardless of whether the represented numbers are positive or negative. For example, consider

 11010011
+00111110
---------
 00010001

That result was obtained by following the exact same algorithm used for unsigned numbers. However, if you convert all of those numbers to their decimal representation, they are (-45)+62=17. The result is correct. Note that this only works when restricting the number of bits in the result to the same number as the operands, because two's complement relies on the modular behavior gained by restricting the size of the numbers.

As with all binary representations, it is possible for overflow to occur. For example,

 01111
+00010
------
 10001

Without even viewing these numbers in their decimal representation, you can tell that something is wrong, because two positive numbers were added and the result is negative. Indeed, this represents 15+2=(-15), which is incorrect. The expected result of 17 cannot be represented in 5-bit two's complement binary.

Two's Complement Subtraction.

Although the "long subtraction" algorithm works for two's complement numbers, there is an easier way (especially when designing hardware, as we will do in a future section). Instead of subtracting, add the negative of the second number. Two's complement negation is simple, and addition is simple, so doing those is often easier than doing subtraction.

Given a subtraction like

 00111010
-01001000

negating the second number requires inverting each bit and adding 1. That can be done separately first, or it can be done all at once.

 00111010
+10110111 (each bit inverted)
+       1 (+1 to complete the negation)
---------
 11110010

You can confirm that the decimal representation is 58-72 (which then became 58+(-72)) gave a result of -14.

Signed Magnitude Arithmetic

The signed magnitude representation of numbers has a separate sign and magnitude, so based on the signs of the numbers and based on the desired operation, either addition or subtraction must be performed on the magnitudes, and then the correct sign bit must be incorporated. This additional complexity is why two's complement is used more often when arithmetic needs to be performed on numbers.

You cannot simply use the unsigned addition algorithm with signed magnitude numbers. Adding together sign bits makes no sense! It would be like trying to use that algorithm to add Ⅶ+Ⅳ; it simply doesn't make sense.

Orders of Magnitude and Shifting

If you shift the digits of a positional number to the left or right, you effectively multiply or divide by the base. Changing from 23 (decimal) to 230 is multiplying by 10, and changing it to 2.3 is dividing by 10.

The same is true for unsigned binary numbers. Changing from 1101 to 11010 has multiplied the number by 210, and changing it to 110.1 has divided it by 210.

In hardware, as usual, the number of bits is typically fixed, so shifting too far will lose information. Going from 0110 to 1100 has gone from 610 to 1210, but shifting one more time results in 1000, and the value has overflowed.

Because shifting to the right has the potential to lose information from the least-significant bits, it is a more gradual problem. Shifting from 0110 to 0011 is fine (610 to 310), but another shift results in 0001. This can be viewed as truncation: mathematically it should be 3/2=1.5, but the fractional component has been truncated. Shifting one more time results in 0000. This is sometimes called underflow because the problem is similar in nature to overflow (the correct value cannot be represented), but instead of being outside the representable range, the problem is that there was insufficient precision. The closest representable value was 0, so all information was lost.

Shifting two's complement numbers also multiplies or divides by the base, but with one caveat: negative numbers must be kept negative. For example, shifting 10110 to the right would normally produce 01011, but that has turned a negative number into a positive number. Instead, the shift should result in 11011. This is called sign extension (because you extend the MSb to preserve the sign), and you can confirm that in this example, it has turned -1010 into -510. Note that it also works for positive numbers: 01011 shifted to the right and extending the sign results in 00101 -- ⌊11/2⌋=5.

Three Types of Shifts

Two methods of shifting values were just described: one where a 0 is always shifted in, and one where right-shifts maintain the sign. Those are both useful in different ways, and so they get the distinction of being types of shifts that hardware and software engineers learn about. There is a third common shift type as well, but first it is worth discussing what, in general, distinguishes the types of shifts from each other.

When shifting a binary value, some bits will be shifted out, and some will be shifted in. Shifts in digital hardware always maintain the same total number of bits for the output as the input, but the shift amount (the 'distance' of the shift) can be any number of bits. So, for example, an 8-bit value shifted to the left by three places can be thought of like this:

general shift

What differentiates types of shifts is 1) what happens to the values that get shifted out, and 2) what values are used to shift in.

In a logical shift, bits shifted out are discarded, and 0 is always used to shift in. This is the first type of shift discussed earlier, and one use for it is mathematical multiplication and division of unsigned numbers by powers of 2. It is also useful for simply moving bits around, because you know that bits shifted in are always going to have the same value (0).

An arithmetic shift is very similar to a logical shift, except that right-shifts maintain the sign of a 2's complement number. This is the second type of shift described earlier. Note that left-shifts still shift in 0s -- confirm for yourself that that is the appropriate behavior, even for negative numbers and odd numbers, by trying a few left-shifts and interpreting the numerical values. Arithmetic shifts have one main purpose: multiplying and dividing signed (2's complement) numbers by powers of 2.

In a third type of shift, called a barrel shift, the bits that get shifted out are not discarded. Instead, they are used as the bits that get shifted in. The name comes from imagining it as wrapping the bits around a barrel and rolling it -- whatever goes off one side comes back around on the other side. Because of this, a barrel shift is sometimes called a "rotate" or "roll" instead of a shift.

barrel shift

Barrel shifts are useful for rearranging information, because they do not destroy any information.

Summary

  • Because most binary representations are rooted in positional notation, the algorithms and properties you learned in gradeschool for decimal numbers mostly apply.
  • However, there are some differences when operating on two's complement or signed magnitude numbers, because those systems are not strictly positional.
  • In digital hardware, the number of bits used to represent a number is fixed, which means that only a certain range of numbers can be represented in a given system. This causes overflow when an operation results in a number outside the representable range.
  • Binary values can be shifted left or right. There are different types of shifts, because different desired effects require slightly different shift implementations.