
Bits & bytes

In the early days of programming it was impossible to do anything without knowing about bits and bytes. Nowadays programming languages are sufficiently far abstracted from the details of what really happens in the computer's processor and memory that it is no longer a requirement. Indeed many IT professionals today have little or no knowledge of the subject - they just don't need it.

So skip this page if you wish, but it will give you a better understanding of how computers work and, in our opinion, that will help you to write programs that perform better.

A computer is really made up of tiny switches, implemented in solid state circuitry around transistors. One such switch can have two states: on or off.

In arithmetic a binary digit similarly has 2 states: 0 or 1. So binary numbers are the natural representation of data storage in a computer at the lowest level. The term binary digit is abbreviated to "bit".

Two binary digits together allow 4 states: 00, 01, 10 and 11.

3 bits allow 8 states. The general pattern is that n bits are enough to represent 2^n distinct values. So for 4 bits we have:

Binary  Decimal  Hex
0000    0        0
0001    1        1
0010    2        2
0011    3        3
0100    4        4
0101    5        5
0110    6        6
0111    7        7
1000    8        8
1001    9        9
1010    10       A
1011    11       B
1100    12       C
1101    13       D
1110    14       E
1111    15       F
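As a rough sketch, a table like this can be generated rather than typed out. The function name bitTable is our own invention for illustration; toString(2) and toString(16) are the standard ways to write a Number in binary and hexadecimal.

```javascript
// bitTable is a hypothetical helper: one row for every value that fits in `bits` bits.
function bitTable(bits) {
  const rows = [];
  const count = Math.pow(2, bits); // n bits give 2^n distinct values
  for (let value = 0; value < count; value++) {
    rows.push({
      binary: value.toString(2).padStart(bits, "0"), // pad with leading zeroes
      decimal: value,
      hex: value.toString(16).toUpperCase(),
    });
  }
  return rows;
}

// Print the 16 rows for 4 bits, from "0000 0 0" to "1111 15 F".
for (const row of bitTable(4)) {
  console.log(row.binary, row.decimal, row.hex);
}
```

Calling bitTable(3) gives the 8 rows for 3 bits in the same way.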

The table shows that 4 bits can very conveniently be represented by a single hexadecimal (base 16) digit. We already met this fact in Part 1 in the box about HTML colours: so you see, knowledge of bits can help.

4 bits is sometimes referred to as a "nibble" because it is half a byte: a byte is 8 bits. So a byte can store 256 possible values, from 0 to 255 inclusive. To store larger numbers we group several bytes together. In JavaScript, for example, a value of type Number requires 8 bytes (64 bits, of which 11 hold the exponent, ie, the power of 2).
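The limits of a byte can be seen directly with JavaScript's typed arrays, which this course has not introduced, so treat the following only as an illustrative sketch. Each element of a Uint8Array is exactly one byte, and each element of a Float64Array has the same 8-byte layout as a Number.

```javascript
// Each Uint8Array element is one byte, so it can only hold 0 to 255.
const bytes = new Uint8Array(3);
bytes[0] = 255; // the largest value that fits
bytes[1] = 256; // too big: only the low 8 bits are kept, leaving 0
bytes[2] = 257; // likewise wraps round to 1

// The size in bytes of one 64-bit floating point value.
const bytesPerNumber = Float64Array.BYTES_PER_ELEMENT;

console.log(Array.from(bytes)); // [255, 0, 1]
console.log(bytesPerNumber);    // 8
```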

What about text?

We have seen that basically the computer only manipulates binary numbers. Characters of text are represented by numerical codes. In the early days, when only the Roman alphabet, digits, punctuation and a few other symbols were required, 128 codes were enough, so the character set (called ASCII) fitted into 1 byte per character (with 1 bit to spare, often used as a parity check because the electronics were less reliable then). In the early 1990s the Unicode character set was specified, to cater for all written languages in the world. That is a multi-byte-per-character set but it is so arranged that the first 128 characters are the same as the original ASCII set. For most purposes 2 bytes are sufficient (65,536 values) and that is the basis of the \uxxxx notation (see next page) for using non-ASCII characters in JavaScript, where each x is one hex digit (4 hex digits make 16 bits, ie, 2 bytes).
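A short sketch shows the link between characters and their numerical codes. It uses the standard string methods charCodeAt and String.fromCharCode; the variable names are ours, chosen only for illustration.

```javascript
// From character to code and back again.
const codeOfA = "A".charCodeAt(0);          // 65, the same as in ASCII
const lowerA = String.fromCharCode(97);     // "a"

// Non-ASCII characters written with the \uxxxx notation.
const eAcute = "\u00e9";                    // e with an acute accent
const euro = "\u20ac";                      // the euro sign

console.log(codeOfA, lowerA);
console.log(eAcute, eAcute.charCodeAt(0));  // code 233, which is E9 in hex
console.log(euro, euro.charCodeAt(0).toString(16)); // hex code "20ac"
```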

Operators on bits

Bitwise logic:
Corresponding pairs of bits in the two operands are operated on by logical operators (NOT takes only one operand and acts on each of its bits):
& AND: both a and b
| OR: a or b, or both
^ XOR, exclusive OR: a or b but not both
~ NOT, 1's complement (ie, swap 0s and 1s)
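Here is a small sketch of the bitwise logic operators at work on two 4-bit patterns. The helper show is our own, added only to print results in binary, and the 0b prefix for binary literals needs a reasonably modern JavaScript engine.

```javascript
// show is a hypothetical helper: format a small non-negative number as 4 binary digits.
function show(n) {
  return n.toString(2).padStart(4, "0");
}

const a = 0b1100; // 12
const b = 0b1010; // 10

const andResult = a & b; // 1 only where both bits are 1
const orResult = a | b;  // 1 where either bit is 1
const xorResult = a ^ b; // 1 where the bits differ
const notResult = ~a;    // every bit swapped

console.log(show(andResult)); // 1000
console.log(show(orResult));  // 1110
console.log(show(xorResult)); // 0110
console.log(notResult);       // -13
```

The result of ~ looks surprising because the operators work on 32-bit signed integers, so swapping every bit of 12 also sets the sign bit.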

Bit shifting:

For these operations it is necessary to know that the left-most (ie, most significant) bit is often used to represent the numerical sign. There are 2 states again: + or -. To obtain the negative of a signed number it is necessary to do an operation called a 2's complement which is the same as a 1's complement (~) followed by adding 1.
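The 2's complement rule can be checked in a couple of lines. This is only a sketch: the function name negate is ours, and in real code you would simply write -n.

```javascript
// negate is a hypothetical helper: 2's complement is a 1's complement (~) plus 1.
function negate(n) {
  return ~n + 1;
}

console.log(~5);          // -6: the 1's complement alone is one short
console.log(negate(5));   // -5
console.log(negate(-12)); // 12
```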

<< shift left, bringing 0s in from the right; each bit shifted is equivalent to doubling
>> shift right, signed (copy the sign bit); each bit shifted is equivalent to halving (rounding down)
>>> shift right, unsigned (bring 0s in from the left)

Eg, a = b << 3; // Shift b left 3 bits (fill with zeroes)
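The following sketch tries the three shift operators on small values. The variable names are only for illustration. Note that JavaScript performs shifts on 32-bit integers, which is why the unsigned shift of a negative number gives such a large result.

```javascript
const b = 5;
const a = b << 3;             // doubled 3 times: 5 * 8

const signedRight = -16 >> 2;    // halved twice, sign kept
const unsignedRight = -16 >>> 2; // sign bit treated as an ordinary bit

console.log(a);             // 40
console.log(signedRight);   // -4
console.log(unsignedRight); // 1073741820
```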
