Hamming codes

September 2000

Hamming codes are linear binary codes. The data is split into blocks and each block has check digits assigned to it. A block of data will be of length $k$ , to which $n-k$ check digits are assigned, so the length of data transmitted is $n$ digits. They use modulo 2 arithmetic. Multiplication is done as usual so $0 \times 0 = 0 \times 1 = 1 \times 0 = 0$ and $1 \times 1 = 1$ . Addition is mostly as usual with 0+0=0 and 0+1=1+0=1 but there is the one exception of 1+1=0.

Let the block of data of length $k$ (ie the original message) be represented by ${\bf m}$ . Then the codeword ${\bf x}$ will be such that ${\bf x=m G}$ where ${\bf G}$ is the generating matrix.

The generating matrix needs to be in the form ${\bf [I_ k A]}$ to ensure that after the multiplication, the first $k$ digits of the codeword are the original message and they are then followed by the check digits.

Then there is the parity check matrix, ${\bf H}$ , such that when multiplied by the transpose of any codeword, the transpose of the product (known as the syndrome) will be zero:

${\bf Hx^ T = s^ T = 0}.$

From this it can be shown that when using a generating matrix in the form ${\bf [I_ kA]}$ , the parity check matrix must have the form ${\bf [A^ T I_{n-k}]}$ .

Each pair of columns in the parity check matrix must be linearly independent. This means that there cannot be an zero column and they must all be different.

The distance between two codewords is defined as the number of places where the digits of the two words differ. For example, the distance between 0000000 and 0100010 is 2. The minimum distance between codewords in Hamming codes is 3. This means that only single errors can be detected. If two errors occurred in one codeword, then it would end up closer to a different valid codeword than the one that it had originated from and so it would be assumed it had come from the other one.

With Hamming codes, $n = 2^{n-k}-1$ , so if we were to create a code with 3 check digits appended to each message, $n = 2^3 -1 = 7$ . So the original messages could be of length four.

Our parity check matrix could be

which would make G equal to

Letting m be [1010], x = mG = [1010101].

If an error occurred and x is replaced by x' = [1011101], then Hx'^T = Syndrome .

The fact that this is not zero shows us that an error has occurred, and that the transpose of the syndrome is the same as the fourth column of the parity check matrix tells us that the error was with the fourth digit. Therefore, by changing the fourth digit, we have the correct codeword again of [1010101]. Then we just have to remove the last three digits to know what message was sent.

To see how this works, looking at G, we knew that the first four digits of the codeword were going to be the message since the identity was being used. Then, since we are using modulo 2 arithmetic, we can see that the fifth digit is the sum of the second, third and fourth digits, the sixth digit is the sum of the first, third and fourth digits and the seventh digit is the sum of the first, second and fourth digits.

$\displaystyle x_1$	$\displaystyle =$	$\displaystyle m_1,$
$\displaystyle x_2$	$\displaystyle =$	$\displaystyle m_2,$
$\displaystyle x_3$	$\displaystyle =$	$\displaystyle m_3,$
$\displaystyle x_4$	$\displaystyle =$	$\displaystyle m_4,$
$\displaystyle x_5$	$\displaystyle =$	$\displaystyle m_2 + m_3 + m_4,$
$\displaystyle x_6$	$\displaystyle =$	$\displaystyle m_1 + m_3 + m_4,$
$\displaystyle x_7$	$\displaystyle =$	$\displaystyle m_1 + m_2 + m_4.$

Then, looking at the parity check matrix, it can be seen what sums are involved in finding the syndrome:

$s_1 = x_2 + x_3 + x_4 + x_5,$

but

$x_5 = m_2 + m_3 + m_4 = x_2 + x_3 + x_4,$

$s_1 = 2(x_2 + x_3 + x_4) = 0.$

Similarly,

$s_2 = x_1 + x_3 + x_4 + x_6 = 2(x_1 + x_3 + x_4) = 0,$

and

$s_3 = x_3 + x_2 + x_4 + x_7 = 2(x_1 + x_2 + x_4) = 0.$

So if, as in our example, an error occurs in the fourth position, it is important to remember that the fifth, sixth and seventh digits were calculated using the true value of the fourth digit. All three zeros in the syndrome rely on the fourth digit of the codeword being equal to the fourth digit that was used to calculate the extra digits so, since this is not the case, it ends up with three ones which then matches the fourth column of the parity check matrix.

back to main article

Reference

http://web.syr.edu/~rrosenqu/ecc/main.htm

About the author

I am currently in the U6 at the Perse School for Girls, Cambridge studying for A levels in maths, further maths, physics and chemistry. I hope to go on to study maths at university in 2001 (destination unknown).

I wrote this article whilst working with the Millennium Mathematics Project, during the summer of 2000, arranged and funded by the Nuffield Science Bursary Scheme.