Error Detection and Correction

Issue 12

Error Detection and Correction

Everyone knows that it is easy to make mistakes when lots of numbers are involved and I'm not talking about all those slip-ups that occur in exams. It is so easy to get a few digits of a bank account number mixed up, fingers have been known to slip to keyboards and enter the wrong numbers too. Imagine the consequences of these errors�..things might go your way, you might get access to Bill Gates' bank account, for example, but the chances of that are relatively slim. Alternatively, if a bar code gets printed wrongly, you might end up paying the price of a bottle of champagne for your carton of apple juice or if the number on your airline ticket is wrong, who knows where you or your luggage might end up? These sorts of errors would probably always get detected eventually but sometimes it might be too late and the damage already been done. Luckily, there are schemes in place to detect, and in some cases even correct, such errors almost immediately. The aim of this article is to explore such schemes.

Error Detection

There are many different methods of error detection. Generally, there will be a number followed by a number of check digits (most often one for simple error detection but more help with error correction). These digits can be calculated using modular arithmetic, permutations or noncommutative schemes involving dihedral groups. Then, when the number is received by anyone, another calculation can be done to check if the number received is valid. We will now look at how these schemes work and some examples of where they are used.

Modular Arithmetic

Modular arithmetic explained
It involves working a lot with the remainders generated by division. For example, if 36 is divided by 7, the remainder is 1. Using modular arithmetic notation, this can be written 36 = 1(mod 7). Similarly, 47 = 11(mod 12) and 62 = 2(mod 10) and in general

a = b(mod N) if a-b is a multiple of the integer N.

This means that 47=11(mod 12) = -1(mod 12) since 47- (-1) = 48 = 12 × 4.

The simplest check digit schemes use the code number itself as part of a modular arithmetic operation. For example, take a code number, a, create a check digit, b, for this number such that:

a = b(mod N)

If we wanted to use modulus 10 (mod 10) and we have the number 12345, our check digit would be 5 and the number would from then be written as 123455.

Therefore, if we were to write the code number down incorrectly, we might discover that this incorrect number, a', did not fit the criterion that:

a' = b(mod N)

or if the check digit was written incorrectly, then b' would not fit the criterion that:

a = b'(mod N)

and so it would be detected that an error had occurred. However, since our number might be many digits long, trying to work out where the error was, would probably be impossible. Thus this simple scheme cannot be error correcting.

For example, if our number 123455, was wrongly written down as 123445, it would be seen that 12344=/=5(mod 10) but that does not tell us where the error has occurred because even if we know that only one error has occurred, we do not know if the penultimate or final (check) digit was incorrect.

There are various different errors that can occur when numbers are written, printed or transferred in any manner. Different methods of assigning check digits are better at detecting certain kinds of errors than others. The most common types of errors that occur in practice and their frequencies are as follows:

Error type Form Relative frequency
single error a ---> b 79.1%
transposition of adjacent digits ab ---> ba 10.2%
jump transposition abc ---> cba 0.8%
twin error aa ---> bb 0.5%
phonetic error a0 <---> 1a
a = 2,...,9
0.5%
jump twin error aca ---> bcb 0.3%

Another common type of error not mentioned here is one involving the accidental insertion or deletion of characters. These are automatically detected as long as the number has a fixed length.

We are now ready to see how this is all put into practice so let's bring on our first example...

Airline tickets

Airline tickets have a form and serial number followed by a check digit. This check digit is calculated using a modulus 7 scheme so this is like the method already discussed with n=7 and a being the form and serial number.

On the example ticket the number is 3387972544. 3387972544 = 5(mod 7) which is why the check digit is 5.

This is quite a primitive method, is it really any good at detecting errors? If we, look at the relative frequencies of errors, it is best to be able to detect the single digit errors and the transposition of adjacent digits.

Single digit errors
On the sample airline ticket, it can be seen that there is a ten digit number followed by the check digit, 5.

Let the 10 digit number be a and the check digit ac
a = ac(mod 7)

Then, if we represent a by the string (a1, a2,..., a9, a10), if ai (where 1<=i<=10 or i=c) is substituted with a'i, the result is a single digit error.

In most cases it will be found that there is an error because a'-ac=/=0(mod 7) or a-a'c=/=0(mod 7) if the error was made with the check digit.

However, if a'i = ai + or - 7, then a'-ac or a-a'c still gives 0(mod 7). In other words, the error will be undetected if |ai-a'i| = 7. So, 0<->7, 1<->8 and 2<->9 cannot be detected here. This means that amongst the first 10 digits, there are 900 possible substitutions, of which 60 cannot be detected.

An example of one of these undetectable errors is if the number on the example ticket were printed incorrectly as 3387979544. It would have been found that 3387979544 = 5(mod 7) so the number would have been assumed to be correct.

Other possible single digit errors that could occur involve the check digit. The numbers 7,8 and 9 are not allowed remainders so there are only 7 valid check digits, {0,...,6}. This means there are 63 possible errors but all of them would be detectable.

Therefore, the single error detection rate is 903/963 or 93.8%.

Throughout these calculations, we have assumed that all errors have the same probability of being introduced (ie 5 being substituted for 6 is as likely as 1 being substituted for 9). In practice, this is not so but there is not enough data for accurate probabilities to be calculated.

Transposition of adjacent digits
Amongst the first 10 digits, there are 9 pairings. For each pairing there are 100 possibilities. Only the transposition of 90 of these would result in an error since 0 to 0 does not actually affect anything. These errors are undetectable if |ai-aj| = 0(mod 7) (where ai and aj are adjacent digits) so there are 6 undetectable transpositions out of each 90. The total number of errors is therefore 810, of which 54 could not be detected.

When it comes to the pair involving the 10th digit and the check digit, there are 70 possibilities of what the pair could be (since again 7,8 and 9 are not allowed for the check digit). The transposition of 63 of these would result in an error occuring. All of these would be detectable.

This makes the detection rate for transposition of adjacent digits equal to 54/873 or 93.8%.

These error detection rates are quite high but could they be higher? Let's see if better results can be obtained simply by using a different value for N. The choice of modulus 9 is often used. For example, the identity numbers on US Postal Orders consist of 10 digits with the tenth digit being the remainder modulus 9 of the number defined by the first nine digits.

If we were to do the calculations for a number of the same length as that used in the modulus 7 example above (10 digits followed by a check digit), we would find the single error detection rate to be 98.0% which is higher than with modulus 7 and the detection rate for the transposition of adjacent digits is 9.1% which is much lower.

European Article Numbering Code

Barcodes are familiar to most as they are found on the majority of the things we buy. Look at a barcoded item, underneath the barcode itself there is a string of digits, the last of which is a check digit. These use a slightly more complicated scheme of assigning check digits involving a scalar product of the number and a series of 'weights'.

Scalar products explained
A scalar product (sometimes known as a dot product) is just a way of multiplying two lists of numbers together. One multiplies the first number of one list by the first number in the other list, the second with the second etc and then adds all the products together, that is, for example

(a,b,c).(d,e,f) = ad + be + cf

or numerically, (1,2,3).(7,8,9) = (1 × 7) + (2 × 8) + (3 × 9) = 7 + 16 + 27 = 50.

A series of weights consists of a list of certain numbers. They are called weights because they are weighting the digits in the first string. They might be alternating numbers, form an arithmetic progression or be chosen some other way.


In Europe our bar codes follow the European Article Numbering Code (EAN) format. There are two versions, EAN-8 and EAN-13 which use 8 and 13 digits respectively. Since the last digit in both cases is a check digit, only 7 and 12 digits actually encode information. The weights used are alternating 1s and 3s.

Sample EAN-13 bar code
For more details on what information is encoded and how the graphic representation is created, click on the bar code.

Both versions use a modulus 10 scheme with check digits (ac) defined by

ac = - (a1, a2,..., a6, a7).(3, 1, 3, 1, 3, 1, 3)(mod 10) for EAN-8

ac = - (a1, a2,..., a11, a12).(1, 3,...,1, 3)(mod 10) for EAN-13

For example, if we start with the number 1234567 in the EAN-8 scheme, then our check digit is
ac = - (1, 2, 3, 4, 5, 6, 7).(3, 1, 3, 1, 3, 1, 3)(mod 10) = - (1 × 3 + 2 × 1 + 3 × 3 + 4 × 1 + 5 × 3 + 6 × 1 + 7 × 3)(mod 10) = -60(mod 10) = 0,
which makes the full bar code number 12345670.

If we look at the example of an EAN-13 code pictured above, the check digit (the last digit) must be 8 because
- (7, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0).(1, 3, 1, 3, 1, 3, 1, 3, 1, 3, 1, 3)(mod 10) = - (7 × 1 + 0 × 3 + 1 × 1 + 2 × 3 + 3 × 1 + 4 × 3 + 5 × 1 + 6 × 3 + 7 × 1 + 8 × 3 + 9 × 1 + 0 × 3)(mod 10) = -92(mod 10).
8- (-92)=90=10 × 9 so 8 = -92(mod 10).

Again, we must worry about how effection these two schemes are in detecting errors.

Single error detection rate
If the number is weighted by 3, the difference between the new scalar product and the original scalar product will be 3(a'i-ai) so the error will go undetected if 3(a'i-ai) = 0(mod 10).
However, |a'i-ai| can only have the values {1,...,9} since if 0 were included, a'i and ai would have to be equal and then it wouldn't be an error. Therefore, all errors of this sort will be detected.

If the number is weighted by 1, the error will only be undetected if a'i-ai = 0(mod 10) but again |a'i-ai| can only have the values {1,...,9} so this is never possible.

Therefore, this method has a 100% single error detection method.

Transposition of adjacent digits detection rate
Let ai and aj (adjacent digits) be transposed.

If ai is weighted by 3, then the scalar product is changed by (3aj + ai) - (3ai + aj) = 2(aj- ai) which will be detected unless 2(aj - ai) = 0(mod 10) ie |aj - ai| = 5

Similarly, if ai is weighted by 1, then the scalar product is changed by (3ai + aj) - (3ai + aj) = 2(ai - aj) which will go undetected if |ai - aj| = 5

As a result, the transpositions that will go undetected must involve 0<->5, 1<->6, 2<->7, 3<->8 and 4<->9. So, 10 transpostions are undetectable.

There are 100 posibilities for each pairing, the transposition of 90 of these would result in an error. Therefore, the detection rate is 80/90 = 88.9%

UK patent application numbers use a similar modulus 10 scheme. The differences are that the series of weights consists of alternating 2s and 1s and, instead of summing the products, they sum the digits of the products. Ignoring the point of summing the products, using an alternating series of 2s and 1s instead of 3s and 1s would provide a 100% detection rate of transposition of adjacent digits which is better but would have a lower detection rate for single errors.

International Standard Book Number (ISBN)

Most people have come across a book or two in their time, even if they are only those forced on them by school. Every book has a number printed on it called an ISBN number which is ten digits long. Guess what the tenth digit is? Sure enough, this is another check digit.

Sample ISBN numbers

ISBN uses a weighted modulus 11 scheme.

a10 = - (a1, a2,..., a8, a9).(10,9,...,3,2)(mod 11)

Looking at one of the examples above, 0123456789, the last digit, 9, was calculated to be the check digit because
- (0, 1, 2, 3, 4, 5, 6, 7, 8).(10, 9, 8, 7, 6, 5, 4, 3, 2)(mod 11) = - (0 × 10 + 1 × 9 + 2 × 8 + 3 × 7 + 4 × 6 + 5 × 5 + 6 × 4 + 7 × 3 + 8 × 2)(mod 11) = -156(mod 11).
9- (-156) = 165 = 11 × 15 so 9 = -156(mod 11).

The first important thing to note about using a modulus 11 scheme is that the remainder can be 10. But, to put 10 on the end would result in an 11 digit number which would be inconsistent with the format of ISBN. To get around this problem, a check digit of 10 can be represented as an X by publishers.

One might then ask why use modulus 11 when it creates such a problem? However, the scheme is very effective.

It detects all single digit errors. Let the weight (10,9,...,3,2) = (w1, w2,..., w8, w9). If ai is substituted for a'i, the change in the scalar product will be wi(a'i-ai). This error will only go undetected if wi(a'i-ai) = 0(mod 11). Since 11 is prime and a'i-ai cannot equal 0, all such errors would be detected.

Also, this modulus 11 scheme manages to detect all transpositions of adjacent digits. This is because the weights are descending digits. The difference between each adjacent pair is always 1. Therefore, if ai and aj were interchanged, the difference this would make to the scalar product would be just aj-ai but this can never equal 11 or 0 so such a transposition would always be detected.

Magazines have ISSNs (International Standard Serial Numbers), these are not as obviously printed as ISBNs and one often has look inside a magazine to find its ISSN. Both ISSN numbers and many bank account numbers use a scheme similar to that used by ISBN but they both consist of strings that are eight digits long with the eighth digit being the check digit such that

a8 = - (a1, a2, a3, a4, a5, a6, a7).(8,7,6,5,4,3,2)(mod 11)

Therefore, the account number 12341234 would not be valid but 12341231 would be. As with ISBNs, the problem of remainder 10 still exists but banks have chosen not to introduce a letter into their account numbers.

Chosing the optimum modulus for a weighted scheme

This effectiveness of this type of scheme can obviously be varied quite a lot by changing the weights and by using a different modulus. So, what conditions are needed for errors to be detected and is there a best choice for the modulus?

Single digit errors
Let ai be any digit of the number and n be the modulus scheme being used. If ai is substituted with a'i, it will go undetected, if and only if (ai-a'i)wi = 0(mod N). So, for a 100% single error detection rate, it is necessary that the greatest common factor or divisor of wi and n is 1 [gcd(wi,N)=1] and N>=10 so that (ai-a'i) cannot equal 0(mod N).

Transpositions
Let ai be any digit of the number and aj be any other digit of the number (this generalisation is not restricted to adjacent digits and so covers jump transpositions as well) and use modulus N. If ai and aj are transposed, this will be undetectable if and only if (ai-aj)(wi-wj) = 0(mod N). Therefore, for a 100% transposition detection rate, it is necessary that gcd(wi-wj,N)=1 and N>=10.

Other types of errors have not been looked at in great detail in this article because they do not account for a large percentage of errors made. However, they do occur, so it is worthwhile taking a quick look at how to achieve full detection of these too.

Twin errors
Let a be a digit which occupies two adjacent positions in a number (call these positions i and i+1). In both positions, b (a different digit) is substituted for a. The difference this will make to the scalar product will be (awi+awi+1)- (bwi+bwi+1) = (a-b)(wi+wi+1). For these errors to always be detected, gcd(wi+wi+1,N)=1 and, again, N>=10.

Phonetic errors
Suppose 1a is substituted for a0 where 2<=a<=9. This will be undetected if and only if (a-1)wi-awi+1 = 0(mod N). Therefore, there are no particular restrictions on N that help to detect these but the weights must be chosen carefully.

Jump twin errors
Suppose the digits in positions i and i+2 are both equal to a and they are then both substituted by b. It is required that gcd(wi+wi+2,N)=1 and N>=10 so that (a-b)(wi+wi+2)=/=0(mod N).

This all goes to show that it helps to chose an N value that is not less than 10 and is prime. Of course the problem of two digit remainders is still exists.

How to deal with the remainder 10 problem

Firstly there is the method employed by ISBN � using a letter to denote a remainder of 10. This is not always suitable or convenient and lacks consistency. Alternatively, one could just avoid all numbers that yield a remainder of 10 since, providing the number is several digits long, there will still be many possibilities.

Another option is that when a remainder of 10 arises, calculate an alternative check digit using a different set of weights. For example, after finding that a5 = - (a1, a2, a3, a4).(5, 4, 3, 2)(mod 11) = 10, the check digit could then be calculated such that a5 = - (a1, a2, a3, a4).(5, 2, 4, 3)(mod 11) and this used instead. Then, when it comes to checking these numbers, if (a1, a2, a3, a4, a5).(5, 4, 3, 2, 1) = 1+a5(mod 11) rather than 0(mod 11), then instead of discounting it as invalid, (a1, a2, a3, a4, a).(5, 2, 4, 3, 1) is tried as well to see whether this gives 0(mod 11).

There are problems with this method as well. There might be numbers that give a remainder of 10 with both weights so does one then avoid using such numbers or use a third set of weights? Also, a check digit might have been calculated using the first series of weights but then an error was introduced ensuring that the number failed the first test, but passed when tested with the second series of weights and so the error would go undetected. Alternatively, the check digit could have been calculated using the second series of weights but an error is introduced that allows it to pass the first test.

Alphanumeric schemes

Some organisations like to be able to use alphabetic characters as well as numbers in their codes. This may be because they wish to encode information such as initials or to provide a greater variety in their codes. Whatever the reason, the scheme they are quite likely to use is known as Code 39. This uses the digits 0-9, the 26 upper case letters which are assigned the values 10 through 35, the characters '-' and '.' assigned the values 36 and 37 respectively and a blank space given the value of 38. Since there are different characters used here, we can no longer refer to a check digit and must instead use the term check character.

The last character is the check character. It is calculated by converting the original code into a string (a1, a2,..., am-1, am) where 0<=ai<=38. If ac is the check character, then ac = (a1, a2,..., am-1, am).(m,m-1,...,2,1)(mod 39). This value will be converted to the corresponding alphabetical character if necessary.

So, the check character of CHECK1234 can be found by first converting it into a string.
CHECK12345 ---> (12,17,14,12,20,1,2,3,4)
(12,17,14,12,20,1,2,3,4).(9,8,7,6,5,4,3,2,1) = 534 = 27(mod 39)
27 ---> R
Therefore, we now have the code CHECK1234R.

Since 39 is not prime, the error detection rates for this method are not going to be particularly high. Despite this, it is used extensively in nonretail business.

Permutations

Permutations explained
These consist of ordered sets of numbers and a rule for changing each one into another. When the permutation rule involved here is applied to any number in the set, the result is the next number in the set with the last number going back to the first number (it is cyclic). So, if the set was {0,1,2) and the permutation was applied 0->1, 1->2and 2->0. The permutation can be applied more than once. So, with {0,1,2}, if the permutation was applied twice to 0, the result would be 2.

s (a lower case sigma) is generally used to denote a permutation. Each closed cycle is contained within a set of parentheses so the permutation could be s=(0)(12)(345)(6789). s(0)=0 would denote that the permutation had been applied to 0 once and the answer is 0. s3(6)=9 would denote that the permutation had been applied to 6 three times and the answer is 9.

Credit Cards

The scheme used for credit cards was developed by IBM. It requires the use of a permutation s=(0)(124875)(36)(9) ie s(0)=0, s(1)=2, s(2)=4, s(3)=6, s(4)=8, s(5)=1, s(6)=3, s(7)=5, s(8)=7, s(9)=9. You might have noticed that s(x)= the sum of the digits of 2x.

For a 16 digit credit card number, the final digit is the check digit. Let the credit card number be (a1, a2,..., a15, a16) with a16 being the check digit.

Then a16 = -[s(a1) + a2 + s(a3) + a4 + ... + a14 + s(a15)] (mod 10)

Note, in the above example the permutation was applied to ai where i is even (a1, a3 etc) because there are odd number of digits excluding the check digit. Had this scheme been used on a number with an even number of digits excluding the check digit, the permutation would have been applied to ai where i was even.

All single digit errors will be detected using this method since it is relying on a sum of the digits and a single digit error will cause an increase or decrease in the sum by a whole number in the set {1,...,9} which will always change the remainder modulus 10 and will therefore be detected.

Unfortunately, it does not always detect the transposition of adjacent digits. Since neither 0 nor 9 is altered by the permutation, if these two are transposed, the sum will remain the same and the error will go undetected.

Therefore, for each pair of adjacent digits, of the 90 possible transposition errors, two will be undetectable. So, the detection efficiency is 88/90 = 97.8%.

I think it is clear that check digits on credit cards are not a serious effort to reduce major fraud because it is not particularly difficult to calculate a valid check digit. They are, however, invaluable in the fight against errors.

Noncommutative Schemes

Noncommutation explained
The fact that we are going to be using a dihedral group (where D5 is a group of symmetries for an n-sided regular polygon) is not particularly important in itself.

If a scheme is commutative, it means that a*b=b*a, where * can be any opreation. So, addition and multiplication we are used to using are commutative. Therefore, when something is noncommutative, we know this will not necessarily be the case (it might so happen that for some values of a and b, it is commutative but it is not a general rule any more - one counter example means the whole system is regarded as noncommutative). When using the dihedral group D5, as we will do below, we find that it is not commutative since 8*9=4 but 9*8=1. This is not the only case that arises in the group either.

We have seen that using a modulus 11 scheme is so far one's best bet, so long as you can come up with away to deal with the remainder 10 problem. Well, there is another scheme that can achieve a 100% detection rate for single digit errors and transposition errors. It requires the use of dihedral group of order 10, called D5, as shown in the table below, and a permutation such as s=(0)(14)(23)(58697).

Multiplication Table for D5
D5 Multiplication

Multiplication table for D5 explained
If multiplying a by b (a*b), one finds a on the left column and follows it across and b on the top row and and follows it down, where they meet is the answer. For example, 7*6 = 1.

To find the product of more than two numbers, say a*b*c*d, first find a*b, then multiply this answer by c etc, so a*b*c*d = ((a*b)*c)*d. For example check that 1*6*5*2 = 7*5*2 = 2*2 = 4.

When finding the inverse, a-1 of a number using such a multiplication table, if the number is a, then a*a-1=0.

Using all this, for a string of m digits, the check digit, ac is calculated such that
ac = [sm(a1)* sm-1(a2)*...* s2(am-1)* s(am)]-1
So, for the string 12345, using the permutation s=(0)(14)(23)(58697) we can find [s5(1)* s4(2)* s3(3)* s2(4)* s(5)]-1 = (4*2*2*4*8)-1
The multiplication table helps us work out that (4*2*2*4*8)-1 = 5-1
Then it shows us that 5-1 = 5 (since 5*5 = 0) which is our check digit.

Since the check digit is the inverse of the product, when the same product is multiplied by the check digit, the answer will be 0, if there has been no error.
ie sm(a1)* sm-1(a2)*...* s(am)* ac = 0

This is a very powerful scheme since si(a) =/= si(b) if a =/= b so all single errors can be detected.

Also, s(a)*b =/= s(b)*a if a =/= b. This implies that si+1(a)*si =/= si+1*si(a) if a =/= b. As a result, all transpositions of adjacent digits will also be detected.

The check digit schemes and their uses covered in this article are by no means them all. They also appear in serial numbers, driving licence numbers, vehicle identification numbers (VINs) and passport numbers. For security reasons, and paranoia, a lot of organisations will not divulge information about their scheme and in some cases do not even say whether or not they use them. There is also the case that there are places where check digits might prove to be useful but do not actually get used, this is often because they predate check digit schemes. A few countries (Germany is one) use check digits on their banknotes but the Bank of England has no such feature on its notes as the numbers are just assigned sequentially. Furthermore, the uses of check digits are increasing. For example, OCR are going to introduce a modulus 17 scheme to provide validation of Unique Candidate Identifiers that are being implemented for the new A levels and GNVQ GNVQs.

I think the world of check digits will continue to expand as people will always be trying to find the optimum check digit scheme which is both easy to implement and reliable as there is always some compromise involved and new uses for them will continue to appear. You could see if you could come up with one yourself.

Error Correction

Two check digits

Being able to detect that an error has occurred is all well and good but it would be helpful to be able to correct it too - more check digits can do just that. Introducing a second check digit means that one can be used, as before, to detect and find the magnitude of an error and the other can then locate and correct it.

As usual, modulus 11 is a popular one to use. An effective two check digit scheme is one that uses modulus 11 twice but two different series of weights. For example, taking an m digit number (a1, a2,�, am-1, am) and give it check digits am+1 and am+2 such that:

/*Sigma stuff*/

Using these two check digits, all double errors can be detected and all single errors corrected.

For example, if we had the number 12345, 1+2+3+4+5=15 so 15+am+1+am+2 = 0(mod 11) and 1 × 1+2 × 2+3 × 3+4 × 4+5 × 5 = 55 so 55+6am+1+7am+2 = 0(mod 11).
This gives am+1=5 and am+2=2. So, the full number is now 1234552.

If the error 1234552 ---> 1239552 occured, then 1+2+3+9+5+5+2 = 27 = 5(mod 11). Assuming that only a single error has occurred, this tells us that one of the digits is 5 too large. 1 × 1+2 × 2+3 × 3+9 × 4+5 × 5+5 × 6+2 × 7 = 119 = 9(mod 11) = 20(mod 11). 20 = 5 × 4 so this tells us that the fourth digit is where this error occured.

The first scalar product could have told us that one of the digits was 6 too small but then no sensible answer would have been obtainable from the second scalar product.

We could go further and introduce even more check digits and thus be able to correct a greater number of errors. This does have a downside though because the more check digits that are used, the longer the number becomes and also, more calculations are involved so the whole process gets more complicated and time consuming.

Bar codes

As discussed previously, bar codes have a check digit that can tell us if an error has occurred. This alone does not have the capability to correct the error. However, due to the way these numbers are represented as light and dark bars, some errors can be corrected. Each digit (excluding the very first one but including the check digit) is turned into a binary code seven digits long which is then represented by seven modules which will appear merged if more than one of a certain kind are adjacent. All 0s are represented by white space modules and 1s are represented by dark modules. The key is that in each representation of a decimal digit, there must be two blocks of light modules and two blocks of dark modules so 0100111 would be valid but 0101010 would not.

Valid and Invalid Patterns


If a bar code is wrongly printed and the result is an invalid seven module pattern, this will be detected and the location of this error will be remembered. Then, the check digit will be reached and by doing a calculation on the digits that have been correctly read and by knowing the weight applied to the digit that was incorrectly read, the 'missing' digit can be calculated.

Error correcting schemes do not end here. They are used extensively with binary data and appear in our CD players, digital televisions and even in the transmission of data from space probes. Some computer viruses even use error correcting codes to check and repair themselves it someone has modified them. A lot of these rely on polynomial arithmetic rather than check digits. Finally, one last type of error correcting code that do involve check digits are Hamming codes. (If you are familiar with matrices, you may wish to follow the link to discover more about how such a code can be constructed). Error correcting also takes place in Nature, it is believed that a lot of DNA which appears to be redundant, actually is involved in an error correcting procedure which avoids errors in DNA replication so it is fair to say that without efficient error detection and correction we would not be here.

References

For All Practical Purposes
Consortium for Mathematics and its Applications, WH Freeman & Company, 5th Edition
http://www.whfreeman.com/comap

Error Detection Methods
ACM Computing Surveys 28(3), 504-517, Sept 1996
J A Gallian

The Mathematics of Identification Numbers
The College Mathematics Journal 22(3), 194-202, May 1991
J A Gallian

Modular Arithmetic in the Marketplace
American Mathematical Monthly 95, 548-551, 1998
J A Gallian, S Winters

A modulus 11 check digit scheme for a given scheme of codes
Computer Bulletin 14, 12-13, 1970
D V A Campbell