Binary Repetition and the Hamming Distance
a simple error-correcting code
Alexander Wei
May 7, 2020
Outline
Introduction
Error Correction Overview
Linear Codes
Binary Repetition
Hamming Weight
Implementation
Code Generation
Encoding
Decoding
Efficiency
Demonstration
Introduction
What is Coding Theory?
"Coding Theory is the study of the properties of codes and their
respective fitness for specific applications" ("Coding Theory").
I Error Correction
I Cryptography
I Data Compression
Examples
I For the 1972 Mars Mariner mission, a binary (32,64,16)
code was used. Here, 32 = 6 + 26, with 6
information symbols and 26 redundancy symbols (Hill
6).
I In the 1979 Voyager mission to Jupiter, a binary
(24, 4096, 8) code was used. This time, 24 = 12 +
12, with 12 information symbols and 12 redundancy
symbols (Hill 6).
Introduction
What is a code?
A code is "a system of rules to convert information—such as a
letter, word, sound, image, or gesture—into another form or
representation" ("Code").
In short
Where M contains all possible messages and V contains all possible
encodings, an encoding scheme is a map
T : M → V .
Given a message m ∈ M, we send m' = T(m) to a recipient, who
computes T^{-1}(m') to recover the original message, often a single
bit-value.
Why do we do this?
Reasons include the resiliency of data sent over a noisy channel.
Overview
Figure 1: Basic operation of an error-correcting code (Hill 2)
Steps
Code-words representing messages are pre-determined.
1 Sender takes a message m and computes its code-word T(m)
2 Sender sends T(m) over a communication channel, such as a
phone line or radio link
3 Recipient receives (T(m))', which may have been altered by
noise in the channel
4 Recipient calculates the code-word vector most likely to
correspond to (T(m))'
Linear Codes
Definition
In a linear code the set of code-words, representing possible
messages, forms a vector space.
Consider
V = F^n
over a field F, typically a finite field, or Z2 in the case of
binary codes.
Then
I A code C is a vector subspace, C ⊂ V , consisting of all
possible code-words.
I C = {T(m) | m ∈ M} where M is the set of all possible
messages and T is the encoding scheme.
Linear Codes
Generation
Where |F| = m and V = F^n, we have an m × n generator matrix A.
We will suppose that the message space M ⊂ F. Then the
code-words are generated by the transformation
T : M → V
m ↦ m ⊗ A
where ⊗ denotes matrix multiplication over F.
Linear Codes
Example
Perhaps the simplest of all coding schemes is Binary Repetition.
I A binary bit is repeated n times.
I The sequence of bits is sent over a channel.
Linear Codes
Example
I The received packet of bits decides the resulting code-word by
majority
I The code-word corresponding to the received packet is
converted back into a message
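The two steps above can be sketched in a few lines of plain Python (the function names here are illustrative, not from the slides):

```python
def encode_bit(bit, n=3):
    """Repeat a single bit n times (binary repetition encoding)."""
    return [bit] * n

def decode_bits(packet):
    """Recover the original bit by majority vote over the packet."""
    return 1 if sum(packet) > len(packet) / 2 else 0

# One flipped bit out of three is still decoded correctly:
assert encode_bit(1) == [1, 1, 1]
assert decode_bits([1, 0, 1]) == 1   # sent [1, 1, 1], middle bit flipped
```

With n odd there is never a tie in the vote, which is why the later slides use n = 3, 5, 7.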
Binary Repetition
Code generation
A binary repetition code has two messages, {0, 1} and two
code-words generated by a transformation
T : Z2 → (Z2)^n = C
0 ↦ 0⃗
1 ↦ ∑_{i∈I} 1 · βi = 1⃗
where (βi)_{i∈I} is a basis for C and
βi = [ ..., 1, ... ] ,
a vector with all zeroes except for a 1 in the ith position.
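A minimal sketch of the map T over Z2, with the basis vectors βi built explicitly (plain Python standing in for the deck's Sage construction):

```python
n = 3
# beta_i: the standard basis vector with a 1 in position i
basis = [[1 if j == i else 0 for j in range(n)] for i in range(n)]

def T(m):
    """0 maps to the zero vector; 1 maps to the sum of all beta_i (mod 2)."""
    v = [0] * n
    for beta in basis:
        for j in range(n):
            v[j] = (v[j] + m * beta[j]) % 2
    return v

assert T(0) == [0, 0, 0]
assert T(1) == [1, 1, 1]
```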
The Hamming Metric
Definition
The Hamming weight of a binary code-word v = ∑_{i∈I} (vi · βi) is a
function
H(v) = d(v, 0⃗) = ∑_{i∈I} (vi − 0) = ∑_{i∈I} vi
which evaluates to the number of its non-zero bits. The function
d(x, y) returns the number of bits in which x and y differ and is
called the Hamming distance.
Properties
Non-negativity d(x, y) ≥ 0
Identity d(x, y) = 0 ⇔ x = y
Symmetry d(x, y) = d(y, x)
Triangle Inequality d(x, y) ≤ d(x, z) + d(z, y)
Examples
Hamming distance
If x = [1, 1, 1, 1] and y = [0, 0, 0, 0] in (Z2)^4 then
d(x, y) = 4
If x = [1, 1, 1, 0], y = [0, 1, 1, 1] then
d(x, y) = 2
If x = [0, 0, 1, 1], y = [0, 0, 0, 1] then
d(x, y) = 1
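The examples above can be checked with a direct implementation of the two functions (names are illustrative):

```python
def hamming_distance(x, y):
    """d(x, y): the number of positions in which x and y differ."""
    return sum(xi != yi for xi, yi in zip(x, y))

def hamming_weight(v):
    """H(v) = d(v, the zero vector): the number of non-zero bits."""
    return hamming_distance(v, [0] * len(v))

assert hamming_distance([1, 1, 1, 1], [0, 0, 0, 0]) == 4
assert hamming_distance([1, 1, 1, 0], [0, 1, 1, 1]) == 2
assert hamming_distance([0, 0, 1, 1], [0, 0, 0, 1]) == 1
```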
Binary Repetition
How well does B.R. work?
Implementation
I Sage allows for computations over finite fields
I A set of binary codes is a subspace of F = (Z2)^n
F ≅ Z2[x] / (f(x))Z2[x] , where f(x) is irreducible and deg f = n
Binary Repetition in Sage
Irreducible polynomials
For n ≤ 3, f(x) = x^n + x^{n−1} + 1 ∈ Z2[x] is irreducible.
Proof
f(0) ≡ 1 ≡ f(1) ≢ 0 (mod 2), so f has no roots in Z2 and hence no
linear factors; for deg f ≤ 3 this already implies irreducibility.
(For larger n the polynomial must be checked directly: e.g.
x^5 + x^4 + 1 = (x^2 + x + 1)(x^3 + x + 1) is reducible.)
Getting started
Figure 2: Construct vector space and code word subspace
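Figure 2's Sage session is not reproduced here, but the no-roots condition from the proof can be checked in plain Python. Note that having no roots only rules out linear factors; for larger n, irreducibility would still need a direct check (e.g. Sage's `is_irreducible`):

```python
def f(x, n):
    """Evaluate f(x) = x^n + x^(n-1) + 1 in Z2 (i.e. mod 2)."""
    return (pow(x, n) + pow(x, n - 1) + 1) % 2

# f has no roots in Z2 for any n > 1, so it has no linear factors.
for n in (3, 5, 7):
    assert f(0, n) == 1
    assert f(1, n) == 1
```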
Binary Repetition, n = 3
Code Generation
A = x^{n−1} + x^{n−2} + ... + 1
C = [1 * A, 0 * A]
where
A = ∑_{i<n} x^i
C = {T(m) | m ∈ M}
Since our two messages are 0, 1 ∈ Z2, the code-space is
C = {0 ⊗ A , 1 ⊗ A} .
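For n = 3 the code space can be built directly; a sketch with A stored as its coefficient vector [1, 1, 1]:

```python
n = 3
A = [1] * n                        # coefficients of x^2 + x + 1

# C = [1 * A, 0 * A], scalar multiplication taken mod 2
C = [[(1 * a) % 2 for a in A],     # 1 * A = the all-ones vector
     [(0 * a) % 2 for a in A]]     # 0 * A = the zero vector

assert C == [[1, 1, 1], [0, 0, 0]]
```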
Binary Repetition, n = 3
Let’s take a message
HERE IS A SAMPLE MESSAGE. THIS WILL BE CONVERTED TO
BINARY FORMAT BEFORE BEING ENCODED.
Because there are 26 letters in the alphabet and another two for
period and space, we will use five bits to store each character:
2^5 = 32 > 28 .
The characters
u_i ↦ Bin(i − 1)
. ↦ Bin(26)
␣ ↦ Bin(27)
where Bin(0) = 0, Bin(1) = 1, Bin(2) = x, ... , Bin(n) is n's binary
representation in F.
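A sketch of this character map in plain Python, assuming uppercase A–Z plus period and space (`char_to_bits` is an illustrative name):

```python
def char_to_bits(c):
    """Map the i-th letter to Bin(i-1), '.' to Bin(26), ' ' to Bin(27)."""
    if c == '.':
        value = 26
    elif c == ' ':
        value = 27
    else:
        value = ord(c) - ord('A')          # A -> 0, ..., Z -> 25
    return [int(b) for b in format(value, '05b')]   # five bits each

assert char_to_bits('H') == [0, 0, 1, 1, 1]   # Bin(7)
assert char_to_bits('E') == [0, 0, 1, 0, 0]   # Bin(4)
assert char_to_bits('R') == [1, 0, 0, 0, 1]   # Bin(17)
```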
Sending a message
Characters to binary
H −→ Bin(7) −→ x^2 + x + 1 ←→ [0][0][1][1][1]
E −→ Bin(4) −→ x^2 ←→ [0][0][1][0][0]
R −→ Bin(17) −→ x^4 + 1 ←→ [1][0][0][0][1]
· · ·
Binary encoding
To send the first three letters we encode 3 × 5 = 15 bits.
Recall that
0 ⊗ A = 0 ⊗ [1, 1, 1] = 0⃗
1 ⊗ A = 1 ⊗ [1, 1, 1] = 1⃗
Sending bit by bit, n = 3
The 15 bits holding the first three characters become 15 × n = 45
bits after encoding.
Receiving bit by bit, n = 3
A noisy channel will induce an average error rate of some
P(bit error) = (# incorrect bits) / (100 bits) .
Decoding a message
Error Correction
I Even when repeating bits, the system is not flawless.
I When errors make up a majority of a bit’s encoding, the
received message is incorrect.
Definition
Recalling that the set of code-words C is a vector subspace of V ,
we define the syndrome of a code-word x0 ∈ C as
S(x0, δ) = {x ∈ V | d(x0, x) ≤ δ} .
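The syndrome as defined above (a Hamming ball of radius δ around x0) can be enumerated directly for small n; a sketch:

```python
from itertools import product

def syndrome(x0, delta):
    """All vectors of V = (Z2)^n within Hamming distance delta of x0."""
    n = len(x0)
    return [list(v) for v in product([0, 1], repeat=n)
            if sum(a != b for a, b in zip(x0, v)) <= delta]

ball = syndrome([0, 0, 0], 1)
assert len(ball) == 4                 # the zero vector plus three weight-1 vectors
assert [0, 0, 0] in ball and [1, 0, 0] in ball
```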
Syndrome Computation
Let C = {0⃗, 1⃗} ⊂ V ≅ F = Z2[x] / (x^3 + x^2 + 1)Z2[x] .
Example
S(0⃗, 1) = {0⃗, [1][0][0], [0][1][0], [0][0][1]}
These vectors correspond to elements of the quotient
{[0], [x^2], [x], [1]} ⊂ F .
And
S(1⃗, 1) = {1⃗, [1][1][0], [0][1][1], [1][0][1]}
with
{[x^2 + x + 1], [x^2 + x], [x + 1], [x^2 + 1]} ⊂ F .
Syndrome Computation
Figure 3: Syndrome vectors for Binary Repetition, n = 3
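Figure 3's table can be reproduced by enumerating both radius-1 balls for n = 3; together they partition all 2^3 = 8 vectors of V:

```python
from itertools import product

def ball(center, delta, n):
    """The set of vectors within Hamming distance delta of center."""
    return {v for v in product([0, 1], repeat=n)
            if sum(a != b for a, b in zip(center, v)) <= delta}

S0 = ball((0, 0, 0), 1, 3)
S1 = ball((1, 1, 1), 1, 3)
assert S0 == {(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)}
assert S1 == {(1, 1, 1), (1, 1, 0), (0, 1, 1), (1, 0, 1)}
# The two balls cover V with no overlap:
assert S0 | S1 == set(product([0, 1], repeat=3)) and not (S0 & S1)
```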
"Nearest neighbor" decoding
If we choose to send each bit some odd number n times, the
syndrome
S( x0 , ⌊n/2⌋ )
contains the codes which will be corrected to x0 upon receipt.
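For the two-codeword repetition code, nearest-neighbor decoding reduces to comparing distances to the two codewords; a sketch:

```python
def nearest_codeword(received):
    """Return the codeword within distance floor(n/2) of the packet."""
    n = len(received)
    d0 = sum(b != 0 for b in received)   # distance to the zero vector
    return [0] * n if d0 <= n // 2 else [1] * n

assert nearest_codeword([0, 1, 0]) == [0, 0, 0]            # one error fixed
assert nearest_codeword([1, 1, 0, 1, 1]) == [1, 1, 1, 1, 1]  # n = 5, one error
```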
Code efficiency
Let r = P(bit error), the average measure of channel noise, and let
n = 3.
Then the chance of a received code-word being correctly decoded is
P(CW) = (1 − r)^3 + 3 · r(1 − r)^2 ,
the combined probability of having all three bits received correctly,
plus having a single error in one of the three bits.
In general
P(CW) = ∑_{i=0}^{⌊n/2⌋} C(n, i) · (1 − r)^{n−i} · r^i
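The general formula can be evaluated at r = 0.1 to reproduce the per-character success rates ((P(CW))^5) quoted on the next slide:

```python
from math import comb

def p_cw(n, r):
    """Probability that a received n-bit repetition codeword decodes correctly."""
    return sum(comb(n, i) * (1 - r) ** (n - i) * r ** i
               for i in range(n // 2 + 1))

r = 0.10
assert abs(p_cw(3, r) - 0.972) < 1e-9
assert round(p_cw(3, r) ** 5, 2) == 0.87    # n = 3: ~87% per character
assert round(p_cw(5, r) ** 5, 2) == 0.96    # n = 5: ~96%
assert round(p_cw(7, r) ** 5, 3) == 0.986   # n = 7: ~98.6%
```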
Simulating a text transmission
HERE IS A SAMPLE MESSAGE. THIS WILL BE CONVERTED TO
BINARY FORMAT BEFORE BEING ENCODED. (len. 86)
Because each character takes up five code-words, we need to
receive five correct code-words in sequence.
Suppose r = 10%
I (P(CW))^5 |_{n=3} = ((.9)^3 + 3 · (.1)(.9)^2)^5 ≈ (.97)^5 ≈ 87%
11 errors
I (P(CW))^5 |_{n=5} ≈ (.99)^5 ≈ 96%
3 errors
I (P(CW))^5 |_{n=7} ≈ (.997)^5 ≈ 98.6%
1 error
10% channel error
No encoding
HARE IS A SAMNJETMECSMGEZ THK.Z KLL.JU.CONVERRFDLSO AJ
ARY FOQMA. DENOZOLBAINGLENCKHUDK
n = 3
HERA IS E SAMPLE MESSAGE. THIS WILL BE CPNVEVTED TO
BANAQY FPRMAX BEFMRG JEING ENCODED.
n = 5
HERE IS A SAMPLE MECSAGE. THIS WILL BE COPVERTED TG
BINARY FORMAT BEFORE BEJNG ENCODED.
n = 7
HERE IS A SAMPLE MESSAGE. THIS WILL BE CONVERTED TO
BINARY FORMAT BEFORE BEING ENCODED.
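The demonstration above can be reproduced with a short end-to-end sketch in plain Python (the deck's Sage session is not shown here; `transmit` and the helper names are illustrative):

```python
import random

ALPHABET = [chr(ord('A') + i) for i in range(26)] + ['.', ' ']

def text_to_bits(text):
    """Five bits per character, using the mapping from the earlier slides."""
    return [int(b) for c in text for b in format(ALPHABET.index(c), '05b')]

def bits_to_text(bits):
    chars = []
    for i in range(0, len(bits), 5):
        value = int(''.join(map(str, bits[i:i + 5])), 2)
        chars.append(ALPHABET[value] if value < 28 else '?')
    return ''.join(chars)

def transmit(text, n=3, r=0.10, seed=1):
    """Encode with n-fold repetition, pass through a noisy channel, decode."""
    rng = random.Random(seed)
    sent = [b for bit in text_to_bits(text) for b in [bit] * n]
    noisy = [b ^ 1 if rng.random() < r else b for b in sent]
    decoded = [1 if sum(noisy[i:i + n]) > n / 2 else 0
               for i in range(0, len(noisy), n)]
    return bits_to_text(decoded)

# With a noiseless channel the round trip is exact; with r = 0.10 the
# number of corrupted characters falls as n grows, as in the outputs above.
assert transmit("HERE IS A SAMPLE MESSAGE.", n=3, r=0.0) == "HERE IS A SAMPLE MESSAGE."
```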
Works Cited
“Code.” Wikipedia, Wikimedia Foundation, 1 May 2020,
en.wikipedia.org/wiki/code.
“Coding Theory.” Wikipedia, Wikimedia Foundation, 15 Apr. 2020,
en.wikipedia.org/wiki/coding_theory.
Hill, Raymond. A First Course in Coding Theory. Clarendon Press,
1986.