INFORMATION THEORY AND CODING
Ch. Ravi Kumar
Assistant Professor, Department
of E.C.E, Sir C. R. Reddy College of Engineering, Eluru, India
If the rate of information from the source does not exceed the capacity of a communication channel, then there exists a coding technique such that the information can be transmitted over the channel with an arbitrarily small frequency of errors, despite the presence of noise.
1. Measure of information:
The outputs from an information source such as data or speech can be regarded as a random process. Information theory defines the self information of an event xi with probability P(xi) as
Self information: Ii = - log2 Pi = log2 (1/Pi) bits
This measure of information is large for an event of lower probability and small for an event of higher probability. A message that there was or will be an earthquake in an area where earthquakes are very rare is big news and can therefore be regarded as carrying a lot of information. Another message, that there will be an earthquake in an area where earthquakes are frequent, is of much less value as news and, accordingly, can be regarded as carrying little information.
1.1 Properties of Self information:
1) Self information is a non-negative quantity.
2) Ii ≥ 0 for 0 ≤ Pi ≤ 1
3) Ii → 0 as Pi → 1
4) Ii ≥ Ij for Pi ≤ Pj
5) Joint probability: If Xi and Xj are two successive and independent events, then P(Xi, Xj) = Pi Pj and the self information is
Iij = log2 (1/(Pi Pj)) = log2 (1/Pi) + log2 (1/Pj) = Ii + Ij
1.2 Average information (or) entropy:
To find the average information we require the probabilities of the events. For a message such as "ABADCADBCDBADCBADC", the probabilities PA, PB, PC, PD are to be calculated, with N the total number of symbols in the message.
Therefore ITotal = N PA log2 (1/PA) + N PB log2 (1/PB) + N PC log2 (1/PC) + N PD log2 (1/PD)
Suppose a source emits one of N possible symbols with the probabilities P1, P2, P3, ..., PN. Then the average information per symbol, or entropy, is
H = ITotal / N = Σi Pi log2 (1/Pi) bits/symbol
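As a quick check of this definition, the short Python sketch below estimates PA, PB, PC, PD from symbol counts in the example message "ABADCADBCDBADCBADC" (the message is from the text; the helper names are illustrative) and prints each self information and the entropy.

```python
from collections import Counter
from math import log2

def entropy(probs):
    """Average information H = sum of Pi * log2(1/Pi), in bits/symbol."""
    return sum(p * log2(1.0 / p) for p in probs if p > 0)

message = "ABADCADBCDBADCBADC"
counts = Counter(message)                      # occurrences of A, B, C, D
N = len(message)                               # total number of symbols
probs = {s: c / N for s, c in counts.items()}  # PA, PB, PC, PD

for s, p in sorted(probs.items()):
    print(f"P({s}) = {p:.3f},  I = {log2(1.0/p):.3f} bits")

print(f"H = {entropy(probs.values()):.3f} bits/symbol")
```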
Problems:
1) A source emits one of four symbols with the given probabilities; calculate the entropy H.
1.3 Properties of Entropy:
Consider a discrete memoryless source that emits the symbols S1, S2, ..., SM with the probabilities P1, P2, P3, ..., PM. Then the entropy H(s) of the source is bounded as
0 ≤ H(s) ≤ log2 M
1) H(s) = 0 if the probability of one symbol is 1 and the probabilities of the remaining symbols are 0; there is no randomness or uncertainty, i.e. only that one symbol is emitted every time, and therefore there is no information.
2) H(s) = log2 M if the symbol probabilities are all equal; this is maximum randomness or uncertainty, so the source has maximum information.
2) A source emits eight symbols with equal probabilities.
H = 8 × (1/8) × log2 8 = 3 bits/symbol
Entropy of a Binary memoryless source:
Consider the probability of logic 0 to be P and the probability of logic 1 to be 1 - P. Then the entropy is
H(s) = P log2 (1/P) + (1 - P) log2 (1/(1 - P))
H(s) is maximum when the derivative of H(s) with respect to P is equated to 0, i.e. dH(s)/dP = 0, which gives P = 1/2 and H(s)max = 1 bit/symbol.
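A small numerical sketch (names are illustrative) that evaluates the binary entropy over a grid of P values and confirms that the maximum of 1 bit occurs at P = 0.5:

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Evaluate on a grid of P values and locate the maximum.
grid = [i / 1000 for i in range(1001)]
p_best = max(grid, key=binary_entropy)
print(f"H is maximum at P = {p_best:.3f}, H = {binary_entropy(p_best):.3f} bit")
# Expected: P = 0.500, H = 1.000 bit
```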
3) Two symbols are emitted by the source with the given probabilities; calculate the entropy H(s).
1.4 Information rate:
A discrete memoryless source emits symbols at a symbol rate r with average information H(s) per symbol; the rate of information is then given by
R = r H(s) bits/sec
where r is the symbol rate in symbols/sec.
4) Suppose a source emits 2000 symbols/sec from an alphabet of size 4 with the symbol probabilities given below. Find the self information of each symbol, the entropy, and the information rate.

Symbol | Probability | Self information
A      |             |
B      |             |
C      |             |
D      |             |

H(s) = Σi Pi log2 (1/Pi) = 1.75 bits/symbol
R = r H(s) = 2000 × 1.75 = 3500 bits/sec
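The probability values of the table are not reproduced above; as an illustrative check, the sketch below assumes the set {1/2, 1/4, 1/8, 1/8} (an assumption chosen only because it is consistent with H(s) = 1.75 bits/symbol) and recomputes the self information of each symbol, the entropy, and the information rate R = r H(s).

```python
from math import log2

# Assumed probabilities for A, B, C, D (not stated explicitly above;
# chosen so that H(s) = 1.75 bits/symbol as in the worked answer).
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
r = 2000  # symbol rate in symbols/sec

H = 0.0
for sym, p in probs.items():
    I = log2(1.0 / p)          # self information in bits
    H += p * I
    print(f"{sym}: P = {p}, I = {I:.2f} bits")

R = r * H
print(f"H(s) = {H:.2f} bits/symbol, R = r*H(s) = {R:.0f} bits/sec")  # 1.75, 3500
```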
1.5 Extended Entropy for a Discrete memoryless source: The entropy of the nth-order extended source is defined by
H(s^n) = n H(s)
where n is the order of extension of the source; if n = 2, it is the second-order extension.
Let 3 symbols S0, S1, S2 be emitted by the source. For the second-order extension, the number of blocks is 3^2 = 9; these are
New symbol:   σ0    σ1    σ2    σ3    σ4    σ5    σ6    σ7    σ8
Symbol pair:  S0S0  S0S1  S0S2  S1S0  S1S1  S1S2  S2S0  S2S1  S2S2
Probability:  P0P0  P0P1  P0P2  P1P0  P1P1  P1P2  P2P0  P2P1  P2P2
5) Consider a DMS with the given symbol probabilities. Calculate
a) the entropy of the source
b) the entropy of the 2nd-order extended source
Sol: a) H(s) = Σi Pi log2 (1/Pi)
b) Entropy of the 2nd-order extended source:
New symbol:   σ0    σ1    σ2    σ3    σ4    σ5    σ6    σ7    σ8
Symbol pair:  S0S0  S0S1  S0S2  S1S0  S1S1  S1S2  S2S0  S2S1  S2S2
The probability of each new symbol is the product of the probabilities of its two component symbols, and
H(s^2) = Σ (over the 9 new symbols) P(σi) log2 (1/P(σi)) = 2 H(s)
6) Consider a DMS with symbol probabilities 0.7, 0.15, 0.15. Calculate
a) the entropy of the source. Answer: 1.18 bits/symbol
b) the entropy of the 2nd-order extended source. Answer: 2.362 bits/symbol
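A short sketch verifying problem 6: it computes H(s) for the probabilities {0.7, 0.15, 0.15} and then builds all 9 pairs of the second-order extension (pair probability = product of the component probabilities) to confirm that H(s^2) = 2 H(s).

```python
from math import log2
from itertools import product

def entropy(probs):
    return sum(p * log2(1.0 / p) for p in probs if p > 0)

probs = [0.7, 0.15, 0.15]                      # P0, P1, P2
H1 = entropy(probs)                            # entropy of the source

# Second-order extension: 9 blocks SiSj with probability Pi*Pj.
ext_probs = [pi * pj for pi, pj in product(probs, probs)]
H2 = entropy(ext_probs)

print(f"H(s)   = {H1:.3f} bits/symbol")        # about 1.181
print(f"H(s^2) = {H2:.3f} bits/block")         # about 2.363 = 2*H(s)
```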
2. Shannon's source coding (Shannon's First Theorem)
Sometimes we want to find an efficient code that uses the fewest bits to represent the information. Shannon's source coding theorem, or noiseless coding theorem, states that we need a number of bits not less than the entropy in order to encode a message in such a way that perfect decoding is possible.
The average codeword length N of any code assigned to the set S of symbols cannot be less than the entropy H(s), i.e.
H(s) / N ≤ 1
2.1. Kraft Inequality: In order to check the uniquely decodable condition at the receiving end of the communication system, apply the Kraft inequality
K = Σi 2^(-ni) ≤ 1
where ni is the length of the codeword assigned to the ith symbol.
A discrete memoryless source that emits M symbols with equal probabilities has information rate
R = r H(s) = r log2 M
When the symbol probabilities are different, the information rate is
R = r H(s) ≤ r log2 M
Efficient transmission requires an encoding process such that
R = r H(s) ≤ rb
where rb is the encoder output bit rate, i.e.
H(s) ≤ rb / r = N
with N = rb / r the average number of bits per symbol. The minimum value of N is H(s), i.e.
H(s) ≤ N
Efficiency ŋ = H(s) / N
Optimum source coding achieves the lower bound Nmin = H(s) (the minimum number of bits per symbol); otherwise the code is sub-optimum.
7) For a discrete memoryless source, the symbols along with candidate codes are given below. Determine which codes are realizable (optimum) and verify the uniquely decodable condition.

Symbol (xi) | Probability (Pi) | Code 1 | Code 2 | Code 3 | Code 4
A           |                  | 00     | 0      | 0      | 0
B           |                  | 01     | 1      | 01     | 10
C           |                  | 10     | 10     | 011    | 110
D           |                  | 11     | 11     | 0111   | 111
Code 1:
H(s) = Σi Pi log2 (1/Pi), and the average codeword length is N = 2 bits.
Since H(s) ≤ N, Code 1 is realizable.
K = 2^-2 + 2^-2 + 2^-2 + 2^-2 = 1
Since K ≤ 1, Code 1 satisfies the Kraft inequality and is uniquely decodable.
Code 2:
H(s) = Σi Pi log2 (1/Pi), and here the average codeword length N satisfies H(s) ≥ N, so Code 2 is not realizable.
For the uniquely decodable condition the Kraft inequality requires K ≤ 1, but
K = 2^-1 + 2^-1 + 2^-2 + 2^-2 = 1.5
Since K is greater than 1 (K ≥ 1), Code 2 is not uniquely decodable. For example, the sequence 10011 has several valid interpretations:
BAABB = 10011
CABB = 10011
CAD = 10011
Note:
Students are advised to verify code 3
and code 4 also
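The Kraft sum K = Σ 2^(-ni) for each of the four candidate codes can be checked with the short sketch below (prefix-freeness is also tested directly); it may help in verifying Code 3 and Code 4 as suggested in the note.

```python
def kraft_sum(code):
    """K = sum of 2^(-ni) over the codeword lengths ni."""
    return sum(2 ** -len(w) for w in code)

def is_prefix_free(code):
    """True if no codeword is a prefix of another (instantaneously decodable)."""
    return not any(a != b and b.startswith(a) for a in code for b in code)

codes = {
    "Code 1": ["00", "01", "10", "11"],
    "Code 2": ["0", "1", "10", "11"],
    "Code 3": ["0", "01", "011", "0111"],
    "Code 4": ["0", "10", "110", "111"],
}

for name, code in codes.items():
    print(f"{name}: K = {kraft_sum(code):.2f}, prefix-free = {is_prefix_free(code)}")
# Code 2 gives K = 1.5 > 1, so no uniquely decodable code can have those lengths.
```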
2.2. Huffman Coding:
Huffman coding is a variable-length coding strategy that minimizes the average number of bits required for coding a set of symbols by assigning shorter codes to more probable symbols and longer codes to less probable symbols. It is performed in two steps, merging and splitting, as follows:
(1) Merging: Repeatedly arrange the symbols in descending order of probability in each column of the table and merge the two least probable entries into a new entry whose probability equals the sum of their probabilities in the next column, until only two entries remain.
(2) Splitting: Assign 0/1 to each of the two remaining entries to form the initial parts of their codewords. Then repeatedly split the merged entries back into the two (previous) entries, appending another 0/1 to each of their (unfinished) codewords.
8) A discrete memoryless source emits 5 symbols with the probabilities given below; find the Huffman code and calculate the efficiency.
{S0, S1, S2, S3, S4}, {0.4, 0.2, 0.2, 0.1, 0.1}

Symbol | Probability | Huffman code
S0     | 0.4         | 00
S1     | 0.2         | 10
S2     | 0.2         | 11
S3     | 0.1         | 010
S4     | 0.1         | 011

Merging steps: 0.1 + 0.1 = 0.2; then 0.2 + 0.2 = 0.4; then 0.4 + 0.2 = 0.6; finally the two remaining entries 0.6 and 0.4 are assigned 0 and 1 and the codewords are built by splitting back.

H(S) = 0.4 log2 (1/0.4) + 0.2 log2 (1/0.2) + 0.2 log2 (1/0.2) + 0.1 log2 (1/0.1) + 0.1 log2 (1/0.1) = 2.12 bits/symbol
Average codeword length L = 0.4(2) + 0.2(2) + 0.2(2) + 0.1(3) + 0.1(3) = 2.2 bits
Efficiency ŋ = H(S)/L = 2.12/2.2 = 0.96
Variance = Σi Pi (ni - L)^2
= 0.4(2 - 2.2)^2 + 0.2(2 - 2.2)^2 + 0.2(2 - 2.2)^2 + 0.1(3 - 2.2)^2 + 0.1(3 - 2.2)^2
= 0.4(0.04) + 0.2(0.04) + 0.2(0.04) + 0.1(0.64) + 0.1(0.64)
= 0.016 + 0.008 + 0.008 + 0.064 + 0.064
Variance = 0.16
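A compact Huffman construction (heapq-based; function names are illustrative) that builds a code for the probabilities {0.4, 0.2, 0.2, 0.1, 0.1} and reports the average length, entropy and efficiency. Note that Huffman codes are not unique: depending on tie-breaking the code table may differ from the one above, but the average length of 2.2 bits is the same.

```python
import heapq
from math import log2

def huffman(probs):
    """Return {symbol: codeword} for a dict of symbol probabilities."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable entries
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"S0": 0.4, "S1": 0.2, "S2": 0.2, "S3": 0.1, "S4": 0.1}
code = huffman(probs)
L = sum(probs[s] * len(w) for s, w in code.items())      # average codeword length
H = sum(p * log2(1 / p) for p in probs.values())         # entropy
print(code)
print(f"L = {L:.2f} bits, H = {H:.2f} bits, efficiency = {H / L:.3f}")
```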
9) A discrete memoryless source emits 5 symbols with equal probabilities; find the Huffman code and calculate the efficiency.
Equal probabilities means {0.2, 0.2, 0.2, 0.2, 0.2}

Symbol | Probability | Huffman code
S0     | 0.2         | 01
S1     | 0.2         | 10
S2     | 0.2         | 11
S3     | 0.2         | 000
S4     | 0.2         | 001

Merging steps: 0.2 + 0.2 = 0.4; then 0.2 + 0.2 = 0.4; then 0.4 + 0.2 = 0.6; finally the two remaining entries 0.6 and 0.4 are assigned 0 and 1.

H(S) = 5 × 0.2 log2 (1/0.2) = log2 5 = 2.32 bits/symbol
Average codeword length L = 0.2(2 + 2 + 2 + 3 + 3) = 2.4 bits
Efficiency ŋ = H(S)/L = 2.32/2.4 = 0.967
Ex) A DMS emits 9 symbols with the corresponding probability vector
P(x) = [0.2, 0.15, 0.13, 0.12, 0.1, 0.09, 0.08, 0.07, 0.06];
compute the Huffman code and calculate the efficiency and variance.
Ex 1) A DMS emits 5 symbols with the probabilities 0.4, 0.2, 0.1, 0.1, 0.2 respectively. Compute the Huffman code and calculate the efficiency and variance.
Ex 2) A DMS emits 5 symbols with the probabilities 0.55, 0.15, 0.15, 0.10, 0.05 respectively. Compute the Huffman code and calculate the efficiency and variance.
Ex 3) A DMS emits 8 symbols with the probabilities 0.26, 0.25, 0.14, 0.09, 0.08, 0.07, 0.07, 0.04 respectively. Compute the Huffman code and calculate the efficiency and variance.
Ex 4) A DMS emits 8 symbols with the probabilities 0.4, 0.3, 0.1, 0.06, 0.05, 0.04, 0.04, 0.01 respectively. Compute the Huffman code and calculate the efficiency and variance.
Ex 5) A DMS emits 7 symbols with the probabilities 0.37, 0.33, 0.16, 0.07, 0.04, 0.02, 0.01 respectively. Compute the Huffman code and calculate the efficiency and variance.
Ex 6) A DMS emits 7 symbols with the probabilities 0.46, 0.26, 0.12, 0.05, 0.04, 0.03, 0.02 respectively. Compute the Huffman code and calculate the efficiency and variance.
2.3 Shannon Fano Code:
The Shannon Fano code is another type of source code. The procedure is as follows:
Step 1. Draw a line that divides the symbols into two groups such that the group probabilities are as nearly equal as possible.
Step 2. Assign logic 0 to each symbol in the group above the line and logic 1 to each symbol in the group below the line.
Step 3. In all subsequent steps, subdivide each group into subgroups and assign digits by the previous rule.
Step 4. Whenever a group consists of just one symbol, no further subdivision is possible and the codeword for that symbol is complete.
1) Apply the Shannon Fano technique to find codewords for the symbols emitted by a DMS having the probabilities given below, and find the coding efficiency.

Symbol | Probability | Codeword
S0     |             | 0
S1     |             | 100
S2     |             | 101
S3     |             | 1100
S4     |             | 1101
S5     |             | 1110
S6     |             | 11110
S7     |             | 11111
2) A discrete memoryless source emits 5 symbols with equal probabilities; find the Shannon Fano code and calculate the efficiency.
Equal probabilities means {0.2, 0.2, 0.2, 0.2, 0.2}

Symbol | Probability | Step 1 | Step 2 | Step 3 | Codeword
S0     | 0.2         | 0      | 0      |        | 00
S1     | 0.2         | 0      | 1      |        | 01
S2     | 0.2         | 1      | 0      |        | 10
S3     | 0.2         | 1      | 1      | 0      | 110
S4     | 0.2         | 1      | 1      | 1      | 111

H(S) = 5 × 0.2 log2 (1/0.2) = log2 5 = 2.32 bits/symbol
Average codeword length L = 0.2(2 + 2 + 2 + 3 + 3) = 2.4 bits
Efficiency ŋ = H(S)/L = 2.32/2.4 = 0.967
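A recursive Shannon Fano sketch (an illustrative implementation, not taken from the text) that follows the split-into-nearly-equal-probability-groups procedure described above; applied to five equally likely symbols it reproduces codewords of lengths {2, 2, 2, 3, 3}.

```python
from math import log2

def shannon_fano(symbols):
    """symbols: list of (symbol, probability) sorted by descending probability.
       Returns {symbol: codeword}."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    best_diff, split = float("inf"), 1
    # Find the split point that makes the two group probabilities most nearly equal.
    for i in range(1, len(symbols)):
        running = sum(p for _, p in symbols[:i])
        diff = abs(total - 2 * running)
        if diff < best_diff:
            best_diff, split = diff, i
    upper = shannon_fano(symbols[:split])
    lower = shannon_fano(symbols[split:])
    code = {s: "0" + w for s, w in upper.items()}
    code.update({s: "1" + w for s, w in lower.items()})
    return code

syms = [("S0", 0.2), ("S1", 0.2), ("S2", 0.2), ("S3", 0.2), ("S4", 0.2)]
code = shannon_fano(syms)
L = sum(p * len(code[s]) for s, p in syms)
H = sum(p * log2(1 / p) for _, p in syms)
print(code)                                  # {'S0': '00', ..., 'S4': '111'}
print(f"L = {L:.2f} bits, H = {H:.2f} bits, efficiency = {H / L:.3f}")
```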
3. The Discrete Memoryless Channel (DMC)
Channel representation: A channel is a statistical model with an input X and an output Y. The channel is said to be discrete when the alphabets of X and Y are both finite. It is said to be memoryless when the current output depends only on the current input and not on any of the previous inputs.
The Channel Matrix: A channel is specified by the set of transition probabilities P(yj/xi), arranged as the channel matrix [P(Y/X)]. The sum of each row of the channel matrix is unity, i.e. Σj P(yj/xi) = 1 for every input xi.
The input and output probabilities are represented by row matrices:
Input matrix: [P(X)] = [P(x1), P(x2), P(x3), ..., P(xm)]
Output matrix: [P(Y)] = [P(y1), P(y2), P(y3), ..., P(yn)]
The output probabilities are related to the input probabilities by
[P(Y)] = [P(X)] [P(Y/X)]
The joint probability matrix is given by
[P(X,Y)] = [P(Xd)] [P(Y/X)]
where [P(Y/X)] is the channel matrix and [P(Xd)] is the diagonal matrix formed from the input probabilities.
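These matrix relations can be checked numerically; the sketch below uses an illustrative 2 x 3 channel matrix (an assumption, not taken from the text) to compute [P(Y)] and [P(X,Y)].

```python
import numpy as np

# Illustrative transition matrix P(Y/X): rows are inputs, columns are outputs,
# and every row sums to 1 (an assumed example channel).
P_Y_given_X = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])
P_X = np.array([0.4, 0.6])                 # input (row) matrix [P(X)]

P_Y = P_X @ P_Y_given_X                    # [P(Y)] = [P(X)] [P(Y/X)]
P_XY = np.diag(P_X) @ P_Y_given_X          # [P(X,Y)] = [P(Xd)] [P(Y/X)]

print("P(Y)   =", P_Y)                     # output probabilities, sum to 1
print("P(X,Y) =\n", P_XY)                  # joint probabilities, sum to 1
```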
3.1. Types of Channels: Channel types are classified according to the channel transition matrix (channel matrix):
1) Lossless channel
2) Deterministic channel
3) Noiseless channel
4) Binary symmetric channel
1) Lossless channel: A lossless channel is defined as one whose channel transition matrix has only one non-zero element in each column. In a lossless channel no source information is lost in transmission.
Ex:
[P(Y)] = [P(X)] [P(Y/X)]
P(y1) = (3/4) P(x1)
P(y2) = (1/4) P(x1)
P(y3) = (2/3) P(x2)
P(y4) = (1/3) P(x2)
P(y5) = P(x3)
2) Deterministic channel: A deterministic channel is defined as one whose channel transition matrix has only one non-zero element in each row; since each row sum must be unity, that element is 1. In a deterministic channel the received symbol corresponding to a given source symbol is completely determined.
[P(Y)] = [P(X)] [P(Y/X)]
P(y1) = P(x1) + P(x2)
P(y2) = P(x3) + P(x4)
P(y3) = P(x5)
3) Noiseless channel: A noiseless channel obeys the properties of both the lossless and the deterministic channel: each column and each row has exactly one non-zero element, which is unity. Therefore the channel matrix of a noiseless channel is a square identity (unit) matrix.
4) Binary symmetric channel: A binary symmetric channel has two inputs and two outputs. The channel is symmetric because the probability of receiving a 1 if a 0 is sent is the same as the probability of receiving a 0 if a 1 is sent. This common transition (error) probability is denoted by P, and the other probability is 1 - P.
[P(Y)] = [P(X)] [P(Y/X)]
P(y1) = (1 - P) P(x1) + P P(x2)
P(y2) = P P(x1) + (1 - P) P(x2)
1) Given the binary channel shown below: i) find the channel matrix; ii) find P(y1), P(y2) when P(x1) = P(x2) = 0.5; iii) find the joint probability matrix when P(x1) = P(x2) = 0.5.
2) Given the cascaded binary channel shown below: i) find the overall channel matrix; ii) find P(z1), P(z2) when P(x1) = P(x2) = 0.5; iii) find the joint probability matrix when P(x1) = P(x2) = 0.5.
i) The overall channel matrix: [P(Z/X)] = [P(Y/X)] [P(Z/Y)]
ii) P(z1), P(z2) when P(x1) = P(x2) = 0.5: [P(Z)] = [P(X)] [P(Z/X)]
iii) The joint probability matrix when P(x1) = P(x2) = 0.5: [P(X,Z)] = [P(Xd)] [P(Z/X)]
3) Given the channel shown below (a binary erasure channel): i) find the channel matrix; ii) find P(y1), P(y2), P(y3) when P(x1) = P(x2) = 0.5 and P = 0.5.
i) Find the channel matrix.
ii) Find P(y1), P(y2), P(y3) when P(x1) = P(x2) = 0.5:
P(y1) = 0.25, P(y2) = 0.5, P(y3) = 0.25
3.2. Information Transmission on Discrete Channels
The conditional and joint entropies are calculated using the conditional and joint probabilities of the channel. The following are the various entropy functions for a channel with inputs X and outputs Y:
H(X) = Σi P(xi) log2 (1/P(xi))
H(Y) = Σj P(yj) log2 (1/P(yj))
H(Y/X) = Σi Σj P(xi, yj) log2 (1/P(yj/xi))
H(X,Y) = Σi Σj P(xi, yj) log2 (1/P(xi, yj))
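These four entropy functions can be computed directly from [P(X)] and the channel matrix; the sketch below does this for an assumed example channel (the matrix shown is illustrative, not the one used in problem 4 below).

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

P_X = np.array([0.3, 0.4, 0.3])
P_Y_given_X = np.array([[0.8, 0.1, 0.1],     # assumed channel matrix,
                        [0.2, 0.7, 0.1],     # each row sums to 1
                        [0.1, 0.2, 0.7]])

P_XY = np.diag(P_X) @ P_Y_given_X            # joint matrix [P(X,Y)]
P_Y = P_X @ P_Y_given_X                      # output matrix [P(Y)]

H_X = entropy(P_X)
H_Y = entropy(P_Y)
H_XY = entropy(P_XY.flatten())
H_Y_given_X = H_XY - H_X                     # H(Y/X) = H(X,Y) - H(X)

print(f"H(X) = {H_X:.3f}, H(Y) = {H_Y:.3f}, "
      f"H(Y/X) = {H_Y_given_X:.3f}, H(X,Y) = {H_XY:.3f} bits")
```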
4) Given the channel shown below: i) find the channel matrix; ii) find P(y1), P(y2), P(y3) when P(x1) = 0.3, P(x2) = 0.4 and P(x3) = 0.3; iii) find all the entropies H(X), H(Y), H(Y/X), H(X,Y).
i) Find the channel matrix.
ii) [P(Y)] = [P(X)] [P(Y/X)]
P(y1) = 0.24, P(y2) = 0.55, P(y3) = 0.21
iii) H(X) = Σi P(xi) log2 (1/P(xi))
H(Y) = Σj P(yj) log2 (1/P(yj))
H(Y/X) = Σi Σj P(xi,yj) log2 (1/P(yj/xi)) = P(x1,y1) log2 (1/P(y1/x1)) + P(x1,y2) log2 (1/P(y2/x1)) + ...
H(X,Y) = Σi Σj P(xi,yj) log2 (1/P(xi,yj)) = P(x1,y1) log2 (1/P(x1,y1)) + P(x2,y1) log2 (1/P(x2,y1)) + ...
3.3 Mutual Information:
Mutual information is defined as a measure of information transfer; noise and other transmission impairments alter the source information.
· P(xi) is the input probability for integer values of i
· P(yj) is the output probability for integer values of j
· P(xi, yj) is the joint probability, that is, the probability that yj is received and xi was transmitted
· P(yj/xi) is the conditional probability, that is, the probability that yj is received given that xi was transmitted
Mutual information: I(xi, yj) = log2 [P(xi/yj) / P(xi)] = log2 [P(yj/xi) / P(yj)]
Averaging over all input-output pairs,
I(X,Y) = Σi Σj P(xi, yj) log2 [P(xi, yj) / (P(xi) P(yj))]
If the channel is ideal (noiseless), then P(xi) = P(yj) and P(yj/xi) = 1, so
I(xi, yj) = log2 (1/P(xi))
i.e. for an ideal (noiseless) channel the mutual information equals the self information.
If the channel is completely noisy, the input and output are independent, P(xi/yj) = P(xi), and
I(X,Y) = log2 [P(xi)/P(xi)] = 0
Properties of mutual information:
1) Mutual information is a non-negative quantity, i.e. I(X,Y) ≥ 0
2) Mutual information is symmetric, i.e. I(X,Y) = I(Y,X)
3) Mutual information I(X,Y) = H(Y) - H(Y/X)
4) Mutual information I(X,Y) = H(X) - H(X/Y)
5) Mutual information I(X,Y) = H(Y) + H(X) - H(X,Y)
1) Show that the mutual information of the binary symmetric channel is
I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P), when P(x1) = α and P(x2) = 1 - α.
P(x1) = α, P(x2) = 1 - α
[P(Y)] = [P(X)] [P(Y/X)]
P(y1) = (1 - P) P(x1) + P P(x2) = (1 - P) α + P (1 - α)
P(y2) = P P(x1) + (1 - P) P(x2) = P α + (1 - P)(1 - α)
[P(X,Y)] = [P(Xd)] [P(Y/X)], so P(x1,y1) = α(1 - P), P(x1,y2) = αP, P(x2,y1) = (1 - α)P, P(x2,y2) = (1 - α)(1 - P)
I(X,Y) = H(Y) - H(Y/X)
H(Y) = Σj P(yj) log2 (1/P(yj))
H(Y/X) = Σi Σj P(xi,yj) log2 (1/P(yj/xi))
= P(x1,y1) log2 (1/(1-P)) + P(x1,y2) log2 (1/P) + P(x2,y1) log2 (1/P) + P(x2,y2) log2 (1/(1-P))
= -α(1-P) log2 (1-P) - αP log2 P - (1-α)P log2 P - (1-α)(1-P) log2 (1-P)
= -P log2 P - (1-P) log2 (1-P)
Therefore
I(X,Y) = H(Y) - H(Y/X)
I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P)
If α = 0.5 and P = 0.1, then P(y1) = P(y2) = 0.5 so H(Y) = 1, and
I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P) = 1 - 0.469 = 0.531 bits
If α = 0.5 and P = 0.5, then
I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P) = 1 - 1 = 0 bits
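A numerical check of this result (a sketch with illustrative names): it computes I(X,Y) = H(Y) - H(Y/X) for the binary symmetric channel directly from the probabilities and compares it with H(Y) + P log2 P + (1 - P) log2 (1 - P).

```python
from math import log2

def H(probs):
    return sum(-p * log2(p) for p in probs if p > 0)

def bsc_mutual_information(alpha, P):
    """I(X,Y) for a BSC with P(x1) = alpha and transition (error) probability P."""
    p_y1 = (1 - P) * alpha + P * (1 - alpha)
    p_y2 = P * alpha + (1 - P) * (1 - alpha)
    H_Y = H([p_y1, p_y2])
    H_Y_given_X = H([P, 1 - P])              # independent of alpha
    return H_Y - H_Y_given_X

for alpha, P in [(0.5, 0.1), (0.5, 0.5)]:
    I = bsc_mutual_information(alpha, P)
    p_y = [(1 - P) * alpha + P * (1 - alpha), P * alpha + (1 - P) * (1 - alpha)]
    formula = H(p_y) + P * log2(P) + (1 - P) * log2(1 - P)
    print(f"alpha={alpha}, P={P}: I = {I:.3f} bits (formula gives {formula:.3f})")
# Expected: about 0.531 bits for P = 0.1 and 0 bits for P = 0.5.
```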
3.4. Channel Capacity:
The channel capacity of a discrete memoryless channel is defined in terms of the mutual information of the channel: it is the maximum of the mutual information over all possible input probability distributions.
Cs = max over P(xi) of I(X,Y)   bits/symbol
C = r Cs   bits/sec
Channel capacity of special channels:
1) Lossless channel: For a lossless channel the conditional entropy H(X/Y) = 0, therefore the mutual information I(X,Y) = H(X) - H(X/Y) = H(X).
Thus the mutual information (information transfer) is equal to the input (source) entropy, and no source information is lost in transmission.
Cs = max I(X,Y) = max H(X) = log2 M
2) Deterministic channel: For a deterministic channel the conditional entropy H(Y/X) = 0, therefore the mutual information
I(X,Y) = H(Y) - H(Y/X) = H(Y)
Thus the mutual information (information transfer) is equal to the output (destination) entropy, and each source symbol is completely determined at the destination.
Cs = max I(X,Y) = max H(Y) = log2 N
3) Noiseless channel: A noiseless channel obeys the properties of both the lossless and the deterministic channel, therefore the conditional entropies H(X/Y) = H(Y/X) = 0 and
I(X,Y) = H(Y) - H(Y/X) = H(Y)
Cs = max I(X,Y) = log2 M = log2 N
4) Binary symmetric channel: For the binary symmetric channel the mutual information is
I(X,Y) = H(Y) - H(Y/X) = H(Y) + P log2 P + (1 - P) log2 (1 - P), when P(x1) = α and P(x2) = 1 - α
Cs = max over α of I(X,Y) = 1 + P log2 P + (1 - P) log2 (1 - P)
The mutual information depends on P(x1) = α, P(x2) = 1 - α and on the channel transition probability P; H(Y) reaches its maximum value of 1 at α = 0.5, which gives the capacity.
1) Find the channel capacity of the binary symmetric channel when P(x1) = α and P(x2) = 1 - α, for P = 0.1 and P = 0.5.
As derived above, the mutual information of the binary symmetric channel is
I(X,Y) = H(Y) - H(Y/X) = H(Y) + P log2 P + (1 - P) log2 (1 - P)
Channel capacity:
Cs = max over α of I(X,Y); H(Y) is maximum (equal to 1) when α = 0.5, so
Cs = 1 + P log2 P + (1 - P) log2 (1 - P)
If P = 0.1, then Cs = 1 - 0.469 = 0.531 bits/symbol.
If P = 0.5, then Cs = 1 - 1 = 0 bits/symbol.
3.5. Alternative method to calculate the channel capacity:
Q1, Q2 are auxiliary variables obtained by solving, for each input xi, the linear equations
Σj P(yj/xi) Qj = Σj P(yj/xi) log2 P(yj/xi)
The channel capacity is then
Cs = log2 (2^Q1 + 2^Q2)   bits/symbol
1) Find the channel capacity of the given channel when P(x1) = 0.6 and P(x2) = 0.4.
[P(Y)] = [P(X)] [P(Y/X)], giving P(y1) = 0.6 and P(y2) = 0.4.
H(Y) = Σj P(yj) log2 (1/P(yj)) = 0.6 log2 (1/0.6) + 0.4 log2 (1/0.4) = 0.97 bits
H(Y/X) = Σi Σj P(xi,yj) log2 (1/P(yj/xi)) = 0.78 bits
Cs = H(Y) - H(Y/X) = 0.97 - 0.78 = 0.19 bits/symbol
Alternative method to calculate the channel capacity:
0.8 Q1 + 0.2 Q2 = 0.8 log2 0.8 + 0.2 log2 0.2 = -0.72
0.3 Q1 + 0.7 Q2 = 0.3 log2 0.3 + 0.7 log2 0.7 = -0.88
Therefore
Q1 = -0.656
Q2 = -0.976
Channel capacity C = log2 (2^Q1 + 2^Q2) bits/symbol
C = log2 (2^-0.656 + 2^-0.976) = 0.192 bits/symbol
2) Find the channel capacity of the given channel.
Therefore
Q1 = -0.966
Q2 = -0.966
Channel capacity C = log2 (2^Q1 + 2^Q2) bits/symbol
C = log2 (2^-0.966 + 2^-0.966) = 0.034 bits/symbol
3) Find the channel capacity of the given channel (three outputs).
Channel capacity C = log2 (2^Q1 + 2^Q2 + 2^Q3) bits/symbol
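The Q-method above amounts to solving the linear system [P(Y/X)] Q = h, where hi = Σj P(yj/xi) log2 P(yj/xi), and then C = log2 Σj 2^Qj. The sketch below (illustrative; it assumes the 2 x 2 channel matrix [[0.8, 0.2], [0.3, 0.7]] implied by the equations in example 1) reproduces Q1 = -0.656, Q2 = -0.976 and C of about 0.192 bits/symbol.

```python
import numpy as np

def capacity_q_method(P):
    """Channel capacity of a square channel matrix P by the Q-variable method."""
    logP = np.where(P > 0, np.log2(np.where(P > 0, P, 1)), 0.0)
    h = (P * logP).sum(axis=1)      # h_i = sum_j P(yj/xi) * log2 P(yj/xi)
    Q = np.linalg.solve(P, h)       # solve [P(Y/X)] Q = h
    C = np.log2((2.0 ** Q).sum())   # C = log2( sum_j 2^Qj )
    return Q, C

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])          # assumed channel matrix from example 1
Q, C = capacity_q_method(P)
print("Q =", np.round(Q, 3))        # approximately [-0.656, -0.976]
print(f"C = {C:.3f} bits/symbol")   # approximately 0.192
```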
4. Channel Coding:
The simplified block diagram of a communication system depicted in the figure shows the position of the source encoder/decoder and the channel encoder/decoder. The purpose of source coding is to reduce the number of data bits that have to be transmitted over the channel to send a message to the receiver, while the objective of channel coding is to make transmission error correction efficient so that the error probability can be decreased.
Shannon-Hartley channel capacity theorem
The limits of the channel capacity as S/(N0 B) tends to infinity or to 0 are as follows:
In order to appreciate an implication of this formula, we suppose an ideal situation in which the data transmission rate R reaches its upper bound, the channel capacity C. The relationship between the bandwidth efficiency R/B and Eb/N0 (dB) = 10 log (Eb/N0) for such an ideal system can be obtained by substituting C = R into the left-hand side of the equation. In information theory, the Shannon-Hartley theorem gives the maximum rate at which information may be transmitted over a communications channel of a specified bandwidth in the presence of noise. It is an application of the noisy-channel coding theorem to the archetypal case of a continuous-time analog communications channel subject to Gaussian noise. The theorem establishes Shannon's channel capacity for such a communication link, a bound on the maximum amount of error-free information per unit time that can be transmitted with a specified bandwidth in the presence of noise interference, assuming that the signal power is bounded and that the Gaussian noise process is characterized by a known power or power spectral density. The theorem is named after Claude Shannon and Ralph Hartley.
4.1 Statement:
The Shannon-Hartley theorem states the channel capacity C, meaning the theoretical tightest upper bound on the rate at which information can be communicated at an arbitrarily low error rate using an average received signal power S through an analog channel subject to additive white Gaussian noise of power N:
C = B log2 (1 + S/N)
where
· C is the channel capacity in bits per second,
· B is the bandwidth of the channel in hertz,
· S is the average received signal power over the bandwidth,
· N is the average power of the noise and interference over the bandwidth, measured in watts,
· S/N is the signal-to-noise ratio.
Consider the channel to be AWGN, i.e.
Additive: the noise is added to the signal, Y = S + N
White noise: the power spectral density of the noise is uniform over the entire range of frequencies
Gaussian: the amplitude of the noise follows a Gaussian distribution
and the signal is power limited and band limited.
In order to calculate the channel capacity, first calculate the mutual information of the channel; the channel capacity is the maximum of the mutual information over the input probability density function:
C = max {I(X,Y)} = max {H(Y) - H(Y/X)}
The channel is a random process, and the signals given to the channel and coming from the channel are random variables. The signals are continuous.
H(Y/X) = -∫∫ p(x,y) log2 p(y/x) dx dy
From Bayes' theorem, p(x,y) = p(y/x) p(x), so
H(Y/X) = -∫ p(x) [ ∫ p(y/x) log2 p(y/x) dy ] dx
We know that for the additive channel p(y/x) = pN(y - x); with n = y - x and dy = dn,
H(Y/X) = -∫ pN(n) log2 pN(n) dn = H(N)
C = max [I(X,Y)] = max [H(Y) - H(Y/X)] = max [H(Y)] - H(N)
The signal power S is the mean square value of the signal, the noise power N is the mean square value of the noise, and the output power is S + N.
For a given power, entropy is maximized by a Gaussian distribution, so
Max [H(Y)] = (1/2) log2 (2πe(S + N))
Max [H(N)] = H(N) = (1/2) log2 (2πeN)
Therefore
Max [I(X,Y)] = (1/2) log2 (2πe(S + N)) - (1/2) log2 (2πeN) = (1/2) log2 ((S + N)/N)
Max [I(X,Y)] = (1/2) log2 (1 + S/N)
Consider a Gaussian noise channel band limited to a bandwidth of B Hz, with the signal sampled at the Nyquist rate
fs = 2 fm, where fm = B, so fs = 2B samples/sec.
Channel capacity C = 2B × Max [I(X,Y)] = 2B × (1/2) log2 (1 + S/N)
C = B log2 (1 + S/N)
The Shannon-Hartley theorem thus gives the maximum rate at which information can be transmitted over a communications channel of a specified bandwidth in the presence of noise. It is an application of the noisy-channel coding theorem to the archetypal case of a continuous-time analog communications channel subject to Gaussian noise.
C = B log2 (1 + S/N)
For reliable communication the information rate must satisfy R ≤ C, i.e.
R ≤ B log2 (1 + S/N)
If the rate of information R is fixed and the efficiency ŋ is fixed, then the following cases are analysed:
Case I:
Case II:
C = B log2 (1 + S/N)
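A small sketch applying C = B log2(1 + S/N): for an assumed telephone-type channel with B = 3000 Hz and S/N = 30 dB (illustrative values, not from the text), it evaluates the capacity and checks whether a desired rate R satisfies R ≤ C.

```python
from math import log2

def shannon_capacity(bandwidth_hz, snr_db):
    """C = B * log2(1 + S/N), with S/N given in dB."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * log2(1 + snr_linear)

B = 3000       # assumed channel bandwidth in Hz
snr_db = 30    # assumed signal-to-noise ratio in dB
C = shannon_capacity(B, snr_db)
print(f"C = {C:.0f} bits/sec")          # about 29,900 bits/sec

R = 20000      # desired information rate in bits/sec
print("R <= C, reliable transmission possible:", R <= C)
```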
4.2 Why use Error-Correction Coding?
Error-correction coding can be regarded as a vehicle for effecting various system trade-offs. The figure below compares two curves depicting bit-error performance versus Eb/N0: one curve represents a typical modulation scheme without coding, the other the same scheme with channel coding. Demonstrated below are four benefits or trade-offs that can be achieved with the use of channel coding.
Trade-Off 1: Error Performance versus Bandwidth
A simple, inexpensive voice communication system has just been developed and delivered to a customer. The system does not use error-correction coding. Consider that the operating point of the system can be depicted by point A in the figure (Eb/N0 = 8 dB and PB = 10^-2). After a few trials, there are complaints about the voice quality; the customer suggests that the bit-error probability should be lowered to 10^-4. The usual way of obtaining better error performance in such a system would be to move the operating point from point A to, say, point B in the figure. The figure suggests that another possible trade-off is to move the operating point from point A to point C; that is, "walking" down the vertical line to point C on the coded curve can provide the customer with improved error performance.
Trade-Off 2: Power versus Bandwidth
Consider that the system without coding, operating at point D in the figure (Eb/N0 = 14 dB and PB = 10^-6), has been delivered to a customer. The customer has no complaints about the quality of the data, but the equipment is having some reliability problems as a result of providing an Eb/N0 of 14 dB; in other words, the equipment keeps breaking down. If the requirement on Eb/N0 or power could be reduced, the reliability difficulties might also be reduced. The figure suggests a trade-off by moving the operating point from point D to point E. Thus, the trade-off is one in which the same quality of data is achieved, but the coding allows for a reduction in power or Eb/N0.
Trade-Off 3: Data Rate versus Bandwidth
Consider that a system without coding, operating at point D in the figure (Eb/N0 = 14 dB and PB = 10^-6), has been developed. Assume that there is no problem with the data quality and no particular need to reduce power. If the customer's data rate requirement increases, then since
Eb/N0 = (Pr/N0)(1/R)
the received Eb/N0 would decrease, and in the figure the operating point would move upward from point D to some point F. Now envision "walking" down the vertical line to point E on the curve that represents coded modulation. Increasing the data rate has degraded the quality of the data, but with coding at point E we observe the same quality at the same power level.
Trade-Off 4: Capacity versus Bandwidth
Trade-off 4 is similar to trade-off 3 because both achieve increased capacity. Code-division multiple access (CDMA) is a spread-spectrum multiple-access technique and one of the schemes used in cellular telephony. In CDMA, where users simultaneously share the same spectrum, each user is an interferer to each of the other users in the same cell or nearby cells. Hence the capacity (maximum number of users) per cell is inversely proportional to Eb/N0. In this application, a lowered Eb/N0 results in a raised capacity: the code achieves a reduction in each user's power, which in turn allows for an increase in the number of users. Again the cost is more bandwidth. But in this case the signal bandwidth expansion due to the error-correcting code is small compared with the much more significant spread-spectrum expansion, and there is no impact on the transmission bandwidth.
In each of the above trade-off examples, a "traditional" code involving redundant bits and faster signaling (for a real-time communication system) has been assumed; hence, in each case, the cost was expanded bandwidth. However, there exists an error-correcting technique, called trellis-coded modulation, that does not require faster signaling or expanded bandwidth for real-time systems.
4.3 Types of Channel Coding
In order to achieve reliable communication through a channel corrupted by noise, we may have to make the codewords different from each other conspicuously enough to reduce the probability of one symbol being taken for another. The conversion of the message data with this aim is called channel coding. It may bring some side effects such as a decline in data transmission rate, an increase in required channel bandwidth, and an increase in complexity of the encoder/decoder.
In this section we discuss waveform coding, which converts the data into distinguishable waveforms, and structured coding, which adds some redundant/extra parity bits for detection and/or correction of transmission errors. Structured coding is divided into two schemes: one is block coding, which converts the source data independently of the previous data, and the other is convolutional coding, which converts the data dependently on the previous data.
Waveform Coding
There are various methods of waveform coding such as antipodal signaling, orthogonal signaling, bi-orthogonal signaling, etc. Among these waveform coding schemes, orthogonal signaling can be regarded as converting one bit of data (or two bits of data) according to its/their value(s) in the following way:
This conversion procedure is generalized for K bits of data into the form of a Hadamard matrix.
Here, we define the crosscorrelation between codeword i and codeword j as the normalized inner product of their bipolar versions. It can be computed by changing every 0 (of both codewords) into -1 and dividing the inner product of the two bipolar codeword vectors by the codeword length. According to this definition, the crosscorrelation between any two different codewords turns out to be zero, implying the orthogonality among the codewords and their corresponding signal waveforms. On the other hand, bi-orthogonal signaling can be regarded as converting two bits of data and, more generally, K bits of data in the following way:
Since the number of columns of the codeword matrix is the number of bits per symbol, the number of bits spent for transmission with bi-orthogonal signaling is half of that for orthogonal signaling and the required channel bandwidth is also half of that for orthogonal signaling, while still showing comparable BER performance. However, the codeword lengths of both signaling (coding) methods increase geometrically as the number K of bits per symbol increases; consequently, the data transmission rate R or the bandwidth B will suffer, resulting in a degradation of the bandwidth efficiency R/B.
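The orthogonality claim can be checked numerically. The sketch below builds a Hadamard-type codeword set for K bits (an illustrative doubling construction, since the matrices of the original figures are not reproduced here), maps 0 to -1, and verifies that the normalized crosscorrelation between any two different codewords is zero.

```python
import numpy as np

def hadamard_codewords(K):
    """Binary (0/1) Hadamard codeword set for K data bits: 2^K codewords of length 2^K."""
    H = np.array([[0, 0],
                  [0, 1]])                       # K = 1 case
    for _ in range(K - 1):
        H = np.block([[H, H],
                      [H, 1 - H]])               # build H_{k+1} from H_k
    return H

def crosscorrelation(ci, cj):
    """Map 0 -> -1, take the inner product, divide by the codeword length."""
    bi, bj = 2 * ci - 1, 2 * cj - 1
    return float(bi @ bj) / len(ci)

H = hadamard_codewords(3)                        # 8 codewords of length 8
for i in range(len(H)):
    for j in range(i + 1, len(H)):
        assert abs(crosscorrelation(H[i], H[j])) < 1e-12
print("All distinct codeword pairs are orthogonal (crosscorrelation = 0).")
```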
5. Linear Block Coding
Block coding is a mapping of k-bit message vectors (symbols) into n-bit (n ≥ k) code vectors by using an (n,k) block code which consists of 2^k codewords of length n. A block code is said to be linear if the modulo-2 sum/difference of any two codewords in the block code is another codeword belonging to the block code. A linear block code is represented by its k×n generator matrix G, which is modulo-2 premultiplied by a k-bit message vector m to yield an n-bit codeword U as
U = mG
Accordingly, the encoder and decoder in a linear block coding scheme use matrix multiplications with generator/parity-check matrices instead of using table lookup methods with the whole 2^k × n codeword matrix. This makes the encoding/decoding processes simple and efficient. Now, we define the minimum (Hamming) distance of a linear block code U as the minimum of the Hamming distances between two different codewords:
d_min = minimum Hamming distance between any two distinct codewords
Here, the Hamming distance between two different codewords Ui and Uj is the number of bit positions in which they differ, and can be found as the weight of the modulo-2 sum/difference of the two codewords, where the Hamming weight of a codeword is defined to be the number of its non-zero bits. The error-detecting and error-correcting capabilities are defined to be the maximum numbers of detectable/correctable bit errors, respectively, and they can be obtained from the minimum distance as follows:
Error-detecting capability: d_min - 1
Error-correcting capability: t = (d_min - 1)/2, rounded down
Generator Matrix and the Corresponding Parity-Check Matrix:
A k×n generator matrix G representing an (n,k) block code can be constructed in systematic form as G = [P | Ik], where P is the k×(n-k) parity submatrix and Ik is the k×k identity matrix.
With this generator matrix, the n-bit code vector (codeword) for a k-bit message (or source or information) vector m is generated as U = mG, which consists of the first n-k parity bits and the last k message bits in a systematic structure. Note that if a block code is represented by a generator matrix containing an identity matrix, then the message vector appears (without being altered) in the code vector and the code is said to be systematic. Corresponding to this generator matrix G, the parity-check matrix H to be used for decoding the received signal vector is defined as
Parity-Check Matrix: H = [In-k | P^T]
For a received signal vector r containing an error vector e, the decoder at the receiver computes
S = rH^T
which is called a syndrome vector, for the reason that it carries some (distinct) information about the possible symbol error, like a pathological syndrome or symptom of a disease.
Ex 1) Generate a linear block code of (6,3) with the given generator matrix.
G is the generator matrix having length n = 6, k = 3, and the number of parity bits is n - k = 3.
Ik is the k×k identity matrix I3.
The parity vectors are selected from the 2^(n-k) possible (n-k)-bit vectors; in this example n = 6, k = 3, so the set has 2^(6-3) = 2^3 = 8 vectors.
Parity vectors: neglect the all-zero vector 000 and the identity (single-1) vectors, and give least priority to the 111 vector.
Now the generator matrix G is given by G = [P | I3].
Codeword U = mG
The message vectors number 2^k = 2^3 = 8 and the possible 6-bit words number 2^n = 2^6 = 64; only 8 of them are valid codewords, because there are 8 possible message vectors.
Syndrome test: S = rH^T. If the received vector r is 1 1 0 0 1 1, then the syndrome
S = [0 0 0], meaning there are no errors in the received vector,
where H is the parity-check matrix.
Ex 2) Generate a linear block code of (7,4) with the given generator matrix. Find the codewords and the minimum distance of the (7,4) linear block code represented by the generator matrix.
Minimum distance d_min = 3
Because the minimum distance is 3:
Error-detection capability = d_min - 1 = 3 - 1 = 2
Error-correction capability = (d_min - 1)/2 rounded down = 1
Therefore the (7,4) code is a Hamming code with single-bit error correction capability.
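The specific generator matrices above are shown only as figures and are not reproduced here; as an illustrative sketch, the code below assumes one standard systematic (7,4) Hamming generator matrix G = [P | I4], generates the codewords U = mG, builds H = [I3 | P^T], verifies d_min = 3, and runs the syndrome test S = rH^T on a received vector with a single-bit error.

```python
import numpy as np

# Assumed systematic (7,4) Hamming code: G = [P | I4], H = [I3 | P^T].
P = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 1, 1],
              [1, 0, 1]])
G = np.hstack([P, np.eye(4, dtype=int)])          # 4x7 generator matrix
H = np.hstack([np.eye(3, dtype=int), P.T])        # 3x7 parity-check matrix

messages = [list(map(int, f"{i:04b}")) for i in range(16)]
codewords = np.array([(np.array(m) @ G) % 2 for m in messages])   # U = mG (mod 2)

weights = codewords.sum(axis=1)
print("d_min =", weights[weights > 0].min())      # 3 -> detects 2 errors, corrects 1

# Syndrome test S = r H^T: flip one bit of a codeword and locate the error.
r = codewords[5].copy()
r[2] ^= 1                                         # single-bit error in position 2
S = (r @ H.T) % 2
err_pos = int(np.argmax((H.T == S).all(axis=1)))  # syndrome matches a column of H
print("syndrome =", S, "-> error in position", err_pos)
```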
Standard Array:
6. Cyclic Coding
A cyclic code is a linear block code having the property that a cyclic shift (rotation) of any codeword yields another codeword. Due to this additional property, the encoding and decoding processes can be implemented more efficiently using feedback shift registers. An (n,k) cyclic code can be described by an (n-k)th-degree generator polynomial g(x).
The procedure of encoding a k-bit message vector m = [m0, m1, m2, ..., mk-1], represented by a (k-1)th-degree polynomial m(x), into an n-bit codeword represented by an (n-1)th-degree polynomial is as follows: multiply m(x) by x^(n-k), divide the result by g(x) to obtain the remainder (parity) polynomial, and add the remainder to x^(n-k) m(x); the resulting codeword polynomial has the generator polynomial g(x) as a (multiplying) factor.
Then the first n-k coefficients constitute the parity vector and the remaining k coefficients make up the message vector. Note that all the operations involved in the polynomial multiplication, division, addition, and subtraction are not the ordinary arithmetic ones, but modulo-2 operations.
Ex) A Cyclic Code: For a (7,4) cyclic code represented by the given generator polynomial g(x), find the codeword for the message vector m = [1 0 1 1].
Noting that n = 7, k = 4, and n - k = 3, we divide x^(n-k) m(x) by g(x) to get the remainder polynomial, and add it to x^(n-k) m(x) to make the codeword polynomial.
The codeword made in this way has the n - k = 3 parity bits and k = 4 message bits.
At the receiver, the received polynomial is divided by g(x) to obtain the remainder polynomial s(x). This remainder polynomial s may not be the same as the error vector e, but at least it is supposed to carry crucial information about e and therefore may well be called the syndrome. The receiver will find the error pattern e corresponding to the syndrome s, subtract it from the received vector r to get (hopefully) the correct codeword, and accept only the last k bits (in ascending order) of this corrected codeword as the message.
The polynomial operations involved in encoding/decoding every block of the message/coded sequence may seem to be an unbearable computational load. However, we fortunately have divider circuits which can perform such modulo-2 polynomial operations. The figure illustrates the two divider circuits (consisting of linear feedback shift registers), each of which carries out the modulo-2 polynomial operations for encoding/decoding with the cyclic code given in the example. Note that the encoder/decoder circuits process the data sequences in descending order of polynomial degree.
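A sketch of systematic cyclic encoding by modulo-2 polynomial division (bit-list arithmetic; the generator g(x) = x^3 + x^2 + 1 is one standard (7,4) choice and is an assumption, since the text's g(x) appears only in the figure). It encodes the message m = [1 0 1 1] from the example and checks that the resulting codeword polynomial is divisible by g(x).

```python
def gf2_remainder(dividend, divisor):
    """Remainder of modulo-2 polynomial division; coefficient lists, highest degree first."""
    rem = list(dividend)
    for i in range(len(dividend) - len(divisor) + 1):
        if rem[i]:
            for j, d in enumerate(divisor):
                rem[i + j] ^= d
    return rem[-(len(divisor) - 1):]

def cyclic_encode(m, g, n):
    """Systematic cyclic encoding: u(x) = x^(n-k) m(x) + remainder( x^(n-k) m(x) / g(x) )."""
    shifted = m + [0] * (n - len(m))       # x^(n-k) * m(x), highest degree first
    parity = gf2_remainder(shifted, g)     # n-k parity bits (low-degree coefficients)
    return m + parity                      # message part followed by parity part

g = [1, 1, 0, 1]    # assumed generator g(x) = x^3 + x^2 + 1, a standard (7,4) choice
m = [1, 0, 1, 1]    # message vector from the example
u = cyclic_encode(m, g, n=7)
print("codeword  =", u)
print("remainder =", gf2_remainder(u, g))   # [0, 0, 0]: u(x) is divisible by g(x)
```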
7. convolutional code
In telecommunication,
a convolutional code is
a type of error-correcting
code that generates parity symbols
via the sliding application of a boolean polynomial function to a data stream. The sliding application
represents the 'convolution' of the encoder over the data, which gives rise to
the term 'convolutional coding'. The sliding nature of the convolutional codes
facilitates trellis decoding using a time-invariant trellis. Time
invariant trellis decoding allows convolutional codes to be maximum-likelihood
soft-decision decoded with reasonable complexity.
The ability to perform economical
maximum likelihood soft decision decoding is one of the major benefits of
convolutional codes. This is in contrast to classic block codes, which are
generally represented by a time-variant trellis and therefore are typically
hard-decision decoded. Convolutional codes are often characterized by the base
code rate and the depth (or memory) of the encoder [n,k,K]. The base code rate is typically given as k/n, where k is the
input data rate and n is the
output symbol rate. The depth is often called the "constraint
length" K, where the output is a function
of the current input as well as the previous K - 1 inputs. The depth may also be given as the number
of memory elements in the
polynomial or the maximum possible number of states of the encoder.
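As an illustrative sketch (not the specific encoder of the text), the following implements a rate-1/2 convolutional encoder with constraint length K = 3 and the commonly used generator polynomials (7, 5) in octal, i.e. taps 111 and 101: each input bit produces two output bits that depend on the current input and the previous K - 1 = 2 inputs.

```python
def conv_encode(bits, generators=(0b111, 0b101), K=3):
    """Rate-1/2 convolutional encoder: 2 output bits per input bit.
       Each output bit is the XOR (parity) of the register taps selected by a generator."""
    state = 0                                # holds the previous K-1 input bits
    out = []
    for b in bits:
        reg = (b << (K - 1)) | state         # current bit plus memory
        for g in generators:
            out.append(bin(reg & g).count("1") % 2)   # parity of the tapped bits
        state = reg >> 1                     # shift: keep the K-1 most recent bits
    return out

message = [1, 0, 1, 1, 0, 0]                 # trailing zeros flush the encoder memory
print(conv_encode(message))                  # [1,1, 1,0, 0,0, 0,1, 0,1, 1,1]
```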