INFORMATION THEORY AND CODING

Ch. Ravi Kumar

Assistant Professor, Department of E.C.E, Sir C. R. Reddy College of Engineering, Eluru, India

 

If the rate of information from the source does not exceed the capacity of a communication channel, then there exists a coding technique such that the information can be transmitted over the channel with an arbitrarily small frequency of errors, despite the presence of noise.

1. Measure of information:

            The output of an information source such as data or speech can be regarded as a random process. Information theory defines the self information of an event xi with probability P(xi) = Pi as

Self information:

    Ii = -log2 Pi = log2 (1/Pi)   bits

                                This measure of information is large for an event of low probability and small for an event of high probability. A message that there was or will be an earthquake in an area where earthquakes are very rare is big news and therefore can be regarded as carrying a lot of information. Another message that there will be an earthquake in an area where earthquakes are frequent is of much less value as news and, accordingly, can be regarded as carrying little information.

 

1.1 Properties of Self information:

 

                        1) Self information is a non-negative quantity

                        2) Ii ≥ 0  for  0 ≤ Pi ≤ 1

                        3) Ii → 0  as  Pi → 1

                        4) Ii ≥ Ij  for  Pi ≤ Pj

                       5) Joint probability: If Xi and Xj are two successive and independent events, then the joint probability is P(Xi, Xj) = Pi Pj and the self information is

 

Iij = log2 (1/(Pi Pj)) = log2 (1/Pi) + log2 (1/Pj)  bits

                                                                     Iij = Ii + Ij

1.2 Average information (or) entropy:

                                                       To find the average information we require the probabilities of the events.

  Ex: in the sequence

“ABADCADBCDBADCBADC”

 

PA, PB, PC, PD are to be calculated, and the total number of symbols is N.

Therefore   ITotal = N PA log2 (1/PA) + N PB log2 (1/PB) + N PC log2 (1/PC) + N PD log2 (1/PD)

Suppose a source emits one of N possible symbols with the probabilities P1, P2, P3, ..., PN.

 Therefore the average information is  H = ITotal / N

H = PA log2 (1/PA) + PB log2 (1/PB) + PC log2 (1/PC) + PD log2 (1/PD)

                        H = Σ Pi log2 (1/Pi)   bits/symbol

 

Problems:

1) A source emits one of four symbols with probabilities 1/3, 1/6, 1/4, 1/4. Successive symbols emitted by the source are statistically independent. Calculate the entropy.

                  H = (1/3) log2 3 + (1/6) log2 6 + (1/4) log2 4 + (1/4) log2 4 = 1.959 bits/symbol
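A short Python sketch to check this result; the helper function entropy is our own naming, not something defined in these notes.

import math

def entropy(probs):
    """Average information H = sum of p * log2(1/p), in bits/symbol."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Problem 1: probabilities 1/3, 1/6, 1/4, 1/4
print(entropy([1/3, 1/6, 1/4, 1/4]))   # ~1.959 bits/symbol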

 

1.3 Properties of Entropy:

Consider a discrete memoryless source that emits M symbols S0, S1, S2, ..., SM-1 with the probabilities P0, P1, P2, ..., PM-1. Then the entropy H(s) of the source is bounded as

 

0 ≤ H(s) ≤ log2 M

 

1) H(s) = 0 if the probability of one symbol is 1 and the probabilities of the remaining symbols are 0;

    that is, there is no randomness or uncertainty, i.e. the same symbol is emitted every time.

    Therefore there is no information.

2) H(s) = log2 M if the symbol probabilities are all equal; that is, the randomness or uncertainty is maximum,

    and the source carries maximum information.

  2) A source emits eight symbols with equal probabilities. Calculate the entropy.

H = (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8

   = 8 x (1/8) log2 8 = log2 8 = 3 bits/symbol

Entropy of a binary memoryless source:

            Consider that the probability of logic 0 is P and the probability of logic 1 is 1 - P. Then the entropy H(s) is

                                

H(s) = P log2 (1/P) + (1 - P) log2 (1/(1 - P))   bits/symbol

 

H(s) is maximum when the derivative of H(s) with respect to P is equated to zero,

                                                          i.e.  dH(s)/dP = 0,  which gives P = 1/2 and H(s)max = 1 bit/symbol.

3) Two symbols are emitted by the source with probabilities 1/4 and 3/4. Find the entropy of the source.

H(s) = (1/4) log2 4 + (3/4) log2 (4/3) = 0.811 bits / symbol

 

1.4 Information rate:

A discrete memoryless source emits symbols at a symbol rate r with average information H(s); the rate of information is then given by

 

R =  r  H(s)  bits / sec

 

                                       Where ‘r’ is symbol rate

 

4) Suppose a source emits 2000 symbols/sec from an alphabet of size 4 with the symbol probabilities given below. Find the self information of each symbol, the entropy and the information rate.

 

Symbol      Probability      Self information

A               1/2                  log2 2 = 1 bit

B               1/4                  log2 4 = 2 bits

C               1/8                  log2 8 = 3 bits

D               1/8                  log2 8 = 3 bits

 

H(s) = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + (1/8) log2 8 = 1.75 bits / symbol

R = r H(s) = 2000 x 1.75 = 3,500 bits / sec
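A short Python sketch of this calculation; the function name entropy is again our own choice for illustration.

import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

probs = [1/2, 1/4, 1/8, 1/8]   # symbols A, B, C, D
r = 2000                       # symbol rate in symbols/sec
H = entropy(probs)             # 1.75 bits/symbol
R = r * H                      # 3500 bits/sec
print(H, R)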

1.5 Extended entropy for a discrete memoryless source: The entropy of the nth-order extended source is given by

H(s^n) = n H(s)

 

Where n is the order of extension of the source.

 If n = 2, we have the second-order extension.

Let 3 symbols S0, S1, S2 be emitted by the source. For the second-order extension,

 the number of blocks = 3^2 = 9; these are

 

New symbol      0        1        2        3        4        5        6        7        8

Block                S0S0   S0S1   S0S2   S1S0   S1S1   S1S2   S2S0   S2S1   S2S2

Probability         P0P0   P0P1   P0P2   P1P0   P1P1   P1P2   P2P0   P2P1   P2P2

 

5) Consider a DMS with symbol probabilities 1/2, 1/4, 1/4. Calculate

             a) the entropy of the source

             b) the entropy of the 2nd-order extended source

 sol: a)  H(s) = (1/2) log2 2 + (1/4) log2 4 + (1/4) log2 4 = 1.5 bits / symbol

        b) Entropy of the 2nd-order extended source

 

New symbol      0        1        2        3        4        5        6        7        8

Block                S0S0   S0S1   S0S2   S1S0   S1S1   S1S2   S2S0   S2S1   S2S2

Probability         1/4     1/8     1/8     1/8     1/16    1/16    1/8     1/16    1/16

 

H(s^2) = (1/4) log2 4 + (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8 + (1/16) log2 16 + (1/16) log2 16 + (1/8) log2 8 + (1/16) log2 16 + (1/16) log2 16

 

H(s^2) = (1/4)x2 + (1/8)x3 + (1/8)x3 + (1/8)x3 + (1/16)x4 + (1/16)x4 + (1/8)x3 + (1/16)x4 + (1/16)x4 = 3 bits / symbol = 2 H(s)

 

6) Consider a DMS with symbol probabilities 0.7, 0.15, 0.15. Calculate

             a) the entropy of the source                                            Answer:  1.181 bits / symbol

             b) the entropy of the 2nd-order extended source               Answer:  2.362 bits / symbol
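A short Python sketch verifying both extended-source results, using the corrected probabilities 1/2, 1/4, 1/4 from Problem 5 above; entropy is our helper name.

import math
from itertools import product

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

p = [1/2, 1/4, 1/4]                          # Problem 5
p2 = [a * b for a, b in product(p, p)]       # 9 block probabilities of the 2nd-order extension
print(entropy(p), entropy(p2))               # 1.5 and 3.0 = 2 * H(s)

q = [0.7, 0.15, 0.15]                        # Problem 6
q2 = [a * b for a, b in product(q, q)]
print(entropy(q), entropy(q2))               # ~1.181 and ~2.362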

2. Shannon's source coding (Shannon's First Theorem):

 

Sometimes we want to find an efficient code that uses the fewest bits to represent the information. Shannon's source coding theorem, or noiseless coding theorem, states that we need at least as many bits as the entropy in order to encode an information message in such a way that perfect decoding is possible.

The average codeword length N of any code assigned to the set S of symbols cannot be less than the entropy H(s); the average codeword length of the Huffman code constructed by the procedure below satisfies

H(s) ≤ N < H(s) + 1

 

so the efficiency  ŋ = H(s)/N ≤ 1

2.1 Kraft inequality: In order to check the uniquely decodable condition at the receiving end of a communication system, the codeword lengths Ni must satisfy the Kraft inequality

 

K = Σ 2^(-Ni) ≤ 1

A discrete memoryless source that emits M symbols with equal probabilities has the information rate

R = r H(s) = r log2 M

When the symbol probabilities are different, the information rate is

 

R = r H(s) ≤ r log2 M

 

Efficient transmission requires an encoding process such that

 

R = r H(s) ≤ rb,   where rb = r N is the encoder output bit rate,

 

so   H(s) ≤ N,      where

        

N = Σ Pi Ni

 is the average codeword length, and Ni is the i-th codeword length.

The minimum value of N is bounded between H(s) and H(s) + ε,

 

i.e.   H(s) ≤ N < H(s) + ε

 

Efficiency  ŋ = H(s) / N

Optimum source coding achieves the lower bound, i.e. N = H(s) (the minimum number of bits per symbol); otherwise the code is sub-optimum.

A code is therefore realizable (and, at best, optimum) when its average codeword length is not less than the entropy, N ≥ H(s), and it is uniquely decodable when its codeword lengths satisfy the Kraft inequality

 

K = Σ 2^(-Ni) ≤ 1

7) A discrete memoryless source emits the symbols with the probabilities and codes given below. Check the realizable/optimum condition and also verify the uniquely decodable condition.

 

Symbol (xi)     Probability (Pi)     Code 1     Code 2     Code 3     Code 4

A                     1/2                       00             0               0               0

B                     1/4                       01             1               01             10

C                     1/8                       10             10             011           110

D                     1/8                       11             11             0111         111

 

Code 1:

H(s) = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + (1/8) log2 8 = 1.75 bits / symbol

N = Σ Pi Ni = (1/2)x2 + (1/4)x2 + (1/8)x2 + (1/8)x2 = 2 bits

                                           N ≥ H(s)   i.e.  2 ≥ 1.75

K = Σ 2^(-Ni) = 2^-2 + 2^-2 + 2^-2 + 2^-2 = 1

Because N is greater than H(s) (N ≥ H(s)), Code 1 is realizable (though not optimum, since N > H(s)).

And K ≤ 1, therefore Code 1 is uniquely decodable.

 

Code 2:

H(s) = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + (1/8) log2 8 = 1.75 bits / symbol

N = Σ Pi Ni = (1/2)x1 + (1/4)x1 + (1/8)x2 + (1/8)x2 = 1.25 bits

                                           N < H(s)   i.e.  1.25 < 1.75

 

Because N is less than H(s), Code 2 is not realizable.

For the uniquely decodable condition, the Kraft sum should not exceed 1 (K = Σ 2^(-Ni) ≤ 1):

                                                      K = 2^-1 + 2^-1 + 2^-2 + 2^-2 = 1.5

Because K is greater than 1 (K > 1), Code 2 is not uniquely decodable. For example, the sequences

BAABB  =  10011

CABB    =  10011

CAD      =  10011

all map to the same bit stream.

Note: Students are advised to verify code 3 and code 4 also
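The following Python sketch checks all four codes at once (average length, Kraft sum and the realizability test), so it can be used to verify Codes 3 and 4 as suggested above; the variable names are ours.

import math

probs = {'A': 1/2, 'B': 1/4, 'C': 1/8, 'D': 1/8}
codes = {
    'Code 1': {'A': '00', 'B': '01', 'C': '10',  'D': '11'},
    'Code 2': {'A': '0',  'B': '1',  'C': '10',  'D': '11'},
    'Code 3': {'A': '0',  'B': '01', 'C': '011', 'D': '0111'},
    'Code 4': {'A': '0',  'B': '10', 'C': '110', 'D': '111'},
}

H = sum(p * math.log2(1 / p) for p in probs.values())            # 1.75 bits/symbol

for name, code in codes.items():
    N_bar = sum(probs[s] * len(w) for s, w in code.items())      # average codeword length
    K = sum(2 ** -len(w) for w in code.values())                 # Kraft sum
    print(name, 'N =', N_bar, 'K =', K,
          'realizable:', N_bar >= H, 'Kraft satisfied:', K <= 1)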

2.2 Huffman Coding:

 

Huffman coding is a variable-length coding scheme that minimizes the average number of bits required for coding a set of symbols by assigning shorter codewords to more probable symbols and longer codewords to less probable symbols. It is performed in two steps, merging and splitting, as follows.

 

Procedure:

 

(1) Merging: Repeatedly arrange the symbols in descending order of probability in each column
           of the table and merge the two least probable entries into a new entry with a probability
           equal to the sum of their probabilities in the next column, until only two entries remain.

 

(2) Splitting: Assign 0/1 to each of the two remaining entries to form the initial parts of their
           codewords. Then repeatedly split the merged entries back into the two (previous) entries
           and append another 0/1 to each of their (unfinished) codewords.

 

8) A discrete memoryless source emits 5 symbols with the probabilities given below. Find the Huffman code and calculate the efficiency.

 

{S0 , S1 , S2 , S3 , S4 }, {0.4 , 0.2 , 0.2 , 0.1 , 0.1}

 

Symbol     Probability     Step 1          Step 2          Step 3          Code

S0             0.4                 0.4  (00)        0.4  (00)        0.6  (0)         00

S1             0.2                 0.2  (10)        0.4  (1)          0.4  (1)         10

S2             0.2                 0.2  (11)        0.2  (01)                             11

S3             0.1                 0.2  (01)                                                   010

S4             0.1                                                                                  011

(At each step the entries are re-ordered in descending probability and the two least probable entries are merged: 0.1 + 0.1 = 0.2 with prefix 01, then 0.2 + 0.2 = 0.4 with prefix 1, then 0.4 + 0.2 = 0.6 with prefix 0; splitting back assigns the codewords shown in the last column.)

Efficiency  ŋ = H(S) / N

H(S) = Σ Pi log2 (1/Pi) = P1 log2 (1/P1) + P2 log2 (1/P2) + P3 log2 (1/P3) + P4 log2 (1/P4) + P5 log2 (1/P5)

                                  = 0.4 log2 (1/0.4) + 0.2 log2 (1/0.2) + 0.2 log2 (1/0.2) + 0.1 log2 (1/0.1) + 0.1 log2 (1/0.1)

                                 = 2.122 bits / symbol

N = Σ Pi Ni = 0.4 x 2 + 0.2 x 2 + 0.2 x 2 + 0.1 x 3 + 0.1 x 3 = 2.2 bits

Efficiency = (2.122 / 2.2) x 100 = 96.5%

Variance  σ² = Σ Pi (Ni - N)²

                  = 0.4(2 - 2.2)² + 0.2(2 - 2.2)² + 0.2(2 - 2.2)² + 0.1(3 - 2.2)² + 0.1(3 - 2.2)²

                  = 0.4(0.04) + 0.2(0.04) + 0.2(0.04) + 0.1(0.64) + 0.1(0.64)

                  = 0.016 + 0.008 + 0.008 + 0.064 + 0.064

              Variance = 0.16
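A minimal Python sketch of the merging step of Huffman coding; it reproduces the codeword lengths {2, 2, 2, 3, 3}, the average length 2.2 bits and the variance 0.16 of this problem (the tie-breaking counter is our own device, and a different tie-break can give another, equally valid Huffman code).

import heapq
import math

def huffman_lengths(probs):
    """Return the codeword lengths produced by Huffman merging."""
    # heap items: (probability, tie-break counter, list of symbol indices in that subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    count = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:            # every symbol in a merged subtree gains one code bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, count, s1 + s2))
        count += 1
    return lengths

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
L = huffman_lengths(probs)
H = sum(p * math.log2(1 / p) for p in probs)
N_bar = sum(p * l for p, l in zip(probs, L))
var = sum(p * (l - N_bar) ** 2 for p, l in zip(probs, L))
print(L, H, N_bar, H / N_bar, var)   # lengths, ~2.122, 2.2, ~0.965, 0.16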

9) A discrete memoryless source emits 5 symbols with equal probabilities. Find the Huffman code and calculate the efficiency.

Equal probabilities means {0.2, 0.2, 0.2, 0.2, 0.2}

 

Symbol     Probability     Step 1          Step 2          Step 3          Code

S0             0.2                 0.4  (00)        0.4  (00)        0.6  (0)         01

S1             0.2                 0.2  (01)        0.4  (1)          0.4  (1)         10

S2             0.2                 0.2  (10)        0.2  (01)                             11

S3             0.2                 0.2  (11)                                                   000

S4             0.2                                                                                  001

(The merged entry 0.2 + 0.2 = 0.4 carries the prefix 00, the next merged entry 0.2 + 0.2 = 0.4 carries the prefix 1, and the final merged entry 0.4 + 0.2 = 0.6 carries the prefix 0.)

 

H(S) = Σ Pi log2 (1/Pi) = P1 log2 (1/P1) + P2 log2 (1/P2) + P3 log2 (1/P3) + P4 log2 (1/P4) + P5 log2 (1/P5)

                                  = 0.2 log2 5 + 0.2 log2 5 + 0.2 log2 5 + 0.2 log2 5 + 0.2 log2 5

                                 = log2 5 = 2.32 bits / symbol

N = Σ Pi Ni = 0.2 x 2 + 0.2 x 2 + 0.2 x 2 + 0.2 x 3 + 0.2 x 3 = 2.4 bits

Efficiency = (2.32 / 2.4) x 100 = 96.7%

10) Suppose there is an information source X which generates nine symbols with the corresponding probability vector

                                   

   P(x) = [0.2, 0.15, 0.13, 0.12, 0.1, 0.09, 0.08, 0.07, 0.06].

Compute the Huffman code and calculate the efficiency and variance.

 

Ex 1) A DMS emits 5 symbols with the probabilities 0.4, 0.2, 0.1, 0.1, 0.2 respectively. Compute the Huffman code and calculate the efficiency and variance, where

                                     Efficiency  ŋ = H(S) / N

                                     Variance  σ² = Σ Pi (Ni - N)²

Ex 2) A DMS emits 5 symbols with the probabilities 0.55, 0.15, 0.15, 0.10, 0.05 respectively. Compute the Huffman code and calculate the efficiency and variance.

Ex 3) A DMS emits 8 symbols with the probabilities 0.26, 0.25, 0.14, 0.09, 0.08, 0.07, 0.07, 0.04 respectively. Compute the Huffman code and calculate the efficiency and variance.

Ex 4) A DMS emits 8 symbols with the probabilities 0.4, 0.3, 0.1, 0.06, 0.05, 0.04, 0.04, 0.01 respectively. Compute the Huffman code and calculate the efficiency and variance.

Ex 5) A DMS emits 7 symbols with the probabilities 0.37, 0.33, 0.16, 0.07, 0.04, 0.02, 0.01 respectively. Compute the Huffman code and calculate the efficiency and variance.

Ex 6) A DMS emits 7 symbols with the probabilities 0.46, 0.26, 0.12, 0.05, 0.04, 0.03, 0.02 respectively. Compute the Huffman code and calculate the efficiency and variance.

2.3 Shannon-Fano Code:

 Shannon-Fano coding is another type of source coding. The procedure is as follows:

    Step 1. Draw a line that divides the symbols into two groups, such that the group probabilities
               are as nearly equal as possible.

    Step 2. Assign logic 0 to each symbol in the group above the line and logic 1 to each
               symbol in the group below the line.

    Step 3. In all subsequent steps, subdivide each group into subgroups and assign digits by
               the previous rule.

    Step 4. Whenever a group consists of just one symbol, no further subdivision is possible and
               the codeword for that symbol is complete.

1) Apply the Shannon-Fano technique to find codewords for the symbols emitted by a DMS having the probabilities given below, and find the coding efficiency.

Symbol     Probability     Step 1     Step 2     Step 3     Step 4     Step 5     Codeword

S0             1/2                 0                                                                     0

S1             1/8                 1             0             0                                       100

S2             1/8                 1             0             1                                       101

S3             1/16               1             1             0             0                        1100

S4             1/16               1             1             0             1                        1101

S5             1/16               1             1             1             0                        1110

S6             1/32               1             1             1             1             0          11110

S7             1/32               1             1             1             1             1          11111

 

H(S) = Σ Pi log2 (1/Pi) = 2.3125 bits / symbol    and    N = Σ Pi Ni = 2.3125 bits

 Efficiency = H(S) / N = 2.3125 / 2.3125 = 100%

2) A discrete memoryless source emits 5 symbols with equal probabilities. Find the Shannon-Fano code and calculate the efficiency.

Equal probabilities means {0.2, 0.2, 0.2, 0.2, 0.2}

 

Symbol     Probability     Step 1     Step 2     Step 3     Codeword

S0             0.2                 0             0                           00

S1             0.2                 0             1                           01

S2             0.2                 1             0                           10

S3             0.2                 1             1             0             110

S4             0.2                 1             1             1             111

 

H(S) = Σ Pi log2 (1/Pi) = 0.2 log2 5 + 0.2 log2 5 + 0.2 log2 5 + 0.2 log2 5 + 0.2 log2 5 = log2 5 = 2.32 bits / symbol

 

N = Σ Pi Ni = 0.2 x 2 + 0.2 x 2 + 0.2 x 2 + 0.2 x 3 + 0.2 x 3 = 2.4 bits

 

 Efficiency = H(S) / N = 2.32 / 2.4 = 96.7%  (see the sketch below)
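The following Python sketch implements the Shannon-Fano splitting rule described above and reproduces the codeword lengths and efficiency of this example; the function name shannon_fano and the split search are our own construction.

import math

def shannon_fano(symbols):
    """symbols is a list of (name, probability) pairs, sorted in descending probability."""
    if len(symbols) == 1:
        return {symbols[0][0]: ''}
    total = sum(p for _, p in symbols)
    best_diff, split = float('inf'), 1
    for i in range(1, len(symbols)):
        top_sum = sum(p for _, p in symbols[:i])
        diff = abs(2 * top_sum - total)          # how unequal the two groups would be
        if diff < best_diff:
            best_diff, split = diff, i
    codes = {}
    for s, c in shannon_fano(symbols[:split]).items():
        codes[s] = '0' + c                        # group above the line gets 0
    for s, c in shannon_fano(symbols[split:]).items():
        codes[s] = '1' + c                        # group below the line gets 1
    return codes

probs = {'S0': 0.2, 'S1': 0.2, 'S2': 0.2, 'S3': 0.2, 'S4': 0.2}
code = shannon_fano(sorted(probs.items(), key=lambda kv: -kv[1]))
H = sum(p * math.log2(1 / p) for p in probs.values())
N_bar = sum(probs[s] * len(w) for s, w in code.items())
print(code, H, N_bar, H / N_bar)   # lengths {2,2,2,3,3}, efficiency ~0.967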

3. The Discrete Memoryless Channel (DMC)

Channel representation: A channel is a statistical model with an input X and an output Y. The channel is said to be discrete when the alphabets of X and Y are both finite. It is said to be memoryless when the current output depends only on the current input and not on any of the previous inputs.

The channel matrix: A channel is specified by the set of transition probabilities P(yj/xi).

Each row of the channel matrix sums to unity, i.e. Σj P(yj/xi) = 1 for all i.

The input and output probabilities are represented by row matrices:

Input matrix:  [P(X)] = [P(x1), P(x2), P(x3), ..., P(xm)]

Output matrix: [P(Y)] = [P(y1), P(y2), P(y3), ..., P(yn)]

 The output probabilities are related to the input probabilities by

[P(Y)] = [P(X)] [P(Y/X)]

The joint probability matrix is represented by

[P(X,Y)] = [P(Xd)] [P(Y/X)]

where [P(Y/X)] is the channel matrix

 

and [P(Xd)] is the diagonal matrix formed from the input probabilities.

3.1 Types of Channels: Channel types are classified according to the channel transition matrix (channel matrix):

                                                1) Lossless channel

                                                2) Deterministic channel

                                                3) Noiseless channel

                                                4) Binary symmetric channel

1) Lossless channel: A lossless channel is one whose channel matrix has only one non-zero element in each column. In a lossless channel no source information is lost in transmission.

Ex: for the channel matrix

[P(Y/X)] = [ 3/4   1/4    0      0      0
               0      0    2/3   1/3    0
               0      0     0      0      1 ]

[P(Y)] = [P(X)] [P(Y/X)] gives

P(y1) = (3/4) P(x1)                                                                        P(y3) = (2/3) P(x2)

P(y2) = (1/4) P(x1)                                                                        P(y4) = (1/3) P(x2)

P(y5) = P(x3)

2) Deterministic channel: A deterministic channel is one whose channel matrix has only one non-zero element in each row, and each row sum is unity. In a deterministic channel a given source symbol determines the received symbol. For example, for a channel with five inputs and three outputs, [P(Y)] = [P(X)] [P(Y/X)] gives

       P(y1) = P(x1) + P(x2)

        P(y2) = P(x3) + P(x4)

 P(y3) = P(x5)

3) Noiseless channel: A noiseless channel obeys the properties of both the lossless and the deterministic channels; each column and each row has exactly one non-zero element, and that element is unity. Therefore the channel matrix of a noiseless channel is a square identity matrix.

4) Binary symmetric channel: A binary symmetric channel has two inputs and two outputs. The channel is symmetric because the probability of receiving 1 when 0 is sent is the same as the probability of receiving 0 when 1 is sent. This common transition probability is denoted by P and the other probability is 1 - P, so the channel matrix is

[P(Y/X)] = [ 1-P    P
               P    1-P ]

[P(Y)] = [P(X)] [P(Y/X)] gives

P(y1) = (1 - P) P(x1) + P P(x2)

P(y2) = P P(x1) + (1 - P) P(x2)
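A short Python/numpy sketch of these matrix relations for the binary symmetric channel; the value P = 0.1 is chosen only for illustration.

import numpy as np

P = 0.1                                # illustrative transition probability
PYX = np.array([[1 - P, P],
                [P, 1 - P]])           # channel matrix [P(Y/X)]
PX = np.array([0.5, 0.5])              # input probabilities [P(X)]

PY = PX @ PYX                          # [P(Y)] = [P(X)] [P(Y/X)]
PXY = np.diag(PX) @ PYX                # [P(X,Y)] = [P(Xd)] [P(Y/X)]
print(PY)                              # [0.5 0.5]
print(PXY)                             # [[0.45 0.05]
                                       #  [0.05 0.45]]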

1) Given the binary channel shown below: i) find the channel matrix; ii) find P(y1), P(y2) when P(x1) = P(x2) = 0.5; iii) find the joint probability matrix when P(x1) = P(x2) = 0.5.

2) Given the cascaded binary channel shown below: i) find the overall channel matrix; ii) find P(z1), P(z2) when P(x1) = P(x2) = 0.5; iii) find the joint probability matrix when P(x1) = P(x2) = 0.5.

i) The overall channel matrix is  [P(Z/X)] = [P(Y/X)] [P(Z/Y)]

ii) P(z1), P(z2) when P(x1) = P(x2) = 0.5:   [P(Z)] = [P(X)] [P(Z/X)]

iii) Joint probability matrix when P(x1) = P(x2) = 0.5:   [P(X,Z)] = [P(Xd)] [P(Z/X)]

3) Given the channel shown below (a binary erasure channel): i) find the channel matrix; ii) find P(y1), P(y2), P(y3) when P(x1) = P(x2) = 0.5 and P = 0.5.

i) The channel matrix of the binary erasure channel with erasure probability P is

[P(Y/X)] = [ 1-P    P     0
               0     P    1-P ]

ii) With P(x1) = P(x2) = 0.5 and P = 0.5:

P(y1) = 0.25,    P(y2) = 0.5,    P(y3) = 0.25

3.2. Information Transmission on Discrete Channels

The conditional and joint entropies are calculated using the conditional and joint probabilities for the channel. The following are the various entropy functions for a channel with inputs and outputs:

H(X) = Σi P(xi) log2 (1/P(xi))   — the average information (entropy) of the channel input

H(Y) = Σj P(yj) log2 (1/P(yj))   — the average information (entropy) of the channel output

H(Y/X) = Σi Σj P(xi, yj) log2 (1/P(yj/xi))   — the conditional entropy of the output given the input

H(X,Y) = Σi Σj P(xi, yj) log2 (1/P(xi, yj))   — the joint entropy of the input and output

4) For the channel shown below: i) find the channel matrix; ii) find P(y1), P(y2), P(y3) when P(x1) = 0.3, P(x2) = 0.4 and P(x3) = 0.3; iii) find all the entropies H(X), H(Y), H(Y/X), H(X,Y).

i) The channel matrix [P(Y/X)] is read from the figure, and

[P(Y)] = [P(X)] [P(Y/X)]

 

P(y1) = 0.24                                        P(y2) = 0.55                                        P(y3) = 0.21

 

H(X) = Σ P(xi) log2 (1/P(xi)) = 0.3 log2 (1/0.3) + 0.4 log2 (1/0.4) + 0.3 log2 (1/0.3) = 1.571 bits

 

H(Y) = Σ P(yj) log2 (1/P(yj)) = 0.24 log2 (1/0.24) + 0.55 log2 (1/0.55) + 0.21 log2 (1/0.21) = 1.441 bits

 

H(Y/X) = Σi Σj P(xi, yj) log2 (1/P(yj/xi))

  = P(x1,y1) log2 (1/P(y1/x1)) + P(x1,y2) log2 (1/P(y2/x1)) + P(x1,y3) log2 (1/P(y3/x1)) + P(x2,y1) log2 (1/P(y1/x2)) + P(x2,y2) log2 (1/P(y2/x2)) + P(x2,y3) log2 (1/P(y3/x2)) + P(x3,y1) log2 (1/P(y1/x3)) + P(x3,y2) log2 (1/P(y2/x3)) + P(x3,y3) log2 (1/P(y3/x3))

H(X,Y) = Σi Σj P(xi, yj) log2 (1/P(xi, yj))

  = P(x1,y1) log2 (1/P(x1,y1)) + P(x1,y2) log2 (1/P(x1,y2)) + ... + P(x3,y3) log2 (1/P(x3,y3)) = H(X) + H(Y/X)
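A Python sketch of these entropy calculations. Because the original channel-matrix figure is not available, the matrix below is a hypothetical one chosen only so that the output probabilities come out to the stated values (0.24, 0.55, 0.21); it should not be taken as the matrix from the original figure.

import numpy as np

def entropies(PX, PYX):
    """Return H(X), H(Y), H(Y/X), H(X,Y) for a DMC with input PX and channel matrix PYX."""
    PXY = np.diag(PX) @ PYX                      # joint matrix P(X,Y)
    PY = PX @ PYX                                # output probabilities
    def H(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return float(np.sum(p * np.log2(1.0 / p)))
    HX, HY, HXY = H(PX), H(PY), H(PXY.ravel())
    HY_given_X = HXY - HX                        # chain rule: H(X,Y) = H(X) + H(Y/X)
    return HX, HY, HY_given_X, HXY

PX = np.array([0.3, 0.4, 0.3])
PYX = np.array([[0.8, 0.2, 0.0],                 # hypothetical channel matrix (assumption)
                [0.0, 1.0, 0.0],
                [0.0, 0.3, 0.7]])
print(entropies(PX, PYX))                        # H(X) ~ 1.571, H(Y) ~ 1.441, ...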

3.3 Mutual Information:

It is defined as a measure of the information transfer across the channel; noise and other transmission impairments alter the source information.

·         P(xi) is the input probability for integer values of i

·         P(yj) is the output probability for integer values of j

·         P(xi, yj) is the joint probability, that is the probability that yj is received and xi is transmitted

·         P(yj/xi) is the conditional probability, that is the probability that yj is received given that xi was transmitted

Mutual information:                  I(xi ; yj) = log2 [ P(xi, yj) / (P(xi) P(yj)) ]  bits

 

and its average over all input-output pairs is   I(X,Y) = Σi Σj P(xi, yj) log2 [ P(xi, yj) / (P(xi) P(yj)) ]  bits

 If the channel is ideal (noiseless), then for the matching pair P(xi, yj) = P(xi) = P(yj) and

                                                                P(yj/xi) = 1

Therefore        I(xi ; yj) = log2 (1/P(xi))  bits ------------ the self information

Thus when the channel is ideal (noiseless), the mutual information reduces to the self information.

If the channel is completely noisy, X and Y are independent,

so  P(xi, yj) = P(xi) P(yj)

        and  I(X,Y) = log2 1 = 0  bits

Properties of mutual information:

1)      Mutual information is a non-negative quantity, i.e. I(X,Y) ≥ 0

2)      Mutual information is symmetric, i.e. I(X,Y) = I(Y,X)

3)      Mutual information I(X,Y) = H(Y) - H(Y/X)

4)      Mutual information I(X,Y) = H(X) - H(X/Y)

5)      Mutual information I(X,Y) = H(Y) + H(X) - H(X,Y)

 

1) Show that the mutual information of the binary symmetric channel is

    I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P),  when P(x1) = α and P(x2) = 1 - α.

P(x1) = α,     P(x2) = 1 - α

[P(Y)] = [P(X)] [P(Y/X)]

P(y1) = (1 - P) P(x1) + P P(x2) = (1 - P) α + P (1 - α)

P(y2) = P P(x1) + (1 - P) P(x2) = P α + (1 - P) (1 - α)

[P(X,Y)] = [P(Xd)] [P(Y/X)] = [ α(1-P)        αP
                                 (1-α)P    (1-α)(1-P) ]

I(X,Y) = H(Y) - H(Y/X)

H(Y) = Σ P(yj) log2 (1/P(yj))

H(Y/X) = Σi Σj P(xi, yj) log2 (1/P(yj/xi))

  = P(x1,y1) log2 (1/(1-P)) + P(x1,y2) log2 (1/P) + P(x2,y1) log2 (1/P) + P(x2,y2) log2 (1/(1-P))

       = α(1-P) log2 (1/(1-P)) + αP log2 (1/P) + (1-α)P log2 (1/P) + (1-α)(1-P) log2 (1/(1-P))

       = (1-P) log2 (1/(1-P)) + P log2 (1/P)          (the α terms add up to 1)

H(Y/X) = - P log2 P - (1-P) log2 (1-P)

I(X,Y) = H(Y) - H(Y/X)

 I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P)

                                                     

 

 If α = 0.5 and P = 0.1, then H(Y) = 1 bit (the outputs are equiprobable) and

                             I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P)

                                         = 1 - 0.469 = 0.531 bits

 

If α = 0.5 and P = 0.5, then

                            I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P)

                                       = 1 - 1 = 0
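A short Python sketch of this result; Hb and bsc_mutual_info are names we introduce for illustration.

import math

def Hb(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_info(alpha, p):
    """I(X;Y) = H(Y) - H(Y/X) for a BSC with P(x1) = alpha and transition probability p."""
    py1 = (1 - p) * alpha + p * (1 - alpha)
    return Hb(py1) - Hb(p)             # for the BSC, H(Y/X) = Hb(p)

print(bsc_mutual_info(0.5, 0.1))       # ~0.531 bits
print(bsc_mutual_info(0.5, 0.5))       # 0.0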

3.4. Channel Capacity:

The channel capacity of a discrete memoryless channel is defined in terms of the mutual information of the channel: it is the maximum of I(X,Y) over all possible input probability distributions P(xi).

 

Cs = max over P(xi) of I(X,Y)   bits / symbol

C = r Cs   bits / sec

 

Channel Capacity of Special channels:

1) Lossless channel: For a lossless channel the conditional entropy H(X/Y) = 0, therefore the mutual information is  I(X,Y) = H(X) - H(X/Y)

                                 I(X,Y) = H(X)

  Thus the mutual information (information transfer) is equal to the input (source) entropy, and no source information is lost in transmission.

Cs = max I(X,Y)   bits / symbol

                                                   Cs = max H(X)

                                                    Cs = log2 M,   where M is the number of input symbols.

2) Deterministic channel: For a deterministic channel the conditional entropy H(Y/X) = 0, therefore the mutual information is

                                             I(X,Y) = H(Y) - H(Y/X)

                                             I(X,Y) = H(Y)

 

Thus the mutual information (information transfer) is equal to the output (destination) entropy, and each source symbol is completely determined at the destination.

Cs = max I(X,Y)   bits / symbol

                                                   Cs = max H(Y)

                                                    Cs = log2 N,   where N is the number of output symbols.

3) Noiseless channel: The noiseless channel obeys the properties of both the lossless channel and the deterministic channel, therefore the conditional entropies H(X/Y) = H(Y/X) = 0 and

                                             I(X,Y) = H(X) = H(Y)

Cs = max I(X,Y)   bits / symbol

                                                   Cs = max H(X) = max H(Y)

Cs = log2 M = log2 N

4) Binary symmetric channel: For the binary symmetric channel with P(x1) = α and P(x2) = 1 - α, the mutual information is I(X,Y) = H(Y) - H(Y/X).

Cs = max I(X,Y)   bits / symbol

Cs = max [ H(Y) - H(Y/X) ] = max H(Y) + P log2 P + (1 - P) log2 (1 - P)

                                        

Cs = 1 + P log2 P + (1 - P) log2 (1 - P)

 

 

The mutual information depends on α and on the channel transition probability P; the maximum H(Y) = 1 is obtained at α = 0.5, so the capacity depends only on P.

1) Find the channel capacity of the binary symmetric channel

    when P(x1) = α and P(x2) = 1 - α, for P = 0.1 and P = 0.5.

Following the derivation given in Section 3.3,

I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P)

 

Channel capacity:

Cs = max I(X,Y)   bits / symbol

     Cs = 1 + P log2 P + (1 - P) log2 (1 - P),   attained at α = 0.5.

                             If P = 0.1:

                                        Cs = 1 - 0.469 = 0.531 bits / symbol

 

                            If P = 0.5:

                                     Cs = 1 - 1 = 0

3.5 Alternative method to calculate the channel capacity:

For a square channel matrix, solve the linear equations  Σj P(yj/xi) Qj = Σj P(yj/xi) log2 P(yj/xi)  (one equation for each input xi) for the variables Q1, Q2, ..., and then the channel capacity is

Cs = log2 ( 2^Q1 + 2^Q2 + ... )   bits/symbol

 

1) Find the mutual information and the channel capacity of the channel with matrix [P(Y/X)] = [0.8  0.2 ; 0.3  0.7], when P(x1) = 0.6 and P(x2) = 0.4.

[P(Y)] = [P(X)] [P(Y/X)]  gives  P(y1) = 0.6 and P(y2) = 0.4

H(Y) = Σ P(yj) log2 (1/P(yj)) = 0.6 log2 (1/0.6) + 0.4 log2 (1/0.4) = 0.97 bits

H(Y/X) = Σi Σj P(xi, yj) log2 (1/P(yj/xi))

  = P(x1,y1) log2 (1/0.8) + P(x1,y2) log2 (1/0.2) + P(x2,y1) log2 (1/0.3) + P(x2,y2) log2 (1/0.7)

= 0.48 log2 (1/0.8) + 0.12 log2 (1/0.2) + 0.12 log2 (1/0.3) + 0.28 log2 (1/0.7) = 0.78 bits

     I(X,Y) = H(Y) - H(Y/X)

                                               I(X,Y) = 0.97 - 0.78 = 0.19 bits/symbol

 

Alternative method to calculate the channel capacity:

0.8 Q1 + 0.2 Q2 = 0.8 log2 0.8 + 0.2 log2 0.2 = -0.72

0.3 Q1 + 0.7 Q2 = 0.3 log2 0.3 + 0.7 log2 0.7 = -0.88

Therefore

   Q1 = -0.656

 Q2 = -0.976

 

Channel capacity  Cs = log2 ( 2^Q1 + 2^Q2 )  bits/symbol

                               Cs = log2 ( 2^-0.656 + 2^-0.976 ) = 0.192 bits/symbol
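A Python/numpy sketch of this alternative method applied to the same channel matrix; the variable names are ours.

import numpy as np

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])                 # channel matrix from the example above

# Solve  sum_j P[i][j] * Q[j] = sum_j P[i][j] * log2(P[i][j])  for Q
rhs = np.sum(P * np.log2(P), axis=1)       # approximately [-0.72, -0.88]
Q = np.linalg.solve(P, rhs)                # approximately [-0.66, -0.98]
Cs = np.log2(np.sum(2.0 ** Q))             # ~0.192 bits/symbol
print(Q, Cs)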

2) Find the channel capacity of the channel shown below.

Solving the same pair of equations for this channel gives

   Q1 = -0.966

 Q2 = -0.966

 

Channel capacity  Cs = log2 ( 2^Q1 + 2^Q2 )  bits/symbol

                               Cs = log2 ( 2^-0.966 + 2^-0.966 ) = 1 - 0.966 = 0.034 bits/symbol

3) Find the channel capacity of the channel shown below, which has three outputs; in that case

Channel capacity  Cs = log2 ( 2^Q1 + 2^Q2 + 2^Q3 )  bits/symbol

4. Channel Coding:

 

The simplified block diagram of a communication system depicted in Fig. shows the position of the source encoder/decoder and the channel encoder/decoder.

 


The purpose of source coding is to reduce the number of data bits that have to be transmitted over
 the channel to send the message information to the receiver, while the objective of channel coding is to make error correction at the receiver efficient so that the error probability can be decreased.

 

Shannon-Hartley channel capacity theorem

 

            The limits of the channel capacity as S/(No B) tends to infinity or to zero are examined later in this section. In order to appreciate an implication of this formula, suppose an ideal situation in which the data transmission rate R reaches its upper bound, the channel capacity C. The relationship between the bandwidth efficiency R/B and Eb/No in dB = 10 log10(Eb/No) for such an ideal system can be obtained by substituting C = R into the capacity formula. In information theory, the Shannon-Hartley theorem gives the maximum rate at which information can be transmitted over a communications channel of a specified bandwidth in the presence of noise. It is an application of the noisy-channel coding theorem to the archetypal case of a continuous-time analog communications channel subject to Gaussian noise. The theorem establishes the channel capacity for such a communication link, a bound on the maximum amount of error-free information per unit time that can be transmitted with a specified bandwidth in the presence of noise interference, assuming that the signal power is bounded and that the Gaussian noise process is characterized by a known power or power spectral density. The law is named after Claude Shannon and Ralph Hartley.

4.1 Statement:

            The Shannon-Hartley theorem states the channel capacity C, meaning the theoretical tightest upper bound on the rate at which information can be communicated at an arbitrarily low error rate using an average received signal power S through an analog channel subject to additive white Gaussian noise of power N:

C = B log2 (1 + S/N)

            Where

·         C  is the channel capacity in bits per second

·         B  is the bandwidth of the channel in hertz

·         S  is the average received signal power over the bandwidth, measured in watts

·         N  is the average power of the noise and interference over the bandwidth, measured in watts

·         S/N  is the signal-to-noise ratio (SNR)

Consider a channel that is AWGN, i.e.

  Additive:          the noise is added to the signal, Y = S + N

 White noise:      the power spectral density of the noise is uniform over the entire range of
                              frequencies

 Gaussian:          the amplitude of the noise follows a Gaussian distribution

and the signal is power limited and band limited.

In order to calculate the channel capacity, first calculate the mutual information of the channel; the channel capacity is the maximum of the mutual information over the probability density function of the input:

C = Max {I(X,Y)}

                  = Max { H(Y) - H(Y/X) }

The channel is a random process, the signals applied to and received from the channel are random variables, and the signals are continuous.

 

H(Y/X) = ∫∫ P(x,y) log2 (1/P(y/x)) dx dy

                      From Bayes' theorem,     P(x,y) = P(y/x) P(x)

 

                                                                      = ∫∫ P(x) P(y/x) log2 (1/P(y/x)) dx dy

                                                                      = ∫ P(x) dx ∫ P(y/x) log2 (1/P(y/x)) dy

We know ∫ P(x) dx = 1, and since the output is the input plus noise we can change the variable of integration:

                                                         P(y/x) = P(y - x) = PN(n)

               with  n = y - x,  dy = dn,

H(Y/X) = ∫ PN(n) log2 (1/PN(n)) dn = H(N)

C = Max [I(X,Y)] = Max [H(Y) - H(Y/X)] = Max [H(Y)] - H(N)

The signal power is the mean square value of the signal = S

The noise power is the mean square value of the noise = N

                                                   The output power = S + N

For a given power, the (differential) entropy is maximized by a Gaussian distribution, so

Max [H(Y)] = (1/2) log2 (2πe(S+N))

H(N) = (1/2) log2 (2πeN)

Therefore Max [I(X,Y)] = (1/2) log2 (2πe(S+N)) - (1/2) log2 (2πeN)

                                          = (1/2) log2 ((S+N)/N)

                   Max [I(X,Y)] = (1/2) log2 (1 + S/N)

Consider a Gaussian noise channel that is band limited to a bandwidth of B Hz; the signal is sampled at the Nyquist rate

fs = 2 fm,   where fm = B,   so   fs = 2B samples/sec

Channel capacity  C = fs x Max [I(X,Y)] = 2B x (1/2) log2 (1 + S/N)

C = B log2 (1 + S/N)   bits/sec

Shannon–Hartley theorem tells the maximum rate at which information can be transmitted over a communications channel of a specified bandwidth in the presence of noise. It is an application of the noisy-channel coding theorem to the archetypal case of a continuous-time analog communications channel subject to Gaussian noise.

C = B log2 (1 + S/N)

R ≤ C

R ≤ B log2 (1 + S/N)

so   S/N ≥ 2^(R/B) - 1.   With the noise power N = No B (No is the noise power spectral density) and the signal power S = Eb R (Eb is the energy per bit), this becomes

 

   Eb/No ≥ (B/R)(2^(R/B) - 1)

If the rate of information R is fixed and No is fixed, the required Eb/No can be analysed as a function of the bandwidth efficiency R/B:

Case I:   If B/R = 0.1, then R/B = 10 and

                   Eb/No = 0.1(2^10 - 1) = 102.3 ≈ 100 = 20 dB

Case II:  If B/R = 1, then R/B = 1 and

                   Eb/No = 1(2^1 - 1) = 1 = 0 dB

Case III: If B → ∞, what is the required Eb/No?

                      Using C = B log2 (1 + S/(No B)) and the limit (1 + x)^(1/x) → e as x → 0,

                   Eb/No → ln 2 = 0.693 = -1.6 dB

This value, -1.6 dB, is the Shannon limit.
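A short Python sketch of the capacity formula and the three cases above; the 3 kHz bandwidth and 30 dB SNR in the first example are illustrative values chosen here, not figures from the notes.

import math

def shannon_capacity(B, snr):
    """C = B * log2(1 + S/N) in bits/sec."""
    return B * math.log2(1 + snr)

B = 3000.0                              # illustrative: 3 kHz channel
snr = 10 ** (30 / 10)                   # illustrative: 30 dB -> 1000
print(shannon_capacity(B, snr))         # ~29.9 kbit/s

# Minimum Eb/No at a given bandwidth efficiency R/B, from Eb/No >= (B/R)(2^(R/B) - 1)
for rb in (10, 1, 0.001):
    ebno = (2 ** rb - 1) / rb
    print(rb, 10 * math.log10(ebno))    # ~20.1 dB, 0 dB, approaching -1.59 dB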

4.2 Why use Error-Correction coding?

 

                Error-correction coding can be regarded as a vehicle for effecting various system trade-offs. The figure below compares two curves depicting bit-error performance versus Eb/No: one curve represents a typical modulation scheme without coding, the other the same scheme with channel coding. Described below are four benefits or trade-offs that can be achieved with the use of channel coding.

 

Trade-Off1: Error performance versus Bandwidth

                                        A simple, inexpensive voice communication system has just been developed and delivered to a customer. The system does not use error-correction coding. Consider that the operating point of the system can be depicted by point A in the figure (Eb/No = 8 dB and PB = 10^-2). After a few trials, there are complaints about the voice quality; the customer suggests that the bit-error probability should be lowered to 10^-4. The usual way of obtaining better error performance in such a system would be to move the operating point from point A to, say, point B in the figure, but that requires a larger Eb/No (more power). The figure suggests that one possible trade-off is instead to move the operating point from point A to point C. That is, "walking" down the vertical line to point C on the coded curve can provide the customer with improved error performance at the same Eb/No, at the cost of the extra bandwidth needed by the code.

 Trade-Off2: Power versus Bandwidth

Consider that the system without coding, operating at point D in figure ( Eb / No=14 dB, and PB=10-6), has been delivered to a customer. The customer has no complaints about the quality of the data, but the equipment is having some reliability problems as a result of providing an Eb / No of 14 dB. In other words, the equipment keeps breaking down. If the requirement on Eb / No or power could be reduced, the reliability difficulties might also be reduced. The figure suggests  a trade-off by moving the operating point from point D to point E. Thus, the trade-off is one which the same quality of data is achieved, but the coding allows for a reduction in power or Eb / No .

Trade-Off3: Data Rate versus Bandwidth

                                  Consider that a system without coding, operating at point D in figure  ( Eb / No=14 dB, and PB=10-6) has been developed. Assume that there is no problem with the data quality and no particular need to reduce power. If the customer’s data rate requirement increases,

                                                                        Eb / No  = Pr / No (1/R)

     The above expression shows that the received Eb/No would decrease, and in the figure the operating point would move upwards from point D to some point F. Now, envision "walking" down the vertical line to point E on the curve that represents coded modulation. Increasing the data rate has degraded the quality of the data, but here we observe that coding restores the same quality at the same power level.

Trade-Off4: Capacity versus Bandwidth

                        Trade-off 4 is similar to trade-off 3 because both achieve increased capacity. A spread-spectrum multiple-access technique called code-division multiple access (CDMA) is one of the schemes used in cellular telephony. In CDMA, where users simultaneously share the same spectrum, each user is an interferer to each of the other users in the same cell or in nearby cells. Hence the capacity (maximum number of users) per cell is inversely proportional to Eb/No. In this application, a lowered Eb/No results in a raised capacity: the code achieves a reduction in each user's power, which in turn allows for an increase in the number of users. Again the cost is more bandwidth. But in this case the signal bandwidth expansion due to the error-correcting code is small compared with the much larger spread-spectrum expansion, so there is no significant impact on the transmission bandwidth.

                 In each of the above trade-off examples, a "traditional" code involving redundant bits and faster signaling (for a real-time communication system) has been assumed; hence, in each case, the cost was expanded bandwidth. However, there exists an error-correcting technique, called trellis-coded modulation, that does not require faster signaling or expanded bandwidth for real-time systems.

4.3 Types of Channel Coding

 

In order to achieve reliable communication through a channel corrupted by a noise, we might have to make the codewords different from each other conspicuously enough to reduce the probability of each symbol to be taken for another symbol. The conversion of the message data with this aim is called channel coding. It may accompany some side effects such as a decline of data transmission rate, an increase of required channel bandwidth, and an increase of complexity in the encoder/decoder. 

     In this section, we will discuss waveform coding, which converts the data into distinguishable waveforms, and structured coding, which adds some redundant/extra parity bits for detection and/or correction of transmission errors. Structured coding is divided into two schemes: one is block coding, which converts the source data independently of the previous data, and the other is convolutional coding, which converts the data depending on the previous data.

 

Waveform Coding

There are various methods of waveform coding such as antipodal signaling, orthogonal signaling, bi-orthogonal signaling, etc. Among these waveform codings, orthogonal signaling can be regarded as converting one bit of data, or two bits of data, according to its/their value(s) in the following way:

 

 

 

 

 

 


This conversion procedure is generalized for K  bits of data into the form of a Hadamard matrix

Here, we define the crosscorrelation between codeword i and codeword j as

 

z(i, j) = (1/N) Σk xi(k) xj(k),   where N is the codeword length and xi(k) = ±1 is the bipolar form of the k-th bit of codeword i.

This can be computed by changing every 0 (of both codewords) into -1 and dividing the inner product of the two bipolar codeword vectors by the codeword length. According to this definition, the crosscorrelation between any two different codewords turns out to be zero, implying orthogonality among the codewords and their corresponding signal waveforms. On the other hand, bi-orthogonal signaling can be regarded as converting two bits of data and, more generally, K bits of data in the following way:

 

 

 

 

 

 

 


Since the number of columns of the codeword matrix is the number of bits per symbol, the number of bits spent for transmission with bi-orthogonal signaling is half of that for orthogonal signaling, and the required channel bandwidth is also half of that for orthogonal signaling, while still showing comparable BER performance. However, the codeword lengths of both signaling (coding) methods increase geometrically as the number K of bits per symbol increases; consequently, the data transmission rate R or the bandwidth B will suffer, resulting in a degradation of the bandwidth efficiency R/B.
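A small Python/numpy sketch of the Hadamard construction for orthogonal signaling and the crosscorrelation check described above; the function name hadamard is our own.

import numpy as np

def hadamard(K):
    """Codeword matrix for orthogonal signaling of K data bits: 2^K codewords of length 2^K,
    built by the recursion H_k = [[H_{k-1}, H_{k-1}], [H_{k-1}, complement of H_{k-1}]]."""
    H = np.array([[0]])
    for _ in range(K):
        H = np.block([[H, H], [H, 1 - H]])
    return H

H = hadamard(3)                          # 8 codewords of length 8
bipolar = 2 * H - 1                      # map 0 -> -1, 1 -> +1
Z = bipolar @ bipolar.T / H.shape[1]     # crosscorrelation matrix
print(Z)                                 # identity matrix: zero crosscorrelation between
                                         # any two different codewords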

 

5.  Linear Block Coding

Block coding is a mapping of k-bit message vectors (symbols) into n-bit (n ≥ k) code vectors by using an (n,k) block code which consists of 2^k codewords of length n. A block code is said to be linear if the modulo-2 sum/difference of any two codewords in the block code is another codeword belonging to the block code. A linear block code is represented by its k x n generator matrix G, which is modulo-2 premultiplied by a k-bit message vector m to yield an n-bit codeword U as

U = mG

Accordingly, the encoder and decoder in a linear block coding scheme use matrix multiplications with the generator/parity-check matrices instead of using table lookup methods with the whole 2^k x n codeword matrix. This makes the encoding/decoding processes simple and efficient. Now, we define the minimum (Hamming) distance of a linear block code as the minimum of the Hamming distances between two different codewords:

 

d min = minimum Hamming distance between any two different codewords

Here, the Hamming distance between two different codewords Ui and Uj is the number of bit positions in which they differ, and it can be found as the weight of the modulo-2 sum/difference of the two codewords, where the Hamming weight of a codeword is defined to be the number of non-zero bits. The error-detecting and error-correcting capabilities are defined to be the maximum numbers of detectable/correctable bit errors, respectively, and they can be obtained from the minimum distance as follows:

Error-detecting capability:  d min - 1

Error-correcting capability:  t = floor((d min - 1)/2)

Generator matrix and the corresponding parity-check matrix:

A k x n generator matrix G representing an (n,k) block code can be constructed in the systematic form

 

G = [ P | Ik ]

 

where P is a k x (n-k) parity submatrix and Ik is the k x k identity matrix. With this generator matrix, the n-bit code vector (codeword) U for a k-bit message (source or information) vector m is generated as

 

U = mG = [ mP | m ]

which consists of the first n-k parity bits and the last k message bits in a systematic structure. Note that if a block code is represented by a generator matrix containing an identity matrix, then the message vector appears (without being altered) in the code vector, and the code is said to be systematic. Corresponding to this generator matrix G, the parity-check matrix H to be used for decoding the received signal vector is defined as

 

Parity-check matrix:   H = [ In-k | P^T ],   which satisfies  G H^T = 0.

For a received signal vector r = U + e containing an error vector e, the decoder at the receiver computes

S = r H^T = e H^T

which is called a syndrome vector, for the reason that it carries some (distinct) information about the possible symbol error, like a pathological syndrome or symptom of a disease.

 

EX 1) Generate a (6,3) linear block code with a given generator matrix.

G is the generator matrix having length n = 6 and k = 3, and the number of parity bits is n - k = 3.

Ik is the k x k identity matrix, here I3.

The parity vectors are selected from the set of 2^(n-k) vectors; in this example n = 6, k = 3,

so the set has 2^(6-3) = 2^3 = 8 vectors.

Parity vectors: the all-zero vector 000 and the identity (weight-1) vectors are excluded, and the all-one vector 111 is given least priority, leaving 110, 011 and 101 as the rows of the parity submatrix P.

The generator matrix is then G = [P | I3]  (one consistent choice is shown in the sketch after this example).

 

Codeword U = mG

 

The message vectors number 2^k = 2^3 = 8, while the possible 6-bit words number 2^n = 2^6 = 64; only 8 of them are valid codewords, because there are only 8 possible message vectors.

 

Syndrome test:  S = r H^T.  If the received vector is r = 1 1 0 0 1 1, then the syndrome is

 S = [0   0   0], so no errors are detected in the received vector.

Where H is the parity-check matrix.
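Because the original generator-matrix figure is not reproduced here, the Python sketch below assumes one parity submatrix P built from the rows 110, 011, 101 that is consistent with the syndrome check above (r = 110011 giving S = 000); it is an illustrative choice, not necessarily the exact matrix of the original notes.

import numpy as np

P = np.array([[1, 1, 0],                                   # assumed parity submatrix
              [0, 1, 1],
              [1, 0, 1]])
k, n = 3, 6
G = np.hstack([P, np.eye(k, dtype=int)])                   # G = [P | Ik], size k x n
H = np.hstack([np.eye(n - k, dtype=int), P.T])             # H = [In-k | P^T]

# All 2^k codewords U = mG (modulo 2)
for i in range(2 ** k):
    m = np.array([(i >> b) & 1 for b in range(k - 1, -1, -1)])
    print(m, (m @ G) % 2)

# Syndrome test for the received vector r = 1 1 0 0 1 1
r = np.array([1, 1, 0, 0, 1, 1])
print('syndrome:', (r @ H.T) % 2)                          # [0 0 0] -> no detected error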

 EX 2) Generate a (7,4) linear block code with a given generator matrix.

 

   Find the codewords and the minimum distance of the (7,4) linear block code represented by the

   generator matrix.

 

Minimum distance d min = 3.

Because d min = 3,

 

Error detection capability = d min - 1 = 3 - 1 = 2

Error correction capability = floor((d min - 1)/2) = floor(2/2) = 1

            Therefore the (7,4) code is a Hamming code with single-bit error correction capability.

Standard Array:

The standard array of an (n,k) code arranges all 2^n possible received vectors in a table whose first row contains the 2^k codewords and whose first column contains the correctable error patterns (the coset leaders); a received vector is decoded to the codeword at the head of the column in which it appears.

6. Cyclic coding

            A cyclic code is a linear block code having the property that a cyclic shift (rotation) of any codeword yields another codeword. Due to this additional property, the encoding and decoding processes can be implemented very efficiently using feedback shift registers. An (n,k) cyclic code can be described by an (n-k)th-degree generator polynomial

 

g(x) = 1 + g1 x + g2 x^2 + ... + g(n-k-1) x^(n-k-1) + x^(n-k)

 

The procedure for encoding a k-bit message vector m = [m0, m1, m2, ..., m(k-1)], represented by the (k-1)th-degree polynomial

 

m(x) = m0 + m1 x + m2 x^2 + ... + m(k-1) x^(k-1),

 

into an n-bit codeword represented by an (n-1)th-degree polynomial c(x) is as follows:

 

1. Divide x^(n-k) m(x) by the generator polynomial g(x) to get the remainder polynomial p(x).

2. Subtract (modulo-2, i.e. add) the remainder polynomial p(x) from x^(n-k) m(x) to obtain the codeword polynomial

 

c(x) = x^(n-k) m(x) + p(x)

 

which has the generator polynomial g(x) as a (multiplying) factor. Then the first n-k coefficients constitute the parity vector and the remaining k coefficients make up the message vector. Note that all the operations involved in the polynomial multiplication, division, addition and subtraction are not the ordinary arithmetic ones but modulo-2 operations.

 

 

Ex): For a (7,4) cyclic code represented by its generator polynomial g(x),

 

find the codeword for the message vector m = [1 0 1 1].

 

Noting that n = 7, k = 4 and n - k = 3, we divide x^3 m(x) by g(x)

 

to get the remainder polynomial p(x) and add it to x^3 m(x) to make the codeword polynomial

 

c(x) = x^3 m(x) + p(x).

 

The codeword made in this way has the n - k = 3 parity bits and the k = 4 message bits.

Now, let us consider the procedure for decoding a cyclic coded vector. Suppose the receiver has received a possibly corrupted code vector r = c + e, where c is a codeword and e is an error vector. Just as in the encoder, this received vector, regarded as a polynomial r(x), is divided by the generator polynomial g(x)

 

to obtain the remainder polynomial s(x). This remainder polynomial s may not be the same as the error vector e, but at least it is supposed to carry crucial information about e and therefore may well be called the syndrome.

The receiver will find the error pattern e corresponding to the syndrome s, subtract it from the received vector r to get (hopefully) the correct codeword

 

c = r + e  (modulo-2),

and accept only the last k bits (in ascending order of degree) of this corrected codeword as the message.

The polynomial operations involved in encoding/decoding every block of the message/coded sequence might seem to be an unbearable computational load. Fortunately, however, there are divider circuits which can perform such modulo-2 polynomial operations. The figure illustrates two divider circuits (consisting of linear feedback shift registers), each of which carries out the modulo-2 polynomial operations for encoding/decoding with the cyclic code given in the example. Note that the encoder/decoder circuits process the data sequences in descending order of polynomial degree.
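A Python sketch of the encoding and syndrome steps. Since the generator polynomial of the example is not shown above, the sketch assumes g(x) = x^3 + x^2 + 1, one of the two standard (7,4) Hamming generator polynomials; treat it as an illustrative choice rather than the original one.

def poly_div_mod2(dividend, divisor):
    """Modulo-2 polynomial division; polynomials are bit lists, highest degree first.
    Returns the remainder, padded to len(divisor) - 1 bits."""
    rem = list(dividend)
    for i in range(len(dividend) - len(divisor) + 1):
        if rem[i]:
            for j, d in enumerate(divisor):
                rem[i + j] ^= d
    return rem[-(len(divisor) - 1):]

g = [1, 1, 0, 1]            # assumed g(x) = x^3 + x^2 + 1 (coefficients of x^3 .. x^0)
m = [1, 0, 1, 1]            # message vector, highest-degree coefficient first

shifted = m + [0, 0, 0]               # x^(n-k) * m(x)
parity = poly_div_mod2(shifted, g)    # remainder p(x)
codeword = m + parity                 # c(x) = x^(n-k) m(x) + p(x), highest degree first;
                                      # read in ascending order of degree this is the
                                      # n-k parity bits followed by the k message bits
print(codeword)

# Decoding check: the syndrome of an error-free codeword is all-zero
print(poly_div_mod2(codeword, g))     # [0, 0, 0]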

 

 

 

 

 

 

 

 

 

 

 

7. Convolutional code

In telecommunication, a convolutional code is a type of error-correcting code that generates parity symbols via the sliding application of a Boolean polynomial function to a data stream. The sliding application represents the 'convolution' of the encoder over the data, which gives rise to the term 'convolutional coding'. The sliding nature of convolutional codes facilitates trellis decoding using a time-invariant trellis. Time-invariant trellis decoding allows convolutional codes to be maximum-likelihood soft-decision decoded with reasonable complexity.

The ability to perform economical maximum-likelihood soft-decision decoding is one of the major benefits of convolutional codes. This is in contrast to classic block codes, which are generally represented by a time-variant trellis and therefore are typically hard-decision decoded. Convolutional codes are often characterized by the base code rate and the depth (or memory) of the encoder [n, k, K]. The base code rate is typically given as k/n, where k is the number of input bits and n is the number of output symbols per encoding step. The depth is often called the constraint length K, where the output is a function of the current input as well as the previous K - 1 inputs. The depth may also be given as the number of memory elements v in the polynomial, or as the maximum possible number of states of the encoder, 2^v.
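A minimal Python sketch of a rate-1/2, constraint-length-3 convolutional encoder, using the common generator polynomials (7, 5) in octal; these generators are an illustrative choice, not taken from the notes.

def conv_encode(bits, g1=0b111, g2=0b101, K=3):
    """Rate-1/2 convolutional encoder with constraint length K and generators g1, g2."""
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | b) & ((1 << K) - 1)   # shift the new bit into a K-bit window
        out.append(bin(state & g1).count('1') % 2)    # first output bit (parity from g1)
        out.append(bin(state & g2).count('1') % 2)    # second output bit (parity from g2)
    return out

print(conv_encode([1, 0, 1, 1]))   # [1, 1, 1, 0, 0, 0, 0, 1]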

 

