INFORMATION THEORY AND CODING

Ch. Ravi Kumar

Assistant Professor, Department of E.C.E, Sir C. R. Reddy College of Engineering, Eluru, India

 

If the rate of information from the source does not exceed the capacity of a communication channel, then there exists a coding technique such that the information can be transmitted over the channel with an arbitrarily small frequency of errors, despite the presence of noise.

1. Measure of information:

            The output of an information source such as data or speech can be regarded as a random process. Information theory defines the self information of an event xi with probability P(xi) = Pi as

Self information:

    Ii = -log2 Pi = log2 (1/Pi)   bits

                                This measure of information is large for an event of low probability and small for an event of high probability. A message that there was or will be an earthquake in an area where earthquakes are very rare is big news and therefore can be regarded as carrying a lot of information. Another message that there will be an earthquake in an area where earthquakes are frequent is of much less value as news and, accordingly, can be regarded as carrying little information.

 

1.1 Properties of Self information:

 

                        1) Self information is a non-negative quantity

                        2) Ii ≥ 0  for  0 ≤ Pi ≤ 1

                        3) Ii → 0  as  Pi → 1

                        4) Ii ≥ Ij  for  Pi ≤ Pj

                       5) Joint probability: If Xi and Xj are two successive and independent events, then the joint probability is P(Xi, Xj) = Pi Pj and the self information is

 

Iij = log2 (1/(Pi Pj)) = log2 (1/Pi) + log2 (1/Pj)  bits

                                                                     Iij = Ii + Ij

1.2 Average information (or) entropy:

                                                       To find the average information we require the probabilities of the events.

  Ex: in the sequence

“ABADCADBCDBADCBADC”

 

PA, PB, PC, PD are to be calculated, and the total number of symbols is N.

Therefore   ITotal = N PA log2 (1/PA) + N PB log2 (1/PB) + N PC log2 (1/PC) + N PD log2 (1/PD)

Suppose a source emits one of N possible symbols with the probabilities P1, P2, P3, ..., PN.

 Therefore the average information is  H = ITotal / N

H = PA log2 (1/PA) + PB log2 (1/PB) + PC log2 (1/PC) + PD log2 (1/PD)

                        H = Σ Pi log2 (1/Pi)   bits/symbol

 

Problems:

1) A source emits one of four symbols with probabilities 1/3, 1/6, 1/4, 1/4. Successive symbols emitted by the source are statistically independent. Calculate the entropy.

                  H = (1/3) log2 3 + (1/6) log2 6 + (1/4) log2 4 + (1/4) log2 4 = 1.959 bits/symbol
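A short Python sketch to check this result; the helper function entropy is our own naming, not something defined in these notes.

import math

def entropy(probs):
    """Average information H = sum of p * log2(1/p), in bits/symbol."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Problem 1: probabilities 1/3, 1/6, 1/4, 1/4
print(entropy([1/3, 1/6, 1/4, 1/4]))   # ~1.959 bits/symbol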

 

1.3 Properties of Entropy:

Consider a discrete memoryless source that emits M symbols S0, S1, S2, ..., SM-1 with the probabilities P0, P1, P2, ..., PM-1. Then the entropy H(s) of the source is bounded as

 

0 ≤ H(s) ≤ log2 M

 

1) H(s) = 0 if the probability of one symbol is 1 and the probabilities of the remaining symbols are 0;

    that is, there is no randomness or uncertainty, i.e. the same symbol is emitted every time.

    Therefore there is no information.

2) H(s) = log2 M if the symbol probabilities are all equal; that is, the randomness or uncertainty is maximum,

    and the source carries maximum information.

  2) A source emits eight symbols with equal probabilities. Calculate the entropy.

H = (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8

   = 8 x (1/8) log2 8 = log2 8 = 3 bits/symbol

Entropy of a binary memoryless source:

            Consider that the probability of logic 0 is P and the probability of logic 1 is 1 - P. Then the entropy H(s) is

                                

H(s) = P log2 (1/P) + (1 - P) log2 (1/(1 - P))   bits/symbol

 

H(s) is maximum when the derivative of H(s) with respect to P is equated to zero,

                                                          i.e.  dH(s)/dP = 0,  which gives P = 1/2 and H(s)max = 1 bit/symbol.

3) Two symbols are emitted by the source with probabilities 1/4 and 3/4. Find the entropy of the source.

H(s) = (1/4) log2 4 + (3/4) log2 (4/3) = 0.811 bits / symbol

 

1.4 Information rate:

A discrete memoryless source emits symbols at a symbol rate r with average information H(s); the rate of information is then given by

 

R =  r  H(s)  bits / sec

 

                                       Where ‘r’ is symbol rate

 

4) Suppose a source emits 2000 symbols/sec from an alphabet of size 4 with the symbol probabilities given below. Find the self information of each symbol, the entropy and the information rate.

 

Symbol      Probability      Self information

A               1/2                  log2 2 = 1 bit

B               1/4                  log2 4 = 2 bits

C               1/8                  log2 8 = 3 bits

D               1/8                  log2 8 = 3 bits

 

H(s) = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + (1/8) log2 8 = 1.75 bits / symbol

R = r H(s) = 2000 x 1.75 = 3,500 bits / sec
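A short Python sketch of this calculation; the function name entropy is again our own choice for illustration.

import math

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

probs = [1/2, 1/4, 1/8, 1/8]   # symbols A, B, C, D
r = 2000                       # symbol rate in symbols/sec
H = entropy(probs)             # 1.75 bits/symbol
R = r * H                      # 3500 bits/sec
print(H, R)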

1.5 Extended entropy for a discrete memoryless source: The entropy of the nth-order extended source is given by

H(s^n) = n H(s)

 

Where n is the order of extension of the source.

 If n = 2, we have the second-order extension.

Let 3 symbols S0, S1, S2 be emitted by the source. For the second-order extension,

 the number of blocks = 3^2 = 9; these are

 

New symbol      0        1        2        3        4        5        6        7        8

Block                S0S0   S0S1   S0S2   S1S0   S1S1   S1S2   S2S0   S2S1   S2S2

Probability         P0P0   P0P1   P0P2   P1P0   P1P1   P1P2   P2P0   P2P1   P2P2

 

5) Consider a DMS with symbol probabilities 1/2, 1/4, 1/4. Calculate

             a) the entropy of the source

             b) the entropy of the 2nd-order extended source

 sol: a)  H(s) = (1/2) log2 2 + (1/4) log2 4 + (1/4) log2 4 = 1.5 bits / symbol

        b) Entropy of the 2nd-order extended source

 

New symbol      0        1        2        3        4        5        6        7        8

Block                S0S0   S0S1   S0S2   S1S0   S1S1   S1S2   S2S0   S2S1   S2S2

Probability         1/4     1/8     1/8     1/8     1/16    1/16    1/8     1/16    1/16

 

H(s^2) = (1/4) log2 4 + (1/8) log2 8 + (1/8) log2 8 + (1/8) log2 8 + (1/16) log2 16 + (1/16) log2 16 + (1/8) log2 8 + (1/16) log2 16 + (1/16) log2 16

 

H(s^2) = (1/4)x2 + (1/8)x3 + (1/8)x3 + (1/8)x3 + (1/16)x4 + (1/16)x4 + (1/8)x3 + (1/16)x4 + (1/16)x4 = 3 bits / symbol = 2 H(s)

 

6) Consider a DMS with symbol probabilities 0.7, 0.15, 0.15. Calculate

             a) the entropy of the source                                            Answer:  1.181 bits / symbol

             b) the entropy of the 2nd-order extended source               Answer:  2.362 bits / symbol
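A short Python sketch verifying both extended-source results, using the corrected probabilities 1/2, 1/4, 1/4 from Problem 5 above; entropy is our helper name.

import math
from itertools import product

def entropy(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

p = [1/2, 1/4, 1/4]                          # Problem 5
p2 = [a * b for a, b in product(p, p)]       # 9 block probabilities of the 2nd-order extension
print(entropy(p), entropy(p2))               # 1.5 and 3.0 = 2 * H(s)

q = [0.7, 0.15, 0.15]                        # Problem 6
q2 = [a * b for a, b in product(q, q)]
print(entropy(q), entropy(q2))               # ~1.181 and ~2.362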

2. Shannon's source coding (Shannon's First Theorem):

 

Sometimes we want to find an efficient code that uses the fewest bits to represent the information. Shannon's source coding theorem, or noiseless coding theorem, states that we need at least as many bits as the entropy in order to encode an information message in such a way that perfect decoding is possible.

The average codeword length N of any code assigned to the set S of symbols cannot be less than the entropy H(s); the average codeword length of the Huffman code constructed by the procedure below satisfies

H(s) ≤ N < H(s) + 1

 

so the efficiency  ŋ = H(s)/N ≤ 1

2.1 Kraft inequality: In order to check the uniquely decodable condition at the receiving end of a communication system, the codeword lengths Ni must satisfy the Kraft inequality

 

K = Σ 2^(-Ni) ≤ 1

A discrete memoryless source that emits M symbols with equal probabilities has the information rate

R = r H(s) = r log2 M

When the symbol probabilities are different, the information rate is

 

R = r H(s) ≤ r log2 M

 

Efficient transmission requires an encoding process such that

 

R = r H(s) ≤ rb,   where rb = r N is the encoder output bit rate,

 

so   H(s) ≤ N,      where

        

N = Σ Pi Ni

 is the average codeword length, and Ni is the i-th codeword length.

The minimum value of N is bounded between H(s) and H(s) + ε,

 

i.e.   H(s) ≤ N < H(s) + ε

 

Efficiency  ŋ = H(s) / N

Optimum source coding achieves the lower bound, i.e. N = H(s) (the minimum number of bits per symbol); otherwise the code is sub-optimum.

A code is therefore realizable (and, at best, optimum) when its average codeword length is not less than the entropy, N ≥ H(s), and it is uniquely decodable when its codeword lengths satisfy the Kraft inequality

 

K = Σ 2^(-Ni) ≤ 1

7) A discrete memoryless source emits the symbols with the probabilities and codes given below. Check the realizable/optimum condition and also verify the uniquely decodable condition.

 

Symbol (xi)     Probability (Pi)     Code 1     Code 2     Code 3     Code 4

A                     1/2                       00             0               0               0

B                     1/4                       01             1               01             10

C                     1/8                       10             10             011           110

D                     1/8                       11             11             0111         111

 

Code 1:

H(s) = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + (1/8) log2 8 = 1.75 bits / symbol

N = Σ Pi Ni = (1/2)x2 + (1/4)x2 + (1/8)x2 + (1/8)x2 = 2 bits

                                           N ≥ H(s)   i.e.  2 ≥ 1.75

K = Σ 2^(-Ni) = 2^-2 + 2^-2 + 2^-2 + 2^-2 = 1

Because N is greater than H(s) (N ≥ H(s)), Code 1 is realizable (though not optimum, since N > H(s)).

And K ≤ 1, therefore Code 1 is uniquely decodable.

 

Code 2:

H(s) = (1/2) log2 2 + (1/4) log2 4 + (1/8) log2 8 + (1/8) log2 8 = 1.75 bits / symbol

N = Σ Pi Ni = (1/2)x1 + (1/4)x1 + (1/8)x2 + (1/8)x2 = 1.25 bits

                                           N < H(s)   i.e.  1.25 < 1.75

 

Because N is less than H(s), Code 2 is not realizable.

For the uniquely decodable condition, the Kraft sum should not exceed 1 (K = Σ 2^(-Ni) ≤ 1):

                                                      K = 2^-1 + 2^-1 + 2^-2 + 2^-2 = 1.5

Because K is greater than 1 (K > 1), Code 2 is not uniquely decodable. For example, the sequences

BAABB  =  10011

CABB    =  10011

CAD      =  10011

all map to the same bit stream.

Note: Students are advised to verify code 3 and code 4 also
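The following Python sketch checks all four codes at once (average length, Kraft sum and the realizability test), so it can be used to verify Codes 3 and 4 as suggested above; the variable names are ours.

import math

probs = {'A': 1/2, 'B': 1/4, 'C': 1/8, 'D': 1/8}
codes = {
    'Code 1': {'A': '00', 'B': '01', 'C': '10',  'D': '11'},
    'Code 2': {'A': '0',  'B': '1',  'C': '10',  'D': '11'},
    'Code 3': {'A': '0',  'B': '01', 'C': '011', 'D': '0111'},
    'Code 4': {'A': '0',  'B': '10', 'C': '110', 'D': '111'},
}

H = sum(p * math.log2(1 / p) for p in probs.values())            # 1.75 bits/symbol

for name, code in codes.items():
    N_bar = sum(probs[s] * len(w) for s, w in code.items())      # average codeword length
    K = sum(2 ** -len(w) for w in code.values())                 # Kraft sum
    print(name, 'N =', N_bar, 'K =', K,
          'realizable:', N_bar >= H, 'Kraft satisfied:', K <= 1)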

2.2 Huffman Coding:

 

Huffman coding is a variable-length coding scheme that minimizes the average number of bits required for coding a set of symbols by assigning shorter codewords to more probable symbols and longer codewords to less probable symbols. It is performed in two steps, merging and splitting, as follows.

 

Procedure:

 

(1) Merging: Repeatedly arrange the symbols in descending order of probability in each column
           of the table and merge the two least probable entries into a new entry with a probability
           equal to the sum of their probabilities in the next column, until only two entries remain.

 

(2) Splitting: Assign 0/1 to each of the two remaining entries to form the initial parts of their
           codewords. Then repeatedly split the merged entries back into the two (previous) entries
           and append another 0/1 to each of their (unfinished) codewords.

 

8) A discrete memoryless source emits 5 symbols with the probabilities given below. Find the Huffman code and calculate the efficiency.

 

{S0 , S1 , S2 , S3 , S4 }, {0.4 , 0.2 , 0.2 , 0.1 , 0.1}

 

Symbol     Probability     Step 1          Step 2          Step 3          Code

S0             0.4                 0.4  (00)        0.4  (00)        0.6  (0)         00

S1             0.2                 0.2  (10)        0.4  (1)          0.4  (1)         10

S2             0.2                 0.2  (11)        0.2  (01)                             11

S3             0.1                 0.2  (01)                                                   010

S4             0.1                                                                                  011

(At each step the entries are re-ordered in descending probability and the two least probable entries are merged: 0.1 + 0.1 = 0.2 with prefix 01, then 0.2 + 0.2 = 0.4 with prefix 1, then 0.4 + 0.2 = 0.6 with prefix 0; splitting back assigns the codewords shown in the last column.)

Efficiency  ŋ = H(S) / N

H(S) = Σ Pi log2 (1/Pi) = P1 log2 (1/P1) + P2 log2 (1/P2) + P3 log2 (1/P3) + P4 log2 (1/P4) + P5 log2 (1/P5)

                                  = 0.4 log2 (1/0.4) + 0.2 log2 (1/0.2) + 0.2 log2 (1/0.2) + 0.1 log2 (1/0.1) + 0.1 log2 (1/0.1)

                                 = 2.122 bits / symbol

N = Σ Pi Ni = 0.4 x 2 + 0.2 x 2 + 0.2 x 2 + 0.1 x 3 + 0.1 x 3 = 2.2 bits

Efficiency = (2.122 / 2.2) x 100 = 96.5%

Variance  σ² = Σ Pi (Ni - N)²

                  = 0.4(2 - 2.2)² + 0.2(2 - 2.2)² + 0.2(2 - 2.2)² + 0.1(3 - 2.2)² + 0.1(3 - 2.2)²

                  = 0.4(0.04) + 0.2(0.04) + 0.2(0.04) + 0.1(0.64) + 0.1(0.64)

                  = 0.016 + 0.008 + 0.008 + 0.064 + 0.064

              Variance = 0.16
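A minimal Python sketch of the merging step of Huffman coding; it reproduces the codeword lengths {2, 2, 2, 3, 3}, the average length 2.2 bits and the variance 0.16 of this problem (the tie-breaking counter is our own device, and a different tie-break can give another, equally valid Huffman code).

import heapq
import math

def huffman_lengths(probs):
    """Return the codeword lengths produced by Huffman merging."""
    # heap items: (probability, tie-break counter, list of symbol indices in that subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    count = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:            # every symbol in a merged subtree gains one code bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, count, s1 + s2))
        count += 1
    return lengths

probs = [0.4, 0.2, 0.2, 0.1, 0.1]
L = huffman_lengths(probs)
H = sum(p * math.log2(1 / p) for p in probs)
N_bar = sum(p * l for p, l in zip(probs, L))
var = sum(p * (l - N_bar) ** 2 for p, l in zip(probs, L))
print(L, H, N_bar, H / N_bar, var)   # lengths, ~2.122, 2.2, ~0.965, 0.16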

9) A discrete memoryless source emits 5 symbols with equal probabilities. Find the Huffman code and calculate the efficiency.

Equal probabilities means {0.2, 0.2, 0.2, 0.2, 0.2}

 

Symbol     Probability     Step 1          Step 2          Step 3          Code

S0             0.2                 0.4  (00)        0.4  (00)        0.6  (0)         01

S1             0.2                 0.2  (01)        0.4  (1)          0.4  (1)         10

S2             0.2                 0.2  (10)        0.2  (01)                             11

S3             0.2                 0.2  (11)                                                   000

S4             0.2                                                                                  001

(The merged entry 0.2 + 0.2 = 0.4 carries the prefix 00, the next merged entry 0.2 + 0.2 = 0.4 carries the prefix 1, and the final merged entry 0.4 + 0.2 = 0.6 carries the prefix 0.)

 

H(S) = Σ Pi log2 (1/Pi) = P1 log2 (1/P1) + P2 log2 (1/P2) + P3 log2 (1/P3) + P4 log2 (1/P4) + P5 log2 (1/P5)

                                  = 0.2 log2 5 + 0.2 log2 5 + 0.2 log2 5 + 0.2 log2 5 + 0.2 log2 5

                                 = log2 5 = 2.32 bits / symbol

N = Σ Pi Ni = 0.2 x 2 + 0.2 x 2 + 0.2 x 2 + 0.2 x 3 + 0.2 x 3 = 2.4 bits

Efficiency = (2.32 / 2.4) x 100 = 96.7%

10) Suppose there is an information source X which generates nine symbols with the corresponding probability vector

                                   

   P(x) = [0.2, 0.15, 0.13, 0.12, 0.1, 0.09, 0.08, 0.07, 0.06].

Compute the Huffman code and calculate the efficiency and variance.

 

Ex 1) A DMS emits 5 symbols with the probabilities 0.4, 0.2, 0.1, 0.1, 0.2 respectively. Compute the Huffman code and calculate the efficiency and variance, where

                                     Efficiency  ŋ = H(S) / N

                                     Variance  σ² = Σ Pi (Ni - N)²

Ex 2) A DMS emits 5 symbols with the probabilities 0.55, 0.15, 0.15, 0.10, 0.05 respectively. Compute the Huffman code and calculate the efficiency and variance.

Ex 3) A DMS emits 8 symbols with the probabilities 0.26, 0.25, 0.14, 0.09, 0.08, 0.07, 0.07, 0.04 respectively. Compute the Huffman code and calculate the efficiency and variance.

Ex 4) A DMS emits 8 symbols with the probabilities 0.4, 0.3, 0.1, 0.06, 0.05, 0.04, 0.04, 0.01 respectively. Compute the Huffman code and calculate the efficiency and variance.

Ex 5) A DMS emits 7 symbols with the probabilities 0.37, 0.33, 0.16, 0.07, 0.04, 0.02, 0.01 respectively. Compute the Huffman code and calculate the efficiency and variance.

Ex 6) A DMS emits 7 symbols with the probabilities 0.46, 0.26, 0.12, 0.05, 0.04, 0.03, 0.02 respectively. Compute the Huffman code and calculate the efficiency and variance.

2.3 Shannon-Fano Code:

 Shannon-Fano coding is another type of source coding. The procedure is as follows:

    Step 1. Draw a line that divides the symbols into two groups, such that the group probabilities
               are as nearly equal as possible.

    Step 2. Assign logic 0 to each symbol in the group above the line and logic 1 to each
               symbol in the group below the line.

    Step 3. In all subsequent steps, subdivide each group into subgroups and assign digits by
               the previous rule.

    Step 4. Whenever a group consists of just one symbol, no further subdivision is possible and
               the codeword for that symbol is complete.

1) Apply the Shannon-Fano technique to find codewords for the symbols emitted by a DMS having the probabilities given below, and find the coding efficiency.

Symbol     Probability     Step 1     Step 2     Step 3     Step 4     Step 5     Codeword

S0             1/2                 0                                                                     0

S1             1/8                 1             0             0                                       100

S2             1/8                 1             0             1                                       101

S3             1/16               1             1             0             0                        1100

S4             1/16               1             1             0             1                        1101

S5             1/16               1             1             1             0                        1110

S6             1/32               1             1             1             1             0          11110

S7             1/32               1             1             1             1             1          11111

 

H(S) = Σ Pi log2 (1/Pi) = 2.3125 bits / symbol    and    N = Σ Pi Ni = 2.3125 bits

 Efficiency = H(S) / N = 2.3125 / 2.3125 = 100%

2) A discrete memoryless source emits 5 symbols with equal probabilities. Find the Shannon-Fano code and calculate the efficiency.

Equal probabilities means {0.2, 0.2, 0.2, 0.2, 0.2}

 

Symbol     Probability     Step 1     Step 2     Step 3     Codeword

S0             0.2                 0             0                           00

S1             0.2                 0             1                           01

S2             0.2                 1             0                           10

S3             0.2                 1             1             0             110

S4             0.2                 1             1             1             111

 

H(S) = Σ Pi log2 (1/Pi) = 0.2 log2 5 + 0.2 log2 5 + 0.2 log2 5 + 0.2 log2 5 + 0.2 log2 5 = log2 5 = 2.32 bits / symbol

 

N = Σ Pi Ni = 0.2 x 2 + 0.2 x 2 + 0.2 x 2 + 0.2 x 3 + 0.2 x 3 = 2.4 bits

 

 Efficiency = H(S) / N = 2.32 / 2.4 = 96.7%  (see the sketch below)
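The following Python sketch implements the Shannon-Fano splitting rule described above and reproduces the codeword lengths and efficiency of this example; the function name shannon_fano and the split search are our own construction.

import math

def shannon_fano(symbols):
    """symbols is a list of (name, probability) pairs, sorted in descending probability."""
    if len(symbols) == 1:
        return {symbols[0][0]: ''}
    total = sum(p for _, p in symbols)
    best_diff, split = float('inf'), 1
    for i in range(1, len(symbols)):
        top_sum = sum(p for _, p in symbols[:i])
        diff = abs(2 * top_sum - total)          # how unequal the two groups would be
        if diff < best_diff:
            best_diff, split = diff, i
    codes = {}
    for s, c in shannon_fano(symbols[:split]).items():
        codes[s] = '0' + c                        # group above the line gets 0
    for s, c in shannon_fano(symbols[split:]).items():
        codes[s] = '1' + c                        # group below the line gets 1
    return codes

probs = {'S0': 0.2, 'S1': 0.2, 'S2': 0.2, 'S3': 0.2, 'S4': 0.2}
code = shannon_fano(sorted(probs.items(), key=lambda kv: -kv[1]))
H = sum(p * math.log2(1 / p) for p in probs.values())
N_bar = sum(probs[s] * len(w) for s, w in code.items())
print(code, H, N_bar, H / N_bar)   # lengths {2,2,2,3,3}, efficiency ~0.967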

3. The Discrete Memoryless Channel (DMC)

Channel representation: A channel is a statistical model with an input X and an output Y. The channel is said to be discrete when the alphabets of X and Y are both finite. It is said to be memoryless when the current output depends only on the current input and not on any of the previous inputs.

The channel matrix: A channel is specified by the set of transition probabilities P(yj/xi).

Each row of the channel matrix sums to unity, i.e. Σj P(yj/xi) = 1 for all i.

The input and output probabilities are represented by row matrices:

Input matrix:  [P(X)] = [P(x1), P(x2), P(x3), ..., P(xm)]

Output matrix: [P(Y)] = [P(y1), P(y2), P(y3), ..., P(yn)]

 The output probabilities are related to the input probabilities by

[P(Y)] = [P(X)] [P(Y/X)]

The joint probability matrix is represented by

[P(X,Y)] = [P(Xd)] [P(Y/X)]

where [P(Y/X)] is the channel matrix

 

and [P(Xd)] is the diagonal matrix formed from the input probabilities.

3.1 Types of Channels: Channel types are classified according to the channel transition matrix (channel matrix):

                                                1) Lossless channel

                                                2) Deterministic channel

                                                3) Noiseless channel

                                                4) Binary symmetric channel

1) Lossless channel: A lossless channel is one whose channel matrix has only one non-zero element in each column. In a lossless channel no source information is lost in transmission.

Ex: for the channel matrix

[P(Y/X)] = [ 3/4   1/4    0      0      0
               0      0    2/3   1/3    0
               0      0     0      0      1 ]

[P(Y)] = [P(X)] [P(Y/X)] gives

P(y1) = (3/4) P(x1)                                                                        P(y3) = (2/3) P(x2)

P(y2) = (1/4) P(x1)                                                                        P(y4) = (1/3) P(x2)

P(y5) = P(x3)

2) Deterministic channel: A deterministic channel is one whose channel matrix has only one non-zero element in each row, and each row sum is unity. In a deterministic channel a given source symbol determines the received symbol. For example, for a channel with five inputs and three outputs, [P(Y)] = [P(X)] [P(Y/X)] gives

       P(y1) = P(x1) + P(x2)

        P(y2) = P(x3) + P(x4)

 P(y3) = P(x5)

3) Noiseless channel: A noiseless channel obeys the properties of both the lossless and the deterministic channels; each column and each row has exactly one non-zero element, and that element is unity. Therefore the channel matrix of a noiseless channel is a square identity matrix.

4) Binary symmetric channel: A binary symmetric channel has two inputs and two outputs. The channel is symmetric because the probability of receiving 1 when 0 is sent is the same as the probability of receiving 0 when 1 is sent. This common transition probability is denoted by P and the other probability is 1 - P, so the channel matrix is

[P(Y/X)] = [ 1-P    P
               P    1-P ]

[P(Y)] = [P(X)] [P(Y/X)] gives

P(y1) = (1 - P) P(x1) + P P(x2)

P(y2) = P P(x1) + (1 - P) P(x2)
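A short Python/numpy sketch of these matrix relations for the binary symmetric channel; the value P = 0.1 is chosen only for illustration.

import numpy as np

P = 0.1                                # illustrative transition probability
PYX = np.array([[1 - P, P],
                [P, 1 - P]])           # channel matrix [P(Y/X)]
PX = np.array([0.5, 0.5])              # input probabilities [P(X)]

PY = PX @ PYX                          # [P(Y)] = [P(X)] [P(Y/X)]
PXY = np.diag(PX) @ PYX                # [P(X,Y)] = [P(Xd)] [P(Y/X)]
print(PY)                              # [0.5 0.5]
print(PXY)                             # [[0.45 0.05]
                                       #  [0.05 0.45]]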

1) Given the binary channel shown below: i) find the channel matrix; ii) find P(y1), P(y2) when P(x1) = P(x2) = 0.5; iii) find the joint probability matrix when P(x1) = P(x2) = 0.5.

2) Given the cascaded binary channel shown below: i) find the overall channel matrix; ii) find P(z1), P(z2) when P(x1) = P(x2) = 0.5; iii) find the joint probability matrix when P(x1) = P(x2) = 0.5.

i) The overall channel matrix is  [P(Z/X)] = [P(Y/X)] [P(Z/Y)]

ii) P(z1), P(z2) when P(x1) = P(x2) = 0.5:   [P(Z)] = [P(X)] [P(Z/X)]

iii) Joint probability matrix when P(x1) = P(x2) = 0.5:   [P(X,Z)] = [P(Xd)] [P(Z/X)]

3) Given the channel shown below (a binary erasure channel): i) find the channel matrix; ii) find P(y1), P(y2), P(y3) when P(x1) = P(x2) = 0.5 and P = 0.5.

i) The channel matrix of the binary erasure channel with erasure probability P is

[P(Y/X)] = [ 1-P    P     0
               0     P    1-P ]

ii) With P(x1) = P(x2) = 0.5 and P = 0.5:

P(y1) = 0.25,    P(y2) = 0.5,    P(y3) = 0.25

3.2. Information Transmission on Discrete Channels

The conditional and joint entropies are calculated using the conditional and joint probabilities for the channel. The following are the various entropy functions for a channel with inputs and outputs:

H(X) = Σi P(xi) log2 (1/P(xi))   — the average information (entropy) of the channel input

H(Y) = Σj P(yj) log2 (1/P(yj))   — the average information (entropy) of the channel output

H(Y/X) = Σi Σj P(xi, yj) log2 (1/P(yj/xi))   — the conditional entropy of the output given the input

H(X,Y) = Σi Σj P(xi, yj) log2 (1/P(xi, yj))   — the joint entropy of the input and output

4) For the channel shown below: i) find the channel matrix; ii) find P(y1), P(y2), P(y3) when P(x1) = 0.3, P(x2) = 0.4 and P(x3) = 0.3; iii) find all the entropies H(X), H(Y), H(Y/X), H(X,Y).

i) The channel matrix [P(Y/X)] is read from the figure, and

[P(Y)] = [P(X)] [P(Y/X)]

 

P(y1) = 0.24                                        P(y2) = 0.55                                        P(y3) = 0.21

 

H(X) = Σ P(xi) log2 (1/P(xi)) = 0.3 log2 (1/0.3) + 0.4 log2 (1/0.4) + 0.3 log2 (1/0.3) = 1.571 bits

 

H(Y) = Σ P(yj) log2 (1/P(yj)) = 0.24 log2 (1/0.24) + 0.55 log2 (1/0.55) + 0.21 log2 (1/0.21) = 1.441 bits

 

H(Y/X) = Σi Σj P(xi, yj) log2 (1/P(yj/xi))

  = P(x1,y1) log2 (1/P(y1/x1)) + P(x1,y2) log2 (1/P(y2/x1)) + P(x1,y3) log2 (1/P(y3/x1)) + P(x2,y1) log2 (1/P(y1/x2)) + P(x2,y2) log2 (1/P(y2/x2)) + P(x2,y3) log2 (1/P(y3/x2)) + P(x3,y1) log2 (1/P(y1/x3)) + P(x3,y2) log2 (1/P(y2/x3)) + P(x3,y3) log2 (1/P(y3/x3))

H(X,Y) = Σi Σj P(xi, yj) log2 (1/P(xi, yj))

  = P(x1,y1) log2 (1/P(x1,y1)) + P(x1,y2) log2 (1/P(x1,y2)) + ... + P(x3,y3) log2 (1/P(x3,y3)) = H(X) + H(Y/X)
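A Python sketch of these entropy calculations. Because the original channel-matrix figure is not available, the matrix below is a hypothetical one chosen only so that the output probabilities come out to the stated values (0.24, 0.55, 0.21); it should not be taken as the matrix from the original figure.

import numpy as np

def entropies(PX, PYX):
    """Return H(X), H(Y), H(Y/X), H(X,Y) for a DMC with input PX and channel matrix PYX."""
    PXY = np.diag(PX) @ PYX                      # joint matrix P(X,Y)
    PY = PX @ PYX                                # output probabilities
    def H(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return float(np.sum(p * np.log2(1.0 / p)))
    HX, HY, HXY = H(PX), H(PY), H(PXY.ravel())
    HY_given_X = HXY - HX                        # chain rule: H(X,Y) = H(X) + H(Y/X)
    return HX, HY, HY_given_X, HXY

PX = np.array([0.3, 0.4, 0.3])
PYX = np.array([[0.8, 0.2, 0.0],                 # hypothetical channel matrix (assumption)
                [0.0, 1.0, 0.0],
                [0.0, 0.3, 0.7]])
print(entropies(PX, PYX))                        # H(X) ~ 1.571, H(Y) ~ 1.441, ...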

3.3 Mutual Information:

It is defined as a measure of the information transfer across the channel; noise and other transmission impairments alter the source information.

·         P(xi) is the input probability for integer values of i

·         P(yj) is the output probability for integer values of j

·         P(xi, yj) is the joint probability, that is the probability that yj is received and xi is transmitted

·         P(yj/xi) is the conditional probability, that is the probability that yj is received given that xi was transmitted

Mutual information:                  I(xi ; yj) = log2 [ P(xi, yj) / (P(xi) P(yj)) ]  bits

 

and its average over all input-output pairs is   I(X,Y) = Σi Σj P(xi, yj) log2 [ P(xi, yj) / (P(xi) P(yj)) ]  bits

 If the channel is ideal (noiseless), then for the matching pair P(xi, yj) = P(xi) = P(yj) and

                                                                P(yj/xi) = 1

Therefore        I(xi ; yj) = log2 (1/P(xi))  bits ------------ the self information

Thus when the channel is ideal (noiseless), the mutual information reduces to the self information.

If the channel is completely noisy, X and Y are independent,

so  P(xi, yj) = P(xi) P(yj)

        and  I(X,Y) = log2 1 = 0  bits

Properties of mutual information:

1)      Mutual information is a non-negative quantity, i.e. I(X,Y) ≥ 0

2)      Mutual information is symmetric, i.e. I(X,Y) = I(Y,X)

3)      Mutual information I(X,Y) = H(Y) - H(Y/X)

4)      Mutual information I(X,Y) = H(X) - H(X/Y)

5)      Mutual information I(X,Y) = H(Y) + H(X) - H(X,Y)

 

1) Show that the mutual information of the binary symmetric channel is

    I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P),  when P(x1) = α and P(x2) = 1 - α.

P(x1) = α,     P(x2) = 1 - α

[P(Y)] = [P(X)] [P(Y/X)]

P(y1) = (1 - P) P(x1) + P P(x2) = (1 - P) α + P (1 - α)

P(y2) = P P(x1) + (1 - P) P(x2) = P α + (1 - P) (1 - α)

[P(X,Y)] = [P(Xd)] [P(Y/X)] = [ α(1-P)        αP
                                 (1-α)P    (1-α)(1-P) ]

I(X,Y) = H(Y) - H(Y/X)

H(Y) = Σ P(yj) log2 (1/P(yj))

H(Y/X) = Σi Σj P(xi, yj) log2 (1/P(yj/xi))

  = P(x1,y1) log2 (1/(1-P)) + P(x1,y2) log2 (1/P) + P(x2,y1) log2 (1/P) + P(x2,y2) log2 (1/(1-P))

       = α(1-P) log2 (1/(1-P)) + αP log2 (1/P) + (1-α)P log2 (1/P) + (1-α)(1-P) log2 (1/(1-P))

       = (1-P) log2 (1/(1-P)) + P log2 (1/P)          (the α terms add up to 1)

H(Y/X) = - P log2 P - (1-P) log2 (1-P)

I(X,Y) = H(Y) - H(Y/X)

 I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P)

                                                     

 

 If α = 0.5 and P = 0.1, then H(Y) = 1 bit (the outputs are equiprobable) and

                             I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P)

                                         = 1 - 0.469 = 0.531 bits

 

If α = 0.5 and P = 0.5, then

                            I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P)

                                       = 1 - 1 = 0
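A short Python sketch of this result; Hb and bsc_mutual_info are names we introduce for illustration.

import math

def Hb(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_info(alpha, p):
    """I(X;Y) = H(Y) - H(Y/X) for a BSC with P(x1) = alpha and transition probability p."""
    py1 = (1 - p) * alpha + p * (1 - alpha)
    return Hb(py1) - Hb(p)             # for the BSC, H(Y/X) = Hb(p)

print(bsc_mutual_info(0.5, 0.1))       # ~0.531 bits
print(bsc_mutual_info(0.5, 0.5))       # 0.0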

3.4. Channel Capacity:

The channel capacity of a discrete memoryless channel is defined in terms of the mutual information of the channel: it is the maximum of I(X,Y) over all possible input probability distributions P(xi).

 

Cs = max over P(xi) of I(X,Y)   bits / symbol

C = r Cs   bits / sec

 

Channel Capacity of Special channels:

1) Lossless channel: For a lossless channel the conditional entropy H(X/Y) = 0, therefore the mutual information is  I(X,Y) = H(X) - H(X/Y)

                                 I(X,Y) = H(X)

  Thus the mutual information (information transfer) is equal to the input (source) entropy, and no source information is lost in transmission.

Cs = max I(X,Y)   bits / symbol

                                                   Cs = max H(X)

                                                    Cs = log2 M,   where M is the number of input symbols.

2) Deterministic channel: For a deterministic channel the conditional entropy H(Y/X) = 0, therefore the mutual information is

                                             I(X,Y) = H(Y) - H(Y/X)

                                             I(X,Y) = H(Y)

 

Thus the mutual information (information transfer) is equal to the output (destination) entropy, and each source symbol is completely determined at the destination.

Cs = max I(X,Y)   bits / symbol

                                                   Cs = max H(Y)

                                                    Cs = log2 N,   where N is the number of output symbols.

3) Noiseless channel: The noiseless channel obeys the properties of both the lossless channel and the deterministic channel, therefore the conditional entropies H(X/Y) = H(Y/X) = 0 and

                                             I(X,Y) = H(X) = H(Y)

Cs = max I(X,Y)   bits / symbol

                                                   Cs = max H(X) = max H(Y)

Cs = log2 M = log2 N

4) Binary symmetric channel: For the binary symmetric channel with P(x1) = α and P(x2) = 1 - α, the mutual information is I(X,Y) = H(Y) - H(Y/X).

Cs = max I(X,Y)   bits / symbol

Cs = max [ H(Y) - H(Y/X) ] = max H(Y) + P log2 P + (1 - P) log2 (1 - P)

                                        

Cs = 1 + P log2 P + (1 - P) log2 (1 - P)

 

 

The mutual information depends on α and on the channel transition probability P; the maximum H(Y) = 1 is obtained at α = 0.5, so the capacity depends only on P.

1) Find the channel capacity of the binary symmetric channel

    when P(x1) = α and P(x2) = 1 - α, for P = 0.1 and P = 0.5.

Following the derivation given in Section 3.3,

I(X,Y) = H(Y) + P log2 P + (1 - P) log2 (1 - P)

 

Channel capacity:

Cs = max I(X,Y)   bits / symbol

     Cs = 1 + P log2 P + (1 - P) log2 (1 - P),   attained at α = 0.5.

                             If P = 0.1:

                                        Cs = 1 - 0.469 = 0.531 bits / symbol

 

                            If P = 0.5:

                                     Cs = 1 - 1 = 0

3.5 Alternative method to calculate the channel capacity:

For a square channel matrix, solve the linear equations  Σj P(yj/xi) Qj = Σj P(yj/xi) log2 P(yj/xi)  (one equation for each input xi) for the variables Q1, Q2, ..., and then the channel capacity is

Cs = log2 ( 2^Q1 + 2^Q2 + ... )   bits/symbol

 

1) Find the mutual information and the channel capacity of the channel with matrix [P(Y/X)] = [0.8  0.2 ; 0.3  0.7], when P(x1) = 0.6 and P(x2) = 0.4.

[P(Y)] = [P(X)] [P(Y/X)]  gives  P(y1) = 0.6 and P(y2) = 0.4

H(Y) = Σ P(yj) log2 (1/P(yj)) = 0.6 log2 (1/0.6) + 0.4 log2 (1/0.4) = 0.97 bits

H(Y/X) = Σi Σj P(xi, yj) log2 (1/P(yj/xi))

  = P(x1,y1) log2 (1/0.8) + P(x1,y2) log2 (1/0.2) + P(x2,y1) log2 (1/0.3) + P(x2,y2) log2 (1/0.7)

= 0.48 log2 (1/0.8) + 0.12 log2 (1/0.2) + 0.12 log2 (1/0.3) + 0.28 log2 (1/0.7) = 0.78 bits

     I(X,Y) = H(Y) - H(Y/X)

                                               I(X,Y) = 0.97 - 0.78 = 0.19 bits/symbol

 

Alternative method to calculate the channel capacity:

0.8 Q1 + 0.2 Q2 = 0.8 log2 0.8 + 0.2 log2 0.2 = -0.72

0.3 Q1 + 0.7 Q2 = 0.3 log2 0.3 + 0.7 log2 0.7 = -0.88

Therefore

   Q1 = -0.656

 Q2 = -0.976

 

Channel capacity  Cs = log2 ( 2^Q1 + 2^Q2 )  bits/symbol

                               Cs = log2 ( 2^-0.656 + 2^-0.976 ) = 0.192 bits/symbol
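A Python/numpy sketch of this alternative method applied to the same channel matrix; the variable names are ours.

import numpy as np

P = np.array([[0.8, 0.2],
              [0.3, 0.7]])                 # channel matrix from the example above

# Solve  sum_j P[i][j] * Q[j] = sum_j P[i][j] * log2(P[i][j])  for Q
rhs = np.sum(P * np.log2(P), axis=1)       # approximately [-0.72, -0.88]
Q = np.linalg.solve(P, rhs)                # approximately [-0.66, -0.98]
Cs = np.log2(np.sum(2.0 ** Q))             # ~0.192 bits/symbol
print(Q, Cs)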

2) Find the channel capacity of the channel shown below.

Solving the same pair of equations for this channel gives

   Q1 = -0.966

 Q2 = -0.966

 

Channel capacity  Cs = log2 ( 2^Q1 + 2^Q2 )  bits/symbol

                               Cs = log2 ( 2^-0.966 + 2^-0.966 ) = 1 - 0.966 = 0.034 bits/symbol

3) Find the channel capacity of the channel shown below, which has three outputs; in that case

Channel capacity  Cs = log2 ( 2^Q1 + 2^Q2 + 2^Q3 )  bits/symbol

4. Channel Coding:

 

The simplified block diagram of a communication system depicted in Fig. shows the position of the source encoder/decoder and the channel encoder/decoder.

 


The purpose of source coding is to reduce the number of data bits that have to be transmitted over
 the channel to send the message information to the receiver, while the objective of channel coding is to make error correction at the receiver efficient so that the error probability can be decreased.

 

Shannon-Hartley channel capacity theorem

 

            The limits of the channel capacity as S/(No B) tends to infinity or to zero are examined later in this section. In order to appreciate an implication of this formula, suppose an ideal situation in which the data transmission rate R reaches its upper bound, the channel capacity C. The relationship between the bandwidth efficiency R/B and Eb/No in dB = 10 log10(Eb/No) for such an ideal system can be obtained by substituting C = R into the capacity formula. In information theory, the Shannon-Hartley theorem gives the maximum rate at which information can be transmitted over a communications channel of a specified bandwidth in the presence of noise. It is an application of the noisy-channel coding theorem to the archetypal case of a continuous-time analog communications channel subject to Gaussian noise. The theorem establishes the channel capacity for such a communication link, a bound on the maximum amount of error-free information per unit time that can be transmitted with a specified bandwidth in the presence of noise interference, assuming that the signal power is bounded and that the Gaussian noise process is characterized by a known power or power spectral density. The law is named after Claude Shannon and Ralph Hartley.

4.1 Statement:

            The Shannon-Hartley theorem states the channel capacity C, meaning the theoretical tightest upper bound on the rate at which information can be communicated at an arbitrarily low error rate using an average received signal power S through an analog channel subject to additive white Gaussian noise of power N:

C = B log2 (1 + S/N)

            Where

·         C  is the channel capacity in bits per second

·         B  is the bandwidth of the channel in hertz

·         S  is the average received signal power over the bandwidth, measured in watts

·         N  is the average power of the noise and interference over the bandwidth, measured in watts

·         S/N  is the signal-to-noise ratio (SNR)

Consider a channel that is AWGN, i.e.

  Additive:          the noise is added to the signal, Y = S + N

 White noise:      the power spectral density of the noise is uniform over the entire range of
                              frequencies

 Gaussian:          the amplitude of the noise follows a Gaussian distribution

and the signal is power limited and band limited.

In order to calculate the channel capacity, first calculate the mutual information of the channel; the channel capacity is the maximum of the mutual information over the probability density function of the input:

C = Max {I(X,Y)}

                  = Max { H(Y) - H(Y/X) }

The channel is a random process, the signals applied to and received from the channel are random variables, and the signals are continuous.

 

H(Y/X) = ∫∫ P(x,y) log2 (1/P(y/x)) dx dy

                      From Bayes' theorem,     P(x,y) = P(y/x) P(x)

 

                                                                      = ∫∫ P(x) P(y/x) log2 (1/P(y/x)) dx dy

                                                                      = ∫ P(x) dx ∫ P(y/x) log2 (1/P(y/x)) dy

We know ∫ P(x) dx = 1, and since the output is the input plus noise we can change the variable of integration:

                                                         P(y/x) = P(y - x) = PN(n)

               with  n = y - x,  dy = dn,

H(Y/X) = ∫ PN(n) log2 (1/PN(n)) dn = H(N)

C = Max [I(X,Y)] = Max [H(Y) - H(Y/X)] = Max [H(Y)] - H(N)

The signal power is the mean square value of the signal = S

The noise power is the mean square value of the noise = N

                                                   The output power = S + N

For a given power, the (differential) entropy is maximized by a Gaussian distribution, so

Max [H(Y)] = (1/2) log2 (2πe(S+N))

H(N) = (1/2) log2 (2πeN)

Therefore Max [I(X,Y)] = (1/2) log2 (2πe(S+N)) - (1/2) log2 (2πeN)

                                          = (1/2) log2 ((S+N)/N)

                   Max [I(X,Y)] = (1/2) log2 (1 + S/N)

Consider a Gaussian noise channel that is band limited to a bandwidth of B Hz; the signal is sampled at the Nyquist rate

fs = 2 fm,   where fm = B,   so   fs = 2B samples/sec

Channel capacity  C = fs x Max [I(X,Y)] = 2B x (1/2) log2 (1 + S/N)

C = B log2 (1 + S/N)   bits/sec

Shannon–Hartley theorem tells the maximum rate at which information can be transmitted over a communications channel of a specified bandwidth in the presence of noise. It is an application of the noisy-channel coding theorem to the archetypal case of a continuous-time analog communications channel subject to Gaussian noise.

C = B log2 (1 + S/N)

R ≤ C

R ≤ B log2 (1 + S/N)

so   S/N ≥ 2^(R/B) - 1.   With the noise power N = No B (No is the noise power spectral density) and the signal power S = Eb R (Eb is the energy per bit), this becomes

 

   Eb/No ≥ (B/R)(2^(R/B) - 1)

If the rate of information R is fixed and No is fixed, the required Eb/No can be analysed as a function of the bandwidth efficiency R/B:

Case I:   If B/R = 0.1, then R/B = 10 and

                   Eb/No = 0.1(2^10 - 1) = 102.3 ≈ 100 = 20 dB

Case II:  If B/R = 1, then R/B = 1 and

                   Eb/No = 1(2^1 - 1) = 1 = 0 dB

Case III: If B → ∞, what is the required Eb/No?

                      Using C = B log2 (1 + S/(No B)) and the limit (1 + x)^(1/x) → e as x → 0,

                   Eb/No → ln 2 = 0.693 = -1.6 dB

This value, -1.6 dB, is the Shannon limit.
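A short Python sketch of the capacity formula and the three cases above; the 3 kHz bandwidth and 30 dB SNR in the first example are illustrative values chosen here, not figures from the notes.

import math

def shannon_capacity(B, snr):
    """C = B * log2(1 + S/N) in bits/sec."""
    return B * math.log2(1 + snr)

B = 3000.0                              # illustrative: 3 kHz channel
snr = 10 ** (30 / 10)                   # illustrative: 30 dB -> 1000
print(shannon_capacity(B, snr))         # ~29.9 kbit/s

# Minimum Eb/No at a given bandwidth efficiency R/B, from Eb/No >= (B/R)(2^(R/B) - 1)
for rb in (10, 1, 0.001):
    ebno = (2 ** rb - 1) / rb
    print(rb, 10 * math.log10(ebno))    # ~20.1 dB, 0 dB, approaching -1.59 dB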

4.2 Why use Error-Correction coding?

 

                Error-correction coding can be regarded as a vehicle for effecting various system trade-offs. The figure below compares two curves depicting bit-error performance versus Eb/No: one curve represents a typical modulation scheme without coding, the other the same scheme with channel coding. Described below are four benefits or trade-offs that can be achieved with the use of channel coding.

 

Trade-Off1: Error performance versus Bandwidth

                                        A simple, inexpensive voice communication system has just been developed and delivered to a customer. The system does not use error-correction coding. Consider that the operating point of the system can be depicted by point A in the figure (Eb/No = 8 dB and PB = 10^-2). After a few trials, there are complaints about the voice quality; the customer suggests that the bit-error probability should be lowered to 10^-4. The usual way of obtaining better error performance in such a system would be to move the operating point from point A to, say, point B in the figure, but that requires a larger Eb/No (more power). The figure suggests that one possible trade-off is instead to move the operating point from point A to point C. That is, "walking" down the vertical line to point C on the coded curve can provide the customer with improved error performance at the same Eb/No, at the cost of the extra bandwidth needed by the code.

 Trade-Off2: Power versus Bandwidth

Consider that the system without coding, operating at point D in figure ( Eb / No=14 dB, and PB=10-6), has been delivered to a customer. The customer has no complaints about the quality of the data, but the equipment is having some reliability problems as a result of providing an Eb / No of 14 dB. In other words, the equipment keeps breaking down. If the requirement on Eb / No or power could be reduced, the reliability difficulties might also be reduced. The figure suggests  a trade-off by moving the operating point from point D to point E. Thus, the trade-off is one which the same quality of data is achieved, but the coding allows for a reduction in power or Eb / No .

Trade-Off3: Data Rate versus Bandwidth

                                  Consider that a system without coding, operating at point D in figure  ( Eb / No=14 dB, and PB=10-6) has been developed. Assume that there is no problem with the data quality and no particular need to reduce power. If the customer’s data rate requirement increases,

                                                                        Eb / No  = Pr / No (1/R)

     The above expression shows that the received Eb/No would decrease, and in the figure the operating point would move upwards from point D to some point F. Now, envision "walking" down the vertical line to point E on the curve that represents coded modulation. Increasing the data rate has degraded the quality of the data, but here we observe that coding restores the same quality at the same power level.

Trade-Off4: Capacity versus Bandwidth

                        Trade-off 4 is similar to trade-off 3 because both achieve increased capacity. A spread-spectrum multiple-access technique called code-division multiple access (CDMA) is one of the schemes used in cellular telephony. In CDMA, where users simultaneously share the same spectrum, each user is an interferer to each of the other users in the same cell or in nearby cells. Hence the capacity (maximum number of users) per cell is inversely proportional to Eb/No. In this application, a lowered Eb/No results in a raised capacity: the code achieves a reduction in each user's power, which in turn allows for an increase in the number of users. Again the cost is more bandwidth. But in this case the signal bandwidth expansion due to the error-correcting code is small compared with the much larger spread-spectrum expansion, so there is no significant impact on the transmission bandwidth.

                 In each of the above trade-off examples, a "traditional" code involving redundant bits and faster signaling (for a real-time communication system) has been assumed; hence, in each case, the cost was expanded bandwidth. However, there exists an error-correcting technique, called trellis-coded modulation, that does not require faster signaling or expanded bandwidth for real-time systems.

4.3 Types of Channel Coding

 

In order to achieve reliable communication through a channel corrupted by a noise, we might have to make the codewords different from each other conspicuously enough to reduce the probability of each symbol to be taken for another symbol. The conversion of the message data with this aim is called channel coding. It may accompany some side effects such as a decline of data transmission rate, an increase of required channel bandwidth, and an increase of complexity in the encoder/decoder. 

     In this section, we will discuss waveform coding, which converts the data into distinguishable waveforms, and structured coding, which adds some redundant/extra parity bits for detection and/or correction of transmission errors. Structured coding is divided into two schemes: one is block coding, which converts the source data independently of the previous data, and the other is convolutional coding, which converts the data depending on the previous data.

 

Waveform Coding

There are various methods of waveform coding such as antipodal signaling, orthogonal signaling, bi-orthogonal signaling, etc. Among these waveform codings, orthogonal signaling can be regarded as converting one bit of data, or two bits of data, according to its/their value(s) in the following way:

 

 

 

 

 

 


This conversion procedure is generalized for K  bits of data into the form of a Hadamard matrix

Here, we define the crosscorrelation between codeword i and codeword j as

 

z(i, j) = (1/N) Σk xi(k) xj(k),   where N is the codeword length and xi(k) = ±1 is the bipolar form of the k-th bit of codeword i.

This can be computed by changing every 0 (of both codewords) into -1 and dividing the inner product of the two bipolar codeword vectors by the codeword length. According to this definition, the crosscorrelation between any two different codewords turns out to be zero, implying orthogonality among the codewords and their corresponding signal waveforms. On the other hand, bi-orthogonal signaling can be regarded as converting two bits of data and, more generally, K bits of data in the following way:

 

 

 

 

 

 

 


Since the number of columns of the codeword matrix is the number of bits per symbol, the number of bits spent for transmission with bi-orthogonal signaling is half of that for orthogonal signaling, and the required channel bandwidth is also half of that for orthogonal signaling, while still showing comparable BER performance. However, the codeword lengths of both signaling (coding) methods increase geometrically as the number K of bits per symbol increases; consequently, the data transmission rate R or the bandwidth B will suffer, resulting in a degradation of the bandwidth efficiency R/B.
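A small Python/numpy sketch of the Hadamard construction for orthogonal signaling and the crosscorrelation check described above; the function name hadamard is our own.

import numpy as np

def hadamard(K):
    """Codeword matrix for orthogonal signaling of K data bits: 2^K codewords of length 2^K,
    built by the recursion H_k = [[H_{k-1}, H_{k-1}], [H_{k-1}, complement of H_{k-1}]]."""
    H = np.array([[0]])
    for _ in range(K):
        H = np.block([[H, H], [H, 1 - H]])
    return H

H = hadamard(3)                          # 8 codewords of length 8
bipolar = 2 * H - 1                      # map 0 -> -1, 1 -> +1
Z = bipolar @ bipolar.T / H.shape[1]     # crosscorrelation matrix
print(Z)                                 # identity matrix: zero crosscorrelation between
                                         # any two different codewords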

 

5.  Linear Block Coding

Block coding is a mapping of k-bit message vectors (symbols) into n-bit (n ≥ k) code vectors by using an (n,k) block code which consists of 2^k codewords of length n. A block code is said to be linear if the modulo-2 sum/difference of any two codewords in the block code is another codeword belonging to the block code. A linear block code is represented by its k x n generator matrix G, which is modulo-2 premultiplied by a k-bit message vector m to yield an n-bit codeword U as

U = mG

Accordingly, the encoder and decoder in a linear block coding scheme use matrix multiplications with the generator/parity-check matrices instead of using table lookup methods with the whole 2^k x n codeword matrix. This makes the encoding/decoding processes simple and efficient. Now, we define the minimum (Hamming) distance of a linear block code as the minimum of the Hamming distances between two different codewords:

 

d min = minimum Hamming distance between any two different codewords

Here, the Hamming distance between two different codewords Ui and Uj is the number of bit positions in which they differ, and it can be found as the weight of the modulo-2 sum/difference of the two codewords, where the Hamming weight of a codeword is defined to be the number of non-zero bits. The error-detecting and error-correcting capabilities are defined to be the maximum numbers of detectable/correctable bit errors, respectively, and they can be obtained from the minimum distance as follows:

Error-detecting capability:  d min - 1

Error-correcting capability:  t = floor((d min - 1)/2)

Generator matrix and the corresponding parity-check matrix:

A k x n generator matrix G representing an (n,k) block code can be constructed in the systematic form

 

G = [ P | Ik ]

 

where P is a k x (n-k) parity submatrix and Ik is the k x k identity matrix. With this generator matrix, the n-bit code vector (codeword) U for a k-bit message (source or information) vector m is generated as

 

U = mG = [ mP | m ]

which consists of the first n-k parity bits and the last k message bits in a systematic structure. Note that if a block code is represented by a generator matrix containing an identity matrix, then the message vector appears (without being altered) in the code vector, and the code is said to be systematic. Corresponding to this generator matrix G, the parity-check matrix H to be used for decoding the received signal vector is defined as

 

Parity-check matrix:   H = [ In-k | P^T ],   which satisfies  G H^T = 0.

For a received signal vector r = U + e containing an error vector e, the decoder at the receiver computes

S = r H^T = e H^T

which is called a syndrome vector, for the reason that it carries some (distinct) information about the possible symbol error, like a pathological syndrome or symptom of a disease.

 

EX 1) Generate a (6,3) linear block code with a given generator matrix.

G is the generator matrix having length n = 6 and k = 3, and the number of parity bits is n - k = 3.

Ik is the k x k identity matrix, here I3.

The parity vectors are selected from the set of 2^(n-k) vectors; in this example n = 6, k = 3,

so the set has 2^(6-3) = 2^3 = 8 vectors.

Parity vectors: the all-zero vector 000 and the identity (weight-1) vectors are excluded, and the all-one vector 111 is given least priority, leaving 110, 011 and 101 as the rows of the parity submatrix P.

The generator matrix is then G = [P | I3]  (one consistent choice is shown in the sketch after this example).

 

Codeword U = mG

 

The message vectors number 2^k = 2^3 = 8, while the possible 6-bit words number 2^n = 2^6 = 64; only 8 of them are valid codewords, because there are only 8 possible message vectors.

 

Syndrome test:  S = r H^T.  If the received vector is r = 1 1 0 0 1 1, then the syndrome is

 S = [0   0   0], so no errors are detected in the received vector.

Where H is the parity-check matrix.
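Because the original generator-matrix figure is not reproduced here, the Python sketch below assumes one parity submatrix P built from the rows 110, 011, 101 that is consistent with the syndrome check above (r = 110011 giving S = 000); it is an illustrative choice, not necessarily the exact matrix of the original notes.

import numpy as np

P = np.array([[1, 1, 0],                                   # assumed parity submatrix
              [0, 1, 1],
              [1, 0, 1]])
k, n = 3, 6
G = np.hstack([P, np.eye(k, dtype=int)])                   # G = [P | Ik], size k x n
H = np.hstack([np.eye(n - k, dtype=int), P.T])             # H = [In-k | P^T]

# All 2^k codewords U = mG (modulo 2)
for i in range(2 ** k):
    m = np.array([(i >> b) & 1 for b in range(k - 1, -1, -1)])
    print(m, (m @ G) % 2)

# Syndrome test for the received vector r = 1 1 0 0 1 1
r = np.array([1, 1, 0, 0, 1, 1])
print('syndrome:', (r @ H.T) % 2)                          # [0 0 0] -> no detected error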

 EX 2) Generate a (7,4) linear block code with a given generator matrix.

 

   Find the codewords and the minimum distance of the (7,4) linear block code represented by the

   generator matrix.

 

Minimum distance d min = 3.

Because d min = 3,

 

Error detection capability = d min - 1 = 3 - 1 = 2

Error correction capability = floor((d min - 1)/2) = floor(2/2) = 1

            Therefore the (7,4) code is a Hamming code with single-bit error correction capability.

Standard Array:

The standard array of an (n,k) code arranges all 2^n possible received vectors in a table whose first row contains the 2^k codewords and whose first column contains the correctable error patterns (the coset leaders); a received vector is decoded to the codeword at the head of the column in which it appears.

6. Cyclic coding

            A cyclic code is a linear block code having the property that a cyclic shift (rotation) of any codeword yields another codeword. Due to this additional property, the encoding and decoding processes can be implemented very efficiently using feedback shift registers. An (n,k) cyclic code can be described by an (n-k)th-degree generator polynomial

 

g(x) = 1 + g1 x + g2 x^2 + ... + g(n-k-1) x^(n-k-1) + x^(n-k)

 

The procedure for encoding a k-bit message vector m = [m0, m1, m2, ..., m(k-1)], represented by the (k-1)th-degree polynomial

 

m(x) = m0 + m1 x + m2 x^2 + ... + m(k-1) x^(k-1),

 

into an n-bit codeword represented by an (n-1)th-degree polynomial c(x) is as follows:

 

1. Divide x^(n-k) m(x) by the generator polynomial g(x) to get the remainder polynomial p(x).

2. Subtract (modulo-2, i.e. add) the remainder polynomial p(x) from x^(n-k) m(x) to obtain the codeword polynomial

 

c(x) = x^(n-k) m(x) + p(x)

 

which has the generator polynomial g(x) as a (multiplying) factor. Then the first n-k coefficients constitute the parity vector and the remaining k coefficients make up the message vector. Note that all the operations involved in the polynomial multiplication, division, addition and subtraction are not the ordinary arithmetic ones but modulo-2 operations.

 

 

Ex): For a (7,4) cyclic code represented by its generator polynomial g(x),

 

find the codeword for the message vector m = [1 0 1 1].

 

Noting that n = 7, k = 4 and n - k = 3, we divide x^3 m(x) by g(x)

 

to get the remainder polynomial p(x) and add it to x^3 m(x) to make the codeword polynomial

 

c(x) = x^3 m(x) + p(x).

 

The codeword made in this way has the n - k = 3 parity bits and the k = 4 message bits.

Now, let us consider the procedure for decoding a cyclic coded vector. Suppose the receiver has received a possibly corrupted code vector r = c + e, where c is a codeword and e is an error vector. Just as in the encoder, this received vector, regarded as a polynomial r(x), is divided by the generator polynomial g(x)

 

to obtain the remainder polynomial s(x). This remainder polynomial s may not be the same as the error vector e, but at least it is supposed to carry crucial information about e and therefore may well be called the syndrome.

The receiver will find the error pattern e corresponding to the syndrome s, subtract it from the received vector r to get (hopefully) the correct codeword

 

c = r + e  (modulo-2),

and accept only the last k bits (in ascending order of degree) of this corrected codeword as the message.

The polynomial operations involved in encoding/decoding every block of the message/coded sequence might seem to be an unbearable computational load. Fortunately, however, there are divider circuits which can perform such modulo-2 polynomial operations. The figure illustrates two divider circuits (consisting of linear feedback shift registers), each of which carries out the modulo-2 polynomial operations for encoding/decoding with the cyclic code given in the example. Note that the encoder/decoder circuits process the data sequences in descending order of polynomial degree.
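A Python sketch of the encoding and syndrome steps. Since the generator polynomial of the example is not shown above, the sketch assumes g(x) = x^3 + x^2 + 1, one of the two standard (7,4) Hamming generator polynomials; treat it as an illustrative choice rather than the original one.

def poly_div_mod2(dividend, divisor):
    """Modulo-2 polynomial division; polynomials are bit lists, highest degree first.
    Returns the remainder, padded to len(divisor) - 1 bits."""
    rem = list(dividend)
    for i in range(len(dividend) - len(divisor) + 1):
        if rem[i]:
            for j, d in enumerate(divisor):
                rem[i + j] ^= d
    return rem[-(len(divisor) - 1):]

g = [1, 1, 0, 1]            # assumed g(x) = x^3 + x^2 + 1 (coefficients of x^3 .. x^0)
m = [1, 0, 1, 1]            # message vector, highest-degree coefficient first

shifted = m + [0, 0, 0]               # x^(n-k) * m(x)
parity = poly_div_mod2(shifted, g)    # remainder p(x)
codeword = m + parity                 # c(x) = x^(n-k) m(x) + p(x), highest degree first;
                                      # read in ascending order of degree this is the
                                      # n-k parity bits followed by the k message bits
print(codeword)

# Decoding check: the syndrome of an error-free codeword is all-zero
print(poly_div_mod2(codeword, g))     # [0, 0, 0]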

 

 

 

 

 

 

 

 

 

 

 

7. Convolutional code

In telecommunication, a convolutional code is a type of error-correcting code that generates parity symbols via the sliding application of a Boolean polynomial function to a data stream. The sliding application represents the 'convolution' of the encoder over the data, which gives rise to the term 'convolutional coding'. The sliding nature of convolutional codes facilitates trellis decoding using a time-invariant trellis. Time-invariant trellis decoding allows convolutional codes to be maximum-likelihood soft-decision decoded with reasonable complexity.

The ability to perform economical maximum-likelihood soft-decision decoding is one of the major benefits of convolutional codes. This is in contrast to classic block codes, which are generally represented by a time-variant trellis and therefore are typically hard-decision decoded. Convolutional codes are often characterized by the base code rate and the depth (or memory) of the encoder [n, k, K]. The base code rate is typically given as k/n, where k is the number of input bits and n is the number of output symbols per encoding step. The depth is often called the constraint length K, where the output is a function of the current input as well as the previous K - 1 inputs. The depth may also be given as the number of memory elements v in the polynomial, or as the maximum possible number of states of the encoder, 2^v.
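A minimal Python sketch of a rate-1/2, constraint-length-3 convolutional encoder, using the common generator polynomials (7, 5) in octal; these generators are an illustrative choice, not taken from the notes.

def conv_encode(bits, g1=0b111, g2=0b101, K=3):
    """Rate-1/2 convolutional encoder with constraint length K and generators g1, g2."""
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | b) & ((1 << K) - 1)   # shift the new bit into a K-bit window
        out.append(bin(state & g1).count('1') % 2)    # first output bit (parity from g1)
        out.append(bin(state & g2).count('1') % 2)    # second output bit (parity from g2)
    return out

print(conv_encode([1, 0, 1, 1]))   # [1, 1, 1, 0, 0, 0, 0, 1]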

 

