
New coding techniques for minimum complexity FFNN

Let us choose an arbitrary bijective mapping $\mathbf{y}_i \mapsto \mathbf{s}_i$ which maps the sent symbols into the training set targets, called code words, and let us arrange the code words into a matrix as follows:

$$\mathbf{S} = \begin{bmatrix} \mathbf{s}_1 & \mathbf{s}_2 & \cdots & \mathbf{s}_N \end{bmatrix} \in \mathbb{C}^{L_C \times N} \tag{4.14}$$

where the code words are interpreted as column vectors. Similarly to (4.6), let us denote the conditional probability vector of the code words as:

$$\mathbf{p}(\mathbf{s}|\mathbf{x}) = \begin{bmatrix} p(\mathbf{s}_1|\mathbf{x}) & p(\mathbf{s}_2|\mathbf{x}) & \cdots & p(\mathbf{s}_N|\mathbf{x}) \end{bmatrix}^T \in [0,1]^{N \times 1} \tag{4.15}$$

Due to the bijective nature of the mapping, the following is true: $p(\mathbf{y}_i|\mathbf{x}) = p(\mathbf{s}_i|\mathbf{x})$. Thus, the expectation of (4.9) can be rewritten as

$$\bar{\mathbf{s}}(\mathbf{x}) = \sum_{i=1}^{N} \mathbf{s}_i\, p(\mathbf{s}_i|\mathbf{x}) = \sum_{i=1}^{N} \mathbf{s}_i\, p(\mathbf{y}_i|\mathbf{x}) = \mathbf{S}\, \mathbf{p}(\mathbf{y}|\mathbf{x}) \tag{4.16}$$

After training, $\bar{\mathbf{s}}(\mathbf{x})$ will appear at the output of the FFNN. This general coding capability of the FFNN via the training set is depicted in Figure 47.


Figure 47: Equivalence of the FFNN with an encoding
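To illustrate (4.16) numerically, the following sketch (my illustration, not part of the thesis, assuming NumPy and an arbitrary small code matrix and probability vector) checks that the sample mean of code word targets drawn with frequencies $p(\mathbf{y}_i|\mathbf{x})$ approaches $\mathbf{S}\,\mathbf{p}(\mathbf{y}|\mathbf{x})$, which is the value an MSE-trained FFNN learns to output for a given $\mathbf{x}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2 x 4 code matrix S (columns are the code words s_i)
# and an arbitrary conditional probability vector p(y|x).
S = np.array([[-1, -1, +1, +1],
              [-1, +1, -1, +1]], dtype=float)
p = np.array([0.4, 0.1, 0.3, 0.2])

# Draw training targets s_i with frequencies p(y_i|x); their sample
# mean estimates the conditional expectation in (4.16).
idx = rng.choice(4, size=100_000, p=p)
print(S[:, idx].mean(axis=1))  # approx. S @ p
print(S @ p)                   # exact: [0.0, -0.4]
```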

The block diagram of the detector using an arbitrary code matrix $\mathbf{S}$ is depicted in Figure 48.

Figure 48: Flow graph representation of the detector using an arbitrary encoding

Our objective is to develop such codes $\mathbf{s}_i$, $i = 1, \ldots, N$, which under some conditions will reproduce the MAP decision. This objective must be achieved effectively under the following constraints:

• The length of the code words is small, $L_C \ll N$, because then the number of output neurons is small.

• The decision procedure $\hat{\mathbf{y}} = g(\bar{\mathbf{s}}(\mathbf{x}))$ is of small complexity and can be easily parallelized.

If we reduce the dimension too aggressively, then we cannot reproduce the MAP decision.

Coding by interval splitting with parameter 2

With this coding the number of outputs of the FFNN can be reduced to $L_C$ instead of $N = 2^{L_C}$; furthermore, the processing rate is increased accordingly. The objective of the method is to develop a code $\mathbf{s}_i$, $i = 1, \ldots, N$, which yields a conditional expected value vector $\bar{\mathbf{s}}(\mathbf{x})$ in such a way that the maximum conditional probability can be obtained by logarithmic search. In order to obtain the unique MAP decision we have to assume the following properties of the conditional probabilities. Let us define a set of indices $E \subset \{1, \ldots, N\}$, $|E| = N/2$, which splits all the indices into two disjoint halves. We assume that for our chosen index set $E$ the following holds:

$$\text{if } j = \arg\max_{i = 1, \ldots, N} p(\mathbf{s}_i|\mathbf{x}) \text{ and } j \in E, \text{ then } \sum_{k \in E} p(\mathbf{s}_k|\mathbf{x}) > \sum_{i \in \{1, \ldots, N\} \setminus E} p(\mathbf{s}_i|\mathbf{x}) \tag{4.17}$$

This means that the maximum can be sought by logarithmic search if we investigate $L_C$ possible disjoint half index set pairings, summing them up and comparing them against each other pairwise. For example, this interval halving can be obtained by first summing the first half of $\mathbf{p}(\mathbf{y}|\mathbf{x})$ with weight $-1$ and the other half with weight $+1$. Then we split the previously obtained halves and sum their first and second halves with $-1$ and $+1$ weights, respectively. Furthermore, we repeat this interval halving mechanism until the splitting zooms down to an individual component, thus locating the index of the symbol with maximum probability.
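This search can be sketched in code. The following minimal illustration (mine, not from the thesis, assuming NumPy; the function names are hypothetical) recomputes the $\pm 1$-weighted block sums of each stage and then combines the resulting signs like binary digits:

```python
import numpy as np

def halving_signs(p):
    """Stage i sums the components of p over alternating blocks of
    length N / 2**i with weights -1 / +1, exactly as the rows of the
    code matrix do; the sign pattern localizes the maximal component,
    provided condition (4.17) holds at every stage."""
    N = len(p)
    L = int(np.log2(N))
    signs = []
    for i in range(1, L + 1):
        block = N >> i                                  # block length N / 2**i
        w = np.tile(np.repeat([-1, +1], block), 1 << (i - 1))
        signs.append(int(np.sign(w @ p)))
    return signs

def signs_to_index(signs):
    """Read the -1/+1 pattern as binary digits: each +1 selects the
    upper block at that scale. Returns a zero-based index."""
    return sum((1 << (len(signs) - 1 - i)) * (s > 0)
               for i, s in enumerate(signs))

p = np.array([0.05, 0.10, 0.15, 0.40, 0.08, 0.07, 0.09, 0.06])
s = halving_signs(p)
print(s)                  # [-1, 1, 1]
print(signs_to_index(s))  # 3 (zero-based), i.e. the 4th symbol
```

The sign pattern $[-1, +1, +1]$ obtained here is the same one that appears in the worked example (4.21) below.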

The code matrix can be expressed in a closed form as follows:

$$S_{i,j} = s_{ij} = -\operatorname{sgn}\left(\sin\left(\pi\,\frac{2^{i-1}\,(2j-1)}{N}\right)\right), \qquad i = 1, \ldots, L_C,\; j = 1, \ldots, N \tag{4.18}$$

Based on this coding scheme the decision function reduces to the traditional sgn function, since at each stage we only need to decide whether the sum of the components encoded with $+1$ or the sum of the components encoded with $-1$ is bigger in absolute value. Thus the sign of that particular stage indicates in which half the component having the maximum probability lies. In this way $L_C = \log_2 N \ll N$; as a result, this coding reduces the number of outputs of the FFNN to $L_C$ and also increases the processing rate. So instead of a linear search the decision algorithm is

$$\hat{\mathbf{y}} = \operatorname{sgn}(\bar{\mathbf{s}}(\mathbf{x})) \tag{4.19}$$

The only condition that must be fulfilled in order for this encoding scheme to perform as well as the optimal MAP decision is (4.17), and the numerical simulations will demonstrate that this condition is satisfied with relatively high probability.
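Assuming the reconstruction of (4.18) printed above, a short sketch (my illustration, not from the thesis) generates the code matrix from the closed form, with zero-based indices, and applies the decision rule (4.19):

```python
import numpy as np

def code_matrix(L):
    """Interval-halving code matrix of (4.18), zero-based:
    S[i, j] = -sgn(sin(pi * 2**i * (2j + 1) / N)); row i alternates
    -1/+1 in blocks of length N / 2**(i + 1)."""
    N = 1 << L
    i = np.arange(L)[:, None]      # stage index (row)
    j = np.arange(N)[None, :]      # symbol index (column)
    return -np.sign(np.sin(np.pi * 2.0 ** i * (2 * j + 1) / N))

S = code_matrix(3)
print(S.astype(int))
# [[-1 -1 -1 -1  1  1  1  1]
#  [-1 -1  1  1 -1 -1  1  1]
#  [-1  1 -1  1 -1  1 -1  1]]   <- matches (4.20)

# Decision (4.19): the signs of S @ p(y|x) localize the MAP index.
p = np.array([0.05, 0.10, 0.15, 0.40, 0.08, 0.07, 0.09, 0.06])
print(np.sign(S @ p))            # [-1.  1.  1.] -> the 4th symbol
```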

Examples for coding by interval splitting with parameter 2. To give an example we present this encoding with parameters $L_C = 3$, $N = 2^3 = 8$:

$$\mathbf{S} = \begin{bmatrix} \mathbf{s}_1 & \mathbf{s}_2 & \cdots & \mathbf{s}_N \end{bmatrix} = \begin{bmatrix} -1 & -1 & -1 & -1 & +1 & +1 & +1 & +1 \\ -1 & -1 & +1 & +1 & -1 & -1 & +1 & +1 \\ -1 & +1 & -1 & +1 & -1 & +1 & -1 & +1 \end{bmatrix} \tag{4.20}$$

The first row represents the step where we sum up the two halves with $\mp 1$ weights, the second row represents the step where we sum up interval fourths with the appropriate weights, and the last row represents the last step in this example, where we sum up individual interval eighths with the appropriate $\pm 1$ weights. For example, in (4.21) the case is detailed where we assume that $p(\mathbf{y}_4|\mathbf{x})$ is the component with the maximum probability. In this case the first row lets us know that the maximum component has an index from $\{1, 2, 3, 4\}$, the second row tells us that the maximum component has an index from $\{3, 4, 7, 8\}$, and the third row tells us that it has an index from $\{2, 4, 6, 8\}$. Combining these yields index 4.

$$\hat{\mathbf{y}} = \operatorname{sgn}(\bar{\mathbf{s}}(\mathbf{x})) = \begin{bmatrix} -1 \\ +1 \\ +1 \end{bmatrix} \tag{4.21a}$$

$$\bar{\mathbf{s}}(\mathbf{x}) = \mathbf{S}\, \mathbf{p}(\mathbf{y}|\mathbf{x}) = \begin{bmatrix} -1 & -1 & -1 & -1 & +1 & +1 & +1 & +1 \\ -1 & -1 & +1 & +1 & -1 & -1 & +1 & +1 \\ -1 & +1 & -1 & +1 & -1 & +1 & -1 & +1 \end{bmatrix} \begin{bmatrix} p(\mathbf{y}_1|\mathbf{x}) \\ \vdots \\ p(\mathbf{y}_4|\mathbf{x}) \\ \vdots \\ p(\mathbf{y}_8|\mathbf{x}) \end{bmatrix} \tag{4.21b}$$
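The combination step of this example can be checked mechanically; the following sketch (my illustration) reproduces the three quoted index sets and their intersection:

```python
import numpy as np

# Rows of S in (4.20) and the sign pattern of (4.21a).
S = np.array([[-1, -1, -1, -1, +1, +1, +1, +1],
              [-1, -1, +1, +1, -1, -1, +1, +1],
              [-1, +1, -1, +1, -1, +1, -1, +1]])
signs = [-1, +1, +1]

# Each stage keeps the (one-based) indices whose code bit matches
# that stage's sign; intersecting the sets isolates the MAP index.
candidates = set(range(1, 9))
for row, sg in zip(S, signs):
    stage_set = {j + 1 for j in range(8) if row[j] == sg}
    print(sorted(stage_set))   # {1,2,3,4}, {3,4,7,8}, {2,4,6,8}
    candidates &= stage_set

print(candidates)              # {4}
```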

Handling the error if the assumption does not hold. However, there are received points $\mathbf{x}$ for which assumption (4.17) does not hold. In this case our detection mechanism yields an erroneous symbol (in the sense that it is not the most likely one). To show this phenomenon, here is an example where $L_C = 2$. In Figure 49 one can see the four received symbols if no noise is present: $\mathbf{H}\mathbf{y}_i$, $i = 1, \ldots, 4$. These are marked with small squares. Around them contour curves of the additive noise are depicted. There are four distinct areas where assumption (4.17) does not hold. These are marked with shaded areas colored bluish red, bluish black, greenish red and greenish black. Each area is colored with the main color for which the decision is made and with the secondary color for which the decision should have been made. E.g., a received symbol in the bluish red area will be decoded as a "blue" sent symbol instead of a "red" one. For the sake of the example I have chosen a received symbol $\mathbf{x} = [0.435, 0.685]^T$ in the bluish red area. In the box at the right side of the symbol the corresponding conditional probabilities are denoted:

$$\mathbf{p}(\mathbf{y}|\mathbf{x}) = \begin{bmatrix} p(\mathbf{y}_1|\mathbf{x}) & p(\mathbf{y}_2|\mathbf{x}) & p(\mathbf{y}_3|\mathbf{x}) & p(\mathbf{y}_4|\mathbf{x}) \end{bmatrix}^T = \begin{bmatrix} 0.4406 & 0.0002 & 0.3364 & 0.2229 \end{bmatrix}^T$$

One can see that (4.17) does not hold, since $p(\mathbf{y}_1|\mathbf{x}) + p(\mathbf{y}_2|\mathbf{x}) < p(\mathbf{y}_3|\mathbf{x}) + p(\mathbf{y}_4|\mathbf{x})$, but $\mathbf{y}_1$ has the maximum probability. In this case

$$\bar{\mathbf{s}}(\mathbf{x}) = \mathbf{S}\, \mathbf{p}(\mathbf{y}|\mathbf{x}) = \begin{bmatrix} -1 & -1 & +1 & +1 \\ -1 & +1 & -1 & +1 \end{bmatrix} \mathbf{p}(\mathbf{y}|\mathbf{x}) = \begin{bmatrix} 0.1185 \\ -0.5539 \end{bmatrix}$$

Thus our detection algorithm $\operatorname{sgn}(\bar{\mathbf{s}}(\mathbf{x}))$ will choose $\hat{\mathbf{y}} = [1, -1]^T$ instead of $\hat{\mathbf{y}}_{\text{opt}} = \mathbf{y}_1 = [-1, -1]^T$. Note that by introducing assumption (4.17), we also introduce an inherent error compared to the theoretical optimum, even if our network perfectly learns the expectation.
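This failure case is straightforward to reproduce numerically. The sketch below (my illustration, assuming NumPy) uses the quoted probabilities and shows both the wrong sign decision and the violated half-sum inequality:

```python
import numpy as np

# Code matrix for L_C = 2 and the conditional probabilities quoted
# above for the received point x in the "bluish red" area.
S = np.array([[-1, -1, +1, +1],
              [-1, +1, -1, +1]])
p = np.array([0.4406, 0.0002, 0.3364, 0.2229])

s_bar = S @ p
print(s_bar)                     # approx. [ 0.1185 -0.5539]
print(np.sign(s_bar))            # [ 1. -1.] -> decodes to y_3, not y_1

# Condition (4.17) fails at the first stage: the heavier half does
# not contain the individually most likely symbol y_1.
print(p[:2].sum(), p[2:].sum())  # 0.4408 < 0.5593
print(int(np.argmax(p)) + 1)     # 1
```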

Figure 49: Coding error of the interval halving method

Similarly to the previous transformation, it is possible to generalize the proposed coding mechanism by using a higher order logarithmic search. This can be obtained by splitting the index set into smaller partitions (e.g. thirds, fourths, etc.). The objective of this generalized method is to give a scalable balance between the computational complexity and the achievable theoretical performance. This generalized version of the coding, enabling logarithmic search, strikes a good compromise between $\dim(\bar{\mathbf{s}})$ and performance. On the one hand, if we utilize a higher order logarithmic search then constraint (4.17) can be weakened, thus the performance will fall closer to the MAP decision, but at the same time it increases the complexity of the FFNN.
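The generalized code is not spelled out at this point, so the following is only one plausible instantiation (a hypothetical sketch of mine, not necessarily the author's construction): base-$B$ splitting with $B$ indicator rows per stage, which replaces the sgn decision with a per-stage argmax and yields $B \log_B N$ outputs:

```python
import numpy as np

def block_code_matrix(N, B):
    """Hypothetical base-B generalization: for each of the log_B(N)
    stages, emit B indicator rows summing the B interleaved blocks at
    that scale, so the per-stage argmax of S @ p localizes the
    maximum block by block."""
    stages = int(round(np.log(N) / np.log(B)))
    rows = []
    for t in range(1, stages + 1):
        block = N // B ** t                           # block length at stage t
        pattern = np.repeat(np.eye(B), block, axis=1) # B x (B * block)
        rows.append(np.tile(pattern, N // (B * block)))
    return np.vstack(rows)                            # (B * stages) x N

def decode(S_p, N, B):
    """Per-stage argmax over groups of B outputs, combined like
    base-B digits into a zero-based symbol index."""
    stages = int(round(np.log(N) / np.log(B)))
    digits = [int(np.argmax(S_p[t * B:(t + 1) * B])) for t in range(stages)]
    return sum(d * B ** (stages - 1 - t) for t, d in enumerate(digits))

N, B = 16, 4
S = block_code_matrix(N, B)
p = np.full(N, 0.02)
p[9] = 1 - 0.02 * (N - 1)       # maximum at zero-based index 9
print(decode(S @ p, N, B))      # 9
```

With $B = 2$ and $\pm 1$ instead of $0/1$ indicator rows, this reduces to the interval halving code above; larger $B$ trades more outputs for a weaker per-stage condition.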

THESIS III.1 (blind detection by interval halving and FFNN). I have defined an FFNN-based blind detector for the MUD problem, which lends itself to easy parallelization and can perform optimally under the constraint defined in (4.17). In (4.18) I give the linear encoding based on interval halving, which is used to generate a training set for an FFNN, and in (4.19) I give the low complexity decision function which is to be employed on the output of the net. I have shown that the detector performs near-optimally in the investigated MUD scenarios described in subsection 4.4.

The thesis is restated in a self-consistent way in Appendix A at Thesis III.1 (page 88).