1.1.1 CNN theory
The Cellular Neural/Nonlinear Network (CNN) [1] contains identical analog processing elements called cells. These cells are arranged on a two- or k-dimensional square grid. Each cell is connected to its local neighborhood via programmable weights, which together are called the cloning template; the CNN cell array is programmed by changing the cloning template. The local neighborhood of a cell is defined by the following equation:
S_r(ij) = \{ C(kl) : \max\{|k - i|, |l - j|\} \le r \}   (1.1)

In the simplest case the sphere of influence r is 1, thus each cell is connected only to its nearest neighbors, as shown in Figure 1.1. The input, output and state variables of the CNN cell array are continuous functions of time. The CNN cell dynamics can be implemented by the electronic circuit shown in Figure 1.2. The state equation of a cell is described by the following ordinary differential equation:
C_x \dot{v}_{xij}(t) = -\frac{1}{R_x} v_{xij}(t) + \sum_{kl \in S_r(ij)} A_{ij,kl} \cdot v_{ykl}(t) + \sum_{kl \in S_r(ij)} B_{ij,kl} \cdot v_{ukl}(t) + z_{ij}   (1.2)
Figure 1.1: A two-dimensional CNN defined on a square grid with nearest neighbor connections
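The neighborhood definition in (1.1) can be sketched in code. A minimal Python sketch follows; the function name and the grid-clipping behavior are illustrative assumptions, since equation (1.1) itself says nothing about array borders:

```python
def neighborhood(i, j, r, rows, cols):
    """Cells C(k, l) with max(|k - i|, |l - j|) <= r, per eq. (1.1).

    Clipping to the grid is an illustrative choice here; how border
    cells behave is really a boundary-condition question (see below).
    """
    return [(k, l)
            for k in range(max(0, i - r), min(rows, i + r + 1))
            for l in range(max(0, j - r), min(cols, j + r + 1))]
```

For r = 1 an interior cell has 9 cells in its sphere of influence (itself plus its 8 nearest neighbors), matching the connections drawn in Figure 1.1.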
Figure 1.2: Electronic circuit model of one CNN cell
where v_{xij} is the state, v_{yij} is the output and v_{uij} is the input voltage of the cell; A_{ij} is the feedback and B_{ij} is the feed-forward template. The state of each cell is connected to its output via a nonlinear element, which is shown in Figure 1.3 and described by the following function:
y_{ij} = f(x_{ij}) = \frac{|x_{ij} + 1| - |x_{ij} - 1|}{2} = \begin{cases} 1 & x_{ij}(t) > 1 \\ x_{ij}(t) & -1 \le x_{ij}(t) \le 1 \\ -1 & x_{ij}(t) < -1 \end{cases}   (1.3)
Figure 1.3: The output nonlinearity: unity gain sigmoid function
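The absolute-value form of (1.3) and the piecewise form are the same function; a one-line Python sketch makes this easy to check:

```python
def f(x):
    """Unity-gain sigmoid of eq. (1.3): (|x + 1| - |x - 1|) / 2.

    Equivalent to clamping x to the interval [-1, 1]: the output
    follows the state inside the unit interval and saturates outside.
    """
    return (abs(x + 1) - abs(x - 1)) / 2
```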
In most cases the C_x and R_x values are assumed to be 1, which makes it possible to simplify the state equation as follows:
\dot{x}_{ij}(t) = -x_{ij}(t) + \sum_{kl \in S_r(ij)} A_{ij,kl} \cdot y_{kl}(t) + \sum_{kl \in S_r(ij)} B_{ij,kl} \cdot u_{kl}(t) + z_{ij}   (1.4)

where x, y, u and z are the state, output, input and bias value of the corresponding CNN cell, respectively. The template matrices A_{ij} and B_{ij} are space-invariant if their values do not depend on the (i, j) position of the cell; otherwise they are called space-variant.
In order to fully specify the dynamics of a CNN cell array, the boundary conditions have to be defined. In the simplest case the edge cells are connected to a constant value: this is called the Dirichlet or fixed boundary condition. If the cell values are duplicated at the edges, the system does not lose energy: this is called the Neumann or zero-flux boundary condition. In the case of the circular boundary condition the edge cells see the values at the opposite side, thus the cell array can be placed on a torus.
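The three boundary conditions map naturally onto array-padding modes; a small NumPy sketch (the variable names are arbitrary):

```python
import numpy as np

x = np.arange(9, dtype=float).reshape(3, 3)

dirichlet = np.pad(x, 1, mode='constant', constant_values=0.0)  # fixed value at the border
neumann   = np.pad(x, 1, mode='edge')                           # edge cells duplicated (zero-flux)
circular  = np.pad(x, 1, mode='wrap')                           # values from the opposite side (torus)
```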
By stacking several CNN arrays on top of each other and connecting them, a multi-layer CNN structure can be defined. The state equation of one layer is described by the following equation:
\dot{x}_{m,ij}(t) = -x_{m,ij}(t) + \sum_{n=1}^{p} \left( \sum_{kl \in S_r(ij)} A_{mn,ij,kl} \cdot y_{n,kl}(t) + \sum_{kl \in S_r(ij)} B_{mn,ij,kl} \cdot u_{n,kl}(t) \right) + z_{m,ij}   (1.5)

where p is the number of layers, m is the index of the current layer, and A_{mn} and B_{mn} are the templates that connect the output of the nth layer to the mth layer.
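A minimal sketch of one Euler step of the multi-layer dynamics (1.5); the array shapes, step size and duplicated-edge handling below are illustrative assumptions, not part of the model definition:

```python
import numpy as np

def template_sum(img, t):
    """Weighted sum over each cell's 3x3 neighborhood (r = 1) with template t."""
    p = np.pad(img, 1, mode='edge')  # duplicate edge cells at the border
    return sum(t[a, b] * p[a:a + img.shape[0], b:b + img.shape[1]]
               for a in range(3) for b in range(3))

def multilayer_step(x, u, A, B, z, dt=0.1):
    """One forward Euler step of eq. (1.5).

    x, u: states and inputs, shape (p, rows, cols);
    A, B: inter-layer templates, shape (p, p, 3, 3);
    z:    per-layer bias values, shape (p,).
    """
    p_layers = x.shape[0]
    y = np.clip(x, -1.0, 1.0)  # output nonlinearity, eq. (1.3)
    xdot = -x.copy()
    for m in range(p_layers):
        for n in range(p_layers):
            # template A[m, n] couples layer n's output into layer m,
            # B[m, n] couples layer n's input into layer m
            xdot[m] += template_sum(y[n], A[m, n]) + template_sum(u[n], B[m, n])
        xdot[m] += z[m]
    return x + dt * xdot
```

With all cross-layer templates set to zero, each layer reduces to the single-layer dynamics of (1.4), which is a quick way to test the sketch.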
1.1.2 The CNN Universal Machine
The VLSI implementation of the previously described CNN array has very high computing power, but algorithmic programmability is required to improve its usability. The CNN Universal Machine (CNN-UM) [5] is a stored-program analogic computer based on the CNN array. To ensure programmability, a global programming unit was added to the array. This new architecture is able to combine analog array operations with local logic efficiently. The base CNN cells are extended with local analog and logic memories to ensure efficient reuse of intermediate results. Additionally, the cell elements can be equipped with local sensors for faster input acquisition, and further circuitry makes cell-wise analog and logical operations possible.
According to the Turing-Church thesis, the Turing machine, grammars and the µ-recursive functions are equivalent in computational power. The CNN-UM is universal in the Turing sense because every µ-recursive function can be computed on this architecture.
1.1.3 CNN-UM implementations
Since the introduction of the CNN Universal Machine in 1993, several CNN-UM implementations have been developed, ranging from simple software simulators to analog VLSI solutions. The properties and performance of recent CNN-UM architectures are summarized in Table 1.1.
The simplest and most flexible implementation of the CNN-UM architecture is software simulation. Every feature of the CNN array can be configured: e.g. the template size and the number of layers can be set, and space-variant and nonlinear templates can be
Table 1.1: Comparison of the recent CNN-UM implementations
Pentium
frequency                                2GHz      1.5GHz    3.2GHz    1.2GHz     200MHz        250MHz
Feature size                             0.13µm    0.13µm    0.13µm    0.12µm     0.22µm        0.15µm
Chip area                                1.27cm2   3.74cm2   N/A       1.1cm2     1.2cm2        3.5cm2
Number of physical processing elements   1         1         1         1          7 (12 bits)   48 (18 bits)
Cascadability                            no        no        no        no         yes           yes
Dissipation                              50W       130W      100W      1W         3W            15W
3×3 convolution                          140ms     110ms     87ms      16.384ms   35ms          4.09ms
Erosion / Dilation                       270ms     220ms     170ms     32.76ms    70ms          8.19ms
Laplace (15 iterations)                  2000ms    1560ms    1250ms    245.7ms    175ms         61.44ms
Accuracy control                         no        no        no        no         yes           yes

CASTLE
frequency                                200MHz            1/10MHz   32MHz      700MHz     100MHz
Feature size                             0.35µm            0.5µm     0.35µm     0.18µm     0.18µm
Chip area                                0.68cm2           1cm2      1.45cm2    6.9468m2   0.25cm2
Number of physical processing elements   3×2               4096      16384      65536      3072
Cascadability                            yes               yes       yes        no         no
Dissipation                              <0.8W             1.3W      <4W        491.52kW   <0.5W
3×3 convolution                          2.67ms (12 bit)   10.6ms    1.749ms    3.18ms     5-25ms
                                         1.34ms (6 bit)
Erosion / Dilation                       5.34ms (12 bit)   10.6ms    1.749ms    3.18ms     0.1ms
                                         2.67ms (6 bit)
Laplace (15 iterations)                  39.6ms (12 bit)   11.5ms    1.8975ms   3.45ms     60ms
                                         19.8ms (6 bit)
Accuracy control                         yes               no        no         no         no
used, etc. However, this flexibility is traded for performance: software simulation is very slow, even when accelerated with processor-specific instructions or Digital Signal Processors.
Performance can be improved by using emulated digital CNN-UM architectures [6], where small specialized processor cores are implemented on reconfigurable devices or in custom VLSI technology. These architectures are one to two orders of magnitude faster than software simulation, but slower than the analog CNN-UM implementations.
The most powerful CNN-UM implementations are the analog VLSI CNN-UM chips [7]. Recent arrays contain 128×128 elements, but their accuracy is limited to 7 or 8 bits. Additionally, these devices are very sensitive to temperature changes and other types of noise.