1.1.1 CNN theory
The Cellular Neural/Nonlinear Network (CNN) [1] contains identical analog processing elements called cells. These cells are arranged on a two- or k-dimensional square grid. Each cell is connected to its local neighborhood via programmable weights, which together are called the cloning template; the CNN cell array is programmed by changing the cloning template. The local neighborhood of a cell is defined by the following equation:
S_r(ij) = \{ C(kl) : \max\{|k - i|, |l - j|\} \le r \}   (1.1)

In the simplest case the sphere of influence r is 1, thus each cell is connected only to its nearest neighbors, as shown in Figure 1.1. The input, output and state variables of the CNN cell array are continuous functions of time. The CNN cell dynamics can be implemented by the electronic circuit shown in Figure 1.2. The state equation of a cell is described by the following ordinary differential equation:
C_x \dot{v}_{xij}(t) = -\frac{1}{R_x} v_{xij}(t) + \sum_{kl \in S_r(ij)} A_{ij,kl} \cdot v_{ykl}(t) + \sum_{kl \in S_r(ij)} B_{ij,kl} \cdot v_{ukl}(t) + z_{ij}   (1.2)
Figure 1.1: A two-dimensional CNN defined on a square grid with nearest neighbor connections
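The neighborhood definition in (1.1) can be sketched in code. A minimal Python sketch follows; the function name and the grid-clipping behavior are illustrative assumptions, since equation (1.1) itself says nothing about array borders:

```python
def neighborhood(i, j, r, rows, cols):
    """Cells C(k, l) with max(|k - i|, |l - j|) <= r, per eq. (1.1).

    Clipping to the grid is an illustrative choice here; how border
    cells behave is really a boundary-condition question (see below).
    """
    return [(k, l)
            for k in range(max(0, i - r), min(rows, i + r + 1))
            for l in range(max(0, j - r), min(cols, j + r + 1))]
```

For r = 1 an interior cell has 9 cells in its sphere of influence (itself plus its 8 nearest neighbors), matching the connections drawn in Figure 1.1.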
Figure 1.2: Electronic circuit model of one CNN cell
where v_{xij} is the state, v_{yij} is the output and v_{uij} is the input voltage of the cell; A_{ij} is the feedback and B_{ij} is the feed-forward template. The state of each cell is connected to its output via a nonlinear element, which is shown in Figure 1.3 and described by the following function:
y_{ij} = f(x_{ij}) = \frac{|x_{ij} + 1| - |x_{ij} - 1|}{2} = \begin{cases} 1 & x_{ij}(t) > 1 \\ x_{ij}(t) & -1 \le x_{ij}(t) \le 1 \\ -1 & x_{ij}(t) < -1 \end{cases}   (1.3)
Figure 1.3: The output nonlinearity: unity gain sigmoid function
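The absolute-value form of (1.3) and the piecewise form are the same function; a one-line Python sketch makes this easy to check:

```python
def f(x):
    """Unity-gain sigmoid of eq. (1.3): (|x + 1| - |x - 1|) / 2.

    Equivalent to clamping x to the interval [-1, 1]: the output
    follows the state inside the unit interval and saturates outside.
    """
    return (abs(x + 1) - abs(x - 1)) / 2
```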
In most cases the C_x and R_x values are assumed to be 1, which makes it possible to simplify the state equation as follows:
\dot{x}_{ij}(t) = -x_{ij}(t) + \sum_{kl \in S_r(ij)} A_{ij,kl} \cdot y_{kl}(t) + \sum_{kl \in S_r(ij)} B_{ij,kl} \cdot u_{kl}(t) + z_{ij}   (1.4)

where x, y, u and z are the state, output, input and bias value of the corresponding CNN cell, respectively. The template matrices A_{ij} and B_{ij} are space-invariant if their values do not depend on the (i, j) position of the cell; otherwise they are called space-variant.
In order to fully specify the dynamics of a CNN cell array, the boundary conditions have to be defined. In the simplest case the edge cells are connected to a constant value: this is called the Dirichlet or fixed boundary condition. If the cell values are duplicated at the edges, the system does not lose energy: this is called the Neumann or zero-flux boundary condition. In the case of the circular boundary condition the edge cells see the values at the opposite side, thus the cell array can be placed on a torus.
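The three boundary conditions map naturally onto array-padding modes; a small NumPy sketch (the variable names are arbitrary):

```python
import numpy as np

x = np.arange(9, dtype=float).reshape(3, 3)

dirichlet = np.pad(x, 1, mode='constant', constant_values=0.0)  # fixed value at the border
neumann   = np.pad(x, 1, mode='edge')                           # edge cells duplicated (zero-flux)
circular  = np.pad(x, 1, mode='wrap')                           # values from the opposite side (torus)
```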
By stacking several CNN arrays on top of each other and connecting them, a multi-layer CNN structure can be defined. The state equation of one layer is described by the following equation:
\dot{x}_{m,ij}(t) = -x_{m,ij}(t) + \sum_{n=1}^{p} \left( \sum_{kl \in S_r(ij)} A_{mn,ij,kl} \cdot y_{n,kl}(t) + \sum_{kl \in S_r(ij)} B_{mn,ij,kl} \cdot u_{n,kl}(t) \right) + z_{m,ij}   (1.5)

where p is the number of layers, m is the index of the current layer, and A_{mn} and B_{mn} are the templates that connect the output of the nth layer to the mth layer.
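A minimal sketch of one Euler step of the multi-layer dynamics (1.5); the array shapes, step size and duplicated-edge handling below are illustrative assumptions, not part of the model definition:

```python
import numpy as np

def template_sum(img, t):
    """Weighted sum over each cell's 3x3 neighborhood (r = 1) with template t."""
    p = np.pad(img, 1, mode='edge')  # duplicate edge cells at the border
    return sum(t[a, b] * p[a:a + img.shape[0], b:b + img.shape[1]]
               for a in range(3) for b in range(3))

def multilayer_step(x, u, A, B, z, dt=0.1):
    """One forward Euler step of eq. (1.5).

    x, u: states and inputs, shape (p, rows, cols);
    A, B: inter-layer templates, shape (p, p, 3, 3);
    z:    per-layer bias values, shape (p,).
    """
    p_layers = x.shape[0]
    y = np.clip(x, -1.0, 1.0)  # output nonlinearity, eq. (1.3)
    xdot = -x.copy()
    for m in range(p_layers):
        for n in range(p_layers):
            # template A[m, n] couples layer n's output into layer m,
            # B[m, n] couples layer n's input into layer m
            xdot[m] += template_sum(y[n], A[m, n]) + template_sum(u[n], B[m, n])
        xdot[m] += z[m]
    return x + dt * xdot
```

With all cross-layer templates set to zero, each layer reduces to the single-layer dynamics of (1.4), which is a quick way to test the sketch.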
1.1.2 The CNN Universal Machine
The VLSI implementation of the previously described CNN array has very high computing power, but algorithmic programmability is required to improve its usability. The CNN Universal Machine (CNN-UM) [5] is a stored-program analogic computer based on the CNN array. To ensure programmability, a global programming unit was added to the array. This new architecture is able to combine analog array operations with local logic efficiently. The base CNN cells are extended with local analog and logic memories to ensure efficient reuse of intermediate results. Additionally, the cell elements can be equipped with local sensors for faster input acquisition, and further circuitry makes cell-wise analog and logical operations possible.
According to the Turing-Church thesis, the Turing machine, grammars and the µ-recursive functions are equivalent in computational power. The CNN-UM is universal in the Turing sense because every µ-recursive function can be computed on this architecture.
1.1.3 CNN-UM implementations
Since the introduction of the CNN Universal Machine in 1993, several CNN-UM implementations have been developed, ranging from simple software simulators to analog VLSI solutions. The properties and performance of recent CNN-UM architectures are summarized in Table 1.1.
The simplest and most flexible implementation of the CNN-UM architecture is software simulation. Every feature of the CNN array can be configured: e.g. the template size and the number of layers can be set, and space-variant and nonlinear templates can be
Table 1.1: Comparison of the recent CNN-UM implementations
Pentium
frequency                                2GHz      1.5GHz    3.2GHz    1.2GHz     200MHz        250MHz
Feature size                             0.13µm    0.13µm    0.13µm    0.12µm     0.22µm        0.15µm
Chip area                                1.27cm2   3.74cm2   N/A       1.1cm2     1.2cm2        3.5cm2
Number of physical processing elements   1         1         1         1          7 (12 bits)   48 (18 bits)
Cascadability                            no        no        no        no         yes           yes
Dissipation                              50W       130W      100W      1W         3W            15W
3×3 convolution                          140ms     110ms     87ms      16.384ms   35ms          4.09ms
Erosion / Dilation                       270ms     220ms     170ms     32.76ms    70ms          8.19ms
Laplace (15 iterations)                  2000ms    1560ms    1250ms    245.7ms    175ms         61.44ms
Accuracy control                         no        no        no        no         yes           yes

CASTLE
frequency                                200MHz            1/10MHz   32MHz      700MHz     100MHz
Feature size                             0.35µm            0.5µm     0.35µm     0.18µm     0.18µm
Chip area                                0.68cm2           1cm2      1.45cm2    6.9468m2   0.25cm2
Number of physical processing elements   3×2               4096      16384      65536      3072
Cascadability                            yes               yes       yes        no         no
Dissipation                              <0.8W             1.3W      <4W        491.52kW   <0.5W
3×3 convolution                          2.67ms (12 bit)   10.6ms    1.749ms    3.18ms     5-25ms
                                         1.34ms (6 bit)
Erosion / Dilation                       5.34ms (12 bit)   10.6ms    1.749ms    3.18ms     0.1ms
                                         2.67ms (6 bit)
Laplace (15 iterations)                  39.6ms (12 bit)   11.5ms    1.8975ms   3.45ms     60ms
                                         19.8ms (6 bit)
Accuracy control                         yes               no        no         no         no
used, etc. However, this flexibility is traded for performance: software simulation is very slow, even when accelerated with processor-specific instructions or Digital Signal Processors.
Performance can be improved by using emulated digital CNN-UM architectures [6], where small specialized processor cores are implemented on reconfigurable devices or in custom VLSI technology. These architectures are one to two orders of magnitude faster than software simulation, but slower than the analog CNN-UM implementations.
The most powerful CNN-UM implementations are the analog VLSI CNN-UM chips [7]. Recent arrays contain 128×128 elements, but their accuracy is limited to 7 or 8 bits. Additionally, these devices are very sensitive to temperature changes and other types of noise.