Figure 21. A zoomable fovea navigating in a high-resolution CMOS imager, following the region of interest (separate CMOS imager)
Two virtual processor array architectures will be briefly discussed in this dissertation.
The first is the Bi-i architecture, which we introduced jointly with my colleague, Dr. Csaba Rekeczky. The Bi-i camera was the first professional camera to contain a CNN chip as a fovea sensor and co-processor, which enabled it to make more than 10,000 visual decisions per second in real time, a capability that is still unique. Thanks to its high performance, the Bi-i won the product of the year award at the Vision Fair in Stuttgart, Germany, in 2003.
The Bi-i architecture is described in Section 3.4.
The second architecture, described in Section 3.5, targets a single-chip vision system that combines a fine-grain sensor-processor array as a front-end processor with a virtual processor array for multi-fovea back-end processing. The architecture of the chip was co-designed by two of my colleagues, Dr. Péter Földesy (MTA-SZTAKI) and Dr. Csaba Rekeczky (Eutecus Inc.), and me. Since we need to implement multi-layer, multi-resolution, multi-scale sensor-processor arrays, this ongoing project uses an experimental 3D silicon integration technology.
3.2 Mixed-signal virtual processor array architecture for analog video signal processing
An analog video signal consists of a stream of consecutive image lines as they come out of the video sensor device. The mixed-signal virtual processor array was designed to capture and process these signals on the fly, without digitization. In general, the architecture can process n incoming video signals and deliver m outgoing video signals. This makes it possible to process RGB image flows or to fuse multi-band images coming from different synchronized visual/IR/UV image sensors.
Since the device cannot store entire video frames inside the processor array, it internally forms long, narrow segments of the images by capturing a few consecutive image lines, as we have seen in Figure 19. These image segments are then processed by an elongated processor array. Naturally, the segments must be properly overlapped to avoid artifacts at their boundaries (Figure 22).
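The overlap requirement can be illustrated with a small sketch (a hypothetical 1-D analogue, not the mixed-signal implementation): each segment must carry at least as many extra samples on each side as the filter's sphere of influence, otherwise the read-out samples near the segment boundary are computed from padding instead of real data.

```python
def box_filter(signal, radius):
    """1-D box filter with zero padding; its sphere of influence is `radius` samples."""
    n = len(signal)
    return [
        sum(signal[j] for j in range(max(0, i - radius), min(n, i + radius + 1)))
        / (2 * radius + 1)
        for i in range(n)
    ]

def filter_in_segments(signal, radius, h, overlap):
    """Filter `signal` in segments of h samples, each extended by `overlap`
    samples on both sides; only the h central samples are read out."""
    n = len(signal)
    out = []
    for start in range(0, n, h):
        lo = max(0, start - overlap)      # upper overlap, clipped at the frame border
        hi = min(n, start + h + overlap)  # lower overlap, clipped at the frame border
        seg = box_filter(signal[lo:hi], radius)
        out.extend(seg[start - lo : start - lo + min(h, n - start)])
    return out

signal = [float(i % 7) for i in range(40)]
exact = box_filter(signal, 3)
assert filter_in_segments(signal, 3, 10, 3) == exact  # overlap >= radius: artifact-free
assert filter_in_segments(signal, 3, 10, 1) != exact  # overlap too small: boundary artifacts
```

This mirrors the behavior in Figure 22: when the overlap is smaller than the operator's sphere of influence, the segment-wise result deviates from the whole-frame result near segment boundaries.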
[Figure 22 panels: high-pass, band-pass, and low-pass filter results, each computed with overlap A and overlap B]
Figure 22. Example of improper and proper boundary condition handling. Multi-scale analysis was calculated on horizontal image stripes with two different overlap sizes. As can be seen, the number of overlapping lines in overlap A was too low for the band-pass and the low-pass filters, while overlap B was sufficient in all three cases.
To avoid artifacts caused by boundary problems, the proposed elongated processor array is divided into three major areas. Two of these areas (the upper and the lower) are dedicated to handling the boundary problem, while the third (middle) area, the main area, calculates the outgoing video lines (Figure 23). The number of rows in the upper and the lower overlapping areas may differ, because the sphere of influence of the operators might be asymmetric. Moreover, in some cases the results calculated on the preceding segment can be reused as upper boundary conditions; it is then enough to implement a single row in the upper overlapping area to hold these pre-calculated boundary conditions.
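The three-area division can be sketched as a small scheduling helper (an illustrative model with hypothetical names, not the chip's control logic): for each processing step it lists which frame rows are loaded into the array (ou + h + ol rows) and which h main rows are read out.

```python
def segment_plan(frame_height, h, ou, ol):
    """Half-open row ranges handled by the elongated array in each step.

    Each step loads `ou` upper-overlap rows, `h` main rows and `ol`
    lower-overlap rows (clipped at the frame borders); only the `h`
    main rows are read out.
    """
    plan = []
    for top in range(0, frame_height, h):
        loaded = (max(0, top - ou), min(frame_height, top + h + ol))
        read_out = (top, min(frame_height, top + h))
        plan.append((loaded, read_out))
    return plan

# Asymmetric overlaps: one stored-boundary row above, nine rows below.
plan = segment_plan(576, 10, 1, 9)        # 576-line frame, h = 10
assert plan[1] == ((9, 29), (10, 20))     # loads ou+h+ol = 20 rows, reads out 10
```

Note that the read-out ranges of consecutive steps tile the frame without gaps or double coverage, so only the main-area rows ever leave the array.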
[Figure 23 labels: upper overlapping area (ou rows); main processor area (h rows; only these rows are read out from the processor array); lower overlapping area (ol rows); row length N: processor array length]
Figure 23. Topology of the elongated video processor

3.2.1 Timing details
First, we introduce the architecture with one incoming and one outgoing video signal (n = m = 1). To fulfill the real-time concurrent I/O and processing requirements, two extra memory banks and two line buffers are needed (Figure 24). The incoming and outgoing video line buffers are two-port analog memories that support both serial and parallel access. Through its serial port, the incoming (outgoing) line buffer captures (releases) the analog video signal. Through its parallel port, the incoming (outgoing) line buffer can be read out (filled in); this port is connected to a column-wise parallel bus. The column-wise data bus is responsible for the data communication among the blocks of the system (Figure 24). Its width is N, so it can transfer an entire video line in one cycle.
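The dual-port behavior described above can be modeled in a few lines (a toy software model with hypothetical names, not the mixed-signal circuit): pixels arrive one by one on the serial port, and the complete line of width N leaves at once on the parallel port.

```python
class LineBuffer:
    """Toy model of a dual-port analog line buffer: serial write of
    individual samples, parallel read of a complete line of width n."""

    def __init__(self, n):
        self.n = n
        self.cells = []

    def serial_write(self, sample):
        """Serial port: one sample per pixel clock."""
        assert len(self.cells) < self.n, "line already full"
        self.cells.append(sample)

    def parallel_read(self):
        """Parallel port: one bus cycle moves the whole line at once."""
        assert len(self.cells) == self.n, "line not complete yet"
        line, self.cells = self.cells, []
        return line

buf = LineBuffer(4)
for sample in [0.1, 0.2, 0.3, 0.4]:
    buf.serial_write(sample)
assert buf.parallel_read() == [0.1, 0.2, 0.3, 0.4]
```

The outgoing buffer works the same way with the ports reversed: a parallel fill from the bus, then a serial release onto the video output.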
[Figure 24 blocks: n incoming video signals → incoming video line buffers (size: n × N×1, type: analog dual-port) → input video bank (size: n × N×(h+ou+ol), type: analog memory) → topographic physical processor array (size: N×(h+ou+ol), type: fine-grain, mixed-signal) → output video bank (size: m × N×h, type: analog memory) → outgoing video line buffers (size: m × N×1, type: analog dual-port) → m outgoing video signals; the blocks communicate over the column-wise bus]
Figure 24. Architecture of the physical processor array
Having collected an entire video line, the incoming video line buffer sends it to the input video bank. This is done during the row blanking time, when the video signal does not contain pixel data. The input video bank contains the last (h+ou+ol) video lines of the incoming frame. Its full content is transferred to the physical processor array after each new h-line segment has arrived. This means that the line transfer period and the processing time are equal to

tp = h · tl

where:
tp: processing and transfer time;
tl: line duration (~64 μs in PAL or NTSC, and includes ~14μs row blanking time);
h: number of rows in the main processor area.
The result of the calculation, the h rows from the main processor area, is transferred to the output video bank at the end of the tp period, from where the lines are sent to the outgoing line buffer one after the other.
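The timing budget can be checked with a few lines of arithmetic (a back-of-the-envelope sketch; the function name and the default PAL line timing are assumptions):

```python
def processing_budget_us(h, line_time_us=64.0):
    """Time available for transferring and processing one segment.

    While the array works on the current segment, the next h video lines
    stream in, so t_p = h * t_l (t_l is ~64 us for PAL/NTSC).
    """
    return h * line_time_us

# With h = 10 main rows, t_p = 640 us; at 10-50 us per operation this
# budget allows roughly 12-64 operations per segment.
assert processing_budget_us(10) == 640.0
```

This matches the h = 10, tp = 640 μs operating point of the 20x640 device discussed in Section 3.2.2.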
To support multiple incoming and outgoing video signals, the numbers of incoming and outgoing line buffers and of input and output video banks are multiplied by n and m, respectively (Figure 24).
3.2.2 Processor options
The mixed-signal processor cells can be either continuous-time CNN-like devices or discrete-time fine-grain mixed-signal ones. These processor arrays can execute an operation in the 10-50 microsecond range. Assuming 15 operations to execute, with an overlap of 9 rows applied asymmetrically (using stored boundary conditions in the upper overlapping area), a 20x640 physical processor array device (h=10, tp=640 μs) can perform video (speed)