An Object Detection/Recognition System using a 3-Dimensional Integration with Local and Global Wireless Interconnections
Hiroshi Ando, Seiji Kameda, Nobuo Sasaki, Daisuke Arizono, Kentaro Kimoto†, Norimitsu Fuchigami, Kouta Kaya, Mamoru Sasaki, Takamaro Kikkawa† and Atsushi Iwata
Graduate School of Advanced Sciences of Matter, Hiroshima University
†Research Center for Nanodevices and Systems, Hiroshima University
Phone: +81-824-22-7358, E-mail: [email protected]
1. Introduction
In order to realize a hyper brain system that can recognize various objects in real time in the real world, a number of chips with massively parallel processing and wideband interconnection capabilities are needed. To assemble these multiple chips with low-power, Gbps-bandwidth interconnections, new integration techniques that replace the conventional System in Package techniques are required.
To solve this problem, we have proposed the 3-dimensional custom stack system (3DCSS), which uses two kinds of wireless interconnections: the inductive-coupling local wireless interconnect (LWI) and the antenna-coupling global wireless interconnect (GWI) [1]. In the system, LWI is used to transmit and receive 2D image data between neighboring chips in parallel, and GWI is used to transmit and receive global system clocks and serial data, such as control signals or the database, among all stacked chips. With the asynchronous LWI scheme, which needs no clocking, a high bit rate of 1 Gbps/ch and a low power dissipation of 0.95 mW/ch have been achieved in a 0.18 µm CMOS technology [2]. Generation of the ultra-short Gaussian monocycle pulse, the fundamental element for implementing GWI, has also been demonstrated in the same technology [3].
To implement the multi-object recognition system, a processing algorithm and a system/chip architecture suited to the 3-dimensional integration technique have to be developed. Although many algorithms have been reported in the field of human face recognition [4], most of them were developed for software realization and have not been applied to LSI implementation because of their complex large-scale calculations and huge memory requirements.
In this research, we have developed an architecture that adopts the “Eigenfaces” method based on PCA (Principal Component Analysis), one of the well-known face detection/recognition methods [5]. By combining the Eigenfaces method with 3DCSS, we have proposed an architecture for the multi-object recognition system [6]. We have also implemented a prototype system by developing two types of chips in a 0.18 µm CMOS technology. The chip design, which utilizes the advantages of LWI and GWI, is also described.
2. Object detection/recognition algorithm
The Eigenfaces method is considered suitable for an object recognition hardware architecture because it has advantages in both recognition performance and hardware implementation. Its recognition performance is robust and equivalent to or higher than that of other recognition algorithms [7]. Various kinds of objects can be detected and recognized simply by preparing an individual database for each of them, without changing the processing. In hardware, the method can be implemented with a massively parallel circuit architecture and conventional digital circuit techniques, without nonlinear processing, and a single chip can be designed that serves both detection and recognition without an increase in circuit area.
We briefly explain the fundamentals of the Eigenfaces algorithm. The $i$-th face image, consisting of $M$ pixels, is represented as a row vector $\Gamma_i$. A preprocessed face $\Phi_i$ is defined by $\Phi_i = \Gamma_i - \Psi$, where $\Psi$ represents the average face of the $N$ images in the DB (database), that is, $\Psi = \frac{1}{N}\sum_{n=1}^{N} \Gamma_n$.
The “eigenfaces” can be calculated as the eigenvectors $a_k$ (in ascending order of $k$, $k = 1, 2, \cdots, m$, $m \ll M$) of the covariance matrix $C$ of the DB, $C = \frac{1}{N}\sum_{n=1}^{N} \Phi_n \Phi_n^t$, where $\Phi^t$ is the transpose of $\Phi$. A face image is transformed into the so-called “eigen-space” by a simple operation: $\omega_k = a_k^t \Phi_i$. The components $\omega_k$ form a vector $\Omega = [\omega_1\ \omega_2\ \ldots\ \omega_m]$ that describes the contribution of each eigenface to the face image.
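The preprocessing and projection steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy pixel values and the two orthonormal eigenfaces are hypothetical, and the eigenvectors are taken as given rather than computed from the covariance matrix.

```python
def mean_face(images):
    """Average face Psi over the N database images (each a list of M pixels)."""
    n = len(images)
    return [sum(column) / n for column in zip(*images)]


def preprocess(gamma, psi):
    """Preprocessed face Phi = Gamma - Psi, pixel by pixel."""
    return [g - p for g, p in zip(gamma, psi)]


def project(phi, eigenfaces):
    """Weights omega_k = a_k^t Phi for every eigenface a_k; returns Omega."""
    return [sum(a * p for a, p in zip(a_k, phi)) for a_k in eigenfaces]


# Toy database: N = 3 images of M = 4 pixels (made-up values).
database = [
    [10.0, 20.0, 30.0, 40.0],
    [12.0, 18.0, 33.0, 37.0],
    [8.0, 22.0, 27.0, 43.0],
]
psi = mean_face(database)

# Two hypothetical orthonormal eigenfaces (m = 2); not derived from the database.
eigenfaces = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]

phi = preprocess(database[1], psi)
omega = project(phi, eigenfaces)
print(psi, phi, omega)
```

Because each weight is a plain dot product, the same multiply-accumulate structure can be shared across all eigenfaces, which is what makes the method attractive for parallel hardware.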
Face detection is performed by a commonly used thresholding method. A reconstructed image $\Phi_r$, defined by $\Phi_r = \sum_{k=1}^{m} \omega_k a_k$, is used as the input of an evaluation function for thresholding. For example, the Euclidean distance $\varepsilon = \|\Phi_{in} - \Phi_r\|$ is often used as the evaluation function, where $\Phi_{in}$ is a preprocessed unknown input image. If $\varepsilon$ is lower than a threshold, the unknown input image is classified as a human face. Face recognition is achieved with the same calculations except for the evaluation function. If the face-space vector $\Omega_{DB_i}$ of the $i$-th face image in the DB gives the minimum distance $\varepsilon_{min} = \|\Omega - \Omega_{DB_i}\|$, the input face is identified as the $i$-th face.
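A minimal sketch of the two decision rules follows: detection by thresholding the reconstruction error, and recognition by the nearest face-space vector. The threshold value, the toy eigenfaces and the database weight vectors are assumptions chosen for illustration only.

```python
import math


def project(phi, eigenfaces):
    """Omega: omega_k = a_k^t Phi for each eigenface a_k."""
    return [sum(a * p for a, p in zip(a_k, phi)) for a_k in eigenfaces]


def reconstruct(omega, eigenfaces):
    """Phi_r = sum over k of omega_k * a_k."""
    m_pixels = len(eigenfaces[0])
    return [sum(w * a_k[j] for w, a_k in zip(omega, eigenfaces))
            for j in range(m_pixels)]


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def detect(phi_in, eigenfaces, threshold):
    """Classify as a face when epsilon = ||Phi_in - Phi_r|| is below the threshold."""
    omega = project(phi_in, eigenfaces)
    eps = euclidean(phi_in, reconstruct(omega, eigenfaces))
    return eps < threshold, eps, omega


def recognize(omega, omega_db):
    """Index i of the database entry with minimum ||Omega - Omega_DBi||."""
    distances = [euclidean(omega, entry) for entry in omega_db]
    i_min = min(range(len(distances)), key=distances.__getitem__)
    return i_min, distances[i_min]


# Hypothetical orthonormal eigenfaces and a preprocessed input image.
eigenfaces = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
phi_in = [2.0, -2.0, 3.0, -3.0]

detected, eps, omega = detect(phi_in, eigenfaces, threshold=2.0)  # threshold is arbitrary
omega_db = [[0.0, -4.0], [0.0, 5.5], [3.0, 0.0]]                  # made-up database weights
best, eps_min = recognize(omega, omega_db)
print(detected, eps, best, eps_min)
```

Both modes reuse the same projection, so only the final evaluation function differs between detection and recognition.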
3. Hardware implementation
A schematic of the proposed multi-object recognition system is shown in Fig. 1. The system consists of three kinds of chips: the Visual Processing chip (VP3D) [8], the Detection/Recognition chip (DR3D) and the Reference Memory chip (RM3D). Each chip has 21×2ch LWIs, which can transmit data to and receive data from neighboring chips simultaneously, and 2ch GWIs for receiving the clock and binary digital data. The RM3D has 2ch GWIs and transmitter circuits for clock and data.
We now explain how this system detects and recognizes objects. First, the original image data is stored in RM3D1 and transmitted to the neighboring VP3D as 21-pixel-parallel PWM (pulse width modulation) signals (LWI-1). The transmission rate is about 160 Mbps when the maximum bit width and the time resolution of the PWM signal are 8 bit and 4 ns (250 MHz clock distribution by GWI-1), respectively. Second, massively parallel image pre-processing is performed by several VP3Ds, and the resulting image data is transmitted to RM3D2 over LWI-2 in the same way as over LWI-1. Finally, the DR3D receives the processed image data and the object database through LWI-3 (5.3 Gbps = 21 bit/4 ns), or after another database has been stored to RM3D2 from RM3DN by GWI-2 (250 Mbps), and detects and recognizes objects.
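The quoted link rates can be checked with a short calculation. The sketch assumes that one 8-bit PWM sample occupies up to 2^8 time slots of 4 ns each; this interpretation is not spelled out in the text, but it reproduces the "about 160 Mbps" figure. The constant names are illustrative.

```python
CHANNELS = 21    # parallel LWI channels
PWM_BITS = 8     # maximum bit width of one PWM sample
TICK_NS = 4.0    # time resolution, one period of the 250 MHz clock

# Assumption: an 8-bit PWM sample needs up to 2**8 ticks on the wire.
pwm_symbol_ns = (2 ** PWM_BITS) * TICK_NS                  # 1024 ns per sample
lwi_pwm_mbps = CHANNELS * PWM_BITS / pwm_symbol_ns * 1e3   # bits/ns converted to Mbps

# LWI-3 carries 21 binary bits every 4 ns; bits per ns equals Gbps.
lwi3_gbps = CHANNELS / TICK_NS

print(f"LWI-1/LWI-2 (PWM): {lwi_pwm_mbps:.1f} Mbps")
print(f"LWI-3 (binary):    {lwi3_gbps:.2f} Gbps")
```

Under these assumptions the PWM links come out at roughly 164 Mbps and LWI-3 at 5.25 Gbps, consistent with the rounded 160 Mbps and 5.3 Gbps values in the text.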
This system is capable of 40 GOPS (Giga Operations Per Second) at 250 MHz operation. We therefore expect 160 GOPS performance at the maximum LWI operation (1 Gbps/ch at present).
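The 160 GOPS projection follows from linear scaling, as the short calculation below shows. It assumes that throughput scales in proportion to the operating rate and that the 1 Gbps/ch LWI limit corresponds to a 1 GHz processing rate; both are assumptions made for this illustration.

```python
measured_ops_per_s = 40e9   # 40 GOPS, from the text
clock_hz = 250e6            # 250 MHz operation

ops_per_cycle = measured_ops_per_s / clock_hz   # operations completed per clock cycle

max_rate_hz = 1e9           # assumed: 1 Gbps/ch LWI supports 1 GHz operation
projected_gops = ops_per_cycle * max_rate_hz / 1e9

print(ops_per_cycle, projected_gops)
```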
[Figure 1 diagram labels: RM3D1 (image/database); LWI-1: 160 Mbps (input image data); VP3D (visual processing); LWI-2: 160 Mbps (preprocessed image data); RM3D2 (preprocessed image/database); LWI-3: 5.3 Gbps (preprocessed image data, object data and database); DR3D (object detection/recognition); RM3DN (database); GWI-1: 250 Mbps (CLK); GWI-2: 250 Mbps (database).]
Figure 1: Multi-object recognition system with 3DCSS.
3.1 Reference memory chip - RM3D
Figure 2 (a) shows a block diagram of the proposed RM3D, which stores reference data for both VP3D and DR3D. The SRAM capacity is 56 kbits for the image data Γ, and 196 kbits and 123 kbits for the database Ψ and a. When communicating with VP3D, the binary digital image data is modulated into a PWM signal by the DPC (Digital-to-PWM Converter) and transmitted to VP3D in 21-pixel parallel over LWI. The visually processed data is received, demodulated by the PDC (PWM-to-Digital Converter) and stored. The 21 bit digital bus data for one pixel (8 bit Γ, 8 bit Ψ and 5 bit a) is transmitted to DR3D pixel-serially over LWI. The clock signal generated by the VCO (Voltage Controlled Oscillator) and the binary data stored in memory are transmitted to all of the stacked chips. If a huge memory capacity is needed for the database, it is enough to stack several RM3Ds, because GWI provides wireless wideband communication.
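The two data formats described above can be modelled in software as follows. This is a hedged sketch: the field order inside the 21 bit word, the treatment of the 5 bit coefficient as an unsigned field, and the linear mapping of a pixel value to a pulse width in 4 ns steps are all assumptions, and the function names are hypothetical.

```python
GAMMA_BITS, PSI_BITS, A_BITS = 8, 8, 5   # field widths from the text (21 bits total)
TICK_NS = 4                              # PWM time resolution in ns


def pack_word(gamma, psi, a):
    """Pack one pixel's Gamma, Psi and a into a 21-bit word (assumed field order)."""
    for value, bits in ((gamma, GAMMA_BITS), (psi, PSI_BITS), (a, A_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (gamma << (PSI_BITS + A_BITS)) | (psi << A_BITS) | a


def unpack_word(word):
    """Inverse of pack_word."""
    a = word & ((1 << A_BITS) - 1)
    psi = (word >> A_BITS) & ((1 << PSI_BITS) - 1)
    gamma = (word >> (PSI_BITS + A_BITS)) & ((1 << GAMMA_BITS) - 1)
    return gamma, psi, a


def dpc(value):
    """Toy Digital-to-PWM Converter: pulse width in ns, assumed linear in the value."""
    return value * TICK_NS


def pdc(width_ns):
    """Toy PWM-to-Digital Converter: count how many 4 ns ticks the pulse spans."""
    return width_ns // TICK_NS


word = pack_word(200, 17, 9)
fields = unpack_word(word)
pulse_ns = dpc(200)
recovered = pdc(pulse_ns)
print(word, fields, pulse_ns, recovered)
```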
3.2 Detection/recognition chip - DR3D
A block diagram of the proposed DR3D, which implements the object detection/recognition algorithm described in Sec. 2, is shown in Fig. 2 (b). By utilizing the advantages of the Eigenfaces algorithm, the DR3D achieves both operation modes, object detection and recognition, with common circuits and 32-pixel parallel operation.
First, the 21 bit bus data are received by LWI and stored in the 32×32 shift registers, where the pixel size of an object is 32×32, and converted to 32-pixel parallel data by the shift registers. Second, in the reconstructed image generator, $\Phi_i = \Gamma_i - \Psi$ is calculated by the subtracter, $\omega_k = a_k^t \Phi_i$ is calculated by the multiplier, and $\Phi_r = \sum_{k=1}^{m} \omega_k a_k$ is obtained by the accumulator, all in 32-pixel parallel. Finally, the Manhattan distance $\varepsilon_i = \|\Phi_i - \Phi_r\|$ is calculated with a subtracter and compared in the Winner-take-all circuits, which completes the detection or recognition process.
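The datapath described above can be mimicked in software as a rough behavioural model. The sketch assumes that the Winner-take-all stage selects the candidate with the smallest Manhattan distance; the chunked loop only imitates the 32-pixel parallel grouping, and the toy data (4-pixel images, two eigenfaces) are invented for illustration.

```python
PARALLEL = 32  # pixels handled per step, as stated in the text


def dr3d_distance(gamma, psi, eigenfaces):
    """Manhattan distance between Phi and its reconstruction Phi_r."""
    # Subtracter: Phi = Gamma - Psi.
    phi = [g - p for g, p in zip(gamma, psi)]
    phi_r = [0.0] * len(phi)
    for a_k in eigenfaces:
        # Multiplier and accumulator: omega_k = a_k^t Phi, built up chunk by chunk.
        omega_k = 0.0
        for start in range(0, len(phi), PARALLEL):
            stop = start + PARALLEL
            omega_k += sum(a * p for a, p in zip(a_k[start:stop], phi[start:stop]))
        # Accumulate the contribution omega_k * a_k into Phi_r.
        for j, a in enumerate(a_k):
            phi_r[j] += omega_k * a
    # Second subtracter and summation: Manhattan (L1) distance.
    return sum(abs(p - r) for p, r in zip(phi, phi_r))


def winner_take_all(distances):
    """Assumed behaviour: return the index of the smallest distance."""
    return min(range(len(distances)), key=distances.__getitem__)


psi = [10.0, 20.0, 30.0, 40.0]
eigenfaces = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
candidates = [
    [12.0, 18.0, 33.0, 37.0],
    [11.0, 20.0, 30.0, 40.0],
    [12.5, 17.5, 32.5, 37.5],
]
distances = [dr3d_distance(gamma, psi, eigenfaces) for gamma in candidates]
winner = winner_take_all(distances)
print(distances, winner)
```

The Manhattan distance needs only subtraction and absolute value, so it avoids the squaring and square root of the Euclidean distance used in the Sec. 2 example.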
Thus, the proposed multi-object recognition system can be implemented by making the most of the advantages of LWI and GWI: Gbps multi-channel communication enables parallel processing, and long-range wireless communication makes it possible to stack several memory chips.
3.3 Fabrication and integration
Test chips of RM3D and DR3D fabricated in a 0.18 µm CMOS technology are shown in Fig. 3. The chip size was 5×5 mm², and the supply voltage and operating frequency were 1.8 V and 250 MHz, respectively. The detection time was 580 µs and the one-object to one-database recognition time was 4.2 µs for an 84×84 image and a 32×32 object size. We estimate that a detection time of 20.6 ms and a recognition time of 12.7 ms (30 fps) would be achieved for a QVGA image that includes about 30 objects, with 100 database objects.
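The relation between the estimated times and the 30 fps figure can be verified with simple arithmetic. The sketch assumes that detection and recognition run back to back within one frame and that recognition compares every detected object against every database entry; neither assumption is stated explicitly in the text.

```python
detection_ms = 20.6     # estimated QVGA detection time, from the text
recognition_ms = 12.7   # estimated QVGA recognition time, from the text

# Assumption: both phases run sequentially within a single frame.
frame_ms = detection_ms + recognition_ms
fps = 1000.0 / frame_ms

# Assumption: every object is compared against every database entry.
objects, db_entries, pair_time_us = 30, 100, 4.2
recognition_estimate_ms = objects * db_entries * pair_time_us / 1000.0

print(f"frame time: {frame_ms:.1f} ms, frame rate: {fps:.1f} fps")
print(f"recognition estimate: {recognition_estimate_ms:.1f} ms")
```

The exhaustive-comparison estimate gives about 12.6 ms, close to the quoted 12.7 ms; the small difference is not explained in the text.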
The custom flexible printed circuit (FPC) shown in Fig. 4 was developed for testing each chip. Note that most of the area around the chip is unnecessary, because this FPC was used for preliminary measurements. We confirmed basic operations such as memory read/write and control signal generation. A prototype 3DCSS is now under development by stacking the measured chips.
4. Conclusion
The multi-object recognition system architecture was developed by utilizing a recognition algorithm based on the Eigenfaces method and the 3-D integration scheme (3DCSS) with two types of wireless interconnections, LWI and GWI. The prototype system was designed with three types of chips for object detection/recognition, reference data storage and image pre-processing. A processing performance of 40 GOPS at 250 MHz was obtained by the chips in a 0.18 µm CMOS technology. The object detection and recognition system achieved a detection time of 580 µs and a one-object to one-database recognition time of 4.2 µs.
References
[1] A. Iwata, et al., “A 3D-integration scheme utilizing wireless interconnections for implementing hyper brains”, Digest of ISSCC2005, pp. 262-263, Feb 6-10, 2005.
[2] M. Sasaki, et al., “A 0.95mW/1.0Gbps spiral-inductor based wireless chip-interconnect with asynchronous communication scheme”, Digest of Sympo. on VLSI Circuits, June 2005.
[3] N. Sasaki, et al., “A single-chip Gaussian monocycle pulse transmitter using 0.18µm CMOS technology for intra/interchip UWB communication”, Digest of Sympo. on VLSI Circuits, 2006.
[4] W. Zhao, et al., “Face Recognition: A Literature Survey”, ACM CSUR archive, vol. 35, pp. 399-458, 2003.
[5] M. A. Turk, et al., “Eigenfaces for Recognition”, CVPR’91, pp. 586-591, 1991.
[6] H. Ando, et al., “A prototype software system for multi-object recognition and its fpga implementation”, Proc. Third Hiroshima International Workshop on NTIP, pp. 89-90, 2004.
[7] P. J. Phillips, et al., “The FERET evaluation methodology for face-recognition algorithms”, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 1090-1104, 2000.
[8] S. Kameda, et al., “A Brain-type vision system using a 3-Dimensional integration with local and global wireless interconnections”, Proc. Fourth Hiroshima International Workshop on NTIP, pp. 38-39, 2005.
[Figure 2 diagram labels. (a) RM3D: 56 kbits SRAM, 192 kbits SRAM and 128 kbits SRAM; image Γi, average vector Ψ, eigenvector ak; CTRL; Selector; DPC and PDC; LWI (×21); VCO; GWI Tx/Rx for memory database and CLK. (b) DR3D: Rx and CLK inputs (×21); 25 kbits buffer memory; 32×32 shift registers for image, average and eigenvector data (8, 8 and 5 bit, ×32); reconstructed image generator with subtracter, multiplier and accumulator (Φi, ωk, Φr); subtracter for the Manhattan distance εi; Winner-take-all; object detection/recognition. Other labels: spiral inductor, dipole antenna.]
Figure 2: Block diagrams of RM3D and DR3D.
[Figure 3 microphotograph labels. (a) RM3D: LWI; 56 kbits SRAM for image data; 123 kbits SRAM for database; 196 kbits SRAM for database; Tx/Rx antenna for database; Tx/Rx antenna for CLK; Data Tx; Data Rx; CLK Tx; CLK Rx. (b) DR3D: LWI; Rx antennas for database; Rx antenna for CLK; Data Rx; CLK Rx; object detection/recognition.]
Figure 3: Microphotographs of RM3D (a) and DR3D (b).
[Figure 4 photograph labels: 3DCSS chip (about 50 µm thickness); pattern for connecting board.]
Figure 4: FPC board.