262 • 2005 IEEE International Solid-State Circuits Conference 0-7803-8904-2/05/$20.00 ©2005 IEEE.
ISSCC 2005 / SESSION 14 / TD: LOW-POWER WIRELESS AND ADVANCED INTEGRATION / 14.4
14.4 A 3D Integration Scheme utilizing Wireless Interconnections for Implementing Hyper Brains
Atsushi Iwata, Mamoru Sasaki, Takamaro Kikkawa, Seiji Kameda, Hiroshi Ando, Kentaro Kimoto, Daisuke Arizono, Hideo Sunami
Hiroshima University, Higashi-Hiroshima, Japan
To surpass Moore's law through 3D integration, innovative solutions for inter- and intra-chip interconnections have to be developed. Inter-chip interconnection technologies using via-holes have been developed; however, their fabrication cost and yield are still unacceptable. To overcome these problems, wireless interconnects using capacitive coupling of small pads [1] or inductive coupling of on-chip spiral inductors [2] have been proposed. With the former technique, a pair of pads formed on different chips must couple across an insulation layer only 1 to 2µm thick, so the problems are not satisfactorily solved. The latter technique consumes more than 10mW for a single interconnect. Therefore, the expected requirement of over 1000 connections between chips, which at that power would dissipate more than 10W, cannot be realized in practice.
Another bottleneck for 3D integration lies in the processing algorithm and architecture of conventional von Neumann computers. Although living systems use a vast number of mutually connected neural cells that are sensitive to noise, these systems achieve highly sophisticated capabilities with sufficiently high reliability. To mimic biological systems, a processing architecture for aggregating information and making perspective judgments based on the advanced interconnection techniques is developed.
From the viewpoint of a complete system, global interconnects spanning all chips and local parallel interconnects between adjacent chips are required. The former are used for system clocking at over 10GHz as well as for buses that enable synchronous processing and access to a G-byte database. The latter transfer 2D data, such as image data, without the need for gathering and multiplexing.
The proposed local wireless interconnection scheme (LWI) between chips utilizes the principle of magnetic coupling and resonance of on-chip spiral inductors. A circuit schematic and an inductor structure are shown in Fig. 14.4.1. TX consists of a switching MOST and a pulse shaper, and RX consists of an LC resonator, a detector, and a reconstruction FF. Current and voltage characteristics of the circuit are also shown in Fig. 14.4.1. To optimize transmission delay and power dissipation, the pulse width is set to the time at which the inductor current (iL) reaches its maximum value. FDTD analysis and circuit simulation show that the inductor size (Lind) should be more than twice the inductor distance (tchip) to obtain a large enough coupling coefficient (k). These sizes are scalable under a constant-current condition.
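As a rough illustration of the resonant link, the following sketch evaluates the ideal LC resonance frequency and the mutual inductance implied by the coupling coefficient, using the component values annotated in Fig. 14.4.1 (6nH inductors, 100fF capacitors, k = 0.11). It also encodes the sizing rule Lind > 2 x tchip. The function names are hypothetical, and the lossless formulas ignore R1 and R2, so the numbers are only indicative.

```python
import math

def resonant_frequency_hz(l_henry, c_farad):
    """Natural frequency of an ideal (lossless) LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

def mutual_inductance_h(k, l1_henry, l2_henry):
    """Mutual inductance M = k * sqrt(L1 * L2)."""
    return k * math.sqrt(l1_henry * l2_henry)

def meets_size_rule(l_ind_um, t_chip_um):
    """Sizing guideline from the text: inductor size above twice the chip spacing."""
    return l_ind_um > 2.0 * t_chip_um

if __name__ == "__main__":
    # Values read from the Fig. 14.4.1 annotation; resistances are ignored here.
    f0 = resonant_frequency_hz(6e-9, 100e-15)
    m = mutual_inductance_h(0.11, 6e-9, 6e-9)
    print(f"f0 = {f0 / 1e9:.2f} GHz, M = {m * 1e9:.2f} nH")
    # Hypothetical geometry chosen only to exercise the rule.
    print(meets_size_rule(200.0, 50.0))
```

With these values the ideal tank resonates at roughly 6.5GHz, which gives a feel for the time scale of the received ringing; the actual pulse width used by the authors comes from their circuit simulation, not from this formula.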
The test chip is developed with a 0.25µm CMOS technology. Two chips are mounted on manipulators for measuring the transfer characteristics. Measured results are also shown in Fig. 14.4.1. A data rate of 800Mb/s is obtained at a supply voltage of 2.5V and a power consumption of 9mW. In circuit simulation using 0.18µm CMOS devices, a data rate of 2Gb/s is obtained with only 1mW power dissipation. The LWI can be applied to asynchronous interconnects that correspond to wires as well as to synchronous systems using a global clock. By optimizing inductor size, chip thickness and power dissipation, it is possible to transfer data in highly parallel form between neighboring chips.
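The reported figures can be compared on an energy-per-bit basis. The sketch below is a simple calculation from the numbers stated above (9mW at 800Mb/s measured, 1mW at 2Gb/s simulated); the helper names are hypothetical, and the 1000-link aggregate is an extrapolation that assumes every link dissipates the same power.

```python
def energy_per_bit_pj(power_w, rate_bps):
    """Energy per transferred bit in picojoules."""
    return power_w / rate_bps * 1e12

def aggregate_power_w(power_per_link_w, n_links):
    """Total power if n_links identical links run at once (assumption)."""
    return power_per_link_w * n_links

if __name__ == "__main__":
    measured = energy_per_bit_pj(9e-3, 800e6)   # 0.25um test chip
    simulated = energy_per_bit_pj(1e-3, 2e9)    # 0.18um simulation
    print(f"measured:  {measured:.2f} pJ/bit")
    print(f"simulated: {simulated:.2f} pJ/bit")
    # 1000 links is the connection count mentioned in the introduction.
    print(f"1000 simulated links: {aggregate_power_w(1e-3, 1000):.1f} W")
```

On these numbers the measured link costs about 11pJ/bit and the simulated one 0.5pJ/bit.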
A multi-chip vision (MCV) system based on hierarchical biological processing is investigated. An MCV test chip that consists of a pixel array and a PWM-based line-parallel I/O is implemented in a 0.35µm CMOS technology. Two MCV chips are connected through two LWI chips using analog PWM signaling, as shown in Fig. 14.4.2. The pixel output voltage of MCV1 is modulated into an analog PWM signal, which is re-modulated into an RZ signal that drives the TX MOST. The received signal is detected by a comparator and reconstructed into the original PWM signal. The PWM signals are then demodulated into analog signals. Measured waveforms are shown in Fig. 14.4.3. P1o and P2i are the PWM output of MCV1 and the PWM input to MCV2, respectively. In the experiment, since 1760 pixel data are transferred across only one LWI channel, a long period of 3.17ms is needed. If a column-parallel scheme is applied, this transfer time is drastically reduced. The time resolution is 2.5ns, which is limited by the comparator response delay due to the 0.35µm CMOS device speed. vo(1) and vo(2) are the output voltages without smoothing and with smoothing, respectively. An accuracy of 8b is obtained by PWM analog data transfer using LWI.
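To make the benefit of a column-parallel scheme concrete, the sketch below derives the per-pixel transfer time from the reported 3.17ms for 1760 pixels and estimates the frame transfer time for a given number of parallel LWI channels. The function names are hypothetical, and the estimate assumes that the per-pixel time stays the same and that the channels do not disturb each other, neither of which is established in the paper. The 40-channel case assumes one channel per pixel in a 40-pixel line.

```python
import math

def per_pixel_time_s(total_time_s, n_pixels):
    """Average serial transfer time per pixel."""
    return total_time_s / n_pixels

def parallel_transfer_time_s(total_time_s, n_pixels, n_channels):
    """Frame transfer time if pixels are spread evenly over n_channels."""
    slots = math.ceil(n_pixels / n_channels)  # sequential slots per channel
    return slots * per_pixel_time_s(total_time_s, n_pixels)

if __name__ == "__main__":
    t_pix = per_pixel_time_s(3.17e-3, 1760)
    print(f"per pixel: {t_pix * 1e6:.2f} us")
    for channels in (1, 40):
        t = parallel_transfer_time_s(3.17e-3, 1760, channels)
        print(f"{channels:2d} channel(s): {t * 1e6:.1f} us")
```

Under these assumptions, 40 channels would cut the transfer from 3.17ms to roughly 79µs.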
The proposed global wireless interconnection (GWI) utilizes high-frequency electromagnetic (EM) wave transmission using integrated antennas [3] and an ultra-wideband (UWB) transceiver system. EM wave propagation characteristics through stacked silicon substrates are measured using integrated dipole antennas, as shown in Fig. 14.4.4. A 20GHz sinusoidal wave propagates with a low loss of 0.14dB/chip through Si chips with an increased substrate resistivity of 2.29kΩ-cm. Clock distribution at over 10GHz can be implemented with sinusoidal wave transmission. UWB monocycle pulse transmission characteristics are also measured, and a low loss of 0.24dB/chip is obtained. Experimental UWB TX and RX chips are designed with a 0.18µm CMOS technology. A bit rate of 50Mb/s and 8-channel multiplexing are obtained by simulation. The influence of the EM wave on circuit operation and interference with external EM waves are now under investigation.
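The per-chip losses quoted above accumulate linearly in dB as chips are stacked. The sketch below, with hypothetical helper names, totals the loss over a 10-chip stack and converts it to a linear ratio. It assumes the antenna transmission gain is a power quantity (10 log10) and the pulse amplitude is a voltage quantity (20 log10); the paper does not state these conventions explicitly.

```python
def total_loss_db(loss_per_chip_db, n_chips):
    """Accumulated loss in dB through n_chips, assuming a constant per-chip loss."""
    return loss_per_chip_db * n_chips

def power_fraction(loss_db):
    """Remaining power fraction for a loss given in dB (10*log10 convention)."""
    return 10.0 ** (-loss_db / 10.0)

def amplitude_fraction(loss_db):
    """Remaining amplitude fraction for a loss given in dB (20*log10 convention)."""
    return 10.0 ** (-loss_db / 20.0)

if __name__ == "__main__":
    sine_db = total_loss_db(0.14, 10)    # 20GHz sinusoid, 2.29 kOhm-cm substrate
    pulse_db = total_loss_db(0.24, 10)   # UWB monocycle pulse
    print(f"sinusoid: {sine_db:.1f} dB, power fraction {power_fraction(sine_db):.2f}")
    print(f"pulse:    {pulse_db:.1f} dB, amplitude fraction {amplitude_fraction(pulse_db):.2f}")
```

This covers only the loss added by the inserted chips; the baseline antenna-to-antenna gain shown in Fig. 14.4.4 is not included.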
3D integration using GWI and LWI, called the 3-dimensional custom stack system (3DCSS), is proposed. With the 3DCSS concept, over 10 chips can be stacked, as shown in Fig. 14.4.5. The required chip alignment accuracy is relaxed to about half of the inductor size. Power/Ground pins are bonded with existing techniques. The key feature of 3DCSS is that it realizes a flexible customized system integrating various kinds of chips with a wireless interface. Yield and known-good-die problems are also resolved by chip testing using wireless pads.
For implementing a hyper brain, the concept of a multi-object recognition system using the 3DCSS approach is studied. Principal component analysis and the eigenface method are adopted for realizing multi-object detection and recognition in a natural scene. The proposed hyper brain is composed of multiple wirelessly interconnected chips, including an image sensor, early-vision processor, object detector, feature detector, object recognizer, and database, as shown in Fig. 14.4.6. The expected performance is a frame rate of 10 frame/s, 10 objects, and matching against 1000 data per object.
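Fig. 14.4.6 annotates the amount of data carried by each link as a function of the object size n2, vector size nv, number of objects N and number of reference data M. The sketch below evaluates three of those expressions (LWI2, GWI1, GWI2) as given in the figure. N = 10 and M = 1000 follow the expected performance stated above, while n2 = 32 and nv = 64 are purely hypothetical placeholders, since the paper does not give them. Function names are likewise illustrative.

```python
def lwi2_bits(n2, n_objects):
    """Feature coefficients: 8 bit x n2 x n2 x N."""
    return 8 * n2 * n2 * n_objects

def gwi1_bits(nv, n_objects):
    """Eigen coefficients: 11 bit x nv x N."""
    return 11 * nv * n_objects

def gwi2_bits(nv, n_objects, n_refs):
    """Eigen vectors: 11 bit x nv x N x M."""
    return 11 * nv * n_objects * n_refs

if __name__ == "__main__":
    N, M = 10, 1000          # from the expected performance in the text
    n2, nv = 32, 64          # hypothetical sizes, not from the paper
    print("LWI2 bits:", lwi2_bits(n2, N))
    print("GWI1 bits:", gwi1_bits(nv, N))
    print("GWI2 bits:", gwi2_bits(nv, N, M))
```

The M-fold factor in GWI2 shows why the reference-database link dominates the traffic whenever M is large.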
Acknowledgement:
This research is supported by the 21st Century COE program, Japanese Government. Test chips are fabricated in the chip fabrication program of VDEC, the University of Tokyo.
References:
[1] K. Kanda et al., “1.27Gb/s/ch 3mW/pin Wireless Superconnect (WSC) Interface Scheme,” ISSCC Dig. Tech. Papers, pp. 186-187, Feb. 2003.
[2] D. Mizoguchi et al., “A 1.2Gb/s/pin Wireless Superconnect Based on Inductive Inter-Chip Signaling (IIS),” ISSCC Dig. Tech. Papers, pp. 142-143, Feb. 2004.
[3] A. Rashid et al., “High Transmission Gain Integrated Antenna on Extremely High Resistivity Si for ULSI Wireless Interconnect,” IEEE ED Letters, vol. 23, no. 12, pp. 731-733, Dec. 2002.
Continued on Page 597
ISSCC 2005 / February 8, 2005 / Salon 1-6 / 2:45 PM
Figure 14.4.1: Wireless interconnect (LWI) based on resonated inductor coupling.
Figure 14.4.2: Experimental setup of multi-chip vision connected with analog LWIs.
Figure 14.4.3: Measured waveforms of MCV using analog-LWI.
Figure 14.4.4: Measurement results of EM wave transmission through stacked chips.
Figure 14.4.5: Structure of 3DCSS.
Figure 14.4.6: Schematic view of hyper brain and data transfer.
Figure 14.4.1 panels: (a) Simplified circuit diagram (TX with L1, C1, R1; RX with L2, C2, R2, comparator with Vref, and FF; coupling M, k). (b) Inductor structure (top-metal inductors of size Lind on Si substrates separated by tchip). (c) Calculated waveforms of iL [mA] and Vc [mV] versus t [ns], for L1 = 6nH, R1 = 50ohm, C1 = 100fF, L2 = 6nH, R2 = 100ohm, C2 = 100fF, k = 0.11, PW = 50ps. (d) Measured waveforms of DTX (NRZ), DTX (RZ), Vc and DRX (NRZ) at a bit rate of 800Mb/s, 5ns/div.
Figure 14.4.2 block labels: MCV1 and MCV2 (pixel neurons, ramp generator, PWM modulators, input and output scanners); TX-LWI (L1, C1, R1); RX-LWI (L2, C2, R2, comparator with Vref, FF); CLK 400MHz; Vdd 2.5V; signals P1o, P2i, Vo(1), Vo(2).
Figure 14.4.3 annotations: pixel operation (column parallel), 8us = 40 pixels, 360.8us = 44 lines; pixel data transfer, 72us = 40 pixels, 3.17ms = 1760 pixels; image data (1 line); PWM pulse width max 256ns; timing marks 23ns, 2.5ns and 50ns; traces P1o, P2i, DTX(RZ), Vo(1), and Vo(2) (with smoothing); voltage levels between 0V and 4.0V.
Figure 14.4.4 panels: Measurement setup (transmitting and receiving antennas of length Lant at distance d, pad length, stacked chips of thickness tchip = 260µm). Antenna transmission gain Ga [dB] versus number of inserted chips (0 to 10) for a sinusoidal wave (f = 20GHz), with slopes of -0.14dB/chip and -1.3dB/chip for ρ = 2.29kΩ-cm and 10Ω-cm. Pulse amplitude [dB] versus number of inserted chips (0 to 10) for a Gaussian monocycle pulse, with slopes of -0.24dB/chip and -0.92dB/chip for ρ = 2.29kΩ-cm and 10Ω-cm. Both plots: Lant = 4mm, d = 3mm, pad length = 1mm. Transmitting and received signals, output voltage [mV] versus time [ns], for Lant = 4mm, d = 3mm, h = 2860µm (10 Si chips).
Figure 14.4.5 labels: cross-sectional views of AA' and BB' showing circuits, antennas, spiral inductors, and Vdd/Gnd. No. of stacked chips > 10 (typical); chip thickness tchip = 50 - 200µm; antenna length Lant = 2 - 4mm; inductor size Lind = 50 - 200µm.
Figure 14.4.6 blocks: image sensor array (n1 x n1), early vision (smoothing, LoG, etc.), object detection, feature detection, recognition, reference data memory, clock; links LWI1, LWI2, LWI3, GWI1, GWI2.
Notation: n1 = input image size, n2 = object size, nv = vector size, N = # of objects, M = # of ref. data.
Amount of data:
LWI1 (PWM image): 28 x n1
LWI2 (feature coef.): 8bit x n2 x n2 x N
GWI1 (eigen coef.): 11bit x nv x N
GWI2 (eigen vector): 11bit x nv x N x M
ISSCC 2005 PAPER CONTINUATIONS
Figure 14.4.7: Test chips for (a) LWI and (b) PWM I/O MCV.
(a) 0.25µm CMOS, 5M, 3.3 x 3.3mm; Tx and Rx inductors with Lind = 200, 300µm.
(b) 0.35µm CMOS, 4.9 x 4.9mm; pixel array of 40 x 44 pixels.