

262 • 2005 IEEE International Solid-State Circuits Conference 0-7803-8904-2/05/$20.00 ©2005 IEEE.

ISSCC 2005 / SESSION 14 / TD: LOW-POWER WIRELESS AND ADVANCED INTEGRATION / 14.4

14.4 A 3D Integration Scheme utilizing Wireless Interconnections for Implementing Hyper Brains

Atsushi Iwata, Mamoru Sasaki, Takamaro Kikkawa, Seiji Kameda, Hiroshi Ando, Kentaro Kimoto, Daisuke Arizono, Hideo Sunami

Hiroshima University, Higashi-Hiroshima, Japan

In order to break Moore’s law by 3D integration, innovative solutions for inter- and intra-chip interconnections have to be developed. Inter-chip interconnection technologies using via-holes have been developed; however, their associated fabrication cost and yield are still unacceptable. To overcome these problems, wireless interconnects using capacitive coupling of small pads [1] or inductive coupling of on-chip spiral inductors [2] have been proposed. With the former technique, a pair of pads formed on different chips must couple through an insulation layer only 1 to 2µm thick, so the problems are not satisfactorily solved. The latter technique consumes more than 10mW for a single interconnect. Therefore, the expected requirement of over 1000 connections between chips cannot be realized in practice.

Another bottleneck for 3D integration lies in the processing algorithm and architecture of conventional von Neumann computers. Although living systems use a vast number of mutually connected neural cells that are sensitive to noise, these systems achieve highly sophisticated capabilities with sufficiently high reliability. To mimic biological systems, a processing architecture for aggregating information and making perspective judgments based on the advanced interconnection techniques is developed.

From the viewpoint of a complete system, global interconnects throughout whole chips and local parallel interconnects between adjacent chips are required. The former are used for system clocking at over 10GHz as well as for buses that enable synchronous processing and access to a G-byte database. The latter transfer 2D data such as image data, without the need for gathering and multiplexing.

The proposed local wireless interconnection (LWI) scheme between chips utilizes the magnetic coupling and resonance of on-chip spiral inductors. A circuit schematic and an inductor structure are shown in Fig. 14.4.1. The TX consists of a switching MOST and a pulse shaper, and the RX consists of an LC resonator, a detector, and a reconstruction FF. Current and voltage characteristics of the circuit are also shown in Fig. 14.4.1. To optimize transmission delay and power dissipation, the pulse width is set to the time at which the inductor current (i_L) reaches its maximum value. FDTD analysis and circuit simulation show that the inductor size (L_ind) should be more than twice the inductor distance (t_chip) to obtain a large enough coupling coefficient (k). These sizes are scalable under a constant-current condition.
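
As a rough illustration of the resonator and the sizing rule above, the following sketch computes the resonant frequency of an ideal LC tank from the component values annotated in Fig. 14.4.1 (L1 = 6nH, C1 = 100fF) and checks the L_ind > 2 t_chip condition. It assumes a lossless tank and ignores the series resistances and the coupling coefficient, so the result (about 6.5GHz) is only indicative; the function names and the inductor sizes passed to the rule check are illustrative and do not come from the paper.

```python
import math


def lc_resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal (lossless) LC tank: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def meets_coupling_rule(inductor_size_um: float, inductor_distance_um: float) -> bool:
    """Sizing rule quoted in the text: L_ind should be more than twice t_chip."""
    return inductor_size_um > 2.0 * inductor_distance_um


# Component values annotated in Fig. 14.4.1 (assumed here to describe the tank).
f0 = lc_resonant_frequency_hz(6e-9, 100e-15)
print(f"ideal tank resonance: {f0 / 1e9:.2f} GHz")

# Illustrative geometry only: a 200 um inductor over 50 um and 100 um distances.
print(meets_coupling_rule(200.0, 50.0))   # rule satisfied
print(meets_coupling_rule(200.0, 100.0))  # exactly twice, so not satisfied
```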

The test chip is developed in a 0.25µm CMOS technology. Two chips are mounted on manipulators for measuring the transfer characteristics. Measured results are also shown in Fig. 14.4.1. A data rate of 800Mb/s is obtained at a supply voltage of 2.5V and a power consumption of 9mW. In circuit simulation using 0.18µm CMOS devices, a 2Gb/s data rate is obtained with only 1mW power dissipation. The LWI can be applied to asynchronous interconnects that act as wires, as well as to synchronous systems using a global clock. By optimizing inductor size, chip thickness and power dissipation, it is possible to transfer data in highly parallel form between neighboring chips.
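
The power and data-rate figures above can be compared as energy per bit. The short sketch below does that arithmetic and also scales the per-link power to the 1000-connection requirement mentioned earlier. It assumes the quoted power is the total for one link at the quoted rate; the helper names are illustrative.

```python
def energy_per_bit_pj(power_w: float, bit_rate_bps: float) -> float:
    """Energy per transferred bit in picojoules (power divided by bit rate)."""
    return power_w / bit_rate_bps * 1e12


def aggregate_power_w(power_per_link_w: float, n_links: int) -> float:
    """Total power if n_links identical links run simultaneously."""
    return power_per_link_w * n_links


measured = energy_per_bit_pj(9e-3, 800e6)   # 0.25 um test chip: 9 mW at 800 Mb/s
simulated = energy_per_bit_pj(1e-3, 2e9)    # 0.18 um simulation: 1 mW at 2 Gb/s
print(f"measured:  {measured:.2f} pJ/bit")
print(f"simulated: {simulated:.2f} pJ/bit")

# Scaling to 1000 parallel links, the count named as a requirement in the text.
print(f"1000 links at 1 mW: {aggregate_power_w(1e-3, 1000):.1f} W")
print(f"1000 links at 9 mW: {aggregate_power_w(9e-3, 1000):.1f} W")
```

Under these assumptions the measured link costs 11.25pJ/bit and the simulated one 0.5pJ/bit, which is why the per-link power matters once a thousand links run in parallel.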

A multi-chip vision (MCV) system based on hierarchical biological processing is investigated. An MCV test chip that consists of a pixel array and a PWM-based line-parallel I/O is implemented in a 0.35µm CMOS technology. Two MCV chips are connected with two LWI chips using analog PWM signaling, as shown in Fig. 14.4.2. The pixel output voltage of MCV1 is modulated to an analog PWM signal. It is re-modulated to an RZ signal and drives the TX MOST. The received signal is detected by a comparator and reconstructed to the original PWM signal. The PWM signals are then demodulated to analog signals. Measured waveforms are shown in Fig. 14.4.3. P_1o and P_2i are the PWM output of MCV1 and the PWM input to MCV2, respectively. In the experiment, since 1760 pixel data are transferred across only one LWI channel, a long period of 3.17ms is needed. If a column-parallel scheme is applied, this transfer time is drastically reduced. The time resolution is 2.5ns, which is limited by the comparator response delay due to the 0.35µm CMOS device speed. v_o(1) and v_o(2) are the output voltages without smoothing and with smoothing, respectively. An accuracy of 8b is obtained by PWM analog data transfer using LWI.
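
To make the transfer-time remark concrete, the sketch below derives the per-pixel slot time from the measured 3.17ms for 1760 pixels and estimates the time with several channels in parallel. The 40-channel case is a hypothetical example (one channel per column of the 40 x 44 pixel array noted in Fig. 14.4.7), and it assumes the per-pixel slot time stays the same when channels are added; neither assumption is stated in the paper.

```python
import math


def time_per_pixel_s(total_time_s: float, n_pixels: int) -> float:
    """Average time slot per pixel on a single serial channel."""
    return total_time_s / n_pixels


def parallel_transfer_time_s(total_time_s: float, n_pixels: int, n_channels: int) -> float:
    """Estimated transfer time with n_channels links, each keeping the same slot time."""
    slots_per_channel = math.ceil(n_pixels / n_channels)
    return slots_per_channel * time_per_pixel_s(total_time_s, n_pixels)


slot = time_per_pixel_s(3.17e-3, 1760)
print(f"slot per pixel: {slot * 1e6:.2f} us")

# Hypothetical column-parallel case: 40 channels for a 40 x 44 pixel array.
estimate = parallel_transfer_time_s(3.17e-3, 1760, 40)
print(f"40 channels: {estimate * 1e6:.2f} us")
```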

The proposed global wireless interconnection (GWI) utilizes high-frequency electromagnetic (EM) wave transmission using integrated antennas [3] and an ultra-wideband (UWB) transceiver system. EM wave propagation characteristics through stacked silicon substrates are measured using integrated dipole antennas, as shown in Fig. 14.4.4. A 20GHz sinusoidal wave propagates with a low loss of 0.14dB/chip through Si chips with an increased substrate resistivity of 2.29kΩ-cm. Clock distribution at over 10GHz can be implemented with sinusoidal wave transmission. UWB monocycle pulse transmission characteristics are also measured, and a low loss of 0.24dB/chip is obtained. Experimental UWB TX and RX chips are designed in a 0.18µm CMOS technology. A bit rate of 50Mb/s and 8-channel multiplexing are obtained by simulation. The influence of the EM wave on circuit operations and interference with external EM waves are now under investigation.
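
The per-chip losses above can be accumulated over a stack as a quick sanity check. The sketch below multiplies the dB/chip figure by the number of chips and converts the result to a linear ratio. It assumes the loss adds linearly in dB with each inserted chip and treats the quoted figures as power ratios; the 10-chip stack height is taken from the stacking target discussed for the system, and the function names are illustrative.

```python
def stack_loss_db(loss_per_chip_db: float, n_chips: int) -> float:
    """Total attenuation in dB, assuming each chip adds the same loss."""
    return loss_per_chip_db * n_chips


def db_loss_to_power_ratio(loss_db: float) -> float:
    """Fraction of power remaining after a loss given in dB."""
    return 10.0 ** (-loss_db / 10.0)


for label, per_chip in (("20 GHz sinusoid", 0.14), ("UWB monocycle pulse", 0.24)):
    total = stack_loss_db(per_chip, 10)
    remaining = db_loss_to_power_ratio(total)
    print(f"{label}: {total:.1f} dB over 10 chips, {remaining:.0%} of power remaining")
```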

3D integration using GWI and LWI, called the 3-dimensional custom stack system (3DCSS), is proposed. With the 3DCSS concept, over 10 chips can be stacked, as shown in Fig. 14.4.5. The required alignment accuracy of chips is very relaxed: about half of the inductor size. Power/Ground pins are bonded with existing techniques. The key feature of 3DCSS is that it realizes a flexible customized system integrating various kinds of chips with a wireless interface. Yield and known-good-die problems are also resolved by chip testing using wireless pads.

For implementing a hyper brain, the concept of a multi-object recognition system using the 3DCSS approach is studied. Principal component analysis and the eigenface method are adopted for realizing multi-object detection and recognition in a natural scene. The proposed hyper brain is composed of multiple wirelessly interconnected chips including an image sensor, early-vision processor, object detector, feature detector, object recognizer, and database, as shown in Fig. 14.4.6. The expected performance is a frame rate of 10frame/s with matching of 10 objects against 1000 data/object.
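
The expected-performance figures translate into a matching workload, sketched below. The sketch reads "10 objects and 1000 data/object" as each of 10 detected objects being compared against 1000 reference data per frame, which is an interpretation rather than something the paper spells out. The eigenvector data volume uses the expression annotated in Fig. 14.4.6 (GWI2: 11bit x nv x N x M); the vector size nv = 64 is a purely hypothetical value chosen for illustration.

```python
def matches_per_second(frame_rate_hz: int, n_objects: int, refs_per_object: int) -> int:
    """Number of object-to-reference comparisons required each second."""
    return frame_rate_hz * n_objects * refs_per_object


def eigenvector_bits(bits_per_coef: int, vector_size: int, n_objects: int, n_refs: int) -> int:
    """Data volume following the GWI2 expression in Fig. 14.4.6: bits x nv x N x M."""
    return bits_per_coef * vector_size * n_objects * n_refs


print(matches_per_second(10, 10, 1000))  # comparisons per second at 10 frame/s

NV_HYPOTHETICAL = 64  # assumed vector size; the paper does not give nv
print(eigenvector_bits(11, NV_HYPOTHETICAL, 10, 1000))  # bits for N = 10, M = 1000
```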

Acknowledgement:
This research is supported by the 21st century COE program, Japanese Government. Test chips are fabricated in the chip fabrication program of VDEC, the University of Tokyo.

References:
[1] K. Kanda et al., “1.27Gb/s/ch 3mW/pin Wireless Superconnect (WSC) Interface Scheme,” ISSCC Dig. Tech. Papers, pp. 186-187, Feb., 2003.
[2] D. Mizoguchi et al., “A 1.2Gb/s/pin Wireless Superconnect Based on Inductive Inter-Chip Signaling (IIS),” ISSCC Dig. Tech. Papers, pp. 142-143, Feb., 2004.
[3] A. Rashid et al., “High Transmission Gain Integrated Antenna on Extremely High Resistivity Si for ULSI Wireless Interconnect,” IEEE ED Letters, vol. 23, no. 12, pp. 731-733, Dec., 2002.


Continued on Page 597

ISSCC 2005 / February 8, 2005 / Salon 1-6 / 2:45 PM

Figure 14.4.1: Wireless interconnect (LWI) based on resonated inductor coupling.
Panels: (a) simplified circuit diagram (TX with L1, R1, C1; RX with L2, R2, C2, comparator and FF; mutual coupling M, k); (b) inductor structure (top-metal inductor on Si substrate; L_ind, t_chip, k); (c) calculated waveforms (i_L [mA] and V_c [mV] versus t [ns]; L1 = 6nH, R1 = 50ohm, C1 = 100fF, L2 = 6nH, R2 = 100ohm, C2 = 100fF, k = 0.11, PW = 50ps); (d) measured waveforms (DTX (NRZ), DTX (RZ), DRX (NRZ); bit rate = 800Mb/s; 5ns/div).

Figure 14.4.2: Experimental setup of multi-chip vision connected with analog LWIs.
Block labels: MCV1, MCV2, pixel neurons, ramp generator, PWM modulators, input scanner, output scanner, TX-LWI, RX-LWI (comparator, FF), CLK 400MHz, Vdd (2.5V). Signals: P_1o, P_2i, DTX (RZ), v_o(1), v_o(2) (with smoothing).

Figure 14.4.3: Measured waveforms of MCV using analog-LWI.
Traces: P_1o, P_2i, DTX (RZ), PWM, v_o(1), v_o(2). Annotations: pixel operation (column parallel); pixel data transfer, 3.17ms = 1760 pixels; image data (1 line); 72us = 40 pixels; 360.8us, 44 lines; 8us = 40 pixels; max 256ns; 23ns; 50ns; 2.5ns.

Figure 14.4.4: Measurement results of EM wave transmission through stacked chips.
Panels: measurement setup (transmitting antenna, receiving antenna, stacked chips; L_ant, d, pad length, chip thickness t_chip); antenna transmission gain G_a [dB] versus number of inserted chips (0 to 10) for a sinusoidal wave (f = 20GHz); pulse amplitude [dB] versus number of inserted chips (0 to 10) for a Gaussian monocycle pulse; transmitting signal and received signal (output voltage [mV] versus time [ns]; L_ant = 4mm, d = 3mm, h = 2860µm, 10 Si chips). Conditions: L_ant = 4mm, d = 3mm, pad length = 1mm, t_chip = 260µm, substrate resistivity 2.29kΩ-cm and 10Ω-cm. Annotated slopes: -0.14dB/chip, -1.3dB/chip, -0.24dB/chip, -0.92dB/chip.

Figure 14.4.5: Structure of 3DCSS.
Labels: cross-sectional views of AA’ and BB’; circuits, antennas, spiral inductors, Vdd/Gnd. No. of stacked chips > 10 (typical); chip thickness t_chip = 50 - 200µm; antenna length L_ant = 2 - 4mm; inductor size L_ind = 50 - 200µm.

Figure 14.4.6: Schematic view of hyper brain and data transfer.
Blocks: image sensor array (n1 x n1), early vision (smoothing, LoG, etc.), object detection, feature detection, recognition, reference data memory, clock. Links: LWI1, LWI2, LWI3, GWI1, GWI2.
n1 = input image size, n2 = object size, nv = vector size, N = # of objects, M = # of ref. data
Amount of data:
LWI1 (PWM image): 28 x n1
LWI2 (Feature coef.): 8bit x n2 x n2 x N
GWI1 (Eigen coef.): 11bit x nv x N
GWI2 (Eigen vector): 11bit x nv x N x M


ISSCC 2005 PAPER CONTINUATIONS

Figure 14.4.7: Test chips for (a) LWI and (b) PWM I/O MCV.
(a) 0.25µm CMOS, 5M, 3.3 x 3.3mm; Tx and Rx inductors, L_ind = 200, 300µm.
(b) 0.35µm CMOS, 4.9 x 4.9mm; pixel array of 40 x 44 pixels.