
[IEEE 2014 IEEE Symposium on VLSI Circuits - Honolulu, HI, USA (2014.6.10-2014.6.13)] 2014 Symposium on VLSI Circuits Digest of Technical Papers - A 32-bit CPU with zero standby power


A 32-bit CPU with Zero Standby Power and 1.5-clock Sleep/2.5-clock Wake-up Achieved by Utilizing a 180-nm C-axis Aligned Crystalline In-Ga-Zn Oxide Transistor

Atsuo Isobe, Hikaru Tamura, Kiyoshi Kato, Takuro Ohmaru, Wataru Uesugi, Takahiko Ishizu, Tatsuya Onuki, Kazuaki Ohshima, Takanori Matsuzaki, Atsushi Hirose, Yasutaka Suzuki, Naoaki Tsutsui, Tomoaki Atsumi, Yutaka Shionoiri, Gensuke Goto, Jun Koyama, Masahiro Fujita† and Shunpei Yamazaki

Semiconductor Energy Laboratory Co., Ltd., 398 Hase, Atsugi-shi, Kanagawa, 243-0036, Japan
†VLSI Design and Education Center (VDEC), University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan

E-mail: [email protected]

Abstract
We propose a flip-flop that achieves high-speed backup using a Si transistor and long-term, zero-standby-power data retention using a transistor of c-axis aligned crystalline (CAAC) In-Ga-Zn oxide, a CAAC oxide semiconductor with extremely low off-state current. A 32-bit processor using this flip-flop has been fabricated in 350-nm Si/180-nm CAAC oxide semiconductor technology. It demonstrates data backup and power shutdown in 1.5 clock cycles with an energy overhead of only 1.77 nJ, data recovery in 2.5 clock cycles, and zero-standby-power data retention for at least one day. Simulation results indicate that fast backup and long-term retention can also be achieved with 45-nm Si/180-nm CAAC oxide semiconductor technology.

Introduction
Various systems, including network, sensing, and wearable equipment, require devices that consume as little power and energy as possible. Power gating (PG), which shuts off the power supply to idle circuits, is a promising and effective way to reduce power consumption and save energy. Low-power microcontroller units (MCUs) that use ferroelectric RAM (FeRAM) to retain data during power-off have been reported [1,2], and nonvolatile elements using magnetoresistive RAM (MRAM) have also been studied [3]. We previously reported an 8-bit MCU [4] whose flip-flops include a backup circuit built with Si/c-axis aligned crystalline oxide semiconductor (CAAC-OS) hybrid technology, which combines high-speed Si transistors with the low off-state current [5] of In-Ga-Zn oxide CAAC-OS transistors. However, power gating usually requires data to be written to a backup circuit in response to a control signal from a power management unit (PMU), so power cannot be shut down immediately.

This paper describes a flip-flop that includes a primary backup circuit for short-term retention of flip-flop data and a secondary backup circuit for long-term retention. Power gating with such a flip-flop allows power to be shut down immediately after a PMU decides to power off a CPU. In addition, we have implemented a 32-bit CPU, applicable as a general-purpose microcontroller, using a finer CAAC-OS process (180 nm, compared with 800 nm in [4]).

Flip-flop with Two-step Backup using Si and CAAC-OS
Fig. 1 shows a circuit diagram of the proposed flip-flop built from Si and CAAC-OS transistors (hereafter, OS-FF). The OS-FF consists of a standard flip-flop, a primary retention circuit (SRC1), a secondary retention circuit (SRC2), and a read circuit. SRC1, composed of a Si transistor and a capacitor (Cs1), provides short-term data retention. SRC2, composed of a CAAC-OS transistor and a capacitor (Cs2), provides long-term data retention.
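A rough estimate shows why SRC1 suits short-term and SRC2 long-term retention. The sketch below assumes, as a simplification not taken from the paper, that a constant leakage current drains each storage capacitor and that data are lost after a hypothetical 1 V droop; it uses the 350-nm Si/180-nm CAAC-OS capacitances and simulated retention times listed in Table II to back out the largest leakage each node could tolerate.

```python
# Rough leakage budget for the two retention nodes of the OS-FF.
# ASSUMPTIONS (not from the paper): a constant leakage current drains each
# storage capacitor, and data are lost once the node has drooped by DROOP_V.
# Capacitances and retention times are the 350-nm Si/180-nm CAAC-OS values
# listed in Table II.

DROOP_V = 1.0  # hypothetical tolerable droop of the storage node (V)
SECONDS_PER_DAY = 86_400


def leakage_budget_a(cap_f: float, retention_s: float, droop_v: float = DROOP_V) -> float:
    """Leakage current (A) that would use up the droop budget in retention_s: I = C * dV / t."""
    return cap_f * droop_v / retention_s


cs1_f, src1_retention_s = 13e-15, 2.6e-3                   # SRC1: Cs1 = 13 fF, 2.6 ms
cs2_f, src2_retention_s = 133e-15, 17.6 * SECONDS_PER_DAY  # SRC2: Cs2 = 133 fF, > 17.6 d

i_src1 = leakage_budget_a(cs1_f, src1_retention_s)
# SRC2 retention is only a lower bound ("> 17.6 d"), so this is an upper bound.
i_src2 = leakage_budget_a(cs2_f, src2_retention_s)

print(f"SRC1 (Si FET + Cs1):      I ~ {i_src1:.1e} A")
print(f"SRC2 (CAAC-OS FET + Cs2): I < {i_src2:.1e} A")
# The droop budget cancels in the ratio, which depends only on C and t.
print(f"leakage ratio SRC1/SRC2:  > {i_src1 / i_src2:.1e}")
```

Under these assumptions the Si node tolerates about 5 pA, whereas the CAAC-OS node must leak less than about 10^-19 A; the ratio, more than seven orders of magnitude, does not depend on the assumed droop.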

Fig. 2 is a timing diagram of the OS-FF. In normal operation, the OS-FF behaves as a standard flip-flop: the signal RST is high, and Cs1 holds the flip-flop data. For backup, RST is simply set low, and VVDD can be turned off at the same moment RST goes low. Thus, in principle, zero clock cycles are needed from the start of backup until power shutdown. VVDD then falls by natural discharge (Fig. 2(a)) or is pulled down by artificial discharge (Fig. 2(b)). To back up data from SRC1 to SRC2, OSG is then set high, as shown in Fig. 2(b). All power supplies to the CPU are shut down during data retention; i.e., zero-power standby is realized. For recovery, power is first applied to stabilize VVDD. Then clock and RST are set high in sequence so that the data are correctly restored in the flip-flop. The recovery node N in Fig. 1(a) is charged or discharged according to the charge stored in Cs1 and Cs2 and the states of read Si FETs A and B (Fig. 1(b)).
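The backup and recovery sequence described above can be summarized as an ordered list of phases. This is a minimal behavioral sketch, not the authors' control logic: the `Phase` and `pg_sequence` names are hypothetical, each phase records only the final signal levels, and the OSG level outside the SRC2 backup step is assumed low because the text does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    rst: int          # level of RST (1 = high, 0 = low)
    vvdd_on: bool     # whether the virtual supply VVDD is powered
    osg: int          # level of OSG; ASSUMED low outside the SRC2 backup step
    data_in: str      # where the flip-flop state is held during the phase


def pg_sequence(backup_to_src2: bool) -> list[Phase]:
    """Hypothetical phase list following the Fig. 2 description of the OS-FF."""
    phases = [
        Phase("normal", rst=1, vvdd_on=True, osg=0, data_in="flip-flop and Cs1"),
        # Backup: RST goes low and VVDD may be cut at the same moment (0-clock backup).
        Phase("backup (RST low, VVDD off)", rst=0, vvdd_on=False, osg=0, data_in="Cs1"),
    ]
    if backup_to_src2:
        # Fig. 2(b): after VVDD is discharged, OSG is raised to copy SRC1 into SRC2.
        phases.append(Phase("backup to SRC2", rst=0, vvdd_on=False, osg=1, data_in="Cs1 and Cs2"))
    store = "Cs2" if backup_to_src2 else "Cs1"
    phases += [
        # All CPU supplies off: zero standby power.
        Phase("power off", rst=0, vvdd_on=False, osg=0, data_in=store),
        Phase("VVDD stabilization", rst=0, vvdd_on=True, osg=0, data_in=store),
        # Recovery: clock and then RST are raised; node N resolves from Cs1/Cs2 via Si-FETs A and B.
        Phase("recovery", rst=1, vvdd_on=True, osg=0, data_in="flip-flop"),
        Phase("normal", rst=1, vvdd_on=True, osg=0, data_in="flip-flop and Cs1"),
    ]
    return phases


for p in pg_sequence(backup_to_src2=True):
    vvdd = "on" if p.vvdd_on else "off"
    print(f"{p.name:28s} RST={p.rst} VVDD={vvdd:3s} OSG={p.osg} data in {p.data_in}")
```

The only difference between the two paths of Fig. 2 is the extra OSG step, taken while VVDD is already off, which moves the state from the leaky Si node to the CAAC-OS node before the long power-off period.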

Fabrication and Measurement Results
A test chip has been fabricated with a 350-nm Si/180-nm CAAC-OS process [6]; it consists mainly of a three-stage pipelined MIPS processor, a cache, and a PMU. Fig. 3 shows a micrograph, features, a cross-sectional TEM image, and a block diagram of the chip. Fig. 4 shows measured waveforms and retention results: the test chip powers down in 1.5 clock cycles and wakes up in 2.5 clock cycles, and SRC1 retains data for 10 ms. The 1.5 clock cycles before power-down are required only for correct operation of the PMU, not by the processor, whose backup itself takes 0 clock cycles. Fig. 5 shows waveforms of data backup to SRC2 and confirms that SRC2 retains data for at least one day. As shown in Fig. 6 and Table I, the power-down and wake-up energy overheads are 1.77 nJ and 11.64 nJ, respectively, when SRC1 is used; when SRC2 is used, the power supply affects the energy overhead for short PG periods. Compared with [2], the proposed technology achieves faster backup and recovery and, considering the difference in supply voltage, comparable PG energy overhead despite its larger-feature-size process (Table I).
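The cycle counts and energies quoted above can be cross-checked with simple arithmetic. The sketch below uses only numbers from the text and Table I (the 15 MHz clock, the 1.5/2.5-cycle latencies, the energy breakdown, and the 125 MHz to 8 MHz conversion for [2]); the helper names are hypothetical.

```python
# Cross-check of the measured power-gating numbers (values from the text,
# Fig. 3(b), and Table I). Helper names are hypothetical.

F_CLK_HZ = 15e6  # test-chip clock frequency


def cycles_to_ns(cycles: float, f_hz: float) -> float:
    """Duration of `cycles` clock cycles at frequency f_hz, in nanoseconds."""
    return cycles / f_hz * 1e9


def rescale_cycles(cycles: float, f_from_hz: float, f_to_hz: float) -> float:
    """Express the same wall-clock time in cycles of a different clock."""
    return cycles * f_to_hz / f_from_hz


shutdown_ns = cycles_to_ns(1.5, F_CLK_HZ)   # power-down latency
wakeup_ns = cycles_to_ns(2.5, F_CLK_HZ)     # wake-up latency

# Table I breakdown for this work (SRC1 path).
shutdown_nj = 1.11 + 0.66   # CPU w/o OS-FFs + OS-FFs
wakeup_nj = 5.82 + 5.82     # CPU w/o OS-FFs + OS-FFs

# Table I: [2] runs its NVL operation at 125 MHz; the table re-expresses it at 8 MHz.
ref2_shutdown = rescale_cycles(40, 125e6, 8e6)
ref2_wakeup = rescale_cycles(48, 125e6, 8e6)

print(f"this work: shutdown {shutdown_ns:.0f} ns, wake-up {wakeup_ns:.0f} ns")
print(f"this work: shutdown {shutdown_nj:.2f} nJ, wake-up {wakeup_nj:.2f} nJ")
print(f"[2] at 8 MHz: shutdown {ref2_shutdown:.2f} cycles, wake-up {ref2_wakeup:.2f} cycles")
```

At 15 MHz, 1.5 cycles is 100 ns and 2.5 cycles about 167 ns, while the 40 and 48 cycles of [2] at 125 MHz correspond to about 2.6 and 3.1 cycles of its 8 MHz clock, matching the converted values in Table I.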

Scalability of Flip-flop with Two-step Backup
Fig. 7 and Table II show the layout and simulation results of a flip-flop using 45-nm Si/180-nm CAAC-OS technology. A 180-nm CAAC-OS transistor and capacitors Cs1 and Cs2 can be stacked within the 15.2-μm² layout area of a standard flip-flop and a read circuit. With 45-nm Si technology, the retention time of SRC1 (489 ns) remains long enough for two-step backup, because backup to SRC2 takes only 61 ns (Table II). Relative to a conventional flip-flop, the performance and energy overheads in normal operation are only 8% and 3%, respectively.
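Two claims in this section can be checked numerically: that the cell fits in about 15.2 μm², and that SRC1 holds data long enough for the transfer to SRC2. The sketch below assumes that the 1530 nm and 4420 nm dimensions in Fig. 7 are adjacent widths sharing the 2560 nm height (an interpretation of the layout, not stated in the text) and compares SRC1 retention with SRC2 backup time from Table II.

```python
# Numerical checks for the 45-nm Si/180-nm CAAC-OS flip-flop.
# ASSUMPTION: in Fig. 7, the 1530 nm and 4420 nm dimensions are adjacent
# widths that share the 2560 nm height (an interpretation of the layout).

width_nm = 1530 + 4420
height_nm = 2560
area_um2 = width_nm * height_nm / 1e6    # nm^2 -> um^2


def transfer_margin(src1_retention_s: float, src2_backup_s: float) -> float:
    """How many SRC1-to-SRC2 backup times fit within the SRC1 retention time."""
    return src1_retention_s / src2_backup_s


# Table II simulation values.
margin_45nm = transfer_margin(489e-9, 61e-9)
margin_350nm = transfer_margin(2.6e-3, 135e-9)

print(f"cell area: {area_um2:.1f} um^2")
print(f"SRC1 retention / SRC2 backup: 45 nm {margin_45nm:.1f}x, 350 nm {margin_350nm:.0f}x")
```

The margin shrinks from roughly 19,000x with 350-nm Si to about 8x with 45-nm Si: SRC1 leaks much faster in the scaled process, yet its retention still comfortably exceeds the SRC2 backup time.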

Conclusion
A flip-flop achieving high-speed backup by means of Si/CAAC-OS hybrid technology has been fabricated in a 32-bit CPU test chip. The proposed flip-flop achieves power-down and data recovery in very few clock cycles (1.5 and 2.5, respectively). Simulation results show that with 45-nm Si technology, the two-step backup still operates in very short times at low cost.

978-1-4799-3328-0/14/$31.00 ©2014 IEEE 2014 Symposium on VLSI Circuits Digest of Technical Papers


References
[1] A. Baumann et al., Symp. VLSI Circuits, pp. 202–203, 2013.
[2] S. Bartling et al., ISSCC, pp. 432–434, 2013.
[3] E. Kitagawa et al., IEDM Dig. Tech. Papers, pp. 677–680, 2012.
[4] H. Kobayashi et al., COOL Chips XVI, 2013.
[5] K. Kato et al., Jpn. J. Appl. Phys., vol. 51, 021201, 2012.
[6] Y. Kobayashi et al., Ext. Abst. Solid State Devices and Materials, pp. 930–931, 2013.

Fig. 1 Circuit diagram and state list of flip-flop.
(a) Circuit diagram [labels: D, clock, RST, Q, flip-flop, data retention circuit, VVDD, OSG, SRC1, SRC2, Cs1, Cs2, CAAC-OS, FN1, FN2, N, Si-FET(A), Si-FET(B)]
(b) State list (↑: charged, ↓: discharged). Columns: recovery route / data retention region | Cs1 | Cs2 | Si-FET(A) | Si-FET(B)
Data (D) = 0:
SRC1, SRC1 | ↑ | ↑ | off | on
SRC2 | ↑ | ↓ | off | off
SRC2 | ↓ | ↓ | on | off
Data (D) = 1:
SRC1, SRC1 | ↓ | ↑ | on | on
SRC2 | ↓ | ↑ | on | on
SRC2 | ↓ | ↑ | on | on

Fig. 2 Timing diagram of flip-flop.
(a) Backup to SRC1 using natural discharge of VVDD (b) Backup to SRC2 using artificial discharge of VVDD
[Signals: OSG, RST, clock, VVDD, FN1, FN2, N, for Data "0" and Data "1". Phases in (a): Normal, PMU operation for power-down, Power off, VVDD stabilization, Recovery, Normal. Phases in (b): Normal, PMU operation for power-down, Backup (SRC2), Power off, VVDD stabilization, Recovery, Normal.]

Fig. 3 Chip micrograph, features, cross-sectional TEM image, and block diagram.
(a) Micrograph of test chip
(b) Features
ISA: MIPS I (32-bit, RISC)
Pipeline: 3 stages
Cache: 2-way, 2 kB
Clock frequency: 15 MHz
Technology: Si 350 nm; CAAC-OS 180 nm
Number of transistors (Si / CAAC-OS): CPU 116,200 / 1,410; cache 200,000 / 50,000; others 55,100 / -
Power supply voltage: Si 2.5 V; CAAC-OS 3.2 V / -1 V
(c) Cross-sectional TEM image [labels: CAAC-OS FET, 180 nm; Si transistor, 350 nm]
(d) Block diagram [labels: Bus_IF, Debug_IF, cache controller, 2 kB cache, PMU, 32-bit core, CAAC-OS flip-flops, CPU power domain, power switch, CPU circuits, Global VDD, Virtual VDD (VVDD), GND, CAAC-OS control signal (OSG)]

Fig. 4 Measured waveforms of PG operation and retention time using SRC1.
(a) Waveform of PG operation [traces: address bus, external clock, OSG, VVDD; annotations: power shutdown (1.5 clks), VVDD stabilization (1 clk), recovery (2.5 clks)]
(b) Retention time [pass ratio (%) versus retention time (s), 1E-3 to 1E+0 s; 384 evaluated OS-FFs; 10 ms marked]

Fig. 5 Measured waveforms of PG operation and retention time using SRC2.
(a) Waveform of PG operation [traces: address bus, external clock, OSG, VVDD; annotations: backup to SRC2, power off for 10,000 clks]
(b) Retention time [pass ratio (%) versus retention time (s), 1E+3 to 1E+6 s; 384 evaluated OS-FFs; 1 day marked]

Fig. 6 Measured PG overhead energy of CPU using SRC1 and SRC2.
(a) SRC1 (b) SRC2 [PG overhead energy (nJ) versus PG period (clock cycles); legend: OSG, VVDD (power supply), VVDD (Cs2), VVDD (backup, recovery)]

Fig. 7 Layout of flip-flop in 45-nm Si technology.
[Labels: Cs2, Cs1, CAAC-OS FET, read circuit; dimensions: 1530 nm, 4420 nm, 2560 nm]

Table I Comparison of chip performance.
Columns: This work | H. Kobayashi [4] | Bartling [2]
Si implementation: Yes | Yes | Yes
ISA: 32-bit RISC (MIPS I) | 8-bit CISC (Z80-like) | 32-bit RISC (ARM CM0)
Technology: 350-nm Si, 180-nm CAAC-OS | 500-nm Si, 800-nm CAAC-OS | 130-nm Si, 130-nm FeRAM
Supply voltage: Si 2.5 V, CAAC-OS 3.2 V/-1 V | Si 2.5 V, CAAC-OS 3.2 V | Si 1.5 V
Area: 289 mm² (chip core), 92 mm² (CPU) | 14.9 mm² (CPU) | 4.4 mm² (chip core), 1.1 mm² (CPU)
Clock frequency: 15 MHz | 25 MHz | 8 MHz/125 MHz
Retention circuit implementation: OS-FFs with two-step backup | OS-FFs | mini arrays, FeCaps
Core area overhead: 4.8%* | 5.7% | 12%
Power gating energy overhead: 13.42 nJ (power shutdown: 1.11 nJ CPU w/o OS-FFs**, 0.66 nJ OS-FFs**; wake-up: 5.82 nJ CPU w/o OS-FFs**, 5.82 nJ OS-FFs**) | (4.9 μs) | 7.25 nJ (power shutdown: 0.86 nJ w/o NVL, 4.72 nJ NVL; wake-up: 0.33 nJ w/o NVL, 1.34 nJ NVL)
Power shutdown of CPU: 1.5 clock cycles (0-clock backup) | 45 clock cycles | 40 clock cycles at 125 MHz* (2.6 clock cycles at 8 MHz conversion*)
Wake-up: 2.5 clock cycles | 55 clock cycles | 48 clock cycles at 125 MHz* (3.1 clock cycles at 8 MHz conversion*)
Comments: * including routing overhead; ** ratio by simulation | - | needs two kinds of clock; * NVL operation

Table II Simulation and layout results of OS-FF.
Columns: 45-nm Si / 180-nm CAAC-OS | 350-nm Si / 180-nm CAAC-OS
Recovery time: 1.1 ns | 2.3 ns
SRC1 backup time: 0.4 ns | 1.4 ns
SRC1 retention time: 489 ns | 2.6 ms
SRC1 backup & restore energy: 31 fJ (2 fJ: OS-FET and Cs2) | 1390 fJ (7 fJ: OS-FET and Cs2)
SRC2 backup time: 61 ns | 135 ns
SRC2 retention time: > 8.1 h | > 17.6 d
SRC2 backup & restore energy: 86 fJ (40 fJ: OS-FET and Cs2) | 3649 fJ (973 fJ: OS-FET and Cs2)
Simulation conditions (Cs1, Cs2, VDD): Cs1 2 fF, Cs2 27.5 fF; Si 1.1 V; CAAC-OS 1.8 V/-1 V | Cs1 13 fF, Cs2 133 fF; Si 2.5 V; CAAC-OS 3.2 V/-1 V
Impact on conventional FF: 8% performance, 3% power, 35% area | 16% performance, 11% power, 35% area
