
Side-Channel Analysis Aspects of Lightweight Block Ciphers

Carsten Rolfes

February 28, 2009

Diploma Thesis
Ruhr-Universität Bochum

Chair for Embedded Security
Prof. Dr.-Ing. Christof Paar

Dipl.-Ing. Dipl.-Kfm. Axel Poschmann

Declaration

I hereby declare that I have written my diploma thesis on the topic

Side-Channel Analysis Aspects of Lightweight Block Ciphers

independently and that I have used no sources or aids other than those indicated. Quotations have been marked as such. This thesis has not been submitted to any other examination board and has not been published.

Bochum, 28 February 2009, Carsten Rolfes

Contents

Nomenclature

1. Introduction
1.1. Motivation
1.2. Goal
1.3. Approach
1.4. Structure

2. Fundamentals
2.1. Introduction to the PRESENT Algorithm
2.1.1. S-Box
2.1.2. P-Layer
2.1.3. Key Schedule
2.2. VLSI Design Flow for Standard Cells
2.2.1. Overview
2.2.2. EDA Tools
2.2.3. Standard Cells Library
2.3. Building Blocks
2.3.1. XOR
2.3.2. Multiplexer
2.3.3. Flip-Flop and Register
2.4. Introduction to Low Power Design
2.4.1. Overview
2.4.2. Dynamic Power Consumption
2.4.3. Static Power Consumption
2.5. Improvements
2.5.1. Loop Unrolling
2.5.2. Pipelining
2.5.3. Clock Gating
2.5.4. Data Gating
2.6. Side-Channel Attacks
2.6.1. Power Analysis Method
2.6.2. Side-Channel Distinguisher
2.6.2.1. Correlation
2.6.2.2. Mutual Information
2.6.2.3. Zero-Offset
2.6.2.4. Distance of HW/HD
2.6.3. Theoretical Countermeasures
2.6.4. Practical Countermeasures
2.7. Signal to Noise Ratio

3. Implementation of PRESENT
3.1. Use Cases
3.2. Parallel Architecture
3.2.1. Design Goal
3.2.2. VHDL Design
3.2.3. Optimization
3.3. Round-based Architecture
3.3.1. Design Goal
3.3.2. VHDL Design
3.3.3. Optimization
3.4. Serial Architecture
3.4.1. Design Goal
3.4.2. VHDL Design
3.4.3. Optimization
3.5. Crypto Coprocessor
3.6. Comparison
3.6.1. Metrics and used design flow
3.6.2. Low cost passive smart devices
3.6.3. Low cost active smart devices
3.6.4. High end active smart devices

4. Adiabatic Logic
4.1. Introduction to adiabatic Logic
4.2. Previous Work of Adiabatic Logic
4.3. CMOS
4.4. CAL - Clocked CMOS Adiabatic Logic
4.5. PAL - Pass-Transistor Adiabatic Logic
4.6. CRSABL - Charge Recycling Sense Amplifier Based Logic
4.7. Results

5. Power Analysis
5.1. Analysis Framework
5.1.1. Simulation
5.1.2. Analysis
5.2. Implementations
5.2.1. CMOS PRESENT
5.2.2. CMOS AES
5.2.3. iMDPL PRESENT
5.2.4. PAL PRESENT
5.3. Results
5.3.1. PRESENT 4-Bit
5.3.2. PRESENT 8-Bit
5.3.3. AES 8-Bit
5.3.4. iMDPL PRESENT 4-Bit
5.3.5. PAL PRESENT 4-Bit
5.4. Appraisement

6. Conclusion and Further Work
6.1. Conclusion
6.2. Further Work

A. Bibliography

B. Detailed Synthesis Results

C. Detailed Adiabatic Logic Results
C.1. Power traces NAND
C.2. Logic Gates

D. Detailed Side-Channel Analysis Results

List of Figures

2.1. A top-level algorithmic description of PRESENT
2.2. S-Box
2.3. P-Layer
2.4. Top down VLSI design flow
2.5. XOR
2.6. Multiplexer
2.7. Storage elements
2.8. Power savings vs. design levels
2.9. Gajski/Kuhn Y-chart
2.10. Circuit before pipelining
2.11. Circuit with pipelining
2.12. Circuit before clock gating
2.13. Circuit with clock gating
2.14. Side-Channel Analysis Method

3.1. Datapath of the parallel PRESENT architecture
3.2. Datapath of the pipelined parallel PRESENT version
3.3. Datapath of the round based version
3.4. Finite state machine
3.5. Datapath of the serial version
3.6. Block diagram of PRESENT-128 coprocessor with 32-bit interface

4.1. CMOS charging
4.2. Constant charging
4.3. Adiabatic charging
4.4. CMOS inverter
4.5. CMOS 2-input NAND gate
4.6. CAL 2-input NAND gate
4.7. CAL control signal converter
4.8. PAL 2-input NAND gate
4.9. CRSABL 2-input NAND gate
4.10. CRSABL feedback network
4.11. Adiabatic NAND gates at 4ns clock period
4.12. Adiabatic NAND gates at 40ns clock period
4.13. Adiabatic NAND gates at 400ns clock period

5.1. Side-channel analysis framework
5.2. Side-channel analysis target: CMOS
5.3. Side-channel analysis target: iMDPL
5.4. Side-channel analysis target: PAL
5.5. Power traces PRESENT-4
5.6. Noise SNR 0 PRESENT-4
5.7. Success rate PRESENT-4 with HD model
5.8. Success rate PRESENT-8 with HD model
5.9. Success rate over SNR AES with HD/HW model
5.10. Success rate iMDPL PRESENT-4 with HD and HW model
5.11. Success rate iMDPL PRESENT-4 HD/HW model
5.12. Success rate iMDPL PRESENT-4 ZOHD/ZOHW model
5.13. Success rate over SNR iMDPL PRESENT-4 with DHD model
5.14. Power traces PAL PRESENT-4 HW model and different PC
5.15. Success rate over SNR PAL PRESENT-4 with HW model
5.16. Success rate over SNR PAL PRESENT-4 with HD and ZOHW model
5.17. Success rate over plaintexts PAL PRESENT-4 with HW model

C.1. Power traces CMOS NAND
C.2. Power traces CAL NAND
C.3. Power traces iCAL NAND
C.4. Power traces PAL NAND
C.5. Power traces CRSABL NAND
C.6. CMOS 2-input XNOR gate
C.7. CAL inverter
C.8. CAL 2-input XNOR gate
C.9. PAL inverter
C.10. PAL 2-input XNOR gate
C.11. CRSABL inverter
C.12. CRSABL 2-input XNOR gate
C.13. Adiabatic XNOR gates at 4ns clock period
C.14. Adiabatic XNOR gates at 40ns clock period
C.15. Adiabatic XNOR gates at 400ns clock period

List of Tables

2.1. Input and output relations of S-Box
2.2. Input and output relations of P-Layer
2.3. Truth table of an XOR
2.4. Truth table of a MUX
2.5. Truth table of a FF
2.6. Hamming weight of an 8-bit data masked by a single mask bit

3.1. Results of PRESENT @ 100 kHz
3.2. Results of PRESENT round based
3.3. Results of PRESENT pipelined
3.4. Results of PRESENT co-processor

5.1. Success rates of iMDPL PRESENT-4 at 0.01 ns using the DHW power model

B.1. Results of PRESENT round @ 100 kHz
B.2. Results of PRESENT round @ 10 MHz
B.3. Results of PRESENT pipeline @ 100 kHz
B.4. Results of PRESENT pipeline @ 10 MHz
B.5. Results of PRESENT serial @ 100 kHz
B.6. Results of PRESENT serial @ 10 MHz

Nomenclature

ASIC Application Specific Integrated Circuit

CAL Clocked CMOS Adiabatic Logic

CG Clock Gating

CMOS Complementary Metal Oxide Semiconductor

CRSABL Charge Recycling Sense Amplifier Based Logic

DG Data Gating

DPA Differential Power Analysis

DSP Digital Signal Processing

EDA Electronic Design Automation

FF Flip-Flop

FSM Finite State Machine

GDS2 Graphic Data System

GE Gate Equivalence

HD Hamming Distance

HW Hamming Weight

IP Intellectual Property

MI Mutual Information

NMOS N-type Metal-Oxide Semiconductor

PAL Pass-Transistor Adiabatic Logic

PMOS P-type Metal-Oxide Semiconductor


RFID Radio Frequency Identification

RTL Register Transfer Level

SCA Side-Channel Analysis

SDF Standard Delay Format

SNR Signal to Noise Ratio

SPN Substitution-Permutation Network

TCL Tool Command Language

VHDL Very high speed integrated circuit Hardware Description Language

VITAL VHDL Initiative Toward ASIC Libraries

VLSI Very Large Scale Integration

1. Introduction

This chapter illustrates the intention behind this work. The motivation and the goal are outlined, as well as the steps necessary to reach it. Finally, the structure of the thesis is described.

1.1. Motivation

The block cipher PRESENT is a new lightweight cipher designed for low hardware requirements, in order to fulfill the strict needs of ubiquitous computing. RFID tags and sensor nodes are examples of such low-cost applications: they require a small hardware footprint as well as low average and peak power consumption. So far, only one hardware realization of PRESENT exists, and it targets minimum area [2]. This work therefore tries to find architectures that are optimized for all of these aspects and to improve the existing design. Most block ciphers were originally designed for high throughput rates and only afterwards reshaped and optimized for minimum area. Unfortunately, no power consumption statistics exist for most of them. On the other hand, there are several attacks against hardware implementations that do not depend on the theoretical security of the cipher, because they exploit physical weaknesses of the device. These so-called side-channel attacks have become a dangerous threat in recent years. Analyzing the side-channel aspects of lightweight block ciphers and finding new solutions to reduce their susceptibility to side-channel attacks is a further motivation of this thesis.

1.2. Goal

The goal of this thesis is to determine and optimize different key figures of the PRESENT block cipher, such as area, throughput, and power consumption, so that one can choose the architecture that best meets the requirements of a given application. Furthermore, the derived metrics can be compared to other block ciphers like AES, HIGHT, and CAST. Afterwards, we focus on an architecture with a very small hardware footprint and explore which side-channel attack achieves the best results. Finally, adiabatic logic styles are used to implement a block cipher, with the goal of finding better countermeasures than existing CMOS logic styles offer.


1.3. Approach

Three architectures with different design constraints will be created:

• Parallel: for a high throughput rate

• Mix: for a good time-area ratio

• Serial: for low power and low area requirements

All three architectures feature encryption only.

First, a VHDL model of each architecture will be generated and tested with Mentor ModelSim. Then the working model will be compiled into an ASIC design using Synopsys Design Compiler. The next step is the optimization, where several techniques described in Section 2.5 will be implemented. Finally, Design Compiler will be used to generate several reports on area, timing, and power consumption. In the next step, several well-known side-channel attacks are carried out on a serialized implementation using standard CMOS logic and side-channel resistant logic. For this purpose, we use SPICE simulations and process the results with power consumption prediction models implemented in MATLAB. Then a new logic style will be chosen to implement the architecture, and the same side-channel attacks will be performed again. The results will show that adiabatic logic can reduce the side-channel leakage significantly compared to side-channel resistant logic using normal CMOS.

1.4. Structure

The remainder of this work is organized as follows. The second chapter deals with the fundamentals needed to understand this work, such as the PRESENT algorithm, the basic VLSI standard cell design flow, the tools used, the characteristics of the components, and some improvement methods in the design flow. It also gives an introduction to side-channel attacks, the different prediction models, and countermeasures in theory and practice. The third chapter describes the three different architectures of the PRESENT algorithm, their implementation results, and a comparison to other ciphers. New logic styles based on the adiabatic principle are described in the fourth chapter, where the possible side-channel leakage of different gates is simulated, too. In the fifth chapter, side-channel attacks are performed on the PRESENT S-Box to compare the different logic styles. A conclusion and further work can be found in the sixth chapter.

2. Fundamentals

This chapter describes the encryption algorithm used in this work in Section 2.1. Section 2.2 gives an impression of the ASIC design flow and the software tools used. Several standard components or building blocks are needed to implement the algorithm; they are described in Section 2.3. Section 2.4 illustrates the components of low power design, and Section 2.5 introduces improvement techniques to reduce power consumption and area usage. Section 2.6 introduces the powerful side-channel attacks. Finally, Section 2.7 explains the signal-to-noise ratio.

2.1. Introduction to the PRESENT Algorithm

PRESENT is a new hardware-optimized block cipher that was presented at CHES 2007 [2]. Its authors focused mainly on area and power constraints. The cipher is a substitution-permutation network (SPN) with a 64-bit block size and a key of 80 or 128 bits (from here on referred to as PRESENT-80 for the 80-bit version and PRESENT-128 for the 128-bit version). The authors focus on PRESENT-80, because 80 bits provide a security level that is sufficient for many RFID-driven applications. PRESENT has 31 regular rounds and a final round that consists only of the key mixing step. One regular round consists of a key mixing step, a substitution layer, and a permutation layer. Figure 2.1 shows a top-level algorithmic description of PRESENT.

Bogdanov et al. state "that a carefully designed block cipher could be a less risky undertaking than a newly designed stream cipher", because the art of block cipher design seems to be a little better understood than that of stream ciphers. The eSTREAM [7] project delves into the design and understanding of compact stream ciphers. On the other hand, stream ciphers are potentially more compact than block ciphers. Hence, a block cipher that requires hardware resources similar to those of such a compact stream cipher is very attractive.

According to Shannon, there are two primitive operations for encryption: confusion and diffusion. Their realizations are described in the next two sections.

2.1.1. S-Box

Confusion is an encryption operation that obscures the relationship between plaintext and ciphertext. In PRESENT, confusion is provided by the substitution layer and diffusion by the permutation layer (see next section).


generateRoundKeys()
for i = 1 to 31 do
    addRoundKey(STATE, K_i)
    sBoxLayer(STATE)
    pLayer(STATE)
end for
addRoundKey(STATE, K_32)

[Diagram: plaintext → addRoundKey → sBoxLayer → pLayer → ... → sBoxLayer → pLayer → addRoundKey → ciphertext; the key register is updated between rounds to supply each round key.]

Figure 2.1.: A top-level algorithmic description of PRESENT.


Figure 2.2.: S-Box

The substitution layer consists of 16 S-Boxes in parallel, each with a 4-bit input and a 4-bit output (4×4): $S : \mathbb{F}_2^4 \to \mathbb{F}_2^4$. This is much more compact to implement than an eight-bit S-Box. The so-called avalanche of change is improved, because the S-Box fulfills the following conditions, where addition in $\mathbb{F}_2^4$ is bitwise XOR and the Fourier coefficient of $S$ is

$$S^W_b(a) = \sum_{x \in \mathbb{F}_2^4} (-1)^{\langle b, S(x) \rangle + \langle a, x \rangle}$$

1. For any fixed non-zero input difference $\Delta_I \in \mathbb{F}_2^4$ and any fixed non-zero output difference $\Delta_O \in \mathbb{F}_2^4$ we require
$$\left|\{x \in \mathbb{F}_2^4 \mid S(x) + S(x + \Delta_I) = \Delta_O\}\right| \le 4.$$

2. For any fixed non-zero input difference $\Delta_I \in \mathbb{F}_2^4$ and any fixed output difference $\Delta_O \in \mathbb{F}_2^4$ such that $\mathrm{wt}(\Delta_I) = \mathrm{wt}(\Delta_O) = 1$ we have
$$\left|\{x \in \mathbb{F}_2^4 \mid S(x) + S(x + \Delta_I) = \Delta_O\}\right| = 0.$$

3. For all non-zero $a \in \mathbb{F}_2^4$ and all non-zero $b \in \mathbb{F}_2^4$ it holds that $\left|S^W_b(a)\right| \le 8$.

4. For all $a \in \mathbb{F}_2^4$ and all non-zero $b \in \mathbb{F}_2^4$ such that $\mathrm{wt}(a) = \mathrm{wt}(b) = 1$ it holds that $S^W_b(a) \in \{0, \pm 4\}$.

These conditions are chosen to make PRESENT resistant to differential and linear attacks; see [2] for the detailed argumentation.

The S-Box is given in hexadecimal notation in Table 2.1, and its symbol for the datapath figures is depicted in Figure 2.2. It was selected out of a set of 16 different optimal S-Boxes with respect to a small hardware footprint. More details can be found in [25].

x    | 0 1 2 3 4 5 6 7 8 9 A B C D E F
S(x) | C 5 6 B 9 0 A D 3 E F 8 4 7 1 2

Table 2.1.: Input and output relations of S-Box
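Because the S-Box has only 16 inputs, the four design conditions can be checked exhaustively. The following Python sketch (function names are illustrative, not taken from the thesis) computes the difference counts and the Fourier coefficients $S^W_b(a)$ directly from their definitions, treating addition in $\mathbb{F}_2^4$ as bitwise XOR and $\langle u, v \rangle$ as the parity of u AND v. For condition 4 it accepts 0 as well as ±4, because several single-bit mask pairs of this S-Box have a zero coefficient; for example, $a = b = 1$ gives $S^W_1(1) = 0$.

```python
# PRESENT S-Box from Table 2.1, indexed by the 4-bit input x
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

NONZERO = range(1, 16)     # all non-zero elements of F_2^4
WEIGHT_ONE = (1, 2, 4, 8)  # elements with Hamming weight 1


def parity(v: int) -> int:
    """Number of set bits of v, modulo 2."""
    return bin(v).count("1") & 1


def diff_count(s, d_in: int, d_out: int) -> int:
    """|{x : S(x) + S(x + d_in) = d_out}|, with + as bitwise XOR."""
    return sum(1 for x in range(16) if s[x] ^ s[x ^ d_in] == d_out)


def fourier(s, a: int, b: int) -> int:
    """S^W_b(a) = sum over x of (-1)^(<b, S(x)> + <a, x>)."""
    return sum(-1 if parity(b & s[x]) ^ parity(a & x) else 1 for x in range(16))


def check_conditions(s) -> tuple:
    """Evaluate conditions 1 to 4 for the 4-bit S-Box s."""
    c1 = all(diff_count(s, di, do) <= 4 for di in NONZERO for do in NONZERO)
    c2 = all(diff_count(s, di, do) == 0 for di in WEIGHT_ONE for do in WEIGHT_ONE)
    c3 = all(abs(fourier(s, a, b)) <= 8 for a in NONZERO for b in NONZERO)
    c4 = all(fourier(s, a, b) in (0, 4, -4) for a in WEIGHT_ONE for b in WEIGHT_ONE)
    return c1, c2, c3, c4


print(check_conditions(SBOX))  # (True, True, True, True)
```

The same helper can be run on any other 4-bit candidate S-Box to see which of the conditions it violates.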

For the sBoxLayer, the current STATE $b_{63} \dots b_0$ is considered as sixteen 4-bit words $w_{15} \dots w_0$, where $w_i = b_{4i+3} \,\|\, b_{4i+2} \,\|\, b_{4i+1} \,\|\, b_{4i}$ for $0 \le i \le 15$. The output nibbles $S(w_i)$ provide the updated state values in the obvious way.
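The nibble-wise view of the sBoxLayer translates directly into a bit-manipulation loop. The sketch below is an illustrative Python model operating on the STATE as a 64-bit integer (the function name is hypothetical); in hardware, the 16 S-Boxes operate in parallel rather than in a loop.

```python
# PRESENT S-Box from Table 2.1
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def sbox_layer(state: int) -> int:
    """Apply S to every nibble w_i = b_{4i+3}..b_{4i} of a 64-bit state."""
    out = 0
    for i in range(16):
        w_i = (state >> (4 * i)) & 0xF  # extract nibble w_i
        out |= SBOX[w_i] << (4 * i)     # S(w_i) replaces nibble w_i in place
    return out


print(hex(sbox_layer(0x0123456789ABCDEF)))  # 0xc56b90ad3ef84712
```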

2.1.2. P-Layer


Figure 2.3.: P-Layer

Diffusion means spreading the influence of one plaintext symbol over many ciphertext symbols.

The bit permutation used in PRESENT is given by Table 2.2, and its symbol for the datapath figures is shown in Figure 2.3. Bit i of the STATE is moved to bit position P(i). In hardware, the permutation can be implemented very efficiently by simple rewiring, which costs virtually no area.
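Table 2.2 follows a regular pattern: bit i moves to P(i) = 16·(i mod 4) + ⌊i/4⌋, which for i < 63 equals 16·i mod 63, with P(63) = 63. The Python sketch below uses this closed form (checked against the table, function names hypothetical) to model the permutation in software, where it has to be done bit by bit; in hardware it is only wiring.

```python
def p_index(i: int) -> int:
    """Target position P(i) of state bit i, matching Table 2.2."""
    return 16 * (i % 4) + i // 4


def p_layer(state: int) -> int:
    """Move bit i of the 64-bit state to bit position P(i)."""
    out = 0
    for i in range(64):
        out |= ((state >> i) & 1) << p_index(i)
    return out


# The closed form agrees with the alternative "16 * i mod 63" description
assert all(p_index(i) == (16 * i) % 63 for i in range(63)) and p_index(63) == 63

print([p_index(i) for i in range(16)])
# [0, 16, 32, 48, 1, 17, 33, 49, 2, 18, 34, 50, 3, 19, 35, 51]
```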

2.1.3. Key Schedule

PRESENT-80 uses a key length of 80 bits and a round key length of 64 bits. The key is supplied by the user and stored in a key register $K$, whose bitwise representation is $k_{79}k_{78}\dots k_0$. The key schedule of PRESENT consists of a 61-bit left rotation, an S-Box, and an XOR with a round counter. Note that PRESENT uses the same S-Box for the datapath and the key schedule, which allows resources to be shared. The round key $K_i = \kappa_{63}\kappa_{62}\dots\kappa_0$ of round $i$ consists of the 64 most significant (i.e. leftmost) bits of the current content of register $K$.


i    |  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
P(i) |  0 16 32 48  1 17 33 49  2 18 34 50  3 19 35 51

i    | 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
P(i) |  4 20 36 52  5 21 37 53  6 22 38 54  7 23 39 55

i    | 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
P(i) |  8 24 40 56  9 25 41 57 10 26 42 58 11 27 43 59

i    | 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
P(i) | 12 28 44 60 13 29 45 61 14 30 46 62 15 31 47 63

Table 2.2.: Input and output relations of P-Layer

$$K_i = \kappa_{63}\kappa_{62}\dots\kappa_0 = k_{79}k_{78}\dots k_{16}$$

After the round key $K_i$ has been extracted, the key register is updated: it is rotated by 61 bit positions to the left, the leftmost four bits are passed through the PRESENT S-Box, and the round_counter value $i$ is XORed with bits $k_{19}k_{18}k_{17}k_{16}k_{15}$ of $K$, with the least significant bit of round_counter on the right. For further details, the interested reader is referred to [2].
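Putting Figure 2.1 and Sections 2.1.1 to 2.1.3 together, PRESENT-80 encryption can be modeled in a few dozen lines of Python. This is a readability-oriented software sketch, not the VHDL architecture developed in this thesis, and all function names are illustrative. Keys and blocks are handled as integers with $k_{79}$ and $b_{63}$ as the most significant bits; the loop mirrors the pseudocode of Figure 2.1.

```python
# Minimal PRESENT-80 software model (illustrative; not the thesis's VHDL design)
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
MASK64 = (1 << 64) - 1
MASK80 = (1 << 80) - 1


def generate_round_keys(key: int) -> list:
    """Return the round keys K_1..K_32 derived from an 80-bit key k79..k0."""
    k = key & MASK80
    round_keys = []
    for counter in range(1, 32):
        round_keys.append(k >> 16)                          # K_i = k79..k16
        k = ((k << 61) | (k >> 19)) & MASK80                # rotate left by 61
        k = (SBOX[k >> 76] << 76) | (k & ((1 << 76) - 1))   # S-Box on k79..k76
        k ^= counter << 15                                  # counter into k19..k15
    round_keys.append(k >> 16)                              # K_32
    return round_keys


def sbox_layer(state: int) -> int:
    """Apply the S-Box to all 16 nibbles of the state."""
    out = 0
    for i in range(16):
        out |= SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    return out


def p_layer(state: int) -> int:
    """Move bit i of the state to position P(i) = 16*(i mod 4) + i//4."""
    out = 0
    for i in range(64):
        out |= ((state >> i) & 1) << (16 * (i % 4) + i // 4)
    return out


def encrypt(plaintext: int, key: int) -> int:
    """Encrypt one 64-bit block following the loop of Figure 2.1."""
    round_keys = generate_round_keys(key)
    state = plaintext & MASK64
    for i in range(31):
        state ^= round_keys[i]      # addRoundKey(STATE, K_{i+1})
        state = sbox_layer(state)   # sBoxLayer(STATE)
        state = p_layer(state)      # pLayer(STATE)
    return state ^ round_keys[31]   # addRoundKey(STATE, K_32)


print(hex(encrypt(0, 0)))  # 0x5579c1387b228445
```

For an all-zero plaintext and key, the sketch reproduces the published PRESENT-80 test vector 5579C1387B228445, which is a convenient first check for any hardware implementation as well.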

2.2. VLSI Design Flow for Standard Cells

A modern logic circuit is a very complex structure. A design flow has been developed to handle the numerous steps from a model to a fabricated chip. The hardware description languages VHDL and Verilog allow flexible porting to different target platforms.

2.2.1. Overview

Figure 2.4 illustrates a design flow for an ASIC approach. The top-down methodology consists of the following parts:

• VHDL model creation: The synthesizable model is described in VHDL (Very High Speed Integrated Circuit Hardware Description Language) at RTL (Register Transfer Level). At this level there is a clear separation between control parts (e.g. FSMs, finite state machines) and operative parts (e.g. arithmetic and logic units). Tools used at this step range from simple text editors to dedicated graphical environments that generate VHDL code automatically.

• RTL simulation: To validate the VHDL RTL model, it is integrated into a test bench and simulated using a logic simulator.


Figure 2.4.: Top down VLSI design flow (source: http://lsmwww.epfl.ch/Education/former/2002-2003/VLSIDesign/FLOW/fr.html)


• RTL synthesis: The synthesis process compiles the input RTL description into a gate-level realization. The user defines several constraints such as area, timing, or power consumption. The synthesis step generates several outputs: a gate-level VHDL netlist for post-synthesis simulation, a Verilog gate-level netlist as input for the place & route step, and an SDF description that includes delay information for simulation.

• Post-synthesis gate-level simulation: This step uses the VHDL models of the logic gates in the standard cell library. To ensure proper back-annotation of delays through the SDF files generated by the synthesis or place & route process, these models follow the VITAL modeling standard.

• Standard cell place & route: The place & route step infers a geometric realization of the gate-level netlist. All logic gates in the library have the same height but may have different widths. The logic cells are placed in rows of equal height, where each cell has a power rail at its top and a ground rail at its bottom. Current processes provide several metal layers for the interconnections, and adjacent rows may be flipped so that they share power and ground rails. Several outputs are generated: a geometric description (layout) in GDS2 format, an SDF description that now includes interconnect delay, and a Verilog gate-level netlist that reflects further timing optimization, clock tree generation, and routing (e.g. buffer insertion).

• Post-layout gate-level simulation: The more accurate SDF data extracted from the layout and the Verilog gate-level description are used to simulate the model after the place & route step.

• System-level integration: The final layout description can be integrated into a whole system as a block.

In this work we focused on the first four steps, i.e. no place & route was carried out.

2.2.2. EDA Tools

The following tools, running on SunOS 5.9, were used during the design flow:

• Emacs: Emacs [8] is a text editor with VHDL syntax highlighting, used to create the VHDL model.

• ModelSim SE PLUS 5.8c: ModelSim from Mentor Graphics [34] is a logic simulator. It is used to validate the VHDL and Verilog gate-level models. SDF timing information can be included later in the design flow.


• Design Compiler Y-2006.06: Synopsys [59] Design Compiler (DC) is a command-line synthesis tool. It maps the VHDL model to the target library. All reports (area, timing, power) are generated with DC. A graphical interface, called Design Vision, is also available. When compiling many different models and generating reports and post-synthesis files, it is easier to use DC with TCL scripts.

A very good tutorial covering the whole design flow and the handling of the tools is the top-down digital design flow [66] from EPFL in Lausanne. Its TCL scripts were used as a starting point for our own modifications.

2.2.3. Standard Cells Library

A standard cell library defines the elements of an electronic circuit for a specific technology. It is provided by a foundry or an IP company as part of a so-called design kit. Typical cells are multiplexers, NAND, NOR, and other logic gates, as well as latches and flip-flops. The main attribute of the technology is the gate length of a transistor. For each cell, the library contains information about its area, timing, and power consumption as well as its logic function. Each logic function is implemented in several cells with different drive strengths to accommodate different fan-out requirements. To synthesize the design, we used three different standard cell libraries available through the European Commission initiative EUROPRACTICE [9]:

• AMI 0.35 µm library MTC45000, supply voltage 3.3 V

• IHP 0.26 µm library SGB25, supply voltage 1.8 V

• Virtual Silicon 0.18 µm library UMCL18G212T3, supply voltage 1.8 V

To compare the different technologies, the common unit Gate Equivalence (GE) is used: the area of a circuit is normalized by dividing it by the area of a two-input NAND gate with the lowest driving strength (4 transistors) in the same library.
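The normalization is a single division, sketched below in Python. The areas used in the example are hypothetical placeholder values, not synthesis results from this work; they only show that the same design, expressed in GE, becomes comparable across libraries with different NAND2 cell sizes.

```python
def gate_equivalents(area_um2: float, nand2_area_um2: float) -> float:
    """Return a circuit area expressed in gate equivalents (GE).

    area_um2:       total cell area reported by synthesis, in square micrometres
    nand2_area_um2: area of the reference 2-input NAND gate of the same library
    """
    if nand2_area_um2 <= 0:
        raise ValueError("the NAND2 reference area must be positive")
    return area_um2 / nand2_area_um2


# Hypothetical numbers for illustration only (not synthesis results):
# one design occupying 20,000 um^2 in a library with a 10 um^2 NAND2 cell,
# and 80,000 um^2 in a coarser library with a 40 um^2 NAND2 cell
print(gate_equivalents(20_000.0, 10.0))  # 2000.0
print(gate_equivalents(80_000.0, 40.0))  # 2000.0
```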

2.3. Building Blocks

A standard cell library offers various elements; for example, MTC45000 features 393 and UMCL18G212T3 600 different standard cells. All layers of the PRESENT algorithm are built from components of varying complexity. This section describes their structure and behavior.


2.3.1. XOR


Figure 2.5.: XOR

One of the most common building blocks within an encryption algorithm is modulo-2 addition. As Table 2.3 shows, this operation is equivalent to a 2-input XOR. Figure 2.5 shows the XOR symbol used in the datapath figures.

A B | Z | A + B mod 2
0 0 | 0 | 0 + 0 = 0 mod 2
0 1 | 1 | 0 + 1 = 1 mod 2
1 0 | 1 | 1 + 0 = 1 mod 2
1 1 | 0 | 1 + 1 = 0 mod 2

Table 2.3.: Truth table of an XOR
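As a quick check of Table 2.3, the following Python sketch verifies that XOR and addition modulo 2 coincide for all input combinations. Applied bitwise to n-bit words, the same operation models a bank of n XOR gates, as used for a 64-bit round-key addition; the word values in the example are arbitrary.

```python
from itertools import product


def xor_bit(a: int, b: int) -> int:
    """2-input XOR of two single bits."""
    return a ^ b


def xor_word(a: int, b: int, width: int = 64) -> int:
    """Bitwise XOR of two width-bit words, i.e. width XOR gates in parallel."""
    mask = (1 << width) - 1
    return (a ^ b) & mask


# Reproduce Table 2.3 and compare each row with addition modulo 2.
for a, b in product((0, 1), repeat=2):
    z = xor_bit(a, b)
    assert z == (a + b) % 2
    print(a, b, z)

print(hex(xor_word(0x0123456789ABCDEF, 0xFFFFFFFFFFFFFFFF)))  # 0xfedcba9876543210
```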

2.3.2. Multiplexer

Figure 2.6.: Multiplexer (inputs A and B, select signal Sel, output Z)

A multiplexer is a simple switching element. It has two or more n-bit wide inputs and one n-bit output. The select signal Sel determines which of the inputs is passed to the output. Figure 2.6 shows the MUX symbol that is used in the datapath figures. A and B are the inputs, Z is the output. Table 2.4 describes the logic behavior.

A    B    Sel  Z
0,1  -    0    A
-    0,1  1    B

Table 2.4.: Truth table of a MUX
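The behavior in Table 2.4 can be modeled for n-bit words in Python. The first function is purely behavioral; the second is a gate-level view that fans Sel out to all bit positions and combines AND, OR, and NOT. This is one common way to build a 2:1 selection and is shown only as an illustration, not as the internal structure of any particular standard cell.

```python
def mux2(a: int, b: int, sel: int) -> int:
    """Behavioral 2:1 multiplexer following Table 2.4: Sel = 0 -> A, Sel = 1 -> B."""
    return b if sel else a


def mux2_gates(a: int, b: int, sel: int, width: int) -> int:
    """Same selection expressed with AND/OR/NOT on width-bit words."""
    full = (1 << width) - 1
    s = full if sel else 0              # Sel fanned out to every bit position
    return (a & ~s & full) | (b & s)    # Z = (A AND NOT Sel) OR (B AND Sel)


# Both models agree for every 4-bit input pair and both select values.
for a in range(16):
    for b in range(16):
        for sel in (0, 1):
            assert mux2(a, b, sel) == mux2_gates(a, b, sel, 4)

print(mux2(0b1010, 0b0101, 0), mux2(0b1010, 0b0101, 1))  # 10 5
```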

2.3.3. Flip-Flop and Register

A flip-flop (FF) consists of two cross-coupled inverting elements (transistors, NAND, or NOR gates) and an enable/disable mechanism. These devices are designed for synchronous systems. The input is stored only at the transition of a dedicated clock signal; otherwise the inputs are ignored. Some flip-flops change at the rising edge of the clock, others at the falling edge. At this transition the flip-flop either changes or retains its output signal, depending on the values of its input signals. Furthermore, there are different types of flip-flops depending on their input and output ports. In this work we only use the D ("data") FF, which has an input D and an output Q, as shown in Figure 2.7(a). To control the device, a CLK ("clock") signal is needed. The output Q takes the value of the input D and delays it by one clock cycle. Table 2.5 presents this behavior in a compact way. Thus the D-FF can be interpreted as a primitive memory cell.

CLK  D    Q
0    -    Q
1    0,1  Qt−1

Table 2.5.: Truth table of a FF

If a FF should store the data at its input for more than one clock cycle, an additional multiplexer at the input is needed. Figure 2.7(b) shows such a register. The output of the FF is fed back to the multiplexer. By selecting either the old value or a new input, the FF can hold its data. To store more than one bit, several such registers are connected in parallel; this is called a register bank.
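The register of Figure 2.7(b) can be modeled at the clock-cycle level in Python. The class and method names are hypothetical; the model only captures the behavior described above: on every rising clock edge the FF samples D, and the multiplexer selects either the new input (Load = 1) or the fed-back output Q (Load = 0).

```python
class LoadRegister:
    """Cycle-level model of a register bank built from D-FFs and a 2:1 MUX."""

    def __init__(self, width: int, reset_value: int = 0) -> None:
        self.mask = (1 << width) - 1
        self.q = reset_value & self.mask

    def rising_edge(self, data_in: int, load: int) -> int:
        # Multiplexer in front of D: new input when load = 1, feedback of Q otherwise.
        d = data_in if load else self.q
        # The flip-flops capture D on the clock edge; Q changes only here.
        self.q = d & self.mask
        return self.q


reg = LoadRegister(width=4)
print(reg.rising_edge(0b1010, load=1))  # 10: new value stored
print(reg.rising_edge(0b0101, load=0))  # 10: old value held
print(reg.rising_edge(0b0101, load=1))  # 5: next value stored
```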

Figure 2.7.: Storage elements: (a) D-flip-flop (input D, output Q, clock CLK); (b) register (D-flip-flop whose input multiplexer, controlled by Load, selects between the new input and the fed-back output)

2.4. Introduction to Low Power Design

Because power consumption is becoming increasingly important in many fields of cryptography, it is necessary to understand which factors lead to low power consumption.

2.4.1. Overview

In 1983 Daniel Gajski and Robert Kuhn [13] introduced a detailed abstraction model for hardware design. The Y-chart in Figure 2.9 visualizes modeling abstraction levels as well as design hierarchies and design views. These views are shown as radial axes. The concentric circles characterize the different hierarchical levels of the design process, with increasing abstraction from the inner to the outer circle. The behavioral domain describes the functional behavior of a system. The subsystems and their connections are shown on the structural axis. The geometry domain presents information about geometric properties of the subsystems like size, shape, and physical placement.

In mobile devices the battery is the limiting factor for power. Therefore, a major goal should be to design low-power components. Figure 2.8 shows the different design levels and their power saving potential. At system level, partitioning, SoC (System on Chip), and dynamic voltage scaling are commonly used. The algorithm should be of low complexity, regular, and implemented with a minimal number of operations. The architecture can use techniques like clock gating, parallelism, pipelining, and memory partitioning. Together, these three levels have a very high power saving potential. In addition, the design time for the implementation is moderate, because the VHDL hardware description language can be used. At register-transfer and transistor level, power savings are harder to achieve and smaller. Because we use standard cell libraries, the manufacturer has to implement techniques like multiple threshold voltages (multi-Uth) and transistor sizing. The synthesis tool has to support logic optimization and reduction of UDD.

Figure 2.8.: Power savings vs. design levels (design levels System, Algorithm, Architecture, RTL, and Transistor; axes: potential for power dissipation savings and design time; power dissipation savings range from up to 400 % to up to 20 %)

2.4.2. Dynamic Power Consumption

The power dissipation of CMOS circuits can be decomposed into static and dynamic components, see Equation 2.1.

$$P_{\mathrm{sum}} = P_{\mathrm{leak}} + P_{\mathrm{sc}} + P_{\mathrm{dyn}} \qquad (2.1)$$

The dynamic component occurs only during transients, when the gate of a transistor is switching. It is attributed to the charging of capacitors and to temporary current paths between the supply rails; therefore it is proportional to the switching frequency: the more often the circuit switches, the greater the dynamic power consumption. Equation 2.2 shows this relation.

$$P_{\mathrm{dyn}} = \alpha \cdot C_L \cdot U_{DD}^2 \cdot f \qquad (2.2)$$

where α denotes the probability of a 0→1 transition, f denotes the switching frequency, and UDD denotes the supply voltage. During each switching event the load capacitance CL of the transistor is charged or discharged; the energy taken from the supply per charging event is CL × UDD². The total capacitance increases as more gates are used in a circuit. A very effective method for power saving is to lower the supply voltage; for instance, halving UDD reduces the dynamic power consumption to a fourth. The last factor is the switching activity α. In reality, not all gates in a circuit switch at the same time. While the activity is easily computed for an XOR, it is far more complex to determine for more complex functions like the S-Box. The switching activity α is a function of the nature and the statistics of the input signals. All these sub-micron effects and equations are explained in detail by Rabaey [44].
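Equation 2.2 can be evaluated directly. The Python sketch below uses hypothetical parameter values (they are not taken from the libraries in Section 2.2.3) to show the quadratic effect of UDD. It also derives α for a 2-input XOR by enumerating all pairs of consecutive input vectors, assuming independent, uniformly distributed input bits.

```python
from itertools import product


def dynamic_power(alpha: float, c_load: float, u_dd: float, f: float) -> float:
    """Equation 2.2: P_dyn = alpha * C_L * U_DD^2 * f (SI units, result in watts)."""
    return alpha * c_load * u_dd ** 2 * f


def xor_switching_activity() -> float:
    """Probability of a 0 -> 1 output transition of a 2-input XOR.

    Assumes the inputs (a0, b0) in one cycle and (a1, b1) in the next are
    independent and uniformly distributed.
    """
    rising = 0
    cases = 0
    for a0, b0, a1, b1 in product((0, 1), repeat=4):
        cases += 1
        if (a0 ^ b0) == 0 and (a1 ^ b1) == 1:
            rising += 1
    return rising / cases


alpha = xor_switching_activity()                   # 0.25
p_full = dynamic_power(alpha, 10e-15, 1.8, 100e3)  # hypothetical 10 fF at 100 kHz
p_half = dynamic_power(alpha, 10e-15, 0.9, 100e3)  # same gate at half the supply
print(alpha, p_half / p_full)                      # ratio is 1/4 up to rounding
```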

Figure 2.9.: Gajski/Kuhn Y-chart (behavioral, structural, and geometry axes; levels System, Algorithms, Register-Transfer, Logic)

In actual designs, the rise and fall times of the input waveform are not zero. The finite slope of the signal causes a direct current path between UDD and GND for a short period of time τ during switching, while the NMOS and PMOS transistors are conducting simultaneously. The resulting average power consumption is called short-circuit power. Equation 2.3 defines the short-circuit power,

$$P_{\mathrm{sc}} = \frac{k}{12}\,\tau\,(U_{DD} - 2U_{th})^3\, f \qquad (2.3)$$

where k denotes the gain factor of a transistor, τ denotes the rise/fall time of the input signal, UDD denotes the supply voltage, Uth denotes the threshold voltage, and f denotes the switching frequency.
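The following Python sketch evaluates Equation 2.3. All parameter values are hypothetical. The function returns zero when UDD ≤ 2Uth, because in that case the NMOS and PMOS transistors are never on at the same time and the formula no longer applies.

```python
def short_circuit_power(k: float, tau: float, u_dd: float, u_th: float, f: float) -> float:
    """Equation 2.3: P_sc = k/12 * tau * (U_DD - 2*U_th)^3 * f.

    k    -- transistor gain factor in A/V^2
    tau  -- input rise/fall time in seconds
    u_dd -- supply voltage in volts
    u_th -- threshold voltage in volts
    f    -- switching frequency in hertz
    """
    overdrive = u_dd - 2.0 * u_th
    if overdrive <= 0.0:
        return 0.0  # no interval in which both transistors conduct
    return k / 12.0 * tau * overdrive ** 3 * f


# Hypothetical values: k = 100 uA/V^2, tau = 1 ns, f = 100 kHz, U_th = 0.5 V.
for u_dd in (3.3, 1.8, 1.0):
    print(u_dd, short_circuit_power(100e-6, 1e-9, u_dd, 0.5, 100e3))
```

The cubic dependence on UDD − 2Uth is why the short-circuit power shrinks quickly as the supply voltage approaches twice the threshold voltage.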

2.4.3. Static Power Consumption

The largest percentage of static power results from source-to-drain subthreshold leakage current, which is caused by reduced threshold voltages Uth that prevent the gate from turning off completely. Therefore it is called leakage power. Equation 2.4 defines the leakage power,

$$P_{\mathrm{leak}} = I_{\mathrm{leak}} \cdot U_{DD} \qquad (2.4)$$

where Ileak denotes the cumulative leakage current and UDD denotes the supply voltage.

The MTC45000 standard cell library defines no values for leakage power. At such a process technology and supply voltage, leakage power is very small and can be ignored. In contrast, the UMCL18G212T3 library uses a smaller process technology and a lower supply voltage, which is closer to the threshold voltage of about 0.7 V. Hence leakage power becomes more important.

More information, especially on power consumption at transistor level, can be found in [48], [45], and [44].

2.5. Improvements

There are some well-known techniques that can be included in existing VHDL source code or that are supported by Design Compiler. They can increase the speed of a circuit or decrease its area and power consumption.

2.5.1. Loop Unrolling

As depicted in Figure 2.1, the PRESENT algorithm can be implemented as a loop. After each round the internal state has to be stored, which causes a time overhead for storing and fetching the data. One possible method is therefore to unroll the loop, i.e. to compute several rounds in each clock cycle (e.g. if two rounds are computed per cycle, the number of iterations halves from 32 to 16). This can greatly increase the size of the generated hardware, but it can also dramatically increase the performance.
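The trade-off can be illustrated with a behavioral Python sketch. The round function below is a hypothetical placeholder (a rotation plus a round-counter XOR), not the PRESENT round. The point is only that computing `unroll` rounds per clock cycle leaves the result unchanged while dividing the number of cycles, at the cost of `unroll` copies of the round logic in hardware.

```python
MASK64 = (1 << 64) - 1


def toy_round(state: int, rnd: int) -> int:
    """Hypothetical placeholder round: rotate left by 1 and XOR the round index."""
    rotated = ((state << 1) | (state >> 63)) & MASK64
    return rotated ^ rnd


def iterate(state: int, rounds: int, unroll: int) -> tuple[int, int]:
    """Apply `rounds` rounds, computing `unroll` rounds per clock cycle.

    Returns (final_state, clock_cycles). Each outer iteration corresponds to
    one clock cycle in which `unroll` combinational round copies are chained.
    """
    rnd = 0
    cycles = 0
    while rnd < rounds:
        for _ in range(min(unroll, rounds - rnd)):
            state = toy_round(state, rnd)
            rnd += 1
        cycles += 1  # the state register is written once per cycle
    return state, cycles


s1, c1 = iterate(0x0123456789ABCDEF, rounds=32, unroll=1)
s2, c2 = iterate(0x0123456789ABCDEF, rounds=32, unroll=2)
print(s1 == s2, c1, c2)  # True 32 16
```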

2.5.2. Pipelining

The pipelining technique is used to increase the throughput of a design. It partitions blocks of combinational logic into n stages. These stages are separated by banks of registers, so-called pipeline registers. To achieve a minimum cycle time, the delays between the register banks should be equal. The throughput of the pipelined design is ideally n times the throughput of a non-pipelined design, and its latency is n clock cycles. Latency is the number of clock cycles needed to propagate a result from the input to the output; a higher latency is a disadvantage. Another disadvantage of pipelining is the increased gate count, caused by the additional registers that hold the values of the previous blocks. An advantage is that a valid result is produced at the output after every clock cycle. This applies only after the first set of data has propagated through the design. Pipelining is therefore most useful for systems that receive data every clock cycle and use a short clock period.
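A small Python sketch makes the latency/throughput relation concrete. The stage delays are hypothetical numbers, and the model assumes that a new input enters the pipeline in every clock cycle.

```python
def clock_period(stage_delays_ns: list[float], reg_overhead_ns: float = 0.0) -> float:
    """The slowest stage plus register overhead (setup + clock-to-Q) sets the period."""
    return max(stage_delays_ns) + reg_overhead_ns


def results_after(cycles: int, n_stages: int) -> int:
    """Valid outputs after `cycles` clocks when a new input enters every cycle."""
    return max(0, cycles - n_stages + 1)


logic_ns = [10.0, 10.0, 10.0]            # hypothetical delays of Logic 1..3
unpiped = clock_period([sum(logic_ns)])  # one stage: 30 ns
piped = clock_period(logic_ns)           # three balanced stages: 10 ns
print(unpiped / piped)                   # 3.0: ideal throughput gain for n = 3
print([results_after(c, 3) for c in range(1, 6)])  # [0, 0, 1, 2, 3]
```

With a non-zero `reg_overhead_ns` the gain drops below n, which is why the text speaks of an ideal factor.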

The Design Compiler command balance_registers ungroups the design and moves the registers through the design to achieve a minimum cycle time. For more information on how Design Compiler handles pipelining, see the user manual [57].

Sample script:

read_file -f vhdl design_to_be_piped.vhd

create_clock clk -period no_of_stages

compile -map_effort low

create_clock clk -period desired_clk_period

balance_registers

compile -map_effort high

To insert pipeline registers, append the required number of registers to the end of the VHDL code. Design Compiler will use these additional registers and place them in the design.

Figure 2.10.: Circuit before pipelining (blocks Logic 1, Logic 2, and Logic 3 between Data In and Data Out; D flip-flops clocked by CLK)

Figure 2.11.: Circuit with pipelining (as Figure 2.10, with additional D flip-flops between the logic blocks)

2.5.3. Clock Gating

The clock gating (CG) technique is well known. It can be applied to synchronous load-enable registers, which are groups of flip-flops that are connected to the same clock and control signals. Normally a register is implemented using a flip-flop, a feedback loop, and a multiplexer (see Section 2.3.3). When such a register maintains the same logic value through multiple clock cycles, it consumes power unnecessarily.

Figure 2.12.: Circuit before clock gating (register bank of flip-flops with feedback multiplexer; control logic generates Enable; signals CLK, Data In, and Data Out)

Figure 2.12 shows such a realization. When the synchronous enable signal (EN) is '0', the multiplexer feeds the Q output of each storage element in the register bank back to its D input. This means the register is disabled and holds its logic value. When the EN signal is at logic state '1', the register is enabled and new values can be loaded. Such feedback loops can consume power unnecessarily if the same value is reloaded into the register throughout multiple cycles. Furthermore, the register bank, its clock network, and the multiplexer consume power.

CG eliminates the feedback net and the multiplexer by inserting a 2-input gate in the clock net of the register. Depending on the type of register and the gating style, OR, NOR, AND, or NAND gates can be used. A latch-based clock gating with an AND gate, together with the waveforms of its signals, is shown in Figure 2.13.

The clock input to the register bank, ENCLK, is gated on or off by the AND gate. ENL is the enabling signal that controls the gating. The register bank is triggered by the rising edge of the ENCLK signal. The latch prevents glitches on the EN signal from propagating to the register's clock pin. While the CLK input of the 2-input AND gate is at logic '1', any glitch on the EN signal would, without the latch, propagate and corrupt the register clock signal. The latch eliminates this possibility because it blocks signal changes while the clock is at logic '1'. Eliminating the latch can slightly reduce power dissipation and area. However, the latch-free method has a significant drawback: the EN signal must be stable at its new value before the rising clock edge and must not change until the falling clock edge. Otherwise, any glitch on EN while the clock is high leads to glitching and corruption of the gated clock signal.

Figure 2.13.: Circuit with clock gating (a latch derives EnLatch from Enable; an AND gate combines EnLatch and CLK into ENCLK for the register bank; waveforms of CLK, Enable, EnLatch, and ENCLK)

By controlling the clock signal of the register bank, the need to reload the same value into the register through multiple clock cycles is eliminated. Clock gating reduces the clock network power dissipation, relaxes the datapath timing, and reduces routing congestion by removing feedback multiplexer loops. For designs that have large multi-bit registers, clock gating can save power and further reduce the number of gates in the design. However, for smaller register banks, the overhead of adding logic to the clock tree might not compare favorably to the power saved by eliminating a few feedback nets and multiplexers. For more information on how Design Compiler handles clock gating, see [58].
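The effect of the latch can be reproduced with a sampled-waveform sketch in Python. It is a simplified behavioral model under stated assumptions, not a timing-accurate simulation: each list element is one time step, the latch is transparent while CLK is '0' and holds its value while CLK is '1', and a glitch on EN is injected while the clock is high.

```python
def gated_clock(clk: list[int], en: list[int], use_latch: bool = True) -> list[int]:
    """ENCLK = CLK AND ENL (latch-based) or CLK AND EN (latch-free)."""
    enl = 0
    enclk = []
    for c, e in zip(clk, en):
        if use_latch:
            if c == 0:
                enl = e  # latch transparent while the clock is low
            enclk.append(c & enl)  # while c == 1 the latch keeps its old value
        else:
            enclk.append(c & e)
    return enclk


def rising_edges(wave: list[int]) -> int:
    """Number of 0 -> 1 transitions, i.e. clock events seen by the register bank."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
en = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0]  # enabled once; glitch at index 7 (CLK high)

print(gated_clock(clk, en))                   # glitch blocked by the latch
print(gated_clock(clk, en, use_latch=False))  # glitch appears on ENCLK
print(rising_edges(clk), rising_edges(gated_clock(clk, en)))  # 3 1
```

Counting the rising edges of ENCLK gives a rough proxy for the clock power saved: in this example the register bank sees one clock event instead of three.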

2.5.4. Data Gating

Data gating is a technique to reduce power consumption. In most designs, combinational circuits may contribute the majority of the power consumption. If the output of a datapath circuit is not observed, the data gating (DG) approach can reduce the dynamic power by adding isolation logic, together with a control signal, that holds the inputs of the datapath operator constant. AND or OR gates can be used for this purpose. As a result, no switching activity at the inputs propagates through the circuit and causes dynamic power consumption. This approach is sometimes called sleep logic, because the circuit appears to be inactive.
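The idea can be illustrated by counting bit toggles at the inputs of a datapath operator as a rough proxy for its dynamic power. The operand sequence and the observe signal in this Python sketch are made-up example data, and the isolation is modeled as AND gates that force the operator inputs to '0' while the result is not used.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions that toggle between two consecutive values."""
    return bin(a ^ b).count("1")


def input_toggles(operands: list[int], observe: list[int], isolate: bool) -> int:
    """Total toggles at the operator inputs over the whole sequence."""
    prev = 0
    toggles = 0
    for value, used in zip(operands, observe):
        # AND-based isolation: inputs forced to 0 while the output is unobserved.
        applied = value if (used or not isolate) else 0
        toggles += hamming_distance(prev, applied)
        prev = applied
    return toggles


operands = [0x3, 0xC, 0x5, 0xA, 0xF, 0x0]
observe = [1, 0, 0, 0, 0, 1]  # result only needed in the first and last cycle
print(input_toggles(operands, observe, isolate=False))  # 18
print(input_toggles(operands, observe, isolate=True))   # 4
```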

To perform DG, Design Compiler uses a submodule called Power Compiler if the following conditions are met:

• Object is an arithmetic operator or combinational hierarchical cell


• Fanout of the object has observability don’t care conditions

• By inserting data gating the dynamic power consumption of the circuit will be reduced

There are two approaches to incorporate DG into the design flow:

1. Two-pass approach: This entails an initial compile followed by an incremental compile. In the first stage isolation logic is inserted, followed by timing and power analysis.

2. One-pass approach: This entails only one compile step. DG is performed during the mapping stage, while rollback takes place during timing optimization in the same compile.

To use data gating and clock gating, specify the following in the synthesis script of Design Compiler:

set_clock_gating_style

insert_clock_gating

set do_operand_isolation true

set_operand_isolation_style -logic adaptive -verbose

compile_ultra

2.6. Side-Channel Attacks

In modern security systems, cryptographic algorithms are used to provide confidentiality, integrity, and authenticity of data. The details of these algorithms, e.g. PRESENT [2] or AES [38], are publicly available. Typically, two parameters are taken as input of the mathematical function: the known message (plaintext) and the cryptographic key. The encrypted output is called ciphertext. Hence, all data are known except the key, which is kept secret. This important principle is called Kerckhoffs' Principle. Breaking a cryptographic algorithm means finding the secret key based on public information, which typically consists of pairs of plaintexts and ciphertexts. Embedded computers, smart cards, and RFID tags are electronic devices that implement cryptographic algorithms and store cryptographic keys. The electrical and magnetic emission of a physical device depends on the data it processes, which was first discovered by accident in 1943 [39]. At that time a researcher at Bell Laboratories noticed that each time a machine used to encrypt teletypewriter communication stepped, a spike appeared on an oscilloscope in a distant part of the lab. After some examination he found that he could read the plaintext of the message being enciphered by the machine. This raises a new issue for the practical security of cryptographic algorithms: not only the mathematical resistance of the algorithm against attacks is of interest, but also the security of the device that implements the algorithm. In recent years, numerous methods to attack a cryptographic device and reveal the secret key have been published. However, they differ significantly in terms of cost, time, equipment, and expertise needed. The most common approach to categorize these attacks is based on two criteria. The first criterion is whether the attack is active or passive:

• Passive: The cryptographic device is operated within its specification. By observing the physical properties of the device (e.g. execution time, power consumption, electromagnetic radiation), the key is revealed.

• Active: The cryptographic device, its inputs, and/or its environment are manipulated to make the device behave abnormally. The key is revealed by exploiting this behavior.

The second criterion is the interface that is used to mount the attack. Accessing this interface sometimes requires special equipment. One can distinguish between invasive, semi-invasive, and non-invasive attacks. All of these attacks can be passive or active.

• Invasive: This is the strongest type of attack. It typically starts with the depackaging of the device. Then components of the device are accessed directly using a probing station. If only data signals are observed, the attack is passive. If signals are changed to alter the functionality of the device, e.g. by using laser cutters, probing stations, or ion beams, the attack is active. Invasive attacks are very powerful, but require expensive equipment.

• Semi-Invasive: The package of the chip is also removed, but in contrast to invasive attacks no direct electrical contact to the chip surface is made. A passive attack reads out the content of a memory cell without using the standard output ports of the chip. When X-rays, electromagnetic fields, or light are used to induce faults in the device, the attack becomes active. These attacks do not require equipment as expensive as that for invasive ones, but locating the right attack position on the surface of the chip requires some time and expertise.

• Non-Invasive: The device is attacked using only directly accessible interfaces. These attacks leave no marks, because the device is not altered. The passive attacks have received a lot of attention; they are often referred to as side-channel attacks. One of the most important side-channel attacks is the differential power analysis attack (DPA) [23]. Other attacks determine the key by measuring the execution time or the electromagnetic field. By inducing power glitches or clock glitches, or by changing the temperature, the attacks become active. Most non-invasive attacks can be performed using relatively inexpensive equipment.

The book Power Analysis Attacks - Revealing the Secrets of Smart Cards [31] by Mangard, Oswald, and Popp gives a very detailed introduction to the field of power analysis attacks and countermeasures. Another good source of information about current side-channel attacks is the Side-Channel Attack Database [47] website.


Figure 2.14.: Side-Channel Analysis Method (a device encrypts a plaintext into a ciphertext with a secret key; its side channel is recorded, pre-processed, and analyzed, using knowledge of the algorithm, to recover the secret key)

2.6.1. Power Analysis Method

This work focuses only on power analysis attacks, due to the lack of real cryptographic devices. We used power simulation results of the cryptographic algorithm instead of taking measurements. The further procedure is the same as when attacking a real device, and the simulated data are close to reality. Considering the high cost and effort of chip fabrication, this approach is commonly used to verify the side-channel resistance of an implementation first. The basic concept of side-channel attacks is illustrated in Figure 2.14. The idea behind this attack is that the power consumption or the electromagnetic radiation of the device depends on the data it processes. Usually, a very small resistor is placed in the ground line of the device. The voltage drop across this resistor, which is proportional to the current and thus to the power consumption of the device, is measured using an oscilloscope. This power trace is recorded together with the corresponding input (plaintext) and output (ciphertext) during the cryptographic operation. The secret key, which is stored in the device and unknown to the attacker, is used to encrypt the plaintext. If the attacker can discover the secret key, the security is broken. In some cases the measured power traces need to be processed with digital signal processing (DSP) algorithms. This pre-processing step reduces the amount of data to be analyzed and improves the signal quality to obtain a clearer data dependency. The next step is the actual attack. For each possible key candidate, the power traces for the known plaintext and ciphertext pairs and the intermediate values of the cryptographic algorithm are predicted. Then different statistical tests are used to compare the predicted traces with the measured ones. The key candidate whose traces match best should be the secret key. The success rate of the attack depends on how well the statistical test captures the data dependency of the cryptographic device.
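The procedure can be sketched end to end with simulated traces in Python. This is a toy model under explicit assumptions, not the simulation flow used in this work: the "device" leaks the Hamming weight of the PRESENT S-box output S(p ⊕ k) for a 4-bit plaintext nibble p and key nibble k, plus Gaussian noise; the attacker predicts this leakage for all 16 key candidates and ranks them with the Pearson correlation coefficient (Section 2.6.2.1).

```python
import random

# PRESENT 4-bit S-box
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(x: int) -> int:
    return bin(x).count("1")


def pearson(m: list[float], p: list[float]) -> float:
    n = len(m)
    m_bar, p_bar = sum(m) / n, sum(p) / n
    num = sum((a - m_bar) * (b - p_bar) for a, b in zip(m, p))
    den = (sum((a - m_bar) ** 2 for a in m) * sum((b - p_bar) ** 2 for b in p)) ** 0.5
    return num / den


def record(key: int, n: int, noise_sd: float, seed: int = 1):
    """Simulated recording: one leakage sample per random plaintext nibble."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(16) for _ in range(n)]
    traces = [hw(SBOX[p ^ key]) + rng.gauss(0.0, noise_sd) for p in plaintexts]
    return plaintexts, traces


def attack(plaintexts: list[int], traces: list[float]) -> list[tuple[float, int]]:
    """Predict HW(S(p xor k)) for every key guess and rank by correlation."""
    scores = []
    for guess in range(16):
        predicted = [hw(SBOX[p ^ guess]) for p in plaintexts]
        scores.append((pearson(traces, predicted), guess))
    return sorted(scores, reverse=True)  # highest positive correlation first


secret = 0xA
pts, trs = record(secret, n=500, noise_sd=1.0)
ranking = attack(pts, trs)
print("best guess: %X (rho = %.2f)" % (ranking[0][1], ranking[0][0]))
```

Ranking by the signed coefficient follows the HW model, which predicts a positive relation; ranking by the absolute value would also admit strongly anti-correlated wrong guesses.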

2.6.2. Side-Channel Distinguisher

To reveal the secret key, the attacker has to predict the data dependency of the cryptographic device. For CMOS implementations two models have been established. The Hamming weight (HW) model assumes that the power consumption is proportional to the number of "1" bits in the binary sequence, see [31].

$$\mathrm{HW}(d) = \sum_{i} d_i, \quad d_i \in \{0, 1\} \qquad (2.5)$$

The attacker does not need to know the value that is processed before or after the intermediate value. However, because the power consumption of CMOS devices depends on whether a bit transition occurs or not, the HW model is not very well suited to describe the real power consumption. The Hamming distance (HD) model is based on counting the 0→1 and 1→0 transitions of a digital circuit during a certain time interval. The power trace is cut into small intervals, and for each interval the overall number of transitions is calculated. The following assumptions are made when using the HD model to approximate the power consumption: all 0→1 and 1→0 transitions consume the same amount of power, and all 0→0 and 1→1 transitions lead to equal power consumption. This model does not consider parasitic capacitances of wires and cells, and it ignores the static power consumption of a cell. However, the HD model can be used to estimate the power consumption very quickly. The HD of two values d0 and d1 is defined as the number of bits that differ between d0 and d1. Hence the HD can be computed as the HW of d0 XOR d1:

$$\mathrm{HD}(d_0, d_1) = \mathrm{HW}(d_0 \oplus d_1) \qquad (2.6)$$
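Equations 2.5 and 2.6 translate directly into Python. The register values in the example are arbitrary; the sketch assumes an 8-bit register that overwrites its previous content in one clock cycle, so the HD between the old and the new value counts the flip-flops that toggle.

```python
def hw(d: int) -> int:
    """Equation 2.5: number of bits equal to 1."""
    return bin(d).count("1")


def hd(d0: int, d1: int) -> int:
    """Equation 2.6: number of differing bits, computed as HW(d0 XOR d1)."""
    return hw(d0 ^ d1)


old, new = 0x3A, 0x5C    # 0011 1010 -> 0101 1100
print(hw(old), hw(new))  # 4 4: the HW model sees no difference
print(hd(old, new))      # 4: four flip-flops toggle
```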

The HD model is well suited to describe the power consumption of sequential cells like flip-flops and registers, or of data buses that connect such cells. Because these elements are triggered by a clock signal, they change their value only once per cycle. To use the HD model, the attacker has to know either the preceding or the succeeding value. In the case of combinational logic, the intermediate values are not stable and change due to glitches. In practice the HW model is used only if the attacker has no information about the netlist of a device or is not able to calculate consecutive data values. Thus the HD model is the first choice of an attacker and usually leads to better results. To discover dependencies between the predicted power traces and the recorded side-channel traces, several statistical tests can be used. Standaert et al. [53] classify the possible side-channel distinguishers from an information-theoretic point of view into two groups: partition-based and comparison-based.

• In a partition-based attack the adversary defines a partition of the leakages according to a function of the input plaintexts and key candidates. Then a statistical test is used to check which partition is the most meaningful with respect to the real physical leakage.


• In a comparison-based attack the adversary models a part of the actual leakage emitted by the target device for each key candidate. Then a statistical test is used to compare each trace of the model with the actual leakage.

2.6.2.1. Correlation

The linear relationship between two variables can be expressed by their correlation. For this purpose, the Pearson coefficient ρ can be computed as described in [31]. In the case of correlation attacks, the relationship between the measured power trace m and the predicted power trace p, each consisting of n elements, should be discovered.

$$\rho = \frac{\sum_{i=1}^{n} (m_i - \bar{m})(p_i - \bar{p})}{\sqrt{\sum_{i=1}^{n} (m_i - \bar{m})^2 \cdot \sum_{i=1}^{n} (p_i - \bar{p})^2}} \qquad (2.7)$$

To predict the power traces, the HD model, the HW model, or any other model that approximates the power consumption of the device can be used. The correlation coefficient takes values between −1 and +1. The stronger the linear relationship between the two traces, the larger the absolute value of the coefficient.
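In practice Equation 2.7 is evaluated for every time sample of a set of traces, which yields a correlation trace whose peak shows when the predicted value is processed. The Python sketch below uses simulated traces with a hypothetical leakage at sample 3 (the HW of a random byte is added to Gaussian noise at that sample only); it is an illustration, not data from this work.

```python
import random


def pearson(m: list[float], p: list[float]) -> float:
    """Equation 2.7 for two sequences of equal length n."""
    n = len(m)
    m_bar, p_bar = sum(m) / n, sum(p) / n
    num = sum((mi - m_bar) * (pi - p_bar) for mi, pi in zip(m, p))
    den = (sum((mi - m_bar) ** 2 for mi in m) * sum((pi - p_bar) ** 2 for pi in p)) ** 0.5
    return num / den


def correlation_trace(traces: list[list[float]], prediction: list[float]) -> list[float]:
    """Apply Equation 2.7 to every time sample (column) of the trace set."""
    n_samples = len(traces[0])
    return [pearson([tr[t] for tr in traces], prediction) for t in range(n_samples)]


rng = random.Random(7)
LEAK_AT = 3  # hypothetical time sample at which the simulated device leaks
values = [rng.randrange(256) for _ in range(300)]
prediction = [bin(v).count("1") for v in values]  # HW model
traces = []
for h in prediction:
    trace = [rng.gauss(0.0, 1.0) for _ in range(8)]
    trace[LEAK_AT] += h
    traces.append(trace)

rho = correlation_trace(traces, prediction)
best = max(range(len(rho)), key=lambda t: rho[t])
print(best, [round(r, 2) for r in rho])
```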

2.6.2.2. Mutual Information

Another proposal is mutual information (MI) analysis, introduced and utilized in [15], [27], [52]. MI is a measure of the general dependence between two variables. Unlike the correlation introduced above, it considers both linear and nonlinear relations. The concept of MI originates from the field of communication theory. For two random variables X, Y the MI is defined as

$$I(X,Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y) \qquad (2.8)$$

where H(X) and H(Y) are the marginal information entropies, which measure the information content of a signal, and H(X,Y) is the joint information entropy, which measures the information content of the joint system of X and Y. The MI can also be defined as

$$I(X,Y) = \int_Y \int_X \rho_{XY}(x,y) \log \frac{\rho_{XY}(x,y)}{\rho_X(x)\,\rho_Y(y)} \, dx \, dy \qquad (2.9)$$

where ρXY(x,y) is the joint probability density function of X and Y, and ρX(x) and ρY(y) are the marginal probability density functions. A comparison of MI-based dependence with the Pearson correlation coefficient and other traditional tests is given in [4]. Because this test does not require any assumption on the leakage model of a device, it is the most generic test. However, the estimation of the density functions is very time consuming.
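The difference to correlation can be shown with a small, noise-free discrete example in Python. It uses a plug-in estimate of Equation 2.8 from histogram counts (an assumption; practical attacks have to estimate the densities of Equation 2.9 from noisy, continuous traces). The hypothetical leakage (HW − 2)² of a 4-bit value depends on the HW but is not linearly related to it, so the Pearson coefficient is exactly zero while the MI is clearly positive.

```python
from collections import Counter
from math import log2


def entropy(samples: list) -> float:
    """Plug-in Shannon entropy in bits from the empirical distribution."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())


def mutual_information(xs: list, ys: list) -> float:
    """Equation 2.8: I(X,Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def pearson(m: list[float], p: list[float]) -> float:
    n = len(m)
    m_bar, p_bar = sum(m) / n, sum(p) / n
    num = sum((a - m_bar) * (b - p_bar) for a, b in zip(m, p))
    den = (sum((a - m_bar) ** 2 for a in m) * sum((b - p_bar) ** 2 for b in p)) ** 0.5
    return num / den


values = range(16)                             # every 4-bit value once
hw_model = [bin(v).count("1") for v in values]
leakage = [(h - 2) ** 2 for h in hw_model]     # hypothetical non-linear leakage

print(pearson(hw_model, leakage))                       # 0.0: no linear relation
print(round(mutual_information(hw_model, leakage), 4))  # 1.4056 bits
```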


2.6.2.3. Zero-Offset

To defeat masking, a new class of attacks has been introduced. Second-order attacks combine the power consumption at multiple points in time during a single computation step, e.g. one round. The Zero-Offset DPA was developed by Waddle and Wagner [67]; it introduced a preprocessing routine that correlates power traces with themselves and then applies standard power analysis to the results. This model is based on the assumption that both the random bit r and the masked intermediate bit r ⊕ m correlate with the power consumption at the same time. This effect occurs if the random and the masked inputs of a circuit are computed in parallel.
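The principle can be demonstrated with a simulated masked bit in Python. The assumptions are stated in the code: the secret-dependent bit m is masked with a fresh random bit r, the mask r and the masked bit r ⊕ m leak at the same instant (their sum plus Gaussian noise), and the zero-offset preprocessing squares the mean-free sample, which is one common way to realize the step of correlating a trace with itself. A first-order difference of means over m stays near zero, while the same test on the preprocessed samples reveals the dependency.

```python
import random


def simulate(n: int, noise_sd: float, seed: int = 3):
    """Leakage HW(r) + HW(r XOR m) + noise for a masked secret bit m."""
    rng = random.Random(seed)
    secrets, samples = [], []
    for _ in range(n):
        m = rng.randrange(2)  # unmasked intermediate bit
        r = rng.randrange(2)  # fresh random mask
        samples.append(r + (r ^ m) + rng.gauss(0.0, noise_sd))
        secrets.append(m)
    return secrets, samples


def zero_offset(samples: list[float]) -> list[float]:
    """Preprocessing: square the mean-free sample (trace multiplied with itself)."""
    mu = sum(samples) / len(samples)
    return [(s - mu) ** 2 for s in samples]


def difference_of_means(samples: list[float], bits: list[int]) -> float:
    ones = [s for s, b in zip(samples, bits) if b == 1]
    zeros = [s for s, b in zip(samples, bits) if b == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


bits, samples = simulate(20000, noise_sd=0.5)
print(round(difference_of_means(samples, bits), 2))               # close to 0
print(round(difference_of_means(zero_offset(samples), bits), 2))  # close to -1
```

Without noise, m = 1 always leaks the value 1, whereas m = 0 leaks 0 or 2 with equal probability; both have mean 1, but only m = 0 deviates from that mean, which the squaring exposes.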

2.6.2.4. Difference of HW/HD

Another leakage model is called Difference of Hamming Weights or Difference of Hamming Distances. It was introduced by Moradi et al. [35] to attack DPA-resistant logic styles like iMDPL, which uses masking and dual-rail logic, and especially focuses on the hold and sampling phases of flip-flops. Similar to the zero-offset method above, a preprocessing step is required before running a DPA with the new power prediction model. The power values are folded around the estimated empirical mean value per sampled time instant. Then a classical CPA is performed on the preprocessed power traces. Algorithm 1 gives a pseudocode overview of the attack. The Hamming weight leakage model is shown in Table 2.6, assuming an 8-bit flip-flop as target.

Algorithm 1 The attack algorithm

    µ(p) = (p_1 + p_2 + … + p_t) / t      (p_i: i-th sampled power value, t: number of samples)
    for all power values p_i, 1 ≤ i ≤ t do
        p̂_i = |p_i − µ(p)|
    end for
    Perform a CPA on p̂ using the leakage model DHW(·) or DHD(·)
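Algorithm 1 can be sketched in Python on simulated data. The scenario is hypothetical: an 8-bit register stores either a value D or its complement, depending on a single random mask bit, and leaks the HW of the stored value plus Gaussian noise. Without folding, the HW(D) prediction shows no correlation, because the expected HW is 4 for every D; after folding around the mean, the DHW model of Table 2.6 correlates strongly.

```python
import random


def hw(x: int) -> int:
    return bin(x).count("1")


def dhw(d: int, n_bits: int = 8) -> int:
    """DHW leakage model: |HW(D_0) - HW(D_1)| = |n_bits - 2 * HW(D)|."""
    return abs(n_bits - 2 * hw(d))


def fold(power: list[float]) -> list[float]:
    """Algorithm 1, preprocessing: fold samples around their empirical mean."""
    mu = sum(power) / len(power)
    return [abs(p - mu) for p in power]


def pearson(m: list[float], p: list[float]) -> float:
    n = len(m)
    m_bar, p_bar = sum(m) / n, sum(p) / n
    num = sum((a - m_bar) * (b - p_bar) for a, b in zip(m, p))
    den = (sum((a - m_bar) ** 2 for a in m) * sum((b - p_bar) ** 2 for b in p)) ** 0.5
    return num / den


rng = random.Random(5)
data = [rng.randrange(256) for _ in range(5000)]
power = []
for d in data:
    mask_bit = rng.randrange(2)
    stored = d if mask_bit == 0 else d ^ 0xFF  # single mask bit: D or its complement
    power.append(hw(stored) + rng.gauss(0.0, 0.5))

rho_raw = pearson(power, [hw(d) for d in data])         # first-order CPA
rho_dhw = pearson(fold(power), [dhw(d) for d in data])  # Algorithm 1
print(round(rho_raw, 2), round(rho_dhw, 2))
```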

The advantage of DHD/DHW is that knowledge about layout details of the physical device, e.g. the loading imbalances of wires as in [50], is not required.

$$\mathrm{DHW}\big(Q(t), Q(t+1)\big) = \big|\, \#\mathrm{bits} - 2 \cdot \mathrm{HW}\big(Q(t), Q(t+1)\big) \big| \qquad (2.10)$$

The DHW model fits the sampling-phase leakage of a masked FF, while the DHD model can be applied to the hold phase; for the DHD model, the HW in Equation 2.10 is replaced by the HD.

Table 2.6.: Hamming weight of an 8-bit value masked by a single mask bit

HW(D)   HW(D_mn), mn = 0   HW(D_mn), mn = 1   µ(HW(D_mn))   DHW(D) = |HW(D_0) − HW(D_1)| = |8 − 2·HW(D)|
0       0                  8                  4             8
1       1                  7                  4             6
2       2                  6                  4             4
3       3                  5                  4             2
4       4                  4                  4             0
5       5                  3                  4             2
6       6                  2                  4             4
7       7                  1                  4             6
8       8                  0                  4             8

2.6.3. Theoretical Countermeasures

Power analysis attacks work because the power consumption of cryptographic devices depends on the intermediate values processed during the execution of a cryptographic algorithm. The goal of countermeasures is to remove or to obfuscate this dependency. Several countermeasures have been proposed so far. They can be divided into two classes: hiding and masking.

• Hiding: The basic idea is to remove the data dependency of the power consumption. This means that either the execution of the algorithm is randomized or the power consumption characteristics of the device are changed, so that the attacker cannot easily find the dependency. One approach is that each operation of the device consumes nearly the same amount of energy. Another approach is to randomize the power consumption by carrying out other operations at the same time. However, devices protected by hiding countermeasures process the same intermediate results as unprotected devices.

• Masking: The basic idea of masking is to randomize the intermediate results that are processed during the execution of the algorithm. The power consumption caused by the randomized intermediate values is independent of the actual intermediate results. Thus the power consumption characteristics of the device need not be modified.
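Both ideas can be checked exhaustively for a 4-bit intermediate value in Python. The sketch assumes a uniformly random Boolean mask for masking and, for hiding, a simplified dual-rail encoding in which every bit b is accompanied by its complement. The function names are hypothetical, and the model ignores all electrical effects such as glitches or unbalanced wires.

```python
from collections import Counter


def hw(x: int) -> int:
    return bin(x).count("1")


def masked_hw_distribution(v: int, n_bits: int = 4) -> Counter:
    """Masking: distribution of HW(v XOR mask) over all equally likely masks."""
    return Counter(hw(v ^ mask) for mask in range(1 << n_bits))


def dual_rail(v: int, n_bits: int = 4) -> tuple[int, int]:
    """Hiding: true rails carry v, false rails carry its complement."""
    return v, ~v & ((1 << n_bits) - 1)


# Masking: every intermediate value yields the same HW distribution.
reference = masked_hw_distribution(0)
print(all(masked_hw_distribution(v) == reference for v in range(16)))  # True

# Hiding: the total number of active rails is constant (n_bits) for every value.
print({hw(t) + hw(f) for t, f in map(dual_rail, range(16))})  # {4}
```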

These two principles can also be combined. There are several proposals for using hiding and masking as software and hardware countermeasures. Some of the so-called DPA-resistant logic styles that counteract side-channel attacks at hardware level are described in Section 2.6.4.

2.6.4. Practical Countermeasures

Since SCA-resistant logic styles were introduced to counteract side-channel attacks, several articles have been published that evaluate their security. To our knowledge, most of them focus on the logic gates, and the new proposals have straightforwardly improved the security of the combinational cells.


The differential power analysis attack, introduced by Kocher [23] as a method to physically attack cryptographic devices, was the starting point. The first approach to counteract side-channel leakage at gate level was presented by Tiri et al. [62]. They proposed Sense Amplifier Based Logic (SABL) to flatten the power consumption. A dual-rail method, in which signals are represented by two complementary wires, was used to make the power consumption independent of the input data, and a pre-charge and an evaluation phase ensure that a switching event occurs in every clock cycle. One disadvantage of SABL is that new logic cells and storage elements have to be constructed, which is very time consuming. Furthermore, the place and route process is not automated as in a standard cell approach, so SABL belongs to the class of full-custom designs. To avoid this, Tiri et al. used standard cells to mimic the behavior of SABL cells. Wave Dynamic Differential Logic (WDDL) [64], [63] is a semi-custom design style. The digital designer does not need a specialized understanding of the implementation process and can write HDL code as for any other design. The automated, script-based design flow generates the WDDL design from HDL. To discharge every node, a pre-charge wave propagates through the logic block, which halves the data rate. WDDL is also a dual-rail logic, so balanced routing must be taken care of. The reason is a new leakage model stated by Suzuki et al. [55], based on timing differences between the input signals of a gate that are caused by loading imbalances during place and route. Tiri [65] introduced one possible countermeasure: routing 'fat wires' and splitting them up afterwards. For this, the pins of each cell must be redefined, so the layout of the cells has to be changed. Furthermore, the design rules for the routing process have to be restricted. Another method, called backend duplication, by Guilley et al. [17] uses less strict design rules and was tested with a WDDL chip. Suzuki et al. [56] used masking at gate level to equalize the transition probability. In Random Switching Logic (RSL), one mask bit makes the power consumption data-independent and randomizes it. RSL belongs to the group of full-custom designs, too. Because of its single-rail character and a pre-charge signal, RSL solves the timing problem that occurs if the input data do not arrive simultaneously at each complementary gate, and automated place and route can be used. However, Mangard et al. [32] showed that masking does not completely prevent side-channel leakage. They found that switching operations of logic gates caused by timing properties of the gates, called glitches, can be used for DPA attacks, and they also exploited this leakage in practice on an AES chip [33] using a toggle-count model. The Masked Dual-Rail Pre-charge Logic (MDPL) proposed by Popp et al. [42] combines advantages of WDDL, like semi-custom design, dual-rail, and pre-charge, with the masking technique of RSL to avoid glitches. As an improvement of RSL, Chen and Zhou presented Dual-Rail Random Switching Logic (DRSL) [5], because it had not been specified how to generate the pre-charge signal for all cells in RSL. To this end they used ideas of RSL and MDPL: due to the dual-rail method, local pre-charge signals can be generated for every gate. On the one hand, many logic styles to counteract side-channel leakage have been proposed. On the other hand, new leakage sources have been found again and again, and the methods of analysis have become more sophisticated.

During the SCARD¹ project, a prototype chip was built that contains, amongst others, three AES co-processors built in CMOS, DRP logic, and MDPL. Suzuki et al. showed that MDPL is susceptible to the early propagation effect [54]. This result was confirmed by a practical evaluation of the SCARD prototype chip [41]. In [24], Kulikowski et al. showed that SABL is vulnerable to the early propagation effect, too. In order to cope with the early propagation issues, the designers of MDPL introduced a so-called evaluation pre-charge detection unit (EPDU), which consists of three (CMOS) AND gates and two (CMOS) OR gates. The EPDU is applied to all improved MDPL (iMDPL) gates; hence it is not surprising that the area requirements of iMDPL gates increased significantly compared to MDPL gates.

¹ Side-Channel Analysis Resistant Design, www.scard-project.eu

At CHES 2007, Gierlichs [14] presented an attack on MDPL that exploits the slight bias of a pseudo-random number generator (PRNG) in combination with unbalanced wires of the mask bit signal m. Because in MDPL and iMDPL circuits the mask bit signal m has to be connected to every gate, and not only to the synchronous gates, one might expect the mask bit tree to be even larger than the clock tree. Since parts of the mask bit tree have to be single-rail, i.e. before the single-rail output of the PRNG is transformed into a dual-rail signal, it might be possible to distinguish between a 0 and a 1 from the power traces. In order to mount this attack, an adversary requires detailed layout-level knowledge of the device under attack. However, in practice this information is often not publicly available, or obtaining it requires expensive equipment and time-consuming efforts such as reverse engineering.

Schaumont and Tiri showed that even slightly unbalanced wires can be exploited to mount a standard DPA attack after applying a single filter step to the power distribution function [50]. Contrary to Gierlichs, they did not exploit the unbalanced wires of the mask bit signal m, but only the unbalanced dual-rail wires of the logical signals.

Note that both attacks, Gierlichs' and Schaumont/Tiri's, can also be mounted on circuits built in iMDPL, but again require unbalanced wires and detailed knowledge of the device under attack. Therefore, both attacks assume a rather strong attacker model. Furthermore, both attacks, as well as the attacks by Suzuki et al. [54] and Popp et al. [41], exploit leakage of the combinational part of a circuit. In contrast, Moradi et al. present an attack on circuits built in MDPL and DRSL that exploits the leakage of the underlying D flip-flops [37]. They obtain the Hamming distance of the mask bit with a simple power analysis (SPA) and subsequently attack the circuit with a correlation power analysis (CPA). However, this attack is focused on a special type of D flip-flop and a special type of circuit. A more general method and a new statistical test are described by Moradi et al. [35]. The distance of Hamming weight (DHW) and distance of Hamming distance (DHD) tests offer a higher success rate than earlier statistics. A short introduction was given in Section 2.6.2.4.

2.7. Signal to Noise Ratio

To express the ratio between the signal and the additional noise of a measurement, the signal-to-noise ratio (SNR) is used in electrical engineering and especially in signal processing. The general definition of the SNR is the ratio between the variance of the signal and the variance of the noise. In the case of power analysis attacks, the signal corresponds to the exploitable component of the power consumption, P_exp. It contains the information that is relevant for the attacker to reveal the secret. The noise consists of a switching component P_sw.noise and an electronic component P_el.noise:

$$\mathrm{SNR} = \frac{\mathrm{Var}(P_{exp})}{\mathrm{Var}(P_{sw.noise} + P_{el.noise})} \tag{2.11}$$

The switching noise is caused by data bits that are not part of the attack scenario, e.g. in a single-bit attack on a multi-bit register. The main component of the electronic noise is generated by the measurement equipment; another component is the intrinsic noise of the physical device. The variance of the signal indicates how much a point of a power trace varies because of the exploitable signal. The SNR thus quantifies how much information leaks from a single point of the power trace: the higher the SNR, the higher the leakage of the device and the easier it is to detect the signal in the noise.
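As a rough illustration of Equation 2.11, the following Python sketch estimates the SNR of a single simulated trace point. It assumes a Hamming-weight leakage model, which is not stated in this section: the exploitable component is the Hamming weight of one PRESENT S-Box output nibble, the switching noise is the Hamming weight of the remaining 60 state bits, and the electronic noise is Gaussian with a chosen standard deviation. The key nibble and the noise level are arbitrary illustration values.

```python
import random
import statistics

# PRESENT 4-bit S-Box
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def estimate_snr(n_traces: int = 20000, sigma_el: float = 1.0, seed: int = 1) -> float:
    """Estimate Var(P_exp) / Var(P_sw.noise + P_el.noise) at one trace point.

    Assumed Hamming-weight leakage model:
      P_exp      = HW of the attacked S-Box output nibble
      P_sw.noise = HW of the 60 state bits that are not attacked
      P_el.noise = Gaussian noise with standard deviation sigma_el
    """
    rng = random.Random(seed)
    key_nibble = 0x7  # arbitrary illustration value
    p_exp, p_noise = [], []
    for _ in range(n_traces):
        plaintext_nibble = rng.randrange(16)
        p_exp.append(hamming_weight(SBOX[plaintext_nibble ^ key_nibble]))
        p_sw = hamming_weight(rng.getrandbits(60))  # bits outside the attack scenario
        p_el = rng.gauss(0.0, sigma_el)             # measurement/device noise
        p_noise.append(p_sw + p_el)
    return statistics.pvariance(p_exp) / statistics.pvariance(p_noise)


print(f"SNR ~ {estimate_snr():.3f}")
```

Under these assumptions, Var(P_exp) = 1 and the switching noise alone contributes a variance of 15, so the estimate should land near 1/16 for sigma_el = 1; raising the electronic noise lowers the SNR further.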

3. Implementation of PRESENT

Different application scenarios impose different demands on the implementation and different optimization goals. This chapter describes three architectural approaches to PRESENT. Section 3.1 defines several use cases and the corresponding recommendations. Section 3.2 describes a speed- and throughput-optimized design, called PRESENT parallel. Section 3.3 describes an area- and speed-optimized design, the round-based PRESENT mix. The third design, called PRESENT serial, is area- and power-optimized and is described in Section 3.4. Afterwards, Section 3.5 presents a cryptographic co-processor with encryption and decryption capabilities, using the round-based architecture of PRESENT-128. Note that the choice of an appropriate I/O interface is highly application-specific and at the same time can have a significant influence on the area, power, and timing figures. Each approach is designed with a specific design goal in mind. After the first VHDL implementation, several optimization techniques are applied, which are described in the respective optimization subsections. At the end of every section, the results of the synthesis process are presented. Finally, the results are compared to other cipher implementations in Section 3.6.

3.1. Use Cases

An implementation for a low-cost passive smart device, such as an RFID tag or a contactless smart card, requires small area and low power consumption, while throughput is of secondary interest. On the other hand, an RFID reader that reads out many devices at the same time requires a higher throughput, whereas area and power consumption are less important. Active smart devices, such as contact smart cards, do not face strict power constraints but timing and sometimes energy constraints. The main key figures of the PRESENT block cipher are area, throughput, and power consumption. We propose three implementations of PRESENT, so that one can choose the architecture that best meets the given requirements. In order to decrease the area requirements even further, all architectures perform encryption only. This is sufficient for both encryption and decryption of data when the block cipher is operated, for example, in counter mode. Besides, it allows a fairer comparison with other lightweight implementations, for example the landmark implementation of Feldhofer et al. [11]. In order to obtain a clearer estimate of the cryptographic core's efficiency, we did not implement any special input or output interfaces, but rather chose a natural width of 64-bit input, 64-bit output, and 80-bit or 128-bit key input, respectively.
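To illustrate why an encryption-only core suffices in counter mode, the sketch below XORs the data with a keystream produced by encrypting successive counter blocks; the same function both encrypts and decrypts. The block function is a hypothetical stand-in (a keyed BLAKE2b hash truncated to 64 bits), not PRESENT, and the counter layout (nonce plus block index, modulo 2^64) is an assumption for illustration. The stand-in is not even invertible, which underlines that counter mode never evaluates the inverse cipher.

```python
import hashlib

BLOCK_BYTES = 8  # 64-bit block size, as in PRESENT


def stand_in_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical placeholder for an encryption-only 64-bit block cipher core.

    This is NOT PRESENT: it is a keyed BLAKE2b hash truncated to 64 bits.
    """
    return hashlib.blake2b(block, digest_size=BLOCK_BYTES, key=key).digest()


def ctr_crypt(key: bytes, nonce: int, data: bytes, encrypt_block=stand_in_encrypt) -> bytes:
    """Counter mode: the same function encrypts and decrypts.

    Only the forward direction of the block function is ever evaluated.
    """
    out = bytearray()
    for offset in range(0, len(data), BLOCK_BYTES):
        counter = (nonce + offset // BLOCK_BYTES) % (1 << 64)  # assumed counter layout
        keystream = encrypt_block(key, counter.to_bytes(BLOCK_BYTES, "big"))
        chunk = data[offset:offset + BLOCK_BYTES]
        out.extend(d ^ k for d, k in zip(chunk, keystream))   # last block may be partial
    return bytes(out)


key = bytes(10)  # 80-bit all-zero key, illustration only
message = b"encryption-only cores suffice in CTR mode"
ciphertext = ctr_crypt(key, nonce=42, data=message)
recovered = ctr_crypt(key, nonce=42, data=ciphertext)
print(recovered == message)
```

In hardware, stand_in_encrypt would correspond to the encryption-only PRESENT core; as in any counter-mode use, a nonce/counter value must never be reused under the same key.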


3.2. Parallel Architecture

The first architecture is a fully unrolled version, i.e. the whole encryption is performed in a single clock cycle.

3.2.1. Design Goal

The main goal of this design is a high throughput rate. Therefore the loop of 31 rounds is unrolled, so that all XORs, S-Boxes, and P-Layers are cascaded. This leads to a large area and a high power consumption, but also to a high data throughput. Each round key is generated by selecting the appropriate bits of the 80-bit key and, where necessary, passing them through an S-Box or XORing a round counter value. All round keys are available in parallel, and no register is needed to hold the key.
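Because the rotation, the S-Box on bits [79:76], and the round-counter XOR into bits [19:15] are fixed functions of the key, all 32 round keys can be derived combinationally. The following Python sketch models this key path following the PRESENT-80 key schedule and returns all round keys at once, as the hard-wired key path does; the function names are illustrative only.

```python
# PRESENT 4-bit S-Box
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
MASK80 = (1 << 80) - 1


def update_key(key_reg: int, round_counter: int) -> int:
    """One PRESENT-80 key update (a fixed, hard-wired stage in the parallel design)."""
    key_reg = ((key_reg << 61) | (key_reg >> 19)) & MASK80               # rotate left by 61
    key_reg = (SBOX[key_reg >> 76] << 76) | (key_reg & ((1 << 76) - 1))  # S-Box on [79:76]
    return key_reg ^ (round_counter << 15)                               # counter into [19:15]


def all_round_keys(key: int) -> list:
    """Return the 32 round keys K1..K32 (bits [79:16] of each key state)."""
    keys = []
    for round_counter in range(1, 32):
        keys.append(key >> 16)
        key = update_key(key, round_counter)
    keys.append(key >> 16)  # K32, used by the final XOR
    return keys


keys = all_round_keys(0)
print(len(keys), hex(keys[1]))  # 32 0xc000000000000000
```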

3.2.2. VHDL Design

Figure 3.1 shows the signal diagram of the parallel architecture. The datapath consists of 32 XORs, 496 S-Boxes, and 31 P-Layers. The final stage consists of a 64-bit flip-flop register to store the ciphertext. In addition, 31 S-Boxes and 31 XORs are needed for the key path. The round-counter input of each key-path XOR is hard-wired, and the same applies to the rotation by 61 bit positions to the left. First, the given 64-bit plaintext and the first round key are XORed. The result is split up into 16 4-bit blocks, and each block is processed by a 4-bit S-Box in parallel. The 64-bit P-Layer transposes the bits at the end of each of the 31 rounds. The 32nd round, however, consists only of the XOR operation, because another S-Box and P-Layer would not add security. The output of the last XOR is stored in a 64-bit ciphertext register. One reason is to hold the value for several clock cycles; the other reason is a more precise power simulation result, since this register approximates the output load capacitance of a real environment.

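[Figure: 64-bit plaintext and 80-bit key; 31 cascaded rounds of round-key XOR, 16 × 4-bit S-Boxes, and 64-bit P-Layer; each key-path stage rotates the key left by 61, substitutes bits [79:76], and XORs the round counter into bits [19:15]; the final XOR feeds a 64-bit D flip-flop register holding the ciphertext]

Figure 3.1.: Datapath of the parallel PRESENT architecture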


3.2.3. Optimization

One disadvantage of this architecture was recognized after the first simulation steps: the expected high maximum operating frequency could not be reached. This is a result of the long critical path, since the input signal has to propagate through all XOR and S-Box gates. Every gate on this path contributes its own delay for charging its load capacitance, so the more gates belong to the path, the longer a switching event takes to propagate. To shorten the critical path, flip-flops were inserted as pipeline stages after each P-Layer (see Figure 3.2). On the one hand, this increases the chip area and power consumption, depending on the cell library used. On the other hand, the maximum frequency can be raised significantly. We assume the key to be stable for many encryption operations. Thus the round keys do not propagate through the pipeline and need not be stored in additional flip-flops.

[Figure: datapath as in Figure 3.1, with a 64-bit D flip-flop pipeline register inserted after each of the 31 P-Layers; the key path is unchanged]

Figure 3.2.: Datapath of the pipelined parallel PRESENT version
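The following Python sketch is a cycle-level model of the pipelined datapath. It assumes one 64-bit register after each of the 31 P-Layers and a combinational final round-key XOR; as stated above, the key is assumed stable, so the round keys are computed once and not pipelined. The helper names are hypothetical; the cipher functions follow the PRESENT-80 specification, and the test compares against the published PRESENT-80 test vectors for an all-zero key.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
MASK80 = (1 << 80) - 1
N_STAGES = 31  # one pipeline register after each P-Layer


def sbox_layer(state: int) -> int:
    out = 0
    for j in range(16):
        out |= SBOX[(state >> (4 * j)) & 0xF] << (4 * j)
    return out


def p_layer(state: int) -> int:
    out = 0
    for i in range(64):
        if (state >> i) & 1:
            out |= 1 << (63 if i == 63 else (16 * i) % 63)
    return out


def round_keys(key: int) -> list:
    keys = []
    for counter in range(1, 32):
        keys.append(key >> 16)
        key = ((key << 61) | (key >> 19)) & MASK80
        key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1))
        key ^= counter << 15
    keys.append(key >> 16)
    return keys


def run_pipeline(plaintexts: list, key: int) -> list:
    """Feed one plaintext per cycle; return (cycle, ciphertext) pairs."""
    ks = round_keys(key)            # key assumed stable: round keys are not pipelined
    regs = [None] * N_STAGES        # None marks an empty pipeline stage
    results = []
    cycle = 0
    while len(results) < len(plaintexts):
        if regs[-1] is not None:    # final round-key XOR on the last register
            results.append((cycle, regs[-1] ^ ks[N_STAGES]))
        new_regs = [None] * N_STAGES
        for r in range(N_STAGES):
            if r == 0:
                src = plaintexts[cycle] if cycle < len(plaintexts) else None
            else:
                src = regs[r - 1]
            if src is not None:
                new_regs[r] = p_layer(sbox_layer(src ^ ks[r]))
        regs = new_regs             # clock edge: all stages update together
        cycle += 1
    return results


for cycle, ct in run_pipeline([0, (1 << 64) - 1], key=0):
    print(cycle, hex(ct))
```

In this model the first ciphertext appears after 31 cycles of latency, and from then on one 64-bit block leaves the pipeline per clock cycle, which is where the throughput gain comes from.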

3.3. Round-based Architecture

This architecture is a direct implementation of the PRESENT top-level algorithm description in Figure 2.1, i.e. one round of PRESENT is performed in one clock cycle.

3.3.1. Design Goal

In contrast to the previous architecture, the focus lies on a more compact solution, while keeping the time-area product in mind. To save power and area, a loop-based approach is chosen. The balance between the 64-bit datapath and the operations performed per clock cycle leads to a good time-area product. Due to the reuse of several building blocks and the round structure, the design has a higher energy efficiency than the parallel design.


3.3.2. VHDL Design

The architecture uses only one substitution and one permutation layer. The datapath thus consists of one 64-bit XOR, 16 S-Boxes in parallel, and one P-Layer. To store the internal state and the key, a 64-bit state register and an 80-bit key register are introduced. Furthermore, an 80-bit 2-to-1 multiplexer and a 64-bit 2-to-1 multiplexer are required to switch between the load phase and the round computation phase. The key register, the key multiplexer, a 5-bit XOR, one S-Box, and a rotation by 61 bit positions form the component responsible for the key scheduling, which computes the round key on the fly. Figure 3.3 presents the signal structure of the round-based approach for PRESENT. First, the key and the plaintext are stored in the respective registers. After each round, the internal state is stored in the state register. After 31 rounds, the final ciphertext is computed by XORing the output of the state register with the last round key. The control logic is implemented as a finite state machine (FSM) and a 5-bit counter to count the rounds. The FSM also controls the multiplexers to switch between the load and the round computation phase. Figure 3.4 shows the transition scheme of the VHDL source code listed below.

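[Figure: round-based datapath with an 80-bit key register and a 64-bit state register, each fed by a 2-to-1 load multiplexer; one round-key XOR, 16 × 4-bit S-Boxes, and one P-Layer; key update by 61-bit left rotation, S-Box on bits [79:76], and 5-bit round-counter XOR into bits [19:15]; the ciphertext is the state XORed with key bits [79:16]]

Figure 3.3.: Datapath of the round based version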

[Figure: states S0, S1, S2. Reset leads to S0 (load key and plaintext, reset round counter); S0 leads to S1 (switch multiplexers, start round counter); S1 leads to S2 when Counter >= 30 (stop round counter, send ready signal); any other state leads to S0]

Figure 3.4.: Finite state machine

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fsm_mix is
  port (nreset        : in  std_logic;
        clock         : in  std_logic;
        counter       : in  std_logic_vector(4 downto 0);
        fsmready      : out std_logic;
        counterreset  : out std_logic;
        counterenable : out std_logic;
        keymuxsel     : out std_logic;
        textmuxsel    : out std_logic;
        textregen     : out std_logic;
        keyregen      : out std_logic);
end fsm_mix;

architecture fsm_arc of fsm_mix is

  -- define the states of the FSM model
  type state_type is (S0, S1, S2);
  signal next_state, current_state : state_type;

begin

  -- concurrent process #1: state register
  state_reg : process (clock, nreset)
  begin
    if (nreset = '0') then
      current_state <= S0;
    elsif rising_edge(clock) then
      current_state <= next_state;
    end if;
  end process;

  -- concurrent process #2: combinational logic
  comb_logic : process (current_state, counter)
  begin
    -- use a case statement to describe the state transitions
    case current_state is

      -- State 0: load key and text registers; reset counter
      when S0 =>
        counterreset  <= '0';
        counterenable <= '0';
        keymuxsel     <= '0';
        textmuxsel    <= '0';
        textregen     <= '1';
        keyregen      <= '1';
        fsmready      <= '0';
        next_state    <= S1;

      -- State 1: switch multiplexers; enable counter
      when S1 =>
        counterreset  <= '1';
        counterenable <= '1';
        keymuxsel     <= '1';
        textmuxsel    <= '1';
        textregen     <= '1';
        keyregen      <= '1';
        fsmready      <= '0';
        -- if counter >= 30, go to the end state; otherwise keep iterating
        -- (assigning next_state in both branches avoids an inferred latch)
        if unsigned(counter) >= 30 then
          next_state <= S2;
        else
          next_state <= S1;
        end if;

      -- State 2: end state; disable counter; hold text register
      when S2 =>
        counterreset  <= '1';
        counterenable <= '0';
        keymuxsel     <= '0';
        textmuxsel    <= '1';
        textregen     <= '0';
        keyregen      <= '0';
        fsmready      <= '1';
        next_state    <= S2;  -- stay here until the next reset

      when others =>
        counterreset  <= '0';
        counterenable <= '0';
        keymuxsel     <= '0';
        textmuxsel    <= '0';
        textregen     <= '0';
        keyregen      <= '0';
        fsmready      <= '0';
        next_state    <= S0;

    end case;
  end process;

end fsm_arc;
```
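For comparison with the VHDL control flow, the following Python sketch models the round-based datapath at register level: S0 loads both registers, S1 performs one round per clock cycle while the counter runs from 0 to 30 (the same `counter >= 30` exit condition as in the FSM), and the XOR with the last round key produces the ciphertext. The round key is taken from the key register on the fly, as in Figure 3.3. Function names are hypothetical; the round functions follow the PRESENT-80 specification, and the test uses its published test vectors.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
MASK80 = (1 << 80) - 1


def sbox_layer(state: int) -> int:
    """16 parallel 4-bit S-Boxes."""
    out = 0
    for j in range(16):
        out |= SBOX[(state >> (4 * j)) & 0xF] << (4 * j)
    return out


def p_layer(state: int) -> int:
    """Bit i moves to position 16*i mod 63; bit 63 stays in place."""
    out = 0
    for i in range(64):
        if (state >> i) & 1:
            out |= 1 << (63 if i == 63 else (16 * i) % 63)
    return out


def update_key(key_reg: int, round_counter: int) -> int:
    key_reg = ((key_reg << 61) | (key_reg >> 19)) & MASK80               # rotate left by 61
    key_reg = (SBOX[key_reg >> 76] << 76) | (key_reg & ((1 << 76) - 1))  # S-Box on [79:76]
    return key_reg ^ (round_counter << 15)                               # counter into [19:15]


def encrypt_round_based(plaintext: int, key: int) -> tuple:
    """Register-level model of the round-based PRESENT-80 core."""
    state_reg, key_reg = plaintext, key       # S0: load registers, reset counter
    counter, rounds = 0, 0
    while True:                               # S1: one round per clock cycle
        round_key = key_reg >> 16             # round key taken on the fly from [79:16]
        state_reg = p_layer(sbox_layer(state_reg ^ round_key))
        key_reg = update_key(key_reg, counter + 1)
        rounds += 1
        if counter >= 30:                     # FSM condition for S1 -> S2
            break
        counter += 1
    return state_reg ^ (key_reg >> 16), rounds   # S2: XOR with the last round key


ct, rounds = encrypt_round_based(0, 0)
print(hex(ct), rounds)  # 0x5579c1387b228445 31 (PRESENT-80 test vector)
```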

3.3.3. Optimization

Because the ciphertext must be stable for several clock cycles to be processed by the next stage connected to the cipher, the storage elements are implemented as registers. To reduce area and power, we make use of clock gating. It can be applied to synchronous load-enable registers, i.e. groups of flip-flops that share the same clock and control signals, such as the state and key registers. More details are described in Section 2.5.3.

3.4. Serial Architecture

This architecture is based on the round-based (mix) architecture described in Section 3.3. To save more chip area, only one S-Box is implemented. This reduction leads to a serialized version of the PRESENT algorithm.

3.4.1. Design Goal

The 16 parallel S-Boxes are among the most area-consuming parts of PRESENT. So if only one of them is used to implement the substitution layer, a lot of area can be saved. A disadvantage is the longer computation time: only 4 bits are processed per clock cycle. Therefore it takes 15 more cycles per round to compute the substitution layer, which leads to 31 · 15 = 465 additional cycles altogether. So the overall cycle count is 31 + 465 = 496.

3.4.2. VHDL Design

As Figure 3.5 shows, the datapath of this architecture is still 64 bits wide. The main problem is to serialize the permutation layer; therefore a pseudo 4-bit datapath is chosen. One S-Box processes 4 bits of the 64-bit state, and the other 60 bits are concatenated with the result of the S-Box. A 4-to-1 64-bit-wide multiplexer selects the value that is stored in the state register. An additional 4-bit counter extends the FSM to control the processing of the internal state.

[Figure: serial datapath with a 64-bit state memory and an 80-bit key memory, each offering a 4-bit shift-register mode and a full-width block mode; a single S-Box in the 4-bit path, the P-Layer in the 64-bit path, 2-to-1 multiplexers selecting between the paths, and the key update (61-bit left rotation, S-Box on bits [79:76], round-counter XOR into bits [19:15])]

Figure 3.5.: Datapath of the serial version
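The pseudo 4-bit datapath can be illustrated with the following Python sketch. It assumes, since the exact bit order is not specified above, that in each clock cycle the most significant nibble passes through the single S-Box while the other 60 bits are shifted up by one nibble and concatenated with the S-Box output. After 16 cycles every nibble has been substituted and is back in its original position, so the result equals the parallel S-Box layer. The sketch also reproduces the cycle arithmetic of Section 3.4.1.

```python
import random

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def sbox_layer_parallel(state: int) -> int:
    """Reference: 16 S-Boxes applied to all nibbles at once."""
    out = 0
    for j in range(16):
        out |= SBOX[(state >> (4 * j)) & 0xF] << (4 * j)
    return out


def sbox_layer_serial(state: int) -> tuple:
    """Hypothetical pseudo 4-bit datapath: one S-Box, one nibble per clock cycle."""
    cycles = 0
    for _ in range(16):
        top = state >> 60                  # nibble routed through the single S-Box
        rest = state & ((1 << 60) - 1)     # 60 bits that bypass the S-Box
        state = (rest << 4) | SBOX[top]    # concatenate and write back to the state register
        cycles += 1
    return state, cycles


def serial_cycle_count(rounds: int = 31, nibbles: int = 16) -> tuple:
    """Extra cycles and total round cycles compared to one cycle per round."""
    extra = rounds * (nibbles - 1)         # 15 extra cycles per round
    return extra, rounds + extra


rng = random.Random(0)
for _ in range(1000):
    s = rng.getrandbits(64)
    assert sbox_layer_serial(s) == (sbox_layer_parallel(s), 16)
print(serial_cycle_count())  # (465, 496)
```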

3.4.3. Optimization

In addition to the already implemented clock gating technique, another optimization is made. Because only one S-Box is used, only 4 bits can be computed concurrently, so the processing of the 64-bit internal state takes 16 clock cycles per round. During that time, the P-Layer and the XOR do not compute meaningful values. They can therefore be separated from the datapath and put into a 'sleep' mode: their input is held constant, so no switching activity occurs and dynamic power is saved.

3.5. Crypto Coprocessor

There are different ways to equip a smart device with cryptographic functions. The first is to write software code. This solution requires RAM to store the program and occupies the microcontroller while it performs cryptographic algorithms. Another possibility is to implement the cryptographic part directly in the microcontroller core. A more flexible way is to construct a cryptographic co-processor that is controlled by the main core and communicates via a memory-like interface. To obtain a compact and also fast solution, we use the round-based architecture with a modified finite state machine and additional multiplexers. The plaintext is now loaded in 32-bit blocks; as far as we know, this is the maximum bit width of microcontrollers for smart devices. The co-processor is controlled by write and read enable signals. The address signal selects the different bit blocks and the encryption or decryption mode. Figure 3.6 illustrates the interfaces and the units. Results can be found in Table 3.4.

[Figure: inputs CLK, RESET, WENB, RENB, Addr[3..0], Data[31..0], and Key[127..0]; outputs Ciphertext[31..0] and Ready; internal units I/O Interface, FSM, Key Scheduling, Datapath Encryption, and Datapath Decryption]

Figure 3.6.: Block diagram of the PRESENT-128 coprocessor with 32-bit interface

3.6. Comparison

In this section, we first describe the design flow and the metrics used. Subsequently, we compare our implementation results for the three scenarios: low-cost passive smart devices, low-cost active smart devices, and high-end smart devices. We considered the following optimization goals for the three scenarios: low-cost passive smart devices should be optimized for area and power constraints, and low-cost active smart devices for area, energy, and time constraints. Note that in our methodology high-end devices are always contact smart cards and hence should be optimized for time and energy constraints. Therefore we do not distinguish between passive and active high-end smart devices.

3.6.1. Metrics and used design flow

All architectures were developed and synthesized using a script-based design flow. We used Mentor Graphics FPGA Advantage 8.1 for HDL source code construction and functional verification. The RTL description was then synthesized with Synopsys Design Compiler Z-2007.03-SP5, which was also used to generate the area, timing, and power estimation reports. The synthesis process focused mainly on area optimization. The S-Box is described as a Boolean equation, which leads to a combinational logic implementation. The P-Layer is only simple wiring, which is inexpensive in hardware. We used three standard cell libraries with different technology parameters: the 350 nm technology MTC45000 from AMIS, the 250 nm technology SESAME-LP2 from IHP, and the 180 nm technology UMCL18G212D3 from UMC. Each of them consists of a different number of cells, and not all logical functions are implemented in every library. This leads to different area results in GE. The following metric definitions were used:

Area: This metric represents the area normalized to the area of one two-input NAND gate, expressed in gate equivalents (GE).

Cycles: Number of clock cycles to compute and read out the ciphertext.

Throughput: The rate at which new output is produced with respect to time. The number of ciphertext bits is divided by the number of cycles needed and multiplied by the operating frequency. It is expressed in bits per second. With increasing frequency, the throughput increases, too.

Power: The power consumption is estimated on the gate level by Power Compiler¹. It consists of two major components: the static power, which is proportional to the area and depends on the fabrication process, and the dynamic power, which is proportional to the switching activity (switching event probability and operating frequency). Both components also depend on the supply voltage.

Current: The power consumption divided by the typical core voltage of the process. These voltages are 3.3 V for AMI, 2.5 V for IHP, and 1.8 V for UMC.

Throughput to area ratio: This representation is used as a measure of design efficiency.

Maximum frequency: There are many connections between the input and output pins. The delays of the gates along each connection form a timing path for the signals. The slowest path sets the upper bound of the clock frequency. Note that it might be possible to increase the maximum frequency, but this will also increase area and power.

¹ Note that power estimations on the transistor level are more accurate. However, they also require further steps in the design flow, e.g. place & route.
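The throughput and current definitions above can be written as small helper functions. This Python sketch applies them to figures from Table 3.1 and Table 3.2. The energy-per-bit helper (power divided by throughput) is an assumption about how the Energy/Bit column of Table 3.2 is derived; it appears consistent with that column up to rounding of the throughput.

```python
def throughput_bps(block_bits: int, cycles: int, freq_hz: float) -> float:
    """Ciphertext bits per block, divided by the cycles needed, times the clock frequency."""
    return block_bits / cycles * freq_hz


def current_ua(power_uw: float, core_voltage_v: float) -> float:
    """Power divided by the typical core voltage of the process."""
    return power_uw / core_voltage_v


def energy_per_bit_pj(power_uw: float, throughput_mbps: float) -> float:
    """Assumed relation: 1 uW / (1 Mbit/s) = 1e-12 J/bit = 1 pJ/bit."""
    return power_uw / throughput_mbps


# Serial PRESENT-80 (Table 3.1): 64-bit block, 563 cycles at 100 kHz
print(throughput_bps(64, 563, 100e3) / 1e3)    # about 11.4 kbps
# Feldhofer AES (Table 3.1): 128-bit block, 1032 cycles at 100 kHz
print(throughput_bps(128, 1032, 100e3) / 1e3)  # about 12.4 kbps
# Serial PRESENT-80 at 0.35 um: 11.20 uW at a 3.3 V core voltage
print(current_ua(11.20, 3.3))                  # about 3.39 uA
# Round-based PRESENT-80 at 0.18 um (Table 3.2): 77.1 uW at 20.6 Mbps
print(energy_per_bit_pj(77.1, 20.6))           # about 3.7 pJ/bit
```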

The interested reader can find more detailed tables with synthesis results in the appendix.

3.6.2. Low cost passive smart devices

Table 3.1 shows the synthesis results for a clock frequency of 100 kHz, which is a typical operating frequency of RFID tags. Smart devices with integrated contactless functionality have strict area and power constraints. For this purpose, we propose a serialized implementation that requires little area and power. Our serial implementation uses about 1000 GE. To the best of our knowledge, this is the smallest implementation of a cryptographic algorithm with a moderate security level. Even implementations of the stream ciphers Grain80 and Trivium require more area (1294 GE and 1857 GE, respectively [16]). For comparison with block ciphers, we chose two AES implementations with a reduced datapath by Feldhofer et al. [10] and Hämäläinen et al. [18]. Furthermore, for the lightweight block cipher SEA there exists only a reduced datapath implementation [28], without key scheduling component and control logic. Note that a similar implementation of PRESENT would only require around 40 GE in 0.18 µm UMC technology. The power consumption of our implementations shows a large variation depending on the core voltage of the library, but the consumption in 0.18 µm technology is still the lowest in Table 3.1. Note that power figures are highly technology-dependent; therefore a fair comparison is only possible if the same technology is used.

| Cipher | Tech. [µm] | Datapath [Bit] | Freq. [MHz] | Area [GE] | Throughp. [Kbps] | Cycles | Power [µW] |
|---|---|---|---|---|---|---|---|
| PRESENT-80 | 0.35 | 4 | 0.1 | 1,000 | 11.4 | 563 | 11.20 |
| PRESENT-80 | 0.25 | 4 | 0.1 | 1,169 | 11.4 | 563 | 4.24 |
| PRESENT-80 | 0.18 | 4 | 0.1 | 1,075 | 11.4 | 563 | 2.52 |
| Feldhofer AES [10] | 0.35 | 8 | 0.1 | 3400 | 12.4 | 1032 | 4.50 |
| Hämäläinen AES [18] | 0.13 | 8 | 80 | 3100 | 121 | 160 | - |
| SEA [28] | 0.13 | 8 | 0.1 | 449 | 50 | | 3.22 |
| better is | | | | lower | higher | lower | lower |

Table 3.1.: Implementation results of minimal datapath architectures


3.6.3. Low cost active smart devices

The second scenario targets standard smart cards. To reduce fabrication costs, these cards are also area-constrained. In contrast to the previous scenario, however, the crypto core draws its energy from the battery of a pervasive device or via the physical contact of the reading device, so the execution time is of major interest. The round-based implementation shows a good trade-off between area, time, throughput, and energy consumption. It does not consume significantly more area and energy than the serial one, but needs far fewer clock cycles for the computation. The results are compared to other known round-based implementations, i.e. implementations in which a new internal state is computed every clock cycle. Results are available for the ICEBERG [51] and the HIGHT [19] block ciphers, both of which use a 64-bit datapath architecture. Mace et al. [28] characterized different ASIC implementations of SEA; we chose the 96-bit architecture for better comparison with the other datapaths. The results in Table 3.2 illustrate the very compact design of the PRESENT block cipher. Even the throughput, normalized to 10 MHz, is only outperformed by the ICEBERG implementation. But again, we do not consider high throughput to be highly relevant for this device class.

| Cipher | Tech. [µm] | Datapath [Bit] | Freq. [MHz] | Area [GE] | Tput [Mbps] | Energy/Bit [pJ/bit] | Power [µW] |
|---|---|---|---|---|---|---|---|
| PRESENT-80 | 0.35 | 64 | 10 | 1561 | 20.6 | 170.5 | 3520.0 |
| PRESENT-80 | 0.25 | 64 | 10 | 1594 | 20.6 | 21.1 | 436.0 |
| PRESENT-80 | 0.18 | 64 | 10 | 1705 | 20.6 | 3.7 | 77.1 |
| SEA [28] | 0.13 | 96 | 250 | 3758 | 258.0 | 19.8 | 5102.0 |
| ICEBERG [28] | 0.13 | 64 | 250 | 7732 | 1000.0 | 9.6 | 9577.0 |
| HIGHT [19] | 0.25 | 64 | 80 | 3048 | 150.6 | - | - |
| better is | | | | lower | higher | lower | lower |

Table 3.2.: Implementation results of the round based datapath architectures
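The normalized comparison in the text can be reproduced from Table 3.2 with a few lines of Python. The sketch assumes that throughput scales linearly with the clock frequency, i.e. that the number of cycles per block stays constant; under that assumption ICEBERG is the only listed design that exceeds PRESENT-80 at 10 MHz.

```python
# (throughput in Mbps, clock frequency in MHz) as listed in Table 3.2
ROUND_BASED = {
    "PRESENT-80": (20.6, 10.0),
    "SEA": (258.0, 250.0),
    "ICEBERG": (1000.0, 250.0),
    "HIGHT": (150.6, 80.0),
}


def normalized_tput(tput_mbps: float, freq_mhz: float, target_mhz: float = 10.0) -> float:
    """Scale a throughput figure to target_mhz, assuming linear frequency scaling."""
    return tput_mbps * target_mhz / freq_mhz


normalized = {name: normalized_tput(*vals) for name, vals in ROUND_BASED.items()}
faster = [name for name, t in normalized.items() if t > normalized["PRESENT-80"]]
for name, t in sorted(normalized.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {t:6.2f} Mbps @ 10 MHz")
print("faster than PRESENT-80:", faster)  # ['ICEBERG']
```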

3.6.4. High end active smart devices

In the third scenario there are no limitations on energy consumption. The task of the co-processor is to relieve the microcontroller of the cryptographic computations. The design of this assistant should deliver results fast and consume as little area as possible to be cost-effective. One approach is to use a pipelined architecture. However, Table 3.3 reveals that the pipelined implementation achieves a very high throughput at the expense of area and power. The basic message is that scaling the operating frequency has a great impact on power consumption. The area is barely affected, because we chose an area-optimized synthesis approach. At higher frequencies, the capacitances become increasingly important, so cells with a higher driving strength must be used to drive the load, and the area will increase noticeably. In addition, one has to be aware of the input/output interface. Up to now, smart cards are equipped with microcontrollers of at most 32 bits. The best choice is therefore a round-based architecture with a 32-bit I/O interface. Several suitable AES implementations can be found in the literature. We compare the PRESENT implementations to Pramstaller et al. [43] and Satoh et al. [49]. A commercial solution by Cast Inc. [3] is also listed. Table 3.4 shows the results for the different implementations. As many smart cards are equipped with 8-bit microcontrollers, we list the results for an 8-bit interface, too. The PRESENT co-processor is much more compact than the other implementations and, with the 32-bit interface, also needs fewer clock cycles to compute the ciphertext.

| Library | Area [GE] | Power [µW] | Tput/Area [kbps/µm²] | crit. Path [ns] | max Freq. [GHz] | max. Tput [Mbps] |
|---|---|---|---|---|---|---|
| AMI 0.35 µm | 24,345.87 | 81295.00 | 0.486811614 | 12.80 | 0.1 | 5,000.0 |
| IHP 0.25 µm | 25,193.00 | 11659.00 | 0.900080911 | 4.78 | 0.2 | 13,389.1 |
| UMC 0.18 µm | 27,027.69 | 6888.00 | 2.446979668 | 6.26 | 0.2 | 10,223.6 |
| better is | lower | lower | higher | lower | higher | higher |

Table 3.3.: Implementation results of pipelined architecture @ 10 MHz
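Since a filled pipeline delivers one 64-bit block per clock cycle, the maximum throughput follows directly from the critical path. The Python sketch below assumes exactly that relation (max. frequency = 1 / critical path, max. throughput = 64 bits × max. frequency); it reproduces the max. Tput column of Table 3.3, and the max. Freq. column appears to be the same frequencies rounded to 0.1 GHz.

```python
BLOCK_BITS = 64  # one ciphertext block leaves the filled pipeline per clock cycle

# Critical path delays in ns, taken from Table 3.3
CRITICAL_PATH_NS = {"AMI 0.35 um": 12.80, "IHP 0.25 um": 4.78, "UMC 0.18 um": 6.26}


def max_freq_mhz(critical_path_ns: float) -> float:
    """The slowest path bounds the clock period."""
    return 1e3 / critical_path_ns


def max_tput_mbps(critical_path_ns: float, block_bits: int = BLOCK_BITS) -> float:
    return block_bits * max_freq_mhz(critical_path_ns)


for lib, t_ns in CRITICAL_PATH_NS.items():
    print(f"{lib}: f_max = {max_freq_mhz(t_ns):6.1f} MHz, "
          f"Tput_max = {max_tput_mbps(t_ns):8.1f} Mbps")
```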

| Cipher | Tech. [µm] | Datapath [Bit] | max Freq. [MHz] | Area [GE] | Throughp. [Mbps] | Cycles |
|---|---|---|---|---|---|---|
| PRESENT-128 | 0.35 | 32 | 143 | 2,681 | 234 | 39 |
| PRESENT-128 | 0.25 | 32 | 141 | 2,917 | 231 | 39 |
| PRESENT-128 | 0.18 | 32 | 323 | 2,989 | 529 | 39 |
| PRESENT-128 | 0.35 | 8 | 131 | 2,587 | 133 | 63 |
| PRESENT-128 | 0.25 | 8 | 121 | 2,851 | 123 | 63 |
| PRESENT-128 | 0.18 | 8 | 353 | 2,900 | 359 | 63 |
| CAST AES [3] | 0.18 | 32 | 300 | 124,000 | 872 | 44 |
| Satoh AES [49] | 0.11 | 32 | 131 | 54,000 | 311 | 54 |
| Pramstaller AES [43] | 0.6 | 32 | 50 | 85,000 | 70 | 92 |
| better is | | | higher | lower | higher | lower |

Table 3.4.: Implementation results of co-processor architectures
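Table 3.4 does not state how the co-processor cycle counts break down. One plausible accounting, assumed here rather than taken from the design, is 31 round cycles plus one cycle per bus word for loading the 64-bit plaintext and the 128-bit key and reading out the 64-bit ciphertext. The Python sketch below shows that this assumption reproduces both the 39 and the 63 cycles, and that 64 × f / cycles roughly matches the listed throughput.

```python
ROUND_CYCLES = 31  # one clock cycle per PRESENT round in the round-based core


def coprocessor_cycles(bus_width: int, pt_bits: int = 64,
                       key_bits: int = 128, ct_bits: int = 64) -> int:
    """Hypothetical accounting: round cycles plus one cycle per I/O bus word."""
    io_words = (pt_bits + key_bits + ct_bits) // bus_width
    return ROUND_CYCLES + io_words


def throughput_mbps(freq_mhz: float, cycles: int, block_bits: int = 64) -> float:
    return block_bits * freq_mhz / cycles


for width, freq in ((32, 143), (8, 121)):
    cycles = coprocessor_cycles(width)
    print(f"{width:2d}-bit interface: {cycles} cycles, "
          f"{throughput_mbps(freq, cycles):.1f} Mbps at {freq} MHz")
```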

4. Adiabatic Logic

This chapter gives an overview of several adiabatic logic styles. Section 4.1 explains the adiabatic switching principle and how it can be used to reduce power consumption. Previous work in the field of adiabatic logic is summarized in Section 4.2. Afterwards, the implementation details of several logic styles, namely CMOS (Section 4.3), CAL (Section 4.4), PAL (Section 4.5), and CRSABL (Section 4.6), are shown. Finally, the mutual information of each logic style, which is important to determine its side-channel resistance, is computed in Section 4.7.

4.1. Introduction to adiabatic Logic

In modern computer system design, power consumption has become a high-priority objective. Over the past years, several effective power management design techniques have been developed. The most widely used method was supply voltage scaling. But as fabrication processes scaled below 90 nm, it became more difficult to scale down the supply voltage because of the transistor threshold voltage, which has to be scaled along with the supply voltage to achieve performance improvements. Threshold voltage scaling, however, resulted in a large increase of the subthreshold leakage current. Another drawback was the variation in process, voltage, and temperature, which reduced the range over which the supply voltage can be varied. Therefore researchers searched for power saving mechanisms that do not heavily depend on further supply voltage scaling.

In 1985, Bennett and Landauer argued in their paper "The Fundamental Physical Limits of Computation" [1] that the minimum energy required for a computation is proportional to the number of information bits destroyed during the operation: "Thus, if a computation could be somehow implemented without the loss of information, its energy requirements could potentially be reduced to zero." They showed in theory that when computations are performed in a reversible manner, no information is destroyed and no energy would be needed. But reversibility alone is not sufficient: because charge is transferred across a voltage difference during a switching event, a suitable circuit implementation is needed in addition to actually compute with zero dissipation. In contrast to conventional dissipative switching, adiabatic switching tries to dissipate minimal energy during the charge transfer. The word adiabatic comes from Greek, meaning "impassable", and refers to the thermodynamic principle of a change of state without loss or gain of heat. Figure 4.1 shows how energy is dissipated during a static CMOS switching event.
The transition of a node can be modeled as charging an RC tree through a switch, where $C$ is the capacitance of the node and $R$ is the resistance of the switch. When the switch is closed, a high voltage drop ($V_{dd}$) occurs across $R$ and current starts flowing through the resistance. After a short period of time, the capacitor $C$ is charged to the constant supply voltage $V_{dd}$. The energy stored in $C$ is $\frac{1}{2} C V_{dd}^2$, but the energy taken from the power supply is $C V_{dd}^2$, so half of the energy is dissipated in $R$. The lines in the upper diagram of Figure 4.1 represent the voltage across $R$ during the switching event from LOW to HIGH while the supply voltage is constant. The bigger the capacitance $C$, the longer the charging takes. The dissipated energy is the integral of $V_R^2/R$ over time, so a lower and shorter voltage pulse across $R$ means less dissipation. The shape of the resistor voltage curve follows from Equation 4.2: since the same current flows through both elements of the series connection, and the current of a capacitance is $C$ times the derivative of $V_C$ with respect to $t$, we get an ordinary differential equation (ODE). Its right-hand side consists of two terms; the second term accounts for the fact that the input voltage can be time-dependent, too.

Figure 4.1.: CMOS charging (upper plot: resistor voltage $V_R$ over time from 0 to $T$; lower plot: circuit with supply $V_{dd}$, switch resistance $R$, and node capacitance $C$)

Figure 4.2.: Constant charging (upper plot: resistor voltage over time from 0 to $T$; lower plot: circuit driven by a constant current source through $R$ into $C$)

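$$I_R = I_C \tag{4.1}$$

$$\frac{V_R}{R} = C\,\frac{dV_C}{dt} = C\,\frac{d(V_{IN} - V_R)}{dt}
\quad\Longrightarrow\quad
\frac{dV_R}{dt} = -\frac{V_R}{RC} + \frac{dV_{IN}}{dt} \tag{4.2}$$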


In the case of static CMOS, the supply is constant. When the node switches from LOW to HIGH, the input voltage is $V_{dd}$ instantly. With these two initial conditions, the solution of the ODE is the exponential function in Equation 4.3, with time constant $\tau = RC$.

$$V_{IN} = V_{dd}, \qquad \frac{dV_{IN}}{dt} = 0, \qquad V_R = V_{dd}\, e^{-t/\tau} \tag{4.3}$$
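To make the statement that half of the supplied energy $C V_{dd}^2$ is lost in $R$ concrete, the following minimal Python sketch integrates $V_R^2/R$ over the step response of Equation 4.3 and compares the result with $\frac{1}{2} C V_{dd}^2$. The component values (2.5 V, 1 kOhm, 10 fF) and the function name are illustrative assumptions, not values extracted from the IHP library.

```python
import math

def step_charging_energy(vdd, r, c, t_end_factor=20.0, steps=200_000):
    """Energy dissipated in R when C is charged from a constant source (Eq. 4.3).

    V_R(t) = Vdd * exp(-t / tau) with tau = R * C.  The dissipated energy is the
    integral of V_R(t)^2 / R, evaluated with the trapezoidal rule up to
    t_end_factor * tau, by which time V_R has decayed to practically zero.
    """
    tau = r * c
    dt = t_end_factor * tau / steps
    energy = 0.0
    p_prev = vdd ** 2 / r                      # instantaneous power in R at t = 0
    for n in range(1, steps + 1):
        v_r = vdd * math.exp(-n * dt / tau)
        p = v_r ** 2 / r
        energy += 0.5 * (p_prev + p) * dt      # trapezoidal rule
        p_prev = p
    return energy

# Illustrative values (assumed): 2.5 V supply, 1 kOhm switch resistance, 10 fF node.
vdd, r, c = 2.5, 1e3, 10e-15
e_r = step_charging_energy(vdd, r, c)
print(f"dissipated in R: {e_r:.4e} J, 1/2*C*Vdd^2: {0.5 * c * vdd ** 2:.4e} J")
```

Changing `r` in the sketch changes how fast $C$ charges but not the dissipated energy, which is why this loss of static CMOS cannot be reduced by simply choosing a different switch resistance.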

Now, consider the circuit shown in Figure 4.2. The voltage source is replaced by a constant current source. Since the current through $C$ is proportional to the derivative of its voltage, a constant current charges the capacitance with a linearly rising voltage. When the ramp reaches its final value, the charge transfer is finished and the capacitance is charged. Thus, the rapid transition has been slowed down. By spreading out the charge transfer evenly over a longer time, the energy dissipation in $R$ is greatly reduced. The area under the voltage curve $V_R$ shown in the upper diagram is much smaller than in the case of the constant voltage source. $V_R$ is given by Equation 4.4. Now the initial conditions have changed: $V_{IN}$ is a straight line with gradient $V_{dd}/T$, where $T$ is the time needed to reach $V_{dd}$. Hence, the solution of the ODE is an exponential function that saturates at an upper limit of $V_{dd}\,\tau/T$.

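$$V_{IN} = V_{dd}\,\frac{t}{T}, \qquad \frac{dV_{IN}}{dt} = \frac{V_{dd}}{T}, \qquad V_R = V_{dd}\,\frac{\tau}{T}\left[1 - e^{-t/\tau}\right] \tag{4.4}$$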
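The ramp case of Equation 4.4 can be checked the same way. The minimal sketch below integrates $V_R^2/R$ in closed form over the ramp and over the residual decay after $t = T$ (the decay part is an addition for completeness; the text only considers $0 \le t \le T$). It shows the dissipated energy moving from $\frac{1}{2} C V_{dd}^2$ for a very fast ramp to roughly $\frac{RC}{T} C V_{dd}^2$ for $T \gg RC$. The function name and component values are illustrative assumptions.

```python
import math

def ramp_charging_energy(vdd, r, c, t_ramp):
    """Energy dissipated in R when V_IN ramps linearly from 0 to Vdd in t_ramp (Eq. 4.4).

    During the ramp, V_R(t) = Vdd*tau/T * (1 - exp(-t/tau)); after t = T the input
    stays at Vdd and V_R decays exponentially from V_R(T).  Both parts of the
    integral of V_R^2 / R are evaluated in closed form.
    """
    tau = r * c
    a = vdd * tau / t_ramp                       # saturation level of V_R during the ramp
    x = t_ramp / tau
    # integral over 0..T of (1 - exp(-t/tau))^2
    ramp_part = t_ramp - 2 * tau * (1 - math.exp(-x)) + 0.5 * tau * (1 - math.exp(-2 * x))
    # residual decay after T: V_R(T)^2 * tau / (2R), with V_R(T) = a * (1 - exp(-x))
    tail_part = 0.5 * tau * (1 - math.exp(-x)) ** 2
    return a ** 2 * (ramp_part + tail_part) / r

vdd, r, c = 2.5, 1e3, 10e-15                     # illustrative values (assumed)
tau = r * c
for t_ramp in (1e-3 * tau, tau, 10 * tau, 100 * tau):
    e = ramp_charging_energy(vdd, r, c, t_ramp)
    print(f"T = {t_ramp / tau:8.3f} RC: E_R / (C*Vdd^2/2) = {e / (0.5 * c * vdd ** 2):.4f}")
```

For $T = 100\,RC$ the loss is only about 2% of the step-charging loss, which is the effect adiabatic logic exploits.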

The time-variant behavior of the power source can be implemented quite simply as LC tanks, with L and C partly provided by the intrinsic characteristics of the circuit. These so-called power-clock generators supply a sinusoidal waveform with a constant frequency, as shown in Figure 4.3. A key requirement for power-clock generators is the ability to transfer energy bidirectionally between the energy tank and the power node without wasting much energy. The gradient of the voltage $V_R$ and its highest peak depend on the capacitance and the switching time $T_s$. The initial conditions, and especially the solution of the ODE, become more complex. As input voltage we assume a quasi-sinusoidal (raised-cosine) waveform. The result is shown in Equation 4.5.

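$$V_{IN} = \frac{1}{2} V_{dd}\left(1 - \cos\frac{\pi t}{T_s}\right), \qquad
\frac{dV_{IN}}{dt} = \frac{1}{2} V_{dd}\,\frac{\pi}{T_s}\,\sin\frac{\pi t}{T_s}$$

$$V_R = \frac{V_{dd}\,\pi^2\tau^2}{2T_s^2 + 2\pi^2\tau^2}\, e^{-t/\tau}
+ \frac{\tfrac{1}{2} V_{dd}\,\pi\tau\left(-\pi\tau\cos\frac{\pi t}{T_s} + T_s\sin\frac{\pi t}{T_s}\right)}{T_s^2 + \pi^2\tau^2} \tag{4.5}$$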
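Because Equation 4.5 is easy to mistype, a quick numerical check is useful: the sketch below evaluates the closed form and verifies that it starts at $V_R(0) = 0$ and satisfies the ODE of Equation 4.2 with the raised-cosine input. The normalized units ($\tau = 1$, $T_s = 10$, $V_{dd} = 1$) and the helper names are arbitrary choices for illustration.

```python
import math

def v_r_eq45(t, vdd, tau, ts):
    """Resistor voltage V_R(t) for the raised-cosine power clock, Eq. (4.5)."""
    w = math.pi / ts
    denom = ts ** 2 + math.pi ** 2 * tau ** 2
    transient = math.exp(-t / tau) * vdd * math.pi ** 2 * tau ** 2 / (2 * denom)
    steady = 0.5 * vdd * math.pi * tau * (-math.pi * tau * math.cos(w * t)
                                          + ts * math.sin(w * t)) / denom
    return transient + steady

def ode_residual(t, vdd, tau, ts, h=1e-5):
    """dV_R/dt + V_R/tau - dV_IN/dt from Eq. (4.2); should be ~0 if Eq. (4.5) is right."""
    dvr_dt = (v_r_eq45(t + h, vdd, tau, ts) - v_r_eq45(t - h, vdd, tau, ts)) / (2 * h)
    dvin_dt = 0.5 * vdd * (math.pi / ts) * math.sin(math.pi * t / ts)
    return dvr_dt + v_r_eq45(t, vdd, tau, ts) / tau - dvin_dt

# Normalized units (assumed): tau = RC = 1, Ts = 10, Vdd = 1.
vdd, tau, ts = 1.0, 1.0, 10.0
print("V_R(0) =", v_r_eq45(0.0, vdd, tau, ts))
print("max |residual| =", max(abs(ode_residual(0.1 * n, vdd, tau, ts)) for n in range(1, 101)))
```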

Figure 4.3.: Adiabatic charging (upper plot: voltage over time from 0 to $T_s$; lower plot: circuit with power-clock source $V_{PC}$, resistance $R$, and capacitance $C$)

Ideally, by using reversible logic and increasing the switching time $T_s$ over which the computation is performed, it should be possible to create a circuit that computes with low energy dissipation. Because these circuits reuse some of the charge stored in the capacitances in subsequent operations, they are also known as charge-recovery logic. The overall energy dissipation is proportional to the expression in Equation 4.6:

$$E \propto \frac{RC}{T_s}\, C V_{dd}^2 \tag{4.6}$$

where $R$ is the source resistance of the driver, $C$ the capacitance to be switched, $T_s$ the time period over which the switching occurs, and $V_{dd}$ the voltage swing. A main challenge is to implement adiabatic or charge-recovery logic with low-overhead circuit structures that use standard MOSFET devices. The reversible construction and the low energy dissipation could lead to very low side-channel leakage. This makes adiabatic logic a promising candidate for side-channel resistant implementations in low-power devices like RFID tags and embedded computers.
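Equation 4.6 only states a proportionality. The minimal Python sketch below, under the assumptions of Equation 4.5 (raised-cosine power clock, a single RC stage, illustrative component values, hypothetical function name), integrates Equation 4.2 numerically and shows that the dissipated energy falls roughly as $1/T_s$ once $T_s \gg RC$. Using $V_R \approx \tau\, dV_{IN}/dt$ for slow transitions, the proportionality constant for this waveform works out to $\pi^2/8 \approx 1.23$; for the linear ramp of Equation 4.4 it would be 1.

```python
import math

def adiabatic_energy(vdd, r, c, ts, steps_per_tau=50):
    """Energy dissipated in R during one rising power-clock transition.

    V_IN(t) = Vdd/2 * (1 - cos(pi * t / Ts)) for 0 <= t <= Ts, the input of Eq. (4.5).
    dV_R/dt = -V_R/tau + dV_IN/dt (Eq. 4.2) is stepped with an exponential
    integrator that treats dV_IN/dt as constant (midpoint value) within each
    step; V_R^2 / R is accumulated with the trapezoidal rule.
    """
    tau = r * c
    n = max(1, round(ts / tau * steps_per_tau))
    dt = ts / n
    decay = math.exp(-dt / tau)
    v_r, energy = 0.0, 0.0
    for k in range(n):
        t_mid = (k + 0.5) * dt
        slope = 0.5 * vdd * (math.pi / ts) * math.sin(math.pi * t_mid / ts)
        v_next = v_r * decay + tau * slope * (1.0 - decay)
        energy += 0.5 * (v_r ** 2 + v_next ** 2) / r * dt
        v_r = v_next
    return energy

# Illustrative values (assumed): 2.5 V swing, 1 kOhm, 10 fF, so RC = 10 ps.
vdd, r, c = 2.5, 1e3, 10e-15
tau = r * c
for factor in (10, 100, 1000):
    ts = factor * tau
    e = adiabatic_energy(vdd, r, c, ts)
    print(f"Ts = {factor:4d} RC: E = {e:.3e} J, "
          f"E / ((RC/Ts) C Vdd^2) = {e / (tau / ts * c * vdd ** 2):.3f}")
```

A tenfold longer power-clock transition gives roughly a tenth of the dissipation, which is the frequency dependence the later MI and power-analysis results build on.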

4.2. Previous Work on Adiabatic Logic

Mahmoodi-Meimand et al. [29] showed that adiabatic design exhibits a significant reduction of switching noise and energy consumption compared to static CMOS. Khatir and Moradi investigated the DPA resistance of the 2N-2N2P adiabatic logic style [36] and created a new low-energy DPA-resistant logic style called Secure Adiabatic Logic (SAL) [20]. How to design adiabatic logic and avoid common mistakes is described by Frank [12]. Furthermore, he explained that most so-called adiabatic circuit families are not truly adiabatic, because they do not satisfy the general definition of adiabatic physical processes, in which no energy is wasted. Therefore, they are called semi-adiabatic.

A good overview of recent developments and problems in the field of adiabatic logic can be found in [22], [68], and [21].


4.3. CMOS

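Figure 4.4 shows the structure of a simple inverter implemented in CMOS logic. It consists of a p-channel and an n-channel transistor. If the input is zero, the upper p-channel transistor connects the output to the supply voltage while the n-channel transistor blocks; if the input is one, the n-channel transistor connects the output to ground while the p-channel transistor blocks.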

Figure 4.4.: CMOS inverter

Figure 4.5.: CMOS 2-input NAND gate
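At the switch level, the two static CMOS gates can be summarized as complementary pull-up and pull-down networks. The Python sketch below assumes ideal switches (it deliberately ignores the capacitive charging analyzed in Section 4.1) and the standard static CMOS NAND topology of two parallel p-channel and two series n-channel transistors, which we assume matches Figure 4.5. The function names are hypothetical.

```python
from itertools import product

def pmos_on(gate):
    """A p-channel transistor conducts when its gate is at logic 0."""
    return gate == 0

def nmos_on(gate):
    """An n-channel transistor conducts when its gate is at logic 1."""
    return gate == 1

def cmos_inverter(x):
    pull_up = pmos_on(x)                         # PMOS between vdd and OUT
    pull_down = nmos_on(x)                       # NMOS between OUT and gnd
    assert pull_up != pull_down                  # exactly one network conducts
    return 1 if pull_up else 0

def cmos_nand2(in1, in2):
    pull_up = pmos_on(in1) or pmos_on(in2)       # two PMOS in parallel
    pull_down = nmos_on(in1) and nmos_on(in2)    # two NMOS in series
    assert pull_up != pull_down                  # no static vdd-gnd path, no floating output
    return 1 if pull_up else 0

for a, b in product((0, 1), repeat=2):
    print(a, b, cmos_nand2(a, b))
```

The complementarity check is what makes static CMOS draw current essentially only while switching, and it is that switching current whose data dependency the side-channel analysis exploits.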

4.4. CAL - Clocked CMOS Adiabatic Logic

CAL was introduced by Maksimovic et al. in 2000 [30].

Figure 4.6.: CAL 2-input NAND gate

To reduce the energy dissipation caused by the auxiliary clock signal, Luo and Hu [26] improved the CAL circuits (iCAL). The square-wave clock leads to non-adiabatic switching events in some gate transistors. Thus, they introduced an auxiliary clock generator that forms a sinusoidal wave out of the original control signal. Figure 4.7 shows the structure of this control signal converter.

Figure 4.7.: CAL control signal converter


4.5. PAL - Pass-Transistor Adiabatic Logic

PAL was introduced by Oklobdzija et al. in 1997 [40].

Figure 4.8.: PAL 2-input NAND gate

4.6. CRSABL - Charge Recycling Sense Amplifier Based Logic

Another proposal to reduce power consumption by means of charge recycling was made by Tiri and Verbauwhede [61]. They modified the DPA-resistant logic style SABL [62] to save 20%

4.7. Results

The mutual information (MI) can be seen as a measure of the theoretical capability to distinguish between different bits. The output of the NAND gate can take two states, zero and one. The area above the dotted line in the following diagrams marks the MI at which a difference between these states can be recognized and used to reveal the computed result by side-channel attacks. Figures 4.11, 4.12, and 4.13 show the MI contained in the power consumption of the NAND gate implementations. These figures show that, in contrast to CMOS logic, the MI of adiabatic logic styles is frequency-dependent: the lower the frequency, the less noise is needed to protect the output of the gate against side-channel attacks. Thus, adiabatic logic can be used as a hardware countermeasure to reduce side-channel leakage, especially for devices operating at low frequencies.
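The MI curves in the figures come from simulations of the actual gates. The following Python sketch only illustrates the underlying quantity under a deliberately simple assumption: the gate output is one equiprobable bit, each output value produces one fixed mean current (0 A and an assumed 1 µA), and Gaussian noise with standard deviation sigma is added. It computes $I(X;L)$ by numerical integration; the function names are hypothetical, and a single output bit caps the MI at 1 bit.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mutual_information(mu0, mu1, sigma, points=2000):
    """I(X; L) in bit for an equiprobable bit X and leakage L = mu_X + N(0, sigma^2).

    I(X; L) = sum_x P(x) * integral f(l|x) * log2(f(l|x) / f(l)) dl, with
    f(l) = (f(l|0) + f(l|1)) / 2.  Each conditional density is integrated with
    the midpoint rule over mu_x +/- 8 sigma, where practically all its mass lies.
    """
    mi = 0.0
    dl = 16.0 * sigma / points
    for mu_x in (mu0, mu1):
        for i in range(points):
            l = mu_x - 8.0 * sigma + (i + 0.5) * dl
            f_x = gaussian_pdf(l, mu_x, sigma)
            f_mix = 0.5 * (gaussian_pdf(l, mu0, sigma) + gaussian_pdf(l, mu1, sigma))
            mi += 0.5 * f_x * math.log2(f_x / f_mix) * dl
    return mi

# Assumed mean currents: 0 A for output 0 and 1 uA for output 1.
for exponent in range(-10, -2):
    sigma = 10.0 ** exponent
    print(f"sigma = 1e{exponent} A: MI = {mutual_information(0.0, 1e-6, sigma):.4f} bit")
```

In this model MI depends only on the ratio of the current difference to sigma, so a smaller data-dependent current difference shifts the curve towards lower noise levels, i.e. less noise is needed to hide the output.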

Figure 4.9.: CRSABL 2-input NAND gate

Figure 4.10.: CRSABL feedback network

The results for the NOR gates can be found in the appendix.

Figure 4.11.: Adiabatic NAND gates at 4ns clock period (DUT: NAND2; mutual information [bit] over noise standard deviation [A] from 10^-10 to 10^-3; curves: CMOS, CAL, iCAL, bCAL, PAL, CRSABL)

Figure 4.12.: mutual information [bit] of the NAND gate implementations over noise standard deviation [A] from 10^-10 to 10^-3

Figure 4.13.: mutual information [bit] of the NAND gate implementations over noise standard deviation [A] from 10^-10 to 10^-3

5. Power Analysis

This chapter deals with the practical side-channel evaluation of different logic styles. First, the analysis framework is described in Section 5.1. Section 5.2 presents the different PRESENT implementations that will be tested. Then, the results are shown in Section 5.3. Finally, the results are evaluated in Section 5.4.

5.1. Analysis Framework

A silicon implementation of the architectures described in Chapter 3 is very expensive and needs many further steps before it is ready for chip fabrication. Moreover, a chip is fixed and cannot be changed afterwards. Hence, the first step of a side-channel evaluation is a simulation of the critical parts of the architecture. Because no dedicated step for this is integrated into the standard cell design flow yet, a framework for simulation-based side-channel analysis has been suggested by Regazzoni et al. [46]. We took the original approach as a starting point and slightly modified it for better performance, especially in the statistical tests. Figure 5.1 gives a brief overview. The framework is divided into three parts. The first is the transformation of the behavioral algorithm description into a netlist, which maps the hardware description to logic cells of a particular CMOS technology. This method and its results are explained in Section 2.2. The second step is a simulation of the circuit that is as accurate as possible. The result is a power trace that includes many effects occurring during the switching event of a transistor. These traces are processed to extract the time interval that is relevant for the attack. The third step is the actual attack. A matrix computation program is used to perform the statistical test described in Section 2.6.2. Finally, several diagrams are plotted to illustrate the results. The whole framework is controlled by Perl scripts.

5.1.1. Simulation

The simulation of complex circuits is a very time- and memory-consuming step. The more closely the simulation results should match a real chip, the more details have to be computed. Several simulators exist for different tasks. If only the correctness of the computed result is of interest, a digital simulator like ModelSim is chosen. To approximate the power consumption of a circuit, we chose Power Compiler in Chapter 3, which performs a prediction at the algorithmic and register-transfer levels of the Gajski chart shown in Figure 2.9. Its output is a mean value based on a toggle-count model. But, as described in the side-channel attack introduction in Section 2.6, a power trace is needed. To achieve high accuracy, many samples per second and a good transistor model of the underlying fabrication process are necessary. To simulate the analog behavior of a circuit, two circuit description languages have been established: HSPICE from Synopsys and its counterpart Spectre from Cadence. Both use plain n-channel and p-channel transistors to construct a circuit. On the one hand, this method is laborious, because the type and size of every transistor and the interconnections have to be specified to get the expected behavior; on the other hand, it is the method closest to the silicon chip fabrication process. When simulating at transistor level, a lot of equations have to be solved, which is mostly impractical in terms of time and computation resources for large circuits consisting of several thousand transistors. To fill the gap between pure digital and analog simulation, so-called fast SPICE simulators like NanoSim were introduced. They offer a reduction of execution time of up to 1000 times, while still achieving an accuracy of 80 percent compared to analog simulators. Another advantage is that a VHDL model can be used to describe the digital function of the circuit. The output of the synthesizer is then connected to a SPICE model of the transistors used in the standard cell library, so a seamless switch between digital and analog simulation is possible. For our simulation we used NanoSim A-2007.12.

Figure 5.1.: Side-channel analysis framework (components: Perl control script; VHDL, standard cell library, and transistor model as inputs; Design Compiler and Virtuoso producing a mapped netlist or SPICE netlist; NanoSim (simulate) producing power traces; Octave (attack); GnuPlot (plot) producing Diagramm.eps)

The attacker chooses an intermediate result of the executed algorithm. This result needs to be a function $f(d,k)$ of a known non-constant data value $d$ and a part of the key $k$. For block ciphers, the S-Box output of the first round has been established as the attack point. While the cryptographic device encrypts $D$ different random data blocks, the attacker records the vector $\mathbf{d} = (d_1, \ldots, d_D)'$, where $d_i$ denotes the data value in the $i$-th encryption run. During each of these runs, the attacker records a power trace $\mathbf{t}_i' = (t_{i,1}, \ldots, t_{i,T})$ of length $T$ that corresponds to the data block $d_i$. Hence, the traces can be written as a matrix $\mathbf{T}$ of size $D \times T$.

5.1.2. Analysis

Because many computation steps are needed to calculate the power consumption for each key candidate, and because the power traces can be written as a matrix, we chose a matrix computation program. Octave [6] is an open-source Matlab clone. It provides a convenient command-line interface for solving linear and nonlinear problems numerically and for performing other numerical experiments, using a language that is mostly compatible with Matlab. After the computation of the success rate, several diagrams are created using gnuplot [60] to illustrate the different traces, metrics, and findings. The first step of the analysis is the calculation of the hypothetical intermediate value for every possible key $k$. The total number of choices is $K$, which leads to the vector $\mathbf{k} = (k_1, \ldots, k_K)$. Now the attacker can calculate the hypothetical intermediate values $f(d,k)$ for the given data vector $\mathbf{d}$ and key vector $\mathbf{k}$. The result is the matrix $\mathbf{V}$ of size $D \times K$. Column $j$ of $\mathbf{V}$ consists of the intermediate results that have been calculated using the key hypothesis $k_j$. The goal of the attack is to find out which column has been processed during the $D$ encryption runs; the index $ck$ of this column leads to the key used by the device. To find the right column, the hypothetical intermediate values $\mathbf{V}$ have to be mapped to a matrix of hypothetical power consumption values $\mathbf{H}$. To this end, the attacker uses one of the prediction models described in Section 2.6.2. For each hypothetical intermediate value $v_{i,j}$, a hypothetical power consumption value $h_{i,j}$ is computed. The final step compares each column of matrix $\mathbf{H}$ with each column of matrix $\mathbf{T}$, which consists of the recorded power traces, i.e. with every point in time. The result is a matrix $\mathbf{R}$ of size $K \times T$. The comparison is done using the Pearson correlation coefficient; see Section 2.6.2.1 for details. The higher the element $r_{i,j}$, the better the columns $\mathbf{h}_i$ and $\mathbf{t}_j$ match. The index $i$ points to the corresponding key hypothesis and the index $j$ to the point in time. The simulation results are close to reality, but in a real measurement setup there will be many noise sources, ranging from the A/D converter of the oscilloscope to the electronic noise produced by other parts of the chip. Therefore, we add Gaussian noise to the power traces and repeat the attack for many SNRs.
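The whole pipeline described above ($\mathbf{V}$ from $f(d,k)$, $\mathbf{H}$ from a power model, $\mathbf{R}$ from Pearson correlation) can be sketched in a few lines of Python. This is not the Octave code of the framework: the function names are hypothetical, only the HW model is shown, and the synthetic traces are an assumption (three samples per trace, only the second one leaking $\mathrm{HW}(S(d \oplus k))$, Gaussian noise on every sample). The plaintexts cycle through all 16 nibble values twice to keep the demo deterministic.

```python
import math
import random

# PRESENT 4-bit S-box (S[x] for x = 0x0 ... 0xF)
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def hamming_weight(x):
    return bin(x).count("1")

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if one input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

def cpa(data, traces, num_keys=16):
    """CPA on the first-round S-box output.

    data:   D known 4-bit plaintext nibbles d_i
    traces: D x T matrix (list of lists); row i is the trace t_i
    Returns (best key guess, R) with R a K x T list of correlations.
    """
    num_traces, num_samples = len(traces), len(traces[0])
    # V (D x K): hypothetical intermediate values f(d, k) = S(d XOR k)
    V = [[PRESENT_SBOX[d ^ k] for k in range(num_keys)] for d in data]
    # H (D x K): hypothetical power consumption, Hamming-weight model
    H = [[hamming_weight(v) for v in row] for row in V]
    # R (K x T): correlate column k of H with column j of the trace matrix
    R = [[pearson([H[i][k] for i in range(num_traces)],
                  [traces[i][j] for i in range(num_traces)])
          for j in range(num_samples)] for k in range(num_keys)]
    best = max(range(num_keys), key=lambda k: max(abs(x) for x in R[k]))
    return best, R

rng = random.Random(1)
secret_key = 4
data = [i % 16 for i in range(32)]             # every nibble value twice
traces = [[rng.gauss(0, 0.1),
           hamming_weight(PRESENT_SBOX[d ^ secret_key]) + rng.gauss(0, 0.1),
           rng.gauss(0, 0.1)] for d in data]
best, R = cpa(data, traces)
print("recovered key:", best, "peak correlation:", round(max(R[best]), 3))
```

Swapping `hamming_weight` for a Hamming-distance prediction changes only the construction of `H`; the correlation step and the key ranking stay the same.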

5.2. Implementations

The following section describes the devices under test (DUTs) that are used for the side-channel analysis. The focus lies on the PRESENT block cipher. First, basic CMOS implementations with a 4-bit and an 8-bit datapath are introduced. We also synthesized an 8-bit version of the AES datapath for comparison. Furthermore, a side-channel resistant implementation with a 4-bit datapath using the iMDPL logic style is another DUT. The last implementation uses the adiabatic logic style PAL, which showed the lowest data dependency in the MI test of Section 4.7. The S-Box source codes can be found in the appendix.

5.2.1. CMOS PRESENT

The first circuit we analyzed is a CMOS implementation of the PRESENT S-Box. It consists of a 4-bit datapath as used in the serialized PRESENT implementation. Figure 5.2 shows the structure of the DUT. The first component is an XOR with the 4-bit plaintext and the 4-bit key as inputs. The output is passed through a single PRESENT S-Box, and the result is stored in a 4-bit flip-flop. The output of the S-Box is used as the intermediate value to attack the circuit. The VHDL description was synthesized and mapped to the IHP 0.25 µm standard cell library. The supply voltage is 2.5 V. We do not consider the different wire capacitances that would occur after the place & route step. This DUT is referred to as PRESENT-4_CMOS. Another circuit with an 8-bit datapath consists of an 8-bit XOR, two parallel S-Boxes, and an 8-bit flip-flop to store the result. This DUT is referred to as PRESENT-8_CMOS.

5.2.2. CMOS AES

To compare the side-channel characteristics of the PRESENT S-Box, we also synthesized an AES circuit. The AES is a standardized block cipher with a similar structure. Its S-Box has an 8-bit input and output. The DUT consists of an XOR gate, an AES S-Box, and an 8-bit flip-flop. The underlying technology is the IHP library as well.

Figure 5.2.: Side-channel analysis target: CMOS (4-bit key and 4-bit random plaintext, S-Box S, D flip-flop clocked by CLK)

Figure 5.3.: Side-channel analysis target: iMDPL (4-bit key and 4-bit random plaintext, signal converters, S-Box S with mask bit m from a PRNG, D flip-flop clocked by CLK)

5.2.3. iMDPL PRESENT

The iMDPL [41] logic style has been introduced as a side-channel countermeasure at gate level. MDPL stands for masked dual-rail pre-charge logic. Having found some weaknesses in the original proposal, Popp et al. released an improved version of their logic style. The circuit can be designed with the standard cell design flow, because all cells can be built from standard CMOS cells that are commonly available in standard cell libraries. Figure 5.3 illustrates the structure of the DUT. We synthesized a 4-bit datapath of PRESENT. The iMDPL style avoids glitches by using a dual-rail pre-charge approach. Each gate needs a clock signal to switch between a pre-charge and a computation phase. In addition, all data input and output signals are complementary signals. To obscure the data dependency, each gate is connected to a random number generator that provides a single mask bit every clock cycle. So the actual data value $d = d_m \oplus m$ of a node in the circuit results from the signal value $d_m$ that is physically present at the node and the mask bit $m$. Because the IHP 0.25 µm standard cell library does not contain a majority gate, which is an essential gate for building iMDPL gates, we used a SPICE description of iMDPL cells provided by Amir Moradi. The cells are based on a 0.18 µm technology with a 1.8 V supply voltage.

Figure 5.4.: Side-channel analysis target: PAL (4-bit key and 4-bit random plaintext, signal converters, S-Box S supplied by the power clocks PC1 and PC2, 4-bit buffer)

5.2.4. PAL PRESENT

In Chapter 4, the PAL logic gates showed the lowest data dependency with respect to side-channel leakage. Thus, we built a PRESENT S-Box out of these gates to verify this result. The circuit consists of an XOR gate, a 4-bit S-Box, and a storage element. Since there is no proposal yet for a PAL D flip-flop, we used a simple buffer instead. In a chain of PAL logic gates, a second sinusoidal power clock PC2, which is phase-shifted by 180° relative to PC1, supplies all odd logic stages. Both PC1 and PC2 can be obtained efficiently from a single LC oscillator. A signal converter transforms the digital input values into sinusoidal PAL input values. The netlist of the PAL device under test is given in Spectre format, because of the full-custom design process using the Cadence tools. Only the circuit design was performed; place & route is missing, so the parasitic wire capacitances are not included in the netlist.

5.3. Results

This section describes the results of the side-channel attacks on different implementations of the PRESENT block cipher. Various diagrams have been plotted for each implementation. As simulation time resolution we chose both 0.01 ns and 0.001 ns. This means that the 0.001 ns power trace consists of 10 times more sample points than the 0.01 ns power trace. On the other hand, the computation effort increases by a factor of 100 because of the matrix computations. All key hypotheses were tested for different signal-to-noise ratios. To survey the success rate and find where all key candidates can be discovered, we chose a spectrum ranging from 0 dB to 100 dB. If the variance of the noise equals the variance of the data-dependent signal, the SNR of the signal is 0 dB. If no or very little noise overlaps the signal, we can assume an SNR of 100 dB.
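This SNR definition corresponds to $\mathrm{SNR} = 10\log_{10}(\sigma_{signal}^2/\sigma_{noise}^2)$. A minimal Python sketch of how noise for a target SNR can be generated follows. The function names are hypothetical, and since the text does not specify how the signal variance was estimated from the simulated traces, `signal_variance` is simply passed in as a parameter.

```python
import math
import random
import statistics

def noise_std_for_snr(signal_variance, snr_db):
    """Noise standard deviation such that 10*log10(var_signal / var_noise) = snr_db."""
    return math.sqrt(signal_variance / 10 ** (snr_db / 10))

def add_noise(trace, signal_variance, snr_db, rng):
    """Return a copy of trace with white Gaussian noise for the requested SNR."""
    sigma = noise_std_for_snr(signal_variance, snr_db)
    return [x + rng.gauss(0.0, sigma) for x in trace]

# Data-dependent signal assumed to have unit variance (e.g. HW of a uniform 4-bit value).
for snr in (-10, 0, 20, 100):
    print(f"{snr:4d} dB -> noise std {noise_std_for_snr(1.0, snr):.3g}")

rng = random.Random(7)
noisy = add_noise([0.0] * 100_000, signal_variance=1.0, snr_db=0, rng=rng)
print("measured noise variance at 0 dB:", round(statistics.pvariance(noisy), 3))
```

Negative SNR values simply yield a noise standard deviation larger than that of the signal, which is how the -10 dB points of the later success-rate plots arise.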

5.3.1. PRESENT 4-Bit

First, a reduced round of the PRESENT algorithm was simulated. The permutation layer was omitted, because it consists only of rewiring the data lines. Figure 5.5 shows the clock signal that triggers the flip-flop. The first peak in the second diagram results from the XOR and S-Box computation. The simulation encrypts 32 random plaintexts, so 32 superimposed power traces are plotted in the second graph. When the clock signal rises, the stable output of the logic part of the circuit is stored in the sequential elements. The first peak of the power trace is spread over a longer time period because of the cascading of many gates and the different signal propagation times of each gate. The third graph displays the correlation between the estimated power traces for each key hypothesis and the simulated power traces using key 4 for encryption. The left diagram was plotted using the Hamming weight (HW) model to predict the power consumption, the right one using the Hamming distance (HD) model. The trace with the highest correlation is plotted in green, because the corresponding key hypothesis matches the secret key. So the attack could successfully reveal the secret information of the cryptographic device. The highest data dependency can be found at the storing event of the intermediate value in the flip-flops. Both the HD and the HW model can be used to reveal the secret information. Next, we added Gaussian noise to the simulated power traces and repeated the attack (Figure 5.6). The upper left graph shows the simulated power trace during the rising clock edge, when the data value is stored in the flip-flop. The noise is plotted in the graph beneath; a histogram shows that the noise is normally distributed. In the upper right graph, the final noisy power trace is plotted. The success rate for an SNR window from -10 dB to 16 dB and the HD model is shown on the left side of Figure 5.7. A negative SNR means that the variance of the noise is higher than the variance of the signal. On the right side of the figure, the success rate for the higher resolution of 0.001 ns is shown. The barrier that marks the 100% key recovery rate at about 2 dB does not change noticeably; the small difference is caused by the random noise added to the signal. Accordingly, the 0.01 ns time resolution is sufficient for attacks on simulated CMOS circuits. With a real measurement setup composed of a chip or microcontroller and an oscilloscope, the attacker will see only a single peak in the power trace, because the additional capacitances of the chip flatten the clear spikes.
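The two prediction models compared above can be written down directly for the 4-bit DUT of Section 5.2.1. The Python sketch below rests on two assumptions: the register stores the S-box output of each run, and its content before the first run is 0 (the actual reset value of the simulated flip-flops is not stated). The function names are hypothetical.

```python
# PRESENT 4-bit S-box (S[x] for x = 0x0 ... 0xF)
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def hw(x):
    """Hamming weight: number of 1 bits."""
    return bin(x).count("1")

def hd(a, b):
    """Hamming distance: number of bits that toggle from a to b."""
    return hw(a ^ b)

def predict(data, key, model, reset_value=0):
    """Hypothetical power values for consecutive runs of XOR + S-box + 4-bit register.

    model "hw": Hamming weight of the value stored in the register.
    model "hd": Hamming distance between the previous and the new register value;
                reset_value is the (assumed) register content before the first run.
    """
    if model not in ("hw", "hd"):
        raise ValueError("model must be 'hw' or 'hd'")
    preds, prev = [], reset_value
    for d in data:
        v = PRESENT_SBOX[d ^ key]
        preds.append(hw(v) if model == "hw" else hd(prev, v))
        prev = v
    return preds

print(predict([0x0, 0x5, 0xA], key=0, model="hw"))   # [2, 0, 4]
print(predict([0x0, 0x5, 0xA], key=0, model="hd"))   # [2, 2, 4]
```

The HD model needs the order of the encryption runs, because each prediction depends on the previous register content; the HW model treats every run independently.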

5.3.2. PRESENT 8-Bit

The attack on the PRESENT-8 CMOS circuit was performed with 512 random plaintexts and a resolution of 0.01 ns. Because of the 256 possible keys for the 8-bit datapath and the large number of encryption runs, the time to simulate and attack the device increased considerably.

Figure 5.5.: simulated power traces of CMOS PRESENT-4 ((a) HW model, (b) HD model; each panel shows V(CLK), current, and correlation over time for key 4, 32 plaintexts, and SNR 100)

Figure 5.6.: SNR 0 noise addition PRESENT-4 (simulated trace, added noise, noisy trace, and noise histogram)

Figure 5.7.: Success rate over SNR CMOS PRESENT-4 with HD model ((a) 0.01 ns, (b) 0.001 ns; success rate [%] over SNR [dB], 32 plaintexts)

Figure 5.8.: Success rate over SNR CMOS PRESENT-8 with HD model (success rate [%] over SNR [dB])

We only used the more promising HD model to predict the power traces. The results for different SNRs are shown in Figure 5.8. The SNR at which all possible keys can be revealed is lower than for PRESENT-4. This is caused by the higher number of random inputs.

5.3.3. AES 8-Bit

At last, we attack a single AES S-Box implementation. The plain CMOS circuit without any side-channel countermeasures is not very resistant against correlation power attacks with the HD or HW model, as can be seen in Figure 5.9.

5.3.4. iMDPL PRESENT 4-Bit

The next DUT is the side-channel resistant iMDPL implementation. The datapath is 4 bits wide, and we used 512 random plaintexts for the simulation. The high number of plaintexts is necessary to get a sound mean estimator, which is required for the difference of HD/HW attack (see Section 2.6.2.4). The better the mean value can be estimated, the better the influence of the mask bit can be removed. As shown in Figure 5.10, the standard CPA using either the HW or the HD model is not successful. The power consumption during the storing event seems very constant for different keys, and the correlation coefficient between the simulated traces and the predicted ones does not show significant peaks for any particular key hypothesis. This is due to the hiding and masking mechanisms embedded into the iMDPL logic style. The success rates for both prediction models and different simulation resolutions are shown in Figure 5.11. The HW model generated the best results, but the best success rate that can be achieved is 81.25%. The HD model, in contrast, does not reveal the keys steadily at any given SNR. To attack logic styles that implement a single-mask-bit countermeasure, a new class of attacks came up. One of these second-order attacks is the zero-offset method described in Section 2.6.2.3. Its additional step of squaring each point of the power trace is expected to lead to better results than the basic correlation coefficient with the HD/HW model. The success rate of the zero-offset method combined with the HD model is shown in the left graph of Figure 5.12, and the result of zero-offset with the HW model in the right graph. Both success rates look the same as if no preprocessing step had been performed. The highest success rate that can be achieved with the ZOHD¹ attack is still about 20% and not stable. However, the ZOHW² attack has a stable success rate of 56% for an SNR above 50 dB.

Figure 5.9.: Success rate over SNR for AES-8 with HD and HW model ((a) HD, (b) HW; success rate [%] over SNR [dB])
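Why squaring can, in principle, defeat a single mask bit is easiest to see in a toy model. The Python sketch below assumes one mask bit per run that inverts all four register bits (matching $d = d_m \oplus m$ on every node), Hamming-weight leakage, and Gaussian noise; it ignores the dual-rail pre-charge part of iMDPL entirely, and all names are hypothetical. In this model first-order CPA sees no correlation, while correlating the centred, squared leakage with $(\mathrm{HW} - 2)^2$ recovers the key. The simulated iMDPL circuit behaved very differently (ZOHD and ZOHW reached only about 20% and 56%), so the sketch illustrates the principle, not the results above.

```python
import math
import random

PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def hw(x):
    return bin(x).count("1")

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

def masked_leakage(data, key, noise_std, rng):
    """Toy leakage of a masked 4-bit register, one sample per run.

    One random mask bit per run is applied to all four wires, so the stored
    value is S(d ^ key) XOR 0xF when the bit is 1; leakage = HW(stored) + noise.
    """
    out = []
    for d in data:
        mask = 0xF if rng.randrange(2) else 0x0
        out.append(hw(PRESENT_SBOX[d ^ key] ^ mask) + rng.gauss(0.0, noise_std))
    return out

def zero_offset_scores(data, leakage):
    """Per key guess: (first-order CPA correlation, zero-offset correlation)."""
    mean = sum(leakage) / len(leakage)
    squared = [(x - mean) ** 2 for x in leakage]    # centre, then square each point
    scores = []
    for k in range(16):
        hw_pred = [hw(PRESENT_SBOX[d ^ k]) for d in data]
        sq_pred = [(h - 2) ** 2 for h in hw_pred]      # 2 = mean HW of a 4-bit value
        scores.append((pearson(hw_pred, leakage), pearson(sq_pred, squared)))
    return scores

rng = random.Random(3)
data = [i % 16 for i in range(4096)]                # balanced plaintext nibbles
leak = masked_leakage(data, key=7, noise_std=0.2, rng=rng)
scores = zero_offset_scores(data, leak)
best = max(range(16), key=lambda k: scores[k][1])
print("first-order r for key 7:", round(scores[7][0], 3), "| zero-offset best key:", best)
```

Inverting all bits maps $\mathrm{HW}$ to $4 - \mathrm{HW}$, so the centred leakage only changes sign with the mask; squaring removes that sign, which is exactly what the zero-offset preprocessing relies on.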

Because the iMDPL logic style seems to offer resistance against side-channel attacks, we tried the newly proposed difference of HD/HW attack (see Section 2.6.2.4). It was introduced to attack side-channel resistant flip-flops; thus, we applied it to the time interval around the rising edge of the clock. The results in Figure 5.13 show that all key hypotheses can be discovered at an SNR of 30 dB and above. Changing the simulation resolution to 0.001 ns has no effect on this barrier.

The difference of Hamming weight model is not the right choice to attack the iMDPL flip-flop. As listed in Table 5.1, only a few keys can be recovered.

¹ Zero-Offset Hamming Distance
² Zero-Offset Hamming Weight

Figure 5.10.: Success rate for SNR 100 of iMDPL PRESENT-4 with HD and HW model ((a) HD, (b) HW; each panel shows V(CLK), current [mA], and correlation over time [ns] for key 7 and 512 plaintexts)

Table 5.1.: Success rates of iMDPL PRESENT-4 at 0.01 ns using the DHW power model

SNR [dB] | Keys ok | Keys wrong | Found [%] | Recovered keys   | Computation time
20       | 1       | 15         | 6.25      | 13               | 8 min 19 sec
30       | 5       | 11         | 31.25     | 2, 5, 11, 12, 15 | 8 min 9 sec
40       | 0       | 16         | 0         | -                | 8 min 29 sec
50       | 1       | 15         | 6.25      | 11               | 8 min 31 sec

Figure 5.11.: Success rate over SNR iMDPL PRESENT-4 with HD and HW model ((a) HW, 0.01 ns; (b) HW, 0.001 ns; (c) HD, 0.01 ns; (d) HD, 0.001 ns; success rate [%] over SNR [dB])

Figure 5.12.: Success rate over SNR iMDPL PRESENT-4 with ZOHD and ZOHW model ((a) ZOHD, 0.01 ns; (b) ZOHW, 0.01 ns; 512 plaintexts; success rate [%] over SNR [dB])

Figure 5.13.: Success rate over SNR iMDPL PRESENT-4 with DHD model. Left simulation resolution 0.01 ns, right 0.001 ns (512 plaintexts; success rate [%] over SNR [dB])

Figure 5.14.: Power traces for SNR 100 of PAL PRESENT-4, HW model and different clock periods ((a) 10 ns, (b) 100 ns power clock period; each panel shows V(PC2), current, and correlation over time [ns])

5.3.5. PAL PRESENT 4-Bit

The last DUT is a 4-bit PRESENT implementation using PAL gates. This adiabatic logic style seems to offer side-channel resistance without any dedicated countermeasures. Due to the charge-recovery behavior, the signals are routed in a dual-rail manner, which belongs to the class of hiding countermeasures. Since no masking is used, first-order attacks are applied to reveal the keys. The power clock signal, the power traces and the HW correlation are shown in Figure 5.14 for key 7. The attack evaluates two periods of the sinusoidal power clock: during the first peak the value is read into the buffer gate, and during the second peak the output becomes stable. High peaks are visible in the power traces in the middle of the rising and the falling phases; at these times the sense amplifier stage of the PAL gate is turned on or off. This event leaks data-dependent information. In particular, the capturing of values computed by the previous stage of gates correlates with the HW or HD model. The power traces in the diagram also highlight another characteristic of adiabatic gates. The left traces are simulated with a power clock period of 10 ns, and the current peaks have an amplitude of 120 nA. When the frequency is lowered by using a power clock period of 100 ns, the amplitude shrinks to 40 nA and the highest correlation no longer matches the correct key. Figure 5.15 illustrates how the side-channel resistance depends on the power clock period. The left success-rate-over-SNR diagram shows that, for a PC period of 10 ns, more than 90% of the keys can be revealed up to an SNR of 30 dB. If the power clock period is raised, and thus the operating frequency is lowered, the success rate decreases as well. Other prediction models such as HD (left), or second-order attacks such as ZOHW (right), do not lead to better results, as shown in Figure 5.16.

[Plots: success rate [%] over SNR [dB]]
(a) HW, 10 ns
(b) HW, 100 ns

Figure 5.15.: Success rate over SNR for PAL PRESENT-4 with the HW model. Left: PC period 10 ns; right: 100 ns
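The success-rate-over-SNR evaluation described above can be illustrated with a small, self-contained model. The following Python sketch is a simplified stand-in, not the simulation flow of this thesis: it assumes that the leakage is the Hamming weight (HW) of the PRESENT S-box output plus white Gaussian noise, that the SNR is the ratio of signal variance to noise variance in dB (the exact SNR convention of the figures is not restated in this section), and that the CPA picks the key guess with the largest absolute correlation. All function names are hypothetical.

```python
import math
import random

# PRESENT S-box, 4-bit input -> 4-bit output.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hw(x):
    """Hamming weight of a small non-negative integer."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def simulate_leakage(plaintexts, key, snr_db, rng):
    """Assumed leakage: HW(S(p ^ key)) plus Gaussian noise.

    The HW of a uniformly distributed nibble has variance 1, so the noise
    standard deviation is sqrt(1 / 10**(snr_db / 10)).
    """
    sigma = math.sqrt(1.0 / 10 ** (snr_db / 10))
    return [hw(PRESENT_SBOX[p ^ key]) + rng.gauss(0.0, sigma) for p in plaintexts]


def cpa_best_guess(plaintexts, traces):
    """Key guess whose HW prediction has the largest |correlation|."""
    scores = []
    for guess in range(16):
        prediction = [hw(PRESENT_SBOX[p ^ guess]) for p in plaintexts]
        scores.append(abs(pearson(prediction, traces)))
    return max(range(16), key=lambda g: scores[g])


def success_rate(key, snr_db, n_traces=64, n_experiments=50, seed=1):
    """Fraction of independent CPA experiments that recover the key."""
    rng = random.Random(seed)
    plaintexts = [i % 16 for i in range(n_traces)]  # every nibble equally often
    hits = 0
    for _ in range(n_experiments):
        traces = simulate_leakage(plaintexts, key, snr_db, rng)
        if cpa_best_guess(plaintexts, traces) == key:
            hits += 1
    return hits / n_experiments
```

Evaluating `success_rate(7, snr_db)` over a range of `snr_db` values gives a success-rate curve for this toy leakage model. It does not reproduce the numbers of Figure 5.15, because the transistor-level behavior of the PAL gates (for example the sense amplifier switching) is not modeled.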

5.4. Appraisement

After evaluating the side-channel resistance of several PRESENT datapath implementations with respect to many aspects, such as the power prediction model, the simulation time accuracy and the SNR, we have confirmed the well-known fact that unprotected CMOS implementations of block ciphers are very susceptible to first-order side-channel attacks. A single PRESENT S-box offers slightly better resistance than an AES S-box or two PRESENT S-boxes in parallel; both 8-bit architectures show approximately the same SNR characteristics. To prevent a circuit from leaking side-channel information, logic styles other than basic CMOS must be used. The practically proven iMDPL logic style can be broken with an attack on the flip-flops using the difference-of-Hamming-distance (DHD) model. Therefore, we investigated a PRESENT S-box circuit implemented in adiabatic logic. The PAL device under test shows good resistance without any built-in masking countermeasure. Most notable is its behavior when the operating frequency is lowered, which is equivalent to a longer power clock period: the success rate over SNR decreases. CMOS circuits, in contrast, do not change their success rate with the frequency. Figure 5.17 plots the HW correlation of the PAL circuit as a function of the number of encrypted plaintexts. The left graph shows the evolution of the correlation coefficient for a power clock period of 10 ns; the green line marks the correlation chosen after 80 encryptions. A stable result is reached after 10 random encryptions, and the correct key is recovered. The right graph shows that, with a power clock period of 100 ns, a wrong key candidate is chosen; here a stable prediction only emerges after more than 30 random inputs.

[Plots: success rate [%] over SNR [dB]; panel (b) shows CPA on PRESENT with 80 different plaintexts and the ZOHW model]
(a) HD, 10 ns
(b) ZOHW, 10 ns

Figure 5.16.: Success rate over SNR for PAL PRESENT-4 with the HD and ZOHW models.

[Plots: correlation over number of plaintexts]
(a) HW, 10 ns
(b) HW, 100 ns

Figure 5.17.: Correlation over the number of plaintexts for PAL PRESENT-4 with the HW model. Left: PC period 10 ns; right: 100 ns. Key 7
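The correlation-over-number-of-plaintexts view of Figure 5.17 amounts to recomputing the correlation coefficient of every key guess after each additional trace. The Python sketch below does this with running sums, so each new trace costs constant work per guess. It assumes an HW prediction on the PRESENT S-box output and one scalar leakage sample per encryption; `RunningCorrelation`, `correlation_evolution` and `stable_after` are illustrative names, not part of the tooling used in this thesis.

```python
import math

# PRESENT S-box, 4-bit input -> 4-bit output.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


class RunningCorrelation:
    """Pearson correlation of (prediction, sample) pairs, updated incrementally."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def value(self):
        vx = self.n * self.sxx - self.sx ** 2
        vy = self.n * self.syy - self.sy ** 2
        if vx <= 0 or vy <= 0:  # undefined for constant data; report 0
            return 0.0
        return (self.n * self.sxy - self.sx * self.sy) / math.sqrt(vx * vy)


def correlation_evolution(plaintexts, samples):
    """history[g][i] is the correlation of key guess g after i + 1 traces."""
    accs = [RunningCorrelation() for _ in range(16)]
    history = [[] for _ in range(16)]
    for p, y in zip(plaintexts, samples):
        for g in range(16):
            accs[g].add(bin(PRESENT_SBOX[p ^ g]).count("1"), y)
            history[g].append(accs[g].value())
    return history


def stable_after(history):
    """Return (final best guess, number of traces from which it stays on top)."""
    n = len(history[0])
    best = [max(range(16), key=lambda g: abs(history[g][i])) for i in range(n)]
    i = n
    while i > 0 and best[i - 1] == best[-1]:
        i -= 1
    return best[-1], i + 1
```

Plotting `history[g]` for all 16 guesses gives curves of the kind shown in Figure 5.17, and `stable_after` returns the number of traces after which the top-ranked guess no longer changes.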

6. Conclusion and Further Work

This chapter summarizes the thesis and describes possible application scenarios of different PRESENT architectures. Because the serial implementation is smaller, less noise overlaps the data-dependent signal, and the risk of side-channel attacks is therefore higher. The advantages of adiabatic logic styles in the context of side-channel analysis are also pointed out, and an outlook on possible further work is given.

6.1. Conclusion

This work explored the new lightweight block cipher PRESENT. Its potential application fields range from high-performance encryption to RFID tags with area and power constraints. This thesis, however, focuses on a serialized architecture that occupies an area of only about 1000 GE.

The main topic is the side-channel characteristics of such a small-footprint implementation. By simulating the power consumption of the PRESENT S-box, we showed that CMOS logic does not provide any protection against side-channel attacks. Even so-called side-channel resistant logic styles can be overcome with a moderate number of measurements and the right leakage model. We therefore turned to a class of logic styles that has so far not been in the spotlight of side-channel investigations. After comparing gates in different adiabatic logic implementations, we chose PAL as the best candidate for further simulations. A small test circuit of a PRESENT S-box implemented in CMOS, iMDPL, and PAL was simulated. As a result, PAL offers the best protection against side-channel attacks. We also showed a strong frequency dependency of the side-channel leakage of adiabatic logic: the lower the operating frequency, the less noise is needed to hide the data dependency of the power consumption.

6.2. Further Work

Adiabatic logic seems to be a good countermeasure against side-channel attacks. The results shown above were obtained only by simulating the circuits. The next step to prove the side-channel resistance would therefore be to manufacture real hardware that implements a cipher, e.g. a lightweight block cipher like PRESENT, in adiabatic logic. Several other disturbing sources, such as parasitic capacitances of wires or the power clock generator, could influence the side-channel leakage.

A. Bibliography

[1] C.H. Bennett and R. Landauer. The Fundamental Physical Limits of Computation. vol-ume 253, pages 48–56, 1985.

[2] Andrey Bogdanov, Gregor Leander, Lars R. Knudsen, ChristofPaar, Axel Poschmann,Matthew J. Robshaw, Yannick Seurin, and Christine Vikkelsoe.PRESENT - An Ultra-Lightweight Block Cipher. InCryptographic Hardware and Embedded Systems - CHES,LNCS. Springer, 2007. to appear.

[3] Cast Inc. Cast aes32-c. www.cast-inc.com.

[4] C.J. Cellucci, A.M. Albano, and P.E. Rapp. Statistical validation of mutual informa-tion calculations: Comparison of alternative numerical algorithms. Physical Review E,71:66208, 2005.

[5] Zhimin Chen and Yujie Zhou. Dual-Rail Random Switching Logic: A Countermeasureto Reduce Side Channel Leakage. InCryptographic Hardware and Embedded Systems -CHES, volume 4249 ofLNCS, pages 242–254. Springer, 2006.

[6] John W. Eaton. GNU Octave 3.0, 2007.http://www.octave.org/.

[7] ECRYPT Network of Excellence. The Stream Cipher Project: eSTREAM; Available viawww.ecrypt.eu.org/stream.

[8] GNU Emacs. http://www.gnu.org/software/emacs.

[9] EUROPRACTICE - IC Service.http://www.europractice-ic.com/.

[10] M. Feldhofer, J. Wolkerstorfer, and V. Rijmen. AES implementation on a grain of sand.In Information Security, IEE Proceedings, volume 152, pages 13–20, Oct. 2005.

[11] Martin Feldhofer, Sandra Dominikus, and Johannes Wolkerstorfer. Strong Authenti-cation for RFID Systems Using the AES Algorithm. InCryptographic Hardware andEmbedded Systems - CHES, pages 357–370, 2004.

[12] M.P. Frank. Common Mistakes in Adiabatic Logic Design and How to Avoid Them.Proceedings of the Workshop on Methodologies in Low-Power Design, Las Vegas, pages216–222, 2003.

[13] D.D. Gajski and R. Kuhn. Guest Editors’ Introduction: New VLSI Tools. Computer,16(12):11–14, December 1983.

http://www.octave.org/

http://www.europractice-ic.com/

74 A. Bibliography

[14] Benedikt Gierlichs. DPA-Resistance Without Routing Constraints? InCryptographicHardware and Embedded Systems - CHES, volume 4727 ofLNCS, pages 107–120.Springer, 2007.

[15] Benedikt Gierlichs, Lejla Batina, Pim Tuyls, and Bart Preneel. Mutual information anal-ysis - a generic side-channel distinguisher. In Elisabeth Oswald and Pankaj Rohatgi, ed-itors,Cryptographic Hardware and Embedded Systems - CHES, Lecture Notes in Com-puter Science, page 17, Washington DC,US, 2008. Springer-Verlag.

[16] T. Good and M. Benaissa. Hardware Results for selected Stream Cipher Candidates.State of the Art of Stream Ciphers 2007 (SASC 2007), Workshop Record, February2007.

[17] Sylvain Guilley, Philippe Hoogvorst, Yves Mathieu, and Renaud Pacalet. The "BackendDuplication" Method. InCryptographic Hardware and Embedded Systems - CHES,volume 3659 ofLNCS, pages 383–397. Springer, 2005.

[18] Panu Hämäläinen, Timo Alho, Marko Hännikäinen, and Timo D. Hämäläinen. Designand implementation of low-area and low-power aes encryption hardware core. InDSD,pages 577–583, 2006.

[19] Deukjo Hong, Jaechul Sung, Seokhie Hong, Jongin Lim, Sangjin Lee, Bon-Seok Koo,Changhoon Lee, Donghoon Chang, Jesang Lee, Kitae Jeong, Hyun Kim, Jongsung Kim,and Seongtaek Chee. HIGHT: A New Block Cipher Suitable for Low-Resource Device,2006.

[20] Mehrdad Khatir and Amir Moradi. Secure adiabatic logic: a low-energy dpa-resistantlogic style. 2008.http://eprint.iacr.org/.

[21] Suhwan Kim and M.C. Papaefthymiou. True single-phase adiabatic circuitry. IEEETransactions on Very Large Scale Integration (VLSI) Systems, 9(1):52–63, Feb. 2001.

[22] Suhwan Kim, C.H. Ziesler, and M.C. Papaefthymiou. Charge-recovery computing onsilicon. IEEE Transactions on Computers, 54(6):651–659, Jun 2005.

[23] Paul C. Kocher, Joshua Jaffe, and Benjamin Jun. Differential Power Analysis. InCRYPTO 1999, volume 1666 ofLNCS, pages 388–397. Springer, 1999.

[24] Konrad J. Kulikowski, Mark G. Karpovsky, and AlexanderTaubin. Power Attacks onSecure Hardware Based on Early Propagation of Data. InIEEE International Symposiumon On-Line Testing - IOLTS 2006, pages 131–138. IEEE Computer Society, 2006.

[25] G. Leander and A. Poschmann. On the classification of 4 bit s-boxes. InWAIFI ’07:Proceedings of the 1st international workshop on Arithmeticof Finite Fields, pages 159–176, Berlin, Heidelberg, 2007. Springer-Verlag.

[26] Changning Luo and Jianping Hu. Single-phase adiabatic flip-flops and sequential circuitsusing improved cal circuits. InProc. 7th International Conference on ASIC ASICON ’07,pages 126–129, 22–25 Oct. 2007.

http://eprint.iacr.org/

A. Bibliography 75

[27] François Macé, François-Xavier Standaert, and Jean-Jacques Quisquater. Informationtheoretic evaluation of side-channel resistant logic styles. InCryptographic Hardwareand Embedded Systems - CHES, pages 427–442, Berlin, Heidelberg, 2007. Springer-Verlag.

[28] François Mace, François-Xavier Standaert, and Jean-Jacques Quisquater. ASIC Imple-mentations of the Block Cipher SEA for Constrained Applications. In Proceedings ofthe Third International Conference on RFID Security - RFIDSec 2007, pages 103 – 114,Malaga, Spain, 2007.

[29] H. Mahmoodi-Meimand, A. Afzali-Kusha, and M. Nourani.Efficiency of adiabatic logicfor low-power, low-noise VLSI.Circuits and Systems, 2000. Proceedings of the 43rdIEEE Midwest Symposium on, 1, 2000.

[30] D. Maksimovic, V.G. Oklobdzija, B. Nikolic, and K.W. Current. Clocked cmos adiabaticlogic with integrated single-phase power-clock supply.IEEE Transactions on Very LargeScale Integration (VLSI) Systems, 8(4):460–463, Aug. 2000.

[31] S. Mangard, E. Oswald, and T. Popp.Power Analysis Attacks: Revealing the Secrets ofSmart Cards. Springer Verlag, 2007.

[32] Stefan Mangard, Thomas Popp, and Berndt M. Gammel. Side-Channel Leakage ofMasked CMOS Gates. InCT-RSA 2005, volume 3376 ofLNCS, pages 351–365.Springer, 2005.

[33] Stefan Mangard, Norbert Pramstaller, and Elisabeth Oswald. Successfully AttackingMasked AES Hardware Implementations. InCryptographic Hardware and EmbeddedSystems - CHES, volume 3659 ofLNCS, pages 157–171. Springer, 2005.

[34] Mentor Graphics Corporation. http://www.mentor.com.

[35] Amir Moradi, Thomas Eisenbarth, Axel Poschmann, Carsten Rolfes, Christof Paar, Mo-hammad T. Manzuri Shalmani, and Mahmoud Salmasizadeh. Information Leakage ofFlip-Flops in DPA-Resistant Logic Styles. Cryptology ePrintArchive, Report 2008/188,2008.http://eprint.iacr.org/2008/188.

[36] Amir Moradi, Mehrdad Khatir, Mahmoud Salmasizadeh, and Mohammad T. ManzuriShalmani. Investigating the dpa-resistance property of charge recovery logics.Cryptol-ogy ePrint Archive, Report 2008/192, 2008.http://eprint.iacr.org/2008/192.

[37] Amir Moradi, Mahmoud Salmasizadeh, and Mohammad T. Manzuri Shalmani. PowerAnalysis Attacks on MDPL and DRSL Implementations. InInformation Security andCryptology - ICISC 2007, volume 4817 ofLNCS, pages 259–272. Springer, 2007.

[38] National Institute of Standards and Technology. Advanced Encryption Standard.FIPS,197, November 2001.http://www.itl.nist.gov/fipspubs/.

[39] National Security Agency (NSA). TEMPEST: A Signal Problem.Cryptologic Spectrum,Vol. 2(No. 3), 1972 (declassified 2007).

http://eprint.iacr.org/2008/188


http://www.itl.nist.gov/fipspubs/

76 A. Bibliography

[40] V.G. Oklobdzija, D. Maksimovic, and Fengcheng Lin. Pass-transistor adiabatic logicusing single power-clock supply.IEEE Transactions on Circuits and Systems II: Analogand Digital Signal Processing, 44(10):842–846, Oct. 1997.

[41] Thomas Popp, Mario Kirschbaum, Thomas Zefferer, and Stefan Mangard. Evaluation ofthe Masked Logic Style MDPL on a Prototype Chip. InCryptographic Hardware andEmbedded Systems - CHES, pages 81–94, 2007.

[42] Thomas Popp and Stefan Mangard. Masked Dual-Rail Pre-charge Logic: DPA-Resistance Without Routing Constraints. InCryptographic Hardware and EmbeddedSystems - CHES, pages 172–186, 2005.

[43] Norbert Pramstaller, Stefan Mangard, Sandra Dominikus, and Johannes Wolkerstorfer.Efficient aes implementations on asics and fpgas. InAES Conference, pages 98–112,2004.

[44] Jan M. Rabaey.Low Power Design Methodologies. Kluwer, 1996.

[45] Jan M. Rabaey.Digital Integrated Circuits. Prentice Hall, second edition, 2003.

[46] Francesco Regazzoni, Stephane Badel, Thomas Eisenbarth, Johann Großschädl, AxelPoschmann, Zeynep Toprak, Marco Macchetti, Laura Pozzi, Christof Paar, YusufLeblebici, and Paolo Ienne. Simulation-based Methodologyfor Evaluating DPA-Resistance of Cryptographic Functional Units with Application to CMOS and MCMLTechnologies. InProceedings of International Conference on Embedded Computer Sys-tems: Architectures, Modeling, and Simulation (SAMOS IC 07), July 2007.

[47] Reliable Computing Laboratory at Boston University. Sidechannel attacks database.http://www.sidechannelattacks.com.

[48] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand. Leakage current mechanismsand leakage reduction techniques in deep-submicrometer CMOS circuits. Proceedingsof the IEEE, 91(2):305–327, Feb. 2003.

[49] Akashi Satoh, Sumio Morioka, Kohji Takano, and Seiji Munetoh. A compact rijndaelhardware architecture with s-box optimization. InASIACRYPT, pages 239–254, 2001.

[50] Patrick Schaumont and Kris Tiri. Masking and Dual-Rail Logic Don’t Add Up. InCryptographic Hardware and Embedded Systems - CHES, volume 4727 ofLNCS, pages95–106. Springer, 2007.

[51] F.-X. Standaert, G. Piret, N. Gershenfeld, and J.-J. Quisquater. Sea: A scalable encryp-tion algorithm for small embedded applications. In J. Domingo-Ferrer, J. Posegga, andD. Schreckling, editors,Smart Card Research and Applications, Proceedings of CARDIS2006, volume 3928 ofLNCS, pages 222–236. Springer-Verlag, 2006.

[52] Francois-Xavier Standaert, Tal G. Malkin, and Moti Yung. A unified framework forthe analysis of side-channel key recovery attacks. Cryptology ePrint Archive, Report2006/139, 2006.http://eprint.iacr.org/.

http://www.sidechannelattacks.com

http://eprint.iacr.org/

A. Bibliography 77

[53] François-Xavier Standaert, Benedikt Gierlichs, and Ingrid Verbauwhede. Partition vs.comparison side-channel distinguishers. InInformation Security and Cryptology - ICISC2008: 11th International Conference, Lecture Notes in Computer Science, page 16,Seoul,KR, 2008. Springer-Verlag.

[54] Daisuke Suzuki and Minoru Saeki. Security Evaluation of DPA Countermeasures UsingDual-Rail Pre-charge Logic Style. InCryptographic Hardware and Embedded Systems- CHES, volume 4249 ofLNCS, pages 255–269. Springer, 2006.

[55] Daisuke Suzuki, Minoru Saeki, and Tetsuya Ichikawa. DPA Leakage Models for CMOSLogic Circuits. InCryptographic Hardware and Embedded Systems - CHES, volume3659 ofLNCS, pages 366–382. Springer, 2005.

[56] Daisuke Suzuki, Minoru Saeki, and Tetsuya Ichikawa. Random Switching Logic: A NewCountermeasure against DPA and Second-Order DPA at the LogicLevel. IEICE Trans.Fundam. Electron. Commun. Comput. Sci., E90-A(1):160–168, 2007. Also available athttp://eprint.iacr.org/2004/346.

[57] Synopsys.Design Compiler Reference Manual: Optimization and Timing Analysis Ver-sion Y-2006.06, June 2006.

[58] Synopsys.Power Compiler User Guide Version Y-2006.06, June 2006.

[59] Synopsys Inc. http://www.synopsys.com/.

[60] Russell Lang Dave Kotz John Campbell Gershon Elber Alexander WooThomas Williams, Colin Kelley. gnuplot 4.0.http://www.gnuplot.info/.

[61] K. Tiri and I. Verbauwhede. Charge recycling sense amplifier based logic: securing lowpower security ics against dpa [differential power analysis]. In Proc. Proceeding of the30th European Solid-State Circuits Conference ESSCIRC 2004, pages 179–182, 2004.

[62] Kris Tiri, Moonmoon Akmal, and Ingrid Verbauwhede. A Dynamic and DifferentialCMOS Logic with Signal Independent Power Consumption to Withstand DifferentialPower Analysis on Smart Cards. InEuropean Solid-State Circuits Conference - ESS-CIRC 2002, pages 403–406, 2002.

[63] Kris Tiri, David Hwang, Alireza Hodjat, Bo-Cheng Lai, Shenglin Yang, Patrick Schau-mont, and Ingrid Verbauwhede. Prototype IC with WDDL and Differential Routing- DPA Resistance Assessment. InCryptographic Hardware and Embedded Systems -CHES, volume 3659 ofLNCS, pages 354–365. Springer, 2005.

[64] Kris Tiri and Ingrid Verbauwhede. A Logic Level Design Methodology for a Secure DPAResistant ASIC or FPGA Implementation. InDesign, Automation and Test in EuropeConeference - DATE 2004, pages 246–251, 2004.

[65] Kris Tiri and Ingrid Verbauwhede. Place and Route for Secure Standard Cell Design. InConference on Smart Card Research and Advanced Applications -CARDIS 2004, pages143–158. Kluwer, 2004.


http://www.gnuplot.info/

78 A. Bibliography

[66] Alain Vachoux.Top-down digital design flow Version 3.1. Microelectronic Systems LabEcole Polytechnique Federale de Lausanne, 2006.

[67] Jason Waddle and David Wagner. Towards efficient second-order power analysis. InCryptographic Hardware and Embedded Systems - CHES, volume 3156 ofLecture Notesin Computer Science, pages 1–15. Springer, 2004.

[68] C.H. Ziesler, Joohee Kim, M.C. Papaefthymiou, and SuhwanKim. Energy recoverydesign for low-power asics. InProc. IEEE International [Systems-on-Chip] SOC Con-ference, pages 424–427, 17–20 Sept. 2003.

B. Detailed Synthesis Results

The following abbreviations are used in the subsequent tables:
Cur - Current
Tput/Area - Throughput per area
Path - Critical path delay
mFreq - Maximum frequency
mTput - Maximum throughput

Library | Area [GE] | Area [µm²] | Power [µW] | Cur [µA] | Tput/Area [kbps/µm²] | Path [ns] | mFreq [GHz] | mTput [Mbps]
AMI 0.35 µm | 1,524.77 | 82,338 | 33.40 | 10.12 | 0.0024 | 1.53 | 0.65 | 1,307.2
IHP 0.25 µm | 1,594.25 | 44,996 | 4.84 | 1.94 | 0.0044 | 0.72 | 1.39 | 2,777.8
UMC 0.18 µm | 1,650.30 | 15,970 | 3.86 | 2.14 | 0.0125 | 4.57 | 0.22 | 437.6
Better is | lower | lower | lower | lower | higher | lower | higher | higher

Table B.1.: Implementation results of round @ 100 kHz

Library | Area [GE] | Area [µm²] | Power [µW] | Cur [µA] | Tput/Area [kbps/µm²] | Path [ns] | mFreq [GHz] | mTput [Mbps]
AMI 0.35 µm | 1,560.5 | 84,268 | 3520.0 | 1066.7 | 0.2450 | 1.23 | 0.81 | 1,678.5
IHP 0.25 µm | 1,594.2 | 44,996 | 436.0 | 174.4 | 0.4588 | 0.61 | 1.64 | 3,384.5
UMC 0.18 µm | 1,706.0 | 16,509 | 77.1 | 42.8 | 1.2506 | 0.51 | 1.96 | 4,048.1
Better is | lower | lower | lower | lower | higher | lower | higher | higher

Table B.2.: Implementation results of round @ 10 MHz

Library | Area [GE] | Area [µm²] | Power [µW] | Cur [µA] | Tput/Area [kbps/µm²] | Path [ns] | mFreq [GHz] | mTput [Mbps]
AMI 0.35 µm | 24,247 | 1,309,354 | 772.0 | 233.9 | 0.0049 | 13.84 | 0.07 | 4,624.3
IHP 0.25 µm | 25,193 | 711,047 | 121.0 | 48.4 | 0.0090 | 4.98 | 0.20 | 12,851.4
UMC 0.18 µm | 27,009 | 261,366 | 72.2 | 40.1 | 0.0245 | 6.78 | 0.15 | 9,439.5
Better is | lower | lower | lower | lower | higher | lower | higher | higher

Table B.3.: Implementation results of pipeline @ 100 kHz

Library | Area [GE] | Area [µm²] | Power [µW] | Cur [µA] | Tput/Area [kbps/µm²] | Path [ns] | mFreq [GHz] | mTput [Mbps]
AMI 0.35 µm | 24,346 | 1,314,677 | 81295.0 | 24634.8 | 0.4868 | 12.8 | 0.08 | 5,000
IHP 0.25 µm | 25,193 | 711,047 | 11659.0 | 4663.6 | 0.9001 | 4.78 | 0.21 | 13,389
UMC 0.18 µm | 27,028 | 261,547 | 6888.0 | 3826.7 | 2.4470 | 6.26 | 0.16 | 10,224
Better is | lower | lower | lower | lower | higher | lower | higher | higher

Table B.4.: Implementation results of pipeline @ 10 MHz

Library | Area [GE] | Area [µm²] | Power [µW] | Cur [µA] | Tput/Area [kbps/µm²] | Path [ns] | mFreq [GHz] | mTput [Mbps]
AMI 0.35 µm | 999.5 | 53,974 | 11.20 | 3.39 | 0.0002 | 1.89 | 0.5 | 60.1
IHP 0.25 µm | 1,168.8 | 32,987 | 4.24 | 1.70 | 0.0003 | 0.66 | 1.5 | 172.2

Table B.5.: Implementation results of serial @ 100 kHz

Library | Area [GE] | Area [µm²] | Power [µW] | Cur [µA] | Tput/Area [kbps/µm²] | Path [ns] | mFreq [GHz] | mTput [Mbps]
AMI 0.35 µm | 1,001.19 | 54,064 | 1123.00 | 340.30 | 0.0210 | 1.44 | 0.69 | 78.9
IHP 0.25 µm | 1,168.75 | 32,987 | 421.00 | 168.40 | 0.0345 | 0.62 | 1.61 | 183.3

Table B.6.: Implementation results of serial @ 10 MHz
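The derived columns of Tables B.1 to B.6 follow from the clock frequency, the critical path and the number of clock cycles needed per 64-bit block. The Python sketch below recomputes them under an assumption that is not stated in the tables: the round-based architecture needs 32 cycles per block, the serialized one 563 cycles, and the pipelined one delivers one block per cycle. With these values it matches, for example, the AMI rows of Tables B.1, B.4 and B.5. `derived_columns` and `CYCLES_PER_BLOCK` are hypothetical names.

```python
BLOCK_BITS = 64  # PRESENT block size in bits

# Assumed clock cycles per encrypted block (not listed in the tables).
CYCLES_PER_BLOCK = {"round": 32, "serial": 563, "pipeline": 1}


def derived_columns(arch, freq_hz, area_um2, path_ns):
    """Return (Tput/Area [kbps/um^2], mFreq [GHz], mTput [Mbps])."""
    cycles = CYCLES_PER_BLOCK[arch]
    tput_kbps = freq_hz * BLOCK_BITS / cycles / 1e3
    m_freq_ghz = 1.0 / path_ns  # critical path in ns -> frequency in GHz
    m_tput_mbps = m_freq_ghz * 1e3 * BLOCK_BITS / cycles  # GHz -> MHz -> Mbps
    return tput_kbps / area_um2, m_freq_ghz, m_tput_mbps


# Round-based, AMI 0.35 um, 100 kHz (first row of Table B.1).
print([round(v, 4) for v in derived_columns("round", 100e3, 82_338, 1.53)])
```

Applied to the round-based rows of Table B.2, the same 32-cycle assumption gives Tput/Area and mTput values about 3% below the listed ones, so those entries may rest on a slightly different cycle count.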

C. Detailed Adiabatic Logic Results

C.1. Power traces NAND

Simulated power traces of NAND gates implemented in CMOS and in different adiabatic logic styles. The blue shadow marks the values that are used to calculate the mutual information.
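How a mutual-information value in bits can be obtained from such traces is sketched below. This is an illustration under stated assumptions, not the estimator used in this work (which is not restated here): it uses a simple equal-width histogram, encodes the two gate inputs as one discrete variable with four values, and takes one scalar sample per evaluation, for instance the current integrated over the marked window. `mutual_information` is a hypothetical helper.

```python
import math
from collections import Counter


def mutual_information(inputs, samples, n_bins=8):
    """Equal-width histogram estimate of I(X; L) in bits.

    inputs  -- discrete input values, e.g. 2*IN1 + IN2 for a 2-input gate
    samples -- one scalar leakage value per evaluation (same length)
    """
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins or 1.0  # constant samples: put all into bin 0
    bins = [min(int((s - lo) / width), n_bins - 1) for s in samples]
    n = len(inputs)
    p_x = Counter(inputs)
    p_l = Counter(bins)
    p_xl = Counter(zip(inputs, bins))
    mi = 0.0
    for (x, b), count in p_xl.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((p_x[x] / n) * (p_l[b] / n)))
    return mi
```

With four equiprobable input combinations the estimate is bounded by 2 bits: leakage that distinguishes all four combinations gives 2 bits, leakage that depends only on a single output bit gives at most 1 bit, and data-independent leakage gives 0.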

[Plots: current [A], output [V], inputs [V] and power clock PC [V] over simulation samples]
(a) DUT: CMOS-NAND2, clock period 4 ns
(b) DUT: CMOS-NAND2, clock period 400 ns

Figure C.1.: Power traces of CMOS NAND gates with different PC periods

[Plots: current [A], output [V], inputs [V] and power clock PC [V] over simulation samples]
(a) DUT: CAL-NAND2, clock period 4 ns
(b) DUT: CAL-NAND2, clock period 400 ns

Figure C.2.: Power traces of CAL NAND gates with different PC periods

[Plots: current [A], output [V], inputs [V] and power clock PC [V] over simulation samples]
(a) DUT: iCAL-NAND2, clock period 4 ns
(b) DUT: iCAL-NAND2, clock period 400 ns

Figure C.3.: Power traces of iCAL NAND gates with different PC periods

[Plots: current [A], output [V], inputs [V] and power clock PC [V] over simulation samples]
(a) DUT: PAL-NAND2, clock period 4 ns
(b) DUT: PAL-NAND2, clock period 400 ns

Figure C.4.: Power traces of PAL NAND gates with different PC periods

[Plots: current [A], output [V], inputs [V] and power clock PC [V] over simulation samples]
(a) DUT: CRSABL-NAND2, clock period 4 ns
(b) DUT: CRSABL-NAND2, clock period 400 ns

Figure C.5.: Power traces of CRSABL NAND gates with different PC periods

C.2. Logic Gates

Schematics of logic gates implemented in CMOS and in the adiabatic logic styles CAL, PAL and CRSABL.

[Schematic with nodes IN1, IN1b, IN2, IN2b, vdd, gnd and OUT]

Figure C.6.: CMOS 2-input XNOR gate


[Schematic with nodes CX, PC, gnd, IN, INb, OUT and OUTb]

Figure C.7.: CAL inverter

[Schematic with nodes CX, PC, gnd, IN1, IN1b, IN2, IN2b, OUT and OUTb]

Figure C.8.: CAL 2-input XNOR gate

[Schematic with nodes IN, INb, PC, OUT and OUTb]

Figure C.9.: PAL inverter

[Schematic with nodes IN1b, IN2b, PC, OUT and OUTb]

Figure C.10.: PAL 2-input XNOR gate


[Schematic with nodes IN1, IN1b, V, clk, VDD, gnd, inv1, inv1b, OUT and OUTb]

Figure C.11.: CRSABL inverter

[Schematic with nodes IN1b, IN2, IN2b, V, clk, VDD, gnd, xor1, xnor1, OUT and OUTb]

Figure C.12.: CRSABL 2-input XNOR gate


[Plot: mutual information [bit] (0 to 2) over a logarithmic x-axis from 10^-10 to 10^-3; DUT: XNOR2, clock period 4 ns]

Figure C.13.: Adiabatic XNOR gates at 4 ns clock period

[Two further plots of mutual information [bit] over the same logarithmic x-axis from 10^-10 to 10^-3; their titles and captions are not recoverable]

D. Detailed Side-Channel Analysis Results

Listing D.1: CMOS Verilog PRESENT S-box description

module sbox_4bit (sbox_in, sbox_out);
  input  [3:0] sbox_in;
  output [3:0] sbox_out;
  wire n1, n2, n3, n4, n5, n6, n7, n8, n9, n10, n11, n12, n13, n14, n15, n16,
       n17, n18;

  ond3d1 U1  (.A1(n1), .A2(n2), .B(n3), .C(n4), .ZN(sbox_out[3]));
  mx22d1 U2  (.I0(n5), .I1(n6), .S(n7), .ZN(n4));
  nr21d1 U3  (.A1(sbox_in[0]), .A2(sbox_in[1]), .ZN(n7));
  nr21d1 U4  (.A1(sbox_in[2]), .A2(n6), .ZN(n5));
  ond1d1 U5  (.A1(n2), .A2(n3), .B(n8), .ZN(sbox_out[2]));
  anr1d1 U6  (.A1(n9), .A2(n10), .B(n11), .ZN(n8));
  anr1d1 U7  (.A1(sbox_in[3]), .A2(n10), .B(n12), .ZN(n11));
  in01d1 U8  (.I(n13), .ZN(n12));
  mx22d1 U9  (.I0(n2), .I1(n6), .S(sbox_in[2]), .ZN(n9));
  nd31d1 U10 (.A1(sbox_in[1]), .A2(n6), .A3(sbox_in[2]), .ZN(n3));
  mx22d1 U11 (.I0(n14), .I1(n15), .S(sbox_in[3]), .ZN(sbox_out[1]));
  anr1d1 U12 (.A1(sbox_in[0]), .A2(n1), .B(n13), .ZN(n15));
  nr21d1 U13 (.A1(sbox_in[0]), .A2(sbox_in[2]), .ZN(n13));
  nd21d1 U14 (.A1(sbox_in[1]), .A2(n16), .ZN(n1));
  ond1d1 U15 (.A1(n16), .A2(n2), .B(sbox_in[1]), .ZN(n14));
  in01d1 U16 (.I(sbox_in[0]), .ZN(n2));
  in01d1 U17 (.I(sbox_in[2]), .ZN(n16));
  xn21d1 U18 (.A1(n17), .A2(n18), .ZN(sbox_out[0]));
  xn21d1 U19 (.A1(n6), .A2(sbox_in[0]), .ZN(n18));
  in01d1 U20 (.I(sbox_in[3]), .ZN(n6));
  nd21d1 U21 (.A1(sbox_in[2]), .A2(n10), .ZN(n17));
  in01d1 U22 (.I(sbox_in[1]), .ZN(n10));

endmodule
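As a plausibility check of Listing D.1, the path to `sbox_out[0]` uses only inverters, one NAND and two XNOR cells (U18 to U22). The Python sketch below models these five cells and compares the result with bit 0 of the PRESENT S-box table. It assumes, from the cell names, that `in01d1` is an inverter, `nd21d1` a 2-input NAND and `xn21d1` a 2-input XNOR, and that `sbox_in[0]` is the least significant bit. The other output bits use library cells such as `ond3d1`, `anr1d1` and `mx22d1` whose exact functions are not given here, so they are not modeled.

```python
# PRESENT S-box, 4-bit input -> 4-bit output.
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def inv(a):       # assumed function of in01d1
    return a ^ 1


def nand2(a, b):  # assumed function of nd21d1
    return inv(a & b)


def xnor2(a, b):  # assumed function of xn21d1
    return inv(a ^ b)


def sbox_out0(x):
    """Bit 0 of the S-box as wired by cells U18 to U22 of Listing D.1."""
    x0, x1, x2, x3 = [(x >> i) & 1 for i in range(4)]
    n6 = inv(x3)            # U20
    n10 = inv(x1)           # U22
    n17 = nand2(x2, n10)    # U21
    n18 = xnor2(n6, x0)     # U19
    return xnor2(n17, n18)  # U18 -> sbox_out[0]


print(all(sbox_out0(x) == PRESENT_SBOX[x] & 1 for x in range(16)))
```

The gate chain simplifies to sbox_out[0] = x0 XOR x3 XOR (x2 AND NOT x1), which agrees with bit 0 of the S-box for all 16 inputs.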

Listing D.2: iMDPL SPICE PRESENT S-box description

*********************** Manually Generated PRESENT SBOX ************************
.SUBCKT PRESENT_sbox_4bit i0 i1 i2 i3 m1 m2 m3 m4 m5 m6
+ i0not i1not i2not i3not m1not m2not m3not m4not m5not m6not
+ o0 o1 o2 o3 o0not o1not o2not o3not gnd vdd
X_s0_0 i0not i2 m1 i0 i2not m1not s0_0 s0_0not gnd vdd _ANDNANDiMDPL_
X_s0 s0_0 i3 m1 s0_0not i3not m1not s0 s0not gnd vdd _ANDNANDiMDPL_
X_s14 s0_0 i1not m1 s0_0not i1 m1not s14 s14not gnd vdd _ANDNANDiMDPL_
X_s1_0 i2 i3not m1 i2not i3 m1not s1_0 s1_0not gnd vdd _ANDNANDiMDPL_
X_s1 s1_0 i1not m1 s1_0not i1 m1not s1 s1not gnd vdd _ANDNANDiMDPL_
X_s13 s1_0 i0not m1 s1_0not i0 m1not s13 s13not gnd vdd _ANDNANDiMDPL_
X_s2_0 i0not i2not m1 i0 i2 m1not s2_0 s2_0not gnd vdd _ANDNANDiMDPL_
X_s2 s2_0 i3not m1 s2_0not i3 m1not s2 s2not gnd vdd _ANDNANDiMDPL_
X_s3 s2_0 i1not m2 s2_0not i1 m2not s3 s3not gnd vdd _ANDNANDiMDPL_
X_s4_0 i0 i2 m2 i0not i2not m2not s4_0 s4_0not gnd vdd _ANDNANDiMDPL_
X_s4 s4_0 i3not m2 s4_0not i3 m2not s4 s4not gnd vdd _ANDNANDiMDPL_
X_s10 s4_0 i1not m2 s4_0not i1 m2not s10 s10not gnd vdd _ANDNANDiMDPL_
X_s5_0 i0 i1 m2 i0not i1not m2not s5_0 s5_0not gnd vdd _ANDNANDiMDPL_
X_s5 s5_0 i2not m2 s5_0not i2 m2not s5 s5not gnd vdd _ANDNANDiMDPL_
X_s7 s5_0 i3 m2 s5_0not i3not m2not s7 s7not gnd vdd _ANDNANDiMDPL_
X_s6_0 i0 i2not m2 i0not i2 m2not s6_0 s6_0not gnd vdd _ANDNANDiMDPL_
X_s6 s6_0 i1not m3 s6_0not i1 m3not s6 s6not gnd vdd _ANDNANDiMDPL_
X_s12 s6_0 i3 m3 s6_0not i3not m3not s12 s12not gnd vdd _ANDNANDiMDPL_
X_s11 s0 i1 m3 s0not i1not m3not s11 s11not gnd vdd _ANDNANDiMDPL_
X_s9 s0_0 i1 m3 s0_0not i1not m3not s9 s9not gnd vdd _ANDNANDiMDPL_
X_s8_0 i0 i1not m3 i0not i1 m3not s8_0 s8_0not gnd vdd _ANDNANDiMDPL_
X_s8 s8_0 i3 m3 s8_0not i3not m3not s8 s8not gnd vdd _ANDNANDiMDPL_
X_s15_0 i0not i1 m3 i0 i1not m3not s15_0 s15_0not gnd vdd _ANDNANDiMDPL_
X_s15_1 s8_0 s15_0 m3 s8_0not s15_0not m3not s15_1 s15_1not gnd vdd _ORNORiMDPL_
X_s15_2 s15_1 i3not m4 s15_1not i3 m4not s15_2 s15_2not gnd vdd _ANDNANDiMDPL_
X_s15_3 s15_1not i3 m4 s15_1 i3not m4not s15_3 s15_3not gnd vdd _ANDNANDiMDPL_
X_s15_4 s15_2 s15_3 m4 s15_2not s15_3not m4not s15_4 s15_4not gnd vdd _ORNORiMDPL_
X_s15 s15_4 i2not m4 s15_4not i2 m4not s15 s15not gnd vdd _ANDNANDiMDPL_
X_t1 s0 s2 m4 s0not s2not m4not t1 t1not gnd vdd _ORNORiMDPL_
X_t2 s8 s9 m4 s8not s9not m4not t2 t2not gnd vdd _ORNORiMDPL_
X_t3 t1 t2 m4 t1not t2not m4not t3 t3not gnd vdd _ORNORiMDPL_
X_o0 t3 s10 m4 t3not s10not m4not o0 o0not gnd vdd _ORNORiMDPL_
...
X_t10 s0 s4 m6 s0not s4not m6not t10 t10not gnd vdd _ORNORiMDPL_
X_o3 t10 s15 m6 t10not s15not m6not o3 o3not gnd vdd _ORNORiMDPL_
.ENDS PRESENT_sbox_4bit

Listing D.3: PAL Spectre PRESENT S-Box description

// Cell name: PAL_SBox
// View name: schematic
subckt _PAL_SBox PC1 PC2 gnd i0 i0not i1 i1not i2 i2not i3 i3not o0 o0not \
    o1 o1not o2 o2not o3 o3not
I96 (o2_buf o2not_buf o1 o1not PC1 gnd) _INVERTER_schematic
I98 (x_o2 x_o2not o2_buf o2not_buf PC2 gnd) _INVERTER_schematic
I82 (i3 i3not i0_buf i0not_buf PC1 gnd) _BUFFER
I84 (i0 i0not i3_buf i3not_buf PC1 gnd) _BUFFER
I92 (x_o1 x_o1not o2 o2not PC1 gnd) _BUFFER
I86 (i2_buf i2not_buf i2_buf2 i2not_buf2 PC2 gnd) _BUFFER
I95 (x_o3 x_o3not o0 o0not PC1 gnd) _BUFFER
I85 (i1_buf i1not_buf i1_buf2 i1not_buf2 PC2 gnd) _BUFFER
I105 (s14_buf s14not_buf s14_buf2 s14not_buf2 PC2 gnd) _BUFFER
I71 (t4 t4not t4_buf t4not_buf PC2 gnd) _BUFFER
I72 (s12_buf s12not_buf s12_buf2 s12not_buf2 PC2 gnd) _BUFFER
I73 (s10_buf s10not_buf s10_buf2 s10not_buf2 PC2 gnd) _BUFFER
I79 (s5 s5not s5_buf s5not_buf PC1 gnd) _BUFFER
I78 (s10 s10not s10_buf s10not_buf PC1 gnd) _BUFFER
I81 (i2 i2not i1_buf i1not_buf PC1 gnd) _BUFFER
I103 (o0_buf o0not_buf o3 o3not PC1 gnd) _BUFFER
I104 (x_o0 x_o0not o0_buf o0not_buf PC2 gnd) _BUFFER
I74 (i2_buf3 i2not_buf3 i2_buf4 i2not_buf4 PC2 gnd) _BUFFER
I77 (s12 s12not s12_buf s12not_buf PC1 gnd) _BUFFER
I75 (i2_buf2 i2not_buf2 i2_buf3 i2not_buf3 PC1 gnd) _BUFFER
I106 (s14 s14_not s14_buf s14not_buf PC1 gnd) _BUFFER
I70 (t10 t10not t10_buf t10not_buf PC2 gnd) _BUFFER
I69 (s12_buf2 s12not_buf2 s12_buf3 s12not_buf3 PC1 gnd) _BUFFER
I83 (i1 i1not i2_buf i2not_buf PC1 gnd) _BUFFER
I68 (t10_buf t10not_buf t10_buf2 t10not_buf2 PC1 gnd) _BUFFER
I87 (i3_buf i3not_buf i3_buf2 i3not_buf2 PC2 gnd) _BUFFER
X_s8 (s8_0 s8_0not i3_buf i3not_buf s8 s8not PC2 gnd) _AND
X_s15_2 (s15_1 s15_1not i3not_buf2 i3_buf2 s15_2 s15_2not PC1 gnd) \
    _AND
X_s12 (s6_0 s6_0not i3_buf i3not_buf s12 s12not PC2 gnd) _AND
X_s7 (s5_0 s5_0not i3_buf i3not_buf s7 s7not PC2 gnd) _AND
X_s5 (s5_0 s5_0not i2not_buf i2_buf s5 s5not PC2 gnd) _AND
X_s10 (s4_0 s4_0not i1not_buf i1_buf s10 s10not PC2 gnd) _AND
X_s13 (s1_0 s1_0not i0not_buf i0_buf s13 s13not PC2 gnd) _AND
X_s3 (s2_0 s2_0not i1not_buf i1_buf s3 s3not PC2 gnd) _AND
X_s15_0 (i3not i3 i2 i2not s15_0 s15_0not PC1 gnd) _AND
X_s6 (s6_0 s6_0not i1not_buf i1_buf s6 s6not PC2 gnd) _AND
X_s1 (s1_0 s1_0not i1not_buf i1_buf s1 s1not PC2 gnd) _AND
X_s9 (s0_0 s0_0not i1_buf i1not_buf s9 s9not PC2 gnd) _AND
X_s11 (s0 s0not i1_buf2 i1not_buf2 s11 s11not PC1 gnd) _AND
X_s4 (s4_0 s4_0not i3not_buf i3_buf s4 s4not PC2 gnd) _AND
X_s0 (s0_0 s0_0not i3_buf i3not_buf s0 s0not PC2 gnd) _AND
X_s2_0 (i3not i3 i1not i1 s2_0 s2_0not PC1 gnd) _AND
X_s4_0 (i3 i3not i1 i1not s4_0 s4_0not PC1 gnd) _AND
X_s6_0 (i3 i3not i1not i1 s6_0 s6_0not PC1 gnd) _AND
X_s15_3 (s15_1not s15_1 i3_buf2 i3not_buf2 s15_3 s15_3not PC1 gnd) \
    _AND
X_s1_0 (i1 i1not i0not i0 s1_0 s1_0not PC1 gnd) _AND
X_s14 (s0_0 s0_0not i1not_buf i1_buf s14 s14_not PC2 gnd) _AND
X_s15 (s15_4 s15_4not i2not_buf4 i2_buf4 s15 s15not PC1 gnd) _AND
X_s2 (s2_0 s2_0not i3not_buf i3_buf s2 s2not PC2 gnd) _AND
X_s5_0 (i3 i3not i2 i2not s5_0 s5_0not PC1 gnd) _AND
X_s8_0 (i3 i3not i2not i2 s8_0 s8_0not PC1 gnd) _AND
X_s0_0 (i3not i3 i1 i1not s0_0 s0_0not PC1 gnd) _AND
X_o3 (t10_buf2 t10not_buf2 s15 s15not x_o3 x_o3not PC2 gnd) _OR
X_t10 (s0 s0not s4 s4not t10 t10not PC1 gnd) _OR
X_o0 (t3 t3not s10_buf2 s10not_buf2 x_o0 x_o0not PC1 gnd) _OR
X_t1 (s0 s0not s2 s2not t1 t1not PC1 gnd) _OR
X_t2 (s8 s8not s9 s9not t2 t2not PC1 gnd) _OR
X_t3 (t1 t1not t2 t2not t3 t3not PC2 gnd) _OR
X_t4 (s1 s1not s3 s3not t4 t4not PC1 gnd) _OR
X_t5 (s5_buf s5not_buf s11 s11not t5 t5not PC2 gnd) _OR
X_t6 (t4_buf t4not_buf t5 t5not t6 t6not PC1 gnd) _OR
X_t7 (s1 s1not s6 s6not t7 t7not PC1 gnd) _OR
X_t8 (s7 s7not s13 s13not t8 t8not PC1 gnd) _OR
X_t9 (t7 t7not t8 t8not t9 t9not PC2 gnd) _OR
X_s15_4 (s15_2 s15_2not s15_3 s15_3not s15_4 s15_4not PC2 gnd) _OR
X_o1 (t6 t6not s12_buf3 s12not_buf3 x_o1 x_o1not PC2 gnd) _OR
X_s15_1 (s8_0 s8_0not s15_0 s15_0not s15_1 s15_1not PC2 gnd) _OR
X_o2 (t9 t9not s14_buf2 s14not_buf2 x_o2 x_o2not PC1 gnd) _OR
ends _PAL_SBox
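To make the structure of the PAL netlist in Listing D.3 easier to follow, it can be read back as Boolean equations. The Python sketch below is a functional model of the evaluation phase only, and it rests on assumptions that the listing does not state. Each instance is taken to list its input rail pairs before its output pair, and the first net of every (x, xnot) pair is taken as the true rail. _INVERTER_schematic is assumed to swap the two rails. PC1 and PC2 are treated as timing signals only, and i0 is taken as the least significant bit. Under these assumptions the netlist forms a two-level AND plane (products s0 to s15), followed by an OR plane (t1 to t10, x_o0 to x_o3). The input buffers I81 to I84 and the output buffers cross the bit order. Evaluated for all 16 inputs, the model reproduces the PRESENT S-box table. The name pal_sbox is hypothetical.

```python
# Functional model of the evaluation phase of _PAL_SBox (Listing D.3).
# Assumptions (not stated in the listing): each instance lists its input
# rail pairs before its output pair, the first net of each pair is the true
# rail, _INVERTER_schematic swaps the rails, PC1/PC2 are ignored, and i0 is
# the least significant input/output bit. Only the true rails are modelled.

PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def pal_sbox(x: int) -> int:
    i0, i1, i2, i3 = (bool((x >> k) & 1) for k in range(4))

    # Input buffers I82, I84, I81, I83 cross the bit order; the _buf2.._buf4
    # chains (I85, I86, I87, I74, I75) only repeat a value.
    i0_buf, i1_buf, i2_buf, i3_buf = i3, i2, i1, i0

    # First AND level (X_s*_0), driven directly by the subcircuit inputs.
    s0_0 = not i3 and i1
    s1_0 = i1 and not i0
    s2_0 = not i3 and not i1
    s4_0 = i3 and i1
    s5_0 = i3 and i2
    s6_0 = i3 and not i1
    s8_0 = i3 and not i2
    s15_0 = not i3 and i2

    # Second AND level, driven by the buffered (bit-crossed) inputs.
    s0 = s0_0 and i3_buf
    s1 = s1_0 and not i1_buf
    s2 = s2_0 and not i3_buf
    s3 = s2_0 and not i1_buf
    s4 = s4_0 and not i3_buf
    s5 = s5_0 and not i2_buf
    s6 = s6_0 and not i1_buf
    s7 = s5_0 and i3_buf
    s8 = s8_0 and i3_buf
    s9 = s0_0 and i1_buf
    s10 = s4_0 and not i1_buf
    s11 = s0 and i1_buf
    s12 = s6_0 and i3_buf
    s13 = s1_0 and not i0_buf
    s14 = s0_0 and not i1_buf

    # s15 chain: X_s15_1 (OR), X_s15_2/X_s15_3 (AND), X_s15_4 (OR), X_s15 (AND).
    s15_1 = s8_0 or s15_0
    s15_2 = s15_1 and not i3_buf
    s15_3 = not s15_1 and i3_buf
    s15_4 = s15_2 or s15_3
    s15 = s15_4 and not i2_buf

    # OR plane (X_t1..X_t10, X_o0..X_o3); buffered copies equal their sources.
    x_o0 = ((s0 or s2) or (s8 or s9)) or s10
    x_o1 = ((s1 or s3) or (s5 or s11)) or s12
    x_o2 = ((s1 or s6) or (s7 or s13)) or s14
    x_o3 = (s0 or s4) or s15

    # Output stage: I95, I98+I96 (two inversions), I92, I104+I103 reverse the bits.
    o0, o1, o2, o3 = x_o3, x_o2, x_o1, x_o0
    return int(o0) | int(o1) << 1 | int(o2) << 2 | int(o3) << 3


if __name__ == "__main__":
    assert [pal_sbox(x) for x in range(16)] == PRESENT_SBOX
    print("PAL netlist model matches the PRESENT S-box")
```

Reading the netlist this way also shows why some products reuse others. s11 is taken from s0, and s15 is formed from an XOR-like chain (s15_1 to s15_4) instead of a single four-input product.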
