14
eQASM: An Executable antum Instruction Set Architecture X. Fu 1, 2, * , L. Riesebos 1, 2 , M. A. Rol 1,3 , J. van Straten 4 , J. van Someren 1,2 , N. Khammassi 1,2 , I. Ashraf 1,2 , R. F. L. Vermeulen 1,3 , V. Newsum 5,1 , K. K. L. Loh 5,1 , J. C. de Sterke 6,1 , W. J. Vlothuizen 5,1 , R. N. Schouten 1,3 , C. G. Almudever 1,2 , L. DiCarlo 1,3 , K. Bertels 1,21 Tech, Del University of Technology, P.O. Box 5046, 2600 GA Del, e Netherlands 2 antum Computer Architecture Lab, Del University of Technology, Mekelweg 4, 2628 CD Del, e Netherlands 3 Kavli Institute of Nanoscience, Del University of Technology, P.O. Box 5046, 2600 GA Del, e Netherlands 4 Computer Engineering Lab, Del University of Technology, Mekelweg 4, 2628 CD Del, e Netherlands 5 Netherlands Organisation for Applied Scientic Research (TNO), P.O. Box 155, 2600 AD Del, e Netherlands 6 Topic Embedded Systems B.V., P.O. Box 440, 5680 AK Best, e Netherlands {* x.fu-1, k.l.m.bertels}@tudel.nl ABSTRACT A widely-used quantum programming paradigm comprises of both the data ow and control ow. Existing quantum hardware cannot well support the control ow, signicantly limiting the range of quantum soware executable on the hardware. By analyzing the constraints in the control microarchitecture, we found that exist- ing quantum assembly languages are either too high-level or too restricted to support comprehensive ow control on the hardware. Also, as observed with the quantum microinstruction set MIS [1], the quantum instruction set architecture (QISA) design may suer from limited scalability and exibility because of microarchitectural constraints. It is an open challenge to design a scalable and exible QISA which provides a comprehensive abstraction of the quantum hardware. In this paper, we propose an executable QISA, called eQASM, that can be translated from quantum assembly language (QASM), supports comprehensive quantum program ow control, and is ex- ecuted on a quantum control microarchitecture. With ecient tim- ing specication, single-operation-multiple-qubit execution, and a very-long-instruction-word architecture, eQASM presents bet- ter scalability than MIS. e denition of eQASM focuses on the assembly level to be expressive. antum operations are con- gured at compile time instead of being dened at QISA design time. We instantiate eQASM into a 32-bit instruction set targeting a seven-qubit superconducting quantum processor. We validate our design by performing several experiments on a two-qubit quantum processor. 1 INTRODUCTION antum computing is promising with the potential to accelerate solving some problems which are ineciently solved by classical computers, such as quantum chemistry simulation [2, 3]. A near- term goal is to develop a fully programmable quantum computer based on the circuit model with Noisy Intermediate-Scale antum (NISQ) technology [4] without quantum error correction [5]. To this end, a viable way for both quantum soware and hardware is to support the widely-used “quantum data, classical control” programming paradigm [6]. In this paradigm, the data ow is the state evolution subject to a sequence of classical or quantum operations, and the control ow is the order of operations that are executed, which could be directed by qubit measurement results. It is embodied in a wide range of quantum applications, including active qubit reset [7], teleportation [8], non-deterministic quantum gate decomposition [9], iterative quantum phase estimation [10], etc. ough high-level quantum soware can support this paradigm well, current quantum hardware cannot because of limited exe- cutability of existing low-level quantum assembly languages. As- sembly languages are introduced as human-readable representa- tion of the interface to hardware (or machine code). But exist- ing quantum assembly languages either incorporate too high-level constructs to be directly implemented by a microarchitecture (in- cluding QASM-HL [11], il [12], f-QASM [13], etc.), or are too restricted to provide a comprehensive abstraction of the quantum hardware which can support the required ow control (including OpenQASM [14], MIS [1], etc.). is fact signicantly limits the range of quantum soware which can be executed by the hardware. Required is a scalable and exible interface which can be ex- ecuted by the hardware to support the “quantum data, classical control” programming paradigm. Considering the constraints in microarchitecture implementation, designing such an interface is challenged by the diculty of (1) providing a comprehensive ab- straction of the quantum hardware [15] and (2) making the quantum instruction set architecture (QISA) scalable and exible. 1.1 Comprehensive Abstraction Challenge Existing quantum soware, including quantum programming lan- guages [16], compilers [11, 17, 18], and quantum assembly lan- guages [1114], can well describe both the data ow and the con- trol ow. e basic constructs of control ow include procedure, selection, loop, and recursion [6, 16]. Among them, selection and loop may use feedback based on qubit measurement results to select which instructions to execute in the following. However, existing programmable quantum hardware mostly focuses on supporting the data ow. ey cannot well support the control ow because they lack programmable feedback with sucient exibility [1921] (though feedback has been demonstrated with customized hard- ware in multiple experiments [7, 22–24]). e diculty of supporting programmable feedback in the hard- ware roots in the strict requirements on the electrical signals (e.g., precise parameters and timing) used to control qubits. To satisfy these requirements, a three-step procedure is usually used: (i) den- ing waveforms in digital format that are long enough to include In Proceedings of the 25th International Symposium on High-Performance Computer Architecture (HPCA’19) arXiv:1808.02449v3 [cs.AR] 9 Mar 2019

eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

eQASM: An Executable�antum Instruction Set ArchitectureX. Fu1, 2, ∗, L. Riesebos1, 2, M. A. Rol1,3, J. van Straten4, J. van Someren1,2, N. Khammassi1,2,

I. Ashraf1,2, R. F. L. Vermeulen1,3, V. Newsum5,1, K. K. L. Loh5,1, J. C. de Sterke6,1,W. J. Vlothuizen5,1, R. N. Schouten1,3, C. G. Almudever1,2, L. DiCarlo1,3, K. Bertels1,2†

1 �Tech, Del� University of Technology, P.O. Box 5046, 2600 GA Del�, �e Netherlands2 �antum Computer Architecture Lab, Del� University of Technology, Mekelweg 4, 2628 CD Del�, �e Netherlands

3 Kavli Institute of Nanoscience, Del� University of Technology, P.O. Box 5046, 2600 GA Del�, �e Netherlands4 Computer Engineering Lab, Del� University of Technology, Mekelweg 4, 2628 CD Del�, �e Netherlands

5 Netherlands Organisation for Applied Scienti�c Research (TNO), P.O. Box 155, 2600 AD Del�, �e Netherlands6 Topic Embedded Systems B.V., P.O. Box 440, 5680 AK Best, �e Netherlands

{∗ x.fu-1, † k.l.m.bertels}@tudel�.nl

ABSTRACTA widely-used quantum programming paradigm comprises of boththe data �ow and control �ow. Existing quantum hardware cannotwell support the control �ow, signi�cantly limiting the range ofquantum so�ware executable on the hardware. By analyzing theconstraints in the control microarchitecture, we found that exist-ing quantum assembly languages are either too high-level or toorestricted to support comprehensive �ow control on the hardware.Also, as observed with the quantum microinstruction set �MIS [1],the quantum instruction set architecture (QISA) design may su�erfrom limited scalability and �exibility because of microarchitecturalconstraints. It is an open challenge to design a scalable and �exibleQISA which provides a comprehensive abstraction of the quantumhardware.

In this paper, we propose an executable QISA, called eQASM,that can be translated from quantum assembly language (QASM),supports comprehensive quantum program �ow control, and is ex-ecuted on a quantum control microarchitecture. With e�cient tim-ing speci�cation, single-operation-multiple-qubit execution, anda very-long-instruction-word architecture, eQASM presents bet-ter scalability than �MIS. �e de�nition of eQASM focuses onthe assembly level to be expressive. �antum operations are con-�gured at compile time instead of being de�ned at QISA designtime. We instantiate eQASM into a 32-bit instruction set targeting aseven-qubit superconducting quantum processor. We validate ourdesign by performing several experiments on a two-qubit quantumprocessor.

1 INTRODUCTION�antum computing is promising with the potential to acceleratesolving some problems which are ine�ciently solved by classicalcomputers, such as quantum chemistry simulation [2, 3]. A near-term goal is to develop a fully programmable quantum computerbased on the circuit model with Noisy Intermediate-Scale �antum(NISQ) technology [4] without quantum error correction [5]. Tothis end, a viable way for both quantum so�ware and hardwareis to support the widely-used “quantum data, classical control”programming paradigm [6]. In this paradigm, the data �ow isthe state evolution subject to a sequence of classical or quantumoperations, and the control �ow is the order of operations that areexecuted, which could be directed by qubit measurement results.

It is embodied in a wide range of quantum applications, includingactive qubit reset [7], teleportation [8], non-deterministic quantumgate decomposition [9], iterative quantum phase estimation [10],etc.

�ough high-level quantum so�ware can support this paradigmwell, current quantum hardware cannot because of limited exe-cutability of existing low-level quantum assembly languages. As-sembly languages are introduced as human-readable representa-tion of the interface to hardware (or machine code). But exist-ing quantum assembly languages either incorporate too high-levelconstructs to be directly implemented by a microarchitecture (in-cluding QASM-HL [11], �il [12], f-QASM [13], etc.), or are toorestricted to provide a comprehensive abstraction of the quantumhardware which can support the required �ow control (includingOpenQASM [14], �MIS [1], etc.). �is fact signi�cantly limits therange of quantum so�ware which can be executed by the hardware.

Required is a scalable and �exible interface which can be ex-ecuted by the hardware to support the “quantum data, classicalcontrol” programming paradigm. Considering the constraints inmicroarchitecture implementation, designing such an interface ischallenged by the di�culty of (1) providing a comprehensive ab-straction of the quantum hardware [15] and (2) making the quantuminstruction set architecture (QISA) scalable and �exible.

1.1 Comprehensive Abstraction ChallengeExisting quantum so�ware, including quantum programming lan-guages [16], compilers [11, 17, 18], and quantum assembly lan-guages [11–14], can well describe both the data �ow and the con-trol �ow. �e basic constructs of control �ow include procedure,selection, loop, and recursion [6, 16]. Among them, selection andloop may use feedback based on qubit measurement results to selectwhich instructions to execute in the following. However, existingprogrammable quantum hardware mostly focuses on supportingthe data �ow. �ey cannot well support the control �ow becausethey lack programmable feedback with su�cient �exibility [19–21](though feedback has been demonstrated with customized hard-ware in multiple experiments [7, 22–24]).

�e di�culty of supporting programmable feedback in the hard-ware roots in the strict requirements on the electrical signals (e.g.,precise parameters and timing) used to control qubits. To satisfythese requirements, a three-step procedure is usually used: (i) de�n-ing waveforms in digital format that are long enough to include

In Proceedings of the 25th International Symposium on High-Performance Computer Architecture (HPCA’19)

arX

iv:1

808.

0244

9v3

[cs

.AR

] 9

Mar

201

9

Page 2: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii)converting these digital waveforms into analog ones at runtime bythe AWGs. Because of the computational complexity and commu-nication latency, step (i) and (ii) are performed at static time. �efact that waveforms are determined at static time makes runtimefeedback almost impossible.

Using digital waveforms only as the interface to quantum hard-ware is cumbersome and presents poor scalability and �exibility. Toaddress this issue, our previous work [1] proposed the quantum con-trol microarchitecture �MA, which implements the instruction-based waveform generation as an alternative. In this method, a setof short pulses in place of long waveforms are uploaded to AWGs,with each pulse representing a quantum operation. By executinginstructions in the quantum microinstruction set �MIS, desiredpulses are selected and triggered at runtime, which in principleprovides the foundation for runtime feedback. However, the timingof executing instructions is decoupled from that of pulse genera-tion by a set of FIFOs [1]. As a consequence, a classical instructionthat uses a qubit measurement result may start execution beforethe expected result is ready, or read another result when there aremultiple instructions measuring the same qubit, which can lead toa wrong execution result. Hence, it is a challenge to design a mech-anism that can correctly implement runtime feedback to supportcomprehensive quantum program �ow control.

1.2 Scalability & Flexibility ChallengesA scalability issue in QISA design was �rst observed with �MIS,which was implemented by the control microarchitecture �MA.Because quantum operations of a quantum application are appliedon qubits with particular timing, quantum instructions should befetched from the instruction memory and processed in time toensure the described operations can be applied with the correcttiming. Since every instruction can only encode a limited number ofquantum operations, it would require a minimum number (Rreq) ofinstructions to be issued and processed per cycle for this application.However, the microarchitecture can only issue a limited number(Rallowed) of instructions per cycle given a limited instruction issuerate. As the number of qubits grows, Rreq in general increases.When Rreq > Rallowed, the microarchitecture cannot execute thequantum program correctly. We call this problem the quantumoperation issue rate problem [1, 25, 26]. �e low instruction infor-mation density of �MIS contributes to a large Rreq, exaggeratingthis problem. A �MIS program has a relatively low instructioninformation density because (1) an explicit waiting instruction isrequired to separate any two consecutive timing points; (2) eachtarget qubit of a quantum operation occupies a �eld in the instruc-tion, making the instruction width a limitation for the number oftarget qubits in a single instruction; (3) two parallel and di�erentoperations cannot be combined into a single instruction. �e lowinstruction information density of �MIS contributes to a largeRreq. In our previous experiments, we found that the boundarycondition Rreq ≤ Rallowed cannot be satis�ed for some applicationswith only two qubits. Hence, how to design a QISA with a highinstruction information density forms a scalability challenge.

�ough being able to provide �exibility in directing the program�ow control at runtime, some QISA design su�ers from no quantumsemantics. For example, instructions of �MIS and the RaytheonBBN APS2 instruction set [27] are low level and tightly bound tothe electronic hardware implementation to ensure the executability.Compared to existing quantum assembly languages, these instruc-tions are microinstructions without explicit quantum semantics.�us, they do not qualify as a QISA, and it is another challenge todesign an executable QISA with �exible quantum semantics.

1.3 ContributionsIn this paper, we propose an executable QISA based on QASM,named executable QASM (eQASM). eQASM can be generated by thecompiler backend from a higher-level representation, like cQASM [28].eQASM contains both quantum instructions and auxiliary classicalinstructions to support quantum program �ow control. eQASMsupports a set of discrete quantum operations to enable a microar-chitecture implementation. �e contributions of the paper are thefollowing:

• Comprehensive quantum control �ow: eQASM proposestwo kinds of feedback with required microarchitectural mecha-nisms to implement them: fast conditional execution for simplebut fast feedback, and comprehensive feedback control (CFC) forarbitrary user-de�nable feedback. Based on this, eQASM cansupport comprehensive program �ow control required by the“quantum data, classical control” paradigm, and signi�cantlybroadens the range of quantum applications executable on hard-ware;• Operational implementation: eQASM is a QISA framework

with the de�nition focusing on the assembly level and the basicrules of mapping assembly to binary. It requires customizedinstantiation for the binary format targeting a particular plat-form, which allows the pursuit of �exibility and practicability inmicroarchitecture implementation;

• Increased instruction information density: eQASM adoptsSingle-Operation-Multiple-�bit (SOMQ) execution, a Very-Long-Instruction-Word (VLIW) architecture and a more e�cient methodfor explicit timing speci�cation, which can considerably alleviatethe quantum operation issue rate problem when compared to�MIS;

• Con�gurable QISA at compile time: As opposed to the clas-sical instruction set architecture (ISA) whose operations are de-�ned at ISA design time, eQASM enables the programmer tocon�gure allowed quantum operations at compile time, leavingample space for compiler-based optimization.

We instantiate eQASM into a 32-bit instruction set targeting aseven-qubit superconducting quantum processor and implement itusing a control microarchitecture derived from �MA as proposedin [1]. We validated eQASM by performing several experimentsover a two-qubit superconducting quantum processor using theimplemented microarchitecture.

�is paper is organized as follows. Section 2 introduces the het-erogeneous quantum programming model adopted by eQASM andan overview of eQASM. �e quantum instructions of eQASM withrelated mechanisms are explained in Section 3. Section 4 describes

2

Page 3: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

Fig. 1. Heterogeneous quantum programming and compila-tion model.

the instantiation of eQASM targeting a seven-qubit quantum proces-sor as well as its microarchitecture and implementation. Section 5shows the experiments, and Section 6 concludes.

2 EQASM OVERVIEWTo our understanding, it is viable to integrate quantum computingin a similar way as a GPU or an FPGA in a heterogeneous architec-ture. �e quantum part can be seen as an accelerator for particularclassically-hard tasks. �is section introduces the eQASM program-ming and compilation model, the design guidelines for eQASM, thearchitectural state, and an overview of instructions of eQASM.

2.1 Programming and Compilation ModelTo take advantage of both classical computing and quantum com-puting, eQASM is de�ned based on OpenCL [29], which is an openindustry standard for classical heterogeneous parallel computing.�e programming and compilation model of eQASM is shown inFig. 1.

A quantum-classical hybrid program contains a host programand one or more quantum kernels with the quantum kernel(s)accelerating particular parts of the computation. �e host programis described using a classical programming language, such as Pythonor C++, and the quantum kernels are described using a quantumprogramming language, such as Sca�old [30] or Q# [31]. A hybridcompilation infrastructure compiles the host program into classicalcode using a conventional compiler such as GCC, which is laterexecuted by the classical host CPU.

�e quantum compiler, such as OpenQL [1], compiles the quan-tum kernels in two steps. First, quantum kernels are compiled intoQASM, or a similar format mathematically equivalent to the cir-cuit model. �is format is hardware independent, which enableshigh-level optimizations and can be ported across di�erent plat-forms. Most of the hardware constraints are taken into account in

the second step, where the compiler performs qubit mapping andscheduling, and low-level optimization. �e output is the analogpulse con�guration for physical operations [32], the microcodede�ning QISA-level operations using the pulse-de�ned physicaloperations, and the quantum code consisting of eQASM instruc-tions. �e quantum code contains quantum instructions as well asauxiliary classical instructions to support comprehensive quantumprogram �ow control including runtime feedback [6, 16]. A�er thehost CPU has loaded the quantum code, microcode, and pulses intothe quantum processor, the quantum code can be directly executed.In the rest of this paper, we focus on the quantum processor, i.e., themicroarchitecture in charge of controlling qubits. �e interactionbetween the classical processor and the quantum processor is aresearch topic outside the scope of this paper.

2.2 Design Guidelines�e core requirement of eQASM is being executable on real hard-ware but not bound to a particular electronic control setup. �edesign of eQASM focuses on providing an comprehensive abstrac-tion at the architecture level which can support the “quantumdata, classical control” programming paradigm as well as somequantum experiments such as measuring the relaxation time ofqubits (T1 experiment). Also, because some experiments and radicalcompiler-based optimization techniques such as quantum optimalcontrol [33, 34] may use uncalibrated or uncommon quantum oper-ations, eQASM should support the usage of user-de�ned quantumoperations. In stark contrast to classical computation where timeis irrelevant to correctness, timing plays a key role in the controlover qubits, i.e., in the execution of algorithms and experiments.To ensure repeatability of quantum algorithms and experimentsand reduce the risk of bug �xes or updates in so�ware or hardwarewhere timing is critical, timing of operations can be exposed at thearchitectural level as suggested by [35]. �e design of eQASM isguided by �ve main principles:

(1) eQASM should include classical instructions to support com-prehensive quantum program �ow control including runtimefeedback;

(2) eQASM should contain well-de�ned methods to specify thetiming of quantum operations;

(3) eQASM should be simple to allow a straightforward microar-chitecture implementation;

(4) Low-level hardware information should be abstracted awayfrom the eQASM assembly as much as possible to avoid eQASMbeing stuck to a particular hardware implementation;

(5) �e quantum operation issue rate is a potential bo�leneck ofthe quantum microarchitecture, and should be addressed, e.g.,by densely encoding the instructions such as done with SIMDand VLIW for classical architectures;

(6) eQASM should be �exible to allow di�erent quantum operationsvia con�guration.

2.3 Architectural StateAs shown in Fig. 2, the architectural state of the quantum processorincludes:

3

Page 4: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

Qubit Measurement Result Registers

General Purpose Registers

Quantum Register

Data Memory Instruction Memory

Execution Flag Registers

Comparison Flags

Timing & Event Queues

Quantum Operation Target Registers

PC

Det

erm

inis

tic

Tim

ing

Dom

ain

Non

-det

erm

inis

tic

Tim

ing

Dom

ain

Timing Label Generator

Fig. 2. Architectural state of eQASM. Arrows indicates thepossible information �ow. �e thick arrows represent quan-tum operations, which read information from the modulespassed through.

2.3.1 Data Memory. �e data memory can bu�er intermedi-ate computation results and serve as the communication channelbetween the host CPU and the quantum processor.

2.3.2 Instruction Memory & Program Counter. �e eQASM in-structions are stored in the instruction memory, and the ProgramCounter (PC) contains the address of the next eQASM instructionto fetch. eQASM does not de�ne an instruction memory size or amemory hierarchy.

2.3.3 General Purpose Registers. �e general purpose register(GPR) �le is a set of 32-bit registers, labeled as Ri, where i is theregister address.

2.3.4 Comparison Flags. �e comparison �ags store the com-parison result of two general purpose registers which are used bycomparison and branch related instructions (see Table 1).

2.3.5 �antum Operation Target Registers. Each quantum op-eration target register can be used as an operand of a quantumoperation. Since most quantum technologies support physical oper-ations applied on up to two qubits, there are two types of quantumoperation target registers: single-qubit target registers for single-qubit operations [including measurement (MEASZ)], and two-qubittarget registers for two-qubit operations.

Each single- (two-)qubit target register can store the physicaladdresses of a set of qubits (allowed qubit pairs). An allowed qubitpair is a pair of qubits on which we can directly apply a physicaltwo-qubit gate. A single- (two-)qubit target register is labelled asSi (Ti), with i being the register address. eQASM does not de�nethe format of target registers (see Section 3.3 for a discussion).

2.3.6 Timing and Event�eues. To support explicit timing spec-i�cation of quantum operations, eQASM adopts a queue-basedtiming control scheme [1]. �e timing and event queues are used tobu�er timing points and operations generated from the execution

of quantum instructions (see Section 3.1). Together with the qubitmeasurement result registers, it separates the processor into twotiming domains, the deterministic one and the non-deterministicone.

2.3.7 �bit Measurement Result Registers. Each qubit measure-ment result register is 1-bit wide, and stores the result of the last�nished measurement instruction on the corresponding qubit whenit is valid (see Section 3.6). It is labeled as Qi, where i is the physicaladdress of the qubit.

2.3.8 Execution Flag Registers. Sometimes, the execution of aquantum operation depends on a simple combination of previousmeasurement results of this qubit [7, 22]. To this end, each qubit isassociated with an execution �ag register, which contains multiple�ags derived automatically by the microarchitecture from the lastmeasurement results of this qubit. �e execution �ag register �leis used for fast conditional execution (see Section 3.5).

2.3.9 �antum Register. �e quantum register is the collectionof all physical qubits inside the quantum processor. Each qubit isassigned a unique index, known as the physical address. Since datain qubits can be superposed, eQASM does not allow direct accessto the quantum data at the instruction level. Instead, users canmeasure qubits using measurement instructions and later accessthe results in the qubit measurement result registers.

2.4 Instruction Overview�antum technology is evolving rapidly and is still far away from astable state. To avoid the format of eQASM being stuck to a speci�cquantum technology implementation with particular properties, thede�nition of eQASM focuses on the assembly level and introducesbasic rules of mapping the assembly code to binary instructions.�e binary format is de�ned during the instantiation of eQASMtargeting a concrete control electronic setup and quantum chip. �isfact enables the eQASM assembly to be expressive while leavingconsiderable freedom to the (micro)architecture designer to pursuemicroarchitectural practicability and performance.

An eQASM program can consist of interleaved quantum in-structions and auxiliary classical instructions. An overview ofthe eQASM instructions is shown in Table 1. Since the host CPUcan provide classical computation power, auxiliary classical instruc-tions are simple instructions to support the execution of quantuminstructions. Complex instructions (e.g., �oating-point instructions)are not included. �e top part of Table 1 contains the auxiliary clas-sical instructions. �ere are four types: control, data transfer, logical,and arithmetic instructions. �ese are all scalar instructions. �efunction sign ext(Imm, 32) sign extends the immediate value Imm

to 32 bits. �e operator :: concatenates the two bit strings. �e CMP

instruction sets all comparison �ags based on the comparison resultof GPR Rs and Rt. �e BR instruction changes the PC to PC + Offset

if the speci�ed comparison �ag is ‘1’. To enable arithmetic or logicaloperations on the comparison result, the FBR instruction fetches thespeci�ed comparison �ag into GPR Rd. �e FMR instruction supportscomprehensive feedback control and is explained in Section 3.6.

�e bo�om part of Table 1 contains the quantum instructions.�ere are three types of instructions:• Waiting instructions used to specify timing points (QWAIT, QWAITR),

4

Page 5: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

Table 1. Overview of eQASM instructions.

Type Syntax Description

Control CMP Rs, Rt CoMPare GPR Rs and Rt and store the result into the comparison �ags.BR <Comp. Flag>, Offset (BRanch) Jump to PC + Offset if the speci�ed comparison �ag is ‘1’.

Data Transfer

FBR <Comp. Flag>, Rd (Fetch Branch Register) Fetch the speci�ed comparison �ag into GPR Rd.LDI Rd, Imm (LoaD Immediate) Rd = sign ext(Imm[19..0], 32).LDUI Rd, Imm, Rs (LoaD Unsigned Immediate) Rd = Imm[14..0]::Rs[16..0].LD Rd, Rt(Imm) (LoaD from memory) Load data from memory address Rt + Imm into GPR Rd.ST Rs, Rt(Imm) (STore to memory) Store the value of GPR Rs in memory address Rt + Imm.

FMR Rd, Qi(Fetch Measurement Result) Fetch the result of the last measurement instruction on qubiti into GPR Rd.

Logical AND/OR/XOR Rd, Rs, Rt

NOT Rd, RtLogical and, or, exclusive or, not.

Arithmetic ADD/SUB Rd, Rs, Rt Addition and subtraction.

Waiting QWAIT Imm

QWAITR Rs

(Quantum WAIT Immediate/Register) Specify a timing point by waiting for thenumber of cycles indicated by the immediate value Imm or the value of GPR Rs.

Target Specify SMIS Sd, <Qubit List>

SMIT Td, <Qubit Pair List>

(Set Mask Immediate for Single-/Two-qubit operations) Update the single- (two-)qubitoperation target register Sd (Td).

Q. Bundle [PI,] Q_Op [| Q_Op]* Applying operations on qubits a�er waiting for a small number of cycles indicated by PI.

• �antum operation target register se�ing instructions (SMIS,SMIT), and

• �antum bundle instructions, which consist of the speci�cationof a small waiting time and multiple quantum operations.

�ese quantum instructions have several features based on thefollowing four observations:

� Many quantum experiments, such as the T1 experiment, requirechanging the timing of operations explicitly. Also, the timingof operations can signi�cantly impact the �delity of the �nalresult as quantum errors accumulate during computation (seeSection 5). In the NISQ era, quantum errors accumulate as thecomputation is going on. As a consequence, the timing of opera-tions has a signi�cant impact on the �delity of the �nal compu-tation result (see Section 5). eQASM can explicitly specify thetiming of quantum operations to support quantum experimentsand compiler-based timing optimization. �is enables the pro-grammer to schedule and time the quantum operations to achievehigher �delity. �e timing model is explained in Section 3.1.

� Di�erent quantum experiments or algorithms may require adi�erent set of physical quantum operations. To allow usingdi�erent sets of quantum operations, quantum operations arespeci�ed by programmers at compile time via con�guration (seeSection 3.2) instead of being de�ned at QISA design time. �is�exibility reserves ample space for compiler-based optimization.Only single- and two-qubit operations are allowed, and more-qubit operations should be decomposed into single- and two-qubit operations by the compiler [17, 36–39].

� To alleviate the quantum operation issue rate problem, we canreduce the required instruction issue rate Rreq by increasing theinstruction information density at the architecture level, and/orincrease the available instruction issue rate Rallowed at the mi-croarchitecture level. At the architecture level, eQASM reducesRreq by adopting SOMQ execution, which supports applying a

single quantum operation on multiple qubits (Section 3.3), and aVLIW architecture, which can combine multiple di�erent quan-tum operations into a quantum bundle (Section 3.4). eQASMadopts the VLIW design because the microarchitecture imple-mentation can be much simpler compared to a superscalar de-sign. �e tradeo� is that the compiler needs to perform morescheduling over the instructions as suggested by [40]. �e mi-croarchitecture can also introduce multiple-issue mechanisms asclassical superscalar processors to increase Rallowed, which is outof the scope of this paper focusing on the architecture design.

� Two kinds of feedback are supported. Fast conditional executionperforms a Go/No-go decision for every single-qubit operationbased on a execution �ag of the target qubit (see Section 3.5).To be more �exible, CFC allows programmers to de�ne arbi-trary feedback by redirecting the program �ow based on themeasurement results (see Section 3.6).

3 ARCHITECTUREIn this section, we construct the assembly syntax of quantum oper-ations by introducing the aforementioned mechanisms.

3.1 Timing Model3.1.1 �eue-based Timing Control. eQASM adopts the queue-

based timing control scheme proposed in [1] since it can supportexplicit timing speci�cation. We brie�y introduce this scheme andrefer readers to the original paper for a detailed discussion.

In the queue-based timing control scheme, the execution of quan-tum instructions can be divided into a reserve phase in the non-deterministic timing domain and a trigger phase in the deterministictiming domain. A timeline is constructed by the reserve phase andconsumed by the trigger phase: the result of executing quantum in-structions in the reserve phase is consecutively creating new timing

5

Page 6: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

points on the timeline and associating events to them; the deter-ministic timing domain maintains a timer, and triggers all quantumoperations associated with the timing point on the timeline that itreaches. Auxiliary classical instructions and mask se�ing instruc-tions are not directly associated with timing points. �e triggerphase is handled by the microarchitecture; we introduce the reservephase in the following.

3.1.2 Timeline Construction. �antum instructions fetched fromthe instruction memory form a quantum instruction stream. In-structions in the stream are executed in order; this constructs atimeline by generating consecutive timing points and assigningoperations to them.

If the fetched instruction is a waiting instruction, QWAIT Imm orQWAITR Rs, a new timing point in the timeline is generated. �eposition of the new timing point is determined by the speci�cationof the interval since the last generated timing point. �e intervallength comes from the immediate value Imm or GPR Rs. �e �rsttiming point of the timeline can be set by a dedicated instruction,or by an external trigger to the microarchitecture. Both waitinginstructions use the unit cycle for the interval length.

If the fetched instruction is a quantum bundle instruction, thequantum operation(s) speci�ed in the bundle instruction is asso-ciated with the last generated timing point. If multiple quantumoperations are associated to the same timing point, these quantumoperations will all start execution at that same timing point.

Based on our observation over some testbenches (see Section 4.4),short intervals between timing points are a common case. To im-prove the quantum operation issue rate, eQASM allows merging aQWAIT PI instruction followed by a quantum operation <Quantum

Operation>

QWAIT PI

<Quantum Operation>

into a single instruction

[PI,] <Quantum Operation>.

Square brackets [. . .] indicate that the content inside is optional. PIis short for pre interval, which speci�es a short interval betweenlast generated timing point and the one when the operations inthis instruction are to be triggered. It defaults to 1 if not speci�ed.Value 0 is acceptable to both the PI and the waiting instructions,which means that the following timing point is identical to the lasttiming point.

3.1.3 Example. Assuming the durations of quantum operationsQ_OP0, Q_OP1, Q_OP2, and Q_OP3 all equal one-cycle time, the followingcode triggers these four operations back-to-back.� �1 LDI r0 , 1 # r0 <- 1

2 Q_OP0

3 Q_OP1 # Default PI = 1

4 QWAITR r0 # Register -valued waiting

5 0, Q_OP2

6 QWAIT 0 # Equivalent to NOP

7 1, Q_OP3 # Explicity PI = 1� �

3.2 �antum Operation De�nition & DecodingDepending on the qubit technology and the algorithm to run, dif-ferent quantum operations can be used. eQASM does not de�ne a�xed set of quantum operations at QISA design time, such as {H ,T , CNOT, · · · }. Instead, the available quantum operations can becon�gured by the programmer at compile time.

Flexible quantum operation con�guration is achieved throughthe con�guration of the assembler, the microcode unit and thepulse generator of the microarchitecture: on the one hand, theassembler is con�gured to translate a quantum operation, e.g., theX gate, to the expected opcode, e.g., 0x01; on the other hand, themicrocode unit translates the quantum opcodes into the expectedmicroinstruction(s) using a microcode-based instruction decodingscheme [41]. Each microinstruction represents one or more micro-operations, which are �nally converted into pulses by the pulsegenerator with precise timing applying operations on qubits. �eassembler, the microcode unit, and the pulse generator should becon�gured consistently at compile time.

3.3 Address MechanismA quantum operation applied on multiple qubits is a common case.For example, quantum computation usually starts by preparing thesuperposition state from initialized qubits, which requires applyingHadamard gates on multiple qubits. eQASM uses SOMQ execution,which can apply a single quantum operation on multiple qubitsat the same time. SOMQ is similar to classical single-instruction-multiple-data (SIMD) execution [42], with the operation targetreplaced by qubits. An instantiated eQASM can also be treated asan implementation of the previously proposed Multi-SIMD(k,d)architecture [40] but removing the assumption of SIMD regionsthat in each region only a single quantum operation can be applied.

SOMQ is based on an indirect qubit addressing mechanism. �eSMIS or SMIT instruction �rst de�nes a set of quantum operationtarget(s) in a quantum operation target register. �en a quantumoperation can use the target register as the operand:

<Operation Name> <Target Register>.

3.3.1 Address of Allowed�bit Pairs. Since a two-qubit opera-tion, such as a CNOT gate, can operate on its qubits di�erently, twoqubits with di�erent orders, i.e., (Qubit A,Qubit B) and (Qubit B,

Qubit A), are treated as di�erent allowed qubit pairs. �e termquantum chip topology indicates the available qubits and allowedqubit pairs of a quantum chip (see Fig. 6 for an example). �equantum chip topology can be represented as a graph where eachavailable qubit can be denoted as a vertex, and an allowed qubitpair as a directed edge. In the directed edge (Qubit A, Qubit B),Qubit A is called the source qubit and Qubit B the target qubit ofthe pair.

3.3.2 Translation from Assembly to Binary. Since the e�ciencyof encoding the qubit list (qubit pair list) may depend on the targetquantum chip topology, the designer can choose di�erent binaryencoding schemes for di�erent target quantum processors duringeQASM instantiation. In general, it is more e�cient to put theaddress pairs in the instruction for a highly-connected quantumprocessor, while a mask format could be more e�cient when the

6

Page 7: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

qubit connectivity is limited. For example, since at most two two-qubit gates can be applied and each qubit can be addressed with3 bits in a fully connected 5-qubit trapped ion processor [43], only2 × 2 × 3 bits = 12 bits are required to specify the target of a two-qubit gate. �is is more e�cient than a mask of 20 bits with eachbit in the mask indicating one of all 20 di�erent allowed qubit pairsselected or not. In contrast, a mask of 6 bits is more e�cient forthe IBM QX2 [44], which also contains �ve qubits but has only sixallowed qubit pairs.

3.3.3 Example. �e following code sets the single-qubit targetregister S7 to contain two qubits (0 and 1), and then applies an Xgate on both qubits simultaneously.� �1 SMIS S7 , {0, 1}

2 Y S7� ��e following code sets the two-qubit target register T3 to contain

two pairs of qubits (1, 3) and (2, 4), and then applies a CNOT gate onthem.� �1 SMIT T3 , {(1, 3), (2, 4)}

2 CNOT T3� �3.4 Very Long Instruction Word

3.4.1 �antum Bundle Format. Apart from SOMQ, di�erent op-erations are also allowed to be applied on di�erent qubits in parallel.eQASM can combine parallel quantum operations into a quantumbundle in a VLIW format. We de�ne parallel quantum operationsas operations starting at the same timing point, regardless of theduration of each operation. �e format of a quantum bundle is:

[PI,] <Quantum Operation> [| <Quantum Operation>]*

�e vertical bar | is used to separate di�erent quantum operationsin the same bundle. �e asterisk * means the item in square bracketscan repeat for n ≥ 0 times.

3.4.2 Translation from Assembly to Binary. In the assembly code,an arbitrary number of quantum operations can be combined intoa single quantum bundle. However, a single instruction can ac-commodate only a few quantum operations because of the limitedinstruction width. �e VLIW width of eQASM characterizes thenumber of quantum operations that can be put in a single instruc-tion word, which is de�ned during eQASM instantiation. Matchingthis, a single quantum bundle can be broken into multiple quantumbundle instructions with PI being 0. If the number of operations isnot a multiple of the VLIW width, quantum no-operations (QNOP)�ll up the last instruction. For example, given a VLIW width of 2,the bundle

PI, X S5 | H S7 | CNOT T3

can be decomposed by the assembler to two consecutive quantumbundle instructions

PI, X S5 | H S7

0, CNOT T3 | QNOP.

3.4.3 Example. In the code as shown in Fig. 3, the instructionQWAIT 10000 initializes both qubits by idling them for 200 µs (as-suming a cycle time of 20 ns). Line 6 applies a Y gate on both qubitsusing SOMQ. Line 7 is a VLIW instruction, which applies an X90

and X gate on each qubit. In this paper, X90 (Y90) denotes the gaterotating the quantum state along the x- (y-)axis by a π/2 angle.Xm90 (Ym90) denotes similar gates but with the rotation angle of−π/2. Line 8 measures both qubits using SOMQ. According to thePI value, the Y gate happens immediately a�er the initialization,followed by the X90 and X gates 20 ns later and the measurement40 ns later. �e 1 µs waiting time (line 9) ensures no operationshappening during the measurement.� �1 SMIS S0, {0}

2 SMIS S2, {2}

3 SMIS S7, {0, 2}

4 ...

5 QWAIT 10000

6 0, Y S7

7 1, X90 S0 | X S2

8 1, MEASZ S7

9 QWAIT 50

10 ...� �Fig. 3. Part of the code for a two-qubit AllXY experiment,which is used in validating eQASM in Section 5.

3.5 Fast Conditional ExecutionFast conditional execution allows executing or canceling a single-qubit operation when the micro-operation is triggered. �e decisionis made based on the value of a selected �ag in the execution �agregister corresponding to the target qubit. �e value of the exe-cution �ag is derived by the microarchitecture using prede�nedcombinatorial logic from the last measurement results of the samequbit. Once there returns a measurement result for a qubit, thecorresponding execution �ags are updated automatically. If theexecution �ag is ‘1’, then the operation executes; otherwise, it iscanceled. A selection signal is required for each micro-operation toselect which execution �ag to use, which can be generated by themicrocode unit, or speci�ed by an instruction �eld [45]. Except forthe default execution �ag that should always be ‘1’, which and howmany execution �ags there are, should be de�ned during eQASMinstantiation (see Section 4.2 for an example).

Example. In one instantiation of eQASM, the quantum operationC_X uses the execution �ag which is ‘1’ if and only if (i�) the lastmeasurement result of the qubit is |1〉. Figure 4 shows the code forthe active qubit reset experiment, where qubit 2 is put in an equalsuperposition using an X90 gate a�er initializing it in the |0〉 stateby idling it for 200 µs. A�er a measurement, a conditional C_X gateis applied to reset the qubit. �bit 2 is measured again to read outthe �nal state for veri�cation.

3.6 Comprehensive Feedback ControlCFC allows adjusting the program �ow based on measurementresults of any qubits to enable arbitrary user-de�ned feedback. �is�exibility comes at the cost of longer feedback latency. We proposea three-step mechanism to implement CFC:

7

Page 8: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

� �1 SMIS S2 , {2}

2 QWAIT 10000

3 X90 S2

4 MEASZ S2

5 QWAIT 50

6 C_X S2

7 MEASZ S2� �Fig. 4. eQASM program for active qubit reset. �is experi-mental result is shown in Section 5.

(1) A measurement instruction is applied on the condition qubiti. At the moment that this measurement instruction is issued,Qi is invalidated. At the moment the measurement result isavailable, it is wri�en in Qi. Qi turns back to valid if there areno more pending measurement instructions on qubit i.

(2) �e FMR Rd, Qi instruction fetches the value of the quantummeasurement result register Qi into GPR Rd. If Qi is invalid,FMR should wait until Qi gets valid again. �erea�er, the valueof Qi can be fetched into Rd. Qi remains valid until qubit i ismeasured again.

(3) GPR Rd is then used in a BR instruction to select the program�ow to follow. Note, multiple FMR and BR instructions can becombined to support more complex feedback logic.

Example. �e eQASM program shown in Fig. 5 �rst measuresqubit 1. If the measurement result is 1, a Y gate is applied on qubit0, otherwise, an X gate is applied.� �1 SMIS S0 , {0}

2 SMIS S1 , {1}

3 LDI R0 , 1

4 MEASZ S1

5 QWAIT 30

6 FMR R1 , Q1 # fetch msmt result

7 CMP R1 , R0 # compare

8 BR EQ, eq_path # jump if R0 == R1

9 ne_path:

10 X S0 # happen if msmt result is 0

11 BR ALWAYS , next # this flag is always `1'

12 eq_path:

13 Y S0 # happen if msmt result is 1

14 next:

15 ...� �Fig. 5. eQASM program using CFC.

4 INSTANTIATION & IMPLEMENTATION�is section introduces an instantiation, microarchitecture, andimplementation of eQASM.

4.1 Target Superconducting�antum Chip�e quantum chip topology of the target seven-qubit superconduct-ing quantum chip is shown in Fig. 6. It is part of a two-dimensional

2 3 4

0

5

1

6

Feedline0

Feedline1

08

19

2

10

311

412 5

13

614

7

15

Fig. 6. �antum chip topology of the target seven-qubit su-perconducting quantum chip. Numbers in red are the phys-ical addresses of qubits. �e numbers along the direct edgesare addresses of the allowed qubit pairs.

square la�ice as proposed in [46]. It can implement a distance-2 sur-face code [47], which can detect one physical error. In this �gure, avertex represents a qubit, and a directed edge represents an allowedqubit pair. Numbers besides the vertex (edge) are the addressesof qubits (allowed qubit pairs). For example, allowed qubit pair 0has qubit 2 as the source qubit and qubit 0 as the target qubit. �efeedlines are used to measure the nearby coupled qubits. �bit 0, 2,3, 5, and 6 (1 and 4) are coupled to feedline 0 (1). Each feedline hasan input port and an output port. Besides, each qubit is connectedto a microwave port and a �ux port, which are not shown in Fig. 6.

Operations supported by this quantum processor include mea-surements, single-qubit x- or y-axis rotations, and a two-qubitcontrolled-phase (CZ) gate. A typical gate time is 20 ns for single-qubit gates and ∼ 40 ns for two-qubit gates. �e duration of ameasurement is typically 300 ns - 1 µs. A cycle time of 20 ns is usedin this instantiation.

4.2 Instantiation Design Space ExplorationTo determine a suitable eQASM instantiation con�guration for thetarget quantum processor [a single- (two-)qubit gate time of 1 (2)cycle(s), and a measurement time of 15 cycles], we perform analy-sis over three benchmarks using a quantum control architecturesimulator derived from the previously proposed QPDO [48]. Be-cause substantial time is spent on calibrating qubits before runningapplications with NISQ technology, the �rst benchmark we selectis the widely-used calibration experiment randomized benchmark-ing (RB) [49, 50], which might be limited by the high memoryconsumption when the required waveform for control is plainlystored in memory. Each qubit is subject to 4096 single-qubit Clif-ford gates which have been decomposed into x and y rotations.Because every gate happens immediately following the previousone, randomized benchmarking cannot reveal timing pa�erns ofquantum operations in real quantum algorithms, where the par-allelism is limited by two-qubit gates. Addressing this, we alsoselect two benchmarks from Sca�CC [11] as the representativesof small-scale quantum algorithms that might be executed withNISQ technology: a parallel algorithm (Ising model using 7 qubits,IM) which has < 1% two-qubit gates, and a relatively sequential

8

Page 9: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

Fig. 7. Number of instructions for various architecture con-�gurations for randomized benchmarking (RB), Isingmodel(IM), and square root (SR).

algorithm (Grover’s algorithm to calculate the square root using8 qubits, which is the minimum number of qubits required, SR),which has ∼ 39% two-qubit gates. �e evaluation metric is the totalnumber of instructions.

We investigate the impact of the VLIW width (w), three timing-speci�cation methods, and SOMQ on the number of instructions.�e three timing-speci�cation methods include: the �MIS fashion(specifying every timing point using separate QWAIT instructions,ts1); including QWAIT in the quantum bundle instruction at the placeof a quantum operation (ts2); and using PI with various bit widths(wPI) to specify a small waiting time and using separate QWAIT in-structions to specify longer waiting times (ts3). �e simulationresults are shown in Fig. 7.

Con�g 1 is (ts1, no PI, no SOMQ), and Con�g 1 with w = 1 ischosen as the baseline. By increasing w from 1 to 4, the numberof instructions can be reduced up to 62% (RB). Benchmarks withsubstantial parallelism (RB and IM) bene�t more from a big w . �einstruction reduction in SR (∼ 8%) indicates that large w slightlyimproves quantum applications with limited parallelism.

Con�g 2 is (ts2, no PI, no SOMQ). A minimum w of 2 is requiredby ts2 to distinguish it from ts1. Compared with Con�g 1, by includ-ing the QWAIT operation as part of a quantum bundle instruction,Con�g 2 can reduce the number of instructions by 20 - 33% (RB),24 - 45% (IM), 43 - 50% (SR) by varying w from 2 to 4. SR bene�tsmost because of two reasons. First, due to its sequential nature, ithas relatively more QWAIT instructions. Second, limited parallelismin this algorithm leaves potential VLIW slots unused, which can be�lled by QWAIT instructions.

Con�g 3/4/5/6 is (ts3, wPI = 1/2/3/4, no SOMQ). Con�g 3 canreduce the number of instructions by 13 - 33% for RB and 28 -44% for IM with w varying from 1 to 4 compared with Con�g 1.Since the intervals between operations in RB and IM are mostlyclose to 1, further increasing wPI up to 4 bits introduces marginal

1 6 5 13 70 opcode Sd Imm

SMIS Dst SReg Qubit Mask

1 6 5 4 160 opcode Td Imm

SMIT Dst TReg Qubit Pair Mask

1 6 5 200 opcode Imm

QWAIT Wait time

1 6 5 5 150 opcode Rs

QWAITR Src GPR

1 9 5 9 5 31 q opcode Si/Ti q opcode Si/Ti PI

quantum operation 0 quantum operation 1

Fig. 8. Format of the SMIS and SMIT (top two), QWAIT and QWAITR

(middle two), and quantum bundle (bottom) instruction.

bene�t. Con�g 3 reduces the number of instructions of SR by∼ 17% regardless of w . Further increasing wPI to 3 or 4 bits canreduce the number of instructions of SR by up to 48%. Like SR,quantum algorithms are scheduled to be executed in a time as shortas possible. �is result of Con�g 3-6 suggests that most of thewaiting time is short and can be encoded in a 3-bit PI �eld. Notethat Con�g 3/4/5 is also more bene�cial than Con�g 2 when w = 1or w = 2.

Con�g 7/8/9/10 is (ts3, wPI = 1/2/3/4, SOMQ). Our analysisassumes that the target registers can always provide the requiredqubit (pair) list, and therefore shows the theoretical maximumbene�t that can be obtained by SOMQ. Compared to Con�g 3/4/5/6,SOMQ can introduce a maximum reduction of 42% (Con�g 8,w = 2)in the number of instructions for RB, while it can only reduce atmost 4% instructions for SR (Con�g 8, w = 1). Regardless of wPI,SOMQ can help reduce the number of instructions of IM by ∼ 24,19, 9, and 2% for di�erent w . �is fact suggests that SOMQ ismore e�ective for highly parallel applications, especially whenw is small. An application that would bene�t signi�cantly fromSOMQ is quantum error correction, which requires performing well-pa�erned error syndrome measurements repeatedly presentinghigh parallelism. As not shown in the �gure, we also analyzed thenumber of e�ective quantum operations in each quantum bundlefor Con�g 9, which is 1.795, 2.296, and 3.144 for RB, 1.485, 1.622, and1.623 for IM, and 1.118, 1.147, and 1.147 for SR withw varying from2 to 4, respectively. It indicates that with the existence of SOMQ,w > 2 is not highly required for many quantum applications (RB isa special case with extreme parallelism).

As a result of the analysis, our eQASM instantiation adopts Con-�g 9 (ts3, wPI = 3, SOMQ) with w = 2. A width of 32 bits is usedby all instructions for the memory alignment. Two instructionformats are used: the single format with the highest bit being ‘0’and the bundle format with the highest bit being ‘1’. Single formatinstructions use the other 31 bits to encode a single instruction, in-cluding all auxiliary classical instructions, and SMIS, SMIT, QWAIT(R)instructions. For brevity, we only present the format of quantuminstructions as shown in Fig. 8.

�ere are 32 single- (two-)qubit target registers, and the targetregister address width is 5 bits. �e target registers use a maskformat. �e mask is 7- (16-)bit wide in the single- (two-)qubittarget register. Each bit in the mask of the value ‘1’ indicates that

9

Page 10: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

Quantum PipelineQubit Measurement Result Register

Fas

t Con

ditio

nal E

xecu

tion

Timing Control Unit

Add

r. Lo

gic

...

Add

r. Lo

gic

...

Event k Queue 2

Event k Queue 1

Event k Queue n

...

Event 1 Queue 1

Event 1 Queue n

...

Timing Queue

Tim

ing

Con

trol

ler

VLIW pipelane

Qua

ntum

Mic

roin

stru

ctio

n B

uffe

r

Target Registers

Timestamp Manager

Microcode Unit

Q Control Store

Ope

ratio

n C

ombi

natio

n

Inst

ruct

ion

Mem

ory

Qua

ntum

Inst

ruct

ion

Dec

oder

Qub

its

Cla

ssic

al P

ipel

ine

Cod

ewor

d Tr

igge

red

Pul

se G

ener

atio

nM

easu

rem

ent

Dis

crim

inat

ion

.........

...

ADI

Dev

ice

Eve

nt D

istr

ibut

or

Dat

a M

emor

y

Non-deterministic Timing Domain Deterministic Timing Domain

Synchronization Clock

Fig. 9. �antummicroarchitecture implementing the instantiated eQASM for the seven-qubit superconducting quantum pro-cessor.

the corresponding qubit (allowed qubit pair) is selected. In theQWAIT(R) instruction, only the least signi�cant 20 bits of the Imm

�eld or GPR Rs are used to specify the waiting time. In the quantumbundle instruction, each quantum operation occupies 14 bits andthe q opcode is 9 bits.

4.3 Microarchitecture�MIS is implemented by the control microarchitecture �MAwith codeword-based event control, queue-based event timing con-trol and multi-level instruction decoding [1]. Adopting these threemechanisms, we redesign a quantum control microarchitecture,�MA v2, implementing the instantiated eQASM as shown inFig. 9. It supports all features of eQASM. �e classical pipelinemaintains the PC and implements the GPR �le and the compar-ison �ags. �e execution �ag register is maintained by the fastconditional execution module. �e classical pipeline fetches andprocesses instructions one by one from the instruction memory.All auxiliary classical instructions are processed by the classicalpipeline while quantum instructions are forwarded to the quantumpipeline for further processing.

�e timestamp manager processes the QWAIT(R) instructions andthe PI �eld to generate timing points. �e quantum pipeline con-tains a VLIW front end with two VLIW lanes, each lane processingone quantum operation. �e SMIS (SMIT) instructions update thecorresponding target registers in each VLIW lane. Inside eachVLIW lane, the q opcode is translated by the microcode unit intoone micro-operation (labeled as µ ops) for a single-qubit opera-tion or two micro-operations (labeled as µ opsrc and µ optgt) for a

two-qubit operation. µ opsrc (µ optgt) will be applied on the source(target) qubit of the target qubit pair. �e con�guration of the mi-crocode unit is stored in the Q control store, which is implementedusing a lookup table. �e target register Si (Ti) is read for a single-(two-)qubit operation.

�e quantum microinstruction bu�er resolves the mask-basedqubit address and associates the quantum operations to the lastgenerated timing point. It resolves the qubit address in two steps.

First, the mask stored in Si (Ti) is translated into seven two-bitmicro-operation selection signals OpSeli , where i = 0, 1, · · · , 6,with each signal for one qubit. Table 2 lists the meaning of everycase of the micro-operation selection signal. For single-qubit op-

Table 2. De�nition of the micro-operation selection signal.Value Operation to Select Value Operation to Select‘00’ None ‘10’ µ optgt‘01’ µ opsrc ‘11’ µ ops

erations, OpSeli is set to ‘11’ (‘00’) if the i-th bit in the mask is ‘1’(‘0’). For a two-qubit operation, OpSeli is set to ‘00’ if qubit i is notcontained in any selected allowed qubit pair. Otherwise, OpSeliis ‘01’ (‘10’) if the target qubit pair contains qubit i as the source(target) qubit. Take qubit 0 as an example. It is connected to edges0, 1, 8, and 9. When edge 0 or 9 (1 or 8) is selected in the mask, qubit0 is the target (source) qubit and should be applied with µ optgt(µ opsrc). In other words, OpSel0 should be ‘10’ (‘01’), and can begenerated using a simple OR (_) logic:

OpSel0 = (Ti[0] _ Ti[9]) :: (Ti[1] _ Ti[8]).10

Page 11: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

�e assembler should check the validity of two-qubit target registervalues. For example, it is invalid if two edges connecting to thesame qubit are selected in the same T register.

Second, based on OpSeli , either none or one micro-operation isoutput for qubit i . �is step is fully parallel.

�e operation combination module also works in a two-stepfashion. First, since each VLIW lane outputs none or one micro-operation for each qubit, the operation combination module mergesboth micro-operations from both VLIW lanes. If both VLIW lanesoutput one micro-operation on the same qubit, an error is raised,and the quantum processor stops. Second, as explained in Sec-tion 3.4, a long quantum bundle requires multiple quantum bundleinstructions to describe it. �e operation combination modulebu�ers all micro-operations associated with the same timing point.Only when it detects that all quantum operations in the same quan-tum bundle have been collected, the operation combination modulesends the bu�ered micro-operations to the device event distributor.�is detection can be done, e.g., by recognizing a new timing pointgenerated by the timestamp manager which is di�erent to the oneassociated to the bu�ered micro-operations. Also, if two di�erentquantum bundle instructions specify a quantum operation on thesame qubit, an error is raised, and the quantum processor stops.

As shown in Section 4.4, operating a qubit may require thecollaboration of multiple electronic devices in the analog-digital-interface, and a single device may also control multiple qubits.Hence, the micro-operations should be reorganized into deviceoperations to trigger the corresponding devices. �e device eventdistributor reorganizes multiple micro-operations associated withthe same timing label into di�erent device operations. A�er that,each device operation with the associated timing label is bu�eredat an event queue of the timing control unit awaiting execution.�e timing controller then triggers every device operation at itsexpected timing point.

A�er the device operations have been triggered by the timingcontroller, fast conditional execution is performed based on theselected execution �ags of the target qubits. �e execution �agselection signal comes from the microcode unit con�gured by theprogrammer. Only device operations for qubits of which the se-lected execution �ag is ‘1’ are released to the analog-digital inter-face (ADI). In this eQASM instantiation, four types of combinatoriallogic are used to de�ne the execution �ags:

(1) ‘1’ (the default for unconditional execution);(2) ‘1’ i� the last �nished measurement result is |1〉;(3) ‘1’ i� the last �nished measurement result is |0〉;(4) ‘1’ i� the last two �nished measurements get the same result.

Note, the last �nished measurement result refers to the result of thelast �nished measurement instruction on this qubit when these �agsare used. It is irrelevant to the validity of the quantum measurementresult register. Once there returns a measurement result for a qubitfrom the analog-digital interface, the fast conditional executionunit immediately update the execution �ags corresponding to thatqubit.

To support CFC, a counter Ci is a�ached to each qubit measure-ment result register Qi, with an initial value of 0. Once a mea-surement instruction acting on qubit i is issued from the classical

Fig. 10. Hardware structure implementing the instantiatedeQASM for the seven-qubit superconducting quantum pro-cessor. �in (thick) lines represent digital (analog) signals.

pipeline to the quantum pipeline, Ci increments by 1. If the mea-surement discrimination unit writes back a measurement result forqubit i , Ci decrements by 1. Qi is valid only when Ci is 0. If Ci is not0 when the instruction FMR Rd, Qi is issued, the pipeline is stalleduntil Ci is 0. In this way, it is ensured that the instruction FMR Rd,

Qi always fetches the result of the last measurement instructionacting on qubit i .

4.4 Implementation�e hardware structure implementing the microarchitecture (Fig. 10)consists of a Central Controller responsible for orchestrating threemodules containing slave devices for microwave control, �ux con-trol, and measurement.

�e Central Controller is a digital device built with an Intel AlteraCyclone V SOC 5CSTFD6D5F31I7N Field Programmable Gate Array(FPGA) chip. �e Central Controller implements the digital partof the microarchitecture (le� to the ADI in Fig. 9). �e timingcontroller and fast conditional execution module work at 50 MHzto get a cycle time of 20 ns. �e other parts work at 100 MHz.

Single-qubit x and y rotations are performed by applying mi-crowave pulses to the qubits. �e pulses are generated by Zurich In-struments High Density Arbitrary Waveform Generators (HDAWG)and modulated using a Rohde & Schwarz (R&S) SGS100A microwavesource. A custom-built vector switch matrix (VSM) is responsiblefor duplicating and routing the pulses to the respective qubits aswell as tailoring the waveforms to the individual qubits [51] usinga qubit-frequency reuse scheme that allows for e�cient scaling ofthe microwave control module [46].

Flux pulses that implement two-qubit CZ gates and single-qubitz rotations are performed by applying pulses generated by anHDAWG on the dedicated �ux lines for each qubit.

�e measurement discrimination unit is implemented using twoZurich Instruments Ultra-High-Frequency �antum Controllers(UHFQC) connected to the two feedlines shown in Fig. 6. �eUHFQC has two analog outputs that can be used to generate themeasurement pulses and two analog inputs to sample the trans-mi�ed signals from which the UHFQC can infer the measurementresult. �e measurement pulses going to (coming from) the qubits

11

Page 12: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

are modulated (demodulated) using a single R&S SGS100A. Allanalog ports operate at 1.8 GSa/s allowing for simultaneous mea-surement of up to 9 qubits per feedline using frequency multiplexingtechniques [52].

�e Central Controller connects to the UHFQCs and HDAWGsvia a 32-bit digital interface working at 50 MHz. Since measurementresults are sent from the UHFQC to the Central Controller, 16 bits ofthe connection are sent from the Central Controller to the UHFQCand the other 16 bits the other way around. All operations onUHFQCs and HDAWGs are codeword triggered. �e routing ofmicrowave pulses by the VSM is controlled through seven digitalsignals with a sampling rate of 400 MSa/s.

5 EXPERIMENTSince the target seven-qubit quantum chip is still under test at thetime of writing, we replaced the quantum chip of this microarchi-tecture with a two-qubit superconducting quantum processor tovalidate the eQASM design. �e two qubits are interconnected andcoupled to a single feedline. A con�guration �le is used to specifythe quantum chip topology with the two qubits renamed as qubit0 and 2. It is used by the quantum compiler and the assembler.eQASM programs used to perform the experiments as described be-low are all compiled from OpenQL descriptions with correspondingquantum operation con�guration.

We �rst used eQASM to perform some single-qubit calibrationexperiments which utilize uncalibrated operations. For example,the Rabi oscillation [53] applies an x-rotation pulse on the qubita�er initialization and then measures it. A sequence of �xed-lengthx-rotation pulses with variable amplitudes are used. Each pulse inthe sequence is uploaded to the codeword triggered pulse genera-tion unit of the microarchitecture and con�gured to be an opera-tion X_Amp_i in eQASM. As a result, this experiment calibrated theamplitude of the X gate pulse. Together with other experiments,the �delity of single-qubit quantum operations used later reached99.90% as measured in the following RB experiment. It is worthmentioning that we observed considerable speedup in performingthese experiments with the eQASM control paradigm in practice.

eQASM is then con�gured to include single-qubit gates {I ,X ,Y ,X90,Y90,Xm90,Ym90} and a two-qubit CZ gate for the followingexperiments. �e AllXY experiment is typically used to calibratesingle-qubit gates. In AllXY, pairs of single-qubit gates are chosenfrom the set {I ,X ,Y ,X90,Y90} and applied in such a way that theexpected measurement outcomes produce a characteristic staircasepa�ern that is highly sensitive to gate errors (red line in Fig. 3). Inthe two-qubit AllXY experiment, the control pulses are applied oneach qubit simultaneously. �e sequence is modi�ed to distinguishthe qubits on which it is applied: each gate pair in the sequenceis repeated on the �rst qubit while the entire sequence is repeatedon the second qubit. �e �delity of qubit to the |1〉 state can beextracted by averaging the measurement results for each gate pairover N rounds and correcting for readout errors. �e eQASM pro-gram for one routine of this experiment is shown in Fig. 3. Figure 11shows the �nal measurement result of the entire experiment (bluedots), which matches well with the expectation (red line). �isdemonstrates that the timing control, SOMQ, and VLIW of eQASMwork properly in the experiment.

0 10 20 30 40Gate Pair Combination

0.0

0.2

0.4

0.6

0.8

1.0

F |1

Qubit 0

0 10 20 30 40Gate Pair Combination

Qubit 2

Fig. 11. Two-qubitAllXY result, corrected for readout errors.

0 250 500 750 1000 1250 1500 1750 2000Number of Clifford Gates

0.1

0.0

0.1

0.2

0.3

0.4

0.5

0.6

F |1

(320ns)=0.71%(160ns)=0.35%(80ns)=0.20%(40ns)=0.12%(20ns)=0.10%

320 ns160 ns80 ns40 ns20 ns

0 20 40 60 80 100

0.00

0.05

0.10

0.15

Fig. 12. Single-qubit randomized benchmarking results fordi�erent intervals between gates. Dashed line indicates a10% error rate for visual reference.

To evaluate the impact of the timing of operations on the errorrate, we use single-qubit randomized benchmarking, a techniquethat can estimate the average error rate for a set of operations undera very general noise model [49, 50]. In this experiment, a sequenceof k random Cli�ord gates are applied on a qubit initialized in the|0〉 state. Before measurement, a Cli�ord is chosen that inverts allpreceding operations so that the qubit should end up in the |0〉 statewith survival probability p(k). By performing this experiment fordi�erent k and averaging over many randomizations, the Cli�ord�delity FCl can be extracted from the exponential decay. Becauseeach Cli�ord gate is decomposed into primitive x- and y-rotationsthe gate count is increased by 1.875 on average. �e average errorrate per gate, ϵ , is then calculated as ϵ = 1 − F 1/1.875

Cl .Single-qubit randomized benchmarking was performed for dif-

ferent intervals between the starting points of consecutive gates(320, 160, 80, 40, and 20 ns). As shown in Fig. 12, the average errorper gate decreases by a factor of ∼ 7, from 0.71% to 0.10% whendecreasing the interval from 320 ns to 20 ns. �is demonstrates thesigni�cant impact of timing on the �delity of the �nal computationresult, which substantiates the requirement of explicit speci�cationof timing at QISA level to enable platform-speci�c optimizationand especially scheduling by the compiler.

Fast conditional execution is veri�ed by the active qubit resetexperiment with qubit 2 using the code as shown in Fig. 4. We �ndthe probability of measuring the qubit in the |0〉 state a�er condi-tionally applying the C_X gate to be 82.7%, limited by the readout�delity. We veri�ed CFC by connecting the Central Controller andthe UHFQC. �e eQASM program used is shown in Fig. 5. �e

12

Page 13: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

UHFQC is programmed to generate alternative mock measurementresults for qubit 0. �e alternation between X and Y operationsis veri�ed by detecting the output digital signals using an oscillo-scope. We also measured the feedback latency of fast conditionalexecution and CFC, which are ∼ 92 ns and ∼ 316 ns, respectively.�e feedback latency is de�ned as the time between sending themeasurement result into the Central Controller and receiving thedigital output based on the feedback from the Central Controller.

As a proof of concept of performing quantum algorithms usingeQASM, we executed a two-qubit Grover’s search algorithm [54, 55].�e algorithmic �delity, i.e., correcting for readout in�delity, isfound to be 85.6% using quantum tomography with maximumlikelihood estimation. �is �delity is limited by the CZ gate.

6 CONCLUSIONIn this paper, we have proposed eQASM, a QISA that can be directlyexecuted on a control microarchitecture a�er instantiation. Withruntime feedback, eQASM supports full quantum program �owcontrol at the (micro)architecture level [6, 16]. With e�cient tim-ing speci�cation, SOMQ execution, and VLIW architecture, eQASMalleviates the quantum operation issue rate problem, presentingbe�er scalability than �MIS. �antum operations in eQASM canbe con�gured at compile time instead of QISA design time, whichcan support uncalibrated or uncommon operations, leaving am-ple space for compiler-based optimization. Low-level hardwareinformation mainly appears in the binary of a particular eQASMinstantiation, which makes eQASM assembly expressive. It is worthnoting that by removing the timing information in the eQASM de-scription, the quantum semantics of the program can be kept andfurther converted into another executable format targeting anotherhardware platform.

As validation, eQASM was instantiated into a 32-bit instructionset targeting a seven-qubit superconducting quantum chip, andimplemented using a quantum microarchitecture. eQASM was ver-i�ed by several experiments with this microarchitecture performedon a two-qubit chip. �e e�ciency improvement observed in usingeQASM to control quantum experiments broadens the scope ofapplication of quantum assemblies.

Future work will include performing verifying comprehensivefeedback control with qubits and controlling the originally tar-geted seven-qubit superconducting quantum processor with theimplemented microarchitecture. Also, it will be interesting to in-stantiate eQASM to control other quantum processors, includingsuperconducting quantum processors with a di�erent quantumchip topology, and altogether di�erent quantum hardware, such asspins in quantum dots [56], nitrogen vacancy centers [57].

ACKNOWLEDGEMENTSWe thank C. C. Bultink, L. Lao, and M. Zhang for discussionswhile designing and implementing eQASM, S. Pole�o, N. Haider,D. J. Michalak, and A. Bruno for designing and fabricating thetwo-qubit quantum chip used in experiments, T. van der Laan, andJ. Veltin for project management when implementing the CentralController, Y. Shi for discussions on feedback control. We acknowl-edge funding from the China Scholarship Council (X. Fu), IntelCorporation, an ERC Synergy Grant, and the O�ce of the Director

of National Intelligence (ODNI), Intelligence Advanced ResearchProjects Activity (IARPA), via the U.S. Army Research O�ce grantW911NF-16-1-0071. �e views and conclusions contained hereinare those of the authors and should not be interpreted as necessarilyrepresenting the o�cial policies or endorsements, either expressedor implied, of the ODNI, IARPA, or the U.S. Government. �e U.S.Government is authorized to reproduce and distribute reprints forGovernmental purposes notwithstanding any copyright annotationthereon.

REFERENCES[1] X. Fu, M. A. Rol, C. C. Bultink, J. van Someren, N. Khammassi, I. Ashraf, R. F. L.

Vermeulen, J. C. de Sterke, W. J. Vlothuizen, R. N. Schouten, C. G. Almudever,L. DiCarlo, and K. Bertels, “An experimental microarchitecture for a super-conducting quantum processor,” in Proceedings of the 50th Annual IEEE/ACMInternational Symposium on Microarchitecture. ACM, 2017, pp. 813–825.

[2] R. P. Feynman, “Simulating physics with computers,” International Journal of�eoretical Physics, vol. 21, pp. 467–488, 1982.

[3] I. Kassal, J. D. Whit�eld, A. Perdomo-Ortiz, M.-H. Yung, and A. Aspuru-Guzik,“Simulating chemistry using quantum computers,” Annual Review of PhysicalChemistry, vol. 62, pp. 185–207, 2011.

[4] J. Preskill, “�antum computing in the NISQ era and beyond,” arXiv:1801.00862,2018.

[5] B. M. Terhal, “�antum error correction for quantum memories,” Reviews ofModern Physics, vol. 87, p. 307, 2015.

[6] P. Selinger, “Towards a quantum programming language,” Mathematical Struc-tures in Computer Science, vol. 14, pp. 527–586, 2004.

[7] D. Riste, C. C. Bultink, K. W. Lehnert, and L. DiCarlo, “Feedback control of asolid-state qubit using high-�delity projective measurement,” Physical ReviewLe�ers, vol. 109, p. 240502, 2012.

[8] C. H. Benne�, G. Brassard, C. Crepeau, R. Jozsa, A. Peres, and W. K. Woo�ers,“Teleporting an unknown quantum state via dual classical and einstein-podolsky-rosen channels,” Physical Review Le�ers, vol. 70, p. 1895, 1993.

[9] A. Paetznick and K. M. Svore, “Repeat-Until-Success: Non-deterministic decom-position of single-qubit unitaries,” �antum Information & Computation, vol. 14,pp. 1277–1301, 2014.

[10] A. Y. Kitaev, “�antum measurements and the Abelian stabilizer problem,” inElectronic Colloquium on Computational Complexity, 1996.

[11] A. JavadiAbhari, S. Patil, D. Kudrow, J. Heckey, A. Lvov, F. T. Chong, andM. Martonosi, “Sca�CC: Scalable compilation and analysis of quantum pro-grams,” Parallel Computing, vol. 45, pp. 2–17, 2015.

[12] R. S. Smith, M. J. Curtis, and W. J. Zeng, “A practical quantum instruction setarchitecture,” arXiv:1608.03355, 2016.

[13] S. Liu, X. Wang, L. Zhou, J. Guan, Y. Li, Y. He, R. Duan, and M. Ying, “Q |SI〉: Aquantum programming environment,” arXiv:1710.09500, 2017.

[14] A. W. Cross, L. S. Bishop, J. A. Smolin, and J. M. Gambe�a, “Open quantumassembly language,” arXiv:1707.03429, 2017.

[15] F. T. Chong, D. Franklin, and M. Martonosi, “Programming languages and com-piler design for realistic quantum hardware,” Nature, vol. 549, p. 180, 2017.

[16] M. Ying, Foundations of �antum Programming. Morgan Kaufmann, 2016.[17] D. Wecker and K. M. Svore, “LIQUi | 〉: A so�ware design architecture and domain-

speci�c language for quantum computing,” arXiv:1402.4467, 2014.[18] D. S. Steiger, T. Haner, and M. Troyer, “ProjectQ: an open source so�ware frame-

work for quantum computing,” �antum, vol. 2, 2018.[19] IBM Q team, “IBM �antum Experience,” h�ps://www.research.ibm.com/ibm-q/,

2018.[20] Rige�i, “Rige�i Forest,” h�ps://www.rige�i.com/forest, 2018.[21] Alibaba, “Alibaba �antum Computing Cloud,” h�p://quantumcomputer.ac.cn,

2018.[22] C. C. Bultink, M. A. Rol, T. E. O’Brien, X. Fu, B. Dikken, R. Vermeulen, J. C.

de Sterke, A. Bruno, R. N. Schouten, and L. DiCarlo, “Active resonator reset inthe nonlinear dispersive regime of circuit QED,” Physical Review Applied, vol. 6,p. 034008, 2016.

[23] N. Ofek, A. Petrenko, R. Heeres, P. Reinhold, Z. Leghtas, B. Vlastakis, Y. Liu,L. Frunzio, S. Girvin, L. Jiang et al., “Extending the lifetime of a quantum bit witherror correction in superconducting circuits,” Nature, vol. 536, p. 441, 2016.

[24] L. Hu, Y. Ma, W. Cai, X. Mu, Y. Xu, W. Wang, Y. Wu, H. Wang, Y. Song, C. Zou,S. M. Girvin, L.-M. Duan, and L. Sun, “Demonstration of quantum error correctionand universal gate set on a binomial bosonic logical qubit,” arXiv:1805.09072,2018.

[25] X. Fu, M. A. Rol, C. C. Bultink, J. van Someren, N. Khammassi, I. Ashraf, R. F. L.Vermeulen, J. C. de Sterke, W. J. Vlothuizen, R. N. Schouten, C. G. Almudever,L. DiCarlo, and K. Bertels, “A microarchitecture for a superconducting quantumprocessor,” IEEE Micro, vol. 38, pp. 40–47, 2018.

13

Page 14: eQASM: An Executable Quantum Instruction Set …all operations of the quantum application, (ii) uploading the digi-tal waveforms to arbitrary waveform generators (AWG), and (iii) converting

[26] S. S. Tannu, Z. A. Myers, P. J. Nair, D. M. Carmean, and M. K. �reshi, “Tamingthe instruction bandwidth of quantum computers via hardware-managed errorcorrection,” in Proceedings of the 50th Annual IEEE/ACM International Symposiumon Microarchitecture. IEEE/ACM, 2017, pp. 679–691.

[27] C. A. Ryan, B. R. Johnson, D. Riste, B. Donovan, and T. A. Ohki, “Hardware fordynamic quantum computing,” arXiv:1704.08314, 2017.

[28] N. Khammassi, G. G. Guerreschi, I. Ashraf, J. W. Hogaboam, C. G. Almudever,and K. Bertels, “cQASM v1.0: Towards a common quantum assembly language,”arXiv:1805.09607, 2018.

[29] J. E. Stone, D. Gohara, and G. Shi, “OpenCL: A parallel programming standard forheterogeneous computing systems,” Computing in Science & Engineering, vol. 12,pp. 66–73, 2010.

[30] A. J. Abhari, A. Faruque, M. J. Dousti, L. Svec, O. Catu, A. Chakrabati, C.-F.Chiang, S. Vanderwilt, J. Black, and F. Chong, “Sca�old: �antum programminglanguage,” Princeton University, Technical Report, 2012.

[31] K. Svore, A. Geller, M. Troyer, J. Azariah, C. Granade, B. Heim, V. Kliuchnikov,M. Mykhailova, A. Paz, and M. Roe�eler, “Q#: Enabling scalable quantum com-puting and development with a high-level DSL,” in Proceedings of the Real WorldDomain Speci�c Languages Workshop 2018. ACM, 2018, p. 7.

[32] D. C. McKay, T. Alexander, L. Bello, M. J. Biercuk, L. Bishop, J. Chen, J. M.Chow, A. D. Corcoles, D. Egger, S. Filipp, J. Gomez, M. Hush, A. Javadi-Abhari,D. Moreda, P. Nation, B. Paulovicks, E. Winston, C. J. Wood, J. Woo�on, andJ. M. Gambe�a, “Qiskit backend speci�cations for OpenQASM and OpenPulseexperiments,” arXiv:1809.03452, 2018.

[33] J. Werschnik and E. Gross, “�antum optimal control theory,” Journal of PhysicsB: Atomic, Molecular and Optical Physics, vol. 40, p. R175, 2007.

[34] N. Leung, M. Abdelhafez, J. Koch, and D. Schuster, “Speedup for quantum opti-mal control from automatic di�erentiation based on graphics processing units,”Physical Review A, vol. 95, p. 042318, 2017.

[35] E. A. Lee, “Computing needs time,” Communications of the ACM, vol. 52, pp.70–79, 2009.

[36] D. Deutsch, A. Barenco, and A. Ekert, “Universality in quantum computation,”Proceedings of the Royal Society of London A: Mathematical, Physical and Engi-neering Sciences, vol. 449, pp. 669–677, 1995.

[37] S. Lloyd, “Almost any quantum logic gate is universal,” Physical Review Le�ers,vol. 75, p. 346, 1995.

[38] D. P. DiVincenzo, “�antum gates and circuits,” Proceedings of the Royal Society ofLondon A: Mathematical, Physical and Engineering Sciences, vol. 454, pp. 261–276,1998.

[39] D. Kudrow, K. Bier, Z. Deng, D. Franklin, Y. Tomita, K. R. Brown, and F. T. Chong,“�antum rotations: a case study in static and dynamic machine-code generationfor quantum computers,” in ACM SIGARCH Computer Architecture News. ACM,2013, pp. 166–176.

[40] J. Heckey, S. Patil, A. JavadiAbhari, A. Holmes, D. Kudrow, K. R. Brown,D. Franklin, F. T. Chong, and M. Martonosi, “Compiler management of commu-nication and parallelism for quantum computation,” in Proceedings of 20th ACMInternational Conference on Architectural Support for Programming Languagesand Operating Systems. ACM, 2015, pp. 445–456.

[41] S. Vassiliadis, S. Wong, and S. Cotofana, “Microcode processing: Positioning anddirections,” IEEE Micro, vol. 23, pp. 21–30, 2003.

[42] M. J. Flynn, “Some computer organizations and their e�ectiveness,” IEEE trans-actions on computers, vol. 100, pp. 948–960, 1972.

[43] S. Debnath, N. Linke, C. Figga�, K. Landsman, K. Wright, and C. Monroe, “Demon-stration of a small programmable quantum computer with atomic qubits,” Nature,vol. 536, pp. 63–66, 2016.

[44] 5-qubit backend: IBM Q team, “IBM Q 5 Yorktown backend speci�cationV1.1.0,” Retrieved from h�ps://github.com/QISKit/qiskit-backend-information/tree/master/backends/yorktown/V1, 2018.

[45] S. Balensiefer, L. Kregor-Stickles, and M. Oskin, “An evaluation framework andinstruction set architecture for ion-trap based quantum micro-architectures,” inProceedings of 32nd International Symposium on Computer Architecture. IEEE,2005, pp. 186–196.

[46] R. Versluis, S. Pole�o, N. Khammassi, B. Tarasinski, N. Haider, D. J. Michalak,A. Bruno, K. Bertels, and L. DiCarlo, “Scalable quantum circuit and control for asuperconducting surface code,” Physical Review Applied, vol. 8, p. 034021, 2017.

[47] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, “Surface codes:Towards practical large-scale quantum computation,” Physical Review A, vol. 86,p. 032324, 2012.

[48] L. Riesebos, X. Fu, S. Varsamopoulos, C. G. Almudever, and K. Bertels, “Pauliframes for quantum computer architectures,” in Proceedings of the 54th AnnualDesign Automation Conference. ACM, 2017, p. 76.

[49] E. Magesan, J. M. Gambe�a, and J. Emerson, “Scalable and robust randomizedbenchmarking of quantum processes,” Physical Review Le�ers, vol. 106, p. 180504,2011.

[50] J. M. Epstein, A. W. Cross, E. Magesan, and J. M. Gambe�a, “Investigating thelimits of randomized benchmarking protocols,” Physical Review A, vol. 89, p.062321, 2014.

[51] S. Asaad, C. Dickel, N. K. Langford, S. Pole�o, A. Bruno, M. A. Rol, D. Deurloo, andL. DiCarlo, “Independent, extensible control of same-frequency superconductingqubits by selective broadcasting,” NPJ �antum Information, vol. 2, p. 16029,2016.

[52] J. Heinsoo, C. K. Andersen, A. Remm, S. Krinner, T. Walter, Y. Salathe, S. Gasper-ine�i, J.-C. Besse, A. Potocnik, C. Eichler, and A. Wallra�, “Rapid high-�delitymultiplexed readout of superconducting qubits,” arXiv:1801.07904, 2018.

[53] M. D. Reed, “Entanglement and quantum error correction with superconductingqubits,” Ph.D. dissertation, Yale University, 2013.

[54] L. K. Grover, “A fast quantum mechanical algorithm for database search,” inProceedings of the 28th Annual ACM Symposium on �eory of Computing. ACM,1996, pp. 212–219.

[55] L. DiCarlo, J. M. Chow, J. M. Gambe�a, L. S. Bishop, B. R. Johnson, D. I. Schuster,J. Majer, A. Blais, L. Frunzio, S. M. Girvin, and R. J. Schoelkopf, “Demonstrationof two-qubit algorithms with a superconducting quantum processor,” Nature, vol.460, pp. 240–244, 2009.

[56] T. F. Watson, S. G. J. Philips, E. Kawakami, D. R. Ward, P. Scarlino, M. Veldhorst,D. E. Savage, M. G. Lagally, M. Friesen, S. N. Coppersmith, M. A. Eriksson, andL. M. K. Vandersypen, “A programmable two-qubit quantum processor in silicon,”Nature, vol. 555, p. 633, 2018.

[57] J. Cramer, N. Kalb, M. A. Rol, B. Hensen, M. S. Blok, M. Markham, D. J. Twitchen,R. Hanson, and T. H. Taminiau, “Repeated quantum error correction on a contin-uously encoded qubit by real-time feedback,” Nature Communications, vol. 7, p.11526, 2016.

14