Conference Proceedings
d’ArQ: A QOC Framework with Causality-Aware Grouping and Basis Selection
HPCA '26: 2026 IEEE International Symposium on High Performance Computer Architecture
Jan. 2026 / Sydney, NSW, Australia
Changheon Lee1, Hyungseok Kim1, Seungwoo Choi1, Youngmin Kim1, and Won Woo Ro1
1Yonsei University, Republic of Korea
Quantum Optimal Control (QOC) frameworks are powerful tools for compiling quantum circuits into low-latency hardware control pulses, but recent studies suffer from two critical limitations: lengthy compilation times and potential logical inconsistencies from flawed gate grouping strategies.
In this work, we introduce d'ArQ, a novel QOC framework that solves these challenges.
(i) We identify and resolve the causality problem, a flaw in greedy partitioning that can produce invalid schedules, by introducing a DAG-based grouping algorithm that assigns mergeability to each group and thereby guarantees logical correctness.
(ii) To reduce compilation time, we use a pre-computed library of pulses derived from random unitary matrices to provide a high-quality random initialization for pulse optimization.
(iii) Diverging from prior work based on GRAPE, d'ArQ is built on the GOAT algorithm.
We demonstrate that the choice of analytic basis is a critical hyperparameter and introduce a heuristic cost model to dynamically select the optimal basis for each synthesis task, improving pulse performance.
When evaluated against the state-of-the-art baseline PAQOC on a realistic, inhomogeneous hardware model, d'ArQ demonstrates superior performance.
Notably, d'ArQ reduces circuit latency by up to 22.8% and compilation time by up to 56.8%, establishing a more robust and physically realistic path for circuit compilation.
Toward Scalable Gate-Level Parallelism on Trapped-Ion Processors with Racetrack Electrodes
HPCA '26: 2026 IEEE International Symposium on High Performance Computer Architecture
Jan. 2026 / Sydney, NSW, Australia
Enhyeok Jang1, Hyungseok Kim1, Yongju Lee1, Jaewon Kwon1, Yipeng Huang2, and Won Woo Ro1
1Yonsei University, Republic of Korea, 2Rutgers University, New Jersey, USA
A recent advancement in quantum computation has demonstrated quantum advantage by using randomized circuits on a racetrack-shaped trapped-ion processor.
This work investigates the execution efficiency of this architecture for general-purpose quantum programs, which exhibit different computational characteristics compared to the randomized circuits.
We first explore the impact of increasing the number of operational zones on runtime efficiency.
Counterintuitively, our evaluations using variational program benchmarks reveal that expanding the number of gate operational zones may degrade runtime performance under existing scheduling policies.
This degradation could be attributed to the proportional increase in track length, which increases ion circulation overhead, thereby offsetting the benefits of enhanced gate-level parallelism.
To mitigate this, we propose three key strategies for scalable parallel execution on racetrack processors:
(i) unitary decomposition and translation to maximize zone utilization,
(ii) prioritizing the execution of nearby gates over ion movement, and
(iii) implementing shortcuts to provide alternative circulation paths.
Our evaluations show that these strategies can reduce the runtime of variational programs by an average of 71% and improve the fidelity by an average of 19.8%.
These strategies keep the architectural design scalable by maintaining runtime performance as the number of operational zones increases, even in worst-case scenarios.
QR-Map: A Map-Based Approach to Quantum Circuit Abstraction for Qubit Reuse Optimization
ISCA '25: Proceedings of the 52nd Annual International Symposium on Computer Architecture
20 Jun. 2025 / Tokyo, Japan
Hyungseok Kim1, Enhyeok Jang1, Seungwoo Choi1, Youngmin Kim1, and Won Woo Ro1
1Yonsei University, Republic of Korea
Recent advances in quantum computing introduce the ability to reuse qubits through mid-circuit measurements, thereby enhancing the efficiency of quantum devices with limited computational resources.
However, identifying optimal reuse opportunities in quantum circuits remains challenging due to the intricate dependencies between quantum gates.
Existing frameworks address this by either directly searching for reuse opportunities or converting circuits into directed acyclic graphs (DAGs).
Unfortunately, these frameworks may require exponential search complexity or may not always ensure optimal results due to their non-deterministic property.
To overcome these challenges, we propose QR-Map (Qubit Reuse Map), a map-based framework that abstracts computational dependencies for efficient qubit reuse.
By extracting and aligning two-qubit gates, QR-Map facilitates dependency detection and ensures qubit savings without incurring excessive idle time.
This approach achieves an optimal balance between gate serialization depth and crosstalk reduction.
Evaluations with various quantum circuit benchmarks demonstrate that quantum circuits optimized with QR-Map achieve average reductions of 20% in qubit usage, 25% in circuit depth, and 22% in SWAP insertions compared to those optimized with the state-of-the-art framework.
Qubit Movement-Optimized Program Generation on Zoned Neutral Atom Processors
CGO '25: Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization
01 Mar. 2025 / Las Vegas, NV, USA
Enhyeok Jang1, Youngmin Kim1, Hyungseok Kim1, Seungwoo Choi1, Yipeng Huang2, and Won Woo Ro1
1Yonsei University, Republic of Korea, 2Rutgers University, New Jersey, USA
A zoned neutral atom architecture achieves exceptional fidelity by segregating the execution spaces of 1- and 2-qubit gates, making it a promising candidate for high-accuracy quantum systems.
Unfortunately, naïvely applying programs designed for static qubit topologies to zoned architectures may result in most execution time being consumed by inter-zone travels of atoms.
To address this, we introduce Mantra (Minimizing trAp movemeNts for aTom aRray Architectures), which rewrites quantum programs to reduce the interleaving of single- and two-qubit gates.
Mantra incorporates three strategies: (i) a fountain-shaped controlled-Z (CZ) chain, (ii) a ZZ-interaction protocol without a 1-qubit gate, and (iii) preemptive gate scheduling.
Mantra reduces inter-zone movements by 68% and physical gate counts by 35%, and improves circuit fidelities by 17%, compared to the standard executions.
Recompiling QAOA Circuits on Various Rotational Directions
PACT '24: Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques
13 Oct. 2024 / Long Beach, CA, USA
Enhyeok Jang1, Dongho Ha2, Seungwoo Choi1, Youngmin Kim1, Jaewon Kwon1, Yongju Lee1, Sungwoo Ahn1, Hyungseok Kim1, and Won Woo Ro1
1Yonsei University, Republic of Korea, 2MangoBoost, Washington, USA
The quantum approximate optimization algorithm (QAOA) is introduced to efficiently solve combinatorial optimization problems.
Despite the promise of QAOA, the cost of executing QAOA circuits at scale for quantum advantage may still be excessive for the near-future quantum device.
We observe the increasing overhead of QAOA circuit execution in the native gate translation.
To execute QAOA circuits on a real quantum computing device, Hamiltonians composed of predefined specific rotations (e.g., ZZ and X) should be decomposed into finite native gates.
By adopting rotational combinations that utilize native gates more directly than the standard QAOA circuit model, the execution cost on real quantum devices can be reduced.
In this study, we propose Racoon (Rotational Space Virtualization for QAOA Ansatz), an algorithm-hardware co-design approach that revisits the synthesis conditions of QAOA circuits and selects alternative candidates with different rotational combinations.
Our analysis of six commercial quantum processors demonstrates that applying Racoon to QAOA circuits for the 4-node Sherrington-Kirkpatrick model reduces the number of native gates by an average of 23% and up to 79%.
Consequently, using Racoon results in 43% fewer training epochs, 41% lower training energy consumption, and a 6% improvement in inference on average compared to standard QAOA.
Racoon consistently reduces circuit depth as the number of qubits and layers increases, achieving 123× more circuit depth reduction than the recently proposed Depth First Search (DFS)-based method.
Furthermore, we confirm that Racoon's method can be extended to state-of-the-art QAOAs with modified ansätze and to the variational quantum eigensolver (VQE).
Journal Articles
Distribution-Adaptive Dynamic Shot Optimization for Variational Quantum Algorithms
Physical Review Research, Vol. 7 Iss. 4
05 Dec. 2025
Youngmin Kim1, Enhyeok Jang1, Hyungseok Kim1, Seungwoo Choi1, Changheon Lee1, Donghwi Kim2, Woomin Kyoung2, Kyujin Shin2, and Won Woo Ro1
1Yonsei University, Republic of Korea, 2Hyundai Motor Company, Republic of Korea
Variational quantum algorithms (VQAs) have attracted remarkable interest over the past few years because of their potential computational advantages on near-term quantum devices.
They leverage a hybrid approach that integrates classical and quantum computing resources to solve high-dimensional problems that are challenging for classical approaches alone.
In the training process of variational circuits, constructing an accurate probability distribution for each epoch is not always necessary, creating opportunities to reduce computational costs through shot reduction.
However, existing shot-allocation methods that capitalize on this potential often lack adaptive feedback or are tied to specific classical optimizers, which limits their applicability to common VQAs and broader optimization techniques.
Our observations indicate that the information entropy of a quantum circuit's output distribution exhibits an approximately exponential relationship with the number of shots needed to achieve a target Hellinger distance.
In this work, we propose a distribution-adaptive dynamic shot (DDS) framework that efficiently adjusts the number of shots per iteration in VQAs using the entropy distribution from the prior training epoch.
Our results demonstrate that the DDS framework sustains inference accuracy while achieving a ~50% reduction in average shot count compared to fixed-shot training, and ~60% higher accuracy than recently proposed tiered shot allocation methods.
Furthermore, in noisy simulations that reflect the error rates of actual IBM quantum systems, DDS achieves a ~30% reduction in the total number of shots compared to the fixed-shot method with minimal degradation in accuracy, and offers ~70% higher computational accuracy than tiered shot allocation methods.
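The two quantities the abstract relates, the entropy of an output distribution and the number of shots needed to reach a target Hellinger distance, can be measured with a few lines of Python. This is a minimal sketch, not the DDS implementation: the function names, the doubling search, and the default shot limits are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def hellinger_distance(p, q):
    """Hellinger distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    total = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(total / 2)


def sample_distribution(probs, shots, rng):
    """Draw `shots` samples and return the empirical distribution."""
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    counts = Counter(rng.choices(outcomes, weights=weights, k=shots))
    return {o: c / shots for o, c in counts.items()}


def estimate_min_shots(probs, target, rng, start=16, max_shots=1 << 16):
    """Hypothetical helper: double the shot count until the empirical
    distribution is within `target` Hellinger distance of `probs`."""
    shots = start
    while shots < max_shots:
        empirical = sample_distribution(probs, shots, rng)
        if hellinger_distance(probs, empirical) <= target:
            return shots
        shots *= 2
    return max_shots


if __name__ == "__main__":
    rng = random.Random(0)
    peaked = {"00": 0.97, "01": 0.01, "10": 0.01, "11": 0.01}
    uniform = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
    for name, dist in (("peaked", peaked), ("uniform", uniform)):
        print(name, shannon_entropy(dist), estimate_min_shots(dist, 0.05, rng))
```

A single run of the doubling search is noisy, so any comparison between low-entropy and high-entropy distributions would need to be averaged over many seeds before drawing conclusions.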
Research Trends and Prospects of Hybrid Computing-Based Variational Quantum Algorithms (VQAs)
The Magazine of the IEIE, Vol. 52 No. 9
Sep. 2025
Youngmin Kim1, Changheon Lee1, Hyungseok Kim1, and Won Woo Ro1
1Yonsei University, Republic of Korea
This paper examines the latest research trends in Variational Quantum Algorithms (VQAs), a quantum-classical hybrid algorithm paradigm that is in the spotlight in the Noisy Intermediate-Scale Quantum (NISQ) era.
A VQA combines a parameterized quantum circuit (an ansatz) with a classical optimizer that repeatedly adjusts the quantum state, so it can be applied to fields such as quantum chemistry, combinatorial optimization, and machine learning, even on hardware with a limited number of qubits and current error rates.
First, we introduce the Variational Quantum Eigensolver (VQE), which calculates the ground-state energy of molecules, and the Quantum Approximate Optimization Algorithm (QAOA), which solves optimization problems in Quadratic Unconstrained Binary Optimization (QUBO) form, and discuss issues that can appear in real hardware execution, such as increasing circuit depth, vanishing parameter gradients due to the barren plateau phenomenon, overhead from the SWAP gates required for qubit rearrangement, and measurement noise.
Next, as recent research examples that address these issues, we introduce efficient ansatz design that reduces the number of required operations, circuit rearrangement and transformation tailored to the hardware connectivity and error characteristics, methods that speed up parameter learning by removing unnecessary gates from the circuit, fast convergence through initial parameter setting and transfer learning, and distributed execution techniques that use multiple quantum processing units (QPUs) in parallel.
Through these examples, this paper confirms that integrated optimization across all layers, from algorithms and compilers to hardware, is essential for making VQAs practical.
Workshops and Tutorials
Native Gate-Aware QAOA Ansatz
QDML '25: The 1st International Workshop on Quantum Data and Machine Learning: Systems, Theory and Hardware
In conjunction with ICDE '25
19 May 2025 / Hong Kong SAR, China
Hyungseok Kim1, Enhyeok Jang1, Youngmin Kim1, and Won Woo Ro1
1Yonsei University, Republic of Korea
The quantum approximate optimization algorithm (QAOA) is introduced to solve combinatorial optimization problems efficiently.
Despite the computational benefit of the QAOA, the cost of executing QAOA programs at scale to demonstrate quantum advantage is still high for near-future quantum computing systems.
We observe that real quantum computing devices represent and execute QAOA circuits through their finite set of native gates.
In general, the cost and mixer Hamiltonian are realized with ZZ and X-direction rotations of qubits, respectively.
However, the rotation direction of the qubit for QAOA circuit training does not necessarily have to be configured only with the combination described above, nor is this combination of rotation always optimal for all quantum processors.
By adopting rotational combinations that utilize native gates more directly than the standard QAOA circuit model, the execution cost on real quantum devices can be reduced.
In this study, we propose Racoon (Rotational Space Virtualization for QAOA Ansatz), an algorithm-hardware co-design approach that revisits the synthesis conditions of QAOA circuits and selects alternative candidates with different rotational combinations.
Our analysis of six commercial quantum processors demonstrates that applying Racoon to QAOA circuits for the 4-node Sherrington-Kirkpatrick model reduces the number of native gates by an average of 23% and up to 79%.
Consequently, using Racoon results in 43% fewer training epochs, 41% lower training energy consumption, and a 6% improvement in inference on average compared to standard QAOA.
Racoon consistently reduces circuit depth as the number of qubits and layers increases, achieving 123× more circuit depth reduction compared to the recently proposed Depth First Search (DFS)-based method.
Furthermore, we confirm that the Racoon method can be extended to state-of-the-art QAOAs with modified ansätze and to variational quantum eigensolvers (VQEs).
A Dead Gate Elimination for Quantum Programs
HAIQ '25: The 1st HPC/AI Integration with Quantum Computing Workshop
In conjunction with HPCA '25
01 Mar. 2025 / Las Vegas, NV, USA
Enhyeok Jang1, Youngmin Kim1, Hyungseok Kim1, Dongho Ha2, Yongju Lee1, Jaewon Kwon1, Jun Woo You1, Jiho Park1, and Won Woo Ro1
1Yonsei University, Republic of Korea, 2MangoBoost, Washington, USA
The computational complexity of quantum programs is influenced by the limitations of the native gate set and the constraints imposed by qubit topology.
These factors necessitate advanced compilation techniques for efficient execution.
Our experimental data reveal that approximately 23.1% of gates in quantum programs are dead gates, which do not contribute to any meaningful alteration in the quantum state.
Removing these dead gates would provide the potential opportunity to reduce the size and improve the accuracy of the quantum program.
However, we observe that existing methods, including those integrated into Qiskit Transpiler, cannot adequately remove these unnecessary gates.
In this work, we introduce Dementor (Dead Quantum Gate Eliminator), which efficiently detects and removes dead gates by considering a range of redundancy patterns.
To evaluate the efficacy of Dementor, we conducted experiments on IBM quantum processors, which have two distinct native gate sets: Echoed Cross-Resonance (ECR)-based and Controlled-X (CX)-based.
Our experiments show that Dementor achieves a reduction in the number of decomposed gates by an average of 46.4% on ECR-based systems and by an average of 60.6% on CX-based systems compared to Qiskit Transpiler with optimization level 3.
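One simple redundancy pattern of the kind described above is a pair of identical self-inverse gates with nothing between them on the same qubits. The sketch below removes only that pattern; it is an illustration, not Dementor's algorithm, and the gate representation (`(name, qubits)` tuples) and function names are assumptions.

```python
# Gates that are their own inverse, so two back-to-back copies act as identity.
SELF_INVERSE = {"x", "y", "z", "h", "cx", "cz", "swap"}


def next_on_qubit(gates, start, qubit):
    """Index of the first gate at or after `start` that touches `qubit`, else None."""
    for j in range(start, len(gates)):
        if qubit in gates[j][1]:
            return j
    return None


def cancel_adjacent_pairs(gates):
    """Repeatedly delete pairs of identical self-inverse gates that are
    adjacent on every qubit they act on. `gates` is a list of
    (name, qubits) tuples, e.g. ("cx", (0, 1))."""
    gates = list(gates)
    changed = True
    while changed:
        changed = False
        for i, (name, qubits) in enumerate(gates):
            if name not in SELF_INVERSE:
                continue
            # The same later gate must be the immediate successor on all qubits.
            successors = {next_on_qubit(gates, i + 1, q) for q in qubits}
            if len(successors) != 1:
                continue
            (j,) = successors
            if j is not None and gates[j] == (name, qubits):
                del gates[j]  # delete the later gate first so index i stays valid
                del gates[i]
                changed = True
                break  # restart the scan on the shortened list
    return gates


if __name__ == "__main__":
    circuit = [("h", (0,)), ("cx", (0, 1)), ("cx", (0, 1)), ("h", (0,)), ("x", (1,))]
    print(cancel_adjacent_pairs(circuit))
```

The comparison is deliberately conservative: symmetric gates written with swapped qubit order, such as `("cz", (0, 1))` and `("cz", (1, 0))`, are not matched, and parameterized rotations are ignored entirely.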
Oral and Poster Presentations
TENET: A Pincer Movement with Backward Programs for Bypassing Relaxed Qubit Readouts
ICEIC '26: The 25th International Conference on Electronics, Information, and Communication
Jan. 2026 / Macau SAR, China
Changheon Lee1, Enhyeok Jang1, Youngmin Kim1, Hyungseok Kim1, Sungho Pyun1, and Won Woo Ro1
1Yonsei University, Republic of Korea
Currently available quantum processors are highly susceptible to noise, among which decoherence from thermal relaxation is a major source of error.
This relaxation is asymmetric: excited qubit states (|1⟩) decay into ground states (|0⟩) more frequently than the reverse, which introduces biased readout errors and undermines the fidelity of quantum computations, particularly when correct outputs contain many |1⟩ states.
To address this challenge, we propose TENET (TEmporal piNcEr operaTion), a bidirectional execution framework that leverages temporally inverted programs to mitigate relaxation-induced errors.
TENET initializes qubits with a hypothesized solution and executes the Hermitian-conjugate circuit to verify whether the output collapses to the all-zero state, bypassing the need for |1⟩ readouts.
This temporal quantum pincer approach enhances program verification under asymmetric noise by enabling cooperative forward–backward execution without requiring hardware modification.
Evaluations on IBM's 127-qubit Eagle r3 processors show that TENET improves the verifiable qubit limit by 1.8× on average (up to 2.3×) and boosts fidelity by 3.3× on average (up to 37×) compared to standard execution.
An extended variant that cross-verifies intermediate states using half-depth forward and backward circuits further increases the verifiable qubit scale by 2.5× and fidelity by up to 245×.
By combining temporal inversion with cooperative verification, TENET establishes a scalable and hardware-agnostic methodology for reliable quantum program validation on noisy intermediate-scale processors.
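The backward check described above (prepare the hypothesized solution, run the Hermitian-conjugate circuit, and test for the all-zero state) can be illustrated on a noiseless toy simulator. This is a sketch under stated assumptions rather than the TENET implementation: the simulator, the gate tuples, and the two-qubit Grover example are all chosen here for illustration, and only self-inverse gates are used so that reversing the gate order yields the inverse circuit.

```python
import math


def apply_gate(state, gate):
    """Apply an 'x', 'h', or 'cx' gate to a state vector (list of complex).
    Qubit q corresponds to bit q of the basis-state index."""
    name, *qubits = gate
    new = [0j] * len(state)
    if name == "x":
        (q,) = qubits
        for i, amp in enumerate(state):
            new[i ^ (1 << q)] += amp
    elif name == "h":
        (q,) = qubits
        s = 1 / math.sqrt(2)
        for i, amp in enumerate(state):
            sign = -1 if (i >> q) & 1 else 1
            new[i & ~(1 << q)] += s * amp       # component with qubit q = 0
            new[i | (1 << q)] += s * amp * sign  # component with qubit q = 1
    elif name == "cx":
        control, target = qubits
        for i, amp in enumerate(state):
            j = i ^ (1 << target) if (i >> control) & 1 else i
            new[j] += amp
    else:
        raise ValueError(f"unsupported gate: {name}")
    return new


def run(circuit, index, n_qubits):
    """Run `circuit` starting from the computational basis state `index`."""
    state = [0j] * (1 << n_qubits)
    state[index] = 1 + 0j
    for gate in circuit:
        state = apply_gate(state, gate)
    return state


def backward_check(circuit, hypothesis, n_qubits):
    """Probability of reading all zeros after preparing `hypothesis` and
    running the inverse circuit. Reversal suffices because every gate
    used here is self-inverse."""
    inverse = list(reversed(circuit))
    return abs(run(inverse, hypothesis, n_qubits)[0]) ** 2


def cz(a, b):
    """Controlled-Z built from H and CX."""
    return [("h", b), ("cx", a, b), ("h", b)]


# Two-qubit Grover search marking |11>: the ideal output is all ones.
h_all = [("h", 0), ("h", 1)]
x_all = [("x", 0), ("x", 1)]
forward = h_all + cz(0, 1) + h_all + x_all + cz(0, 1) + x_all + h_all

if __name__ == "__main__":
    print(backward_check(forward, 0b11, 2))  # correct hypothesis
    print(backward_check(forward, 0b01, 2))  # wrong hypothesis
```

In this noiseless setting the correct hypothesis should return to the all-zero state with probability close to 1 and the wrong one with probability close to 0, so the verdict is read from |0⟩ outcomes only.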
Summer Annual Conference of the Institute of Electronics and Information Engineers (IEIE), 2025
27 Jun. 2025 / Jeju, Korea
Changheon Lee1, Youngmin Kim1, Hyungseok Kim1, and Won Woo Ro1
1Yonsei University, Republic of Korea
Quantum optimal control (QOC) is essential for extracting the most algorithmic depth from today’s NISQ processors, yet its practical impact is limited by two factors: (i) the analytic basis used to parameterize each control pulse and (ii) the compilation latency required to generate high-fidelity waveforms.
We find that ① Fourier envelopes outperform Gaussian shapes by 1.24× on average in terms of fidelity.
② The Gaussian basis reduces gate length by 5% relative to the Fourier basis.
③ The Gaussian envelope shaves 11% from compilation latency compared to the Fourier basis.
④ Sinc functions, although attractive in theory for their perfect rectangular spectra, underperform on fidelity.
Preprints
Mantra: Rewriting Quantum Programs to Minimize Trap-Movements for Zoned Rydberg Atom Arrays
arXiv preprint
04 Mar. 2025
Enhyeok Jang1, Youngmin Kim1, Hyungseok Kim1, Seungwoo Choi1, Yipeng Huang2, and Won Woo Ro1
1Yonsei University, Republic of Korea, 2Rutgers University, New Jersey, USA
A zoned neutral atom architecture achieves exceptional fidelity by segregating the execution spaces of 1- and 2-qubit gates, making it a promising candidate for high-accuracy quantum systems.
Unfortunately, naively applying programs designed for static qubit topologies to zoned architectures may result in most execution time being consumed by inter-zone travels of atoms.
To address this, we introduce Mantra (Minimizing trAp movemeNts for aTom aRray Architectures), which rewrites quantum programs to reduce the interleaving of single- and two-qubit gates.
Mantra incorporates three strategies: (i) a fountain-shaped controlled-Z (CZ) chain, (ii) a ZZ-interaction protocol without a 1-qubit gate, and (iii) preemptive gate scheduling.
Mantra reduces inter-zone movements by 68% and physical gate counts by 35%, and improves circuit fidelities by 17%, compared to the standard executions.
Distribution-Adaptive Dynamic Shot Optimization for Variational Quantum Algorithms
arXiv preprint
23 Dec. 2024
Youngmin Kim1, Enhyeok Jang1, Hyungseok Kim1, Seungwoo Choi1, Changhun Lee1, Donghwi Kim2, Woomin Kyoung2, Kyujin Shin2, and Won Woo Ro1
1Yonsei University, Republic of Korea, 2Hyundai Motor Company, Republic of Korea
Variational quantum algorithms (VQAs) have attracted remarkable interest over the past few years because of their potential computational advantages on near-term quantum devices.
They leverage a hybrid approach that integrates classical and quantum computing resources to solve high-dimensional problems that are challenging for classical approaches alone.
In the training process of variational circuits, constructing an accurate probability distribution for each epoch is not always necessary, creating opportunities to reduce computational costs through shot reduction.
However, existing shot-allocation methods that capitalize on this potential often lack adaptive feedback or are tied to specific classical optimizers, which limits their applicability to common VQAs and broader optimization techniques.
Our observations indicate that the information entropy of a quantum circuit's output distribution exhibits an approximately exponential relationship with the number of shots needed to achieve a target Hellinger distance.
In this work, we propose a distribution-adaptive dynamic shot (DDS) framework that efficiently adjusts the number of shots per iteration in VQAs using the entropy distribution from the prior training epoch.
Our results demonstrate that the DDS framework sustains inference accuracy while achieving a ~50% reduction in average shot count compared to fixed-shot training, and ~60% higher accuracy than recently proposed tiered shot allocation methods.
Furthermore, in noisy simulations that reflect the error rates of actual IBM quantum systems, DDS achieves a ~30% reduction in the total number of shots compared to the fixed-shot method with minimal degradation in accuracy, and offers ~70% higher computational accuracy than tiered shot allocation methods.