CIExplorer: Microarchitecture-Aware Exploration for Tightly Integrated Custom Instruction
Proceedings of the 39th ACM International Conference on Supercomputing, 2025
@inproceedings{hao2025ciexplorer,
author = {Hao, Xiaoyu and Zhang, Sen and Qiao, Liang and Jiang, Qingcai and Shi, Jun and Chen, Junshi and An, Hong and Tang, Xulong and Shu, Hao and Yuan, Honghui},
title = {CIExplorer: Microarchitecture-Aware Exploration for Tightly Integrated Custom Instruction},
year = {2025},
isbn = {9798400715372},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3721145.3730421},
doi = {10.1145/3721145.3730421},
booktitle = {Proceedings of the 39th ACM International Conference on Supercomputing},
pages = {975--990},
numpages = {16},
location = {Salt Lake City, U.S.A.},
series = {ICS '25}
}
Abstract
Extending existing architectures with custom instruction extensions is an emerging way to achieve high performance and energy efficiency for specific applications. Automated discovery of custom instructions (CIs) is well studied; it requires exploring combinations of different types and quantities of operations, which results in a vast search space. However, previous works typically use microarchitecture-agnostic cost models, leading to suboptimal CIs that may degrade performance. They leverage graph isomorphism to reduce area overhead, but few consider its potential to benefit performance-oriented exploration. To this end, we present CIExplorer, a framework for adaptive CI exploration. We propose a Seed Growth Method (SGM), based on a genetic algorithm, that discovers CIs while taking graph similarity into account. We also propose a compiler-assisted modeling strategy that applies a microarchitecture-aware cost model to estimate the potential benefits of CIs during exploration. We evaluate our framework using various benchmarks from SPEC2006 and Mediabench on in-order, 2-wide out-of-order (OOO), and 4-wide OOO processors. Experimental results demonstrate that CIExplorer achieves average performance improvements of 1.09× and 1.13× and energy improvements of 1.07× and 1.10× compared with Novia and MaxClique, respectively.