Efficient Implementation of RISC-V Vector Permutation Instructions

Abstract

RISC-V CPUs leverage the RVV (RISC-V Vector) extension to accelerate data-parallel workloads. In addition to arithmetic operations, RVV includes powerful permutation instructions that enable flexible element rearrangement within vector registers, a capability that is critical for optimizing performance in tasks such as matrix operations and cryptographic computations. However, the diverse control mechanisms of these instructions complicate their execution within a unified datapath while maintaining the fixed-latency requirement of cryptographic accelerators. To address this, we propose a unified microarchitecture capable of executing all RVV permutation instructions efficiently, regardless of their control information structure. This approach minimizes area and hardware costs while ensuring single-cycle execution for short vector machines (up to 256 bits) and enabling efficient pipelining for longer vectors. The proposed design is integrated into an open-source RISC-V vector processor and implemented at 7 nm using the OpenROAD physical synthesis flow. Experimental results validate the efficiency of our unified vector permutation unit, demonstrating that it incurs only a 1.5% area overhead relative to the total vector processor. Furthermore, this area overhead decreases to nearly 0% as the minimum supported element width for vector permutations increases.
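To make the phrase "diverse control mechanisms" concrete, the following is a minimal behavioral sketch of four RVV permutation instructions. It is not the microarchitecture proposed in the paper; it only models the architectural effect of each instruction under simplifying assumptions: registers are plain Python lists, the operations are unmasked, `vl` equals `VLMAX`, and destination elements that are not written keep their old value. The function names are illustrative stand-ins for `vrgather.vv`, `vslidedown.vx`, `vslideup.vx`, and `vcompress.vm`.

```python
# Behavioral sketch only (assumptions: unmasked, vl == VLMAX, lists as registers).

def vrgather(src, idx):
    """Control is a full index vector: out[i] = src[idx[i]], or 0 if out of range."""
    n = len(src)
    return [src[i] if 0 <= i < n else 0 for i in idx]

def vslidedown(src, offset):
    """Control is one scalar offset: out[i] = src[i + offset], or 0 past the end."""
    n = len(src)
    return [src[i + offset] if i + offset < n else 0 for i in range(n)]

def vslideup(dst, src, offset):
    """Control is one scalar offset: out[i + offset] = src[i]; lower elements keep dst."""
    n = len(src)
    return [dst[i] if i < offset else src[i - offset] for i in range(n)]

def vcompress(dst, src, mask):
    """Control is a bit mask: selected elements are packed to the front of dst."""
    packed = [x for x, m in zip(src, mask) if m]
    # Remaining elements are modeled as undisturbed (an assumption of this sketch).
    return packed + dst[len(packed):]

if __name__ == "__main__":
    v = [10, 11, 12, 13]
    print(vrgather(v, [3, 0, 0, 7]))
    print(vslidedown(v, 1))
    print(vslideup([0, 0, 0, 0], v, 2))
    print(vcompress([0, 0, 0, 0], v, [1, 0, 1, 0]))
```

The three kinds of control input (an index vector, a scalar offset, and a mask) all describe a mapping from source positions to destination positions, which is why a single shared datapath for them is plausible in principle.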

Authors: Vasileios Titopoulos, George Alexakis, Chrysostomos Nicopoulos, Giorgos Dimitrakopoulos

Subject classification: Microelectronics and integrated circuits; semiconductor technology

Recommended citation: Vasileios Titopoulos, George Alexakis, Chrysostomos Nicopoulos, Giorgos Dimitrakopoulos. Efficient Implementation of RISC-V Vector Permutation Instructions [EB/OL]. (2025-05-11) [2025-06-22]. https://arxiv.org/abs/2505.07112.
