|国家预印本平台
首页|Opto-ViT: Architecting a Near-Sensor Region of Interest-Aware Vision Transformer Accelerator with Silicon Photonics

Opto-ViT: Architecting a Near-Sensor Region of Interest-Aware Vision Transformer Accelerator with Silicon Photonics

Opto-ViT: Architecting a Near-Sensor Region of Interest-Aware Vision Transformer Accelerator with Silicon Photonics

来源:Arxiv_logoArxiv
英文摘要

Vision Transformers (ViTs) have emerged as a powerful architecture for computer vision tasks due to their ability to model long-range dependencies and global contextual relationships. However, their substantial compute and memory demands hinder efficient deployment in scenarios with strict energy and bandwidth limitations. In this work, we propose OptoViT, the first near-sensor, region-aware ViT accelerator leveraging silicon photonics (SiPh) for real-time and energy-efficient vision processing. Opto-ViT features a hybrid electronic-photonic architecture, where the optical core handles compute-intensive matrix multiplications using Vertical-Cavity Surface-Emitting Lasers (VCSELs) and Microring Resonators (MRs), while nonlinear functions and normalization are executed electronically. To reduce redundant computation and patch processing, we introduce a lightweight Mask Generation Network (MGNet) that identifies regions of interest in the current frame and prunes irrelevant patches before ViT encoding. We further co-optimize the ViT backbone using quantization-aware training and matrix decomposition tailored for photonic constraints. Experiments across device fabrication, circuit and architecture co-design, to classification, detection, and video tasks demonstrate that OptoViT achieves 100.4 KFPS/W with up to 84% energy savings with less than 1.6% accuracy loss, while enabling scalable and efficient ViT deployment at the edge.

Mehrdad Morsali、Chengwei Zhou、Deniz Najafi、Sreetama Sarkar、Pietro Mercati、Navid Khoshavi、Peter Beerel、Mahdi Nikdast、Gourav Datta、Shaahin Angizi

光电子技术电子技术应用

Mehrdad Morsali,Chengwei Zhou,Deniz Najafi,Sreetama Sarkar,Pietro Mercati,Navid Khoshavi,Peter Beerel,Mahdi Nikdast,Gourav Datta,Shaahin Angizi.Opto-ViT: Architecting a Near-Sensor Region of Interest-Aware Vision Transformer Accelerator with Silicon Photonics[EB/OL].(2025-07-09)[2025-07-17].https://arxiv.org/abs/2507.07044.点此复制

评论