Embedded Systems
A practical HW-aware NAS flow for AI vision applications on embedded heterogeneous SoCs
Published in: International Workshop on Design and Architecture for Signal and Image Processing 2025
Implementing efficient Deep Neural Networks (DNNs) for dense-prediction vision applications on embedded heterogeneous SoCs comes with many challenges, such as latency and energy constraints. To tackle them, we propose a novel and practical multi-objective Hardware-aware Neural Architecture Search (HW-NAS) framework that is, for the first time, able to handle complex search spaces while taking the hardware manufacturer’s expertise into account. This HW-NAS flow targets Nvidia’s Orin SoCs and relies on (1) a practical strategy to reduce the total exploration duration, and (2) a compact enhancement of the existing TensorRT deployment flow. On the FasterSeg search space, our framework obtains a latency-power-mIoU Pareto front for multiple power modes in only 66 hours (33 % less than the initial flows) using 8 Nvidia A100 GPUs. Compared to default mappings, these results demonstrate that our novel mapping strategy can obtain practical solutions with either 50 % less power consumption or 80 % less latency at the same accuracy, or achieve better accuracy (+6 %) with 30 % less power consumption.
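To make the notion of a latency-power-mIoU Pareto front concrete, the following is a minimal illustrative sketch and not the framework described above. It assumes each candidate architecture has already been measured, and it keeps only the non-dominated candidates, where lower latency and power are better and higher mIoU is better. The `Candidate` type, the architecture names, and all numeric values are hypothetical placeholders, not results from the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """One evaluated architecture (hypothetical fields and units)."""
    name: str
    latency_ms: float  # lower is better
    power_w: float     # lower is better
    miou: float        # higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` is no worse than `b` on every objective
    and strictly better on at least one."""
    no_worse = (
        a.latency_ms <= b.latency_ms
        and a.power_w <= b.power_w
        and a.miou >= b.miou
    )
    strictly_better = (
        a.latency_ms < b.latency_ms
        or a.power_w < b.power_w
        or a.miou > b.miou
    )
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (O(n^2) scan)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up measurements, for illustration only.
candidates = [
    Candidate("arch_a", latency_ms=20.0, power_w=15.0, miou=0.70),
    Candidate("arch_b", latency_ms=25.0, power_w=15.0, miou=0.68),  # dominated by arch_a
    Candidate("arch_c", latency_ms=12.0, power_w=25.0, miou=0.66),
    Candidate("arch_d", latency_ms=30.0, power_w=10.0, miou=0.72),
]

front = pareto_front(candidates)
for c in front:
    print(c.name, c.latency_ms, c.power_w, c.miou)
```

In a setting with multiple power modes, one would presumably compute such a front separately for each mode, since latency and power measurements change with the mode; that per-mode split is an assumption of this sketch rather than a detail stated in the abstract.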