Robotics

Real-Time Vehicle Trajectory Prediction on an Embedded GPU Architecture

Published in: Real-time Processing of Image, Depth and Video Information 2026

Authors: Oumaima Skalli, Sergio Alberto Rodriguez Florez, Abdelhafid El Ouardi, Stefano Masi

Artificial intelligence (AI), and specifically Deep Learning (DL) models, are known for their high prediction accuracy and ability to generalize to new data. However, their computational complexity and processing time remain significant challenges, particularly for Advanced Driver Assistance Systems (ADAS), which must operate under strict real-time constraints. Consequently, every AI-based function that provides features or inputs to an ADAS must comply with real-time requirements, in particular vehicle trajectory prediction, which forecasts the future positions of moving agents in a scene. This work addresses the challenges of deploying DL-based vehicle trajectory prediction models on the resource-constrained embedded architectures typical of ADAS, which are limited in processing power and memory. A state-of-the-art trajectory prediction model is studied and evaluated on an NVIDIA Jetson Orin Nano board to assess whether it can preserve accurate and reliable trajectory predictions while meeting the real-time constraints required for ADAS applications. To this end, the model was analyzed and converted into different inference frameworks to enable deployment on the embedded architecture. Inference was run under real-time constraints representative of typical ADAS deployment environments, on both the GPU and the CPU of the board, and the resulting outputs were evaluated on the Argoverse dataset using metrics capturing model complexity, prediction precision, and total processing time. Experiments show that GPU-based inference of the optimized model achieves a latency of 3.87 ms while preserving high prediction accuracy, whereas state-of-the-art methods on high-performance, non-embedded architectures typically report latencies of around 30 ms. ARM CPU-based inference, in contrast, exhibited an 8× increase in latency, with an inference time of 34.89 ms.
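The abstract does not name the exact precision metrics used on Argoverse, but the benchmark's standard ones are displacement errors (ADE/FDE, and their multi-modal minADE/minFDE variants). As an illustrative sketch only, and not the paper's evaluation code, these metrics can be computed as:

```python
import math

def ade_fde(pred, gt):
    """Average and Final Displacement Error between one predicted
    trajectory and the ground truth, both lists of (x, y) waypoints."""
    assert len(pred) == len(gt) and pred, "trajectories must match in length"
    dists = [math.hypot(px - gx, py - gy)
             for (px, py), (gx, gy) in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]  # (ADE, FDE)

def min_ade_fde(modes, gt):
    """Multi-modal variant: keep the best of K predicted modes,
    in the style of the Argoverse minADE/minFDE metrics."""
    scores = [ade_fde(p, gt) for p in modes]
    return min(s[0] for s in scores), min(s[1] for s in scores)
```

For example, a prediction that drifts linearly off a straight ground-truth path by 1 m per step has an ADE equal to the mean drift and an FDE equal to the final drift.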
Further experiments on the Qualcomm-based Rubik Pi 3 will extend this analysis to an NPU-accelerated architecture, enabling a trade-off assessment between accuracy and latency across different embedded platforms.
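The latency figures reported above (3.87 ms on the GPU versus 34.89 ms on the ARM CPU) are per-inference timings. A minimal, hypothetical harness for collecting such measurements, not the paper's actual measurement code, might look like this; `warmup` iterations are discarded so that one-time costs (JIT compilation, cache warming) do not skew the statistics:

```python
import time
import statistics

def benchmark(fn, *args, warmup=10, iters=100):
    """Return (mean, max) per-call latency of fn in milliseconds.
    Warmup calls are run first and excluded from the samples."""
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - t0) * 1e3)  # seconds -> ms
    return statistics.mean(samples), max(samples)
```

In practice, `fn` would wrap a single forward pass of the deployed model (e.g. a GPU or CPU inference session), with any device synchronization included inside the timed call so the measured interval covers the full inference.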