
Complex dynamic scene analysis through multi-body motion segmentation: application to intelligent vehicles

Authors: Hernán Gonzalez

In the context of Advanced Driver Assistance Systems (ADAS) and autonomous vehicles, scene understanding is a fundamental inference process on which several servoing and decision-making functions depend. Such a process is intended to retrieve reliable information about the vehicle's surroundings, including static and dynamic objects (e.g. obstacles, pedestrians, vehicles), the scene structure (e.g. road, navigable space, lane markings) and ego-localization (e.g. odometry). All of this information is essential for crucial decisions in autonomous navigation and assistance maneuvers. To this end, single or multiple perception systems are designed to provide redundant and reliable observations of the scene. This thesis focuses on image-based multi-body motion segmentation of dynamic scenes using monocular vision systems. The research starts by surveying state-of-the-art methods and contrasting their advantages and drawbacks in terms of performance indicators and computation time. After identifying a recent vision-based methodology, the pre-processing steps required for sparse optical flow are studied. As a proof of concept, an algorithm implementation exposes the practical limits of this approach, motivating the formalization of our contributions. Detecting and tracking objects in a classic processing chain may lead to low-performance, time-consuming solutions. Instead of segmenting moving objects and tracking them independently, a Track-before-Detect framework for multi-body motion segmentation (TbD-SfM) was proposed. This method tightly couples detection and tracking in a strategy intended to reduce the complexity of an existing Multi-body Structure from Motion approach. Efforts were also devoted to reducing the computational cost without introducing any kinematic model constraints and to preserving feature density on the observed motions.
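The sparse optical flow that such a pipeline consumes is not detailed in this abstract. As a rough, self-contained illustration of the idea (not the thesis's actual pre-processing), the classic Lucas-Kanade least-squares formulation estimates the displacement of a single feature point between two grayscale frames:

```python
import numpy as np

def lucas_kanade_point(prev, curr, x, y, win=7):
    """Estimate the optical flow (dx, dy) of one feature point between two
    grayscale frames using the basic Lucas-Kanade least-squares formulation.
    This is an illustrative sketch, not the thesis's pipeline."""
    h = win // 2
    # Spatial gradients of the previous frame (np.gradient returns
    # derivatives along axis 0 (rows, y) first, then axis 1 (cols, x)).
    Iy, Ix = np.gradient(prev.astype(float))
    # Temporal gradient between the two frames.
    It = curr.astype(float) - prev.astype(float)
    ys = slice(y - h, y + h + 1)
    xs = slice(x - h, x + h + 1)
    # Stack the brightness-constancy equations Ix*dx + Iy*dy = -It
    # for every pixel in the window and solve in the least-squares sense.
    A = np.stack([Ix[ys, xs].ravel(), Iy[ys, xs].ravel()], axis=1)
    b = -It[ys, xs].ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # (dx, dy)

# Synthetic check: a smooth Gaussian blob shifted right by one pixel.
yy, xx = np.mgrid[0:64, 0:64]
prev = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / 50.0)
curr = np.roll(prev, 1, axis=1)
dx, dy = lucas_kanade_point(prev, curr, 30, 32)  # dx ~ 1, dy ~ 0
```

Real pipelines use pyramidal, iterative variants of this scheme over many detected corners, but the least-squares core is the same.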
Further, an accelerated variant of TbD-SfM (ETbD-SfM) was also proposed to limit the growth in complexity with the number of observed motions. The proposed methods were extensively tested on publicly available datasets such as Hopkins155 and KITTI. The Hopkins dataset allows comparison under ideal feature-tracking conditions, since it includes reference optical flow; KITTI provides image sequences under real conditions, allowing the robustness of the method to be evaluated. Results on scenarios with multiple simultaneous moving objects observed from a moving camera are analyzed and discussed. In conclusion, the obtained results show that the TbD-SfM and ETbD-SfM methods can segment dynamic objects using a 6DoF motion model, achieving a low image segmentation error without increased computational cost while preserving the density of the feature points. Additionally, the 3D scene geometry and object trajectories are recovered by estimating the scale of the monocular system, and the results are compared to reference object trajectories.
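The image segmentation error reported on Hopkins155-style benchmarks is conventionally the fraction of misclassified feature points, minimized over all permutations of the predicted cluster labels (since cluster identities are arbitrary). A minimal sketch of that metric, assuming the predicted and ground-truth labelings use the same number of motions:

```python
import itertools

def segmentation_error(pred, gt):
    """Misclassification rate between predicted and ground-truth motion
    labels, minimized over permutations of the predicted labels — the
    usual metric on Hopkins155-style motion segmentation benchmarks."""
    pred_labels = sorted(set(pred))
    gt_labels = sorted(set(gt))
    best = len(pred)
    # Try every way of matching predicted clusters to ground-truth motions
    # and keep the assignment with the fewest misclassified points.
    for perm in itertools.permutations(gt_labels):
        mapping = dict(zip(pred_labels, perm))
        errors = sum(mapping[p] != g for p, g in zip(pred, gt))
        best = min(best, errors)
    return best / len(pred)

# Labels 0/1 are swapped but the grouping is identical: error is 0.
err_perfect = segmentation_error([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
# One grouping differs by two points out of six.
err_partial = segmentation_error([0, 0, 0, 1, 1, 1], [0, 1, 1, 1, 1, 1])
```

Brute-force permutation matching is fine for the handful of motions per Hopkins sequence; larger label sets would call for a Hungarian-style assignment instead.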