Supervised deep learning in the presence of noisy annotations: an industrial application

Published on 8 July 2025

Author: Ichraq Lemghari

In deep learning, noisy and inaccurate labels remain a major challenge, particularly in real-world industrial datasets. This thesis investigates the problem of training reliable classifiers in such imperfect settings and proposes four complementary contributions aimed at improving the robustness of classification models trained under label noise.

After a theoretical and methodological overview of deep learning, label noise, and imprecise classification, we introduce several datasets collected and structured for this work, including complex industrial use cases. Our first contribution is a custom noise generator designed to simulate realistic, structured label noise. Unlike existing tools, which often rely on random perturbations and instance-independent noise, our generator targets the samples that are most likely to be confused in practice because of complex patterns and similarities between classes. The second contribution focuses on identifying noisy samples using set-valued classifiers. We show how set-valued predictions can serve as indicators of label ambiguity and propose a framework that uses them to detect noisy labels. In our third contribution, we address the correction of these mislabeled instances through two novel approaches, one based on soft labelling and the other on Venn-Abers predictors, which provide well-calibrated probability estimates and more effective relabeling of uncertain samples. Finally, we propose a robust loss function designed to mitigate the impact of noise and label uncertainty during training, offering a practical solution for real-world scenarios.

Together, these contributions yield encouraging results and form a coherent strategy for handling noisy labels, providing both practical tools and theoretical foundations for building more reliable and robust deep learning models.
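
To make the idea behind the second contribution more concrete, here is a minimal sketch of how set-valued predictions could be used to flag questionable labels. It is an illustration under stated assumptions, not the framework developed in the thesis: the prediction set is built with a simple cumulative-probability rule, and the names `prediction_set`, `flag_labels`, the `mass` threshold, and the three flag values are all hypothetical.

```python
from typing import List, Sequence, Set


def prediction_set(probs: Sequence[float], mass: float = 0.9) -> Set[int]:
    """Smallest set of classes whose cumulative probability reaches `mass`.

    Assumption: a simple top-mass rule stands in for a set-valued classifier.
    """
    # Visit classes from most to least probable.
    order = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    chosen: Set[int] = set()
    total = 0.0
    for k in order:
        chosen.add(k)
        total += probs[k]
        if total >= mass:
            break
    return chosen


def flag_labels(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    mass: float = 0.9,
) -> List[str]:
    """Compare each given label with the prediction set of its sample."""
    flags = []
    for p, y in zip(probs, labels):
        s = prediction_set(p, mass)
        if y not in s:
            flags.append("suspect")    # label falls outside the predicted set
        elif len(s) > 1:
            flags.append("ambiguous")  # label plausible, but so are other classes
        else:
            flags.append("clean")      # confident prediction agrees with the label
    return flags


# Toy probabilities for three samples over three classes (made-up values).
probs = [
    [0.95, 0.03, 0.02],
    [0.50, 0.45, 0.05],
    [0.02, 0.96, 0.02],
]
labels = [0, 1, 0]
flags = flag_labels(probs, labels)
print(flags)
```

In this sketch, a label that lies outside the prediction set is treated as a candidate noisy label, while a prediction set containing several classes signals ambiguity between similar classes. The threshold of 0.9 is arbitrary and would need to be tuned, or replaced by a calibrated set-valued predictor, in any real use.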