Optimizing synthetic and real training data distributions for deep learning in image recognition

dc.affiliation.institute: Institut für Medizinische Informatik
dc.contributor.author: Niemeijer, Joshua
dc.contributor.referee: Handels, Heinz
dc.date.accepted: 2025-11-19
dc.date.accessioned: 2026-03-18T15:40:22Z
dc.date.available: 2026-03-18T15:40:22Z
dc.date.issued: 2026
dc.description.abstract: Recent advances in deep learning have enabled a wide variety of applications. Examples include environment perception for robots, including self-driving cars, and medical image analysis, which helps identify medical conditions or plan treatment. To build deep learning systems that generalize well, large quantities of relevant human-labeled data must be available for training. This requirement introduces several challenges. Annotation is costly because of the large amounts of data that need to be labeled and the complexity of the annotation process. In addition, relevant data must be recorded before it can be labeled, which can itself be challenging depending on the field of application: relevant data is seldom available, so large quantities of data must be captured to find rare but critical cases. This thesis investigates a more efficient use of manual annotation through intelligent selection of data for labeling using active learning (AL). In this context, it also utilizes semi-supervised learning (SSL), which aims to replace manual annotation. The thesis further investigates the use of synthetic data to replace data acquisition itself and presents strategies to guide the generation process toward creating rare but critical data. Finally, it shows how to use these insights to create models that generalize well to unseen distributions with minimal human intervention. For each of these methodologies, the thesis contributes novel approaches and analyses. It shows that the choice of active learning approach depends strongly on the annotation budget and on the type of distribution from which samples are selected. Next, it shows how AL and SSL can be integrated effectively, an insight that informs best practices for applying AL and SSL.
For the use of SSL in adapting networks to novel data domains, this work provides an extensive review of this dynamic field and derives novel low-complexity methods from it. These methods prove useful for environment perception in autonomous vehicles and in the medical domain, as well as for adapting from synthetic to real data. The work also provides novel methods for the targeted creation of synthetic data. Building on the creation of synthetic data and the research on SSL, the thesis presents an approach for generalizing to unseen domains. Overall, this thesis provides solutions for minimizing the cost and human effort involved in annotating and acquiring relevant data. These solutions enable efficient adaptation and generalization to new domains and distributions.
dc.identifier.uri: https://epub.uni-luebeck.de/handle/zhb_hl/3617
dc.identifier.urn: urn:nbn:de:gbv:841-2026031802
dc.language.iso: en
dc.subject: Deep Learning
dc.subject: Synthetic Data
dc.subject: Active Learning
dc.subject: Semi-Supervised Learning
dc.subject: Domain Adaptation
dc.subject: Domain Generalization
dc.subject: Autonomous Driving
dc.subject: Medical Image Analysis
dc.subject.ddc: 004
dc.title: Optimizing synthetic and real training data distributions for deep learning in image recognition
dc.type: thesis.doctoral

Files

Original bundle

Name: Dissertation_Joshua_Niemeijer.pdf
Size: 47.84 MB
Format: Adobe Portable Document Format
