| Title | Predicting Abstract Bird’s-Eye-View Representations From Monocular Camera Images Using Deep Learning |
|---|---|
| Author(s) | de Marez Oyens, PMW |
| Supervisor(s) | Bruni, E |
| Year | 2022 |
| Faculty | Faculteit der Natuurwetenschappen, Wiskunde en Informatica (FNWI) |
| Programme | FNWI MSc Artificial Intelligence |
| Abstract | Fully autonomous vehicles rely on a powerful representation of their surroundings to plan and make important decisions. The semantic Bird’s-Eye-View (BEV) representation is the most powerful and widely adopted representation for these tasks, as it fuses scene and object information into an efficient, easy-to-comprehend format. Under the premise that self-driving vision should rely only on monocular onboard cameras, generating a BEV from monocular camera images becomes the ultimate goal. Creating a BEV from image data is a complex task, and many methods attempt to solve this problem. In this work, we disambiguate the existing body of research, clarifying and reproducing existing methods. We extend these methods by implementing a state-of-the-art transformer model and show that the transformer architecture is inherently well suited to predicting BEVs from monocular camera images. Additionally, we present a novel, data-centric approach: a transformer model named Cross-Eyed-Bird (CEB) that uses cross-attention in a novel way to predict a semantic BEV. We show that our approach achieves high predictive performance for important semantic classes in the BEV space and improves on other methods. |
| Language | English |
| Degree | master |
| License | No license applied. Reading and citing are permitted. Distribution and reuse are not permitted without the author’s permission. |
| Downloads | 684131.pdf |
| Available until | 2029-07-16 |
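
The abstract describes letting a BEV representation be predicted from monocular image features via cross-attention. As a rough illustration of that general idea only (not the thesis’s actual CEB architecture, which is not specified here), the minimal PyTorch sketch below lets a grid of learned BEV queries cross-attend to flattened image features; every name, dimension, and class count is an assumption chosen for illustration.

```python
import torch
import torch.nn as nn

class BEVCrossAttention(nn.Module):
    """Hypothetical sketch: learned BEV grid queries attend to image features."""

    def __init__(self, bev_h=50, bev_w=50, dim=256, num_heads=8, num_classes=14):
        super().__init__()
        # One learned query embedding per cell of the BEV grid (assumed sizes).
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, dim))
        # Cross-attention: BEV queries attend to the image feature tokens.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Per-cell semantic logits for the BEV segmentation output.
        self.head = nn.Linear(dim, num_classes)
        self.bev_h, self.bev_w = bev_h, bev_w

    def forward(self, image_feats):
        # image_feats: (B, N, dim), image backbone features flattened over space.
        b = image_feats.size(0)
        queries = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
        bev, _ = self.attn(queries, image_feats, image_feats)
        logits = self.head(bev)                      # (B, H*W, num_classes)
        return logits.view(b, self.bev_h, self.bev_w, -1)

# Usage: tokens from a monocular image backbone, e.g. a 7x7 feature map.
feats = torch.randn(2, 49, 256)
print(BEVCrossAttention()(feats).shape)  # torch.Size([2, 50, 50, 14])
```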