"Investigating the Influence of Scale Cues and Pose Integration on AI-B" by Oluwadamilola O. Kadiri
Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Master of Engineering Science

Program

Electrical and Computer Engineering

Collaborative Specialization

Planetary Science and Exploration

Supervisor

McIsaac, Kenneth

Abstract

Depth estimation is crucial for robotic navigation, with monocular depth estimators providing cost-effective and accessible solutions. However, their accuracy can degrade under atypical camera poses. This thesis investigates a novel approach to addressing pose biases in AI-based monocular depth estimation by incorporating, as an additional input, camera poses obtained through scale-aware feature extraction. The methodology encodes the front-facing camera pose of a rover as an additional input channel to a U-Net-based monocular depth estimator. The pose is extracted, using the SIFT and RANSAC algorithms, from images captured by a rear-facing camera typically used for regolith tracking. Metric scaling is performed using the known dimensions of objects in the rear view, followed by refinement with a particle filter. Different pose encoding techniques are analyzed, highlighting their potential to improve depth estimation accuracy while identifying key areas for further optimization.
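
As a concrete illustration of the feature-based pose extraction step, the sketch below estimates the relative pose between two consecutive rear-camera frames with OpenCV's SIFT detector and a RANSAC-based essential matrix. It is a minimal sketch, not the thesis implementation: the intrinsics matrix K and the metric_scale factor (which recovers metres from the unit-length translation, e.g. via an object of known size in the rear view) are assumptions for illustration.

import cv2
import numpy as np

def relative_pose(prev_img, curr_img, K, metric_scale=1.0):
    """Estimate rotation R and translation t between two rear-camera frames.

    K is the 3x3 camera intrinsics matrix (assumed known); metric_scale
    converts the unit-norm translation from recoverPose into metres,
    e.g. derived from an object of known size visible in the rear view.
    """
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(prev_img, None)
    kp2, des2 = sift.detectAndCompute(curr_img, None)

    # Match descriptors and keep matches that pass Lowe's ratio test.
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # RANSAC-based essential matrix estimation rejects outlier matches.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    return R, metric_scale * t  # t from recoverPose is unit length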

Summary for Lay Audience

Exploring the Moon or other planets is a monumental and complex task. To make this dream a reality, we often use robots, or “rovers,” like NASA’s Curiosity rover on Mars. These mobile robots need to “see” their surroundings accurately to plan safe paths and make decisions. One way they do this is by creating depth maps—images where each pixel represents the distance from the camera to objects in the scene. These maps are crucial for identifying obstacles and navigating the terrain.

Traditionally, expensive tools like LIDAR or stereo cameras are used to generate depth maps. However, smaller, more affordable rovers often rely on monocular cameras, which are simpler and cheaper but present challenges in estimating depth. Recent advancements in deep learning have made it possible to estimate depth without stereo vision, using AI models for monocular depth estimation (MDE). These models predict depth directly from single images, but their accuracy can decline when cameras are in atypical positions. For instance, a camera mounted on a rover traversing unstructured terrain, such as that found on the Moon or Mars, may become tilted or positioned at unusual angles.

This research explores whether adding the camera’s position and movement—its "pose"—as an additional input can help AI make better depth predictions. The project uses only two monocular cameras: one facing forward and one facing backward.

The backward-facing camera captures images during the rover’s movement. Algorithms match features between these images and use known object sizes, such as the lunar lander or tire tracks, to calculate how the rover moved. This "pose" information is refined using a technique called a particle filter to improve accuracy.
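
The following is a toy sketch of a particle filter's predict/update/resample cycle for refining a pose estimate. It assumes a simple planar state [x, y, theta], Gaussian noise, and a direct noisy position measurement (for example, one derived from a landmark of known size); the state layout, noise models, and function names are illustrative assumptions, not the exact filter used in the thesis.

import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, odom_delta, measurement,
                         motion_noise=0.05, meas_noise=0.1):
    """One predict/update/resample cycle for a planar pose state [x, y, theta].

    odom_delta:  visual-odometry estimate of motion since the last step.
    measurement: noisy observation of the [x, y] position, e.g. derived
                 from a landmark of known size in the rear view.
    """
    # Predict: apply the odometry delta plus Gaussian noise to every particle.
    particles = particles + odom_delta + rng.normal(0.0, motion_noise,
                                                    particles.shape)

    # Update: reweight particles by how well they explain the measurement.
    err = np.linalg.norm(particles[:, :2] - measurement, axis=1)
    weights = weights * np.exp(-0.5 * (err / meas_noise) ** 2) + 1e-12
    weights /= weights.sum()

    # Resample: draw a new particle set proportionally to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))

    # The refined pose estimate is the mean of the resampled particles.
    return particles, weights, particles.mean(axis=0)

# Usage sketch with hypothetical values:
particles = rng.normal(0.0, 0.5, size=(500, 3))   # [x, y, theta] per particle
weights = np.full(500, 1.0 / 500)
particles, weights, pose = particle_filter_step(
    particles, weights, odom_delta=np.array([0.1, 0.0, 0.01]),
    measurement=np.array([0.12, 0.01]))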

The refined pose data is fed into the AI depth estimation model to test whether it enhances depth map predictions. Different methods for encoding pose information were tested, comparing absolute positions (relative to the starting point) and changes in position (relative to the last step).
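
One straightforward way to present pose to an image-based network, sketched below in PyTorch, is to broadcast each pose component into a constant-valued plane and concatenate those planes with the RGB input channels, so the network's first convolution simply accepts the extra channels. The six-component pose vector and the helper name add_pose_channels are assumptions for illustration rather than the exact encoding evaluated in the thesis; the same code applies whether the vector holds an absolute pose or a per-step change.

import torch

def add_pose_channels(rgb, pose):
    """Append pose information to an RGB batch as extra input channels.

    rgb:  (B, 3, H, W) image tensor.
    pose: (B, 6) tensor, e.g. [x, y, z, roll, pitch, yaw], holding either
          an absolute pose (relative to the start) or a per-step change.
    Returns a (B, 9, H, W) tensor; a U-Net-style encoder consumes it by
    letting its first convolution accept 9 input channels.
    """
    b, _, h, w = rgb.shape
    # Broadcast each pose component into a constant H x W plane.
    pose_planes = pose.view(b, -1, 1, 1).expand(b, pose.shape[1], h, w)
    return torch.cat([rgb, pose_planes], dim=1)

# Usage sketch with hypothetical tensors:
rgb = torch.rand(2, 3, 128, 128)
delta_pose = torch.tensor([[0.10, 0.0, 0.0, 0.0, 0.0,  0.02],
                           [0.00, 0.1, 0.0, 0.0, 0.0, -0.01]])
print(add_pose_channels(rgb, delta_pose).shape)  # torch.Size([2, 9, 128, 128])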

Although the results didn’t surpass existing models, they showed promising potential. This work demonstrates how AI can be adapted to resource-limited systems and lays the groundwork for cost-effective navigation technologies for space exploration.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 License.

Inference_Results.zip (1209829 kB)
Folder containing additional model inference results
