Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Master of Science

Program

Computer Science

Supervisor

Ling, Charles

Abstract

Convolutional Neural Networks (CNNs) have advanced tremendously in recent years, but visual reasoning tasks remain a major challenge, particularly in few-shot settings. Little is known about solving the Same-Different (SD) task, a type of visual reasoning task that requires finding repeated patterns within a single image. In this thesis, we propose a patch-as-filter method for solving SD tasks with few-shot learning. First, a patch is detected in each individual image. Then, transformations are learned to create sample-specific convolutional filters. Finally, after applying these filters to the original input image, we obtain feature maps that indicate the duplicated segments. We show experimentally that our approach achieves state-of-the-art few-shot performance on the Synthetic Visual Reasoning Test (SVRT) SD tasks, improving accuracy by more than 30% on average with only ten training samples. In addition, to further evaluate the effectiveness of our approach, we generate SVRT-like tasks with more difficult visual reasoning concepts. On these tasks, our method improves average accuracy by approximately 10% over several popular few-shot algorithms. The proposed method sheds new light on CNN-based approaches to solving SD tasks with few-shot learning.

Summary for Lay Audience

With the rapid development of Convolutional Neural Networks (CNNs), computers can match or even exceed human performance in image recognition and classification. Learning highly abstract concepts, however, remains extremely challenging for computers, even with millions of training samples. In particular, Same-Different (SD) tasks, which require reasoning about the similarity between patterns located within the same image, have proven especially difficult for existing CNN approaches. A standard CNN fails to learn highly abstract visual concepts because its filters are optimized on the training data and then applied unchanged to test samples, so the ability to relate patterns within a single picture is lost. In this thesis, we aim to make machines learn highly abstract visual concepts by observing only a few labelled samples. First, a patch is detected in each individual image. Then, transformations are learned to create sample-specific convolutional filters. Finally, after applying these filters to the original input image, we obtain feature maps that indicate the duplicated segments. With our method, we can identify highly abstract relations between the shapes in each image. Because the filters are generated from a patch of the input image itself, each image receives its own filters, in both the training and test datasets. These sample-specific filters enable our model to learn more abstract visual concepts, such as fuzzy same-different, specific rotation, and specific scaling. We show experimentally that our approach achieves state-of-the-art few-shot performance on the Synthetic Visual Reasoning Test (SVRT) SD tasks, improving accuracy by more than 30% on average with only ten training samples. To further evaluate the effectiveness of our approach, we also generate SVRT-like tasks with more difficult visual reasoning concepts, on which the average accuracy increases by approximately 10% compared to several strong few-shot methods.
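The core patch-as-filter idea can be illustrated with a minimal sketch: a patch cropped from an image is used directly as a convolutional kernel over that same image, so that strong responses mark locations where the pattern repeats. This is an assumption-laden toy illustration, not the thesis implementation; in particular, the learned transformations that the thesis applies to the patch before filtering are omitted, and the function name `patch_as_filter` is hypothetical.

```python
import numpy as np

def patch_as_filter(image, patch):
    """Slide `patch` over `image` as a convolutional filter and return a
    response map; peaks indicate regions resembling the patch.
    Illustrative only: the learned patch transformations from the thesis
    are omitted here."""
    ph, pw = patch.shape
    ih, iw = image.shape
    # Zero-mean the patch so flat regions of the image do not respond.
    kernel = patch - patch.mean()
    out = np.zeros((ih - ph + 1, iw - pw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Correlation of the kernel with each image window.
            out[i, j] = np.sum(image[i:i + ph, j:j + pw] * kernel)
    return out

# Toy image containing the same 2x2 motif at two locations.
img = np.zeros((6, 6))
motif = np.array([[1.0, 0.0], [0.0, 1.0]])
img[0:2, 0:2] = motif
img[3:5, 3:5] = motif

resp = patch_as_filter(img, motif)
peaks = np.argwhere(resp == resp.max())
print(peaks.tolist())  # the two motif locations respond most strongly
```

Because the kernel is cut from the input image itself, a different image would yield a different filter, which mirrors the sample-specific filtering described above.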
