Electronic Thesis and Dissertation Repository

Thesis Format



Master of Science


Computer Science


Charles Ling


Visual learning of highly abstract concepts is often simple for humans but very challenging for machines. Same-different (SD) problems are a visual reasoning task with highly abstract concepts. Previous work has shown that SD problems are difficult to solve with standard deep learning algorithms, especially in the few-shot case, despite the ability of such algorithms to learn abstract features. In this thesis, we propose a new method to solve SD problems with few training samples, in which same-different visual concepts can be recognized by examining similarities between Regions of Interest by using a same-different twins network. Our method achieves state-of-the-art results on the Synthetic Visual Reasoning Test SD tasks and outperforms several strong baselines, achieving accuracy above 95% on several tasks and above 85% on average with only 10 training samples. On a few of these challenging SD tasks, our approach even outperforms reported human performance. We further evaluate the performance of our method outside of the synthetic tasks and achieve good performance on the MNIST, FashionMNIST and Face Recognition datasets.

Summary for Lay Audience

In recent years, computer vision has witnessed many significant breakthroughs in standard recognition tasks such as image classification, image segmentation, or object detection. Most of these gains are a result of applying deep convolutional neural networks (CNNs). However, visual learning tasks requiring attention to highly abstract concepts such as "sameness" and "difference" have proven especially difficult for standard deep CNNs, although it may be simple and obvious for humans. The ability to recognize visual tasks with highly abstract concepts is a ubiquitous human skill that has not seen significant progress for the machine.

Besides, humans can learn these highly abstract visual concepts such as ''sameness" and ''difference" with little supervision. When one person only met someone once, he can remember who they are when he meets them on the street next time. In this thesis, we primarily deal with the same-different classification through few-shot learning. In particular, we try to solve SVRT same-different visual reasoning problems by using few-shot learning and then apply our model to solve more complex same-different problems in real life.