Electronic Thesis and Dissertation Repository

Thesis Format



Master of Science




Mur, Marieke


Invariant object recognition, a cornerstone of human vision, enables the recognition of objects despite variations in rotation, position, and scale. To emulate human-like generalization across object transformations, computational models must also perform well in this respect. Deep neural networks (DNNs) are popular models of human ventral visual stream processing, though their alignment with human performance remains inconsistent. We examine object recognition across transformations in human adults and in pretrained feedforward DNNs. DNNs are grouped into model families by architecture, visual diet, and learning goal. We focus on object rotation in depth and observe that object recognition performance is better preserved in humans than in DNNs, although both show a similar pattern of performance decline as a function of rotation angle. DNNs also exhibit decreased recognition after other transformations, especially scale changes. Model architecture has minimal influence on performance, whereas DNNs trained on richer visual diets and with unsupervised learning goals perform best. Our study suggests that visual diet and learning goals may play an important role in the development of invariant object recognition in humans.

Summary for Lay Audience

Humans excel at identifying objects they see, regardless of the object's position, orientation, or size. We aim to bridge the gap between humans and machines, both to improve algorithms for visual object identification and to gain insights into how the human brain performs visual recognition. A specific kind of algorithm, known as a Deep Neural Network (DNN), aims to replicate the way human vision operates. Various DNNs, already trained on image recognition, were put to the test in a task that mimics human visual challenges. These DNNs were categorized based on their design, their training goals, and the types of images they had been trained on. Comparing DNNs to humans revealed that the design of DNNs was less important than the diversity and quality of the images they were trained on. Furthermore, the way we train DNNs (whether supervised, unsupervised, or semi-supervised) plays a crucial role in how well they can identify objects. Lastly, we found that even the top-performing DNNs fell short of human capabilities in identifying objects under different conditions.