
Thesis Format
Monograph
Degree
Master of Science
Program
Neuroscience
Supervisor
Mur, Marieke
Abstract
Object recognition across identity-preserving transformations, such as viewpoint changes, is computationally challenging and may rely on object-centric representations. Temporal continuity, a statistic of the natural visual world, has been proposed as critical for learning such representations. This study tests whether learning from temporal continuity promotes object-centric visual representations, using a neural network trained with a self-supervised predictive objective. Stimuli of moving objects with realistic spatiotemporal characteristics were generated in Isaac Gym by combining 3D object models with randomized backgrounds. A convolutional autoencoder with a recurrent encoding layer was trained to predict successive frames of these sequences. Representations from the network's encoding layer were analyzed for clustering by object identity and tested with a linear decoder to assess their object-centric properties. Results suggest that temporal continuity enables robust representation learning, with implications for understanding the mechanisms underlying human and artificial object recognition.
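The model family described above can be sketched as follows. This is a minimal illustration only, assuming PyTorch; the layer sizes, latent dimensionality, frame resolution, and training details are hypothetical stand-ins rather than the thesis's actual configuration.

# Minimal sketch (PyTorch assumed): a convolutional autoencoder whose bottleneck
# passes through a recurrent layer, trained to predict the next video frame.
# All sizes below are illustrative, not the thesis's actual values.
import torch
import torch.nn as nn

class RecurrentPredictiveAutoencoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        # Convolutional encoder: one 3x64x64 frame -> latent feature vector
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # -> 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # -> 16x16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # -> 8x8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
        # Recurrent encoding layer: integrates information across successive frames
        self.rnn = nn.GRUCell(latent_dim, latent_dim)
        # Convolutional decoder: recurrent state -> predicted next frame
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frames):
        # frames: (batch, time, 3, 64, 64); predict frame t+1 from frames up to t
        batch, T = frames.shape[:2]
        h = torch.zeros(batch, self.rnn.hidden_size, device=frames.device)
        preds = []
        for t in range(T - 1):
            h = self.rnn(self.encoder(frames[:, t]), h)
            preds.append(self.decoder(h))
        # Return next-frame predictions and the final encoding-layer state,
        # which is the representation later analyzed for object-identity clustering.
        return torch.stack(preds, dim=1), h

# One training step: pixel-wise prediction error on the next frame
model = RecurrentPredictiveAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
frames = torch.rand(8, 10, 3, 64, 64)  # stand-in for rendered object sequences
optimizer.zero_grad()
preds, _ = model(frames)
loss = nn.functional.mse_loss(preds, frames[:, 1:])
loss.backward()
optimizer.step()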
Summary for Lay Audience
Humans can recognize objects even when they appear different, like seeing a mug from the side versus from above. This ability is not as easy for computers, which often struggle when objects change how they look in an image. One prominent theory is that our brains solve this challenge by creating "object-centric" representations: mental models that group different views of the same object together. But how do we learn these representations? One idea is that the brain learns by observing how objects move smoothly and predictably over time. For example, if you see a spinning mug, you learn that its identity doesn’t change even though its appearance does. To test this idea, we trained a computer model to mimic this process. First, we used computers to create videos of everyday objects, like mugs and bowls, moving and spinning in realistic ways. We used physics simulation software to make sure these movements looked natural, with randomized lighting and backgrounds to make the task more challenging. Next, we built a computer model called a neural network, designed to process these videos. The network learned by predicting future frames of the video: for instance, it was asked to guess what a mug would look like a moment later. This forced the network to pay attention to how objects change over time. After training, we checked whether the network could group together images of the same object, even from different angles, while keeping images of different objects separate. We found that it could, and that it separated images of different objects better than a comparison model that could not observe objects over time. This result supports the theory that object-centric representations can be learned by observing how objects move over time. More generally, this research helps us understand how the brain learns to recognize objects and demonstrates the feasibility of using neural networks to model such processes.
Recommended Citation
Zhou, Justin, "Testing the Importance of Temporal Continuity for Learning Object-Centric Representations" (2025). Electronic Thesis and Dissertation Repository. 10750.
https://ir.lib.uwo.ca/etd/10750
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.