
Testing the Importance of Temporal Continuity for Learning Object-Centric Representations
Abstract
Object recognition across identity-preserving transformations, such as viewpoint changes, is computationally challenging and may rely on object-centric representations. Temporal continuity, a statistic of the natural visual world, has been proposed as critical for learning such representations. This study tests whether learning from temporal continuity promotes object-centric visual representations, using a neural network trained with a self-supervised predictive objective. Stimuli of moving objects with realistic spatiotemporal characteristics were generated in Isaac Gym by combining 3D object models with randomized backgrounds. A convolutional autoencoder with a recurrent encoding layer was trained to predict successive frames of these sequences. Representations from the network's encoding layer were analyzed for clustering by object identity and probed with a linear decoder to assess their object-centric properties. Results suggest that temporal continuity enables robust representation learning, with implications for understanding the mechanisms underlying both human and artificial object recognition.
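The architecture described above (a convolutional autoencoder with a recurrent encoding layer, trained on next-frame prediction) can be illustrated with a minimal sketch. This is not the paper's exact model; layer sizes, the choice of a GRU cell for the recurrent encoding layer, and the 64x64 input resolution are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class PredictiveAutoencoder(nn.Module):
    """Illustrative sketch: convolutional autoencoder whose bottleneck is a
    recurrent (GRU) layer, trained to predict each successive frame.
    Hyperparameters are hypothetical, not taken from the study."""

    def __init__(self, latent_dim=128):
        super().__init__()
        # Convolutional encoder: 64x64 RGB frame -> flat latent vector
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # -> 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # -> 16x16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # -> 8x8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
        # Recurrent encoding layer: carries temporal context across frames,
        # which is where temporal continuity enters the representation
        self.rnn = nn.GRUCell(latent_dim, latent_dim)
        # Decoder reconstructs the predicted next frame from the hidden state
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frames):
        # frames: (batch, time, 3, 64, 64); predict frame t+1 from frames 0..t
        b, t = frames.shape[:2]
        h = torch.zeros(b, self.rnn.hidden_size, device=frames.device)
        preds = []
        for step in range(t - 1):
            z = self.encoder(frames[:, step])
            h = self.rnn(z, h)
            preds.append(self.decoder(h))
        # Return the frame predictions and the final encoding-layer state,
        # the representation later probed with a linear decoder
        return torch.stack(preds, dim=1), h

model = PredictiveAutoencoder()
seq = torch.rand(2, 5, 3, 64, 64)                  # two five-frame clips
preds, latent = model(seq)
loss = nn.functional.mse_loss(preds, seq[:, 1:])   # next-frame prediction loss
```

Under this setup, the linear-decoding analysis mentioned in the abstract would operate on `latent` (the recurrent layer's state), testing how linearly separable object identity is in that space.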