Faculty
Social Science
Supervisor Name
Dr. Marieke Mur
Keywords
convolutional neural network, deep neural network, vision, dorsal stream, ventral stream, grasp selection, precision grip, object recognition, representation contribution analysis, representational similarity analysis
Description
Deep convolutional neural networks (DCNNs) have been used to model the ventral visual stream. However, there have been relatively few computational models of the dorsal visual stream, preventing a holistic understanding of the human visual system. Additionally, current DCNN models of the ventral stream have shortcomings (such as an over-reliance on texture information) that can be ameliorated by incorporating dorsal stream information. The current study aims to investigate two questions: 1) does incorporating action information improve computational models of the ventral visual stream? 2) how do the ventral and dorsal streams influence each other during development?
Three models will be created: a two-task neural network trained both to perform object recognition and to generate human grasp points; a single-task neural network trained to perform only object recognition; and a lesioned neural network, identical to the two-task network except that the units contributing most to grasp point generation will be deactivated. All networks will be evaluated on performance metrics such as accuracy (on ImageNet and Stylized-ImageNet), transfer learning, and robustness to image distortions. The networks will also be evaluated on representational metrics, including representation contribution analysis and representational similarity analysis.
We expect the two-task network to score higher on performance measures than either the lesioned or the single-task network. Additionally, for the two-task network, we predict that more units will contribute to grasp point generation than to object recognition. Lastly, we expect representations in the two-task network to reflect human data better than those in the single-task network.
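The representational comparison described above typically works by building a representational dissimilarity matrix (RDM) for each network and correlating the RDMs' off-diagonal entries. A minimal sketch of that computation, assuming activations are stored as a (conditions × features) NumPy array (the function names `rdm` and `compare_rdms` are illustrative, not from the study):

```python
import numpy as np
from scipy.stats import spearmanr

def rdm(activations):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between every pair of condition (row) activation patterns."""
    return 1.0 - np.corrcoef(activations)

def compare_rdms(rdm_a, rdm_b):
    """Compare two RDMs by Spearman-correlating their upper triangles
    (the diagonal is zero by construction, so it is excluded)."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return spearmanr(rdm_a[iu], rdm_b[iu]).correlation
```

In practice each network's RDM would be compared against an RDM derived from human behavioural or neuroimaging data; a higher Spearman correlation indicates representations that better reflect the human data.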
Acknowledgements
This research was supported by the Western Undergraduate Summer Research Internships (USRI) program. Special thanks to Melvyn Goodale, Jody Culham, and Jonathan Michaels for their insightful feedback.
Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.
Document Type
Poster
Incorporating action information into computational models of the human visual system