Faculty

Social Science

Supervisor Name

Dr. Marieke Mur

Keywords

convolutional neural network, deep neural network, vision, dorsal stream, ventral stream, grasp selection, precision grip, object recognition, representation contribution analysis, representational similarity analysis

Description

Deep convolutional neural networks (DCNNs) have been used to model the ventral visual stream. However, there have been relatively few computational models of the dorsal visual stream, preventing a holistic understanding of the human visual system. Additionally, current DCNN models of the ventral stream have shortcomings (such as an over-reliance on texture cues) that may be ameliorated by incorporating dorsal stream information. The current study aims to investigate two questions: (1) Does incorporating action information improve computational models of the ventral visual system? (2) How do the ventral and dorsal streams influence each other during development?

Three models will be created: a two-task neural network trained both to perform object recognition and to generate human grasp points; a single-task neural network trained to perform object recognition only; and a lesioned neural network, identical to the two-task network except that the units with the greatest representation contribution to grasp point generation will be deactivated. All networks will be evaluated on performance metrics such as classification accuracy (on ImageNet and Stylized-ImageNet), transfer learning, and robustness to image distortions. The networks will also be evaluated on representational metrics such as representation contribution analysis and representational similarity analysis.
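The lesioning procedure described above can be illustrated with a minimal numpy sketch: rank a layer's units by a per-unit contribution score and zero out the top-scoring ones. The array shapes, the random scores, and the helper name `lesion_top_units` are illustrative assumptions, not the study's actual pipeline (in which scores would come from representation contribution analysis).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy activations: 100 stimuli x 32 units in one hidden layer (shapes are illustrative).
activations = rng.normal(size=(100, 32))

# Hypothetical per-unit contribution scores toward the grasp-point task;
# in the study these would come from representation contribution analysis.
grasp_contrib = rng.random(32)

def lesion_top_units(acts, scores, k):
    """Zero out the k units with the highest contribution scores."""
    top = np.argsort(scores)[-k:]   # indices of the k largest scores
    lesioned = acts.copy()
    lesioned[:, top] = 0.0          # deactivate those units
    return lesioned, top

lesioned_acts, removed = lesion_top_units(activations, grasp_contrib, k=8)
print(removed.size)                                  # prints 8: units deactivated
print(np.allclose(lesioned_acts[:, removed], 0.0))   # prints True
```

Deactivating units this way leaves the rest of the network untouched, so any performance drop relative to the intact two-task network can be attributed to the removed grasp-related units.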

We expect the two-task network to score higher on performance measures than either the lesioned or the single-task network. Additionally, within the two-task network, we predict that more units will contribute to grasp point generation than to object recognition. Lastly, we expect representations in the two-task network to reflect human data more closely than those in the single-task network.
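The last prediction would typically be tested with representational similarity analysis: build a representational dissimilarity matrix (RDM) over stimuli for each system, then correlate the RDMs' upper triangles. The numpy sketch below uses toy data (the "human" patterns are synthesized from the model patterns purely so the example is self-contained); Pearson distance and Pearson RDM correlation are one common choice among several (Spearman is also widely used).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy activation patterns: 20 stimuli x 50 units for a model, and noisy
# "human" patterns (stand-ins for, e.g., fMRI data) built from the same source.
model_acts = rng.normal(size=(20, 50))
human_acts = model_acts + rng.normal(scale=0.5, size=(20, 50))

def rdm(patterns):
    """RDM: 1 - Pearson r between every pair of stimulus patterns (rows)."""
    return 1.0 - np.corrcoef(patterns)

def rsa_score(rdm_a, rdm_b):
    """Correlate the upper triangles of two RDMs (Pearson here)."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]

score = rsa_score(rdm(model_acts), rdm(human_acts))
# Positive by construction here, since human_acts derives from model_acts.
print(score > 0)  # prints True
```

Under this analysis, the prediction is that the two-task network's RDM correlates more strongly with the human RDM than the single-task network's RDM does.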

Acknowledgements

This research was supported by the Western Undergraduate Summer Research Internships (USRI) program. Special thanks to Melvyn Goodale, Jody Culham, and Jonathan Michaels for their insightful feedback.

Creative Commons License

Creative Commons Attribution-Share Alike 4.0 License
This work is licensed under a Creative Commons Attribution-Share Alike 4.0 License.

Document Type

Poster


Incorporating action information into computational models of the human visual system

