Orthogonal Representations of Object Shape and Category in Deep Convolutional Neural Networks and Human Visual Cortex / Zeman, A. A.; Ritchie, J. B.; Bracci, S.; Op de Beeck, H. - In: SCIENTIFIC REPORTS. - ISSN 2045-2322. - Electronic. - 10:1(2020), pp. 245301-245312. [10.1038/s41598-020-59175-0]
Orthogonal Representations of Object Shape and Category in Deep Convolutional Neural Networks and Human Visual Cortex
Bracci S.
2020-01-01
Abstract
Deep Convolutional Neural Networks (CNNs) are gaining traction as the benchmark model of visual object recognition, with performance now surpassing that of humans. While CNNs can accurately assign one image to potentially thousands of categories, network performance could be the result of layers that are tuned to represent the visual shape of objects, rather than object category, since both are often confounded in natural images. Using two stimulus sets that explicitly dissociate shape from category, we correlate these two types of information with each layer of multiple CNNs. We also compare CNN output with fMRI activation along the human visual ventral stream by correlating artificial with neural representations. We find that CNNs encode category information independently from shape, peaking at the final fully connected layer in all tested CNN architectures. When CNNs are compared with fMRI brain data, both early visual cortex (V1) and early CNN layers encode shape information. Anterior ventral temporal cortex encodes category information, which correlates best with the final layer of CNNs. The interaction between shape and category found along the human visual ventral pathway is echoed in multiple deep networks. Our results suggest that CNNs represent category information independently from shape, much like the human visual system.
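The abstract does not spell out how shape and category information are "correlated" with each layer. The sketch below is a minimal illustration that assumes a representational-similarity style analysis. It builds a dissimilarity matrix for a layer (1 minus the Pearson r between stimulus activation patterns) and binary model matrices for shape and category. It then compares their upper triangles with a Spearman correlation. All names, the 3 shapes x 3 categories design, and the synthetic "early" and "late" activations are hypothetical. They are not the authors' stimuli, networks, or results. The same comparison would apply to fMRI voxel patterns in place of unit activations.

```python
import numpy as np


def rdm(patterns: np.ndarray) -> np.ndarray:
    """Dissimilarity matrix: 1 - Pearson r between rows (one row per stimulus)."""
    return 1.0 - np.corrcoef(patterns)


def model_rdm(labels) -> np.ndarray:
    """Binary model matrix: 0 if two stimuli share a label, 1 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)


def upper(m: np.ndarray) -> np.ndarray:
    """Flatten the upper triangle, excluding the diagonal."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def average_ranks(x) -> np.ndarray:
    """1-based ranks; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


# Hypothetical orthogonal design: 3 shapes x 3 categories = 9 stimuli,
# so every shape appears once in every category.
shape_labels = np.arange(9) % 3
category_labels = np.arange(9) // 3

rng = np.random.default_rng(0)
n_units = 100
shape_proto = rng.normal(size=(3, n_units))
cat_proto = rng.normal(size=(3, n_units))

# Synthetic "layers": one driven by shape, one by category, both with noise.
layers = {
    "early": shape_proto[shape_labels] + 0.3 * rng.normal(size=(9, n_units)),
    "late": cat_proto[category_labels] + 0.3 * rng.normal(size=(9, n_units)),
}

results = {}
for name, acts in layers.items():
    layer_vec = upper(rdm(acts))
    results[name] = {
        "shape": spearman(layer_vec, upper(model_rdm(shape_labels))),
        "category": spearman(layer_vec, upper(model_rdm(category_labels))),
    }
    print(name, {k: round(v, 2) for k, v in results[name].items()})
```

Because shape and category are crossed in this toy design, a layer can correlate with one model matrix without correlating with the other. That property is what lets an analysis of this kind separate the two types of information.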
File: Zeman-2020-Orthogonal-representations-of-objec.pdf
Access: open access
Description: main article
Type: publisher's version (publisher's layout)
License: Creative Commons
Size: 2.08 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.