Background: Naturalistic stimuli, such as videos, can elicit complex brain activations. However, the intricate nature of these stimuli makes it challenging to attribute specific brain functions to the resulting activations, particularly for higher-level processes such as social interactions.
Objective: We hypothesized that activations in different layers of a convolutional neural network (VGG-16) would correspond to varying levels of brain activation, reflecting the brain's visual processing hierarchy. Additionally, we aimed to explore which brain regions would be linked to the deeper layers of the network.
Methods: This study analyzed functional MRI data from participants watching a cartoon video. Using a pre-trained VGG-16 convolutional neural network, we mapped hierarchical features of the video to different levels of brain activation. Activation maps from various kernels and layers were extracted from video frames, and the time series of average activation patterns for each kernel were used in a voxel-wise model to examine brain responses.
Results: Lower layers of the network were primarily associated with activations in lower visual regions, although some kernels also unexpectedly showed associations with the posterior cingulate cortex. Deeper layers were linked to more anterior and lateral regions of the visual cortex, as well as the supramarginal gyrus.
Conclusions: This analysis demonstrated both the potential and limitations of using convolutional neural networks to connect video content with brain functions, providing valuable insights into how different brain regions respond to varying levels of visual processing.
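The pipeline outlined in the Methods can be illustrated with a minimal NumPy sketch. It is not the authors' code: the arrays are synthetic stand-ins for VGG-16 activation maps and fMRI data, and the function names (`kernel_time_series`, `downsample_to_tr`, `voxelwise_fit`), the frames-per-volume ratio, and the z-scoring step are assumptions made for illustration. Steps that a full analysis would likely need, such as hemodynamic response convolution and nuisance regressors, are omitted.

```python
import numpy as np

def kernel_time_series(activations):
    """Average each kernel's activation map over space.

    activations: shape (n_frames, n_kernels, height, width), e.g. the
    output of one convolutional layer for every video frame.
    Returns shape (n_frames, n_kernels).
    """
    return activations.mean(axis=(2, 3))

def downsample_to_tr(series, frames_per_tr):
    """Average consecutive frames to get one value per fMRI volume."""
    n_trs = series.shape[0] // frames_per_tr
    trimmed = series[: n_trs * frames_per_tr]
    return trimmed.reshape(n_trs, frames_per_tr, -1).mean(axis=1)

def zscore(x):
    """Standardize each column to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def voxelwise_fit(regressor, bold):
    """Ordinary least squares of every voxel on one kernel regressor.

    regressor: shape (n_trs,); bold: shape (n_trs, n_voxels).
    Returns the regression slope for each voxel, shape (n_voxels,).
    """
    design = np.column_stack([regressor, np.ones_like(regressor)])
    betas, *_ = np.linalg.lstsq(design, bold, rcond=None)
    return betas[0]

# Synthetic stand-ins (hypothetical sizes): 120 frames, 4 kernels, 10 voxels.
rng = np.random.default_rng(0)
activations = rng.random((120, 4, 8, 8))
series = zscore(downsample_to_tr(kernel_time_series(activations), frames_per_tr=4))

bold = rng.standard_normal((series.shape[0], 10))
bold[:, 0] += 5.0 * series[:, 0]  # make voxel 0 track kernel 0

slopes = voxelwise_fit(series[:, 0], bold)
```

In this toy setup only voxel 0 was constructed to follow the kernel, so its slope should stand out from the rest. Repeating the fit for every kernel in a layer would give one brain map per kernel, which is the kind of layer-by-layer comparison the abstract describes.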
Author contributions
Wonbum Sohn (Conceptualization, Formal analysis, Investigation, Software, Visualization, Writing - original draft), Xin Di (Conceptualization, Funding acquisition, Project administration, Software, Writing - review & editing), Zhen Liang (Writing - review & editing), Zhiguo Zhang (Writing - review & editing), and Bharat B. Biswal (Funding acquisition, Project administration, Resources, Supervision, Writing - review & editing)
Conflict of interest
One of the authors, Bharat B. Biswal, is also the associate editor of Psychoradiology. He was blinded from reviewing or making decisions on the manuscript.
Acknowledgements
This work was supported by US National Institute of Mental Health grants to Xin Di (R15MH125332) and Bharat B. Biswal (R01MH131335).
Data and code availability
The fMRI data used in this study are from a public data set available on OpenNeuro (https://openneuro.org/, accession number ds000228). The code used in this study is available from the corresponding author upon reasonable request.