Post-event residential building damage assessment supports reconnaissance, screening, and recovery planning following major hazard events. Recent hurricanes have generated large volumes of satellite, aerial, and street-level imagery, creating challenges not of data scarcity but of integrating heterogeneous visual and contextual information into consistent and reviewable damage classifications. This study presents a multimodal deep-learning framework, the Multimodal Swin Transformer (MMST), that combines street-view imagery with structured building and hazard attributes to classify post-hurricane residential building damage. The model is evaluated using a curated dataset derived from extensive Structural Extreme Events Reconnaissance (StEER) field reconnaissance following Hurricane Ian (2022), which incorporates human interpretation, quality control, and selective sampling across impacted communities. Results show that integrating visual features with contextual information such as building age, building value, and wind speed improves classification performance relative to image-only baselines, achieving an accuracy of 92.67%. Attention-based visualizations further enable post-hoc inspection of image regions the model weighted most heavily, supporting qualitative review of model behavior rather than physical interpretation. The proposed MMST serves as a decision-support tool to augment reconnaissance workflows and enhance the continuity of critical community infrastructure in future hurricanes.
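To make the idea of combining visual features with structured attributes concrete, the sketch below shows one simple fusion pattern: concatenating an image embedding with standardized tabular attributes and passing the result through a single linear layer and a softmax. It is an illustration only, not the MMST implementation. The embedding size, the number of damage classes, the attribute statistics, and all function names are hypothetical, and a random vector stands in for the output of an image encoder such as a Swin Transformer.

```python
import math
import random

def standardize(values, means, stds):
    """Scale each tabular attribute to zero mean and unit variance."""
    return [(v - m) / s for v, m, s in zip(values, means, stds)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_and_classify(image_embedding, tabular, weights, biases):
    """Concatenate both modalities, then apply one linear layer and softmax."""
    fused = list(image_embedding) + list(tabular)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)

# All sizes and values below are assumptions chosen for illustration.
EMBED_DIM = 8      # hypothetical image-embedding length
NUM_CLASSES = 5    # hypothetical number of damage classes
rng = random.Random(0)

# Stand-in for the feature vector an image encoder would produce.
image_embedding = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]

# Building age (years), building value (USD), wind speed (mph); made-up values.
raw_attributes = [35.0, 250000.0, 140.0]
attr_means = [40.0, 300000.0, 120.0]   # hypothetical training-set means
attr_stds = [20.0, 150000.0, 25.0]     # hypothetical training-set std devs
tabular = standardize(raw_attributes, attr_means, attr_stds)

# Untrained random weights; a real model would learn these from labeled data.
fused_dim = EMBED_DIM + len(tabular)
weights = [[rng.gauss(0.0, 0.1) for _ in range(fused_dim)] for _ in range(NUM_CLASSES)]
biases = [0.0] * NUM_CLASSES

probs = fuse_and_classify(image_embedding, tabular, weights, biases)
predicted_class = max(range(NUM_CLASSES), key=lambda k: probs[k])
print(probs, predicted_class)
```

Standardizing the attributes first keeps large-magnitude inputs such as building value from dominating the fused vector. Concatenation is only the simplest fusion choice; the fusion mechanism actually used in the study may differ.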
Funding
National Science Foundation (2303578)
Gulf Research Program