Disaster Impacts Surveillance from Social Media with Topic Modeling and Feature Extraction: Case of Hurricane Harvey

Volodymyr V. Mihunov, Navid H. Jafari, Kejin Wang, Nina S. N. Lam, Dylan Govender

International Journal of Disaster Risk Science, 2022, Vol. 13, Issue 5: 729-742. DOI: 10.1007/s13753-022-00442-1


Abstract

Twitter can supply useful information on infrastructure impacts to emergency managers during major disasters, but filtering through many irrelevant tweets is time-consuming. Previous studies have identified the types of messages that can be found on social media during disasters, but few solutions have been proposed to efficiently extract the useful ones. We present a framework that can be applied in a timely manner to provide disaster impact information sourced from social media. The framework is tested on the well-studied and data-rich case of Hurricane Harvey. The procedure consists of filtering the raw Twitter data by keywords, location, and tweet attributes, and then applying latent Dirichlet allocation (LDA) to separate the tweets from the disaster-affected area into categories (topics) useful to emergency managers. The LDA revealed that, of the 24 topics found in the data, nine were directly related to disaster impacts, for example, outages, closures, flooded roads, and damaged infrastructure. Features such as frequent hashtags, mentions, URLs, and useful images were then extracted and analyzed. The relevant tweets, along with useful images, were correlated at the county level with flood depth, distributed disaster aid (damage), and population density. Significant correlations were found between the nine relevant topics and population density, but not with flood depth or damage, suggesting that more research is needed into the suitability of social media data for disaster impact modeling. The results of this study provide baseline information for such future efforts.
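The keyword filtering and county-level correlation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword list, the example tweets, the density values, and the helper names (`is_relevant`, `county_counts`, `pearson`) are all hypothetical, the choice of a Pearson coefficient is an assumption, and the LDA topic-modeling step is omitted entirely.

```python
from collections import Counter
from math import sqrt

# Hypothetical keyword list; the study's actual keywords are not given here.
KEYWORDS = {"flood", "outage", "closed", "damage"}

def is_relevant(text, keywords=KEYWORDS):
    """Keep a tweet if any keyword appears as a substring of its lowercased text."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)

def county_counts(tweets):
    """Count keyword-matching tweets per county from (county, text) pairs."""
    return Counter(county for county, text in tweets if is_relevant(text))

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up example data, for illustration only (not real tweets or densities).
tweets = [
    ("Harris", "I-10 is flooded near downtown"),
    ("Harris", "Power outage on our street"),
    ("Harris", "Thinking of everyone tonight"),
    ("Galveston", "Bridge closed due to high water"),
    ("Fort Bend", "Nice sunset"),
]
density = {"Harris": 2600.0, "Galveston": 850.0, "Fort Bend": 800.0}

counts = county_counts(tweets)
counties = sorted(density)
# Counties with no relevant tweets contribute a count of zero.
r = pearson([counts.get(c, 0) for c in counties], [density[c] for c in counties])
```

In a fuller pipeline, the relevant-tweet counts would presumably be computed per LDA topic rather than per keyword match, and the same correlation would be repeated against flood depth and aid data.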

Keywords

Disaster impacts / Hurricane Harvey / Infrastructure impacts / Latent Dirichlet allocation (LDA) / Social media analysis / Twitter data

Cite this article

Volodymyr V. Mihunov, Navid H. Jafari, Kejin Wang, Nina S. N. Lam, Dylan Govender. Disaster Impacts Surveillance from Social Media with Topic Modeling and Feature Extraction: Case of Hurricane Harvey. International Journal of Disaster Risk Science, 2022, 13(5): 729-742. DOI: 10.1007/s13753-022-00442-1


