Disaster Impacts Surveillance from Social Media with Topic Modeling and Feature Extraction: Case of Hurricane Harvey
Volodymyr V. Mihunov, Navid H. Jafari, Kejin Wang, Nina S. N. Lam, Dylan Govender
International Journal of Disaster Risk Science, 2022, Vol. 13, Issue 5: 729–742.
Twitter can supply useful information on infrastructure impacts to emergency managers during major disasters, but filtering out the many irrelevant tweets is time-consuming. Previous studies have identified the types of messages that can be found on social media during disasters, but few solutions have been proposed to efficiently extract the useful ones. We present a framework that can be applied in a timely manner to provide disaster impact information sourced from social media. The framework is tested on the well-studied and data-rich case of Hurricane Harvey. The procedure consists of filtering the raw Twitter data by keywords, location, and tweet attributes, and then applying latent Dirichlet allocation (LDA) to separate the tweets from the disaster-affected area into categories (topics) useful to emergency managers. The LDA revealed that, of the 24 topics found in the data, nine were directly related to disaster impacts, for example, outages, closures, flooded roads, and damaged infrastructure. Features such as frequent hashtags, mentions, URLs, and useful images were then extracted and analyzed. The relevant tweets, along with the useful images, were correlated at the county level with flood depth, distributed disaster aid (damage), and population density. Significant correlations were found between the nine relevant topics and population density, but not with flood depth or damage, suggesting that more research is needed into the suitability of social media data for disaster impacts modeling. The results of this study provide baseline information for such future efforts.
Disaster impacts / Hurricane Harvey / Infrastructure impacts / Latent Dirichlet allocation (LDA) / Social media analysis / Twitter data
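As an illustration of the filtering and county-level correlation steps described in the abstract, the following is a minimal sketch and not the authors' implementation. The keyword list, the tweet fields (`text`, `county`, `is_retweet`), the county density values, and the choice of the Pearson coefficient are all assumptions made for the example, and the LDA step is omitted.

```python
from math import sqrt

# Hypothetical keyword list; the study's actual keywords are not given here.
KEYWORDS = {"flood", "outage", "closed", "damage"}


def is_relevant(tweet, counties):
    """Keep original (non-retweet) tweets from the study counties that mention a keyword."""
    text = tweet["text"].lower()
    return (
        not tweet["is_retweet"]              # tweet-attribute filter (assumed field)
        and tweet["county"] in counties      # location filter (assumed field)
        and any(k in text for k in KEYWORDS) # keyword filter
    )


def county_counts(tweets, counties):
    """Count relevant tweets per county, including counties with zero tweets."""
    counts = {c: 0 for c in counties}
    for t in tweets:
        if is_relevant(t, counties):
            counts[t["county"]] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy records and made-up density values, for demonstration only.
tweets = [
    {"text": "Road closed due to flood on I-10", "county": "Harris", "is_retweet": False},
    {"text": "Power outage downtown", "county": "Harris", "is_retweet": False},
    {"text": "Stay safe everyone", "county": "Harris", "is_retweet": False},
    {"text": "Flood damage near the bridge", "county": "Galveston", "is_retweet": False},
    {"text": "RT flood photos", "county": "Galveston", "is_retweet": True},
    {"text": "Flood in another state", "county": "Elsewhere", "is_retweet": False},
]
counties = ["Harris", "Galveston", "Brazoria"]
density = {"Harris": 3.0, "Galveston": 2.0, "Brazoria": 1.0}

counts = county_counts(tweets, counties)
r = pearson([counts[c] for c in counties], [density[c] for c in counties])
print(counts, round(r, 3))
```

In a real application the per-county counts would be computed separately for each relevant topic and compared with flood depth and distributed aid as well as population density.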