Bolstering integrity in environmental data science and machine learning requires understanding socioecological inequity
Joe F. Bozeman III
● Socioecological inequity must be understood to improve environmental data science.
● The Systemic Equity Framework and Wells-Du Bois Protocol mitigate inequity.
● Addressing irreproducibility in machine learning is vital for bolstering integrity.
● Future directions include policy enforcement and systematic programming.
Socioecological inequities in environmental data science, such as those deriving from data-driven approaches and machine learning (ML), are current issues subject to debate and evolution. There is growing consensus around embedding equity throughout all research and design domains, from inception to administration, while also addressing procedural, distributive, and recognitional factors. Yet doing so in practice may seem onerous or daunting to some. This perspective helps to alleviate such concerns by substantiating the connection between environmental data science and socioecological inequity using the Systemic Equity Framework, and it provides the foundation for a paradigmatic shift toward normalizing the use of equity-centered approaches in environmental data science and ML settings. Bolstering the integrity of environmental data science and ML is just beginning from the standpoint of equity-centered tool development and rigorous application. To this end, this perspective also provides relevant future directions and challenges: it overviews some meaningful tools and strategies, such as applying the Wells-Du Bois Protocol, employing fairness metrics, and systematically addressing irreproducibility; identifies emerging needs and proposals, such as addressing data-proxy bias and supporting convergence research; and establishes a ten-step path forward. After all, the work that environmental scientists and engineers do ultimately affects the well-being of us all.
Equity / Bias / Machine Learning / Data Science / Justice / Systemic Equity