Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies

Shuguang ZHOU, Kefa ZHOU, Jinlin WANG, Genfang YANG, Shanshan WANG

Front. Earth Sci. ›› 2018, Vol. 12 ›› Issue (3) : 491-505.

DOI: 10.1007/s11707-017-0682-8



Cluster analysis is a well-known technique for analyzing various types of data. In this study, cluster analysis is applied to geochemical data from 1444 stream sediment samples collected in northwestern Xinjiang at a sample spacing of approximately 2 km. Three algorithms (hierarchical, k-means, and fuzzy c-means) and six data transformation methods (z-score standardization, ZST; logarithmic transformation, LT; additive log-ratio transformation, ALT; centered log-ratio transformation, CLT; isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. For row- or observation-based (Q-type) cluster analysis, the results obtained after applying NT and the ZST are relatively poor, whereas the results improve after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means algorithms are more reliable than the hierarchical algorithm for clustering the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and the results obtained by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms correlate well with the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool for identifying potential zones of mineralization from geochemical data.
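As a rough illustration of the kind of workflow the abstract describes, the following Python sketch applies a centered log-ratio transformation to a few compositional samples and then groups the observations (Q-type) with a basic k-means loop. It is a minimal sketch under stated assumptions, not the authors' implementation: the sample values are invented for illustration, the function names `clr` and `kmeans` are hypothetical, and the deterministic farthest-first initialization is a simplification chosen so the example is reproducible. The ILT and the fuzzy c-means algorithm are not shown.

```python
import math

def clr(row):
    """Centered log-ratio: log of each part minus the mean log of the row
    (equivalently, log of each part over the row's geometric mean)."""
    logs = [math.log(v) for v in row]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=100):
    """Basic k-means. Initialization is farthest-first (an assumption made
    here for reproducibility, not something taken from the study)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point whose nearest existing center is farthest away.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical three-part compositions (arbitrary units), NOT data from the study.
samples = [
    [1.0, 10.0, 50.0],
    [1.2, 11.0, 48.0],
    [0.9, 9.5, 52.0],
    [8.0, 40.0, 50.0],
    [9.0, 45.0, 47.0],
    [7.5, 38.0, 55.0],
]

transformed = [clr(row) for row in samples]
labels, centers = kmeans(transformed, k=2)
print(labels)
```

Because the CLT works on ratios of parts, rescaling a whole sample by a constant leaves its transformed values unchanged, which is one reason log-ratio transformations are commonly used for compositional data.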


cluster analysis / compositional data / geochemical anomaly / mineral exploration

Cite this article

Shuguang ZHOU, Kefa ZHOU, Jinlin WANG, Genfang YANG, Shanshan WANG. Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies. Front. Earth Sci., 2018, 12(3): 491‒505




The authors thank Ratheesh Kumar R.T and Rustam Orozbaev for their assistance in revising the language of the manuscript before submission, and are grateful to the anonymous reviewers for their constructive comments and suggestions. This study was funded by the National Natural Science Foundation of China (Grant Nos. U1503291 and 41402296) and a Major Project in Xinjiang Uygur Autonomous Region (201330121-3).


2017 Higher Education Press and Springer-Verlag GmbH Germany




