Introduction
Cluster analysis mainly aims to classify variables or observations into meaningful, homogeneous multivariate groups such that the members of each group are distinguishable from the members of other groups. The observations are mapped to clusters whose centroids summarize the cluster information, providing an overview of the structure of the data. In general, clustering algorithms can be divided into hierarchical and non-hierarchical algorithms. More specifically, they can be classified into partitioning, hierarchical, density-based, grid-based, model-based, constraint-based, and high-dimensional clustering algorithms (Han and Kamber, 2006; Meng et al., 2011).
Cluster analysis has been applied in many fields (Kim et al., 2007; Lee and Song, 2007; Sahraei Parizi and Samani, 2013; Stück et al., 2013; Ghosh and Kanchan, 2014; Wang et al., 2014; Fattahi, 2016; Eilermann et al., 2017; Fatehi and Asadi, 2017; Kitzig et al., 2017), but the merits of this analytical method remain controversial (Rock, 1988; Davis, 2002). For instance, different clustering algorithms yield different groupings of the same data (Templ et al., 2008). Furthermore, the results of cluster analysis can change if even a single variable is added or removed (Templ et al., 2008). The results may also differ depending on the parameters used for the analysis, such as the measure of distance or similarity between the variables or observations (Abdel-Halim and Abdel-Aal, 1998). The distributional characteristics of the data and the method of data preparation may also affect the results, since some clustering algorithms are reliable only for normally distributed data (Reimann et al., 2002). Such algorithms are not appropriate for geochemical data, which are usually not normally distributed (Reimann and Filzmoser, 2000; Zuo et al., 2013a). Furthermore, geochemical data are frequently compositional, i.e., concentrations (e.g., wt.% or mg/kg) that sum to a constant (Leite, 2016; Tolosana-Delgado and McKinley, 2016; Templ et al., 2017). Consequently, standard multivariate statistical methods applied to such data may produce biased results (Filzmoser et al., 2010; Buccianti, 2013). Therefore, appropriate data transformations must be applied to open the data before cluster analysis is performed (Aitchison, 1982; Aitchison, 1999; Aitchison et al., 2000; Aitchison and Egozcue, 2005; Filzmoser et al., 2009; Zuo et al., 2013a). Some studies repeat the analysis many times with different clustering techniques and selections of variables until the results fit the preconceived ideas of the authors (Templ et al., 2008). Nevertheless, these previous studies indicate that cluster analysis can serve as an exploratory data analysis tool for investigating the behavior of individual data sets and extracting different types of information from them.
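Why a constant sum biases correlation-based methods can be seen in a minimal two-part sketch (Python standard library; the values are hypothetical, not data from this study). Two parts that are positively correlated in raw form become perfectly negatively correlated once each sample is rescaled to sum to 100, because the second part is then simply 100 minus the first.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def close(rows, total=100.0):
    """Rescale each sample (row) so that its parts sum to `total` (closure)."""
    return [[total * v / sum(row) for v in row] for row in rows]


# Hypothetical two-part data: each row is one sample, columns are parts x1 and x2.
raw = [[1.0, 1.0], [2.0, 3.0], [3.0, 2.0], [4.0, 5.0]]
closed = close(raw)

r_raw = pearson([row[0] for row in raw], [row[1] for row in raw])
r_closed = pearson([row[0] for row in closed], [row[1] for row in closed])
print(f"raw r = {r_raw:.2f}, closed r = {r_closed:.2f}")  # raw r = 0.83, closed r = -1.00
```

With more than two parts the induced negative bias is weaker but still present, which is why log-ratio transformations are used to open the data.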
Detecting outliers or anomalies is one of the main tasks in the statistical analysis of geochemical data. Data obtained from stream sediment reconnaissance surveys provide useful information on the regional controls on mineralization and on the occurrence of deposits, and they can therefore serve as a basis for mineral prospecting. In well-prospected mineralized areas, the main objective is to identify new target areas, whereas in areas without known mineralization the aim is to distinguish anomalies from the background (Carranza and Hale, 1997). Several methods can be used to select thresholds for identifying outliers or anomalies. For example, ‘mean±2 standard deviations (sdev)’ has been recommended as a threshold that separates anomalies from the background (Hawkes and Webb, 1962): values within the range ‘mean±2 sdev’ are assigned to the background, whereas values outside this range are considered anomalies. However, this threshold can be strongly influenced by extreme values. Removing obvious outliers from the data and log-transforming the data before calculating ‘mean±2 sdev’ are two common ways of mitigating this problem. Nevertheless, some studies advise against this method because more accurate alternatives are readily available (Reimann and Garrett, 2005).
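The ‘mean±2 sdev’ rule and its log-transformed variant can be sketched as follows (Python standard library). This is a minimal illustration, assuming the sample standard deviation and base-10 logarithms; the Au values are hypothetical and include one extreme value to show how it inflates the raw threshold.

```python
import math
import statistics


def mean_2sd_limits(values, log=False):
    """Background limits 'mean ± 2 sdev' using the sample standard deviation.

    With log=True the limits are computed on log10 values and back-transformed,
    which reduces the leverage of extreme values on the threshold.
    """
    data = [math.log10(v) for v in values] if log else list(values)
    m = statistics.mean(data)
    s = statistics.stdev(data)
    lo, hi = m - 2 * s, m + 2 * s
    return (10 ** lo, 10 ** hi) if log else (lo, hi)


def anomalies(values, limits):
    """Values lying outside the background range are flagged as anomalies."""
    lo, hi = limits
    return [v for v in values if v < lo or v > hi]


# Hypothetical Au concentrations (ng/g) containing one extreme value.
au = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4, 0.7, 25.0]
raw_limits = mean_2sd_limits(au)
log_limits = mean_2sd_limits(au, log=True)
print(raw_limits, log_limits)
print(anomalies(au, log_limits))
```

Here the single extreme value pushes the raw upper limit well above the log-based one, and the raw lower limit even becomes negative, which is meaningless for concentrations.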
Exploratory data analysis (EDA) (Tukey, 1977) is an unconventional and informal approach to analyzing and interpreting univariate data that do not follow a normal distribution (Carranza, 2009). It can be useful in the analysis and modeling of single-element geochemical anomalies (Yusta et al., 1998; Bounessah and Atkin, 2003; Reimann et al., 2005; Reimann and Garrett, 2005; Zumlot et al., 2009; Agharezaei and Hezarkhani, 2016). Both ‘mean±2 sdev’ and EDA are based only on the data values, and each yields a single threshold, which ignores the possibility that the background varies spatially. Fractal and multifractal models have been applied to characterize the concentrations of geochemical elements in various media, such as soils and stream sediments (Cheng et al., 1996; Zuo and Cheng, 2008; Cheng and Agterberg, 2009; Afzal et al., 2010; Carranza, 2010; Carranza, 2011; Pazand et al., 2011; Agterberg, 2012; Zuo, 2012; Hassanpour and Afzal, 2013; Zuo et al., 2013a, b, 2015; Parsa et al., 2017). These models are suitable for analyzing geochemical data because geochemical distribution patterns (or geochemical landscapes) can plausibly be regarded as fractals composed of background and anomaly patterns (Bölviken et al., 1992). However, none of the methods mentioned above is especially appropriate for multivariate analysis. Cluster analysis, in contrast, is designed for multivariate analysis and is therefore considered a relevant method for identifying geochemical anomalies (Howarth, 1983).
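For comparison with ‘mean±2 sdev’, a widely used EDA threshold is the upper inner fence of Tukey's boxplot, Q3 + 1.5 × IQR. The sketch below (Python standard library; the 1.5 multiplier is the usual convention and the data are hypothetical) shows that, like ‘mean±2 sdev’, it returns one threshold for the whole data set, irrespective of where the samples were collected. The quartiles here use linear interpolation, which can differ slightly from Tukey's original hinges.

```python
import statistics


def tukey_upper_fence(values, k=1.5):
    """Upper inner fence Q3 + k * IQR of Tukey's boxplot (k = 1.5 by convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


# Hypothetical Au concentrations (ng/g) with one extreme value.
au = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4, 0.7, 25.0]
fence = tukey_upper_fence(au)
print(round(fence, 3))               # 2.05
print([v for v in au if v > fence])  # [25.0]
```

Because the quartiles barely move when a single extreme value is added, the fence is far less affected by that value than the raw ‘mean±2 sdev’ limit.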
In this study, cluster analysis is applied to geochemical data to identify possible anomalies. Three clustering algorithms (hierarchical, k-means, and fuzzy c-means) are used to analyze geochemical data from stream sediments collected in Karamay, northwestern Xinjiang, China. To compare the performance of the three algorithms, each is applied to the data after six different treatments: z-score standardization (ZST), logarithmic transformation (LT), additive log-ratio transformation (ALT), centered log-ratio transformation (CLT), isometric log-ratio transformation (ILT), and no transformation (NT). The study also addresses two critical questions related to cluster analysis: (i) How do the data transformation methods and clustering algorithms affect the results of cluster analysis of geochemical data? (ii) How suitable are the clustering algorithms for extracting mineralization-related information from geochemical data?
Description of the study area
Geology
The study area is located in the western Junggar Basin, approximately 330 km northwest of Urumqi (Fig. 1). The major rock bodies in this area include ophiolitic mélange belts, volcanic-sedimentary rocks, and intermediate-acidic intrusive bodies that formed during the amalgamation of island arcs and accretionary processes (He et al., 2004; Zhu et al., 2013b). The major plutonic rocks in this area are the Miaoergou, Hatu, Akebasitao, Red Mountain, and north Karamay granite batholiths, which yield zircon LA-ICP-MS U-Pb ages of 300 Ma (Su et al., 2006). The distributions of intrusive rocks and Au deposits are clearly correlated with fault zones in this region. The major faults, including the Hatu, Anqi, Dalabute, and Yijiaren faults, strike NNE. The Dalabute ophiolitic mélange belt, which covers approximately 50 km² and is extensively crosscut by imbricate structures with thrust faults, forms a band along the Dalabute fault. Oceanic crust materials often occur in the terrigenous detrital sediments of the ancient continental margin, and these materials display the geochemical signatures of mantle-derived rocks (Zhang and Huang, 1992).
Au deposits
The Hatu and Baogutu Au deposits, located in the northern and southern parts of the study area, respectively, are the representative deposits of this area (Fig. 1). The Hatu Au deposit is controlled by two NE-trending faults, namely the Anqi extension fault and the Hatu compression and scissor fault. The metallogenic structures are associated with NW-, NE-, and E-W-trending secondary faults, which are closely related to the NE-trending faults. The ore bodies occur in groups and are arranged en echelon and/or end to end (Zhu et al., 2013a). The Hatu Au deposit consists mainly of quartz vein-type and altered rock-type ore bodies, which are products of homogeneous hydrothermal fluids (Zhang, 2003). The gangue minerals in both types of ore bodies are mainly quartz, albite, sericite, and carbonate minerals. Native Au occurs in two forms, encapsulated Au and fissure-filling Au, and arsenopyrite, pyrite, and quartz are the main Au-bearing minerals. The main metallogenic stages of the quartz vein-type and altered rock-type ore bodies have consistent mineralization temperatures (200°C–280°C). Calcite veins formed during the late stage of mineralization of both types of ore bodies, and these veins contain no Au (Zhu et al., 2013a). Cu, Ag, As, and Sb are enriched to varying degrees in the altered Carboniferous tuffaceous shale.
The Baogutu Au deposit occurs in Lower Carboniferous strata. Two types of Au mineralization occur in the Baogutu area. The first type occurs where intermediate and acidic dikes are concentrated and is controlled by NE-trending faults; it includes quartz stockwork and quartz vein-type ore bodies. The second type occurs in the contact zone between porphyry bodies and the surrounding host rocks, where the Au is hosted in sulfide minerals. Andesite, tuff, and tuffaceous sandstone are the Au-bearing rocks, and the ore bodies are sulfide-bearing. Silicification, sericitization, carbonatization, pyritization, and arsenopyritization are the main types of host-rock alteration, and the primary mineral assemblage is arsenopyrite + pyrite + native Au + native arsenic + native antimony + stibnite. The ore-associated elements are Au, As, and Sb (Zhu et al., 2013a).
Materials and methods
Geochemical data
The geochemical data used in this study comprise 39 elements (variables) determined in 1444 stream sediment samples collected over an area of 5774 km² (Fig. 1). The data were acquired from the National Geochemical Mapping Project of China, also known as the Regional Geochemistry-National Reconnaissance Project (Xie et al., 1997; Xie et al., 2008). The stream sediment samples were mainly collected from natural gullies at a density of approximately one sample per km². Groups of four samples were then combined into composite samples, each representing 4 km².
The concentrations of the 39 elements in the stream sediments were measured using various analytical facilities (Wang et al., 2011); details are listed in Table 1. The maximum and minimum concentrations and the standard deviations of the 39 elements are shown in Table 2.
Distance measures
Measuring the distance between observations or variables is a key step in most types of cluster analysis. The distance in cluster analysis does not refer to the geographical distance between two observations or variables; instead, it measures their dissimilarity, and it is the basis for assigning observations or variables to different groups. Small distances indicate similar observations, whereas large distances indicate dissimilar ones. In this study, the distance between two variables is defined as one minus their correlation coefficient, whereas the Euclidean, squared Euclidean, and correlation distances are used to cluster the observations.
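The three observation distances and the one-minus-correlation variable distance named above can be written in a few lines. This is a plain-Python sketch for illustration, not the MATLAB code used in the study; for two observations, the correlation distance is computed across their variable values, treating each observation as a vector.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sq_euclidean(a, b):
    """Squared Euclidean distance: weights large differences more heavily."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def correlation_distance(a, b):
    """1 - Pearson r: 0 for perfectly correlated, 2 for perfectly anti-correlated."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(va * vb)


a, b, c = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]
print(euclidean(a, b), sq_euclidean(a, b))                     # sqrt(14), 14.0
print(correlation_distance(a, b), correlation_distance(a, c))  # 0.0 2.0
```

Note that `a` and `b` are far apart in Euclidean terms but identical in correlation terms, since the correlation distance ignores differences in level and scale.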
Clustering the geochemical compositional data
Clustering by variables (R-type cluster analysis)
To characterize how the properties of the geochemical data (e.g., non-normal distributions and compositional nature) influence the cluster analysis of the variables, five of the transformation methods (NT, the ZST, the LT, the ALT, and the CLT) are applied to the data for the 39 elements (variables) before the hierarchical clustering algorithm is used. MATLAB R2012a is used for the ZST and the LT, whereas R is used for the ALT, the CLT, and the ILT. The results of the ALT depend on the element chosen as the ratio (denominator) element (Templ et al., 2008); any element can be selected except the target elements or elements of interest (i.e., those related to the Au deposits, or the hydrothermal or epithermal elements). In this study, SiO₂ is selected as the ratio element. The ILT is not suitable for R-type cluster analysis because the relationships among the original variables are lost and the new variables can no longer be interpreted directly in terms of the original variables (Templ et al., 2008); therefore, the ILT is not applied here.
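The six treatments can be sketched per sample (ALT, CLT, ILT) or per variable (ZST) as follows. This is a minimal plain-Python illustration, not the R or MATLAB code used in the study; the log base for the LT and the choice of pivot coordinates as the ILT basis are assumptions (the ILT can be built on many orthonormal bases), and the 4-part sample is hypothetical, with index 0 playing the role of SiO₂ as the ALT ratio element.

```python
import math
import statistics


def zst(column):
    """z-score standardization of one variable (mean 0, sample sdev 1)."""
    m, s = statistics.mean(column), statistics.stdev(column)
    return [(v - m) / s for v in column]


def lt(sample):
    """Logarithmic transformation (log10 here; the base is an assumption)."""
    return [math.log10(v) for v in sample]


def alt(sample, ref):
    """Additive log-ratio: ln(x_i / x_ref) for all parts except the reference part."""
    return [math.log(v / sample[ref]) for i, v in enumerate(sample) if i != ref]


def clt(sample):
    """Centered log-ratio: ln(x_i / g(x)), where g(x) is the geometric mean."""
    logs = [math.log(v) for v in sample]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]


def ilt(sample):
    """Isometric log-ratio in pivot coordinates (one of many orthonormal bases):
    z_i = sqrt((D-i)/(D-i+1)) * ln(x_i / g(x_{i+1}, ..., x_D)), i = 1..D-1."""
    d = len(sample)
    logs = [math.log(v) for v in sample]
    out = []
    for i in range(d - 1):  # 0-based i corresponds to i+1 in the formula
        rest = logs[i + 1:]
        out.append(math.sqrt((d - i - 1) / (d - i)) * (logs[i] - sum(rest) / len(rest)))
    return out


# Hypothetical 4-part sample; index 0 stands in for SiO2 as the ALT ratio element.
x = [55.0, 30.0, 10.0, 5.0]
print(alt(x, ref=0), clt(x), ilt(x))
```

The CLT coordinates of every sample sum to zero, which is the source of the collinearity noted for the CLT; the ILT gives one coordinate fewer, avoids that collinearity, and preserves the same distances between samples. All three log-ratio transforms are unchanged if a sample is multiplied by a constant, which is what makes them insensitive to closure.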
The hierarchical algorithm is then used to cluster the variables after the transformations described above. Building a dendrogram is a central part of the hierarchical algorithm, and dendrograms can be created with various linkage methods. In this study, the group average method, with the distance between variables defined as one minus their correlation coefficient, is selected because it is a moderate (space-conserving) method, intermediate between single and complete linkage. The results of R-type cluster analysis are presented in Fig. 2.
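Group-average linkage on one-minus-correlation distances can be sketched as a naive agglomerative loop: at each step, the two clusters with the smallest mean pairwise distance between their members are merged. This is an illustrative plain-Python sketch (quadratic per step, fine for a few dozen variables), not the software used in the study; the element values are hypothetical.

```python
import math
from itertools import combinations


def corr_distance(a, b):
    """1 - Pearson r between two variables measured on the same samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(va * vb)


def average_linkage(variables, n_groups, dist=corr_distance):
    """Naive agglomerative clustering with group-average (UPGMA) linkage.

    variables: dict mapping a variable name to its values across samples.
    Clusters are merged until n_groups remain; returns a list of sets of names.
    """
    names = list(variables)
    d = {}
    for a, b in combinations(names, 2):
        d[a, b] = d[b, a] = dist(variables[a], variables[b])
    clusters = [frozenset([name]) for name in names]

    def group_average(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        return sum(d[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_groups:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: group_average(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return [set(c) for c in clusters]


# Hypothetical element values across five samples.
variables = {
    "Au": [1, 2, 3, 4, 5],
    "As": [2, 4, 5, 8, 10],
    "Sb": [1, 3, 3, 4, 6],
    "SiO2": [5, 1, 4, 2, 3],
    "K2O": [6, 1, 5, 2, 4],
}
groups = average_linkage(variables, n_groups=2)
print(groups)
```

Recording the merge heights along the way would give the dendrogram; cutting it at two groups here separates the strongly intercorrelated Au-As-Sb set from the other two variables.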
R-type cluster analysis is also used because it provides valuable information for dimension reduction, which is important because the distribution of high-dimensional data is often complex and difficult to understand fully. In the present study, the Au deposit-related elements and the hydrothermal or epithermal elements are the target elements (elements of interest), and Q-type cluster analysis (section 4.2) is performed on these elements.
Fig. 2 shows the results of R-type cluster analysis after the different data transformations. The following groups of elements are relatively coherent: U-Mo-Au-Sb-B-Hg-W-As-Ag for NT (Fig. 2(a)); U-Mo-Au-Sb-B-Hg-W-As-Ag for the ZST (identical to Fig. 2(a)); U-Au-Hg-P-Cd-Th-F-Zn-Li-B-W-Mo-Sb-As for the LT (Fig. 2(b)); Au-Mo-Sb-As for the ALT (Fig. 2(c)); and Mo-Au-As-Sb-Hg-B-Ag for the CLT (Fig. 2(d)). These groups were identified and selected according to the following conventions:
- All or most of Ag, As, Au, and Sb are included in the target cluster because they are the primary Au deposit-related and hydrothermal/epithermal elements in the study area.
- To reduce the dimensionality of the geochemical data, the target cluster should contain as few elements as possible. To simplify the analysis and allow better interpretation of the results, information that is uncorrelated or only weakly correlated with the known Au deposits in the study area is neglected.
R-type cluster analysis thus produces different results for the different data transformation methods, except that NT and the ZST give identical results. In this study, the group U-Mo-Au-Sb-B-Hg-W-As-Ag (i.e., the result of R-type cluster analysis of the NT data) is selected for the Q-type cluster analysis described in the following section, “Clustering by observations”.
Clustering by observations (Q-type cluster analysis)
To classify the geochemical data, the hierarchical, k-means, and fuzzy c-means algorithms are applied to the U-Mo-Au-Sb-B-Hg-W-As-Ag group. Two preparatory steps are required. First, before Q-type cluster analysis, the inverse distance weighted (IDW) interpolation method is applied to each selected element to produce raster data, which facilitate mapping and interpreting the results. The spatial resolution of the raster data is set to approximately 2 km, based on the distance between adjacent sampling sites in this study. Second, the number of clusters used in Q-type cluster analysis must be defined in advance; however, the appropriate number of clusters is unknown before the calculation. This issue is addressed as follows.
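The IDW step can be sketched as a weighted average of the sample values, with weights 1/dᵖ. This minimal plain-Python version assumes a power of p = 2 and uses all samples for every cell; the study does not state its power or search neighborhood, and GIS implementations typically restrict the estimate to nearby samples. Coordinates in km and the sample values are hypothetical.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (xs, ys, value); power = 2 is an assumed, commonly used default.
    """
    num = den = 0.0
    for xs, ys, v in samples:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return v  # exact hit: return the sample value itself
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def idw_grid(samples, x0, y0, nx, ny, cell=2.0, power=2.0):
    """Rasterize onto ny rows by nx columns of cell centres (cell size in km)."""
    return [[idw(samples, x0 + (i + 0.5) * cell, y0 + (j + 0.5) * cell, power)
             for i in range(nx)] for j in range(ny)]


# Two hypothetical samples 4 km apart, rasterized onto one row of two 2-km cells.
samples = [(0.0, 0.0, 1.0), (4.0, 0.0, 3.0)]
grid = idw_grid(samples, x0=0.0, y0=-1.0, nx=2, ny=1)
print(grid)  # cell centres at (1, 0) and (3, 0)
```

Each element of the U-Mo-Au-Sb-B-Hg-W-As-Ag group would be rasterized this way, and each grid cell then becomes one observation for Q-type clustering.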
For the hierarchical algorithm, 1) the clusters obtained from the NT, ZST, LT, ALT, CLT, and ILT data are determined by increasing the cutoff value of the inconsistency coefficient from zero until the number of clusters stabilizes; and 2) the numbers of clusters obtained for the NT, ZST, LT, ALT, CLT, and ILT data should be nearly equal.
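For reference, the inconsistent (inconsistency) coefficient as defined for MATLAB's `inconsistent` function, which we assume is the implementation behind the values reported here, compares the height of each link in the dendrogram with the heights of the links beneath it:

```latex
I_k = \frac{h_k - \bar{h}_k}{s_k}, \qquad
\bar{h}_k = \frac{1}{n_k} \sum_{j \in L_k} h_j, \qquad
s_k = \sqrt{\frac{1}{n_k - 1} \sum_{j \in L_k} \left( h_j - \bar{h}_k \right)^2}
```

Here h_k is the height (merge distance) of link k, L_k is the set formed by link k and the links below it down to a chosen depth, and n_k is the number of links in L_k; links with no links below them receive I_k = 0. To our understanding, MATLAB's `cluster` with a `cutoff` value c keeps a node and its subnodes together when all of them have inconsistency values below c, so raising c yields fewer, larger clusters.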
For the k-means and fuzzy c-means algorithms, each transformed data set (i.e., the NT, ZST, LT, ALT, CLT, and ILT data) is classified into the same number of clusters so that the results of Q-type cluster analysis can be compared.
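The k-means algorithm alternates two steps until the assignments stop changing: assign each observation to its nearest centroid, then move each centroid to the mean of its members. A minimal plain-Python sketch of this (Lloyd's) procedure follows; initializing with k randomly chosen observations is an assumption, and practical implementations usually use smarter seeding and several restarts. The 2-D points are hypothetical stand-ins for transformed grid cells.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means; returns (labels, centroids).

    Initial centroids are k randomly chosen observations (an assumption).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical transformed observations forming two compact groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Because the number of clusters is an input, the same k can be imposed on every transformed data set, which is what makes the comparison across transformations straightforward.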
Following this procedure, the cutoff values of the inconsistency coefficient for the hierarchical algorithm are determined to be 10, 9, 6, 6, 6, and 6 for NT, the ZST, the LT, the ALT, the CLT, and the ILT, respectively, and the level of inconsistency is set to 50. This yields 12, 12, 13, 12, 11, and 11 clusters for NT, the ZST, the LT, the ALT, the CLT, and the ILT, respectively. The clusters containing few observations are merged into a single cluster that is defined as the anomaly, whereas the cluster(s) containing most of the observations are defined as the background.
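The reclassification of small clusters into one anomaly cluster can be sketched as follows. The study does not give a numeric rule for "a smaller number of observations", so the 5% cut-off used here (`min_fraction`) is purely hypothetical, as are the labels.

```python
from collections import Counter


def merge_small_clusters(labels, min_fraction=0.05, anomaly="anomaly"):
    """Relabel every cluster holding fewer than min_fraction of all observations
    as one anomaly class; larger clusters keep their labels (background).

    min_fraction is a hypothetical cut-off, not a value taken from the study.
    """
    counts = Counter(labels)
    n = len(labels)
    small = {c for c, m in counts.items() if m / n < min_fraction}
    return [anomaly if lab in small else lab for lab in labels]


# Hypothetical labels for 100 observations: clusters 3 and 4 are small.
labels = [1] * 50 + [2] * 45 + [3] * 3 + [4] * 2
merged = merge_small_clusters(labels)
print(Counter(merged))
```

Any such cut-off embodies the assumption that mineralization-related anomalies occupy only a small fraction of the study area.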
The results of Q-type cluster analysis of the NT data differ depending on whether the hierarchical (Fig. 3(a)), k-means (Fig. 3(c)), or fuzzy c-means (Fig. 3(e)) algorithm is applied. The known Au deposits are not associated with any of the clusters in Figs. 3(a) and 3(c). However, the known Au deposits correspond relatively well with cluster 1 in Fig. 3(e): 21 known Au deposits, 78% of the total, are located in cluster 1. The results of Q-type cluster analysis of the ZST data with the hierarchical (Fig. 3(b)), k-means (Fig. 3(d)), and fuzzy c-means (Fig. 3(f)) algorithms also differ from one another, and the known Au deposits are not associated with any of the clusters in Figs. 3(b), 3(d), or 3(f).
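The 78% figure for Fig. 3(e) is the share of known deposits that fall inside one cluster. A minimal sketch of that count, assuming each deposit has already been mapped to a (row, column) cell of the classified raster; the grid and deposit locations are hypothetical.

```python
def deposit_share(label_grid, deposit_cells, target):
    """Fraction of known deposits whose raster cell carries the target cluster label."""
    hits = sum(1 for r, c in deposit_cells if label_grid[r][c] == target)
    return hits / len(deposit_cells)


# Hypothetical 2 x 3 classified raster and four deposit locations (row, column).
label_grid = [[1, 1, 2],
              [2, 1, 3]]
deposits = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(deposit_share(label_grid, deposits, target=1))  # 0.75
```

Comparing this share with the fraction of the study area occupied by the same cluster would indicate whether the association is stronger than chance.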
The results of Q-type cluster analysis of the LT data with the hierarchical (Fig. 4(a)), k-means (Fig. 4(c)), and fuzzy c-means (Fig. 4(e)) algorithms also differ from each other. However, the known Au deposits are associated with the anomaly clusters in Fig. 4(a) and with cluster 1 in Figs. 4(c) and 4(e), and the extents of cluster 1 in Figs. 4(c) and 4(e) are similar (both are mainly distributed in the north-central and southwestern parts of the study area). Similarly, for the ALT data, the anomaly clusters obtained with the hierarchical (the anomaly cluster in Fig. 4(b)), k-means (cluster 1 in Fig. 4(d)), and fuzzy c-means (cluster 1 in Fig. 4(f)) algorithms are correlated with the known Au deposits, and the distributions of cluster 1 in Figs. 4(d) and 4(f) are similar to some extent. The results for the LT and ALT data differ from each other even when the same clustering algorithm is used (Fig. 4). Notably, however, there is always one cluster that is clearly associated with the known Au deposits, regardless of which clustering algorithm is applied to the LT or ALT data.
Most of the known Au deposits are located within the anomaly clusters in Figs. 5(a) and 5(b) or within cluster 1 in Figs. 5(c), 5(d), 5(e), and 5(f). The results of Q-type cluster analysis of the CLT and ILT data agree well with each other, regardless of which clustering algorithm is used (Fig. 5). In particular, the k-means and fuzzy c-means algorithms produce similar results for the CLT and ILT data (Figs. 5(c), 5(d), 5(e), and 5(f)).
When the k-means and fuzzy c-means algorithms are applied, the observations are grouped into three clusters, and this number was chosen somewhat subjectively. To improve the reliability of the Q-type cluster analysis, the effect of the number of clusters is examined further below.
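Unlike k-means, fuzzy c-means gives each observation a degree of membership in every cluster; hard cluster maps are obtained by assigning each observation to its highest membership. A minimal plain-Python sketch of the standard algorithm follows, assuming the common fuzzifier m = 2 and random initial memberships (the study's settings are not stated). The test data are hypothetical and one-dimensional with two clusters for simplicity; the study itself uses c = 3 and multivariate observations.

```python
import math
import random


def fuzzy_cmeans(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means; returns (memberships, centres).

    memberships[i][j] is the degree to which observation i belongs to cluster j
    (each row sums to 1); m > 1 is the fuzzifier (m = 2 is an assumed default).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):  # random initial memberships, normalized per observation
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centres = []
    for _ in range(max_iter):
        # Centre update: mean of all observations weighted by membership ** m.
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centres.append([sum(w[i] * points[i][k] for i in range(n)) / sw for k in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [max(math.dist(p, ctr), 1e-12) for ctr in centres]  # avoid division by zero
            new_u.append([1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                          for j in range(c)])
        change = max(abs(a - b) for old, new in zip(u, new_u) for a, b in zip(old, new))
        u = new_u
        if change < tol:
            break
    return u, centres


def harden(u):
    """Assign each observation to the cluster with its largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


# Hypothetical one-dimensional observations forming two groups.
pts = [(0.0,), (0.5,), (1.0,), (10.0,), (10.5,), (11.0,)]
u, centres = fuzzy_cmeans(pts, c=2)
print(harden(u), centres)
```

The soft memberships themselves can also be mapped: cells with intermediate memberships mark transitional zones between background and anomaly clusters.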
Owing to space limitations, this further test of the k-means and fuzzy c-means algorithms is restricted to the NT and ILT data. The observations are classified into four, six, or eight clusters, as shown in Fig. 6 (k-means) and Fig. 7 (fuzzy c-means). When the k-means algorithm is applied to the NT data, the results differ with the number of clusters (Figs. 6(a), 6(c), and 6(e)). The results for the ILT data (Figs. 6(b), 6(d), and 6(f)) likewise differ from one another. However, one or two clusters (outlined by solid black and/or white lines) are present in all of Figs. 6(b), 6(d), and 6(f); their spatial patterns are similar, and most of the known Au deposits are located within them.
When the fuzzy c-means algorithm is applied to the NT and ILT data, a cluster (outlined by a solid white line) is always observed, and this cluster is associated with the known Au deposits to some extent. In addition, one or two clusters (outlined by solid black and/or white lines) are always present in Figs. 7(b), 7(d), and 7(f); their spatial patterns are similar, and they are associated with the known Au deposits.
The anomaly clusters (outlined by solid black and/or white lines) in Figs. 6(b), 6(d), 6(f), 7(b), 7(d), and 7(f) are associated with the known Au deposits, and their distributions are broadly similar (mainly in the north-central and southwestern portions of the study area), regardless of whether the k-means or fuzzy c-means algorithm is applied to the ILT data and whether the observations are divided into four, six, or eight clusters.
Discussion
Applying cluster analysis algorithms together with compositional data transformations to the geochemical data produced several informative results that are valuable for further mineral exploration.
The present study first classifies the 39 elements (variables) of the geochemical data into several groups using the hierarchical algorithm, and one of these groups represents the hydrothermal and epithermal activity that has occurred in the study area. R-type cluster analysis of the NT and ZST data gives identical results. This is expected, because the ZST rescales each variable linearly and therefore leaves the correlation coefficients, and hence the one-minus-correlation distances, unchanged. In contrast, the results for the LT, ALT, and CLT data differ significantly from each other and from those for the NT and ZST data. These results suggest that R-type cluster analysis is affected by the LT, the ALT, and the CLT, but not by the ZST. Although the ALT, the CLT, and the ILT are regarded as effective techniques for opening compositional data (Egozcue et al., 2003; Templ et al., 2008), their effects on R-type cluster analysis are unclear, because the results for the ALT and CLT data differ from each other and are difficult to interpret. When the ALT is used, one variable must be selected as the ratio variable to open the compositional data, and that variable cannot be used in further analyses. The CLT produces collinear data (Templ et al., 2008) when it is applied to compositional data. Although the ILT avoids collinearity (Egozcue et al., 2003), it is not a reliable choice for R-type cluster analysis because it removes the direct relationships among the original variables, making the results difficult to interpret. Selecting an appropriate data transformation method before R-type cluster analysis is therefore challenging. Fortunately, prior knowledge and previous studies of the study area can help in selecting the most suitable method (e.g., expert knowledge of the target deposit and the geological background of the study area can help identify the elements of interest).
Second, the number of clusters must be specified when the hierarchical algorithm is used for Q-type cluster analysis, which is difficult. In this study, the inconsistency coefficient is used as the criterion for producing the clusters from the hierarchical algorithm. If the cutoff value is too small, there will be too many clusters to interpret the meaning of each. Fortunately, the results of Q-type cluster analysis tend to stabilize as the cutoff is increased in appropriate steps from a small value to a larger one, and such relatively stable results can be taken as an appropriate solution. All clusters containing few observations are then merged into a new cluster, which is defined as the anomaly cluster. This solution is reasonable because mineralization is a low-probability event, so geochemical anomalies should account for a small proportion of the observations. The clusters containing the largest numbers of observations can be regarded as representing the background. However, this solution is subjective and should be used only as an exploratory data analysis tool. The k-means and fuzzy c-means algorithms are more user-friendly for Q-type cluster analysis because they directly produce a pre-defined number of clusters. Furthermore, there are always one or more clusters that can be considered anomaly clusters, and these clusters are closely associated with the known Au deposits in the study area.
The results of Q-type cluster analysis of the NT and ZST data are unstable, regardless of the clustering algorithm used, and are therefore difficult to interpret. The known Au deposits are not associated with any of the clusters obtained from the NT and ZST data (except in Fig. 3(e)), and the reason for this is unclear. In contrast, the known Au deposits correspond well with one of the clusters obtained from the LT and ALT data. Compared with the NT, ZST, LT, and ALT data, the CLT and ILT data yield the most reliable and interpretable results, and the known Au deposits are associated with their anomaly clusters. Furthermore, the spatial distributions of these clusters are very similar when the same clustering algorithm is applied to the CLT and ILT data. To investigate why cluster analysis of the CLT and ILT data gives more stable and interpretable results, and more valuable information for mineral exploration, than the other data transformation methods, we compare the structure of the data after each transformation. Quantile-quantile plots of the transformed variables show that the LT, the ALT, the CLT, and the ILT improve the structure of the geochemical data (i.e., most transformed variables are closer to normal distributions); however, why the CLT and the ILT are better suited than the LT and the ALT for cluster analysis of geochemical data remains unclear and requires further study.
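The visual Q-Q comparison can be summarized by one number per variable: the correlation between the sorted data and the corresponding standard normal quantiles, which approaches 1 as the Q-Q plot approaches a straight line. This probability-plot correlation is our own illustrative summary, not a statistic reported in the study; the plotting positions (i − 0.5)/n are one common choice, and the perfectly lognormal sample is hypothetical.

```python
import math
import statistics


def qq_correlation(values):
    """Correlation between sorted data and standard normal quantiles.

    Values close to 1 mean the normal Q-Q plot is nearly a straight line.
    Plotting positions (i + 0.5) / n for 0-based i are an assumed convention.
    """
    xs = sorted(values)
    n = len(xs)
    nd = statistics.NormalDist()
    qs = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, mq = sum(xs) / n, sum(qs) / n
    cov = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((q - mq) ** 2 for q in qs))


# Hypothetical right-skewed (lognormal) variable, as is typical of trace elements.
nd = statistics.NormalDist()
raw = [math.exp(nd.inv_cdf((i + 0.5) / 20)) for i in range(20)]
logged = [math.log(v) for v in raw]
print(round(qq_correlation(raw), 3), round(qq_correlation(logged), 3))
```

Such a summary shows whether a transformation moves a variable toward normality, but, as noted above, it does not by itself explain why the CLT and the ILT perform better than the LT and the ALT in cluster analysis.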
The known Au deposits are always associated with one cluster (outlined by solid white lines in Figs. 6 and 7) or two clusters (outlined by solid white and black lines in Figs. 6 and 7) when the ILT data are classified into four, six, or eight clusters, regardless of whether the k-means or the fuzzy c-means algorithm is used. The areas outlined by the solid white lines are nested within those outlined by the solid black lines, and together they form a halo-like pattern (Figs. 6 and 7) that may represent a geochemical dispersion halo. Thus, the clusters within the solid white and black lines in Figs. 6 and 7 are interpreted as strong and moderate anomalies, respectively.
Conclusions
1) The results of R-type cluster analysis are not affected by the ZST, but they are clearly affected by the LT, the ALT, and the CLT.
2) The k-means and fuzzy c-means algorithms are more user-friendly than the hierarchical algorithm for Q-type cluster analysis of geochemical data, but they are not suitable for application to the NT and ZST data.
3) The results of applying Q-type cluster analysis to the output from the CLT and the ILT are very similar, regardless of whether the k-means or the fuzzy c-means algorithm is used. This result suggests that the use of the CLT and the ILT can lead to more stable results than the ALT.
4) Different distance metrics used with the same clustering algorithm can produce different Q-type cluster analysis results (not presented in this study). This is especially true for the NT, ZST, LT, and ALT data. However, the choice of distance metric does not strongly affect the results of Q-type cluster analysis of the CLT and ILT data.
5) The hierarchical algorithm is not recommended for use in Q-type cluster analysis because it is subjective, and it is difficult to determine the number of clusters in advance. NT and the ZST are also not recommended for use before performing Q-type cluster analysis of geochemical data.
6) In combination with the k-means or fuzzy c-means algorithm, the CLT or the ILT yields more reliable and interpretable results in Q-type cluster analysis of geochemical data. According to the results of Q-type cluster analysis of the CLT and ILT data, the north-central and southwestern parts of the study area (i.e., the areas within the solid white and black lines) are promising targets for further geological exploration.
Higher Education Press and Springer-Verlag GmbH Germany