Visualization of amino acid composition differences between processed protein from different animal species by self-organizing feature maps

Amino acids are the dominant organic components of processed animal proteins; however, there has been limited investigation of differences in their composition between various protein sources. Information on these differences will not only be helpful for their further utilization but also provide fundamental information for developing species-specific identification methods. In this study, self-organizing feature maps (SOFM) were used to visualize the amino acid composition of fish meal, and meat and bone meal (MBM) produced from poultry, ruminants and swine. SOFM display the similarities and differences in amino acid composition between protein sources and effectively improve data transparency. Amino acid composition was shown to be useful for distinguishing fish meal from MBM because of large differences in their glycine, lysine and proline concentrations. However, the amino acid composition of the three MBMs was quite similar. The SOFM results were consistent with those obtained by analysis of variance and principal component analysis but more straightforward. SOFM was shown to have a robust sample linkage capacity and to be able to act as a powerful means to link different samples for further data mining.


Introduction
Processed animal proteins (PAPs), including fish meal (FM) and meat and bone meal (MBM), are high-protein feed ingredients that have been widely used over recent decades [1] . However, following an outbreak of bovine spongiform encephalopathy (BSE) suspected to have resulted from feeding cattle ruminant MBM contaminated with the BSE prion, the use of mammalian MBM as an ingredient in feed for ruminants was largely prohibited [2] . Feeding animals with PAPs from the same species was also banned, since scientific advice suggested that intraspecies recycling presents a risk of spreading various diseases [3] . Due to the lack of methods to identify MBM from specific animal species in feed, the feeding of MBM to all farmed animals was effectively banned [4,5] .
With the rapid development of intensive animal production, the demand for protein feed continues to increase. With limited FM production and supply, FM has become one of the most expensive feed ingredients and its price is likely to increase in the future [6] . Significant amounts of MBM are produced [7] and MBM is much cheaper than FM. Reintroducing MBM for feeding farmed animals would therefore both reduce feed costs and improve the sustainability of intensive animal production.
Some attempts have been made to evaluate collagen, fatty acids, histidine dipeptides, odor and osteocalcin of PAPs as candidate species-specific markers [8][9][10][11][12] . The protein concentration in PAPs is usually over 50%. Amino acids are the building blocks of protein and they are vital for animal growth, development, reproduction and health [13,14] . Investigations of amino acid composition differences between PAPs from different animal species have, however, been limited. Information on any differences will facilitate the safe use of PAPs, as well as provide fundamental information for developing species-specific identification of PAPs in the future.
A self-organizing feature map (SOFM) is an artificial neural network method that uses an unsupervised learning strategy to reveal underlying data structures, and it has been widely used in pattern recognition [15][16][17][18] . It can project multidimensional data onto a two-dimensional hexagonal grid (map) while preserving the nonlinear relations of the input data. Each variable of all training samples is normalized between 0 and 1. By comparing the similarity between the input training data and the weights of the output neurons, and iteratively adjusting the neuron weights over a certain number of training runs, similar input training samples are mapped closely together and dissimilar ones are separated. Combining the output map and variable weights, the features of different samples can be visually detected and easily interpreted.
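The training procedure described above can be illustrated with a minimal sketch. It is not the software used in this study: the rectangular grid, the random weight initialization, the Gaussian neighbourhood, the linear decay schedules and all function names are assumptions made for illustration only.

```python
import math
import random

def min_max_normalize(samples):
    """Scale each variable (column) to the range [0, 1]."""
    cols = list(zip(*samples))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in samples
    ]

def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to sample x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def train_som(samples, rows=6, cols=6, epochs=100, seed=0):
    """Train a small map (assumed: rectangular grid, random initial weights)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    positions = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in positions]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                           # learning rate decays linearly
        radius = max(rows, cols) / 2 * (1.0 - frac) + 0.5   # neighbourhood shrinks
        for x in rng.sample(samples, len(samples)):         # random presentation order
            bmu = best_matching_unit(weights, x)
            for i, pos in enumerate(positions):
                d = math.dist(pos, positions[bmu])          # distance on the grid
                h = math.exp(-(d * d) / (2 * radius * radius))
                # Move the neuron's weights toward the sample, scaled by h.
                weights[i] = [w + rate * h * (xi - w) for w, xi in zip(weights[i], x)]
    return weights
```

After training, calling `best_matching_unit` for each normalized sample gives the neuron on which that sample is placed, and each column of the weight list corresponds to the weight map of one variable.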
The main objective of this study was to verify the suitability of SOFM for visualizing the similarities and differences in amino acid composition between PAPs produced from different animal species.

Sampling
Forty-one FM samples from Brazil, China, Peru and the USA were collected. Forty-four representative MBM samples, including 25 single-species MBM samples (6 poultry, 6 ruminant and 13 swine MBM samples) and 19 mixed-species MBM samples, were collected directly from MBM production facilities in different provinces of China. Prior to analysis, each sample was well mixed and ground using a Retsch ZM 200 mill (Retsch GmbH & Co. KG, Haan, Germany). One portion was passed through a 0.5 mm sieve and used for moisture, crude protein, crude fat and crude ash determination. The other portion was passed through a 0.25 mm sieve and used for amino acid analysis. The purity of FM samples was tested using a Leica DM2500 microscope (Leica Microsystems GmbH, Wetzlar, Germany) according to the European Union Commission Directive 2003/126/EC.

Data analysis
One-way analysis of variance (ANOVA) was used to compare the amino acid composition differences between FM and MBM samples using SPSS 17.0 software (SPSS Inc., Chicago, IL, USA). Levene's test was used to assess homogeneity of variance. If sample variances were homogeneous, the LSD (least significant difference) method was used to compare their means; otherwise, Tamhane's T2 method was used. Twenty-five single-species MBM and 30 single-species FM samples were selected using the Kennard and Stone algorithm [20] based on their amino acid composition and were used to train the SOFM model. The remaining FM samples and the mixed-species MBM samples were used for validation.
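The sample selection step can be sketched as follows. This is a minimal illustration of the Kennard and Stone rule with Euclidean distances, written for this purpose only; the function name `kennard_stone` and the choice of distance are assumptions, and the code is not the implementation used in this study.

```python
import math

def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select:
        # Pick the candidate whose nearest selected sample is farthest away.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected
```

For the FM samples, for example, `kennard_stone(fm_profiles, 30)` would return the indices of 30 training samples that span the amino acid composition space, where `fm_profiles` is a hypothetical list of per-sample amino acid vectors; the unselected samples would form the validation set.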
Principal component analysis (PCA) [21] is an established method for unsupervised feature extraction and for detecting groups in multidimensional data. The PCA results were compared with those obtained by SOFM analysis.
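As an illustration of what PCA computes, the sketch below extracts only the first principal component by power iteration on the covariance matrix of mean-centred data. The function name, the starting vector and the fixed iteration count are assumptions chosen for brevity; dedicated statistical software would normally be used instead.

```python
import math

def first_principal_component(samples, iterations=500):
    """Return (loadings, scores) for PC1 of mean-centred data via power iteration."""
    n, dim = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(dim)]
    centred = [[row[j] - means[j] for j in range(dim)] for row in samples]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    # Asymmetric start vector, to reduce the chance of starting orthogonal to PC1.
    vec = [float(j + 1) for j in range(dim)]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # no variance in the data
        vec = [v / norm for v in nxt]
    # Project each centred sample onto the PC1 direction.
    scores = [sum(r[j] * vec[j] for j in range(dim)) for r in centred]
    return vec, scores
```

The variables with the largest absolute loadings contribute most to PC1, and the scores are the sample coordinates along that axis in a score plot.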
To perform SOFM analysis, a 6 × 6 neuron hexagonal grid map with normal boundaries was used. Eigenvalues were used to initialize the weights of neurons and the number of training epochs was set to 100. Other training parameters were left at their default values; further details have been published previously [22] . To obtain the clusters from the SOFM map, Ward's method [23] was applied to the weights of the SOFM neurons after training, and neurons in different clusters were shaded differently.
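The clustering step can be sketched as a naive agglomerative procedure that, at each step, merges the two clusters whose union gives the smallest increase in within-cluster sum of squares (Ward's criterion). The function name `ward_clusters` and the dictionary-based bookkeeping are assumptions for illustration; this is not the implementation used in this study.

```python
def ward_clusters(vectors, n_clusters):
    """Agglomerative clustering with Ward's criterion; returns a label per vector."""
    clusters = [{"members": [i], "centroid": list(v)} for i, v in enumerate(vectors)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares caused by merging a and b.
        na, nb = len(a["members"]), len(b["members"])
        sq = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
        return na * nb / (na + nb) * sq

    while len(clusters) > n_clusters:
        # Find the pair of clusters that is cheapest to merge.
        i, j = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: merge_cost(clusters[pq[0]], clusters[pq[1]]),
        )
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    labels = [0] * len(vectors)
    for label, cluster in enumerate(clusters):
        for member in cluster["members"]:
            labels[member] = label
    return labels
```

Applied to the 36 trained neuron weight vectors, the returned labels indicate which neurons would be shaded alike on the map.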

Results and discussion
3.1 Statistical analysis
Table 1 summarizes the results for moisture, protein, ash, fat and amino acids in PAPs from different species. The mean values for concentration of ash, fat and protein in FM were similar to those previously reported [25] . However, MBMs had higher concentrations of ash (18.6% vs 10.6%) and protein (61.8% vs 51.3%) than FM.
The ANOVA results showed that there were significant differences between FM and MBM in the mean concentrations of Asp, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Thr, Trp, Tyr and Val (P < 0.05), but not for Ala, Arg, Cys and Ser (P > 0.05). The largest difference of all amino acids was for Gly (-3.4%), followed by Lys (-2.1%) and Pro (-2.0%). Inadequacy of amino acids, especially essential amino acids, will directly affect animal growth, development, reproduction and health [13] , while oversupply of amino acids will not only increase feeding cost but also increase nitrogen excretion leading to environmental pollution [26] . Considering the existing differences, much attention should be paid to balancing amino acid compositions of MBM when used to replace FM in animal diets.
No significant differences were found between poultry and ruminant MBM samples for any of the 18 amino acids (P > 0.05). However, the mean concentrations of Arg, Cys, Ser and Thr in swine MBM were significantly (P < 0.05) higher than in ruminant MBM, and the mean concentration of Trp in swine MBM was significantly (P < 0.05) lower than in ruminant MBM. For the remaining 13 amino acids, no significant differences were found between ruminant and swine MBM (P > 0.05). The mean concentrations of Ser and Pro in swine MBM were significantly (P < 0.05) higher than in poultry MBM, while no significant differences were observed for the other 16 amino acids (P > 0.05).

Principal component analysis (PCA)
PCA was applied to the amino acid data. Figure 1a shows a plot of the first two principal components. From the loadings (Fig. 1b), Gly, Lys and Pro were found to have the greatest influence on the explained variance of PC1. However, MBMs from different species were scattered randomly in the PC space and no clustering was detected.
3.3 Self-organizing feature maps (SOFM) analysis
Using the amino acid data as input variables, the trained SOFM map shown in Fig. 2 was constructed.
The MBM samples were located in the upper part of the map, while the FM samples were located in the lower part of the map. No FM and MBM samples were on the same neuron. However, MBMs from the same species did not group together, but were scattered on different neurons. Some MBM samples from different species were even located on the same neurons, e.g. poultry and swine MBM located on Neuron 01, and ruminant and swine MBM located on Neuron 27. The cluster analysis based on the weights of neurons after training clearly showed FM and MBM clusters, but no cluster of MBMs from a single animal species was detected (Appendix A, Fig. S1). The clustering indicated by SOFM analysis was similar to that obtained by PCA.
As can be seen in Fig. 2, PAP samples with similar amino acid composition were grouped on the same neuron or neighboring neurons, while dissimilar samples were placed on different neurons. For instance, the four FM samples in the training set located on Neuron 36 were quite similar and no large variations were observed among all 18 amino acids (Fig. 3). Neighboring neurons from the same cluster had similar mean total amino acid (TAA) concentrations, e.g. Neurons 12 and 18, and Neurons 25 and 26. Distant neurons from the same cluster differed markedly in their TAA concentrations, e.g. Neurons 03 and 36, and Neurons 01 and 34. Moreover, SOFM was quite sensitive to minor variations. For example, Neurons 12, 23 and 27 had similar TAA concentrations (about 61%), but they were not neighboring neurons. By comparing the weights of these neurons (Fig. 4), it can be seen that Neurons 12 and 23, where FM samples were located, had generally similar weights, with small variations in the weights of His, Phe, Trp and Tyr. Compared to those two neurons, Neuron 27, where MBM samples were located, had relatively lower weights of Asp, His, Ile, Leu, Lys, Met, Phe, Thr, Trp and Tyr, but higher weights of Ala, Gly and Pro.
Notably, for those amino acids showing statistically significant differences between MBM and FM, weight differences could also be observed visually on their weight maps (Fig. 5). It can be seen that the weights of Asp, Thr, Glu, Ile, Tyr, Lys, Met and Trp of MBM-located neurons were clearly lower than those of FM-located neurons, while the weights of Gly and Pro of MBM-located neurons were clearly higher than those of FM-located neurons. Also, similar weights of some amino acids, e.g. Ala, Arg, Cys and Ser, were observed between FM-located neurons and MBM-located neurons.
These results were consistent with those obtained by ANOVA. Furthermore, the differences between one sample and any other sample can be compared visually by combining the results of SOFM and weight maps, which is not possible with ANOVA or PCA. For instance, the sample FM36 on Neuron 29 had the highest concentrations of Ala, Asp, Ile and Val of all the samples in the training set, and poultry sample P01 on Neuron 07 had the lowest concentrations of Ala, Asp, Arg, Glu, Lys, Met, Thr and Tyr of all the samples in the training set. Compared to the statistics shown in Table 1 or the PCA shown in Fig. 1, SOFM effectively improved the data transparency and provided more detailed information.
3.4 Partial least squares discriminant analysis (PLSDA)
PLSDA was applied to the amino acid data of all PAP samples and the results are shown in Fig. 6. PLSDA confirmed the clustering obtained by PCA and SOFM analysis.
NIR technology (near infrared spectroscopy and near infrared microscopy) has been used to identify PAPs from different animal species [27,28] . Previous research showed that defatted FM and MBM can still be discriminated by NIR technology [29] . It is well known that NIR technology relies on differences in organic composition to distinguish samples, and amino acids are the dominant organic components of PAPs. The results presented above show that the amino acid compositions of MBM and FM are quite dissimilar, indicating that amino acid composition may be useful in FM and MBM identification. Because poultry, ruminant and swine MBM have very similar amino acid compositions, these results also partially explain why NIR technology has been unsuitable for distinguishing MBMs from different species.

Conclusions
Results from this study showed that the amino acid compositions of poultry, ruminant and swine MBM are similar, but those of MBM and FM are quite different, especially for Gly, Lys and Pro. Amino acid composition should therefore be useful in FM and MBM identification by various means. SOFM visually presented the similarities and differences in amino acid composition among PAP samples from different animal species and effectively improved data transparency. The SOFM analysis was highly consistent with the results of ANOVA and PCA, but more straightforward. This demonstrates that SOFM is a powerful means to link different samples for further data mining.