An intelligent identification method for foreign fibers in seed cotton based on hyperspectral imaging with the PCA-AlexNet model

Ling ZHAO , Quang LI , Xin YU , Yaning CHANG

Front. Agr. Sci. Eng. ›› 2025, Vol. 12 ›› Issue (4): 883-899. DOI: 10.15302/J-FASE-2025639
RESEARCH ARTICLE


Abstract

The integration of hyperspectral imaging and deep learning for foreign fiber detection has primarily focused on plastic film. However, detecting the various foreign fibers in long-staple cotton, particularly those that are white, transparent or similar in color to the cotton, remains a significant challenge. The spectral responses of different foreign fibers vary significantly across wavelengths, which makes hyperspectral multi-target detection more complex. To address this challenge, a hyperspectral identification algorithm is proposed. First, hyperspectral images of the experimental samples are captured, and principal component analysis (PCA) is applied to select the optimal feature bands for recognition by a convolutional neural network. Next, the AlexNet model is fine-tuned to optimal parameters using the primary feature bands. After multidimensional experimental validation, the PCA-AlexNet model efficiently identifies foreign fibers. Finally, after analyzing the experimental results from multiple perspectives, PCA-AlexNet-23 is selected as the fiber identification model. The results show that the PCA-AlexNet-23 model excels in identifying multiple fibers, achieving an overall accuracy of 97.2%, an average accuracy of 95.2% and a Kappa coefficient of 93.1%. These accuracies outperform those of support vector machine, artificial neural network, LDA-VGGNet and LDA-LeNet models. In experimental tests, the overall foreign fiber removal rate exceeded 85%.

Graphical abstract

Keywords

Long-staple cotton / foreign fibers / hyperspectral image / principal component analysis / convolutional neural network

Highlight

● The multi-object identification and classification fusion of a deep learning model and hyperspectral imaging have been effectively realized.

● PCA is utilized to select the optimal feature bands for each foreign fiber, reducing redundancy in hyperspectral data and minimizing training time costs.

● The PCA-AlexNet-23 model is verified to express remarkable efficacy in classifying different foreign fibers in seed cotton.

● The model is especially applicable to fibers that are white, transparent, or similar in color to the seed cotton.

Cite this article

Ling ZHAO, Quang LI, Xin YU, Yaning CHANG. An intelligent identification method for foreign fibers in seed cotton based on hyperspectral imaging with the PCA-AlexNet model. Front. Agr. Sci. Eng., 2025, 12(4): 883-899 DOI:10.15302/J-FASE-2025639


1 Introduction

Long-staple cotton from Xinjiang, China is of excellent quality and is widely used in the production of premium textiles[1]. In addition to stems, leaves and plastic film mixed in during mechanical picking, other impurities such as human hair, polypropylene fibers, feathers and packaging rope may be introduced during packaging, storage, drying, transportation and procurement[2-4]. The efficient cotton production technology in Xinjiang primarily focuses on fine staple cotton; the mechanized harvesting and subsequent processing technology for long-staple cotton is still in its early stages[5,6]. Currently, the cleaning process for long-staple cotton in China relies on manual picking, which poses several challenges. Workers involved in this picking process often experience visual fatigue, which can affect the accuracy and consistency of foreign fiber detection[7-9]. Efficient removal of various foreign fibers from long-staple cotton requires high-quality image acquisition, along with effective processing and feature extraction technologies, which are essential for automatic fiber removal[10].

Many methods have been proposed for identifying foreign fibers in cotton. Yang et al.[11] segmented color images of fibers into salient regions, calculating the characteristic values of the three RGB channels and merging them into a color significance map. Zhao et al.[12] used an image segmentation technique to extract 75 feature vectors from various foreign fibers in cotton, including cloth, feathers, human hair, twine, polypropylene fibers and plastic film. A two-step grid search strategy was used to optimize a kernel extreme learning machine, and the 75 feature vectors were then used to classify fibers, achieving an accuracy of 93.6%. These studies effectively detected and identified foreign fibers with distinct color characteristics. However, the selection of image segmentation thresholds and color features relies on manual experience. Mustafic and coworkers[13,14] used blue and violet light to identify the optimal excitation light source, thereby developing a fluorescence imaging system for detecting cotton pollutants. Characteristic information of foreign fibers was manually extracted using RGB and HSV color models. These color features were then applied in linear discriminant analysis (LDA) for classification. Their results show that the fluorescence reaction provides superior recognition; however, the recognition performance without a fluorescence reaction is still unclear.

The studies mentioned above primarily relied on the color characteristics of cotton fibers for classification. However, misclassification can occur when fibers have similar colors. Also, the presence of colorless and transparent plastic film, which does not exhibit fluorescence, further complicates the classification process. Therefore, it is necessary to develop a flexible multifiber vision identification method.

Hyperspectral imaging is a novel technology that non-destructively captures both spatial and spectral information for each pixel of an object. Line-scanning-based hyperspectral imaging systems have been shown to effectively detect and differentiate various fiber types in seed cotton[15,16]. Zhang et al.[17] collected hyperspectral images of fibers in cotton wool. The optimal spectral bands were selected using minimum noise fraction and minimum redundancy maximum relevance. LDA and support vector machine (SVM) were then used to classify 14 types of fibers. Their results demonstrated high overall accuracy, although the hand-prepared cotton wool samples were idealized. Chang et al.[18] collected hyperspectral images of five impurities in seed cotton and classified them using SVM, LDA and artificial neural network (ANN). Their results indicated classification accuracies of 83.4% for SVM, 86.2% for LDA and 81.8% for ANN. Despite the ability of hyperspectral imaging to overcome the limitations of color features, misclassifications still occurred in the recognition of plastic film. Both SVM and ANN are effective in modeling the nonlinear relationship between the spectrum and the sample. However, SVM requires careful optimization of the kernel function parameters, making it challenging to apply in multiclass classification tasks. As a shallow learning model, ANN struggles with complex model applications[19-21]. Standard machine learning algorithms rely heavily on image segmentation and feature extraction, which poses a challenge to accurately classifying non-plant fibers.

Deep learning models effectively capture and interpret complex samples in hyperspectral data by using rich spatial and spectral information[22]. Wang et al.[23] developed a hyperspectral imaging-based rapid detection method for insect-damaged maize seeds, proposing a hybrid 1D-CNN (one-dimensional convolutional neural network) model integrating spectral and texture features. Their model demonstrated optimal performance with both F1-score and accuracy reaching 0.96, achieving efficient classification using only two band images. Pourdarbani et al.[24] compared 2D-CNN and 3D-CNN performance in hyperspectral image analysis for orange bruise detection, constructing shallow (7-layer) and deep (18-layer) CNN models to process 2D single-band images and 3D full-spectral data, respectively. The 3D-CNN-18 attained the highest accuracy (94%), outperforming the 2D models. Additionally, Diao et al.[25] proposed a lightweight 3D-CNN model incorporating depthwise separable convolutions and skip connections to reduce parameters, which achieved an average recognition accuracy of 98.6% for rapid identification of corn seedlings and weeds in hyperspectral images. In addition, some scholars have combined hyperspectral imaging and deep learning to detect plastic film in cotton. Ni et al.[26] proposed a plastic film recognition algorithm that integrates hyperspectral imaging with deep learning. Their algorithm combined a variable-wise weighted stacked autoencoder (VW-SAE) and ANN to enhance feature extraction, followed by classification using an extreme learning machine, achieving a recognition accuracy exceeding 95%. Also, Liu et al.[27] proposed a plastic film recognition algorithm for seed cotton that uses hyperspectral imaging and CNN, achieving a detection accuracy of 96.5%. Earlier research demonstrated that hyperspectral imaging technology shows great potential for plastic film recognition in seed cotton. However, challenges remain in hyperspectral multi-target foreign fiber detection.
The spectral responses of foreign fibers vary significantly across wavelengths, making the detection process more complex. This technology serves as a theoretical basis for identifying various foreign fibers in seed cotton. Building on this, the paper proposes a method based on hyperspectral imaging with PCA-AlexNet model to identify foreign fibers in long-staple cotton, starting from the seed cotton stage.

Two major outcomes arose from this work. (1) The multi-object classification fusion of a deep learning model and hyperspectral imaging was effectively realized. Based on multiple experiments, PCA was used to identify the optimal feature bands for each foreign fiber, reducing redundancy in hyperspectral data and minimizing training time costs. (2) Based on the optimized feature bands and the favorably adjusted AlexNet parameters, the PCA-AlexNet model was able to automatically extract feature information that demonstrates remarkable efficacy in classifying different foreign fibers in seed cotton. It is especially applicable for fibers that are white, transparent or similar in color to the seed cotton.

2 Materials and methods

2.1 Integral technical route

The integral technical route for identifying various foreign fibers used in this study is shown in Fig.1. In Step 1, the hyperspectral images of seed cotton mixed with various foreign fibers, acquired by a hyperspectral camera, were optimized through spectral reflectivity correction and image preprocessing. In Step 2, the primary characteristic bands of the hyperspectral data were selected using PCA. In Step 3, the PCA-processed hyperspectral data were fed into the convolutional neural network for training and learning, mainly to adjust the optimal structure and parameters of the CNN model. In Step 4, the experimental results of the model were comprehensively examined by analyzing the accuracy and loss curves, model evaluation indices, confusion matrix and result visualizations.

2.2 Materials and instruments

Based on field investigation findings, the typical foreign fibers in seed cotton include residual plastic film, cotton boll hull, cotton stalk, leaf fragments, human hair, polypropylene fiber (from broken fertilizer bags) and colored thread (used for binding), among others. The basic description of foreign fibers is shown in Tab.1. Also, cotton is susceptible to contamination by Aspergillus flavus during harvesting and processing. Prolonged exposure to sunlight can lead to yellowing, known as yellow cotton, which also impacts cotton quality[28]. The experiment used 100 g of long-staple cotton (seed cotton), mechanically harvested from Awati County, Aksu Prefecture, in the Xinjiang Uygur Autonomous Region. Additionally, a mixture of these foreign fibers and seed cotton was carefully selected and thoroughly blended to form the experimental sample, which measures 300 mm × 200 mm with a thickness of 2–3 cm.

Fig.2 illustrates the hyperspectral imaging system. The Gaia Sorter-Dual, a full-band hyperspectral sorter, operated with a computer system and was composed of several components, including a halogen lamp, spectral camera, platform lifting device, platform conveyor and camera distance adjustment device. The hyperspectral image was captured using a hyperspectral sorter controlled by SpecView software. It had a resolution of 732 × 800 pixels and covered a spectral range in the shortwave infrared, with a total of 150 bands. The imaging process captured data line by line in the X direction, followed by advancing the platform conveyor. Then, the aligned detectors scanned a narrow strip, completing a longitudinal scan in the Y direction. By integrating data from both transverse and longitudinal scans, a complete hyperspectral image of the target sample was obtained.

2.3 Spectral reflectivity correction

Hyperspectral cameras are sensitive to environmental factors, including variations in light intensity and angle, as well as the presence of dark currents within the camera. These factors can introduce noise interference into hyperspectral images, emphasizing the need for reflectivity correction[29]. The correction formula is:

$$ I_{\mathrm{new}} = \frac{I_{\mathrm{raw}} - I_{\mathrm{dark}}}{I_{\mathrm{white}} - I_{\mathrm{dark}}} $$

where $I_{\mathrm{new}}$ is the corrected image, $I_{\mathrm{raw}}$ is the original image, $I_{\mathrm{white}}$ is the whiteboard reference image and $I_{\mathrm{dark}}$ is the blackboard (dark) reference image.
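As an illustration, this correction can be applied directly to the calibrated hyperspectral cubes. The sketch below is a minimal example; the small `eps` guard against division by zero is our addition, not part of the paper:

```python
import numpy as np

def correct_reflectance(raw, white, dark, eps=1e-8):
    """Black/white reference correction: I_new = (I_raw - I_dark) / (I_white - I_dark).

    raw, white, dark: hyperspectral cubes of shape (rows, cols, bands).
    eps guards against division by zero in saturated or dead pixels.
    """
    raw, white, dark = (np.asarray(a, dtype=float) for a in (raw, white, dark))
    return (raw - dark) / (white - dark + eps)
```

Applied pixel-wise and band-wise, this maps raw sensor intensities to relative reflectance in [0, 1] wherever the white reference is brighter than the dark reference.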

2.4 Image preprocessing

The ENVI 5.3 (64-bit) software was used to preprocess the hyperspectral data while preserving the feature scale of the original data. Subsequently, three bands, 50, 100 and 150, were selected to generate 2D and 3D images. Since the 2D images required labels, the 2D hyperspectral data images were annotated, yielding 585,600 labeled samples (732 × 800 pixels), as shown in Tab.2. The labeled 2D and 3D images were then converted into Python-compatible formats for further analysis.

2.5 Reflectivity curve

In the visible light spectrum, RGB images are limited in their ability to accurately distinguish foreign fibers that are white, transparent or similar in color to seed cotton, such as plastic film, polypropylene fiber and white packaging rope. In contrast, in a hyperspectral image each pixel encompasses multiple bands, allowing the reflectivity data to be fitted into a nearly continuous curve known as a spectral curve. The ENVI 5.3 (64-bit) software was used to visually represent the hyperspectral data and generate spectral curves, as shown in Fig.3. The reflectance curves of different objects varied in the number, size and distribution of local peaks and valleys. Hence, the analysis of spectral curves can serve as a theoretical foundation for identifying and classifying foreign fibers in seed cotton.

2.6 PCA dimension reduction

Principal component analysis (PCA) is a linear dimensionality reduction technique that projects high-dimensional data into a lower-dimensional subspace. It achieves this by transforming the data into a new coordinate system, where the axes (principal components) align with the directions of maximum variance, thus capturing the most significant features of the data set[30]. PCA effectively suppresses noise by extracting principal components while retaining key spectral features relevant to foreign fibers. The processed low-dimensional data are then used as input to deep learning models, leveraging their automatic feature extraction capabilities to achieve spectral-spatial multimodal feature fusion[31,32]. The solution steps are as follows:

Let the hyperspectral data set have size $(m \times n) \times Q$, where $m \times n$ is the image resolution (width and height) and $Q$ is the total number of bands. Each pixel is represented as a spectral vector:

$$ \mathbf{x}_i = [x_1, x_2, \ldots, x_Q]_i^{\mathrm{T}} \quad (i = 1, 2, \ldots, m \times n) $$

The data set is centered using its mean vector:

$$ \bar{\mathbf{x}} = \frac{1}{m \times n} \sum_{i=1}^{m \times n} [x_1, x_2, \ldots, x_Q]_i^{\mathrm{T}} $$

Next, the eigenvalues of the covariance matrix are solved for:

$$ \mathbf{A}_x = \frac{1}{m \times n} \sum_{i=1}^{m \times n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^{\mathrm{T}} = \mathbf{C}\mathbf{D}\mathbf{C}^{\mathrm{T}} $$

where $\beta_1, \beta_2, \ldots, \beta_Q$ are the eigenvalues of $\mathbf{A}_x$, forming the diagonal matrix $\mathbf{D} = \mathrm{diag}(\beta_1, \beta_2, \ldots, \beta_Q)$. The corresponding eigenvectors form the orthogonal matrix $\mathbf{C} = (\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_Q)$. The eigenvalues and eigenvectors are sorted in descending order: $\beta_1 \geq \beta_2 \geq \cdots \geq \beta_Q$.

The eigenvectors $\mathbf{c}_j^{\mathrm{T}}$ ($j = 1, 2, \ldots, K$) corresponding to the first $K$ eigenvalues are retained, giving the projection matrix onto the $K$-dimensional space. The low-dimensional representation used for model training is obtained by the dot product of the hyperspectral data with the projection matrix:

$$ \mathbf{Z}_i = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1Q} \\ c_{21} & c_{22} & \cdots & c_{2Q} \\ \vdots & \vdots & \ddots & \vdots \\ c_{K1} & c_{K2} & \cdots & c_{KQ} \end{bmatrix} [x_1, x_2, \ldots, x_Q]^{\mathrm{T}} \quad (i = 1, 2, \ldots, m \times n) $$
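The steps above can be sketched in NumPy; the array sizes below are arbitrary for illustration, not the 732 × 800 × 150 cube used in the study:

```python
import numpy as np

def pca_reduce(cube, k):
    """Project an (m, n, Q) hyperspectral cube onto its first k principal components.

    Follows the steps above: center the pixel spectra, eigendecompose the
    Q x Q covariance matrix, keep the top-k eigenvectors, and project.
    """
    m, n, q = cube.shape
    x = cube.reshape(m * n, q).astype(float)
    x_centered = x - x.mean(axis=0)                 # centering step
    cov = x_centered.T @ x_centered / (m * n)       # covariance matrix A_x
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]               # re-sort descending
    proj = eigvecs[:, order[:k]]                    # Q x k projection matrix
    return (x_centered @ proj).reshape(m, n, k)     # low-dimensional scores
```

The variance of the first retained component is by construction at least that of the second, which is how the "primary" feature bands dominate the reduced representation.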

2.7 Adjustment of convolutional neural network model

A CNN has the characteristics of local connectivity, weight sharing, strong generalization ability, translation invariance and avoidance of redundant model training parameters, which make it outstanding in the field of hyperspectral image classification[33,34]. However, a 1D-CNN is limited in its ability to use spatial feature information, whereas a 3D-CNN can directly extract both global spectral and spatial information from the original hyperspectral image. Nevertheless, 3D-CNN is also hindered by several drawbacks, including a large number of parameters, high computational complexity and extended training times. Hence, this research focused on using a 2D-CNN, which can effectively extract joint features from the local spatial spectrum of target samples for hyperspectral image classification and recognition. The ability to use multiple GPUs for model training has contributed to AlexNet’s exceptional performance in large-scale image recognition[35]. To improve the adaptation of the model to this task, the relevant structure and parameters of AlexNet were adjusted in this study. Fig.4 illustrates the structure of the modified AlexNet model. The convolution layer uses the efficient ReLU function to optimize convolution operations. The 2D-CNN output formula for extracting features from a hyperspectral image is[36]:

$$ p_{l,i}^{x,y} = f\left( \sum_{p} \sum_{h=0}^{H_l - 1} \sum_{w=0}^{W_l - 1} k_{l,i,p}^{h,w}\, p_{l-1,p}^{x+h,\,y+w} + b_{l,i} \right) $$

where $p_{l,i}^{x,y}$ is the output value of the $i$th feature map in the $l$th layer at position $(x, y)$, $f(\cdot)$ is the activation function, $b_{l,i}$ is the bias term, $k_{l,i,p}^{h,w}$ is the weight of the $i$th convolution kernel in the $l$th layer, connected to the $p$th feature map, at position $(h, w)$, $p_{l-1,p}^{x+h,y+w}$ is the output value of the $p$th feature map in the $(l-1)$th layer at position $(x+h, y+w)$, and $H_l$ and $W_l$ are the height and width of the convolution kernel.
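For concreteness, the output formula can be evaluated directly. The sketch below (with made-up sizes) computes one output feature map exactly as written, with ReLU as the activation:

```python
import numpy as np

def conv2d_output(prev_maps, kernel, bias):
    """Evaluate the 2D-CNN output formula for one feature map.

    prev_maps: (P, X, Y) feature maps from layer l-1.
    kernel:    (P, H, W) weights of the i-th kernel in layer l.
    bias:      scalar offset b_{l,i}.
    Returns the ReLU-activated valid-mode output map.
    """
    p, x_dim, y_dim = prev_maps.shape
    _, h_dim, w_dim = kernel.shape
    out = np.empty((x_dim - h_dim + 1, y_dim - w_dim + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            # sum over p, h, w of k[p,h,w] * prev[p, x+h, y+w]
            out[x, y] = np.sum(kernel * prev_maps[:, x:x + h_dim, y:y + w_dim])
    return np.maximum(out + bias, 0.0)  # ReLU activation f(.)
```

Note that the formula as written is a cross-correlation, which is what deep learning frameworks implement under the name "convolution".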

Following the feature mapping in the convolutional layer, the addition of a subsampling layer (pooling layer) became necessary. The max pooling method was mostly used in the study, which indirectly mitigates the issue of overfitting[37].

The fundamental architecture of a CNN is constructed by sequentially stacking convolutional and pooling layers. The fully connected layer, serving as the classifier output, was placed at the end of the CNN. Probability distributions for each category were output by Softmax regression. This paper used the cross-entropy loss function to address the multi-classification problem. Softmax regression activated the input values into corresponding probability distributions, which were then used to calculate the loss function.
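The softmax activation and cross-entropy loss described above can be written compactly; a minimal NumPy sketch for a single sample:

```python
import numpy as np

def softmax(z):
    """Map raw class scores to a probability distribution (numerically stable)."""
    e = np.exp(z - np.max(z))      # subtract max to avoid overflow
    return e / e.sum()

def cross_entropy(probs, true_class):
    """Cross-entropy loss for one sample with an integer class label."""
    return -np.log(probs[true_class])
```

For uniform scores over C classes the loss equals ln(C), the expected value for an untrained classifier; the loss decreases as probability mass concentrates on the correct class.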

Neural network optimization typically involves two main steps. In Step 1, the forward propagation algorithm calculates the model’s output, which is then compared to the actual output to assess the model’s performance; in Step 2, the backpropagation algorithm calculates the gradient of the objective function. The model weights and parameters are then fine-tuned from the output layer to the input layer using stochastic gradient descent (SGD), ensuring the optimization of all parameters[38].

Batch normalization (BN) is used to address the issue of internal covariate shift, optimizing the training process for neural networks. Dropout was used in this study to mitigate overfitting in the network[39]. Generally, using BN and dropout independently helps prevent overfitting and accelerates model training. However, when combined, these techniques can lead to performance degradation due to the variance-bias tradeoff. To address this, one solution is to position the dropout layer after the BN layer. This arrangement helps mitigate the variance shift and effectively resolves the incompatibility between the two techniques[40,41].

The specific parameter values are presented in Tab.3[42-44]. The AlexNet model consists of 3 convolutional groups (each containing 1 convolutional layer and 1 pooling layer), 2 additional convolutional layers and 1 fully connected layer (comprising 1 flatten layer and 1 dense layer). In the input layer, 5 × 5 represents the manually divided image size for the input convolutional neural network, while D is the data dimension obtained after applying the PCA dimensionality-reduction algorithm.
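As an illustration of this layout (3 convolution groups with pooling, 2 further convolutional layers, then flatten + dense on 5 × 5 × D input patches), the following PyTorch sketch is a minimal stand-in; the kernel sizes, channel widths, dropout rate and 10-way output are assumptions, not the exact values of Tab.3:

```python
import torch
import torch.nn as nn

class MiniAlexNet(nn.Module):
    """Sketch of the modified AlexNet layout: 3 conv+pool groups,
    2 additional conv layers, then flatten + dense (sizes are assumed)."""
    def __init__(self, d_bands=23, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            # 3 convolutional groups (convolution + max pooling)
            nn.Conv2d(d_bands, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, ceil_mode=True),   # 5x5 -> 3x3
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, ceil_mode=True),   # 3x3 -> 2x2
            nn.Conv2d(64, 96, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, ceil_mode=True),   # 2x2 -> 1x1
            # 2 additional convolutional layers
            nn.Conv2d(96, 96, 3, padding=1), nn.ReLU(),
            nn.Conv2d(96, 64, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.BatchNorm1d(64),
            nn.Dropout(0.5),                   # dropout placed after BN, as discussed above
            nn.Linear(64, n_classes),          # softmax is applied inside the loss
        )

    def forward(self, x):                      # x: (batch, D, 5, 5) patches
        return self.classifier(self.features(x))
```

Each 5 × 5 × D pixel neighborhood yields one class score vector, so classifying a full 732 × 800 scene amounts to running this network over all patches.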

3 Results and discussion

3.1 Train of AlexNet model

Based on the hyperspectral image processing steps outlined above, the original hyperspectral data underwent multilevel dimensionality reduction and was input into the AlexNet model for training. The accuracy and loss rate curves for each dimension varied with the number of training iterations, as shown in Fig.5 and Fig.6. The vertical axis represents accuracy and loss, while the horizontal axis represents the training batch.

As shown in Fig.5, the experimental data analysis was as follows. The accuracy and loss rate curves for the validation set across 100 training batches failed to converge stably and showed significant fluctuations. Specifically, at batch 100, the validation accuracy was about 0.87 with a loss rate of 0.3, indicating overfitting of the model. A similar pattern was observed in Fig.5, where the final loss value was high, suggesting mild overfitting. In Fig.5, the model performed well on the training set, but the accuracy curve for the validation set began to converge around the 80th batch, with an overall increasing trend. The final value of the loss rate curve was about 0.17, showing no convergence and considerable fluctuations. The accuracy curve in Fig.5 began to converge at the 60th batch, while the loss rate curve started to converge at the 70th batch. Both curves then stabilized, indicating improved performance.

In Fig.6, it is evident that as the dimension of the PCA-AlexNet model increased, the fluctuations in the two training curves tended to become more stable. It is worth noting that with earlier convergence batches, the accuracy rate gradually improved and the loss value gradually decreased. Also, the difference between the validation set and the training set decreased gradually. In the case of Fig.6, the two curves began to converge at the 40th batch, with minimal fluctuations. There was also a gradual convergence of local peak and trough changes on the curve. Ultimately, the accuracy rate of the final validation set reached 0.96, while the loss rate settled at about 0.14. In Fig.6, the training process started to show signs of convergence at the 30th batch, while by the 40th batch the accuracy and loss data were essentially unchanged. Compared with the earlier panels of Fig.6, the later panels show a gradual decrease in the magnitude of change in accuracy and loss values, demonstrating more stable and superior performance.

As the dimensionality of retained hyperspectral data increased, it was observed that the magnitude of the increase in accuracy and of the decrease in loss rate gradually diminished. Additionally, the reduction in data bias between the validation set and the training set also gradually diminished. The accuracy and loss rate (Fig.6) stabilized by the 30th batch, demonstrating a stable overall trend. The accuracy of the final validation set was 0.97, with a corresponding loss value of about 0.12. Fig.6 shows training set accuracies close to 1 and losses close to 0. By the 100th batch, the panels of Fig.6 show comparable accuracy and loss rates, indicating excellent generalization performance.

Fig.5 and Fig.6 show that the accuracy and loss curves fluctuated during hyperspectral data training. These fluctuations can be attributed to: data imbalance, which biases predictions toward more frequent classes and destabilizes the loss values; noise from random neuron dropout; and mini-batch updates in the SGD optimizer, which may cause biased gradient estimates, leading to curve fluctuations. However, as dimensionality increased, the hyperspectral data retained more feature information, resulting in higher accuracy and lower loss, with diminishing fluctuation amplitude. This indicates that training stability improves, with no significant impact on model performance from the curve fluctuations.

3.2 Analysis of confusion matrix

The training results show that the PCA-23 and PCA-26 models had good convergence and stability, indicating strong generalization and robustness. In model testing, the performance can be evaluated through data analysis of the confusion matrix. The diagonal of the confusion matrix represents the number of correctly predicted samples. The rows represent the actual number of samples, while the columns represent the predicted number, as shown in Fig.7 and Fig.8.

The PCA-23 model, trained on the improved AlexNet architecture, achieves an overall accuracy exceeding 90% for all categories except for human hair. Specifically, the accuracy for categories such as plastic film, cotton boll shell, colored thread, cotton stalk, background, yellow cotton, and cotton exceeds 95%. Small targets are susceptible to noise interference due to limited spectral information and a low signal-to-noise ratio. This interference hampers the extraction of both spectral and spatial features. The lack of human hair samples and the small range of pixel sizes contributes to imbalanced data, which leads to lower accuracy when classifying human hair. Nevertheless, the classification results of the PCA-23 model still meet the requirements for fiber identification, which is notable.

The classification results of the PCA-26 model indicated an accuracy of 87.4% for human hair and 90.0% for polypropylene fiber. In the PCA-26 model, the classification accuracy for both human hair and polypropylene fiber decreased compared to PCA-23, while the accuracy for other categories showed only minor changes. As the dimensionality of hyperspectral data increases, the feature space becomes more complex and high-dimensional, leading to increased sample sparsity. As a result, the relationships between target samples become more ambiguous, making it difficult to accurately distinguish feature differences. Meanwhile, as the spectral dimension increases, correlations between many bands introduce redundant information into the feature space. This redundant information interferes with the learning process of the model and reduces classification accuracy.

In summary, increasing the dimensionality of hyperspectral data leads to issues with ambiguous relationships and redundant information, which affect the classification performance of the AlexNet model. However, the PCA-AlexNet-23 model performed well in all aspects.

3.3 Model evaluation index

To evaluate the performance of the classification algorithm more accurately, it is essential to use various evaluation indicators, such as overall accuracy (OA), average accuracy (AA), and the Kappa coefficient. The OA, AA and Kappa coefficients were calculated using the actual and predicted classification samples from the confusion matrix. Note, as the value of the evaluation coefficients increases, the reliability of the classification improves.
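As a concrete sketch (not the authors' code), the three indices can be computed from a confusion matrix with rows as actual classes and columns as predicted classes:

```python
import numpy as np

def oa_aa_kappa(cm):
    """Overall accuracy, average accuracy and Kappa from a confusion matrix.

    cm: square matrix, rows = actual classes, columns = predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    oa = np.trace(cm) / n                                # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))           # mean per-class recall
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)                         # Cohen's Kappa
    return oa, aa, kappa
```

Unlike OA, which can be inflated by frequent classes such as seed cotton and background, AA weights each class equally and Kappa discounts chance agreement, which is why all three are reported together.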

Fig.9 shows the PCA-AlexNet-X evaluation index line chart with error bars representing the standard deviation for n = 3. It shows that both the OA and Kappa coefficient steadily increased, while the AA remained relatively stable at about 95% after PCA-17. As the PCA retention dimension increased, the model better captured the data characteristics, leading to improved OA and Kappa coefficients. However, when the PCA dimension was low and failed to capture the data characteristics fully, misclassifications occurred, resulting in abnormal AA.

Additionally, the OA and Kappa coefficient increased with the PCA dimension. However, after reaching PCA-20, the trend in the OA and Kappa coefficients stabilized. Therefore, beyond PCA-23, increasing the retention dimension of hyperspectral data did not significantly improve accuracy, but it may increase the model training time. Additionally, dimensional redundancy may result in decreased classification performance. For multiclass foreign fiber classification, retaining a hyperspectral data dimension of 23 is most appropriate, consistent with the experimental results above.

3.4 Visualization of results

To visualize the test results of the PCA-AlexNet model, the Spectral tool library in PyCharm was used to plot and display the predicted results as a two-dimensional image. Fig.10 shows the true distribution of foreign fibers, with a resolution of 732 × 800 pixels. The real spatial positions of the different fibers, represented in different colors, are shown to allow a clear comparison of the actual results with the model predictions.

Fig.11 shows the predicted distribution of foreign fibers (PCA-5, 8, 11, 14). The PCA-5 and PCA-8 models produced a significant amount of misclassification, primarily misclassifying seed cotton as polypropylene fiber and plastic film. The main reason was that the low-dimensional hyperspectral data retained too little important feature information, which led to serious misclassification. The classification performance of the PCA-11 and PCA-14 models for foreign fibers was better, although a small number of seed cotton samples were still misclassified, primarily as polypropylene fiber and plastic film. Additionally, a large number of spurious data points indicated poor classification performance for plastic film and seed cotton. Fig.12 shows the predicted distribution of foreign fibers (PCA-17, 20, 23, 26). A small amount of misclassification still occurred in the PCA-17 and PCA-20 models. To address this issue, methods such as image threshold segmentation, enhancement and denoising can be used to clean up misclassifications. Given the concentrated distribution and limited pixel range of human hair as a foreign fiber, care must be taken to avoid mistakenly removing genuine human hair samples when filtering out outliers during image processing.

Given the difficulty in identifying small target foreign fibers, increasing the spectral dimension of hyperspectral data can help reduce the probability of misclassification. Accordingly, the test results of PCA-23 and PCA-26 were distinct overall and showed only a small difference from the actual samples. Although there was some noise in the data point distribution, it did not affect the results and could be removed using relevant image processing methods. As noted above, the confusion matrix of the PCA-26 model showed a downward trend in the recognition accuracy of polypropylene fiber and human hair. Additionally, computational cost must also be considered.

Morphological processing applied to the predicted results of PCA-23 effectively reduced the impact of noise caused by variations in illumination, static electricity and artificial labeling. Additionally, this technique helped eliminate spurious data points from small foreign target fibers and artifacts along the image edges. In this study, the Gaussian filter and opening operation were applied for image morphology. These techniques were used to enhance the image boundary, facilitate the isolation of the target sample, and improve the classification of foreign fiber in seed cotton. Fig.13 shows the processing results of PCA-AlexNet-23 model. Seed cotton and the background are grouped together, while plastic film, cotton boll hull, leaf fragments, human hair, polypropylene fiber, colored thread, cotton stalk and yellow cotton are classified as foreign fibers, forming two distinct categories. After morphological processing, the noise and spurious data points in the image were largely eliminated. Although a few human hair pixels were mistakenly deleted, this did not affect the accuracy of foreign fiber identification.
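A sketch of such post-processing with SciPy (the Gaussian sigma, binarization threshold and 3 × 3 structuring element are illustrative choices, not values reported in the paper):

```python
import numpy as np
from scipy import ndimage

def clean_prediction(class_map, fiber_labels, sigma=1.0):
    """Merge predicted classes into a binary foreign-fiber mask, then suppress
    spurious points with a Gaussian filter and a morphological opening.

    class_map: 2D integer array of predicted class labels.
    fiber_labels: labels treated as foreign fiber (everything else is cotton/background).
    """
    mask = np.isin(class_map, list(fiber_labels)).astype(float)
    smoothed = ndimage.gaussian_filter(mask, sigma=sigma) > 0.5   # isolated pixels fall below threshold
    opened = ndimage.binary_opening(smoothed, structure=np.ones((3, 3)))
    return opened
```

Isolated single-pixel detections are smoothed below the threshold and removed, while compact fiber regions survive both the filter and the opening, at the cost of slightly eroding thin targets such as human hair.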

3.5 Comparison with other models

Tab.4 compares the accuracy of the different models. Various models were used for validation in this study, and the PCA-AlexNet model achieved higher accuracy in recognizing foreign fibers. For complex hyperspectral data, the SVM and ANN models are difficult to tune and carry a higher risk of overfitting, resulting in lower accuracy. In multiclass classification, LDA extracts only a limited number of feature bands, making it difficult to capture the complex information in hyperspectral data, whereas PCA reduces the dimensionality of the data while maximizing the retention of the original information. VGGNet has an excessive number of parameters, which can lead to overfitting, while LeNet has a simpler structure that is not sufficiently scaled. As a result, the accuracy of the LDA-VGGNet and LDA-LeNet models is lower than that of PCA-AlexNet.
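The three measures used to compare the models in Tab.4 (overall accuracy, average per-class accuracy, and the Kappa coefficient) can all be computed from a confusion matrix. The sketch below uses standard definitions with an illustrative matrix; the numbers are not the paper's results.

```python
# Sketch: overall accuracy (OA), average accuracy (AA) and Cohen's kappa
# from a confusion matrix. The matrix values are illustrative only.
import numpy as np

def oa_aa_kappa(cm: np.ndarray):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    n = cm.sum()
    oa = np.trace(cm) / n                                # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))           # mean per-class recall
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)                         # agreement beyond chance
    return oa, aa, kappa

cm = np.array([[50,  2,  0],
               [ 3, 45,  2],
               [ 0,  1, 47]])
oa, aa, kappa = oa_aa_kappa(cm)
print(f"OA = {oa:.3f}, AA = {aa:.3f}, kappa = {kappa:.3f}")
```

Because kappa discounts chance agreement, it is typically the lowest of the three, which matches the ordering of the reported 97.2% OA, 95.2% AA and 93.1% Kappa.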

3.6 Experimental sorting test

Fig.14 shows the experimental sorting test system. The PCA-AlexNet-23 model was selected to verify the feasibility of the hyperspectral multifiber separation method proposed in this study. The method was deployed on an Nvidia RTX 2080 Ti GPU server connected to a sorting system to identify various types of fibers in seed cotton. The GPU server obtained the coordinates of each fiber and transmitted them to the control center. According to the running speed of the conveyor belt, the control center adjusted the opening time of the spray valve system to remove the fiber from the seed cotton. Also, the hyperspectral camera was equipped with a dome halogen light source providing 360° full-angle illumination, effectively addressing the dark zones caused by depressions in the cotton surface and the impact of lighting variations. The integrated multi-angle dome light source, rated at 800 W, ensured light uniformity greater than 95% within the illuminated space, significantly enhancing the accuracy of foreign fiber detection.
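The timing logic described above (the control center delaying valve actuation according to belt speed) reduces to a short calculation. This is a hypothetical sketch: the function name, distances and latencies are assumptions, not the system's actual parameters.

```python
# Illustrative spray-valve trigger timing: given the conveyor speed and
# the distance from the camera's scan line to the valve bank, delay
# actuation until the detected fiber reaches the nozzles.
def valve_delay_s(camera_to_valve_m: float,
                  belt_speed_m_s: float,
                  processing_latency_s: float = 0.0) -> float:
    """Seconds to wait after detection before opening the valve."""
    travel = camera_to_valve_m / belt_speed_m_s   # transit time on the belt
    delay = travel - processing_latency_s         # compensate for compute time
    if delay < 0:
        raise ValueError("belt too fast for the given processing latency")
    return delay

# Example: nozzles 0.5 m downstream, belt at 0.8 m/s, 50 ms inference time
print(f"trigger delay = {valve_delay_s(0.5, 0.8, 0.05):.3f} s")
```

The guard clause reflects the real-time constraint discussed below: if detection latency exceeds the belt transit time, the fiber passes the nozzles before the system can react.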

Thicker cotton layers scatter and absorb light during penetration, reducing the intensity reaching the foreign fiber surface and resulting in poor imaging quality. In some cases, thick cotton layers may obstruct foreign fibers, leading to missed detections. To address this, the transmission mode of the hyperspectral camera can be used to capture foreign fiber information obscured by the cotton layers. In the experiment, cotton thickness on the conveyor belt was set between 3 and 8 cm, with varying layer thicknesses introduced. Hyperspectral imaging was performed while adjusting the focal length of the camera, and the results were recorded.

However, potential challenges remain in the system. The hyperspectral camera is sensitive to temperature and humidity because it is exposed to external conditions. Prolonged operation may cause contamination of optical components or calibration drift, requiring frequent maintenance. During operation, the coordinated control of conveyor belt speed, hyperspectral camera processing and spray valve response requires frequent adjustment, and the system may become unstable due to algorithm delays or hardware load fluctuations. The real-time sorting performance of the system is insufficient, requiring further optimization of the core processor, FPGA acceleration and communication system improvements. The experiment provided a preliminary validation of feasibility. Across repeated experimental sorting tests, the removal rate of foreign fiber was above 85%. However, small impurities such as leaf fragments and human hair caused the unintended removal of some cotton fibers.

4 Conclusions

An intelligent hyperspectral multifiber identification method based on PCA-AlexNet was proposed and verified. Using the optimal feature bands and the adjusted AlexNet model, multidimensional experimental verification was conducted. Overall, the PCA-AlexNet-23 model achieved excellent performance with an OA of 97.2%, AA of 95.2% and a Kappa coefficient of 93.1%, demonstrating strong capability for detecting various foreign fibers in seed cotton, especially fibers that are white, transparent or similar in color to the seed cotton. In the experimental sorting test, the removal rate of foreign fibers in seed cotton was more than 85%. The method outlined in this paper offers theoretical support for the photoelectric separation and detection of foreign fibers in machine-picked long-staple cotton (seed cotton). It could be instrumental in facilitating the efficient and fully automated mechanization of Xinjiang long-staple cotton production.

However, the variety of foreign fibers in long-staple cotton exceeds the eight types tested in the study, and detection accuracy is lower for smaller fibers. Future research should focus on expanding the foreign fiber data set and enhancing data preprocessing techniques. Additionally, exploring complementary data fusion methods will be crucial for further optimizing hyperspectral multi-target recognition algorithms.

RIGHTS & PERMISSIONS

The Author(s) 2025. Published by Higher Education Press. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0)