Forecasting the yield of wafer by using improved genetic algorithm, high dimensional alternating feature selection and SVM with uneven distribution and high-dimensional data

Qiuhao Xu, Chuqiao Xu, Junliang Wang

Autonomous Intelligent Systems, 2022, Vol. 2, Issue 1: 24. DOI: 10.1007/s43684-022-00041-3
Original Article


Abstract

Wafer yield prediction, as the basis of quality control, is dedicated to predicting quality indices of the wafer manufacturing process. In recent years, data-driven machine learning methods have received considerable attention for predicting quality indices because of their accuracy, robustness, and convenience. However, existing studies mainly seek to improve prediction accuracy at the model level and do not consider the impact of data characteristics on yield prediction. To address this gap, a novel wafer yield prediction method is proposed. First, an improved genetic algorithm (IGA) is used as an under-sampling method to handle the uneven distribution of the data, which comprises two problems: data overlap between finished and defective products, caused by the similarity of their manufacturing processes, and data imbalance, caused by the scarcity of defective samples. Second, a high-dimensional alternating feature selection method (HAFS) selects the key influencing processes, that is, the key parameters, to avoid overfitting of the prediction model caused by too many input parameters. Finally, a support vector machine (SVM) is used to predict the yield. Experiments are conducted on a public wafer yield prediction dataset collected from an actual wafer manufacturing system. IGA-HAFS-SVM achieves state-of-the-art results on this dataset, which confirms its effectiveness: compared with the conventional method, it improves the AUC score, G-Mean, and F1-score by 21.6%, 34.6%, and 0.6%, respectively. The experimental results also demonstrate the influence of data characteristics on wafer yield prediction.
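The three metrics named in the abstract (AUC, G-Mean, and F1-score) are standard choices for imbalanced classification, because plain accuracy can look high even when every defective sample is missed. The following is a minimal sketch of how they can be computed, not the authors' evaluation code. It assumes that defective wafers are the minority class and are labeled 1; the function names and the toy data are hypothetical and do not come from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc_score(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the sample count,
    so this is meant for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 95 good wafers (label 0) and 5 defective ones (label 1).
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts "good" is 95% accurate but finds no defects.
always_good = [0] * 100

# A classifier that flags 10 good wafers by mistake and catches 4 of 5 defects.
balanced = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1

accuracy_always_good = sum(t == p for t, p in zip(y_true, always_good)) / len(y_true)
```

On this toy data the always-good classifier keeps a high accuracy while its G-Mean and F1-score both fall to zero, which is why scores of this kind are preferred when defective samples are rare.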

Keywords

Wafer yield / High dimension / Imbalance / Prediction

Cite this article

Qiuhao Xu, Chuqiao Xu, Junliang Wang. Forecasting the yield of wafer by using improved genetic algorithm, high dimensional alternating feature selection and SVM with uneven distribution and high-dimensional data. Autonomous Intelligent Systems, 2022, 2(1): 24 https://doi.org/10.1007/s43684-022-00041-3

Funding
National Natural Science Foundation of China (No. 51905091)
