A computational model to identify fertility-related proteins using sequence information
Yan LIN, Jiashu WANG, Xiaowei LIU, Xueqin XIE, De WU, Junjie ZHANG, Hui DING
Front. Comput. Sci., 2024, Vol. 18, Issue (1): 181902
Fertility is the most crucial step in the developmental process and is controlled by many fertility-related proteins, including spermatogenesis-, oogenesis- and embryogenesis-related proteins. Identifying fertility-related proteins can provide important clues for studying the roles of these proteins in development. In this study, we therefore constructed a two-layer classifier to identify fertility-related proteins. We first used the composition of amino acids (AA) and their physical and chemical properties to encode the three types of fertility-related proteins. The feature set was then optimized by analysis of variance (ANOVA) and incremental feature selection (IFS) to obtain the optimal feature subset. The performance of models constructed with different machine learning (ML) methods was evaluated and compared through five-fold cross-validation (CV) and independent data tests. Finally, based on the support vector machine (SVM), we obtained a two-layer model to classify the three types of fertility-related proteins. On the independent test data set, the accuracy (ACC) and the area under the receiver operating characteristic curve (AUC) of the first-layer classifier are 81.95% and 0.89, respectively, and those of the second-layer classifier are 84.74% and 0.90, respectively. These results show that the proposed model has stable performance and satisfactory prediction accuracy, and that it can serve as a powerful tool for identifying more fertility-related proteins.
fertility-related proteins / machine learning / sequence information / feature selection
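The feature pipeline summarized in the abstract (amino acid composition, ANOVA ranking, incremental feature selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `aa_composition`, `anova_f`, `rank_features` and `incremental_subsets` are hypothetical, only the 20-dimensional composition is encoded (the physicochemical descriptors are omitted), and the SVM with five-fold CV that would score each candidate subset is left out.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order (an assumption of this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def anova_f(values, labels):
    """One-way ANOVA F-score of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Between-class and within-class mean squares.
    between = sum(len(g) * (means[y] - grand_mean) ** 2
                  for y, g in groups.items()) / (k - 1)
    within = sum((v - means[y]) ** 2
                 for y, g in groups.items() for v in g) / (n - k)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X, y):
    """Return feature indices sorted by decreasing ANOVA F-score."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def incremental_subsets(ranking):
    """Yield the nested subsets examined by IFS: top-1, top-2, ..., all."""
    for size in range(1, len(ranking) + 1):
        yield ranking[:size]
```

In a full workflow, each subset produced by `incremental_subsets` would be used to train and cross-validate a classifier, and the subset with the best validation score would be kept as the optimal feature subset.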
Higher Education Press