%A Liangxiao JIANG %T Learning random forests for ranking %0 Journal Article %D 2011 %J Front. Comput. Sci. %J Frontiers of Computer Science %@ 2095-2228 %R 10.1007/s11704-010-0388-5 %P 79-86 %V 5 %N 1 %U {https://journal.hep.com.cn/fcs/EN/10.1007/s11704-010-0388-5 %8 2011-03-05 %X

The random forests (RF) algorithm, which combines the predictions from an ensemble of random trees, has achieved significant improvements in terms of classification accuracy. In many real-world applications, however, ranking is often required in order to make optimal decisions. Thus, we focus our attention on the ranking performance of RF in this paper. Our experimental results based on the entire 36 UC Irvine Machine Learning Repository (UCI) data sets published on the main website of Weka platform show that RF doesn’t perform well in ranking, and is even about the same as a single C4.4 tree. This fact raises the question of whether several improvements to RF can scale up its ranking performance. To answer this question, we single out an improved random forests (IRF) algorithm. Instead of the information gain measure and the maximum-likelihood estimate, the average gain measure and the similarity-weighted estimate are used in IRF. Our experiments show that IRF significantly outperforms all the other algorithms used to compare in terms of ranking while maintains the high classification accuracy characterizing RF.