A new focused crawler using an improved tabu search algorithm incorporating ontology and host information

Jingfa LIU, Zhen WANG, Guo ZHONG, Zhihe YANG

Front. Inform. Technol. Electron. Eng, 2023, Vol. 24, Issue 6: 859-875.

DOI: 10.1631/FITEE.2200315
Original Article

Abstract

Traditional focused crawling methods suffer from incomplete topic description and repeated crawling of already visited hyperlinks. To address these problems, we propose a novel focused crawler using an improved tabu search algorithm with domain ontology and host information (FCITS_OH), in which a domain ontology is constructed by formal concept analysis to describe topics at the semantic and knowledge levels. To avoid crawling visited hyperlinks and to expand the search range, we present an improved tabu search (ITS) algorithm together with a host information memory strategy. In addition, a comprehensive priority evaluation method based on Web text and link structure is designed to improve the assessment of topic relevance for unvisited hyperlinks. Experimental results in the tourism and rainstorm disaster domains show that the proposed focused crawlers outperform traditional focused crawlers across different performance metrics.
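The abstract does not give the details of the ITS algorithm or the host information memory, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical `TabuFrontier` class in which a bounded tabu list blocks recently visited URLs, a per-host visit counter lowers the priority of hosts that have already been crawled, and a caller-supplied `relevance` score stands in for the paper's comprehensive priority. The parameters `tabu_size` and `host_penalty` are illustrative assumptions.

```python
import heapq
from collections import Counter, deque
from urllib.parse import urlsplit


class TabuFrontier:
    """Toy crawl frontier (illustrative only, not the FCITS_OH algorithm)."""

    def __init__(self, tabu_size=1000, host_penalty=0.1):
        self._heap = []                  # entries: (-score, insertion order, url)
        self._order = 0                  # tie-breaker so equal scores pop FIFO
        self._tabu = deque()             # visited URLs, oldest first
        self._tabu_set = set()           # same URLs, for O(1) membership tests
        self._tabu_size = tabu_size      # assumed tabu tenure (number of URLs)
        self._host_visits = Counter()    # assumed form of "host memory"
        self._host_penalty = host_penalty

    def push(self, url, relevance):
        """Queue a URL unless it is currently tabu. Returns True if queued."""
        if url in self._tabu_set:
            return False
        host = urlsplit(url).netloc
        # Hosts that were visited often get a lower score, widening the search.
        score = relevance - self._host_penalty * self._host_visits[host]
        heapq.heappush(self._heap, (-score, self._order, url))
        self._order += 1
        return True

    def pop(self):
        """Return the best non-tabu URL and mark it visited, or None if empty."""
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            if url in self._tabu_set:
                continue                 # duplicate entry that was visited meanwhile
            self._mark_tabu(url)
            self._host_visits[urlsplit(url).netloc] += 1
            return url
        return None

    def _mark_tabu(self, url):
        # Evict the oldest entry once the tabu list is full.
        if len(self._tabu) >= self._tabu_size:
            self._tabu_set.discard(self._tabu.popleft())
        self._tabu.append(url)
        self._tabu_set.add(url)
```

In this sketch the score is fixed when a URL is pushed, so the host penalty only affects URLs queued after the host has been visited. A real crawler would also have to normalize URLs before comparing them, which is omitted here.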

Keywords

Focused crawler / Tabu search algorithm / Ontology / Host information / Priority evaluation

Cite this article

Jingfa LIU, Zhen WANG, Guo ZHONG, Zhihe YANG. A new focused crawler using an improved tabu search algorithm incorporating ontology and host information. Front. Inform. Technol. Electron. Eng, 2023, 24(6): 859-875. DOI: 10.1631/FITEE.2200315

RIGHTS & PERMISSIONS

Zhejiang University Press
