Focused crawling strategies based on ontologies and simulated annealing methods for rainstorm disaster domain knowledge

Jingfa LIU , Fan LI , Ruoyao DING , Zi'ang LIU

Front. Inform. Technol. Electron. Eng, 2022, Vol. 23, Issue (8): 1189-1204.

DOI: 10.1631/FITEE.2100360
Original Article


Abstract

Focused crawlers are currently a crucial means of obtaining effective domain knowledge from massive heterogeneous networks. Most current focused crawling technologies, however, have difficulty obtaining high-quality crawling results. The main difficulties lie in establishing topic benchmark models, assessing the topic relevance of hyperlinks, and designing crawling strategies. In this paper, we use a domain ontology to build a topic benchmark model for a specific topic and propose a novel multiple-filtering strategy based on local ontology and global ontology (MFSLG). A comprehensive priority evaluation method (CPEM) based on web text and link structure is introduced to improve the precision of topic relevance computation for unvisited hyperlinks, and a simulated annealing (SA) method is used to keep the focused crawler from falling into local optima of the search. By incorporating SA into the focused crawler with MFSLG and CPEM for the first time, we propose two novel focused crawling strategies based on ontology and SA (FCOSA): FCOSA with only global ontology (FCOSA_G) and FCOSA with both local and global ontologies (FCOSA_LG). Both are designed to obtain topic-relevant webpages about rainstorm disasters from the network. Experimental results show that the proposed crawlers outperform the other focused crawling strategies compared on different performance metrics.
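The abstract does not give the details of how SA is combined with link priorities, so the following is only a minimal sketch of the general idea, assuming a Metropolis-style acceptance rule. The names `accept`, `select_link`, and `frontier`, and the priority values, are hypothetical and are not taken from the paper.

```python
import math
import random


def accept(best_priority, candidate_priority, temperature, rng=random):
    """Metropolis-style rule (an assumption, not the paper's exact formula).

    A candidate at least as good as the best link is always accepted;
    a worse one is accepted with probability exp(-delta / temperature).
    """
    delta = best_priority - candidate_priority
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def select_link(frontier, temperature, rng=random):
    """Pick the next URL to crawl from a hypothetical frontier.

    frontier maps each unvisited URL to a priority score, for example a
    topic-relevance estimate. A random candidate is drawn and accepted by
    the SA rule; otherwise the highest-priority URL is returned.
    """
    best_url = max(frontier, key=frontier.get)
    candidate = rng.choice(list(frontier))
    if accept(frontier[best_url], frontier[candidate], temperature, rng):
        return candidate
    return best_url
```

At a high temperature almost any link may be chosen, which lets the crawler leave a locally promising but ultimately poor region; as the temperature is lowered, the choice approaches a purely greedy one that always follows the highest-priority link.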

Keywords

Focused crawler / Ontology / Priority evaluation / Simulated annealing / Rainstorm disaster

Cite this article

Jingfa LIU, Fan LI, Ruoyao DING, Zi'ang LIU. Focused crawling strategies based on ontologies and simulated annealing methods for rainstorm disaster domain knowledge. Front. Inform. Technol. Electron. Eng, 2022, 23(8): 1189-1204. DOI: 10.1631/FITEE.2100360



RIGHTS & PERMISSIONS

Zhejiang University Press


Supplementary files

FITEE-1189-22005-JFL_suppl_1

FITEE-1189-22005-JFL_suppl_2

FITEE-1189-22005-JFL_suppl_3
