Assessing the effectiveness of crawlers and large language models in detecting adversarial hidden link threats in meta computing
Junjie Xiong, Mingkui Wei, Zhuo Lu, Yao Liu
High-Confidence Computing, 2025, Vol. 5, Issue 3: 100292
In the emerging field of Meta Computing, where data collection and integration are essential components, adversarial hidden link attacks pose a significant challenge to web crawlers. In this paper, we investigate the influence of these attacks on data collection by web crawlers; the attacks are known to elude conventional detection techniques, including those based on large language models (LLMs). Empirically, we identify vulnerabilities in current crawler mechanisms and in LLM-based detection, especially in code inspection, and propose enhancements to mitigate these weaknesses. Our assessment of real-world web pages reveals the prevalence and impact of adversarial hidden link attacks, emphasizing the need for robust countermeasures. Furthermore, we introduce a mitigation framework that integrates element visual inspection techniques. Our evaluation demonstrates the framework’s efficacy in detecting and addressing these advanced cyber threats within the evolving landscape of Meta Computing.
Meta computing / Data integration / Adversary hidden link / Web crawling / Content deception detection / Large language model
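
To make the notion of a hidden link concrete, the following is a minimal sketch of a code-level check that flags anchors hidden through inline styles or the `hidden` attribute. It is not the mitigation framework described in the abstract: the function name `find_hidden_links` and the list of hiding declarations are assumptions chosen for illustration, and the check uses only Python's standard `html.parser` module. Because it inspects markup alone, it would miss links hidden through external stylesheets, scripts, or off-screen positioning, which is the kind of gap in code inspection that the abstract points to as motivation for element visual inspection.

```python
from html.parser import HTMLParser

# Assumed, non-exhaustive list of inline-style declarations that hide an element.
HIDING_DECLARATIONS = {
    ("display", "none"),
    ("visibility", "hidden"),
    ("opacity", "0"),
    ("font-size", "0"),
    ("font-size", "0px"),
}


def parse_inline_style(style):
    """Parse 'prop: value; prop: value' into a dict of lowercase strings."""
    declarations = {}
    for part in style.split(";"):
        if ":" in part:
            prop, value = part.split(":", 1)
            declarations[prop.strip().lower()] = value.strip().lower()
    return declarations


class HiddenLinkFinder(HTMLParser):
    """Collects href values of <a> tags that look hidden from their own markup."""

    def __init__(self):
        super().__init__()
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        style = parse_inline_style(attr_map.get("style") or "")
        hidden_by_style = any(
            (prop, value) in HIDING_DECLARATIONS for prop, value in style.items()
        )
        # The boolean HTML attribute `hidden` is reported with a value of None.
        hidden_by_attr = "hidden" in attr_map
        if hidden_by_style or hidden_by_attr:
            self.hidden_links.append(href)


def find_hidden_links(html):
    """Return hrefs of anchors hidden via inline style or the hidden attribute."""
    finder = HiddenLinkFinder()
    finder.feed(html)
    return finder.hidden_links


if __name__ == "__main__":
    page = (
        '<a href="/visible">Home</a>'
        '<a href="/trap" style="display: none">secret</a>'
    )
    print(find_hidden_links(page))
```

A crawler could call `find_hidden_links` on each fetched page and skip or log the returned URLs before following them. A rendering-based check would still be needed for hiding techniques that do not appear in the element's own attributes.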