Robots Exclusion and Guidance Protocol

Authors: Dajie Ge, Zhijun Ding

Affiliation: Department of Computer Science and Technology, Tongji University

Publication: Tsinghua Science and Technology (Journal of Tsinghua University, Natural Science Edition, English Edition)

Year/Volume/Issue: 2016, Vol. 21, No. 6

Pages: 643-659

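Subject classification: 08 [Engineering]; 080402 [Engineering - Measurement Technology and Instruments]; 0804 [Engineering - Instrument Science and Technology]

Funding: partially supported by the National Natural Science Foundation of China (Nos. 61672381 and 90818023)

Topics: deep web, Ajax, crawler, protocol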

Abstract: With the rapid development of the Internet, general-purpose web crawlers have become increasingly unable to meet people's individual needs, as they are no longer efficient enough to fetch deep web pages. The presence of several deep web pages in websites and the widespread use of Ajax make it difficult for general-purpose web crawlers to fetch information quickly and efficiently. On the basis of the original Robots Exclusion Protocol (REP), this paper proposes a Robots Exclusion and Guidance Protocol (REGP) that integrates the independent, scattered expansions of the original Robots Protocol developed by major search engine *** protocol expands the file format and command set of the REP as well as two labels of the Sitemap *** our protocol, websites can express their requirements for restricting and guiding visiting crawlers and provide general-purpose fast access to deep web pages and Ajax pages, which makes it easy for crawlers to obtain the open data on websites. Finally, this paper presents a specific application scenario in which both a website and a crawler work with support from the protocol. A series of experiments is also conducted to demonstrate the efficiency of the proposed protocol.
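For orientation, the original REP that the abstract builds on can be exercised with Python's standard `urllib.robotparser` module. The sketch below is only an illustration of that baseline: the REGP file format and command set are not given in this record, so none of them are shown. The robots.txt text, the `demo-bot` user agent, and the `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration, not taken from the paper).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Exclusion: the crawler asks whether a given URL may be fetched.
allowed_public = parser.can_fetch("demo-bot", "https://example.com/index.html")
allowed_private = parser.can_fetch("demo-bot", "https://example.com/private/data.html")

# Guidance-style hints already present in common REP extensions.
delay = parser.crawl_delay("demo-bot")
sitemaps = parser.site_maps()  # available in Python 3.8 and later

print(allowed_public, allowed_private, delay, sitemaps)
```

A crawler would typically check `can_fetch` before every request and wait `delay` seconds between requests; the Sitemap entries are the kind of guidance information that, according to the abstract, REGP extends further.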
