RDF partitioning for scalable SPARQL query processing

Authors: Xiaoyan WANG, Tao YANG, Jinchuan CHEN, Long HE, Xiaoyong DU

Affiliations: School of Information, Renmin University of China, Beijing 100872, China; Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University, Beijing 100872, China; Information Center, Supreme People's Court, Beijing 100745, China; State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China

Publication: Frontiers of Computer Science (中国计算机科学前沿, English edition)

Year, volume, issue: 2015, Vol. 9, No. 6

Pages: 919-933

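Subject classification: 080705 [Engineering: Refrigeration and Cryogenic Engineering]; 12 [Management]; 1201 [Management: Management Science and Engineering (degrees in Management or Engineering may be conferred)]; 08 [Engineering]; 0807 [Engineering: Power Engineering and Engineering Thermophysics]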

Funding: Beijing Engineering Laboratory of Big Data Mining; State Key Laboratory of Software Development Environment Open Fund [SKLSDE-2012KF-09]; Fundamental Research Funds for the Central Universities Renmin University of China [14XNLQ06]; Sa Shi-Xuan Research Center of Big Data Management and Analytics

Keywords: RDF data; data partitioning; SPARQL query

Abstract: The volume of RDF data has increased dramatically in recent years, and cloud computing platforms like Hadoop are considered a good choice for processing queries over huge data sets because of their scalability. Previous work on evaluating SPARQL queries with Hadoop mainly focuses on reducing the number of joins through careful splitting of HDFS files and algorithms for generating Map/Reduce jobs. However, the way RDF data is partitioned also affects system performance. Specifically, a good partitioning solution can greatly reduce or even totally avoid cross-node joins, and significantly cut the cost of query evaluation. Based on HadoopDB, this work processes SPARQL queries in a hybrid architecture, where Map/Reduce takes charge of the computing tasks, and RDF query engines like RDF-3X store the data and execute join operations. Based on an analysis of query workloads, this work proposes a novel algorithm for automatically partitioning RDF data and an approximate solution for physically placing the partitions in order to reduce data redundancy. It also discusses how to make a good trade-off between query evaluation efficiency and data redundancy. All of the proposed approaches have been evaluated by extensive experiments over large RDF data sets.
