Efficient Location-Aware Data Placement for Data-Intensive Applications in Geo-distributed Scientific Data Centers
Author affiliation: School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
Publication: Tsinghua Science and Technology (Journal of Tsinghua University, Natural Science Edition, English Edition)
Year, volume, issue: 2016, Vol. 21, No. 5
Pages: 471-481
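Subject classification: 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (Engineering or Science degrees may be conferred)]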
Funding: Supported by the National Natural Science Foundation of China (Nos. 61320106007, 61572129, 61502097, and 61370207); the National High-Tech Research and Development (863) Program of China (No. 2013AA013503); the International S&T Cooperation Program of China (No. 2015DFA10490); the Jiangsu research prospective joint research project (No. BY2013073-01); the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201); the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China (No. 93K-9); the Collaborative Innovation Center of Novel Software Technology and Industrialization; and the Collaborative Innovation Center of Wireless Communications Technology.
Keywords: data placement; geo-distributed data center; Lagrangian relaxation
Abstract: Recent developments in cloud computing and big data have spurred the emergence of data-intensive applications in which massive scientific datasets are stored in globally distributed scientific data centers and accessed frequently by scientists worldwide. A single data processing task may request multiple associated data items that reside in different scientific data centers, and data placement decisions must respect the storage capacity limits of those data centers. Therefore, the optimization of data access cost in the placement of data items in globally distributed scientific data centers has become an increasingly important *** data placement approaches for geo-distributed data items are insufficient because they either cannot cope with the cost incurred by associated data access or overlook storage capacity limitations, which are a very practical constraint of scientific data centers. In this paper, inspired by applications in the field of high energy physics, we propose an integer-programming-based data placement model that addresses the above challenges as a Non-deterministic Polynomial-time (NP)-hard problem. In addition, we use a Lagrangian-relaxation-based heuristic algorithm to obtain ideal data placement solutions. Our simulation results demonstrate that our algorithm is effective and significantly reduces overall data access cost.
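Illustrative sketch (not the authors' model or algorithm): the abstract names two ingredients, an integer program with storage capacity limits and a heuristic based on Lagrangian relaxation. The Python sketch below shows how those two pieces can fit together on a deliberately simplified problem. It assumes each data item has a fixed access cost per data center and is stored in exactly one center; the cost of associated (jointly requested) items, which the paper treats, is left out. The function names `place_items` and `greedy_repair`, the diminishing step size `step0 / (k + 1)`, and the greedy repair step are all hypothetical choices made for this example.

```python
from typing import List, Optional, Tuple


def greedy_repair(cost: List[List[float]], size: List[float],
                  capacity: List[float], lam: List[float]) -> Optional[List[int]]:
    """Build a capacity-feasible placement, largest items first.

    Each item goes to the center with the lowest penalized cost
    (cost + multiplier * size) that still has room. Returns None if
    some item fits nowhere.
    """
    remaining = list(capacity)
    assign = [-1] * len(cost)
    for i in sorted(range(len(cost)), key=lambda i: -size[i]):
        fits = [j for j in range(len(capacity)) if remaining[j] >= size[i]]
        if not fits:
            return None
        j = min(fits, key=lambda j: cost[i][j] + lam[j] * size[i])
        assign[i] = j
        remaining[j] -= size[i]
    return assign


def place_items(cost: List[List[float]], size: List[float],
                capacity: List[float], iterations: int = 50,
                step0: float = 1.0) -> Tuple[Optional[List[int]], float, float]:
    """Subgradient method on the Lagrangian dual of the capacity constraints.

    Returns (best feasible placement, its cost, best lower bound found).
    """
    n_items, n_centers = len(cost), len(capacity)
    lam = [0.0] * n_centers              # one multiplier per capacity constraint
    best_assign, best_cost = None, float("inf")
    best_bound = float("-inf")
    for k in range(iterations):
        # Relaxed problem: without capacity limits every item independently
        # picks the center with the lowest penalized cost.
        relaxed = [min(range(n_centers),
                       key=lambda j, i=i: cost[i][j] + lam[j] * size[i])
                   for i in range(n_items)]
        load = [0.0] * n_centers
        for i, j in enumerate(relaxed):
            load[j] += size[i]
        # Value of the Lagrangian dual function: a lower bound on the optimum.
        bound = (sum(cost[i][j] + lam[j] * size[i] for i, j in enumerate(relaxed))
                 - sum(l * c for l, c in zip(lam, capacity)))
        best_bound = max(best_bound, bound)
        # Heuristic step: turn the multipliers into a feasible placement.
        assign = greedy_repair(cost, size, capacity, lam)
        if assign is not None:
            total = sum(cost[i][assign[i]] for i in range(n_items))
            if total < best_cost:
                best_assign, best_cost = assign, total
        # Projected subgradient update: raise the price of overloaded centers.
        step = step0 / (k + 1)
        lam = [max(0.0, l + step * (ld - cap))
               for l, ld, cap in zip(lam, load, capacity)]
    return best_assign, best_cost, best_bound


# Toy instance: three items of size 2, two centers of capacity 4.
# Center 0 is cheapest for every item but cannot hold all three.
costs = [[1, 4], [2, 3], [1, 5]]
placement, total_cost, lower_bound = place_items(costs, [2, 2, 2], [4, 4])
print(placement, total_cost, lower_bound)
```

In this toy instance the multipliers act as storage prices: the overloaded center becomes more expensive until the item that is cheapest to relocate moves elsewhere. The gap between `total_cost` and `lower_bound` indicates how far the feasible placement may be from the optimum of this simplified model.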