Network and data location aware approach for simultaneous job scheduling and data replication in large-scale data grid environments
Author affiliation: Department of Computer Science, Shahid Bahonar University of Kerman, Kerman 97175-569, Iran
Publication: Frontiers of Computer Science (Frontiers of Computer Science in China, English edition)
Year, volume, issue: 2014, Vol. 8, No. 3
Pages: 391-408
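Subject classification: 080202 [Engineering - Mechatronic Engineering]; 08 [Engineering]; 0802 [Engineering - Mechanical Engineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be awarded)]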
Funding: International Virtual Observatory Alliance; European Commission
Keywords: data replication; data grid; OptorSim; job scheduling; simulation
Abstract: A Data Grid integrates geographically distributed resources to solve data-intensive scientific applications. Effective scheduling in a Grid can reduce the amount of data transferred among nodes by submitting each job to a node where most of the requested data files are already available. Scheduling is a traditional problem in parallel and distributed systems. However, because of the special issues and goals of the Grid, traditional approaches are no longer effective in this environment, so methods specialized for this kind of parallel and distributed system are needed. A complementary approach is to use a data replication strategy that creates multiple copies of files and stores them in convenient locations to shorten file access times. To exploit both concepts, this paper develops a job scheduling policy, called the hierarchical job scheduling strategy (HJSS), and a dynamic data replication strategy, called the advanced dynamic hierarchical replication strategy (ADHRS), to improve data access efficiency in a hierarchical Data Grid. HJSS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drive at the data sources. Moreover, because storage capacity is limited, a good replica replacement algorithm is needed. We present a novel replacement strategy that deletes files in two steps when there is not enough free space for a new replica. First, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time each replica was requested, its number of accesses, its size, and its file transfer time. The simulation results show that the proposed algorithm performs better than the other algorithms compared in terms of job execution time, number of intercommunications, number of replications, hit ratio, computing resource usage, and storage usage.
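The two-step replacement strategy described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the selection rule or the weighting formula, so everything concrete here is an assumption made for illustration: the `Replica` fields, the `cheap_threshold` cut-off used to decide which files count as having "minimum transfer time", and the `keep_value` score that combines last request time, access count, size, and transfer time. It is not the formula used by ADHRS.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size: float           # storage occupied by the replica (size > 0)
    transfer_time: float  # estimated time to fetch the file again
    last_access: float    # time of the most recent request
    access_count: int     # number of requests seen so far


def choose_victims(replicas, free_space, needed, now, cheap_threshold=1.0):
    """Return names of replicas to delete, or None if space cannot be freed.

    Hypothetical two-step policy modelled on the abstract's description.
    """
    victims = []

    # Step 1 (assumed rule): remove replicas that are cheapest to re-fetch,
    # i.e. those whose transfer time is at or below cheap_threshold.
    cheap = sorted(
        (r for r in replicas if r.transfer_time <= cheap_threshold),
        key=lambda r: r.transfer_time,
    )
    for r in cheap:
        if free_space >= needed:
            return victims
        victims.append(r.name)
        free_space += r.size

    # Step 2 (assumed score): rank the remaining replicas by how much they
    # are worth keeping. Recently and frequently requested files that are
    # slow to re-fetch score high; large files score low.
    def keep_value(r):
        age = now - r.last_access + 1.0  # +1 avoids division by zero
        return r.access_count * r.transfer_time / (r.size * age)

    rest = sorted(
        (r for r in replicas if r.transfer_time > cheap_threshold),
        key=keep_value,
    )
    for r in rest:
        if free_space >= needed:
            return victims
        victims.append(r.name)
        free_space += r.size

    return victims if free_space >= needed else None
```

Calling `choose_victims(replicas, free_space, needed, now)` returns the replicas to evict in deletion order. Returning `None` when even deleting everything would not free enough space lets the caller skip replication instead of evicting files for nothing.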