Apollo: Rapidly Picking the Optimal Cloud Configurations for Big Data Analytics Using a Data-Driven Approach
Author affiliations: University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Publication: Journal of Computer Science & Technology (计算机科学技术学报, English edition)
Year, volume, issue: 2021, Vol. 36, No. 5
Pages: 1184-1199
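Subject classification: 081203 [Engineering - Computer Application Technology]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]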
Keywords: big data analytics; cloud configuration; data driven
Abstract: Big data analytics applications are increasingly deployed on cloud computing infrastructures, and it is still a big challenge to pick the optimal cloud configurations in a cost-effective *** this paper, we address this problem with high accuracy and a low *** propose Apollo, a data-driven approach that can rapidly pick the optimal cloud configurations by reusing data from similar *** first classify 12 typical workloads in BigDataBench by characterizing pairwise correlations in our offline *** a new workload comes, we run it with several small datasets to rank its key characteristics and get its similar *** on the rank, we then limit the search space of cloud configurations through a classification *** last, we leverage a hierarchical regression model to measure which cluster is more suitable and use a local search strategy to pick the optimal cloud configurations in a few extra *** evaluation on 12 typical workloads in HiBench shows that, compared with state-of-the-art approaches, Apollo can improve search accuracy by up to 30% while reducing the overhead of picking the optimal cloud configurations by as much as 50%.