Skyway: Accelerate Graph Applications with a Dual-Path Architecture and Fine-Grained Data Management
Author affiliations: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100045, China; Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616, U.S.A.
Publication: Journal of Computer Science & Technology (English edition)
Year, volume, issue: 2024, Vol. 39, No. 4
Pages: 871-894
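Subject classification: 08 [Engineering]; 0835 [Engineering: Software Engineering]; 0812 [Engineering: Computer Science and Technology (engineering or science degrees may be conferred)]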
Funding: Supported in part by the U.S. National Science Foundation under Grant Nos. CCF-2008907 and CCF-2029014; the Chinese Academy of Sciences Project for Young Scientists in Basic Research under Grant No. YSBR-029; the Chinese Academy of Sciences Project for Youth Innovation Promotion Association
Keywords: graph application; computer architecture; memory hierarchy
Abstract: Graph processing is a vital component of many AI and big data ***, due to its poor locality and complex data access patterns, graph processing is also a known performance killer of AI and big data *** this work, we propose to enhance graph processing applications by leveraging fine-grained memory access patterns with a dual-path architecture on top of existing software-based graph *** first identify that memory accesses to the offset, edge, and state array have distinct locality and impact on *** then introduce the Skyway architecture, which consists of two primary components: 1) a dedicated direct data path between the core and memory to transfer state array elements efficiently, and 2) a data-type aware fine-grained memory-side row buffer hardware for both the newly designed direct data path and the regular memory hierarchy data *** proposed Skyway architecture is able to improve the overall performance by reducing the memory access interference and improving data access efficiency with a minimal *** evaluate Skyway on a set of diverse algorithms using large real-world *** a simulated four-core system, Skyway improves the performance by 23% on average over the best-performing graph-specialized hardware optimizations.
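The abstract's distinction between the offset, edge, and state arrays can be illustrated with a small software sketch. The abstract does not spell out the data layout, so the code below assumes a compressed sparse row (CSR) style representation; the function names (`csr_from_edges`, `traverse`, `sequential_fraction`) and the toy graph are hypothetical, and the "sequential fraction" metric is only a rough stand-in for locality, not a measure used in the paper.

```python
def csr_from_edges(num_vertices, edges):
    """Build CSR-style offset and edge arrays from (src, dst) pairs (assumed layout)."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    offset = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offset[v + 1] = offset[v] + degree[v]
    edge = [0] * len(edges)
    cursor = offset[:-1].copy()  # next free slot in each vertex's edge range
    for src, dst in edges:
        edge[cursor[src]] = dst
        cursor[src] += 1
    return offset, edge


def traverse(offset, edge, state):
    """Sum neighbor state per vertex, logging the index touched in each array."""
    trace = {"offset": [], "edge": [], "state": []}
    result = []
    for v in range(len(offset) - 1):
        trace["offset"].append(v)            # offset array: walked in vertex order
        total = 0.0
        for e in range(offset[v], offset[v + 1]):
            trace["edge"].append(e)          # edge array: streamed in order
            trace["state"].append(edge[e])   # state array: indexed by neighbor ID
            total += state[edge[e]]
        result.append(total)
    return result, trace


def sequential_fraction(indices):
    """Fraction of accesses that land exactly one slot after the previous one."""
    if len(indices) < 2:
        return 1.0
    hits = sum(1 for a, b in zip(indices, indices[1:]) if b == a + 1)
    return hits / (len(indices) - 1)


# Toy graph, invented for illustration only.
edges = [(0, 3), (0, 1), (1, 2), (2, 0), (2, 3), (3, 1)]
offset, edge = csr_from_edges(4, edges)
state = [1.0, 2.0, 3.0, 4.0]
sums, trace = traverse(offset, edge, state)

for name in ("offset", "edge", "state"):
    print(name, sequential_fraction(trace[name]))
```

In this toy run the offset and edge arrays are read strictly in order, while the state array is read at indices dictated by the neighbor IDs, which is the kind of irregular access the abstract singles out for special handling.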