Efficient and flexible memory architecture to alleviate data and context bandwidth bottlenecks of coarse-grained reconfigurable arrays
Author affiliation: Institute of Microelectronics, Tsinghua University
Publication: Science China (Physics, Mechanics & Astronomy), English edition (Chinese title: 中国科学:物理学、力学、天文学)
Year, volume, issue: 2014, Vol. 57, No. 12
Pages: 2214-2227
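Subject classification: 08 [Engineering]; 081201 [Engineering: Computer Systems Architecture]; 0812 [Engineering: Computer Science and Technology (engineering or science degrees may be conferred)]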
Keywords: memory architecture; CGRA; context cache; cache prefetch; data memory
Abstract: The computational capability of a coarse-grained reconfigurable array (CGRA) can be significantly restrained due to data and context memory bandwidth ***, two methods have been used to resolve this *** method loads the context into the CGRA at run *** method occupies very little on-chip memory but induces very large latency, which leads to low computational *** other method adopts a multi-context *** method loads the context into the on-chip context memory at the boot *** the pointer of a set of contexts changes the hardware configuration on a cycle-by-cycle *** size of the context memory induces a large area overhead in multi-context structures, which results in major restrictions on application *** paper proposes a Predictable Context Cache (PCC) architecture to address the above context issues by buffering the context inside a *** this architecture, context is dynamically transferred into the *** a PCC significantly reduces the on-chip context memory, and the complexity of the applications running on the CGRA is no longer restricted by the size of the on-chip context *** preloading is the most frequently used approach to hide input data latency and speed up the data transmission process for the data bandwidth *** than fundamentally reducing the amount of input data, the transferred data and computations are processed in ***, the data preloading method cannot work efficiently because data transmission becomes the critical path as the reconfigurable array scale *** paper also presents a Hierarchical Data Memory (HDM) architecture as a solution to the efficiency *** this architecture, high internal bandwidth is provided to buffer both reused input data and intermediate *** HDM architecture relieves the external memory from the data transfer burden so that the performance is significantly *** a result of using PCC and
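The two bottlenecks described in the abstract can be illustrated with a toy cycle model. This is a hypothetical sketch, not the PCC or HDM design from the paper: the function names `serial_cycles`, `preload_cycles` and `context_stalls`, the LRU replacement policy, the `reuse` fraction, and all cycle counts are illustrative assumptions. It shows (a) that with data preloading each tile costs roughly max(transfer, compute), so once transfer dominates, a faster array barely helps while reducing external traffic does; and (b) that when the context sequence is known in advance, a small cache with next-context prefetch can hide load latency that an on-demand cache of the same size cannot.

```python
"""Toy cycle models for two ideas in the abstract (hypothetical, not the paper's design)."""
from collections import OrderedDict


def serial_cycles(n_tiles, transfer, compute):
    # No overlap: each tile is fully loaded, then computed.
    return n_tiles * (transfer + compute)


def preload_cycles(n_tiles, transfer, compute, reuse=0.0):
    """Two-stage pipeline: tile i+1 is loaded while tile i is computed.

    `reuse` is a hypothetical fraction of each tile's input already held on
    chip (a crude stand-in for a hierarchical data memory), so only the rest
    is fetched from external memory.
    """
    if n_tiles == 0:
        return 0
    external = transfer * (1.0 - reuse)
    # First load, then (n-1) overlapped steps bounded by the slower stage, then last compute.
    return external + (n_tiles - 1) * max(external, compute) + compute


def _insert(cache, ctx, capacity):
    if len(cache) >= capacity:
        cache.popitem(last=False)  # evict the least recently used context
    cache[ctx] = None


def context_stalls(schedule, capacity, load_latency, prefetch):
    """Stall cycles spent loading contexts into a small LRU context cache.

    With prefetch=True the next context in the (known, predictable) schedule
    is loaded while the current one runs. Assumption: every context executes
    for at least `load_latency` cycles, so a prefetch always finishes in time.
    """
    if prefetch and capacity < 2:
        raise ValueError("prefetching needs room for the current and next context")
    cache = OrderedDict()
    stalls = 0
    for i, ctx in enumerate(schedule):
        if ctx in cache:
            cache.move_to_end(ctx)  # hit: mark as most recently used
        else:
            stalls += load_latency  # miss: the array waits for the context
            _insert(cache, ctx, capacity)
        if prefetch and i + 1 < len(schedule) and schedule[i + 1] not in cache:
            _insert(cache, schedule[i + 1], capacity)
    return stalls


if __name__ == "__main__":
    # Data side: 8 tiles, 4 transfer cycles per tile.
    print(serial_cycles(8, 4, 6))               # 80
    print(preload_cycles(8, 4, 6))              # 52.0 (compute-bound)
    print(preload_cycles(8, 4, 2))              # 34.0 (transfer-bound)
    print(preload_cycles(8, 4, 1))              # 33.0: halving compute saves 1 cycle
    print(preload_cycles(8, 4, 2, reuse=0.5))   # 18.0: less external traffic helps
    # Context side: a loop over 4 contexts with a 2-entry cache.
    loop = [0, 1, 2, 3] * 2
    print(context_stalls(loop, 2, 10, prefetch=False))  # 80
    print(context_stalls(loop, 2, 10, prefetch=True))   # 10
```

For comparison, a multi-context memory holding every context in `loop` would need four entries to avoid runtime stalls, while the prefetching cache in this model needs two and pays only the first cold load, under the stated overlap assumption.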