Efficient and flexible memory architecture to alleviate data and context bandwidth bottlenecks of coarse-grained reconfigurable arrays

Authors: YANG Chen, LIU Lei Bo, YIN Shou Yi, WEI Shao Jun

Affiliation: Institute of Microelectronics, Tsinghua University

Publication: Science China (Physics, Mechanics & Astronomy) (Chinese title: 中国科学:物理学、力学、天文学, English edition)

Year, volume, issue: 2014, Vol. 57, No. 12

Pages: 2214-2227

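Subject classification: 08 [Engineering]; 081201 [Engineering: Computer Systems Architecture]; 0812 [Engineering: Computer Science and Technology (engineering or science degrees may be conferred)]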

Funding: Supported by the National High Technology Research and Development Program of China (Grant No. 2012AA012701)

Keywords: memory architecture; CGRA; context cache; cache prefetch; data memory

Abstract: The computational capability of a coarse-grained reconfigurable array (CGRA) can be significantly restrained due to data and context memory bandwidth ***, two methods have been used to resolve this *** method loads the context into the CGRA at run *** method occupies very little on-chip memory but induces very large latency, which leads to low computational *** other method adopts a multi-context *** method loads the context into the on-chip context memory at the boot *** the pointer of a set of contexts changes the hardware configuration on a cycle-by-cycle *** size of the context memory induces a large area overhead in multi-context structures, which results in major restrictions on application *** paper proposes a Predictable Context Cache (PCC) architecture to address the above context issues by buffering the context inside a *** this architecture, context is dynamically transferred into the *** a PCC significantly reduces the on-chip context memory, and the complexity of the applications running on the CGRA is no longer restricted by the size of the on-chip context *** preloading is the most frequently used approach to hide input data latency and speed up the data transmission process for the data bandwidth *** than fundamentally reducing the amount of input data, the transferred data and computations are processed in ***, the data preloading method cannot work efficiently because data transmission becomes the critical path as the reconfigurable array scale *** paper also presents a Hierarchical Data Memory (HDM) architecture as a solution to the efficiency *** this architecture, high internal bandwidth is provided to buffer both reused input data and intermediate *** HDM architecture relieves the external memory from the data transfer burden so that the performance is significantly *** a result of using PCC and
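The abstract says the PCC buffers contexts inside the CGRA and transfers them dynamically, but this record does not describe how the cache is managed. The following Python sketch is a toy model under stated assumptions, not the authors' design: it assumes the context sequence is known in advance (one possible reading of "predictable"), LRU replacement, and fixed costs `load_cycles` and `exec_cycles`. The function `simulate` and all of its parameters are hypothetical. It compares loading contexts on demand with prefetching the next scheduled context while the current one executes.

```python
from collections import OrderedDict


def simulate(schedule, capacity, load_cycles, exec_cycles, prefetch):
    """Return total cycles to run a sequence of context IDs on a toy CGRA.

    Hypothetical model: loading one context from off-chip memory costs
    load_cycles; executing a context costs exec_cycles; the on-chip context
    cache holds 'capacity' contexts with LRU replacement.
    """
    cache = OrderedDict()  # context id -> True, least recently used first

    def touch(ctx):
        # Mark ctx as most recently used, evicting the LRU entry if full.
        if ctx in cache:
            cache.move_to_end(ctx)
            return
        if len(cache) >= capacity:
            cache.popitem(last=False)
        cache[ctx] = True

    cycles = 0
    for i, ctx in enumerate(schedule):
        if ctx not in cache:
            cycles += load_cycles  # demand miss: the array stalls for the load
        touch(ctx)
        cycles += exec_cycles
        if prefetch and i + 1 < len(schedule):
            nxt = schedule[i + 1]
            if nxt not in cache:
                # The schedule is known in advance, so the next context is
                # fetched while the current one executes; only the part of
                # the load longer than the execution time is exposed.
                cycles += max(0, load_cycles - exec_cycles)
                touch(nxt)
    return cycles


if __name__ == "__main__":
    loop = [0, 1, 2] * 2  # three contexts, executed twice
    for cap in (4, 2):
        print(cap, "entries, on demand:", simulate(loop, cap, 20, 30, False))
        print(cap, "entries, prefetch:", simulate(loop, cap, 20, 30, True))
```

With these numbers, on-demand loading takes 240 cycles with four cache entries and 300 with two (the three-context loop thrashes a two-entry LRU cache), while prefetching takes 200 cycles in both cases because each 20-cycle load hides behind a 30-cycle execution. In this simplified model, a small cache plus a known schedule stands in for a large multi-context memory, which is the trade-off the abstract attributes to the PCC.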
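The abstract's argument that data preloading stops scaling can be illustrated with a simple timing model. The sketch below assumes double-buffered (ping-pong) preloading, where the transfer of tile k+1 overlaps computation on tile k, so each steady-state step costs the larger of the two times. The function `pipelined_cycles` and its parameters (`bytes_per_tile`, `bandwidth`, `ops_per_tile`, `n_pes`, `reuse`) are hypothetical; `reuse` only loosely stands in for input data kept on chip and is not the HDM design from the paper.

```python
import math


def pipelined_cycles(n_tiles, bytes_per_tile, bandwidth, ops_per_tile,
                     n_pes, reuse=0.0):
    """Total cycles to process n_tiles with double-buffered preloading.

    Toy model (all parameters hypothetical):
      * transfer time per tile = external bytes / bandwidth (bytes/cycle)
      * compute time per tile  = operations / number of PEs
      * 'reuse' is the fraction of each tile's input already on chip,
        so only (1 - reuse) of it is fetched from external memory.
    """
    t_xfer = math.ceil(bytes_per_tile * (1.0 - reuse) / bandwidth)
    t_comp = math.ceil(ops_per_tile / n_pes)
    # Load the first tile, overlap the remaining transfers with compute,
    # then finish computing the last tile.
    return t_xfer + (n_tiles - 1) * max(t_xfer, t_comp) + t_comp


if __name__ == "__main__":
    for pes in (16, 64, 256):
        print(pes, "PEs:", pipelined_cycles(100, 1024, 8, 8192, pes))
    print("256 PEs, 75% on-chip reuse:",
          pipelined_cycles(100, 1024, 8, 8192, 256, reuse=0.75))
```

Here the totals are 51328, 12928 and 12832 cycles for 16, 64 and 256 PEs: beyond 64 PEs each step is bounded by the 128-cycle transfer, so adding PEs barely helps. Only cutting external traffic (here `reuse=0.75`, giving 3232 cycles) lowers that bound, which is the motivation the abstract gives for buffering reused input and intermediate data on chip.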
