VBMq: pursuit baremetal performance by embracing block I/O parallelism in virtualization

Authors: Diming ZHANG, Fei XUE, Hao HUANG, Shaodi YOU

Affiliations: Faculty of Computer Science and Technology, Nanjing University, Nanjing 210023, China; College of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China; College of Engineering and Computer Science, Australian National University, Canberra 2600, Australia; Data61-CSIRO, Australian National University, Canberra 2600, Australia

Publication: Frontiers of Computer Science

Year/Volume/Issue: 2018, Vol. 12, No. 5

Pages: 873-886

Subject classification: 08 [Engineering]; 0835 [Engineering - Software Engineering]; 081201 [Engineering - Computer Architecture]; 081202 [Engineering - Computer Software and Theory]; 0812 [Engineering - Computer Science and Technology (degrees conferrable in Engineering or Science)]

Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 61321491)

Keywords: high-performance, parallelism, paravirtual I/O

Abstract: Barely acceptable block I/O performance has prevented virtualization from being widely adopted in high-performance computing. Although the virtio paravirtual framework brings a large I/O performance improvement, there is a sharp performance degradation when accessing high-performance NAND-flash-based devices from a virtual machine, because these devices rely on a data-parallel design. The primary cause is the lack of block I/O parallelism in hypervisors such as KVM and Xen. In this paper, we propose a novel block I/O layer design for virtualization, named VBMq. VBMq builds on the virtio paravirtual I/O model and aims to solve the block I/O parallelism problem in virtualization. It uses multiple dedicated I/O threads to handle I/O requests in parallel. Meanwhile, we use a polling mechanism to alleviate the overhead of the frequent context switches caused by notifications between the VM and its hypervisor. Each dedicated I/O thread is assigned to a non-overlapping core, which improves performance by avoiding unnecessary scheduling. In addition, we configure CPU affinity to optimize I/O completion for each request; this setting reduces the CPU cache miss rate and increases CPU efficiency. The prototype system is based on the Linux 4.1 kernel and QEMU 2.3.1. Our measurements show that the proposed method scales gracefully in multi-core environments, provides up to 39.6x better performance than the baseline, and approaches bare-metal performance.
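
As a rough illustration of the two mechanisms the abstract describes, the sketch below (not taken from the paper) pins a dedicated I/O worker thread to its own core with pthread_setaffinity_np and has it busy-poll a request counter instead of sleeping on notifications. The atomic counter standing in for a paravirtual request ring, the chosen core number, and the request count are hypothetical simplifications, not VBMq, virtio, or QEMU code.

/* Minimal sketch: one dedicated, core-pinned I/O thread that polls for
 * requests. Compile with: gcc -pthread sketch.c (glibc/Linux only). */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int  pending;  /* hypothetical stand-in for a virtio ring */
static atomic_bool done;     /* set by the producer when it is finished */

static void *io_worker(void *arg) {
    (void)arg;
    int handled = 0;
    while (!atomic_load(&done) || atomic_load(&pending) > 0) {
        /* Busy-poll the queue rather than sleeping on a guest
         * notification: this is what removes the notification-driven
         * context switches from the request path. */
        int n = atomic_load(&pending);
        if (n > 0 && atomic_compare_exchange_strong(&pending, &n, n - 1)) {
            handled++;      /* a real worker would submit to the device */
        } else {
            sched_yield();  /* brief pause while the queue is empty */
        }
    }
    printf("worker handled %d requests\n", handled);
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, io_worker, NULL);

    /* Pin the worker to core 0 so each dedicated I/O thread owns a
     * non-overlapping core: no migration, fewer cache misses. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    int rc = pthread_setaffinity_np(tid, sizeof(set), &set);
    if (rc != 0)
        fprintf(stderr, "pthread_setaffinity_np failed: %d\n", rc);

    for (int i = 0; i < 1000; i++)   /* enqueue fake I/O requests */
        atomic_fetch_add(&pending, 1);
    atomic_store(&done, 1);

    pthread_join(tid, NULL);
    return 0;
}

In the paper's design, one such thread would be created per queue, each pinned to a different core, so that requests from a multi-queue device are serviced in parallel rather than funneled through a single I/O path.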
