
Unsupervised Video Object Segmentation via Weak User Interaction and Temporal Modulation

Authors: FAN Jiaqing; ZHANG Kaihua; ZHAO Yaqian; LIU Qingshan

Affiliations: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; College of Computer and Software, Nanjing University of Information Science and Technology; Engineering Research Center of Digital Forensics, Ministry of Education; Inspur Suzhou Intelligent Technology Corporation

Published in: Chinese Journal of Electronics

Year/Volume/Issue: 2023, Vol. 32, No. 3

Pages: 507-518

Subject classification: 0808 [Engineering: Electrical Engineering]; 08 [Engineering]; 080203 [Engineering: Mechanical Design and Theory]; 0802 [Engineering: Mechanical Engineering]

Funding: Supported by the National Key Research and Development Program (2021ZD0112200) and the National Natural Science Foundation of China (U21B2044)

Keywords: Earth; Training; Codes; Annotations; Modulation; Object segmentation; Complexity theory

Abstract: In unsupervised video object segmentation (UVOS), the model may segment the wrong target throughout the whole video due to the lack of initial prior information. In semi-supervised video object segmentation (SVOS), a fine-grained pixel-level mask for the initial video frame is essential to good segmentation accuracy, but providing accurate pixel-level masks for every training sequence is expensive and laborious. To address this issue, we present a weakly interactive UVOS approach guided by a simple human-drawn rectangle annotation in the initial frame. We first interactively draw a rectangle around the region of interest, and then leverage the Mask R-CNN (region-based convolutional neural network) method to generate a set of coarse reference labels for subsequent mask propagation. To establish temporal correspondence between coherent frames, we further design two novel temporal modulation modules to enhance the target representations. We compute the earth mover's distance (EMD)-based similarity between coherent frames to mine the co-occurrent objects in the two images, which is used to modulate the target representation to highlight the foreground target. We also design a cross-squeeze temporal modulation module to emphasize co-occurrent features across frames, which further enhances the foreground target representation. We augment the temporally modulated representations with the original representation to obtain composite spatio-temporal information, producing a more accurate video object segmentation (VOS) model. Experimental results on both UVOS and SVOS datasets, including DAVIS 2016, FBMS, YouTube-VOS, and DAVIS 2017, show that our method yields favorable accuracy and complexity. The related code is available.
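The general idea of cross-frame temporal modulation described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not specify the EMD computation, so a cosine affinity between all spatial positions of two frames stands in for the EMD-based similarity, and the function name and shapes are assumptions for illustration. The sketch shows the two ingredients the abstract names: a cross-frame similarity used to reweight (modulate) the current frame's features, and a residual augmentation that keeps the original representation.

```python
import numpy as np

def cross_frame_modulation(feat_ref: np.ndarray, feat_cur: np.ndarray) -> np.ndarray:
    """Modulate current-frame features with a cross-frame affinity map.

    feat_ref, feat_cur: (C, H, W) feature maps of two coherent frames.
    The cosine affinity below is a stand-in for the paper's EMD-based
    similarity (an assumption; the abstract does not give the exact form).
    """
    C, H, W = feat_cur.shape
    ref = feat_ref.reshape(C, -1)   # (C, N): each column is one spatial position
    cur = feat_cur.reshape(C, -1)

    # L2-normalize each position's feature vector so dot products are cosines.
    refn = ref / (np.linalg.norm(ref, axis=0, keepdims=True) + 1e-8)
    curn = cur / (np.linalg.norm(cur, axis=0, keepdims=True) + 1e-8)

    # (N, N) pairwise similarity between reference and current positions.
    affinity = refn.T @ curn

    # A current position that strongly matches some reference position is
    # likely a co-occurrent (foreground) feature; use that as its weight.
    weight = affinity.max(axis=0)            # (N,)

    # Modulate, then augment with the original representation (residual).
    modulated = cur * weight[None, :]
    out = cur + modulated
    return out.reshape(C, H, W)
```

When the two frames are identical, every position matches itself perfectly, so the weights are close to 1 and the output is roughly twice the input, which shows that co-occurrent features are amplified rather than replaced.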
