Salient region detection and segmentation for general object recognition and image understanding
Salient region detection and segmentation for general object recognition and image understanding作者机构:National Engineering Laboratory for Video Technology School of Electrical Engineering and Computer Science Peking University Beijing China Key Laboratory of Intelligent Information Processing Institute of Computing Technology Chinese Academy of Sciences Beijing China
出 版 物:《Science China(Information Sciences)》 (中国科学:信息科学(英文版))
年 卷 期:2011年第54卷第12期
页 面:2481-2490页
核心收录:
学科分类:08[工学] 080203[工学-机械设计及理论] 0802[工学-机械工程]
基 金:supported by the National Natural Science Foundation of China(Grant Nos.90820003,60973055) National Basic Research Program of China(Grant No.2009CB320906) supported by Program for New Century Excellent Talents in University of Ministry of Education of China
主 题:object recognition image understanding visual saliency salient object segmentation visual dictionary
摘 要:General object recognition and image understanding is recognized as a dramatic goal for computer vision and multimedia retrieval. In spite of the great efforts devoted in the last two decades, it still remains an open problem. In this paper, we propose a selective attention-driven model for general image understanding, named GORIUM (general object recognition and image understanding model). The key idea of our model is to discover recurring visual objects by selective attention modeling and pairwise local invariant features matching on a large image set in an unsupervised manner. Towards this end, it can be formulated as a four-layer bottom-up model, i.e., salient region detection, object segmentation, automatic object discovering and visual dictionary construction. By exploiting multi-task learning methods to model visual saliency simultaneously with the bottom-up and top-down factors, the lowest layer can effectively detect salient objects in an image. The second layer exploits a simple yet effective learning approach to generate two complementary maps from several raw saliency maps, which then can be utilized to segment the salient objects precisely from a complex scene. For the third layer, we have also implemented an unsupervised approach to automatically discover general objects from large image set by pairwise matching with local invariant features. Afterwards, visual dictionary construction can be implemented by using many state-of-the-art algorithms and tools available nowadays.