Emotion-Aware Music Driven Movie Montage

Authors: Wu-Qin Liu (刘伍琴); Min-Xuan Lin (林敏轩); Hai-Bin Huang (黄海斌); Chong-Yang Ma (马重阳); Yu Song (宋玉); Wei-Ming Dong (董未名); Chang-Sheng Xu (徐常胜)

Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 101408, China; The State Key Laboratory of Multimodal Artificial Intelligence System (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; Kuaishou Technology, Beijing 100085, China; School of Mechanical Engineering, University of Science and Technology Beijing, Beijing 100083, China

Publication: Journal of Computer Science & Technology (计算机科学技术学报, English edition)

Year, volume, issue: 2023, Vol. 38, No. 3

Pages: 540-553

Subject classification: 1303 [Art Studies: Drama and Film & Television Studies]; 13 [Art Studies]

Funding: Supported by the National Key Research and Development Program of China under Grant No. 2020AAA0106200 and the National Natural Science Foundation of China under Grant No. 61832016.

Keywords: movie montage; emotion analysis; audio-visual modality; contrastive learning

Abstract: In this paper, we present Emotion-Aware Music Driven Movie Montage, a novel paradigm for the challenging task of generating movie montages. Specifically, given a movie and a piece of music as the guidance, our method aims to generate a montage out of the movie that is emotionally consistent with the music. Unlike previous work such as video summarization, this task requires not only video content understanding, but also emotion analysis of both the input movie and music. To this end, we propose a two-stage framework, including a learning-based module for the prediction of emotion similarity and an optimization-based module for the selection and composition of candidate movie shots. The core of our method is to align and estimate emotional similarity between music clips and movie shots in a multi-modal latent space via contrastive learning. Subsequently, the montage generation is modeled as a joint optimization of emotion similarity and additional constraints such as scene-level story completeness and shot-level rhythm synchronization. We conduct both qualitative and quantitative evaluations to demonstrate that our method can generate emotionally consistent montages and outperforms alternative baselines.
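
Illustrative note: the abstract names contrastive learning as the way music clips and movie shots are aligned in a shared latent space, but this record gives no loss function or architecture. The sketch below shows one common contrastive objective (an InfoNCE-style loss over cosine similarities) purely as an assumption about what such an alignment could look like. The names `cosine`, `info_nce`, and `temperature`, and the toy 2-D embeddings, are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(music, shots, temperature=0.1):
    """InfoNCE-style loss (assumed, not the paper's stated objective).

    music[i] and shots[i] are treated as a matching pair; every other
    shot in the batch serves as a negative for music[i].
    """
    n = len(music)
    total = 0.0
    for i in range(n):
        logits = [cosine(music[i], shots[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidate shots.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy of picking the matching shot i.
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings: two music clips and two movie shots in a 2-D latent space.
music_embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned_shots = [[1.0, 0.0], [0.0, 1.0]]
swapped_shots = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(music_embeddings, aligned_shots)
loss_swapped = info_nce(music_embeddings, swapped_shots)
print(loss_aligned < loss_swapped)
```

In this toy setting the loss is lower when each music clip sits next to its paired shot than when the pairs are swapped, which is the property a similarity predictor of this kind would rely on. The second stage described in the abstract (joint optimization with story-completeness and rhythm constraints) is not sketched here, since the record gives no details about it.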
