Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Yuchong Sun1, Hongwei Xue2, Ruihua Song1†, Bei Liu3†, Huan Yang3, Jianlong Fu3
1 Renmin University of China, Beijing, China; 2 University of Science and Technology of China, Hefei, China; 3 Microsoft Research, Beijing, China
1{ycsun,rsong}@ruc.edu.cn, 2gh0...
2025-04-27