mPLUG-2

[Paper Review: Concepts] mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

Paper link: https://arxiv.org/abs/2302.00402

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Recent years have witnessed a big convergence of language, vision, and multi-modal pretraining. In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboration whil…
arxiv.org
Published ..

2024. 1. 26.