Paper link : https://arxiv.org/abs/2302.00402
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Published : 2023.02 (arxiv)
Citation : 20 (as of 2024.01.17)
Github link : https://github.com/alibaba/AliceMind
>>Additional explanation of Local Temporal<<