Though the beginning was humble, the end will be great

[Paper Review: Concepts] mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video


애플파ol 2024. 1. 26. 12:17

Paper link: https://arxiv.org/abs/2302.00402

 


 

Published: 2023.02 (arXiv)

Citations: 20 (as of 2024.01.17)

GitHub link: https://github.com/alibaba/AliceMind

 

 

 


>>Additional notes on Local Temporal<<

 

 

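The dimensionality of a convolution (Conv1d, Conv2d, Conv3d) is determined by the axes along which the kernel slides: one axis (for example, time only) gives a 1D convolution, two axes (height and width) give a 2D convolution, and three axes (time, height, and width) give a 3D convolution.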
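To make the "axes decide the Conv dimension" idea concrete, here is a minimal sketch in plain Python. It is not code from the mPLUG-2 repository; the helper name `conv_output_shape` and the clip size (8 frames of 32x32) are assumptions made up for illustration. It only computes how large the output is along each axis the kernel slides over, using the standard output-size formula.

```python
def conv_output_shape(sliding_shape, kernel_size, stride=1, padding=0):
    """Output size along each axis the kernel slides over.

    len(kernel_size) is the number of sliding axes, i.e. the N in "ConvNd".
    (Hypothetical helper for illustration, not part of any library.)
    """
    if len(sliding_shape) != len(kernel_size):
        raise ValueError("one kernel size is needed per sliding axis")
    return tuple(
        (size + 2 * padding - k) // stride + 1
        for size, k in zip(sliding_shape, kernel_size)
    )


# Assumed toy clip: 8 frames, each a 32x32 feature map.
T, H, W = 8, 32, 32

# Kernel slides along time only -> 1D convolution.
temporal_only = conv_output_shape((T,), (3,))

# Kernel slides along height and width -> 2D convolution.
spatial_only = conv_output_shape((H, W), (3, 3))

# Kernel slides along time, height and width -> 3D convolution.
spatio_temporal = conv_output_shape((T, H, W), (3, 3, 3))

print(temporal_only, spatial_only, spatio_temporal)
```

The same input can therefore be processed by a 1D, 2D, or 3D convolution; what changes is which axes the kernel is allowed to move along, and a "local temporal" window is one that covers only a few neighbouring frames along the time axis.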

 

 

Conv3d (source: https://thomelane.github.io/convolutions/3DConv.html)
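As a rough companion to the Conv3d figure, the sketch below implements a naive single-channel 3D convolution in plain Python (strictly speaking a cross-correlation, which is what deep learning frameworks compute under the name "convolution"). The function name `conv3d_naive` and the toy clip are assumptions for illustration only; this is not how mPLUG-2 or any framework implements it, and real code would use an optimized library layer.

```python
def conv3d_naive(video, kernel):
    """Single-channel 3D cross-correlation, stride 1, no padding.

    video  : nested list indexed as video[t][y][x]
    kernel : nested list indexed as kernel[dt][dy][dx]
    (Hypothetical helper for illustration.)
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kT, kH, kW = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kT + 1):            # slide along time
        plane = []
        for y in range(H - kH + 1):        # slide along height
            row = []
            for x in range(W - kW + 1):    # slide along width
                acc = 0.0
                for dt in range(kT):
                    for dy in range(kH):
                        for dx in range(kW):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip: 4 frames of 3x3 pixels; every pixel in frame t has value t.
video = [[[float(t)] * 3 for _ in range(3)] for t in range(4)]

# A 3x1x1 kernel spans 3 neighbouring frames but a single pixel,
# so it only mixes information along the time axis.
temporal_kernel = [[[1.0]], [[1.0]], [[1.0]]]
temporal_out = conv3d_naive(video, temporal_kernel)

# A 2x2x2 kernel mixes information across time, height and width.
cube_kernel = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]
cube_out = conv3d_naive(video, cube_kernel)
```

With the 3x1x1 kernel the spatial size stays 3x3 and only the number of frames shrinks (4 to 2), while the 2x2x2 kernel shrinks all three axes. That difference is the "direction of the axis" point: a kernel whose extent is larger than 1 only along time behaves like a temporal 1D convolution even when written in 3D form.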

 

 

 


 

 

 

 
