Paper link : https://arxiv.org/abs/2302.00402
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Published : 2023.02 (arxiv)
Citation : 20 (as of 2024.01.17)
Github link : https://github.com/alibaba/AliceMind
>>Additional explanation of Local Temporal<<