[Paper Review : Concept] KOSMOS-2: Grounding Multimodal Large Language Models to the World

Kosmos-2: Grounding Multimodal Large Language Models to the World

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world. Specifically, we represent refer expressions as links in Markdown, i

arxiv.org

Published : 2023.07 (arXiv)

Citation : 132 (as of 2024.02.09)


