Paper link: https://arxiv.org/abs/2302.14045
Language Is Not All You Need: Aligning Perception with Language Models
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn
Published : 2023.03 (arXiv)
Citations : 194 (as of 2024.01.27)