
Paper Reviews (23)

[Paper Review: Concept] KOSMOS-2: Grounding Multimodal Large Language Models to the World
Paper link: https://arxiv.org/abs/2306.14824
Excerpt: "We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world. Specifically, we represent refer expressions as links in Markdown…" (arxiv.org)
Published: 2023.07 (arXiv) · Posted: 2024.02.19
[Paper Review: Concept] Language Is Not All You Need: Aligning Perception with Language Models
Paper link: https://arxiv.org/abs/2302.14045
Excerpt: "A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities…" (arxiv.org)
Published: 2023.03 (arXiv) · Posted: 2024.02.08
[Paper Review: Concept] Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles
Paper link: https://www.mdpi.com/2504-446X/7/2/114
Excerpt: "Unmanned Aerial Vehicles (UAVs) are able to provide instantaneous visual cues and a high-level data throughput that could be further leveraged to address complex tasks, such as semantically rich scene understanding…" (mdpi.com)
Posted: 2024.01.27
[Paper Review: Concept] mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Paper link: https://arxiv.org/abs/2302.00402
Excerpt: "Recent years have witnessed a big convergence of language, vision, and multi-modal pretraining. In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboration…" (arxiv.org)
Published: 2023.02 (arXiv) · Posted: 2024.01.26
[Paper Review: Concept] VideoChat: Chat-Centric Video Understanding
Paper link: https://arxiv.org/abs/2305.06355
Excerpt: "In this paper, we initiate an attempt of developing an end-to-end chat-centric video understanding system, coined as VideoChat. It integrates video foundation models and large language models via a learnable neural interface…" (arxiv.org)
Published: 2023.05 (arXiv, as of 2024.01.14) · Posted: 2024.01.14
[Paper Review: Concept] In-flight positional and energy use data set of a DJI Matrice 100 quadcopter for small package delivery
Paper link: https://arxiv.org/abs/2103.13313
Excerpt: "We autonomously direct a small quadcopter package delivery Uncrewed Aerial Vehicle (UAV) or 'drone' to take off, fly a specified route, and land for a total of 209 flights while varying a set of operational parameters…" (arxiv.org)
Posted: 2024.01.02