Paper Reviews (2)

[Paper Review: Concepts] Semantic Scene Understanding with Large Language Models on Unmanned Aerial Vehicles
Paper link: https://www.mdpi.com/2504-446X/7/2/114
Unmanned Aerial Vehicles (UAVs) are able to provide instantaneous visual cues and a high-level data throughput that could be further leveraged to address complex tasks, such as semantically rich scene understanding. In this work, we built on the use of Lar...
2024. 1. 27.

[Paper Review: Concepts] mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Paper link: https://arxiv.org/abs/2302.00402
Recent years have witnessed a big convergence of language, vision, and multi-modal pretraining. In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboration whil...
2024. 1. 26.