[Paper Review: Concepts] VATT: Transformers for Multimodal Self-Supervised Learning From Raw Video, Audio and Text
애플파ol, 2023. 9. 10. 23:24