MM-LLMs: Recent Advances in MultiModal Large Language Models

Mar 12, 2024 multi modal model arXiv (2024)

A Theory of Multimodal Learning

Jan 29, 2024 multi modal model NeurIPS (2023)

Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection

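Proposes Myriad, which feeds an LLM with tokenized anomaly maps from vision experts together with the output tokens of a Q-Former that takes the anomaly maps and ViT features as input, enabling anomaly detection and language interaction for industrial products.
Achieves performance on par with or better than PaDiM-family methods and AnomalyGPT in the few-shot setting on MVTec and VisA.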

Dec 4, 2023 multi modal model arXiv (2023)

Grounded Language-Image Pre-training

Nov 14, 2023 multi modal model CVPR (2022)

Open-Vocabulary Object Detection Using Captions

Oct 31, 2023 multi modal model CVPR (2021)

Link-Context Learning for Multimodal LLMs

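ICL (In-Context Learning) draws on extensive pre-training knowledge to learn from the prompt without updating model parameters, but because the causal link is weak, it struggles to grasp new concepts.
Proposes LCL (Link-Context Learning), which explicitly strengthens the causal relationship between the support set and the query set, and introduces the ISEKAI dataset of generated images paired with text to evaluate LCL.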

Oct 24, 2023 multi modal model arXiv (2023)

Llama 2: Open Foundation and Fine-Tuned Chat Models

Sep 27, 2023 large language model arXiv (2023)

LLaMA: Open and Efficient Foundation Language Models

Sep 27, 2023 large language model arXiv (2023)

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

Jul 11, 2023 object detection arXiv (2022)
