VLM

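How Much Do You Know About Vision Language Models (VLMs)?

In recent years, natural language processing and computer vision have both advanced rapidly, so machines can now not only read text but also understand images. Combining the two fields gave rise to vision language models (VLMs), which can simultaneously process visual information and…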

AI Picks: A Quick Survey of Multimodal Vision Language Model (VLM) Papers (arXiv): 2024.04.15-2024.04.25

Table of contents: 1. AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models 2. Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering 3. CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pr…

AI Picks: A Quick Survey of Multimodal Vision Language Model (VLM) Papers (arXiv): 2024.04.10-2024.04.15

Table of contents: 1. Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models 2. Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection 3. UNIAA: A Unified Multi-modal Image Aesthetic Assessment Base…

Multimodal ALBEF: Align First, Then Fuse, Learning Vision-Language Representations with Momentum Distillation. Understanding the Details and a Close Reading of the Paper: Align before Fuse

Align before Fuse: Vision and Language Representation Learning with Momentum Distillation (ALBEF) Paper: arxiv.org/pdf/2107.07651.pdf Github: https://github.com/salesforce/…

AI Picks: A Quick Survey of Multimodal Vision Language Model (VLM) Papers (arXiv): 2024.04.25-2024.05.01

Table of contents: 1. Soft Prompt Generation for Domain Generalization 2. Modeling Caption Diversity in Contrastive Vision-Language Pretraining 3. Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM 4. HELPER-X: A Unified Instructable Embodied Agent t…

AI Picks: A Quick Survey of Multimodal Vision Language Model (VLM) Papers (arXiv): 2024.05.20-2024.05.25

Table of contents: 1. LM4LV: A Frozen Large Language Model for Low-level Vision Tasks 2. Disease-informed Adaptation of Vision-Language Models 3. VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap 4. Composed Image Retrieval fo…

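[Section 3] "HuixiangDou" (茴香豆): Building Your RAG Assistant

Contents: 1 Basic knowledge 1.1 Overview of RAG 1.2 What are the basic components of RAG? 1.3 How RAG works 1.4 Vector database (Vector-DB) 1.5 Common RAG optimization methods 1.6 RAG vs. fine-tuning 2 Introduction to HuixiangDou 2.1 Application scenarios 2.2 Scenario challenges 2.3 Building HuixiangDou 3 Quick paper read 1 Basic…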