Lightweight Remote Sensing Multimodal Large Language Model Based on Knowledge Distillation

Affiliation:

Department of Electronic Engineering, School of Information Science and Technology, Fudan University

Fund Project:

National Key Research and Development Program of China

    Abstract:

    Remote sensing multimodal large language models (MLLMs) integrate rich visual and linguistic information and have shown great potential in remote sensing image analysis and interpretation. However, existing knowledge distillation methods focus mainly on compressing unimodal large language models and neglect feature alignment across modalities, which limits the performance of the compressed models on cross-modal tasks. To address this problem, a knowledge-distillation-based method for making remote sensing MLLMs lightweight is proposed. The method aligns the outputs of each modality at the feature level, so that multimodal information is aligned effectively. It uses the reverse Kullback-Leibler divergence as the loss function and combines it with two optimization strategies, teacher-mixed sampling and single-step decomposition, which further improve the generalization and stability of the student model. Experimental results show that the proposed method achieves higher accuracy and efficiency on four downstream remote sensing tasks (scene classification, visual question answering, visual grounding, and image captioning) while significantly reducing the number of model parameters and the demand for computational resources. It thereby offers a new solution for the efficient application of MLLMs in remote sensing.
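The abstract names the reverse Kullback-Leibler divergence as the distillation loss but gives no formula. The sketch below is only an illustration of the general idea, assuming discrete next-token distributions over a shared vocabulary. The function names (`reverse_kl_loss`, `forward_kl_loss`) and the toy logits are hypothetical and do not come from the paper, and teacher-mixed sampling and single-step decomposition are not shown.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


def reverse_kl_loss(student_logits, teacher_logits):
    """KL(student || teacher): the expectation is taken under the student."""
    return kl(softmax(student_logits), softmax(teacher_logits))


def forward_kl_loss(student_logits, teacher_logits):
    """KL(teacher || student): the conventional distillation direction."""
    return kl(softmax(teacher_logits), softmax(student_logits))


# Toy case (assumed values): the teacher has two equally likely tokens,
# while a small student concentrates on only one of them.
teacher = [4.0, 4.0, -4.0]
student = [6.0, -2.0, -4.0]

rkl = reverse_kl_loss(student, teacher)
fkl = forward_kl_loss(student, teacher)
print(f"reverse KL = {rkl:.3f}, forward KL = {fkl:.3f}")
```

In this toy case the reverse direction penalizes the one-mode student much less than the forward direction does. This is the usual argument for reverse KL when the student has too little capacity to cover every mode of the teacher.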

History
  • Received: 2024-11-14
  • Last revised: 2024-12-19
  • Accepted: 2024-12-31