[arXiv 2025] BP-GPT: Auditory Neural Decoding Using fMRI-prompted LLM

Paper: BP-GPT: Auditory Neural Decoding Using fMRI-prompted LLM

Code: https://github.com/1994cxy/BP-GPT

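The English here is typed entirely by hand! It summarizes and paraphrases the original paper. Spelling and grammar mistakes may be hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.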

1. Thoughts

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Method

2.3.1. fMRI to Text Decoding

2.3.2. Training

2.3.3. Inference

2.4. Experiment

2.4.1. Dataset

2.4.2. Implementation Details

2.4.3. Baseline and Evaluation Metrics

2.4.4. Evaluation of the Text Prompt

2.4.5. Evaluation of fMRI to Text Decoding

2.4.6. Ablation Study

2.5. Conclusion

3. Reference

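1. Thoughts

(1) Sorry, bro, for digging this up to read so early. I just happened to come across it, so consider it free publicity; a few more Stars on GitHub wouldn't hurt.

(2) It is only four pages, so it is a light and pleasant read.

(3) A paper a day keeps my hair away.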

2. Section-by-Section Close Reading

2.1. Abstract

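①Existing problem: current LLM-based methods for extracting semantics from fMRI are not end-to-end????? That feels like an overgeneralization; I don't think it is a very good limitation.

②They proposed Brain Prompt GPT (BP-GPT) to decode fMRI by aligning the fMRI prompt with the text prompt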

2.2. Introduction

①I appreciate you for opening with a famous quote. Only a youngster's world looks like this: a real storybook rather than a formulaic essay.

“The limits of my language mean the limits of my world” - Ludwig Wittgenstein.

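If the authors believe that language is what brings understanding, that always carries a sense of being unable to progress. In practice new words are coined all the time and our vocabulary keeps updating, but AI does not seem able to update itself automatically.

②The rate at which words are spoken differs from the time scale of the BOLD response

③Challenge: decoding multiple words within one repetition time (TR) (isn't this existing problem more reasonable than the end-to-end one above???)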

④Framework of BP-GPT:

(This figure could be polished a bit more....)

2.3. Method

2.3.1. fMRI to Text Decoding

①Encode fMRI by:

$P_i^B=\mathbf{E}_\eta(x_i^B),$

where $\mathbf{E}_\eta$ denotes the encoder and $x_i^B$ denotes the fMRI signal.

②Cross-entropy loss of the fMRI encoder (the negative log-likelihood of the ground-truth tokens):

$\begin{aligned}\mathcal{L}_{brain} &=-\sum_{i=1}^{N}\log p_{\eta}(W|P_{i}^{B}) \\ &=-\sum_{i=1}^{N}\sum_{j=1}^{\mathcal{L}}\log p_{\eta}(w_{j}|p_{1}^{B},\ldots,p_{k}^{B},w_{1},\ldots,w_{j-1})\end{aligned}$
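The loss above is a sum of negative log-probabilities of the ground-truth tokens, conditioned on the fMRI prompt and the preceding tokens. A minimal sketch of that arithmetic follows; the function names and the toy probabilities are illustrative assumptions, not taken from the BP-GPT repository, and in practice the probabilities would come from the LLM decoder.

```python
import math


def sequence_nll(token_probs):
    """Negative log-likelihood of one sentence.

    token_probs[j] is the probability the decoder assigned to the correct
    token w_j given the fMRI prompt p_1..p_k and the tokens before w_j.
    """
    return -sum(math.log(p) for p in token_probs)


def brain_loss(batch_token_probs):
    """Sum the per-sentence NLL over the N samples in the batch."""
    return sum(sequence_nll(probs) for probs in batch_token_probs)


# Two toy samples with made-up probabilities for their ground-truth tokens.
batch = [[0.5, 0.25], [1.0, 0.5]]
loss = brain_loss(batch)  # equals 4 * ln(2)
```

A token predicted with probability 1 contributes nothing to the loss, while low-probability tokens dominate it.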

③The similarity between positive pair fMRI prompt and text prompt:

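$S_p=\exp(\cos(P_B^i\cdot P_T^i)/\tau)$

where $\tau$ is the temperature hyperparameter, $P_B^i$ is the fMRI prompt of sample $i$ (written $P_i^B$ above) and $P_T^i$ is the corresponding text prompt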

④Negative pairs come from different samples; their similarity is calculated by:

$S_n=\exp(\cos(P_B^i\cdot P_B^j)/\tau)+\exp(\cos(P_B^i\cdot P_T^j)/\tau),i\neq j$

⑤The contrastive loss:

$L_{\mathcal{C}}=-\mathbb{E}\left[\log\frac{S_p}{S_n}\right]$
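To make the three formulas concrete, here is a small sketch that computes $S_p$, $S_n$ and the contrastive loss for a toy batch. Two things are assumptions on my part: each prompt is flattened into a single vector, and $S_n$ is summed over every $j\neq i$ in the batch (the notes only write it for one $j$). The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(brain_prompts, text_prompts, tau=0.1):
    """Mean of -log(S_p / S_n) over the batch."""
    n = len(brain_prompts)
    total = 0.0
    for i in range(n):
        # Positive pair: fMRI prompt i with its own text prompt.
        s_p = math.exp(cosine(brain_prompts[i], text_prompts[i]) / tau)
        # Negative pairs: fMRI prompt i with prompts of other samples.
        s_n = 0.0
        for j in range(n):
            if j == i:
                continue
            s_n += math.exp(cosine(brain_prompts[i], brain_prompts[j]) / tau)
            s_n += math.exp(cosine(brain_prompts[i], text_prompts[j]) / tau)
        total += -math.log(s_p / s_n)
    return total / n


brain = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(brain, [[1.0, 0.0], [0.0, 1.0]])
misaligned = contrastive_loss(brain, [[0.0, 1.0], [1.0, 0.0]])
```

Because the positive term does not appear in $S_n$, the loss can be negative when the positive pair is much more similar than the negatives; the aligned toy batch gives a lower loss than the misaligned one.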

2.3.2. Training

①A cross-entropy loss is used to train the text prompt, and the decoder is trained with:

$\mathcal{L}=\mathcal{L}_{brain}+\alpha L_{\mathcal{C}}$

2.3.3. Inference

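①The length of the text is not fixed across fMRI windows. The paper notes that recent work handles this with a word rate model that predicts how many words the participant perceived; text generation stops once the generated text reaches the predicted word count. This solves the problem, but it does not make full use of the characteristics of the LLM.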

②So they insert a "$" marker into the ground-truth text, with the positions determined by the TR
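One way to picture the marker idea is sketched below. This is only my reading of it, under several assumptions: word onset times are available, a "$" is placed after the words of each TR, the TR is 2 s, and generation stops once one marker per fMRI frame has been produced. None of these details, nor the function names, are confirmed by the notes.

```python
def mark_tr_boundaries(words, onsets, tr=2.0, marker="$"):
    """Append `marker` after the words that fall inside each TR.

    words  : word strings in spoken order
    onsets : onset time of each word in seconds (same length as words)
    tr     : repetition time in seconds (2.0 is an assumed value)
    """
    if not words:
        return []
    out = []
    current_tr = 0
    for word, onset in zip(words, onsets):
        tr_index = int(onset // tr)
        # Close every TR that ended before this word started, including
        # empty ones, so that markers stay aligned with fMRI frames.
        while current_tr < tr_index:
            out.append(marker)
            current_tr += 1
        out.append(word)
    out.append(marker)  # close the TR that contains the last word
    return out


def should_stop(generated_tokens, n_trs, marker="$"):
    """Assumed stopping rule: one marker generated per fMRI frame."""
    return generated_tokens.count(marker) >= n_trs


words = ["i", "was", "walking", "home", "late"]
onsets = [0.2, 0.9, 1.6, 2.3, 4.5]
marked = mark_tr_boundaries(words, onsets)
# marked == ["i", "was", "walking", "$", "home", "$", "late", "$"]
```

With markers in the training text, the LLM itself can learn where a TR ends, so no separate word rate model is needed to decide when to stop.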

2.4. Experiment

2.4.1. Dataset

①Dataset:

A. LeBel, L. Wagner, S. Jain, A. Adhikari-Desai, B. Gupta, A. Morgenthal, J. Tang, L. Xu, and A. G. Huth, “A natural language fMRI dataset for voxelwise encoding models,” Scientific Data, vol. 10, no. 1, p. 555, 2023.

②Subjects: 3 of the 8 subjects are used

③Setting: subjects passively listened to naturally spoken English stories from podcasts such as The Moth and New York Times Modern Love

2.4.2. Implementation Details

① $\tau =0.1$

② $\alpha =1$

③Time window for the fMRI sequence and its corresponding text: 20 s, with no gap between windows

④Length of prompt: $k=30$

⑤Input dimension of BERT: 512

⑥Transformer: 8 layers with 8 heads

⑦Optimizer: AdamW

⑧Batch size: 32
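The settings above can be collected into one configuration object, together with the windowing they imply. This is a sketch under assumptions: the field names are hypothetical, the TR of 2 s is not stated in these notes, and "no gap" is read as back-to-back, non-overlapping windows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BPGPTConfig:
    tau: float = 0.1              # contrastive temperature
    alpha: float = 1.0            # weight of the contrastive loss
    window_seconds: float = 20.0  # length of one fMRI/text window
    prompt_length: int = 30       # k
    bert_input_dim: int = 512
    n_layers: int = 8
    n_heads: int = 8
    optimizer: str = "AdamW"
    batch_size: int = 32
    tr_seconds: float = 2.0       # ASSUMPTION: not given in these notes


def split_into_windows(n_frames, cfg):
    """Return (start, end) frame indices of back-to-back windows.

    Trailing frames that do not fill a whole window are dropped.
    """
    frames_per_window = int(round(cfg.window_seconds / cfg.tr_seconds))
    last_start = n_frames - frames_per_window
    return [
        (start, start + frames_per_window)
        for start in range(0, last_start + 1, frames_per_window)
    ]


cfg = BPGPTConfig()
windows = split_into_windows(35, cfg)
# windows == [(0, 10), (10, 20), (20, 30)]
```

Under the assumed TR, each 20 s window covers 10 fMRI frames, which the encoder then maps to a prompt of length $k=30$.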

2.4.3. Baseline and Evaluation Metrics

①Test set: story “Where There's Smoke”

2.4.4. Evaluation of the Text Prompt

①Performance:

2.4.5. Evaluation of fMRI to Text Decoding

①Performance table:

2.4.6. Ablation Study

①Contrastive module ablation:

②Fine-tuning ablation:

2.5. Conclusion

3. Reference

@article{chen2025bp,
  title={BP-GPT: Auditory Neural Decoding Using fMRI-prompted LLM},
  author={Chen, Xiaoyu and Du, Changde and Liu, Che and Wang, Yizhe and He, Huiguang},
  journal={arXiv preprint arXiv:2502.15172},
  year={2025}
}