[arXiv 2025]BP-GPT: Auditory Neural Decoding Using fMRI-prompted LLM

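The English here is typed entirely by hand! It is a summary and paraphrase of the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.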


1. Thoughts




2. Section-by-Section Close Reading

2.1. Abstract


        ②They proposed Brain Prompt GPT (BP-GPT) to decode fMRI signals into text by aligning the fMRI and text representations

2.2. Introduction


“The limits of my language mean the limits of my world” - Ludwig Wittgenstein.


        ②The rate at which words are spoken differs from the time scale of the BOLD response

        ③Challenge: decoding multiple words within one repetition time (TR) (isn't this existing problem more reasonable than that end-to-end thing above???)

        ④Framework of BP-GPT:


2.3. Method

2.3.1. fMRI to Text Decoding

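        ①Encode fMRI by:

P_i^B=p_1^B,\ldots,p_k^B=\mathbf{E}_\eta(x_i^B)

where \mathbf{E}_\eta denotes the encoder, x_i^B denotes the fMRI signal, and P_i^B denotes the resulting fMRI prompt of k vectors.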

        ②Cross-entropy (negative log-likelihood) loss of the fMRI encoder:

\mathcal{L}_{brain}=-\sum_{i=1}^{N}\log p_{\eta}(W\mid P_{i}^{B})=-\sum_{i=1}^{N}\sum_{j=1}^{\mathcal{L}}\log p_{\eta}(w_{j}\mid p_{1}^{B},\ldots,p_{k}^{B},w_{1},\ldots,w_{j-1})
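
To make the double sum concrete, here is a minimal NumPy sketch of a prompt-conditioned negative log-likelihood. It is not the authors' code: the function names `prompt_conditioned_nll` and `brain_loss` are hypothetical, and the logits are assumed to come from a language model that has already been fed the k prompt vectors followed by the preceding words.

```python
import numpy as np

def prompt_conditioned_nll(logits, target_ids):
    """Negative log-likelihood of one target sentence.

    logits     : (L, vocab) next-token scores; row j is ASSUMED to come from a
                 language model that saw the k prompt vectors and w_1..w_{j-1}.
    target_ids : (L,) vocabulary indices of the ground-truth words w_1..w_L.
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick log p(w_j | prompt, w_<j) at every position j (the inner sum).
    token_log_probs = log_probs[np.arange(len(target_ids)), target_ids]
    return -token_log_probs.sum()

def brain_loss(batch_logits, batch_targets):
    """Sum the per-sample NLL over the N samples (the outer sum over i)."""
    return sum(prompt_conditioned_nll(l, t)
               for l, t in zip(batch_logits, batch_targets))

# Toy check: uniform scores over a 4-word vocabulary and 3 target words.
uniform_logits = np.zeros((3, 4))
targets = np.array([0, 2, 1])
loss_value = brain_loss([uniform_logits], [targets])
```

With uniform scores every word has probability 1/4, so the loss of this toy sample is 3·log 4; any model that puts more mass on the correct words scores lower.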

        ③The similarity between the fMRI prompt and the text prompt of a positive pair:

S_p=\exp(\cos(P_B^i\cdot P_T^i)/\tau)

where \tau is the temperature hyperparameter

        ④Negative pairs come from different samples, and their similarity is calculated by:

S_n=\exp(\cos(P_B^i\cdot P_B^j)/\tau)+\exp(\cos(P_B^i\cdot P_T^j)/\tau),i\neq j

        ⑤The contrastive loss:
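
The formula for the loss itself is not reproduced in these notes, so the sketch below only illustrates how S_p and S_n can be computed and combined. Three things are assumptions rather than the paper's stated method: each prompt is represented by a single pooled vector, S_n is summed over all j ≠ i, and the two terms are combined in a common InfoNCE-style form, -log(S_p / (S_p + S_n)). The helper names are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_terms(p_b, p_t, i, tau=0.1):
    """Return (S_p, S_n) for sample i.

    p_b, p_t : (N, d) arrays of pooled fMRI / text prompts
               (pooling each prompt to one vector is an ASSUMPTION).
    """
    s_p = np.exp(cosine(p_b[i], p_t[i]) / tau)
    s_n = 0.0
    for j in range(len(p_b)):
        if j == i:
            continue
        # Negatives: other fMRI prompts and other text prompts (summed over j,
        # which is an ASSUMPTION about how the i != j condition is applied).
        s_n += np.exp(cosine(p_b[i], p_b[j]) / tau)
        s_n += np.exp(cosine(p_b[i], p_t[j]) / tau)
    return s_p, s_n

def info_nce_style_loss(p_b, p_t, tau=0.1):
    """ASSUMED InfoNCE-style combination, averaged over samples."""
    losses = []
    for i in range(len(p_b)):
        s_p, s_n = contrastive_terms(p_b, p_t, i, tau)
        losses.append(-np.log(s_p / (s_p + s_n)))
    return float(np.mean(losses))

# Toy example: two perfectly aligned, mutually orthogonal pairs.
p_b = np.eye(2)
p_t = np.eye(2)
s_p, s_n = contrastive_terms(p_b, p_t, i=0, tau=0.1)
loss = info_nce_style_loss(p_b, p_t, tau=0.1)
```

In the toy example the positive pair has cosine 1 and both negatives have cosine 0, so S_p = exp(10) dominates S_n = 2 and the assumed loss is close to zero.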


2.3.2. Training

        ①The cross-entropy loss is used to train the text prompt, and the decoder is trained with:

\mathcal{L}=\mathcal{L}_{brain}+\alpha\mathcal{L}_{C}

2.3.3. Inference

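        ①The length of the sentence differs from that of the fMRI window. "Current solutions in recent work use a word rate model to predict the number of words perceived by the participant. Text generation stops once the length of the generated text reaches the word count predicted by the word rate model. Although this approach solves the problem, it does not make full use of the characteristics of the LLM."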

        ②So they insert $ markers into the ground-truth text, with the positions based on TR.
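
A minimal sketch of how such markers could be inserted is shown below. The exact placement rule is not given in these notes, so this assumes one $ is appended after the words whose onsets fall inside each TR; the function name `insert_tr_markers`, the word onset times, and the 2 s TR are all hypothetical.

```python
def insert_tr_markers(words, onsets, tr=2.0, n_tr=None, marker="$"):
    """Append `marker` after the words that start within each TR.

    words  : list of str
    onsets : word onset times in seconds, relative to the window start, sorted
    tr     : repetition time in seconds (2.0 is an ASSUMED value)
    n_tr   : number of TRs in the window; inferred from the last onset if None.
             Words starting after n_tr * tr are dropped.
    """
    if n_tr is None:
        n_tr = int(max(onsets, default=0.0) // tr) + 1
    out = []
    idx = 0
    for t in range(n_tr):
        end = (t + 1) * tr
        # Collect every word whose onset lies before the end of this TR.
        while idx < len(words) and onsets[idx] < end:
            out.append(words[idx])
            idx += 1
        out.append(marker)
    return " ".join(out)

marked = insert_tr_markers(
    ["i", "went", "home", "and", "slept"],
    [0.2, 0.9, 1.7, 2.4, 4.1],
    tr=2.0,
)
```

Here `marked` is "i went home $ and $ slept $": three words in the first TR, one in each of the next two. Under this scheme the number of markers in a text equals the number of TRs it spans.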

2.4. Experiment

2.4.1. Dataset


A. LeBel, L. Wagner, S. Jain, A. Adhikari-Desai, B. Gupta, A. Morgenthal, J. Tang, L. Xu, and A. G. Huth, “A natural language fMRI dataset for voxelwise encoding models,” Scientific Data, vol. 10, no. 1, p. 555, 2023.

        ②Subjects: 3 of the 8 subjects are chosen

        ③Stimuli: subjects passively listened to naturally spoken English stories from podcasts such as The Moth and the New York Times' Modern Love

2.4.2. Implementation Details

        ①\tau =0.1

        ②\alpha =1

        ③Time windows for the fMRI sequence and the corresponding text: 20 s with no gap

        ④Length of prompt: k=30

        ⑤Input dimension of BERT: 512

        ⑥Transformer layers: 8, each with 8 heads

        ⑦Optimizer: AdamW

        ⑧Batch size: 32
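
The settings above can be gathered into one config, together with a small sketch of the windowing. This is only an illustration: the dictionary keys and `split_into_windows` are hypothetical names, "no gap" is read here as back-to-back, non-overlapping windows, and the TR of 2 s is an assumption that is not stated in these notes.

```python
# Hyperparameters as listed in the notes; the key names are hypothetical.
config = {
    "tau": 0.1,            # contrastive temperature
    "alpha": 1.0,          # weight of the contrastive loss
    "window_sec": 20.0,    # length of each fMRI/text window
    "prompt_len": 30,      # k
    "bert_input_dim": 512,
    "n_layers": 8,
    "n_heads": 8,
    "optimizer": "AdamW",
    "batch_size": 32,
}

def split_into_windows(n_trs, tr=2.0, window_sec=20.0):
    """Return (start, end) TR index pairs for back-to-back windows.

    tr=2.0 s is an ASSUMED repetition time; a trailing partial window
    is dropped.
    """
    per_window = int(round(window_sec / tr))
    return [(s, s + per_window)
            for s in range(0, n_trs - per_window + 1, per_window)]

windows = split_into_windows(n_trs=35, window_sec=config["window_sec"])
```

With the assumed 2 s TR each window covers 10 TRs, so a 35-TR run yields the index pairs (0, 10), (10, 20), (20, 30), and the last 5 TRs are left out.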

2.4.3. Baseline and Evaluation Metrics

        ①Test set: story “Where There's Smoke”

2.4.4. Evaluation of the Text Prompt


2.4.5. Evaluation of fMRI to Text Decoding

        ①Performance table:

2.4.6. Ablation Study

        ①Contrastive module ablation:

        ②Fine-tuning ablation:

2.5. Conclusion


3. Reference

@article{chen2025bp,title={BP-GPT: Auditory Neural Decoding Using fMRI-prompted LLM},author={Chen, Xiaoyu and Du, Changde and Liu, Che and Wang, Yizhe and He, Huiguang},journal={arXiv preprint arXiv:2502.15172},year={2025}}



