PSP - Predicting Intrinsically Disordered Regions of Protein Sequences with MetaPredict

news/2024/11/22 21:27:44/



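Overview of the MetaPredict algorithm:

Intrinsically disordered regions (IDRs) are ubiquitous across all domains of life and play a wide range of functional roles. Whereas a folded domain can usually be described well by a single 3D structure, an IDR exists as a collection of interconverting states known as an ensemble. Because of this structural heterogeneity, IDRs are largely absent from the PDB, which has contributed to a lack of computational methods for predicting ensemble conformational properties from sequence. Here, we combine rational sequence design, large-scale molecular simulations and deep learning to develop ALBATROSS, a deep-learning model for predicting IDR ensemble dimensions from sequence. ALBATROSS predicts ensemble-average properties instantaneously, even at proteome-wide scale. It is lightweight and easy to use, available both as a locally installable software package and as a point-and-click cloud interface. We first demonstrate the applicability of our predictors by examining how well sequence-ensemble relationships in IDRs generalize. We then exploit the high throughput of ALBATROSS to characterize emergent biophysical behavior of IDRs within and across proteomes.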

Tool used: metapredict

  • GitHub: metapredict: A machine learning-based tool for predicting protein disorder.

  • Documentation:

Paper: Direct prediction of intrinsically disordered protein conformational properties from sequence

  • Last updated: 2023-05-28


Reference test notes, courtesy of @盼盼:

1. Project Setup

Test sequence: T1157s1_A1029.fasta, from CASP15:


Install the Python package, then import it:

pip install metapredict==2.61

import metapredict as meta

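2. Function Calls

2.1 Core Function: Predict Disorder Batch

Test: compute each residue's disorder probability (0 to 1; the higher the value, the more likely the residue is disordered), then binarize it with a threshold of 0.5, where 1 means disordered and 0 means folded.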

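import numpy as np
import metapredict as meta


def predict_batch(seq_list):
    """Predict per-residue disorder and extract disordered runs (0-based, inclusive)."""
    if not isinstance(seq_list, list):
        seq_list = [seq_list]
    # One (sequence, disorder_scores) pair per input sequence
    output = meta.predict_disorder_batch(seq_list)
    assert len(seq_list) == len(output)
    res_list = []
    for sample in output:
        sample_disorder = sample[1]
        print(f"disorder range: {np.min(sample_disorder)}~{np.max(sample_disorder)}")
        # Binarize with the 0.5 threshold: 1 = disordered, 0 = folded
        sample_disorder_idx = np.where(sample_disorder > 0.5, 1, 0).tolist()
        print(f"sample_disorder_idx: {sample_disorder_idx}")
        # Collect runs of consecutive disordered residues.
        # Note: there is no flush after the loop, so a run that reaches
        # the last residue is never added to d_list.
        d_list, tmp_list = [], []
        for i, v in enumerate(sample_disorder_idx):
            if v == 1:  # disordered
                tmp_list.append(i)
            elif tmp_list:
                d_list.append(tmp_list)
                tmp_list = []
        # Keep runs spanning at least 3 residues, stored as [start, end]
        domain = [[r[0], r[-1]] for r in d_list if r[-1] - r[0] >= 2]
        # seq, disorder_idx, domain
        res_list.append([sample[0], sample_disorder_idx, domain])
    return res_list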

The 0.5 threshold is the value recommended by metapredict itself; from its documentation:

Altering the disorder threshold - To alter the disorder threshold, simply set disorder_threshold=my_value where my_value is a float. The higher the threshold value, the more conservative metapredict will be for designating a region as disordered. Default = 0.5 (V2) and 0.42 (legacy).
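To see what "more conservative" means in practice, the sketch below binarizes the same scores at the V2 default (0.5), the legacy default (0.42) and a stricter value. The scores and the helper name `binarize` are hypothetical; it mirrors the strict `>` comparison used in predict_batch above, which may differ from metapredict's internal comparison.

```python
import numpy as np


def binarize(scores, threshold=0.5):
    """Map per-residue disorder scores to 1 (disordered) / 0 (folded)."""
    scores = np.asarray(scores, dtype=float)
    return (scores > threshold).astype(int).tolist()


# Hypothetical disorder scores, for illustration only
scores = [0.30, 0.45, 0.52, 0.61, 0.90]
print(binarize(scores, 0.5))   # [0, 0, 1, 1, 1]
print(binarize(scores, 0.42))  # [0, 1, 1, 1, 1]
print(binarize(scores, 0.6))   # [0, 0, 0, 1, 1]
```

Raising the threshold can only turn 1s into 0s, never the reverse, so fewer residues are called disordered.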


disorder range: 0.0~0.9868000149726868
disorder list:
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
disordered domains:
[[224, 230], [553, 570], [772, 810], [832, 880]]
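The disorder list above ends with eight 1s, yet no C-terminal run appears among the disordered domains: predict_batch only appends a run when it meets a folded residue, so a run that reaches the end of the sequence is dropped. A minimal alternative sketch using `itertools.groupby` reports it as well; `disordered_runs` is a hypothetical helper, and `min_len=3` is assumed to match the original `(r[-1] - r[0]) >= 2` filter.

```python
from itertools import groupby


def disordered_runs(flags, min_len=3):
    """Return [start, end] (0-based, inclusive) for runs of 1s of at least min_len residues.

    Runs touching either end of the sequence are reported too.
    """
    runs = []
    pos = 0
    for value, group in groupby(flags):
        length = len(list(group))
        if value == 1 and length >= min_len:
            runs.append([pos, pos + length - 1])
        pos += length  # advance past this run of identical flags
    return runs


flags = [1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
print(disordered_runs(flags))  # [[4, 6], [8, 11]]
```

Passing `res_list[0][1]` from predict_batch would give the same runs as before plus the trailing one.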

2.2 Disorder Domains (Official)

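predict_disorder_domains returns the same per-residue disorder scores as predict_disorder_batch; only the criteria for selecting domains differ, and they yield fewer, broader regions. In this example the short run 224-230 is not reported as an IDR, and the short folded stretch between residues 810 and 832 is absorbed, giving a single IDR [772, 880]. Its boundaries are also half-open [start, end), so the inclusive run 553-570 appears as [553, 571].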
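To illustrate the kind of rules involved, here is a simplified post-processing sketch. The function name `postprocess` and the cut-offs `min_idr=12` and `min_folded=50` are illustrative assumptions, not metapredict's implementation or its documented defaults. It converts inclusive runs to half-open boundaries, absorbs folded gaps shorter than `min_folded` into the preceding IDR, then drops IDRs shorter than `min_idr`.

```python
def postprocess(runs, min_idr=12, min_folded=50):
    """Illustrative IDR post-processing (NOT metapredict's algorithm).

    runs: [start, end] pairs, 0-based inclusive, sorted by start.
    Returns half-open [start, end) boundaries.
    """
    # Convert inclusive ends to half-open ends
    half_open = [[s, e + 1] for s, e in runs]
    merged = []
    for s, e in half_open:
        # Absorb a folded gap shorter than min_folded into the previous IDR
        if merged and s - merged[-1][1] < min_folded:
            merged[-1][1] = e
        else:
            merged.append([s, e])
    # Drop IDRs shorter than min_idr residues
    return [[s, e] for s, e in merged if e - s >= min_idr]


batch_runs = [[224, 230], [553, 570], [772, 810], [832, 880]]
print(postprocess(batch_runs))  # [[553, 571], [772, 881]]
```

Applied to the section 2.1 runs, the first IDR matches the official [553, 571] exactly, while the second ends one residue later than the official [772, 880] shown further down, so metapredict evidently applies rules beyond this sketch.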


import numpy as np
import metapredict as meta

output = meta.predict_disorder_domains(seq)
print(output)
print(f"disorder: {output.disorder}")
print(f"disordered_domain_boundaries: {output.disordered_domain_boundaries}")
# Range of disorder scores inside each IDR; boundaries are half-open [s, e)
for s, e in output.disordered_domain_boundaries:
    print(f"disorder: {np.min(output.disorder[s:e])} ~ {np.max(output.disorder[s:e])}")
print(f"folded_domain_boundaries: {output.folded_domain_boundaries}")
print(f"disordered_domains: {output.disordered_domains}")
print(f"folded_domains: {output.folded_domains}")


DisorderObject for sequence with 1029 residues, 2 IDRs, and 3 folded domains
Available dot variables are:
  .sequence
  .disorder
  .disordered_domain_boundaries
  .folded_domain_boundaries
  .disordered_domains
  .folded_domains
disorder: [0.5593 0.5207 0.4646 ... 0.7331 0.7067 0.6378]
disordered_domain_boundaries: [[553, 571], [772, 880]]
disorder: 0.5241 ~ 0.854
disorder: 0.2163 ~ 0.9868
folded_domain_boundaries: [[0, 553], [571, 772], [880, 1029]]
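Because both boundary lists are half-open, together they should tile the sequence exactly, and the IDR fraction is simple arithmetic: (571 - 553) + (880 - 772) = 18 + 108 = 126 of 1029 residues, about 12.2%. The hypothetical helper below performs that check and calculation on the boundaries copied from the output above.

```python
def idr_fraction(idr_bounds, folded_bounds, length):
    """Fraction of residues inside IDRs, given half-open [start, end) boundaries.

    Raises ValueError if IDRs and folded domains do not tile [0, length).
    """
    pos = 0
    for s, e in sorted(idr_bounds + folded_bounds):
        if s != pos:
            raise ValueError(f"gap or overlap at residue {pos}")
        pos = e
    if pos != length:
        raise ValueError("segments do not cover the full sequence")
    return sum(e - s for s, e in idr_bounds) / length


# Boundaries copied from the predict_disorder_domains output above
idr = [[553, 571], [772, 880]]
folded = [[0, 553], [571, 772], [880, 1029]]
print(round(idr_fraction(idr, folded, 1029), 4))  # 0.1224
```

With a live result, pass `output.disordered_domain_boundaries`, `output.folded_domain_boundaries` and `len(output.sequence)`.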

2.3 Predict pLDDT

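metapredict can also predict per-residue pLDDT values, i.e. an estimate of the confidence AlphaFold2 would assign to each residue's predicted structure; low values typically coincide with disordered regions.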


output = meta.predict_pLDDT(seq)
print(f"mean_plddt: {np.mean(output)}")
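A single mean hides regional differences, so it can help to count residues in the usual AlphaFold2 confidence bands (above 90 very high, 70 to 90 confident, 50 to 70 low, 50 or below very low). This sketch assumes the predicted values are on AlphaFold2's 0-100 scale; `plddt_bands` and the input values are hypothetical.

```python
import numpy as np


def plddt_bands(plddt):
    """Count residues per AlphaFold2-style pLDDT confidence band (0-100 scale assumed)."""
    p = np.asarray(plddt, dtype=float)
    return {
        "very_high": int(np.sum(p > 90)),
        "confident": int(np.sum((p > 70) & (p <= 90))),
        "low": int(np.sum((p > 50) & (p <= 70))),
        "very_low": int(np.sum(p <= 50)),
    }


print(plddt_bands([95, 85, 60, 40, 30]))
# {'very_high': 1, 'confident': 1, 'low': 1, 'very_low': 2}
```

In the test script this would be called as `plddt_bands(meta.predict_pLDDT(seq))`.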

2.4 Plotting: Graph Disorder

Call graph_disorder directly to draw the plot:

meta.graph_disorder(seq, pLDDT_scores=True)

Disorder scores are negatively correlated with pLDDT, as the resulting plot shows:
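The negative correlation seen in the plot can also be quantified. A minimal sketch, assuming one disorder score and one pLDDT value per residue; `disorder_plddt_correlation` is a hypothetical helper and the example numbers are synthetic.

```python
import numpy as np


def disorder_plddt_correlation(disorder, plddt):
    """Pearson correlation between per-residue disorder scores and pLDDT."""
    d = np.asarray(disorder, dtype=float)
    p = np.asarray(plddt, dtype=float)
    if d.shape != p.shape:
        raise ValueError("disorder and pLDDT must have one value per residue")
    return float(np.corrcoef(d, p)[0, 1])


# Synthetic example: pLDDT falls as disorder rises
disorder = [0.1, 0.2, 0.6, 0.9]
plddt = [92.0, 88.0, 55.0, 30.0]
print(disorder_plddt_correlation(disorder, plddt) < 0)  # True
```

For the test sequence, the inputs would be `meta.predict_disorder(seq)` and `meta.predict_pLDDT(seq)`.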


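3. Checking Against the Structure

Use ChimeraX to inspect the predicted IDRs on the structure; the script that generates the selection command is: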

# Method of the SeqIdrPredictor class
@classmethod
def get_chimerax_select_cmd(cls, seq_list, mod_num="1"):
    """Build ChimeraX [kaɪˈmɪrə] select commands, e.g. select #1:553-571 #1:772-880"""
    res_list = cls.predict_batch(seq_list)
    r_str_list = []
    for res in res_list:
        domains = res[2]
        # Indices from predict_batch are 0-based; ChimeraX uses the
        # residue numbers stored in the structure file
        c_list = ["select"]
        for start, end in domains:
            c_list.append(f"#{mod_num}:{start}-{end}")
        r_str_list.append(" ".join(c_list))
    return r_str_list


ChimeraX: select #1:224-230 #1:553-570 #1:772-810 #1:832-880
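The command above passes predict_batch's 0-based indices straight to ChimeraX, which selects by the residue numbers in the structure file. If that file numbers residues from 1 (an assumption to check for your model), every range is shifted by one. A sketch with a configurable offset, using ChimeraX's comma-separated residue ranges; `chimerax_select` is a hypothetical helper.

```python
def chimerax_select(runs, model="1", offset=1):
    """Build a ChimeraX select command from 0-based inclusive [start, end] runs.

    offset is added to every index; offset=1 assumes residues are numbered from 1.
    """
    ranges = ",".join(f"{s + offset}-{e + offset}" for s, e in runs)
    return f"select #{model}:{ranges}"


runs = [[224, 230], [553, 570], [772, 810], [832, 880]]
print(chimerax_select(runs))
# select #1:225-231,554-571,773-811,833-881
```

For the half-open boundaries returned by predict_disorder_domains, pass `[[s, e - 1] for s, e in boundaries]` first.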





  • Paper - Direct Prediction of Intrinsically Disordered Protein Conformational Properties From Sequence

  • metapredict online (v2.3)

  • chimerax-example-scripts-commands




#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (c) 2022. All rights reserved.
# Created by C. L. Wang on 2023/6/29
import os

import metapredict as meta
import numpy as np

# Project-local helpers from the author's codebase (not part of metapredict)
from protein_utils.seq_utils import get_seq_from_fasta
from root_dir import DATA_DIR


class SeqIdrPredictor(object):
    """Predict the IDRs of a sequence (pip install metapredict==2.61)."""

    @staticmethod
    def predict_batch(seq_list):
        """Core function: threshold disorder scores and extract disordered runs."""
        if not isinstance(seq_list, list):
            seq_list = [seq_list]
        output = meta.predict_disorder_batch(seq_list)
        assert len(seq_list) == len(output)
        res_list = []
        for sample in output:
            sample_disorder = sample[1]
            print(f"disorder range: {np.min(sample_disorder)}~{np.max(sample_disorder)}")
            # 1 = disordered, 0 = folded (threshold 0.5)
            sample_disorder_idx = np.where(sample_disorder > 0.5, 1, 0).tolist()
            # Collect runs of consecutive disordered residues (0-based).
            # Note: a run reaching the last residue is not flushed after the loop.
            d_list, tmp_list = [], []
            for i, v in enumerate(sample_disorder_idx):
                if v == 1:  # disordered
                    tmp_list.append(i)
                elif tmp_list:
                    d_list.append(tmp_list)
                    tmp_list = []
            # Keep runs spanning at least 3 residues, as [start, end] (inclusive)
            domains = [[r[0], r[-1]] for r in d_list if r[-1] - r[0] >= 2]
            # seq, disorder_idx, domains
            res_list.append([sample[0], sample_disorder_idx, domains])
        return res_list

    @classmethod
    def get_chimerax_select_cmd(cls, seq_list, mod_num="1"):
        """Build ChimeraX [kaɪˈmɪrə] select commands, e.g. select #1:553-571 #1:772-880"""
        res_list = cls.predict_batch(seq_list)
        r_str_list = []
        for res in res_list:
            # 0-based indices; ChimeraX uses the structure's own residue numbers
            c_list = ["select"] + [f"#{mod_num}:{s}-{e}" for s, e in res[2]]
            r_str_list.append(" ".join(c_list))
        return r_str_list

    @staticmethod
    def predict_disorder_domains(seq, is_print=False):
        output = meta.predict_disorder_domains(seq)
        if is_print:
            print(output)
            print(f"disorder: {output.disorder}")
            print(f"disordered_domain_boundaries: {output.disordered_domain_boundaries}")
            for s, e in output.disordered_domain_boundaries:
                print(f"disorder: {np.min(output.disorder[s:e])} ~ {np.max(output.disorder[s:e])}")
            print(f"folded_domain_boundaries: {output.folded_domain_boundaries}")
            print(f"disordered_domains: {output.disordered_domains}")
            print(f"folded_domains: {output.folded_domains}")
        return output


def main():
    fasta_dir = os.path.join(DATA_DIR, "CASP15-Monomer-Targets-56", "fasta")
    fasta_path = os.path.join(fasta_dir, "T1157s1_A1029.fasta")
    seq = get_seq_from_fasta(fasta_path)[0]
    sip = SeqIdrPredictor()
    res_list = sip.predict_batch(seq)
    print(f"seq:\n{res_list[0][0]}")
    print(f"disorder list:\n{res_list[0][1]}")
    print(f"disordered domains:\n{res_list[0][2]}")
    sip.predict_disorder_domains(seq, is_print=True)
    output = meta.predict_pLDDT(seq)
    print(f"mean_plddt: {np.mean(output)}")
    meta.graph_disorder(seq, pLDDT_scores=True)
    r_str_list = sip.get_chimerax_select_cmd(seq)
    print(f"ChimeraX: {r_str_list[0]}")


if __name__ == '__main__':
    main()
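main() depends on the author's project-local get_seq_from_fasta and DATA_DIR, which are not part of metapredict. To run the script elsewhere, a minimal FASTA reader can stand in; `read_fasta` is a hypothetical replacement, and it assumes get_seq_from_fasta returns a list of sequences (as the `[0]` index suggests).

```python
def read_fasta(path):
    """Minimal FASTA reader: return the sequences in a file as a list of strings."""
    seqs, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record
                if current:
                    seqs.append("".join(current))
                current = []
            else:
                current.append(line)
    if current:
        seqs.append("".join(current))
    return seqs
```

Usage: `seq = read_fasta("T1157s1_A1029.fasta")[0]`, with the path adjusted to wherever the CASP15 file is stored.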


