Average Fusion of Multi-scale Sequence and Multi-track Graph Networks for Protein Function Prediction
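He Zhe (何哲) 1, Ning Qiao (宁乔) 1, Deng Zhaohong (邓赵红) 1
Author information
- 1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122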
Abstract
Accurate protein function prediction is important for understanding biological processes and identifying potential drug targets. Pre-trained protein language models (PLMs) have substantially improved sequence representation, but relying on sequence embeddings alone tends to overlook the three-dimensional topology of proteins. Conversely, structure-based methods built on graph neural networks (GNNs) capture spatial adjacency but may miss the contextual, semantic information that pre-trained models provide. To exploit both complementary sources of information, this work proposes SGAFGO, a modular framework that processes sequence and structure in parallel. The framework consists of two parallel branches:
- Multi-scale Sequence Network (MSSN): takes only the residue embeddings generated by a PLM as input, extracts multi-scale sequence semantics through parallel 1D convolutions with different receptive fields and residual connections, and aggregates them into a sequence-level prediction.
- Multi-track Graph Network (MTGN): builds several parallel tracks on the protein contact map (residual graph convolution, graph attention, and learnable pooling tracks) to learn structure-sensitive graph-level representations from multiple perspectives, and produces a structure-level prediction.
The two branches are fused at the prediction level by an arithmetic mean. This fusion introduces no additional learnable parameters and serves as the default strategy, which reduces the fusion-related complexity of training and reproduction. Extensive benchmarks and ablation experiments show that SGAFGO achieves robust performance gains over both single-branch and fusion baselines on standard Gene Ontology (GO) function annotation tasks, and systematic module ablations further confirm that the components are complementary.
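The prediction-level fusion described in the abstract can be illustrated with a minimal sketch. It assumes that each branch outputs one probability per GO term and that the mean is taken over those probabilities; the abstract does not say whether probabilities or logits are averaged. The function name `average_fusion` and the example scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_fusion(seq_probs: Sequence[float],
                   graph_probs: Sequence[float]) -> List[float]:
    """Element-wise arithmetic mean of two per-GO-term score vectors.

    Assumption: both inputs are probabilities in [0, 1], one per GO term,
    listed in the same term order. No learnable parameters are involved.
    """
    if len(seq_probs) != len(graph_probs):
        raise ValueError("both branches must score the same set of GO terms")
    return [(s + g) / 2.0 for s, g in zip(seq_probs, graph_probs)]


# Hypothetical scores for three GO terms (illustrative values only).
seq_probs = [0.75, 0.125, 0.5]    # sequence-branch prediction
graph_probs = [0.25, 0.375, 1.0]  # graph-branch prediction
fused = average_fusion(seq_probs, graph_probs)
print(fused)  # [0.5, 0.25, 0.75]
```

Because the mean has no weights to learn, the two branches can in principle be trained and evaluated independently and combined only at inference time, which is consistent with the reduced training and reproduction complexity that the abstract attributes to this default strategy.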
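Keywords
bioinformatics / sequence-structure fusion / pre-trained protein language models / multi-scale convolutions / multi-track graph networks
Cite this article
He Zhe, Ning Qiao, Deng Zhaohong. Average Fusion of Multi-scale Sequence and Multi-track Graph Networks for Protein Function Prediction [EB/OL]. (2026-02-06) [2026-02-08]. http://www.paper.edu.cn/releasepaper/content/202602-41.
Subject classification
Theory and methods of biological sciences / Research methods and techniques in biological sciences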