Diffusion Decoding for Peptide De Novo Sequencing
Diffusion Decoding for Peptide De Novo Sequencing
Peptide de novo sequencing is a method used to reconstruct amino acid sequences from tandem mass spectrometry data without relying on existing protein sequence databases. Traditional deep learning approaches, such as Casanovo, mainly utilize autoregressive decoders and predict amino acids sequentially. Subsequently, they encounter cascading errors and fail to leverage high-confidence regions effectively. To address these issues, this paper investigates using diffusion decoders adapted for the discrete data domain. These decoders provide a different approach, allowing sequence generation to start from any peptide segment, thereby enhancing prediction accuracy. We experiment with three different diffusion decoder designs, knapsack beam search, and various loss functions. We find knapsack beam search did not improve performance metrics and simply replacing the transformer decoder with a diffusion decoder lowered performance. Although peptide precision and recall were still 0, the best diffusion decoder design with the DINOISER loss function obtained a statistically significant improvement in amino acid recall by 0.373 compared to the baseline autoregressive decoder-based Casanovo model. These findings highlight the potential of diffusion decoders to not only enhance model sensitivity but also drive significant advancements in peptide de novo sequencing.
Chi-en Amy Tai、Alexander Wong
生物科学研究方法、生物科学研究技术
Chi-en Amy Tai,Alexander Wong.Diffusion Decoding for Peptide De Novo Sequencing[EB/OL].(2025-07-15)[2025-07-23].https://arxiv.org/abs/2507.10955.点此复制
评论