stat940W25-presentation
Group 24 Presentation: Mitigating the Missing Fragmentation Problem in De Novo Peptide Sequencing With A Two-Stage Graph-Based Deep Learning Model
Paper Citation
Mao, Z., Zhang, R., Xin, L. et al. Mitigating the missing-fragmentation problem in de novo peptide sequencing with a two-stage graph-based deep learning model. Nat Mach Intell 5, 1250–1260 (2023). https://doi.org/10.1038/s42256-023-00738-x
https://www.nature.com/articles/s42256-023-00738-x#citeas
Background
- Proteins are crucial for biological functions
- Proteins are formed from peptides, which are sequences of amino acids
- Mass spectrometry is used to analyze peptide sequences
- De novo sequencing is used to reconstruct peptide sequences that are missing from established protein databases
- Deep learning is now commonly used for de novo peptide sequencing
- When a peptide fails to fragment in the expected manner, the spectrum is missing fragment data, which makes reconstructing the sequence difficult
- A single error in the predicted sequence can propagate to errors throughout the rest of the sequence
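To make the missing-fragmentation problem concrete, here is a minimal Python sketch. It assumes standard monoisotopic residue masses for a handful of amino acids, ignores ion-type offsets (so the "fragments" are plain prefix masses), and uses an arbitrary 0.01 Da tolerance. The peptide SAGVL and the helper names `prefix_masses` and `explain_gap` are made up for illustration and do not come from the paper.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in Da (a small subset, for illustration only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "N": 114.04293, "Q": 128.05858, "K": 128.09496,
}


def prefix_masses(peptide):
    """Cumulative residue mass after each amino acid (simplified fragment ladder)."""
    masses, total = [], 0.0
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        masses.append(round(total, 5))
    return masses


def explain_gap(gap, tol=0.01, max_len=2):
    """Unordered residue combinations (up to max_len) whose mass matches the gap."""
    hits = []
    for n in range(1, max_len + 1):
        for combo in combinations_with_replacement(sorted(RESIDUE_MASS), n):
            if abs(sum(RESIDUE_MASS[a] for a in combo) - gap) <= tol:
                hits.append("".join(combo))
    return hits


ladder = prefix_masses("SAGVL")
# Pretend the fragment between A and G was never observed:
# the neighbouring peaks are then separated by the mass of A + G.
gap = ladder[2] - ladder[0]
print(explain_gap(gap))  # ['Q', 'AG']
```

With that one peak missing, the gap can be explained by Q alone or by A and G in either order, which is the kind of ambiguity a de novo method has to resolve.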
Paper Contributions
- GraphNovo, a two-stage graph-based deep learning model, was developed to handle spectra with missing fragmentation
- Instead of predicting the sequence directly, GraphNovo-PathSearcher performs a path search over a graph built from the spectrum
- A graph neural network is used to find the best path through the graph generated from the mass spectrometry input
- GraphNovo-SeqFiller is the second stage and works on the path found by PathSearcher
- Because some amino acids are expected to be missed, SeqFiller uses a transformer to fill in the amino acids that PathSearcher did not resolve
- The input is a mass spectrum produced by mass spectrometry
- A graph is constructed in which nodes represent possible fragments and edges represent the possible amino acids (or amino acid combinations) between them (PathSearcher module)
- PathSearcher uses machine learning to find the optimal path on the generated graph
- SeqFiller fills in amino acids that PathSearcher could not include because the corresponding data was missing from the mass spectrum
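The pipeline above can be sketched in a few lines of Python. This is a toy illustration, not GraphNovo itself: the learned graph neural network is replaced by a simple dynamic program that prefers the path visiting the most nodes, and SeqFiller's transformer is replaced by a lookup of residue combinations that fit each gap. The residue subset, the 0.01 Da tolerance, the example node masses (including a noise node at 250.0), and all function names are assumptions.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in Da (small illustrative subset).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "N": 114.04293, "Q": 128.05858, "K": 128.09496,
}
TOL = 0.01  # assumed mass tolerance in Da


def candidate_labels(gap, max_len=2):
    """Unordered residue combinations whose total mass matches the gap."""
    labels = []
    for n in range(1, max_len + 1):
        for combo in combinations_with_replacement(sorted(RESIDUE_MASS), n):
            if abs(sum(RESIDUE_MASS[a] for a in combo) - gap) <= TOL:
                labels.append("".join(combo))
    return labels


def build_spectrum_graph(node_masses):
    """Nodes are candidate prefix masses; an edge joins two nodes whose
    mass difference can be explained by one or two residues."""
    nodes = sorted(node_masses)
    edges = {}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            labels = candidate_labels(nodes[j] - nodes[i])
            if labels:
                edges[(i, j)] = labels
    return nodes, edges


def best_path(nodes, edges):
    """Stand-in for the learned path search: dynamic programming for the
    first-to-last path that uses the most edges (fewest skipped nodes)."""
    best = [None] * len(nodes)  # best[j] = (edge count, predecessor index)
    best[0] = (0, None)
    for j in range(1, len(nodes)):
        for i in range(j):
            if best[i] is not None and (i, j) in edges:
                if best[j] is None or best[i][0] + 1 > best[j][0]:
                    best[j] = (best[i][0] + 1, i)
    if best[-1] is None:
        return []
    path, k = [], len(nodes) - 1
    while k is not None:
        path.append(k)
        k = best[k][1]
    return path[::-1]


# Prefix masses for the hypothetical peptide SAGVL with the S+A fragment
# missing, plus one noise node (250.0) that fits no residue combination.
nodes, edges = build_spectrum_graph(
    [0.0, 87.03203, 215.0906, 250.0, 314.15901, 427.24307]
)
path = best_path(nodes, edges)
for a, b in zip(path, path[1:]):
    print(f"{nodes[a]:>10.5f} -> {nodes[b]:>10.5f}: {edges[(a, b)]}")
```

In this toy run the path skips the noise node, and the edge whose candidates are Q or AG is the mass gap left by the missing fragment. Choosing between such candidates, including their order, is the part the notes attribute to SeqFiller.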