Structure-Aware Annotation of Leucine-rich Repeat Domains
Protein domain annotation is typically done by predictive models such as hidden Markov models (HMMs) trained on sequence motifs. However, sequence-based annotation methods are prone to error, particularly in calling domain boundaries and the motifs within them. These methods are limited by the lack of structural information accessible to the model. With the advent of deep learning-based protein structure prediction, we aim to leverage the geometry of protein structures to assist in domain annotation and enhance existing sequence-based annotation. We develop dimensionality reduction methods to annotate repeat units of the leucine-rich repeat (LRR) solenoid domain. The methods are able to correct mistakes made by existing machine learning-based annotation tools and enable the automated detection of hairpin loops and structural anomalies in the solenoid. The methods are applied to 127 predicted structures of LRR-containing intracellular innate immune proteins in the model plant Arabidopsis thaliana and validated against a benchmark dataset of 172 manually annotated LRR domains.
Xu Boyan, Cerbu Alois, Krasileva Ksenia, Lim Daven, Tralie Christopher J
Molecular Biology
Xu Boyan, Cerbu Alois, Krasileva Ksenia, Lim Daven, Tralie Christopher J. Structure-Aware Annotation of Leucine-rich Repeat Domains [EB/OL]. (2025-03-28) [2025-05-02]. https://www.biorxiv.org/content/10.1101/2023.10.27.562987.