National Preprint Platform

A Scalable Tool For Analyzing Genomic Variants Of Humans Using Knowledge Graphs and Machine Learning

Source: arXiv
Abstract

The integration of knowledge graphs and graph machine learning (GML) into genomic data analysis offers opportunities for understanding complex genetic relationships, especially at the RNA level. We present a comprehensive approach that leverages these technologies to analyze genomic variants, specifically in RNA sequencing (RNA-seq) data from COVID-19 patient samples. The method extracts variant-level genetic information, annotates it with additional metadata using SnpEff, and converts the enriched Variant Call Format (VCF) files into Resource Description Framework (RDF) triples. The resulting knowledge graph is further enriched with patient metadata and stored in a graph database, which supports efficient querying and indexing. We use the Deep Graph Library (DGL) to perform graph machine learning tasks, including node classification with GraphSAGE and Graph Convolutional Networks (GCNs). We demonstrate the utility of our proposed tool, VariantKG, in three key scenarios: enriching graphs with new VCF data, creating subgraphs based on user-defined features, and conducting graph machine learning for node classification.
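The VCF-to-RDF step can be pictured with a small sketch. The Python below assumes a single SnpEff-annotated VCF data line with simple base alleles plus a sample identifier, and emits N-Triples strings using only the standard library. The namespace `http://example.org/variantkg/`, the predicate names, and the helper `vcf_line_to_triples` are hypothetical illustrations, not the VariantKG schema; only the first entry of SnpEff's `ANN` field (laid out as `Allele|Annotation|Annotation_Impact|Gene_Name|...`) is mapped.

```python
from __future__ import annotations

# Hypothetical namespace; VariantKG's real vocabulary may differ.
EX = "http://example.org/variantkg/"


def literal(value: str) -> str:
    """Quote a value as an N-Triples plain literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def vcf_line_to_triples(line: str, sample_id: str) -> list[str]:
    """Map one tab-separated VCF data line to N-Triples (hypothetical mapping)."""
    chrom, pos, _id, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    # Mint a variant IRI from position and alleles (assumes simple base alleles).
    variant = f"<{EX}variant/{chrom}_{pos}_{ref}_{alt}>"
    triples = [
        (variant, f"<{EX}sample>", f"<{EX}sample/{sample_id}>"),
        (variant, f"<{EX}chrom>", literal(chrom)),
        (variant, f"<{EX}pos>", literal(pos)),
        (variant, f"<{EX}ref>", literal(ref)),
        (variant, f"<{EX}alt>", literal(alt)),
    ]
    # INFO is KEY=VALUE pairs separated by ';'; flag keys without '=' are skipped.
    info_fields = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    if "ANN" in info_fields:
        # Keep only the first comma-separated annotation; its fields are '|'-separated.
        first = info_fields["ANN"].split(",")[0].split("|")
        triples.append((variant, f"<{EX}annotation>", literal(first[1])))
        triples.append((variant, f"<{EX}impact>", literal(first[2])))
        triples.append((variant, f"<{EX}gene>", literal(first[3])))
    return [f"{s} {p} {o} ." for s, p, o in triples]


# Usage with a made-up annotated record for a made-up patient "P001".
example = "\t".join([
    "chr1", "12345", ".", "A", "G", "50", "PASS",
    "DP=20;ANN=G|missense_variant|MODERATE|GENE1|GENE1|transcript|T1|"
    "protein_coding|1/2|c.1A>G|p.Met1Val|1/100|1/90|1/30||",
])
for t in vcf_line_to_triples(example, "P001"):
    print(t)
```

A fuller pipeline would more likely build the graph with an RDF library such as rdflib and map every `ANN` entry and INFO key; plain strings just keep the sketch dependency-free.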
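For the node-classification step, the core of a GCN layer is the propagation rule H' = ReLU(Â H W), where Â = D^(-1/2) (A + I) D^(-1/2) and D is the degree matrix of A + I. The sketch below implements that rule in plain Python on a toy graph with random, untrained weights; the graph, feature sizes, and the functions `normalized_adjacency`, `gcn_layer`, and `predict` are hypothetical, and training is omitted entirely. In DGL, layers such as `GraphConv` (GCN) and `SAGEConv` (GraphSAGE) provide trainable, batched versions of this kind of message passing.

```python
import math
import random


def normalized_adjacency(n, edges):
    """Build Â = D^(-1/2) (A + I) D^(-1/2) for an undirected graph with n nodes."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loops: each node also keeps its own features
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]  # degrees include the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Dense matrix product of nested lists x (m×k) and y (k×p)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w, activate=True):
    """One GCN propagation step Â·H·W, followed by ReLU when activate is True."""
    out = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in out] if activate else out


def predict(a_hat, x, w1, w2):
    """Two GCN layers, then argmax over class scores for each node."""
    logits = gcn_layer(a_hat, gcn_layer(a_hat, x, w1), w2, activate=False)
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Usage: a toy 4-node path graph, 3 input features, 2 classes (all hypothetical).
random.seed(0)
a_hat = normalized_adjacency(4, [(0, 1), (1, 2), (2, 3)])
x = [[random.random() for _ in range(3)] for _ in range(4)]
w1 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
w2 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
print(predict(a_hat, x, w1, w2))  # untrained class index per node
```

Symmetric normalization keeps high-degree nodes from dominating the aggregated features; a GraphSAGE layer would instead combine each node's own features with an aggregate (for example, the mean) of its neighbors' features.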

Praveen Rao, Ajay Kumar, Deepthi Rao, Eduardo Simoes, Shivika Prasanna

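Subjects: Biological science research methods, biological science research techniques; Computing technology, computer technology; Basic medicine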

Praveen Rao, Ajay Kumar, Deepthi Rao, Eduardo Simoes, Shivika Prasanna. A Scalable Tool For Analyzing Genomic Variants Of Humans Using Knowledge Graphs and Machine Learning [EB/OL]. (2024-07-30) [2025-08-02]. https://arxiv.org/abs/2407.20879.
