Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets
Sequencing errors and biases in metagenomic datasets affect coverage-based assemblies and are often ignored during analysis. Here, we analyze read connectivity in metagenomes and identify problematic, likely non-biological connectivity within metagenome assembly graphs. Specifically, we identify highly connected sequences that join a large proportion of reads within each real metagenome. These sequences show position-specific bias in shotgun reads, suggestive of sequencing artifacts, and are only minimally incorporated into contigs by assembly. Removing these sequences prior to assembly yields similar assembly content for most metagenomes and enables the use of graph partitioning to decrease assembly memory and time requirements.
C. Titus Brown, Rosangela Canino-Koning, Susannah Tringe, Rachel Mackelprang, James M. Tiedje, Jason Pell, Adina Chuang Howe, Janet Jansson
Biological science research methods; biological science research techniques; molecular biology; genetics
C. Titus Brown, Rosangela Canino-Koning, Susannah Tringe, Rachel Mackelprang, James M. Tiedje, Jason Pell, Adina Chuang Howe, Janet Jansson. Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets [EB/OL]. (2012-12-01) [2025-05-02]. https://arxiv.org/abs/1212.0159.