A Python-based optimization framework for high-performance genomics

Source: bioRxiv
Abstract

Exponentially growing next-generation sequencing data requires high-performance tools and algorithms. However, implementing high-performance computational genomics software is out of reach for many scientists because it demands extensive knowledge of low-level software optimization techniques, forcing them to resort to less efficient high-level alternatives. Here, we introduce Seq, a Python-based optimization framework that combines the power and usability of high-level languages like Python with the performance of low-level languages like C or C++. Seq allows for shorter, simpler code, is readily usable by a novice programmer, and obtains significant performance improvements over existing languages and frameworks. We showcase and evaluate Seq by implementing seven standard, widely used applications from all stages of the genomics analysis pipeline, including genome index construction, finding maximal exact matches, long-read alignment, and haplotype phasing, and demonstrate that its implementations are up to an order of magnitude faster than existing hand-optimized implementations, with just a fraction of the code. By enabling researchers of all backgrounds to easily implement high-performance analysis tools, Seq further opens the door to the democratization and scalability of computational genomics.

Amarasinghe Saman, Shajii Ariya, Leighton Alexander T., Greenyer Haley, Numanagić Ibrahim, Berger Bonnie

Computer Science and AI Lab, Massachusetts Institute of Technology
Computer Science and AI Lab, Massachusetts Institute of Technology
Computer Science and AI Lab, Massachusetts Institute of Technology
Department of Computer Science, University of Victoria
Computer Science and AI Lab, Massachusetts Institute of Technology; Department of Computer Science, University of Victoria
Computer Science and AI Lab, Massachusetts Institute of Technology; Department of Mathematics, Massachusetts Institute of Technology

10.1101/2020.10.29.361402

Biological science research methods and techniques; computing and computer technology; bioengineering

Computational genomics; sequencing; high-performance; domain-specific languages

Amarasinghe Saman, Shajii Ariya, Leighton Alexander T., Greenyer Haley, Numanagić Ibrahim, Berger Bonnie. A Python-based optimization framework for high-performance genomics[EB/OL]. (2025-03-28)[2025-06-29]. https://www.biorxiv.org/content/10.1101/2020.10.29.361402.
