robustica: customizable robust independent component analysis
ABSTRACT

Motivation: Independent Component Analysis (ICA) allows the dissection of omic datasets into modules that help to interpret global molecular signatures. The inherent randomness of this algorithm can be overcome by clustering many iterations of ICA together to obtain robust components. Existing algorithms for robust ICA depend on the choice of clustering method and on computing a large and potentially biased Pearson distance matrix.

Results: We present robustica, a Python-based package to compute robust independent components with a fully customizable clustering algorithm and distance metric. Here, we exploited its customizability to revisit and optimize robust ICA systematically. Of the 6 popular clustering algorithms considered, DBSCAN performed the best at clustering independent components across ICA iterations. After confirming the bias introduced with Pearson distances, we created a subroutine that infers and corrects the components' signs across ICA iterations to enable using Euclidean distance. Our subroutine effectively corrected the bias while simultaneously increasing the precision, robustness, and memory efficiency of the algorithm. Finally, we show the applicability of robustica by dissecting over 500 tumor samples from low-grade glioma (LGG) patients, where we define a new gene expression module with the key modulators of tumor aggressiveness downregulated upon IDH1 mutation.

Availability and implementation: robustica is written in Python under the open-source BSD 3-Clause license. The source code and documentation are freely available at https://github.com/CRG-CNAG/robustica. Additionally, all scripts to reproduce the work presented are available at https://github.com/MiqG/publication_robustica.

Contact: miquel.anglada@crg.eu
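The sign-correction idea mentioned in the abstract can be illustrated with a small sketch. ICA recovers each component only up to an arbitrary sign, so the same component from two runs can look maximally distant under Euclidean distance. The example below is not the robustica implementation: the `correct_sign` helper and its rule (flip each component so that its largest-magnitude weight is positive) are assumptions chosen for illustration, and the data are simulated.

```python
import math
import random


def correct_sign(component):
    """Flip a component so its largest-magnitude weight is positive.

    Assumption: this heuristic is illustrative only and may differ from
    the rule used inside the robustica package.
    """
    peak = max(component, key=abs)
    return [-w for w in component] if peak < 0 else list(component)


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


rng = random.Random(0)

# Hypothetical "true" component: small weights plus one strongly weighted gene.
base = [rng.gauss(0, 0.1) for _ in range(50)]
base[7] = 3.0

# Simulate 10 ICA runs that recover the same component with an arbitrary
# sign and a little noise.
runs = []
for i in range(10):
    sign = -1 if i % 2 else 1
    runs.append([sign * (w + rng.gauss(0, 0.01)) for w in base])

# Without correction, sign-flipped copies are far apart in Euclidean space.
raw_gap = euclidean(runs[0], runs[1])

# After correction, all copies lie close together, so a density-based
# clustering step could group them into one robust component.
fixed = [correct_sign(r) for r in runs]
fixed_gap = max(euclidean(fixed[0], f) for f in fixed[1:])

print(f"distance before sign correction: {raw_gap:.2f}")
print(f"max distance after sign correction: {fixed_gap:.2f}")
```

Once the signs agree, Euclidean distance can be computed directly between components, which avoids building a full pairwise correlation matrix before clustering.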
Serrano Luis, Anglada-Girotto Miquel, Head Sarah A., Miravet-Verde Samuel
Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology; Universitat Pompeu Fabra (UPF); ICREA
Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology
Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology
Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology
Biological science research methods and techniques; molecular biology; computing technology and computer technology
bioinformatics; independent component analysis; clustering; unsupervised learning; low-grade glioma; Python
Serrano Luis, Anglada-Girotto Miquel, Head Sarah A., Miravet-Verde Samuel. robustica: customizable robust independent component analysis [EB/OL]. (2025-03-28) [2025-04-26]. https://www.biorxiv.org/content/10.1101/2021.12.10.471891.