|国家预印本平台
首页|On the Complexity of Finding Approximate LCS of Multiple Strings

On the Complexity of Finding Approximate LCS of Multiple Strings

On the Complexity of Finding Approximate LCS of Multiple Strings

来源:Arxiv_logoArxiv
英文摘要

Finding an Approximate Longest Common Substring (ALCS) within a given set $S=\{s_1,s_2,\ldots,s_m\}$ of $m \ge 2$ strings is a problem of particular importance in computational biology (e.g., identifying related mutations across multiple genetic sequences). In this paper, we study several ALCS problems that, for given integers $k$ and $t \le m$, require finding a longest string $u$ -- or a longest substring $u$ of any string in $S$ -- that lies within distance $k$ of at least one substring in $t$ distinct strings of $S$. Although two of these problems, denoted $k$-LCS and \textit{k-t} LCS, are NP-hard, nevertheless restricted variations of them under Hamming and edit distance can be solved in $O(N^2)$ and $O(k\ell N^2)$ time, respectively, where $\ell$ is the length of each string and $N=m\ell$. Further, we show that using the $k$-errata tree data structure, a restricted variation of the ALCS problem under both Hamming and edit distance can be computed in $O(mN\log^k \ell)$ time. We also establish an $O(N^2)$ conditional lower bound for a variation of the ALCS problem under the Strong Exponential Time Hypothesis.

Hamed Hasibi、Neerja Mhaskar、W. F. Smyth

计算技术、计算机技术生物科学研究方法、生物科学研究技术

Hamed Hasibi,Neerja Mhaskar,W. F. Smyth.On the Complexity of Finding Approximate LCS of Multiple Strings[EB/OL].(2025-05-21)[2025-06-27].https://arxiv.org/abs/2505.15992.点此复制

评论