Computing maximal palindromes in non-standard matching models Takuya Mieno1 Mitsuru Funakoshi2 Yuto Nakashima2 Shunsuke Inenaga2

2025-04-27 0 0 906.06KB 20 页 10玖币
侵权投诉
Computing maximal palindromes in non-standard matching
models
Takuya Mieno1, Mitsuru Funakoshi2, Yuto Nakashima2, Shunsuke Inenaga2,
Hideo Bannai3, and Masayuki Takeda2
1The University of Electro-Communications, Japan
2Kyushu University, Japan
3Institute of Science Tokyo, Japan
Abstract
Palindromes are popular and important objects in textual data processing, bioinformat-
ics, and combinatorics on words. Let S=XaY be a string where Xand Yare of the same
length, and ais either a single character or the empty string. Then, there exist two alterna-
tive definitions for palindromes: Sis said to be a palindrome if Sis equal to its reversal SR
(Reversal-based definition); or if its right-arm Yis equal to the reversal of its left-arm XR
(Symmetry-based definition). It is clear that if the “equality” () used in both definitions is
exact character matching (=), then the two definitions are the same. However, if we apply other
string-equality criteria , including the complementary-matching model for biological sequences,
the Cartesian-tree model [Park et al., TCS 2020], the parameterized model [Baker, JCSS 1996],
the order-preserving model [Kim et al., TCS 2014], and the palindromic-structure model [I et
al., TCS 2013], then are the reversal-based palindromes and the symmetry-based palindromes
the same? To the best of our knowledge, no previous work has considered or answered this
natural question. In this paper, we first provide answers to this question, and then present
efficient algorithms for computing all maximal palindromes under the non-standard matching
models in a given string. After confirming that Gusfield’s offline suffix-tree-based algorithm
for computing maximal symmetry-based palindromes can be readily extended to the aforemen-
tioned matching models, we show how to extend Manacher’s online algorithm for computing
maximal reversal-based palindromes in linear time for all the aforementioned matching models.
1 Introduction
Finding characteristic patterns in strings, such as tandem repeat, unique factor, and palindrome, is
a fundamental task in string data processing. Of course, what kind of characteristic string we want
depends on the application domain. For example, when strings represent DNA/RNA sequences,
tandem repeats can help find genetic diseases [15], unique factors may be helpful in preprocessing
PCR primer design [31], and (gapped) palindromes may represent secondary structures of RNA
called hairpin [15]. Let us consider the case when a string represents a stock chart, which is a
numerical sequence. In this case, one is less interested in their exact values (i.e., exact prices) than
in finding price fluctuations from the stock chart. Motivated by discovering such fluctuations, Park
et al. [30] introduced a notion of Cartesian-tree match. The Cartesian-tree of a numeric string S
Current affiliation: NTT Communication Science Laboratories, Japan.
1
arXiv:2210.02067v4 [cs.DS] 15 Feb 2025
of length nis an ordered binary tree recursively defined below: the root is the leftmost occurrence
iof the smallest value in S, the left child of the root is the Cartesian-tree of S[1..i 1], and
the right child of the root is the Cartesian-tree of S[i+ 1..n]. We say that two strings of equal
length Cartesian-tree match if their Cartesian-trees are isomorphic. Cartesian-tree matching can
capture shapes of stock chart patterns such as head-and-shoulder [30]. Also, some significant stock
chart patterns exhibit palindrome-like structures (e.g., head-and-shoulder and double-top). Hence,
enumerating such palindrome-like structures in a stock chart is expected to improve the efficiency
of chart pattern searches. The first motivation of our research is to explore efficient methods to
compute such palindrome-like structures by applying Cartesian-tree matching.
First, let us revisit the definition of palindromes in the context of the standard matching model,
that is, the “equality” () between two strings is the exact matching on a character-by-character
basis (=). There are two common definitions for palindromes: Let S=XaY be a string, where X
and Yare the same length, and ais either a single character or the empty string. Then, Sis said
to be a palindrome if:
Reverse-based definition: Sis equal to its reversal SR;
Symmetry-based definition: Yis equal to the reversal of X.
It is clear that if we assume the standard matching model, then the two definitions are the same,
meaning that a string Sis a reverse palindrome iff Sis a symmetric palindrome. Observe this
with examples such as racecar and noon. However, the two definitions above differ if we apply the
Cartesian-tree model. For instance, the Cartesian-tree of S1=223 is the tree with no branches that
slopes down to the right while the Cartesian-tree of SR
1=322 is shaped like a logical AND symbol ()
because the leftmost smallest element in S1is the second one. Thus, S1is not a reverse palindrome.
On the other hand, S1=223 is a symmetric palindrome since the Cartesian-trees of its right-half
3and the reversal of its left-half (2)R=2are both singletons. In addition, the two definitions are
also different if we apply the complementary-matching model where Amatches T, and Gmatches C,
which is a standard matching model for DNA sequences. For instance, a string S2=AGTCT is not a
reverse palindrome since S2[3] = Tand SR
2[3] = Tdo not complementary-match, but is a symmetric
palindrome since (AG)R=GA and CT complementary-match. Now we raise the following question: If
we apply other string-equality criteria , including the parameterized model [4], the order-preserving
model [22], and the palindromic-structure model [18], then are the reverse palindromes and the
symmetric palindromes the same? This question is interesting because, while the symmetry-based
palindrome definition requires only XRY, the reversal-based palindrome definition requires more
strict matching under so that SSRXaY (XaY )RXaY YRaXR. Thus, these two
types of palindromes can be quite different for some matching criterion (see also Figure 1). To
our knowledge, no previous work has considered or answered this natural question. In this paper,
we first provide quick answers to this question and present efficient algorithms for computing such
palindromes in a given string under the non-standard matching models.
One of the well-studied topics regarding palindromes is maximal palindromes, which are substring
palindromes whose left-right extensions are not palindromes. It is interesting and important to find
all maximal palindromes in a given string Tbecause any substring palindrome of Tcan be obtained
by removing an equal number of characters from the left and the right of some maximal palindrome.
Hence, by computing all maximal palindromes of a string, we obtain a compact representation of
all palindromes in the string. Manacher [25] showed an elegant algorithm for finding all maximal
palindromes in a given string T. Manacher’s algorithm works in an online manner (processes
the characters of Tfrom left to right) and runs in O(n)time and space for general (unordered)
alphabets, where nis the length of the input string T. Later, Gusfield [15] showed another famous
2
Complementary
Parameterized
Order-Preserving
Cartesian-Tree
Palindromic-Structure
Symmetric Palindromes Reverse Palindromes
ACCTGCAGGTAATCGGATT
abccbaddababccbbddbc
accbaabccaaccbabcddb
decfadbdc
badbcdbcbd
bbdacdbcbd
abbcbadbabbcbaacabbabdbbc
(inward)
(outward)
Exactabcbbdbbcbaabcbbdbbcba
Figure 1: This figure shows examples of a palindrome in each non-standard matching model.
A bijection fsuch that f(a) = c,f(b) = b,f(c) = d,f(d) = agives the parameterized sym-
palindrome. A bijection gsuch that g(a) = b,g(b) = a,g(c) = d,g(d) = cgives the parameterized
rev-palindrome. In the palindromic-structure sym-palindrome, though palindromes bb,abba, and
bab exist, these palindromes are ignored in this symmetric condition.
algorithm for computing all maximal palindromes. Gusfield’s algorithm uses the suffix tree [38]
built on a concatenation of Tand its reversal TRthat is enhanced with an LCA (lowest common
ancestor) data structure [33]. Gusfield’s method is offline (processes all the characters of Ttogether)
and works in O(n)time and space for linearly-sortable alphabets, including integer alphabets of
size poly(n). We remark that Gusfield’s algorithm uses only the symmetry-based definition for
computing maximal palindromes. That is, computing the longest common prefix of T[c..n]and
(T[1..c 1])Rfor each integer position cin Tgives us the maximal palindrome of even length
centered at c0.5. Maximal palindromes of odd lengths can be computed analogously. On the
other hand, Manacher’s algorithm (which will be briefly recalled in a subsequent section) uses both
reversal-based and symmetry-based definitions for computing maximal palindromes.
In this paper, we propose a new framework for formalizing palindromes under the non-standard
matching models, which we call Substring Consistent Symmetric and Two-Transitive Relations (SC-
STTRs) that include the complementary-matching model and Substring Consistent Equivalence
Relations (SCERs) [27]. We note that SCERs include the Cartesian-tree model [30], the param-
eterized model [4], the order-preserving model [22], and the palindromic-structure model [18]. As
far as we are aware, the existing algorithms are designed only for computing standard maximal
palindromes based on exact character matching, and it is not clear how they can be adapted for
the aforementioned palindromes under the non-standard matching models. Our first claim is that
Gusfield’s framework is easily extensible for palindromes under the non-standard matching models:
if one has a suffix-tree-like data structure that is built on a suitable encoding for a given matching
criterion enhanced with the LCA data structure, then one can readily compute all maximal sym-
metric palindromes under the non-standard matching models in O(n)time (the details are shown
in Section 3). Thus, the construction time for the suffix-tree-like data structure dominates the total
time complexity (see Table 1). On the other hand, extending Manacher’s algorithm for palindromes
under the non-standard matching models is much more involved. The main contribution of this pa-
per is to design Manacher-style algorithms that compute all maximal reverse palindromes in O(n)
time under all the non-standard matching models considered in this paper (see Table 1). We remark
that a straightforward implementation of the extension of Manacher’s algorithm requires quadratic
time. We then reduce the time complexity to (near) linear by exploiting and utilizing combinato-
3
Matching
Type Symmetric Palindromes Reverse Palindromes
Exact O(n)time [15] O(n)time [25]
(linearly sortable) (general unordered)
Complementary O(n)time O(n)time
(linearly sortable) (general unordered)
Cartesian-Tree O(nlog n)time O(n)time
(general ordered) (linearly sortable)
Parameterized O(nlog(σ+π)) time O(n)time
(linearly sortable) (linearly sortable)
Order-Preserving O(nlog log2n/ log log log n)time O(n)time
(linearly sortable) (general ordered)
Palindromic-Structure O(nmin{log n, log σ/ log log σ})time O(n)time
(general unordered) (general unordered)
Table 1: Time complexities of algorithms for computing each type of palindromes, where ndenotes
the length of the input string, σis the (static) alphabet size, and πis the size of the parameterized
alphabet for parameterized matching. The time complexities are valid under some assumptions for
the alphabet, which are designated in parentheses. Each of the algorithms uses O(n)space.
rial properties of the non-standard matching models (in particular, the Cartesian-tree model, the
parameterized model, and the palindromic-structure model require non-trivial elaborations).
This paper is a full version of a preliminary version [11]. This paper includes complete proofs
of theorems and possible future work, which were omitted in the preliminary version.
Related work The Cartesian-tree matching problem is introduced by Park et al. [30]. They
showed a linear-time algorithm to solve the problem and also proposed an O(n)-space index for
the problem, which can be constructed in O(nlog n)time. Kim and Cho [23] developed a compact
3n+o(n)-bits index. Nishimoto et al., [28] proposed another O(n)-space index, which can be con-
structed in O(nlog σ)time where σis the alphabet size. A probabilistic method for multiple pattern
Cartesian-tree matching is also studied [35]. Many other variants of the Cartesian-tree matching
problem are studied, including matching on indeterminate strings [12], subsequence matching [29],
approximate matching [2], and the longest common factor computing [10].
As for other or more general settings, Kikuchi et al. [21] considered the problem of finding covers
under SCERs. Kociumaka et al. [24] introduced squares (tandem repeats) in the non-standard
matching models, including order-preserving matching. Gawrychowski et al. [13] presented a worst-
case optimal time algorithm to compute the distinct order-preserving matching squares in a string.
We note that our results are not readily obtained by the results above and require new techniques
due to the different natures between covers/squares and palindromes.
2 Preliminaries
2.1 Notations
Let Σbe an alphabet.
Alphabet Σis said to be ordered if there is a total order, denoted as , on Σ. Otherwise, Σis
4
摘要:

Computingmaximalpalindromesinnon-standardmatchingmodelsTakuyaMieno1,MitsuruFunakoshi∗2,YutoNakashima2,ShunsukeInenaga2,HideoBannai3,andMasayukiTakeda21TheUniversityofElectro-Communications,Japan2KyushuUniversity,Japan3InstituteofScienceTokyo,JapanAbstractPalindromesarepopularandimportantobjectsintex...

展开>> 收起<<
Computing maximal palindromes in non-standard matching models Takuya Mieno1 Mitsuru Funakoshi2 Yuto Nakashima2 Shunsuke Inenaga2.pdf

共20页,预览4页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:20 页 大小:906.06KB 格式:PDF 时间:2025-04-27

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 20
客服
关注