Finding Top- kLongest Palindromes in Substrings Kazuki Mitani1 Takuya Mieno2 Kazuhisa Seto3 and Takashi

2025-04-27 0 0 774.32KB 20 页 10玖币
侵权投诉
Finding Top-kLongest Palindromes in
Substrings
Kazuki Mitani1, Takuya Mieno2, Kazuhisa Seto3, and Takashi
Horiyama3
1Graduate School of Information Science and Technology,
Hokkaido University, Sapporo, Japan
2Department of Computer and Network Engineering, University of
Electro-Communications, Chofu, Japan
3Faculty of Information Science and Technology, Hokkaido
University, Sapporo, Japan
Abstract
Palindromes are strings that read the same forward and backward.
Problems of computing palindromic structures in strings have been stud-
ied for many years with a motivation of their application to biology. The
longest palindrome problem is one of the most important and classical
problems regarding palindromic structures, that is, to compute the longest
palindrome appearing in a string Tof length n. The problem can be
solved in O(n)time by the famous algorithm of Manacher [Journal of the
ACM, 1975]. This paper generalizes the longest palindrome problem to
the problem of finding top-klongest palindromes in an arbitrary substring,
including the input string Titself. The internal top-klongest palindrome
query is, given a substring T[i..j]of Tand a positive integer kas a query,
to compute the top-klongest palindromes appearing in T[i..j]. This pa-
per proposes a linear-size data structure that can answer internal top-k
longest palindromes query in optimal O(k)time. Also, given the input
string T, our data structure can be constructed in O(nlog n)time. For
k= 1, the construction time is reduced to O(n).
1 Introduction
Palindromes are strings that read the same backward as forward. Palindromes
have been widely studied with the motivation of their application to biology [23].
Computing and counting palindromes in a string are fundamental tasks. Man-
acher [27] proposed an O(n)-time algorithm that computes all maximal palin-
dromes in the string of length n. Droubay et al. [17] showed that any string
of length ncontains at most n+ 1 distinct palindromes (including the empty
1
arXiv:2210.02000v4 [cs.DS] 17 Jun 2023
string). Then, Groult et al. [22] proposed an O(n)-time algorithm to enumerate
the number of distinct palindromes in a string. The above O(n)-time algorithms
are time-optimal since reading the input string of length ntakes Ω(n)time.
Regarding the longest palindrome computation, Funakoshi et al. [19] consid-
ered the problem of computing the longest palindromic substring of the string
Tafter a single character insertion, deletion, or substitution is applied to the
input string Tof length n. Of course, using O(n)time, we can obtain the
longest palindromic substring of Tfrom scratch. However, this idea is naïve
and appears to be inefficient. To avoid such inefficiency, Funakoshi et al. [19]
proposed an O(n)-space data structure that can compute the solution for any
editing operation given as a query in O(log(min{σ, log n})) time where σis
the alphabet size. Amir et al. [7] considered the dynamic longest palindromic
substring problem, which is an extension of Funakoshi et al.’s problem where
up to O(n)sequential editing operations are allowed. They proposed an al-
gorithm that solves this problem in O(nlog2n)time per a single character
edit w.h.p. with a data structure of size O(nlog n), which can be constructed
in O(nlog2n)time. Furthermore, Amir and Boneh [6] proposed an algorithm
running in poly-logarithmic time per a single character substitution.
Internal queries are queries about substrings of the input string T. Let us
consider a situation where we solve a certain problem for each of kdifferent sub-
strings of T. If we run an O(|w|)-time algorithm from scratch for each substring
w, the total time complexity can be as large as O(kn). To be more efficient, by
performing an appropriate preprocessing on T, we construct some data structure
for the query to output each solution efficiently. Such efficient data structures
for palindromic problems are known. Rubinchik and Shur [31] proposed an al-
gorithm that computes the number of distinct palindromes in a given substring
of an input string of length n. Their algorithm runs in O(log n)time with a data
structure of size O(nlog n), which can be constructed in O(nlog n)time. Amir
et al. [7] considered a problem of computing the longest palindromic substring in
a given substring of the input string of length n; it is called the internal longest
palindrome query. Their algorithm runs in O(log n)time with a data structure
of size O(nlog n), which can be constructed in O(nlog2n)time.
This paper proposes a new algorithm for the internal longest palindrome
query. The algorithm of Amir et al. [7] uses 2-dimensional orthogonal range
maximum queries [3, 4, 12]; furthermore, the time and space complexities of
their algorithm are dominated by this query. Instead of 2-dimensional orthogo-
nal range maximum queries, by using palindromic trees [32], weighted ancestor
queries [20], and range maximum queries [18], we obtain a time-optimal algo-
rithm.
Theorem 1. Given a string Tof length nover a linearly sortable alphabet, we
can construct a data structure of size O(n)in O(n)time that can answer any
internal longest palindrome query in O(1) time.
Here, an alphabet is said to be linearly sortable if any sequence of ncharac-
ters from Σcan be sorted in O(n)time. For example, the integer alphabet
2
{1,2, . . . , nc}for some constant cis linearly sortable because we can sort a se-
quence from the alphabet in linear time by using a radix sort with base n. We
also assume the word-RAM model with word size ωlog nbits for input size
n.
Furthermore, we consider a more general problem of finding palindromes,
i.e., the problem of finding top-klongest palindromes in a substring of T, rather
than just the longest palindrome in T. Then, we finally prove the following
proposition, which will be given as a corollary in Section 4.
Corollary 1. Given a string Tof length n, we can construct a data struc-
ture of size O(n)in O(nlog n)time that can answer any internal top-klongest
palindrome query in O(k)time.
Our results are summarized in Table 1.
longest palindrome top-kpalindromes
String Tpreprocessing O(n)time [27] O(n)time
query O(1) time O(k)time
Query substring preprocessing O(n)time O(nlog n)time
T[i..j]query O(1) time O(k)time
Table 1: Manacher’s algorithm [27] can compute the longest palindrome in a
string Tin linear time. We study three generalized problems and give efficient
data structures and algorithms. All the proposed data structures are of linear
size.
Related Work Internal queries have been studied on many problems, not
only those related to palindromic structures. For instance, Kociumaka et al. [26]
considered the internal pattern matching queries that are ones for computing
the occurrences of a substring Uof the input string Tin another substring Vof
T. Besides, internal queries for string alignment [14, 33, 34, 35], longest common
prefix [1, 5, 21, 29], and longest common substring [7] have been studied in the
last two decades. See [25] for an overview of internal queries. We also refer to
[2, 10, 11, 15, 16, 24] and references therein.
Paper Organization The rest of this paper is organized as follows. Section 2
gives some notations and definitions. Section 3 shows our data structure to solve
the internal longest palindrome queries. Section 4 shows how to compute the
top-klongest palindromes in (sub)strings. Finally, Section 5 concludes this
paper.
3
2 Preliminaries
2.1 Strings and Palindromes
Let Σbe an alphabet. An element of Σis called a character, and an element
of Σis called a string. The empty string εis the string of length 0. The
length of a string Tis denoted by |T|. For each iwith 1i≤ |T|, the i-th
character of Tis denoted by T[i]. For each iand jwith 1i, j ≤ |T|, the
string T[i]T[i+ 1] ···T[j]is denoted by T[i..j]. For convenience, let T[i..j] = ε
if i> j. If T=xyz, then x,y, and zare called a prefix, substring, and suffix
of T, respectively. They are called a proper prefix, a proper substring, and a
proper suffix of Tif x̸=T,y̸=T, and z̸=T, respectively. The string yis
called an infix of Tif x̸=εand z̸=ε. The reversal of string Tis denoted by
TR, i.e., TR=T[|T|]···T[2]T[1]. A string Tis called a palindrome if T=TR.
Note that εis also a palindrome. For a palindromic substring T[i..j]of T, the
center of T[i..j]is i+j
2. A palindromic substring T[i..j]is called a maximal
palindrome in Tif i= 1,j=|T|, or T[i1] ̸=T[j+ 1]. In what follows, we
consider an arbitrary fixed string Tof length n > 0. In this paper, we assume
that the alphabet Σis linearly sortable. We also assume the word-RAM model
with word size ωlog nbits.
Let zbe the number of palindromic suffixes of T. Let suf (T) = (s1, s2,...,
sz)be the sequence of the lengths of palindromic suffixes of Tsorted in in-
creasing order. Further let dif i=sisi1for each iwith 2iz. For
convenience, let dif 1= 0. Then, the sequence (dif 1,...,dif z)is monotonically
non-decreasing (Lemma 7 in [28]). Let (suf 1,suf 2,...,suf p)be the partition of
suf (T)such that for any two elements si, sjin suf (T),si, sjsuf kfor some k
iff dif i=dif j. By definition, each suf kforms an arithmetic progression. It is
known that the number pof arithmetic progressions satisfies p∈ O(log n)[9, 28].
For 1kpand 1≤ |suf k|,suf k,ℓ denote the -th term of suf k. Figure 1
shows an example of the above definitions.
2.2 Tools
In this section, we list some data structures used in our algorithm in Section 3.
Palindromic Trees and Series Trees The palindromic tree of Tis a data
structure that represents all distinct palindromes in T[32]. The palindromic
tree of T, denoted by paltree(T), has dordinary nodes and one auxiliary node
where dn+ 1 is the number of all distinct palindromes in T. Each
ordinary node vcorresponds to a palindromic substring of T(including the
empty string ε) and stores its length as weight(v). For the auxiliary node ,
we define weight() = 1. For convenience, we identify each node with its
corresponding palindrome. For an ordinary node vin paltree(T)and a character
c, if nodes vand cvc exist, then an edge labeled cconnects these nodes. The
auxiliary node has edges to all nodes corresponding to length-1palindromes.
Each node vin paltree(T)has a suffix link that points to the longest palindromic
4
摘要:

FindingTop-kLongestPalindromesinSubstringsKazukiMitani1,TakuyaMieno2,KazuhisaSeto3,andTakashiHoriyama31GraduateSchoolofInformationScienceandTechnology,HokkaidoUniversity,Sapporo,Japan2DepartmentofComputerandNetworkEngineering,UniversityofElectro-Communications,Chofu,Japan3FacultyofInformationScience...

展开>> 收起<<
Finding Top- kLongest Palindromes in Substrings Kazuki Mitani1 Takuya Mieno2 Kazuhisa Seto3 and Takashi.pdf

共20页,预览4页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:20 页 大小:774.32KB 格式:PDF 时间:2025-04-27

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 20
客服
关注