Star Anagram Detection and Classication Jason Parker Dan Barker

2025-04-26 0 0 9.73MB 279 页 10玖币
侵权投诉
Star Anagram Detection and Classification
Jason Parker
Dan Barker
October 13, 2022
Abstract
A star anagram is a rearrangement of the letters of one word to produce another word where no letter retains
its original neighbors. These maximally shuffled anagrams are rare, comprising only about 5.7% of anagrams in
English. They can also be depicted as unicursal polygons with varying forms, including the eponymous stars. We
develop automated methods for detecting stars among other anagrams and for classifying them based on their
polygon’s degree of both rotational and reflective symmetry. Next, we explore several properties of star anagrams
including proofs for two results about the edge lengths of perfect, i.e., maximally symmetric, stars leveraging
perhaps surprising connections to modular arithmetic and the celebrated Chinese Remainder Theorem. Finally,
we conduct an exhaustive search of English for star anagrams and provide numerical results about their clustering
into common shapes along with examples of geometrically noteworthy stars.
Keywords: star anagram, star polygon, unicursal polygon, symmetric polygon, Chinese Remainder Theorem
1 Introduction
1.1 Motivation
An anagram is a word or phrase formed by rearranging the letters of another word or phrase. In this article, we use
the word anagram to refer only to a pair of single words that are anagrams of each other. We consider the pair to be
ordered and view the second word as a rearrangement of the letters in the first word, e.g., EARTH HEART. Notice
that EARTH HEART is a simple rotation. All letters keep their original neighbors; nothing has been shuffled.
Astar anagram is a special class of anagrams in which the letters have been maximally shuffled: no letter in the
second word is adjacent to one of its original neighbors, counting the first and last letters as neighbors. An example
is EARTH HATER. The name star anagram derives from an interesting geometric property of these anagrams. In
particular, if we arrange the letters of the first word in a circle and trace the path formed by the rearranged second
word, we obtain a star as on the right in Figure 1. Most anagrams, like our earlier example EARTH HEART, do not
produce a star shape, as we see on the left of the Figure.
EARTH --> HEART
not a star
E
A
RT
H
EARTH --> HATER
perfect star
E
A
RT
H
Figure 1: (Left) An anagram that is not a star. (Right) An example star anagram.
An interesting subset of star anagrams are symmetric. They can be folded perfectly along some dividing line
(reflective symmetry) or rotated less than one full turn and look the same (rotational symmetry). Stars which lack
US Air Force Research Laboratory, Wright Patterson AFB, USA
Freedom From Religion Foundation, Madison, WI, USA
1
arXiv:2210.06397v1 [cs.OH] 18 Sep 2022
this symmetry are asymmetric. An even rarer type of star anagram has all edges of the same length, which we denote
as perfect. Note that perfect stars are both reflexively and rotationally symmetric, although we place them into a
special class. Figure 2 provides examples of all three classes of star anagrams for words of length 8.
DOWNLOAD --> WOODLAND
asymmetric
D
O
W
N
L
O
A
D
CRITTERS --> RESTRICT
*symmetric*
C
R
I
T
T
E
R
S
HOTSPOTS --> POTSHOTS
***perfect***
H
O
T
S
P
O
T
S
Figure 2: Length 8 examples of the three star anagrams classes.
Barker [1] originally coined the term star anagram and introduced it to the first author. Prior to the work
described in this article, Barker searched for star anagrams without automated tools. While intellectually rewarding
(because it challenges you to turn one dimension into two), this approach makes it difficult to identify large groups
of star anagrams, particularly among longer words and those with repeated letters.
1.2 Contribution
The primary contribution of this paper is a numerically inexpensive method for automatically detecting star anagrams
and classifying them based on their degree of rotational and reflective symmetry, including a simple test for perfection.
All of these methods rely on simple operations computed from the edge lengths of the anagram’s representation as a
unicursal polygon. We also use the Chinese Remainder Theorem to prove that perfect stars must have edge lengths
that are coprime with their word length, a result already well known in the study of star polygons.
A star anagram’s polygon changes based on the ordering of the words. We prove that reversing the order of the
anagram preserves both starriness and perfection. A surprising result on the edge length of reversed perfect stars is
also provided, demonstrating that the edge lengths of a perfect star and its reversed star are modular inverses in the
parlance of number theory.
Finally, we conduct a detailed numerical study of the star anagrams in English. First, all star anagrams in a large
database of English words are detected and classified. We then provide numerical results on the clustering of these
star anagrams into common shapes and their distribution across word lengths. An Appendix provides a complete set
of figures depicting all star anagrams detected in English. We also discuss the initially surprising notion of autostars,
which are words that can be star anagrams of themselves. An exhaustive search of autostars provides interesting
examples of polygon shapes that do not appear among normal star anagrams in the English language.
1.3 Outline
The remainder of this paper is organized as follows. Section 2 describes our approach for detecting star anagrams,
and Section 3 describes our approach for star anagram classification. In Section 4, we prove several properties of star
anagrams, discuss clustering of stars into common shapes, and introduce autostars. Section 5 presents the numerical
results for our search of English words for star anagrams. Finally, Section 6 provides concluding remarks and possible
future work.
1.4 Notation
Throughout this article we will use bold face capital letters for matrices (e.g., A), bold face lower case letters for
vectors (e.g., p), and non-bold letters for scalars (e.g., N). We will denote the set of integers as Z. A length Nvector
pof integers will be written as pZN, with the nth entry denoted as pn. Similarly, a matrix with Mrows and N
columns will be denoted as AZM×N, with the scalar entry in the mth row and nth column denoted as amn. Note
that we use 0 based indexing, e.g., numbering the columns from 0 . . . N 1, throughout this article.
We use N! for the factorial of a scalar N, and the magnitude of a scalar pwill be given as |p|. The modulo N
operation (i.e., remainder after division by N) for a scalar integer Kwill be denoted as Kmod N. We say that ab
2
(mod N) if amod N=bmod N. Finally, we say that two integers Nand Lare coprime if they have no common
positive divisor other than 1.
2 Star Anagram Detection
Generating a comprehensive list of all anagrams in a given set of words is straightforward, e.g., by exhaustively
comparing the sorted letters of equal length words. Omitting those details, we will focus on an algorithmic approach
for detecting whether a given anagram is a star. Before describing this approach, we need to explain how to think
of anagrams as paths.
2.1 Anagrams as Paths
We number the letters of any length Nword with the integer values 0 to N1. Any rearrangement of these letters
can be viewed as traversing a path that connects the letters in the specified order. We will represent such a path
as a vector pZNwith entries {pn}N1
n=0 . For our example EARTH HATER, we obtain the path p= [4,1,3,0,2].
Note that for a given word of length Nthere are N! possible paths, including the original word and many nonsense
arrangements.
In our analysis of star anagrams, we think of the letters of the original word as nodes arranged uniformly around
a circle or ring. The path is drawn as a series of line segments connecting the nodes in the specified order, including a
segment from node pN1back to node p0to close the figure. These shapes produced by a continuous path are known
in geometry as unicursal polygons and have been widely studied. Indeed, star polygons, which are the unicursal
polygons produced by our star anagrams, have been studied since at least the fourteenth century [2, Section 2.8]. In
the sequel, we will see that star anagram detection and classification can be done entirely by looking at the properties
of anagram paths.
2.2 Identifying All Possible Paths
Our simple example EARTH HATER conveniently hid a complication. In particular, the path for an anagram with
repeated letters is not unique. The nodes of the repeated letters can be swapped in the path without changing the
resulting word. For an anagram with Rrepeated letters each of which appears wrtimes, the number of possible
paths Pwill be P=QR1
r=0 wr!, which can be large.
For example, consider CAREERS CREASER. The two “e” and “r” letters can both be visited in either order in
the path, leading to P= (2! ·2!) = 4 possible paths as shown in Figure 3. Only the fourth path reveals this anagram
to be a perfect star.
CAREERS --> CREASER
not a star
C
A
R
EE
R
S
CAREERS --> CREASER
*symmetric*
C
A
R
EE
R
S
CAREERS --> CREASER
not a star
C
A
R
EE
R
S
CAREERS --> CREASER
***perfect***
C
A
R
EE
R
S
Figure 3: The 4 possible paths for the perfect star anagram CAREERS CREASER.
Our solution to this problem is straightforward. We simply generate all possible paths for each anagram using
an exhaustive recursive enumeration and evaluate each path. We then select a single path pout of the set of P
possible paths {pi}P1
i=0 to represent the anagram. A perfect star path is selected if found, followed in preference by
a symmetric star path, an asymmetric star path, and finally a non-star path. If multiple paths are found within the
preferred class, we select one arbitrarily. Multiple symmetric star paths are handled differently, which we describe
after discussing classification.
3
2.3 Step Sizes
Now that we have identified the set of paths to test for a given anagram, we turn our attention to computing the
steps around the circle represented by a given path. First, given a path p, we define the path differences dZN
with entries {dn}N1
n=0 given by
dn=pn+1 pn,(1)
where for convenience we define pN=p0. We would like to use these differences to analyze the geometric properties
of the path around the circle.
However, these raw differences of the node locations include ambiguities that make direct analysis difficult.
Notice that dncan take on values from (N1) to N1, yielding 2N2 possible values1. Clearly these values
are redundant, since starting from a given node there are only N1 possible steps to the next node. Our goal will
be to map these path differences to unambiguous steps.
If the next node is k1steps away around the circle in the clockwise direction, then it will be k2=Nk1steps
away in the counter-clockwise direction. Each of the other nodes can thus always be reached by two complementary
steps with sizes satisfying k1+k2=N. To avoid this ambiguity, we will use the smaller length for our steps. This
choice also allows us to define the length of each edge as the magnitude of the corresponding step. We will also
define a clockwise step as positive and a counter-clockwise step as negative. Finally, when Nis even, the clockwise
and counter-clockwise steps directly across the circle will both have length N/2 with opposite signs. This ambiguity
corresponds exactly to the ±180ambiguity when measuring angles. To avoid this issue, we will simply define a step
directly across the circle as positive N/2.
Putting all of these definitions together, we arrive at a unique set of N1 possible steps sfrom a given node
that satisfy N/2< s N/2, with s=N/2 only allowed for even Nand s6= 0. For a path pwith path differences
dwe can compute the vector of corresponding steps sZNwith entries {sn}N1
n=0 as
sn=
dn,|dn|<N
2
N
2,|dn|=N
2
dnN, dn>N
2
dn+N, dn<N
2.
(2)
Figure 4 illustrates these steps from the top node of the circle for N= 7 and N= 8. As an example, for the N= 8
asymmetric star NITROGEN RINGTONE we obtain
p= [ 3 1 7 5 2 4 0 6 ]
d= [ -2 6 -2 -3 2 -4 6 -3 ]
s= [ -2 -2 -2 -3 2 4 -2 -3 ].
This star is shown on the left of Figure 5 with the steps labeled.
A
B
B
B
B
B
B
B
4
-3
-2
-1 1
2
3
4
Figure 4: All possible steps from the red “A” node for N= 7 (left) and N= 8 (right).
Lemma 1 (Steps).The steps {sn}N1
n=0 for a path pZNsatisfy
pn+1 = (pn+sn) mod N. (3)
Proof. Examining (2), we see that sndn(mod N). Notice also that pn=pnmod N. Using (1), we combine these
facts to obtain pn+1 =pn+1 mod N= (pn+dn) mod N= (pn+sn) mod N.
This relationship will be useful for proving properties of edge lengths in the sequel.
1Notice dn6= 0 since all the {pn}N1
n=0 are distinct.
4
NITROGEN --> RINGTONE
asymmetric
N
I
T
R
O
G
E
N
-2
-2
-2
-3 2
4
-2
-3
DEANSHIP --> PINHEADS
not a star
D
E
A
N
S
H
I
P
-1
-3
2
4
1
-2
4
3
Figure 5: Two example anagrams with the steps {sn}N1
n=0 labeled in blue.
2.4 Detecting a Star Path
Our remaining task for this section is to determine if a given path corresponds to a star anagram.
Theorem 1 (Star Detection).A path pZNis a star anagram path if and only if the path’s steps satisfy |sn| 6= 1
for all n∈ {0,1,...N 1}.
Proof. Recall that an anagram is a star if no letter in the new word retains its original neighbors. By (3), this
condition occurs exactly when no step is to the nearest clockwise neighbor (sn= 1) or the nearest counter-clockwise
neighbor (sn=1). Recalling that dN1=p0pN1, we see that testing the Nsteps captures all pairs of possible
former neighbors.
Notice that this check is easily performed for each of the Ppossible paths for each anagram, requiring only a few
operations on Nscalar values.
3 Star Anagram Classification
Now that we have a reliable method for detecting star anagrams, we turn our attention to classification. We start
with the simpler test for perfection.
3.1 Identifying Perfection
Recall that a perfect star anagram has all edges of the same length. Since the edge lengths are given by the magnitudes
of the steps s, we arrive immediately at our test for perfection.
Theorem 2 (Perfection Test).A path pZNis a perfect star path if and only if the path’s steps satisfy sn=Sfor
all n∈ {0,1,...N 1}for a constant Ssatisfying |S|=L > 1.
Proof. By definition, a perfect star path must satisfy |sn|=Lfor a constant L > 1. We see that the steps must have
the same sign, since two consecutive steps with equal magnitude and opposed signs would cause the path to repeat
the previous node.
To see this test in action, consider our earlier example EARTH HATER. We obtain
p=[ 41 302]
d=[-32-322]
s=[ 22 222].
This star, like all length 5 stars, is a perfect pentagram with L= 2. Indeed, this characteristic shape was the origin
of the name star anagram.
5
摘要:

StarAnagramDetectionandClassi cationJasonParker*DanBarker„October13,2022AbstractAstaranagramisarearrangementofthelettersofonewordtoproduceanotherwordwherenoletterretainsitsoriginalneighbors.Thesemaximallyshuedanagramsarerare,comprisingonlyabout5.7%ofanagramsinEnglish.Theycanalsobedepictedasunicursa...

展开>> 收起<<
Star Anagram Detection and Classication Jason Parker Dan Barker.pdf

共279页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:279 页 大小:9.73MB 格式:PDF 时间:2025-04-26

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 279
客服
关注