Star Anagram Detection and Classification

Jason Parker∗

Dan Barker†

October 13, 2022

Abstract

A star anagram is a rearrangement of the letters of one word to produce another word where no letter retains

its original neighbors. These maximally shuﬄed anagrams are rare, comprising only about 5.7% of anagrams in

English. They can also be depicted as unicursal polygons with varying forms, including the eponymous stars. We

develop automated methods for detecting stars among other anagrams and for classifying them based on their

polygon’s degree of both rotational and reﬂective symmetry. Next, we explore several properties of star anagrams

including proofs for two results about the edge lengths of perfect, i.e., maximally symmetric, stars leveraging

perhaps surprising connections to modular arithmetic and the celebrated Chinese Remainder Theorem. Finally,

we conduct an exhaustive search of English for star anagrams and provide numerical results about their clustering

into common shapes along with examples of geometrically noteworthy stars.

Keywords: star anagram, star polygon, unicursal polygon, symmetric polygon, Chinese Remainder Theorem

1 Introduction

1.1 Motivation

An anagram is a word or phrase formed by rearranging the letters of another word or phrase. In this article, we use

the word anagram to refer only to a pair of single words that are anagrams of each other. We consider the pair to be

ordered and view the second word as a rearrangement of the letters in the ﬁrst word, e.g., EARTH →HEART. Notice

that EARTH →HEART is a simple rotation. All letters keep their original neighbors; nothing has been shuﬄed.

A star anagram is a special class of anagrams in which the letters have been maximally shuffled: no letter in the

second word is adjacent to one of its original neighbors, counting the ﬁrst and last letters as neighbors. An example

is EARTH →HATER. The name star anagram derives from an interesting geometric property of these anagrams. In

particular, if we arrange the letters of the ﬁrst word in a circle and trace the path formed by the rearranged second

word, we obtain a star as on the right in Figure 1. Most anagrams, like our earlier example EARTH →HEART, do not

produce a star shape, as we see on the left of the Figure.

[Figure 1 panels: EARTH → HEART (not a star); EARTH → HATER (perfect star)]

Figure 1: (Left) An anagram that is not a star. (Right) An example star anagram.

An interesting subset of star anagrams are symmetric. They can be folded perfectly along some dividing line

(reflective symmetry) or rotated less than one full turn and look the same (rotational symmetry). Stars which lack

this symmetry are asymmetric. An even rarer type of star anagram has all edges of the same length, which we denote

as perfect. Note that perfect stars are both reflectively and rotationally symmetric, although we place them into a

special class. Figure 2 provides examples of all three classes of star anagrams for words of length 8.

∗US Air Force Research Laboratory, Wright Patterson AFB, USA

†Freedom From Religion Foundation, Madison, WI, USA

arXiv:2210.06397v1 [cs.OH] 18 Sep 2022

[Figure 2 panels: DOWNLOAD → WOODLAND (asymmetric); CRITTERS → RESTRICT (symmetric); HOTSPOTS → POTSHOTS (perfect)]

Figure 2: Length 8 examples of the three star anagram classes.

Barker [1] originally coined the term star anagram and introduced it to the ﬁrst author. Prior to the work

described in this article, Barker searched for star anagrams without automated tools. While intellectually rewarding

(because it challenges you to turn one dimension into two), this approach makes it diﬃcult to identify large groups

of star anagrams, particularly among longer words and those with repeated letters.

1.2 Contribution

The primary contribution of this paper is a numerically inexpensive method for automatically detecting star anagrams

and classifying them based on their degree of rotational and reﬂective symmetry, including a simple test for perfection.

All of these methods rely on simple operations computed from the edge lengths of the anagram’s representation as a

unicursal polygon. We also use the Chinese Remainder Theorem to prove that perfect stars must have edge lengths

that are coprime with their word length, a result already well known in the study of star polygons.

A star anagram’s polygon changes based on the ordering of the words. We prove that reversing the order of the

anagram preserves both starriness and perfection. A surprising result on the edge length of reversed perfect stars is

also provided, demonstrating that the edge lengths of a perfect star and its reversed star are modular inverses in the

parlance of number theory.

Finally, we conduct a detailed numerical study of the star anagrams in English. First, all star anagrams in a large

database of English words are detected and classiﬁed. We then provide numerical results on the clustering of these

star anagrams into common shapes and their distribution across word lengths. An Appendix provides a complete set

of ﬁgures depicting all star anagrams detected in English. We also discuss the initially surprising notion of autostars,

which are words that can be star anagrams of themselves. An exhaustive search of autostars provides interesting

examples of polygon shapes that do not appear among normal star anagrams in the English language.

1.3 Outline

The remainder of this paper is organized as follows. Section 2 describes our approach for detecting star anagrams,

and Section 3 describes our approach for star anagram classiﬁcation. In Section 4, we prove several properties of star

anagrams, discuss clustering of stars into common shapes, and introduce autostars. Section 5 presents the numerical

results for our search of English words for star anagrams. Finally, Section 6 provides concluding remarks and possible

future work.

1.4 Notation

Throughout this article we will use bold face capital letters for matrices (e.g., A), bold face lower case letters for

vectors (e.g., p), and non-bold letters for scalars (e.g., N). We will denote the set of integers as $\mathbb{Z}$. A length $N$ vector

$p$ of integers will be written as $p \in \mathbb{Z}^N$, with the $n$th entry denoted as $p_n$. Similarly, a matrix with $M$ rows and $N$

columns will be denoted as $A \in \mathbb{Z}^{M \times N}$, with the scalar entry in the $m$th row and $n$th column denoted as $a_{mn}$. Note

that we use 0 based indexing, e.g., numbering the columns from $0, \ldots, N-1$, throughout this article.

We use $N!$ for the factorial of a scalar $N$, and the magnitude of a scalar $p$ will be given as $|p|$. The modulo $N$

operation (i.e., remainder after division by $N$) for a scalar integer $K$ will be denoted as $K \bmod N$. We say that $a \equiv b \pmod{N}$

if $a \bmod N = b \bmod N$. Finally, we say that two integers $N$ and $L$ are coprime if they have no common

positive divisor other than 1.

2 Star Anagram Detection

Generating a comprehensive list of all anagrams in a given set of words is straightforward, e.g., by exhaustively

comparing the sorted letters of equal length words. Omitting those details, we will focus on an algorithmic approach

for detecting whether a given anagram is a star. Before describing this approach, we need to explain how to think

of anagrams as paths.
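The anagram-listing step that the text omits can be sketched as follows. This is only a minimal illustration, not the authors' implementation: it groups words by their sorted letters (a common variant of the pairwise comparison mentioned above), and the function name `find_anagram_pairs` and the tiny word list are assumptions made for the example.

```python
from collections import defaultdict
from itertools import permutations


def find_anagram_pairs(words):
    """Return ordered pairs (first, second) of distinct single-word anagrams.

    Hypothetical helper: words sharing the same sorted-letter key are anagrams.
    """
    groups = defaultdict(list)
    for word in words:
        upper = word.upper()
        groups["".join(sorted(upper))].append(upper)

    pairs = []
    for group in groups.values():
        unique = sorted(set(group))
        # Pairs are ordered, so both (a, b) and (b, a) are emitted.
        pairs.extend(permutations(unique, 2))
    return pairs


pairs = find_anagram_pairs(["earth", "heart", "hater", "stone"])
```

Because the pairs are ordered, a group of three mutual anagrams such as EARTH, HEART, and HATER yields six candidate anagrams to test.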

2.1 Anagrams as Paths

We number the letters of any length Nword with the integer values 0 to N−1. Any rearrangement of these letters

can be viewed as traversing a path that connects the letters in the speciﬁed order. We will represent such a path

as a vector $p \in \mathbb{Z}^N$ with entries $\{p_n\}_{n=0}^{N-1}$. For our example EARTH → HATER, we obtain the path $p = [4, 1, 3, 0, 2]$.

Note that for a given word of length $N$ there are $N!$ possible paths, including the original word and many nonsense

arrangements.

In our analysis of star anagrams, we think of the letters of the original word as nodes arranged uniformly around

a circle or ring. The path is drawn as a series of line segments connecting the nodes in the speciﬁed order, including a

segment from node $p_{N-1}$ back to node $p_0$ to close the figure. These shapes produced by a continuous path are known

in geometry as unicursal polygons and have been widely studied. Indeed, star polygons, which are the unicursal

polygons produced by our star anagrams, have been studied since at least the fourteenth century [2, Section 2.8]. In

the sequel, we will see that star anagram detection and classiﬁcation can be done entirely by looking at the properties

of anagram paths.

2.2 Identifying All Possible Paths

Our simple example EARTH →HATER conveniently hid a complication. In particular, the path for an anagram with

repeated letters is not unique. The nodes of the repeated letters can be swapped in the path without changing the

resulting word. For an anagram with $R$ repeated letters, each of which appears $w_r$ times, the number of possible

paths $P$ will be $P = \prod_{r=0}^{R-1} w_r!$, which can be large.

For example, consider CAREERS →CREASER. The two “e” and “r” letters can both be visited in either order in

the path, leading to P= (2! ·2!) = 4 possible paths as shown in Figure 3. Only the fourth path reveals this anagram

to be a perfect star.

[Figure 3 panels, all CAREERS → CREASER: (1) not a star; (2) symmetric; (3) not a star; (4) perfect]

Figure 3: The 4 possible paths for the perfect star anagram CAREERS → CREASER.

Our solution to this problem is straightforward. We simply generate all possible paths for each anagram using

an exhaustive recursive enumeration and evaluate each path. We then select a single path $p^*$ out of the set of $P$

possible paths $\{p_i\}_{i=0}^{P-1}$ to represent the anagram. A perfect star path is selected if found, followed in preference by

a symmetric star path, an asymmetric star path, and ﬁnally a non-star path. If multiple paths are found within the

preferred class, we select one arbitrarily. Multiple symmetric star paths are handled diﬀerently, which we describe

after discussing classiﬁcation.
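The path enumeration for anagrams with repeated letters can be sketched as below. This is a hedged illustration rather than the authors' code: the text describes a recursive enumeration, whereas this sketch substitutes `itertools.product` over the per-letter orderings, and the function name `all_paths` is an assumption. It expects the two words to be anagrams of each other.

```python
from itertools import permutations, product


def all_paths(first, second):
    """Enumerate every path that spells `second` using node indices of `first`.

    Hypothetical helper; assumes `second` is a rearrangement of `first`.
    """
    # Node indices of each letter in the first word, e.g. 'E' -> [3, 4].
    positions = {}
    for index, letter in enumerate(first):
        positions.setdefault(letter, []).append(index)

    letters = list(positions)
    paths = []
    # Choose one visiting order per letter; repeated letters give w_r! orders.
    for choice in product(*(permutations(positions[ch]) for ch in letters)):
        order = {ch: iter(perm) for ch, perm in zip(letters, choice)}
        paths.append([next(order[ch]) for ch in second])
    return paths


careers_paths = all_paths("CAREERS", "CREASER")
earth_paths = all_paths("EARTH", "HATER")
```

For CAREERS → CREASER this yields the 2! · 2! = 4 paths discussed above, although not necessarily in the order of the figure panels; EARTH → HATER has no repeated letters and therefore a single path.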

2.3 Step Sizes

Now that we have identiﬁed the set of paths to test for a given anagram, we turn our attention to computing the

steps around the circle represented by a given path. First, given a path $p$, we define the path differences $d \in \mathbb{Z}^N$

with entries $\{d_n\}_{n=0}^{N-1}$ given by

$$d_n = p_{n+1} - p_n, \tag{1}$$

where for convenience we define $p_N = p_0$. We would like to use these differences to analyze the geometric properties

of the path around the circle.

However, these raw diﬀerences of the node locations include ambiguities that make direct analysis diﬃcult.

Notice that $d_n$ can take on values from $-(N-1)$ to $N-1$, yielding $2N-2$ possible values.¹ Clearly these values

are redundant, since starting from a given node there are only N−1 possible steps to the next node. Our goal will

be to map these path diﬀerences to unambiguous steps.

If the next node is $k_1$ steps away around the circle in the clockwise direction, then it will be $k_2 = N - k_1$ steps

away in the counter-clockwise direction. Each of the other nodes can thus always be reached by two complementary

steps with sizes satisfying $k_1 + k_2 = N$. To avoid this ambiguity, we will use the smaller length for our steps. This

choice also allows us to define the length of each edge as the magnitude of the corresponding step. We will also

define a clockwise step as positive and a counter-clockwise step as negative. Finally, when $N$ is even, the clockwise

and counter-clockwise steps directly across the circle will both have length $N/2$ with opposite signs. This ambiguity

corresponds exactly to the ±180° ambiguity when measuring angles. To avoid this issue, we will simply define a step

directly across the circle as positive $N/2$.

Putting all of these definitions together, we arrive at a unique set of $N-1$ possible steps $s$ from a given node

that satisfy $-N/2 < s \le N/2$, with $s = N/2$ only allowed for even $N$ and $s \neq 0$. For a path $p$ with path differences

$d$ we can compute the vector of corresponding steps $s \in \mathbb{Z}^N$ with entries $\{s_n\}_{n=0}^{N-1}$ as

$$s_n = \begin{cases} d_n, & |d_n| < N/2 \\ N/2, & |d_n| = N/2 \\ d_n - N, & d_n > N/2 \\ d_n + N, & d_n < -N/2 \end{cases} \tag{2}$$

Figure 4 illustrates these steps from the top node of the circle for N= 7 and N= 8. As an example, for the N= 8

asymmetric star NITROGEN →RINGTONE we obtain

p= [ 3 1 7 5 2 4 0 6 ]

d= [ -2 6 -2 -3 2 -4 6 -3 ]

s= [ -2 -2 -2 -3 2 4 -2 -3 ].

This star is shown on the left of Figure 5 with the steps labeled.
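The computation of path differences (1) and steps (2) can be sketched in a few lines. This is an illustrative sketch only; the function names `path_differences` and `steps` are assumptions, and the comparisons are written with doubled integers to avoid fractional $N/2$ for odd $N$.

```python
def path_differences(p):
    """Equation (1): d_n = p_{n+1} - p_n, with p_N defined as p_0."""
    n = len(p)
    return [p[(i + 1) % n] - p[i] for i in range(n)]


def steps(p):
    """Equation (2): map each difference to a step in (-N/2, N/2]."""
    n = len(p)
    result = []
    for d in path_differences(p):
        if 2 * abs(d) == n:      # directly across the circle (even N only)
            result.append(n // 2)
        elif 2 * d > n:          # long clockwise hop -> short counter-clockwise step
            result.append(d - n)
        elif 2 * d < -n:         # long counter-clockwise hop -> short clockwise step
            result.append(d + n)
        else:                    # already the shorter way round
            result.append(d)
    return result


nitrogen_path = [3, 1, 7, 5, 2, 4, 0, 6]   # NITROGEN -> RINGTONE
d = path_differences(nitrogen_path)
s = steps(nitrogen_path)
```

Running this on the NITROGEN → RINGTONE path reproduces the vectors $d$ and $s$ listed above.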

Figure 4: All possible steps from the red “A” node for N = 7 (left) and N = 8 (right).

Lemma 1 (Steps). The steps $\{s_n\}_{n=0}^{N-1}$ for a path $p \in \mathbb{Z}^N$ satisfy

$$p_{n+1} = (p_n + s_n) \bmod N. \tag{3}$$

Proof. Examining (2), we see that $s_n \equiv d_n \pmod{N}$. Notice also that $p_n = p_n \bmod N$. Using (1), we combine these

facts to obtain $p_{n+1} = p_{n+1} \bmod N = (p_n + d_n) \bmod N = (p_n + s_n) \bmod N$.

This relationship will be useful for proving properties of edge lengths in the sequel.
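As a numerical sanity check of Lemma 1 (not a substitute for the proof), the sketch below verifies relation (3) for every path of length 5 and 6. The names `steps` and `lemma_holds` are assumptions, and `steps` uses a compact formulation that should be equivalent to (2): reduce the difference modulo $N$, then subtract $N$ when the result exceeds $N/2$.

```python
from itertools import permutations


def steps(p):
    """Steps in (-N/2, N/2], via a compact equivalent of equation (2)."""
    n = len(p)
    result = []
    for i in range(n):
        s = (p[(i + 1) % n] - p[i]) % n   # residue in 0..N-1
        if 2 * s > n:                     # more than halfway round: go the other way
            s -= n
        result.append(s)
    return result


def lemma_holds(p):
    """Check p_{n+1} = (p_n + s_n) mod N for every n, including the closing edge."""
    n = len(p)
    s = steps(p)
    return all(p[(i + 1) % n] == (p[i] + s[i]) % n for i in range(n))


all_ok = all(
    lemma_holds(list(p)) for n in (5, 6) for p in permutations(range(n))
)
```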

¹ Notice $d_n \neq 0$ since all the $\{p_n\}_{n=0}^{N-1}$ are distinct.

[Figure 5 panels: NITROGEN → RINGTONE (asymmetric); DEANSHIP → PINHEADS (not a star)]

Figure 5: Two example anagrams with the steps $\{s_n\}_{n=0}^{N-1}$ labeled in blue.

2.4 Detecting a Star Path

Our remaining task for this section is to determine if a given path corresponds to a star anagram.

Theorem 1 (Star Detection). A path $p \in \mathbb{Z}^N$ is a star anagram path if and only if the path’s steps satisfy $|s_n| \neq 1$

for all $n \in \{0, 1, \ldots, N-1\}$.

Proof. Recall that an anagram is a star if no letter in the new word retains its original neighbors. By (3), this

condition occurs exactly when no step is to the nearest clockwise neighbor ($s_n = 1$) or the nearest counter-clockwise

neighbor ($s_n = -1$). Recalling that $d_{N-1} = p_0 - p_{N-1}$, we see that testing the $N$ steps captures all pairs of possible

former neighbors.

Notice that this check is easily performed for each of the $P$ possible paths for each anagram, requiring only a few

operations on $N$ scalar values.
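The test of Theorem 1 translates almost directly into code. The following is a minimal sketch under the assumption that the path is given as a list of node indices; the names `steps` and `is_star` are illustrative, and `steps` again uses a compact formulation equivalent to (2).

```python
def steps(p):
    """Steps in (-N/2, N/2], via a compact equivalent of equation (2)."""
    n = len(p)
    result = []
    for i in range(n):
        s = (p[(i + 1) % n] - p[i]) % n   # residue in 0..N-1
        if 2 * s > n:                     # more than halfway round: go the other way
            s -= n
        result.append(s)
    return result


def is_star(p):
    """Theorem 1: a path is a star path iff no step has magnitude 1."""
    return all(abs(s) != 1 for s in steps(p))


hater = is_star([4, 1, 3, 0, 2])     # EARTH -> HATER
heart = is_star([4, 0, 1, 2, 3])     # EARTH -> HEART (a rotation)
```

The rotation EARTH → HEART has every step equal to 1 and is rejected, while EARTH → HATER passes.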

3 Star Anagram Classiﬁcation

Now that we have a reliable method for detecting star anagrams, we turn our attention to classiﬁcation. We start

with the simpler test for perfection.

3.1 Identifying Perfection

Recall that a perfect star anagram has all edges of the same length. Since the edge lengths are given by the magnitudes

of the steps s, we arrive immediately at our test for perfection.

Theorem 2 (Perfection Test). A path $p \in \mathbb{Z}^N$ is a perfect star path if and only if the path’s steps satisfy $s_n = S$ for

all $n \in \{0, 1, \ldots, N-1\}$ for a constant $S$ satisfying $|S| = L > 1$.

Proof. By definition, a perfect star path must satisfy $|s_n| = L$ for a constant $L > 1$. We see that the steps must have

the same sign, since two consecutive steps with equal magnitude and opposed signs would cause the path to repeat

the previous node.

To see this test in action, consider our earlier example EARTH →HATER. We obtain

p= [ 4 1 3 0 2 ]

d= [ -3 2 -3 2 2 ]

s= [ 2 2 2 2 2 ].

This star, like all length 5 stars, is a perfect pentagram with L= 2. Indeed, this characteristic shape was the origin

of the name star anagram.
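The perfection test of Theorem 2 can be sketched in the same style. This is an illustrative sketch, not the authors' code; `steps` and `is_perfect` are assumed names, and `steps` uses a compact formulation equivalent to (2).

```python
def steps(p):
    """Steps in (-N/2, N/2], via a compact equivalent of equation (2)."""
    n = len(p)
    result = []
    for i in range(n):
        s = (p[(i + 1) % n] - p[i]) % n   # residue in 0..N-1
        if 2 * s > n:                     # more than halfway round: go the other way
            s -= n
        result.append(s)
    return result


def is_perfect(p):
    """Theorem 2: all steps equal a single constant S with |S| > 1."""
    s = steps(p)
    return len(set(s)) == 1 and abs(s[0]) > 1


hater_perfect = is_perfect([4, 1, 3, 0, 2])              # EARTH -> HATER
creaser_perfect = is_perfect([0, 5, 3, 1, 6, 4, 2])      # one CAREERS -> CREASER path
ringtone_perfect = is_perfect([3, 1, 7, 5, 2, 4, 0, 6])  # NITROGEN -> RINGTONE
```

Comparing the signed steps rather than their magnitudes matches the statement of the theorem; a rotation such as EARTH → HEART has constant steps of 1 and is excluded by the condition $|S| > 1$.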
