Deterministic vs. Non Deterministic Finite Automata in Automata Processing Farzana Ahmed Siddique Tommy James Tracy II Nathan Brunelle Kevin Skadron

2025-05-06 0 0 816.44KB 6 页 10玖币

侵权投诉

Deterministic vs. Non Deterministic Finite

Automata in Automata Processing

Farzana Ahmed Siddique, Tommy James Tracy II, Nathan Brunelle, Kevin Skadron

University of Virginia

{farzana, tjt7a, njb2b, skadron}@virginia.edu

Abstract—Linear-time pattern matching engines have seen

promising results using Finite Automata (FA) as their com-

putation model. Among different FA variants, deterministic

(DFA) and non-deterministic (NFA) are the most commonly

used computation models for FA-based pattern matching engines.

Moreover, NFA is the prevalent model in pattern matching

engines on spatial architectures. The reasons are: i) DFA size,

as in #states, can be exponential compared to equivalent NFA, ii)

DFA cannot exploit the massive parallelism available on spatial

architectures. This paper performs an empirical study on the

#state of minimized DFA and optimized NFA across a diverse

set of real-world benchmarks and shows that if distinct DFAs

are generated for distinct patterns, #states of minimized DFA are

typically equal to their equivalent optimized NFA. However, NFA is

more robust in maintaining the low #states for some benchmarks.

Thus, the choice of NFA vs. DFA for spatial architecture is less

important than the need to generate distinct DFAs for each pattern

and support these distinct DFAs’ parallel processing. Finally,

this paper presents a throughput study for von Neumann’s

architecture-based (CPU) vs. spatial architecture-based (FPGA)

automata processing engines. The study shows that, based on

the workload, neither CPU-based automata processing engine nor

FPGA-based automata processing engine is the clear winner.

If #patterns matched per workload increases, the CPU-based

automata processing engine’s throughput decreases. On the other

hand, the FPGA-based automata processing engine lacks the

memory spilling option; hence, it fails to accommodate an entire

automata if it does not ﬁt into FPGA’s logic fabric. In the best-

case scenario, the CPU has a 4.5x speedup over the FPGA, while

for some benchmarks, the FPGA has a 32,530x speedup over the

CPU.

I. INTRODUCTION

Pattern matching is essential in many applications, including

genomics, virus detection, network intrusion detection, and

machine learning techniques such as random forest. Most of

these applications are latency-sensitive. If the pattern matching

rate is not fast enough, it acts as a performance bottleneck for

those applications. Finite Automata (FA) have proven to be

a great computation model for linear time pattern matching

[1]–[5]. For example, Schulz et al. used FA to accelerate the

pattern matching for genomics applications [6]. On the other

hand, network devices such as the Cisco family of security

appliances and networking software such as Snort [7] use

regular expressions to represent safe payload patterns; prior

works have used FA to accelerate regexes [1, 2].

FA has different variants. The most commonly used FA

variants in pattern matching engines are 1. Deterministic Finite

Automata (DFA) and 2. Non-deterministic Finite Automata

(NFA). Even though both of the variants are functionally

equivalent, their structure (state count & transition) differs [8].

As a result, while designing an FA-based pattern matching

engine, the underlying hardware architecture inﬂuences the

selection between NFA and DFA. NFA is less preferred for

von Neumann architecture because the possibility of multiple

active states leads to multiple random memory lookups, and

high memory latency is a performance bottleneck in von

Neumann’s architecture [9]. Micron’s Automata Processor [10]

and spatial architectures, such as Field Programmable Gate

Arrays (FPGA), offer workarounds to hide the high memory

latency. For these architectures, NFA is a popular choice over

DFA [2, 5, 11] for several reasons. The crucial one is if an

NFA has nstates, its equivalent DFA can have 2nstates

[8, 12]. For instance, Yu et al. [13] stated that 20% of the

Snort [7] DFAs suffer from exponential state growth relative

to the corresponding NFAs. Moreover, prior works show that

generating one DFA from a set of patterns could result in an

enormous time and space overhead [13, 14].

There are algorithms [15, 16] to generate minimized DFAs

for which state count is smaller than the expected exponential

growth (although exponential growth remains a worst-case for

arbitrary automata). Tabakov et al. [17] performed a study on

synthetic automata and illustrated that based on the automata

structure, DFA states exhibit a growth rate of the order

polynomial to sub-exponential compared to the NFA, and in

some cases, minimized DFAs have a smaller state count than

the equivalent non-optimized NFA. Motivated by their study,

this paper ﬁlls a very important gap in automata processing

work. Our paper demonstrates the state count analysis of

the minimized DFA and its equivalent optimized NFA using

a broad set of real-world benchmarks. The analysis shows

that for the real-world benchmarks, the DFA state count

exhibits linear or polynomial growth. Therefore, DFAs can

be considered a viable alternative to any FA-based pattern

matching engine that uses NFA as the computation model.

Please note that unlike prior work [17], NFAs used in this

analysis are optimized using a heuristics-based approach.

Another reason for choosing NFA over DFA while design-

ing FA-based pattern matching engines on FPGAs is that

NFAs may activate multiple states per symbol cycle, which

exploits FPGAs’ massive parallelism [2, 11, 18, 19]. However,

this paper shows that DFAs can also exploit the parallel

architecture of the FPGAs if, instead of creating a single large

DFA for all the patterns of an application, separate DFAs are

generated for distinct patterns, and these DFAs are processed

arXiv:2210.10077v1 [cs.PF] 18 Oct 2022

in parallel (section IV).

Finally, this paper presents another interesting ﬁnding:

the state-of-the-art CPU-based automata processing engine,

Hyperscan [20] sometimes outperforms the state-of-the-art

FPGA-based automata processing engine, Grapefruit, in terms

of throughput (pattern matching rate). Even though FPGAs

offer massive parallelism, FPGAs’ clock rate is considerably

lower than CPUs. On top of that, applications face perfor-

mance degradation if the computation model does not ﬁt into

the CPU cache. Prior works show that in the worst case, the

FPGA-based pattern matching engine has equal throughput

compared to the CPU-based pattern matching engine [2, 5].

Contrary to that, this paper shows that for some real-world

benchmarks, the CPU gets 4.5x speedup over the FPGA

(Section V). However, if a workload is such that it matches

many distinct patterns, CPU throughput degrades. So, it can

be afﬁrmed that neither CPU nor FPGA is a clear winner for

the FA-based pattern matching engine.

In summary, this paper makes the following contributions:

•It presents a detailed analysis of the state count of the

optimized NFA vs. minimized DFA across various real-

world automata benchmarks (section III).

•It shows that on FPGA, the choice of NFA over DFA

is less important than the need to keep individual DFA

separate and the ability to support parallel processing of

a large number of DFAs (section IV).

•This paper presents an in-depth performance analysis

of the state-of-the-art CPU-based automata processing

engine vs. the state-of-the-art FPGA-based automata pro-

cessing engine (section V).

II. RELATED WORK

The state count of the NFA and DFA has been studied

previously from the perspective of network-intrusion detection,

and synthetic benchmarks [13, 21, 22]. The studies on the

network-intrusion detection benchmarks showed that gener-

ating a DFA for a speciﬁed pattern set can exponentially

increase the DFA state count. They demonstrate that regular

expressions containing constraints such as dot-star (pq.*rs)

and repeated sub-patterns (xyz.{100}abc) induce exponential

growth in DFA states. Contrary to the prior works, this paper

performs state count analysis for a diverse state of real-world

benchmarks and shows that if separate DFAs are generated for

each pattern, the DFA state count is equal to the equivalent

optimized NFA. However, in the worst-case DFA state count

shows a polynomial growth rate.

Tabakov et al. [17] evaluated Hopcroft’s [15] and Brzo-

zowski’s [16] DFA minimization algorithm using a synthetic

automata type called random automata. Their analysis illus-

trates that the state count of a DFA depends on the automata

graph’s transition density1. For smaller transition densities, the

DFA state grows super-polynomially but sub-exponentially.

Additionally, their result shows that minimized DFA has a

1For each symbol ratio of the total number of transitions to the total number

of states. Details can be found at [17].

smaller state count than the equivalent NFAs with greater tran-

sition densities. One limitation of this analysis is that the NFAs

used are not optimized. Even though NFA minimization is a

PSPACE-complete problem [23], a heuristics-based approach

can generate optimized NFA, as shown in this paper.

Since Floyd et al. [24] introduced hardware implementation

of NFA, there has been a substantial amount of work on high-

performance automata processors using either DFA [4, 13,

25] or NFA [1]–[3, 26] as a computation model. The state-

of-the-art CPU-based automata processing engine Hyperscan

[20] uses regex decomposition to separate the pattern into

disjoint strings and FA. Moreover, to process FA, Hyperscan

exploits SIMD vector operation to perform the maximum

possible state transition for each memory access. Hyperscan

encounters throughput degradation due to the CPU’s high

memory latency if the working set size does not ﬁt into

the CPU cache. Grapefruit [5], a state-of-the-art FPGA-based

automata processing engine, maps a cluster of automata into

the FPGA logic fabric. Even though this approach helps in

accelerating Grapefruit’s throughput, if the automata graph

does not ﬁt into the FPGA logic fabric, Grapefruit fails to

process it. This paper does not discuss GPU-based automata

processing engine because prior works [27, 28] have presented

that GPU is not promising for processing FA.

Xie et al. [2] examined throughput comparison between the

state-of-the-art CPU-based automata processing engine, Hy-

perscan [20] and an FPGA-based automata processing engine,

REAPR. Their analysis shows that in the worst case, REAPR

performs the same as Hyperscan, whereas, in the best case,

REAPR has a 2,000x speedup over Hyperscan. However, this

paper shows that, based on the workload and the benchmark,

neither Hyperscan nor Grapefruit is a clear winner for the FA-

based pattern matching engine.

Becchi et al. [14] presented a performance evaluation of

a network processor and a general-purpose processor using

NFA, DFA, and hybrid-FA. Hybrid-FA is a variant of automata

formed by halting the NFA→DFA conversion in places where

DFA blowup occurs. The evaluation at [14] shows that hybrid-

FA has the overall best throughput across the network and

general-purpose processors. The reason is that the DFA part

of the hybrid-FA is less susceptible to high memory latency,

and the NFA part of the hybrid-FA keeps a check on the

state explosion issue of the DFA. This performance evaluation

is limited to von Neumann architecture, whereas this paper

does the performance evaluation for von Neumann vs. spatial

architecture.

III. NFA VS. DFA STATE COUNT ANALYSIS

By deﬁnition, if an NFA requires nstates to express a

regular language Lor pattern P, the equivalent DFA will

require at most 2nstates [8]. As a result, DFAs are disregarded

for designing FA-based pattern matching engines on FPGA

[2, 21]. The state count study of DFA vs. NFA across diverse

real-world benchmarks is extremely important for automata

processing research. Because it would resolve if, for real-

world benchmarks, minimized DFAs’ state count exhibits an

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

Deterministicvs.NonDeterministicFiniteAutomatainAutomataProcessingFarzanaAhmedSiddique,TommyJamesTracyII,NathanBrunelle,KevinSkadronUniversityofVirginiaffarzana,tjt7a,njb2b,skadrong@virginia.eduAbstractLinear-timepatternmatchingengineshaveseenpromisingresultsusingFiniteAutomata(FA)astheircom-putati...

展开>> 收起<<

Deterministic vs. Non Deterministic Finite Automata in Automata Processing Farzana Ahmed Siddique Tommy James Tracy II Nathan Brunelle Kevin Skadron.pdf

共6页,预览2页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Deterministic vs. Non Deterministic Finite Automata in Automata Processing Farzana Ahmed Siddique Tommy James Tracy II Nathan Brunelle Kevin Skadron

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: