Deterministic vs. Non Deterministic Finite Automata in Automata Processing Farzana Ahmed Siddique Tommy James Tracy II Nathan Brunelle Kevin Skadron

2025-05-06 0 0 816.44KB 6 页 10玖币
侵权投诉
Deterministic vs. Non Deterministic Finite
Automata in Automata Processing
Farzana Ahmed Siddique, Tommy James Tracy II, Nathan Brunelle, Kevin Skadron
University of Virginia
{farzana, tjt7a, njb2b, skadron}@virginia.edu
Abstract—Linear-time pattern matching engines have seen
promising results using Finite Automata (FA) as their com-
putation model. Among different FA variants, deterministic
(DFA) and non-deterministic (NFA) are the most commonly
used computation models for FA-based pattern matching engines.
Moreover, NFA is the prevalent model in pattern matching
engines on spatial architectures. The reasons are: i) DFA size,
as in #states, can be exponential compared to equivalent NFA, ii)
DFA cannot exploit the massive parallelism available on spatial
architectures. This paper performs an empirical study on the
#state of minimized DFA and optimized NFA across a diverse
set of real-world benchmarks and shows that if distinct DFAs
are generated for distinct patterns, #states of minimized DFA are
typically equal to their equivalent optimized NFA. However, NFA is
more robust in maintaining the low #states for some benchmarks.
Thus, the choice of NFA vs. DFA for spatial architecture is less
important than the need to generate distinct DFAs for each pattern
and support these distinct DFAs’ parallel processing. Finally,
this paper presents a throughput study for von Neumann’s
architecture-based (CPU) vs. spatial architecture-based (FPGA)
automata processing engines. The study shows that, based on
the workload, neither CPU-based automata processing engine nor
FPGA-based automata processing engine is the clear winner.
If #patterns matched per workload increases, the CPU-based
automata processing engine’s throughput decreases. On the other
hand, the FPGA-based automata processing engine lacks the
memory spilling option; hence, it fails to accommodate an entire
automata if it does not fit into FPGAs logic fabric. In the best-
case scenario, the CPU has a 4.5x speedup over the FPGA, while
for some benchmarks, the FPGA has a 32,530x speedup over the
CPU.
I. INTRODUCTION
Pattern matching is essential in many applications, including
genomics, virus detection, network intrusion detection, and
machine learning techniques such as random forest. Most of
these applications are latency-sensitive. If the pattern matching
rate is not fast enough, it acts as a performance bottleneck for
those applications. Finite Automata (FA) have proven to be
a great computation model for linear time pattern matching
[1]–[5]. For example, Schulz et al. used FA to accelerate the
pattern matching for genomics applications [6]. On the other
hand, network devices such as the Cisco family of security
appliances and networking software such as Snort [7] use
regular expressions to represent safe payload patterns; prior
works have used FA to accelerate regexes [1, 2].
FA has different variants. The most commonly used FA
variants in pattern matching engines are 1. Deterministic Finite
Automata (DFA) and 2. Non-deterministic Finite Automata
(NFA). Even though both of the variants are functionally
equivalent, their structure (state count & transition) differs [8].
As a result, while designing an FA-based pattern matching
engine, the underlying hardware architecture influences the
selection between NFA and DFA. NFA is less preferred for
von Neumann architecture because the possibility of multiple
active states leads to multiple random memory lookups, and
high memory latency is a performance bottleneck in von
Neumann’s architecture [9]. Micron’s Automata Processor [10]
and spatial architectures, such as Field Programmable Gate
Arrays (FPGA), offer workarounds to hide the high memory
latency. For these architectures, NFA is a popular choice over
DFA [2, 5, 11] for several reasons. The crucial one is if an
NFA has nstates, its equivalent DFA can have 2nstates
[8, 12]. For instance, Yu et al. [13] stated that 20% of the
Snort [7] DFAs suffer from exponential state growth relative
to the corresponding NFAs. Moreover, prior works show that
generating one DFA from a set of patterns could result in an
enormous time and space overhead [13, 14].
There are algorithms [15, 16] to generate minimized DFAs
for which state count is smaller than the expected exponential
growth (although exponential growth remains a worst-case for
arbitrary automata). Tabakov et al. [17] performed a study on
synthetic automata and illustrated that based on the automata
structure, DFA states exhibit a growth rate of the order
polynomial to sub-exponential compared to the NFA, and in
some cases, minimized DFAs have a smaller state count than
the equivalent non-optimized NFA. Motivated by their study,
this paper fills a very important gap in automata processing
work. Our paper demonstrates the state count analysis of
the minimized DFA and its equivalent optimized NFA using
a broad set of real-world benchmarks. The analysis shows
that for the real-world benchmarks, the DFA state count
exhibits linear or polynomial growth. Therefore, DFAs can
be considered a viable alternative to any FA-based pattern
matching engine that uses NFA as the computation model.
Please note that unlike prior work [17], NFAs used in this
analysis are optimized using a heuristics-based approach.
Another reason for choosing NFA over DFA while design-
ing FA-based pattern matching engines on FPGAs is that
NFAs may activate multiple states per symbol cycle, which
exploits FPGAs’ massive parallelism [2, 11, 18, 19]. However,
this paper shows that DFAs can also exploit the parallel
architecture of the FPGAs if, instead of creating a single large
DFA for all the patterns of an application, separate DFAs are
generated for distinct patterns, and these DFAs are processed
1
arXiv:2210.10077v1 [cs.PF] 18 Oct 2022
in parallel (section IV).
Finally, this paper presents another interesting finding:
the state-of-the-art CPU-based automata processing engine,
Hyperscan [20] sometimes outperforms the state-of-the-art
FPGA-based automata processing engine, Grapefruit, in terms
of throughput (pattern matching rate). Even though FPGAs
offer massive parallelism, FPGAs’ clock rate is considerably
lower than CPUs. On top of that, applications face perfor-
mance degradation if the computation model does not fit into
the CPU cache. Prior works show that in the worst case, the
FPGA-based pattern matching engine has equal throughput
compared to the CPU-based pattern matching engine [2, 5].
Contrary to that, this paper shows that for some real-world
benchmarks, the CPU gets 4.5x speedup over the FPGA
(Section V). However, if a workload is such that it matches
many distinct patterns, CPU throughput degrades. So, it can
be affirmed that neither CPU nor FPGA is a clear winner for
the FA-based pattern matching engine.
In summary, this paper makes the following contributions:
It presents a detailed analysis of the state count of the
optimized NFA vs. minimized DFA across various real-
world automata benchmarks (section III).
It shows that on FPGA, the choice of NFA over DFA
is less important than the need to keep individual DFA
separate and the ability to support parallel processing of
a large number of DFAs (section IV).
This paper presents an in-depth performance analysis
of the state-of-the-art CPU-based automata processing
engine vs. the state-of-the-art FPGA-based automata pro-
cessing engine (section V).
II. RELATED WORK
The state count of the NFA and DFA has been studied
previously from the perspective of network-intrusion detection,
and synthetic benchmarks [13, 21, 22]. The studies on the
network-intrusion detection benchmarks showed that gener-
ating a DFA for a specified pattern set can exponentially
increase the DFA state count. They demonstrate that regular
expressions containing constraints such as dot-star (pq.*rs)
and repeated sub-patterns (xyz.{100}abc) induce exponential
growth in DFA states. Contrary to the prior works, this paper
performs state count analysis for a diverse state of real-world
benchmarks and shows that if separate DFAs are generated for
each pattern, the DFA state count is equal to the equivalent
optimized NFA. However, in the worst-case DFA state count
shows a polynomial growth rate.
Tabakov et al. [17] evaluated Hopcroft’s [15] and Brzo-
zowski’s [16] DFA minimization algorithm using a synthetic
automata type called random automata. Their analysis illus-
trates that the state count of a DFA depends on the automata
graph’s transition density1. For smaller transition densities, the
DFA state grows super-polynomially but sub-exponentially.
Additionally, their result shows that minimized DFA has a
1For each symbol ratio of the total number of transitions to the total number
of states. Details can be found at [17].
smaller state count than the equivalent NFAs with greater tran-
sition densities. One limitation of this analysis is that the NFAs
used are not optimized. Even though NFA minimization is a
PSPACE-complete problem [23], a heuristics-based approach
can generate optimized NFA, as shown in this paper.
Since Floyd et al. [24] introduced hardware implementation
of NFA, there has been a substantial amount of work on high-
performance automata processors using either DFA [4, 13,
25] or NFA [1]–[3, 26] as a computation model. The state-
of-the-art CPU-based automata processing engine Hyperscan
[20] uses regex decomposition to separate the pattern into
disjoint strings and FA. Moreover, to process FA, Hyperscan
exploits SIMD vector operation to perform the maximum
possible state transition for each memory access. Hyperscan
encounters throughput degradation due to the CPU’s high
memory latency if the working set size does not fit into
the CPU cache. Grapefruit [5], a state-of-the-art FPGA-based
automata processing engine, maps a cluster of automata into
the FPGA logic fabric. Even though this approach helps in
accelerating Grapefruit’s throughput, if the automata graph
does not fit into the FPGA logic fabric, Grapefruit fails to
process it. This paper does not discuss GPU-based automata
processing engine because prior works [27, 28] have presented
that GPU is not promising for processing FA.
Xie et al. [2] examined throughput comparison between the
state-of-the-art CPU-based automata processing engine, Hy-
perscan [20] and an FPGA-based automata processing engine,
REAPR. Their analysis shows that in the worst case, REAPR
performs the same as Hyperscan, whereas, in the best case,
REAPR has a 2,000x speedup over Hyperscan. However, this
paper shows that, based on the workload and the benchmark,
neither Hyperscan nor Grapefruit is a clear winner for the FA-
based pattern matching engine.
Becchi et al. [14] presented a performance evaluation of
a network processor and a general-purpose processor using
NFA, DFA, and hybrid-FA. Hybrid-FA is a variant of automata
formed by halting the NFADFA conversion in places where
DFA blowup occurs. The evaluation at [14] shows that hybrid-
FA has the overall best throughput across the network and
general-purpose processors. The reason is that the DFA part
of the hybrid-FA is less susceptible to high memory latency,
and the NFA part of the hybrid-FA keeps a check on the
state explosion issue of the DFA. This performance evaluation
is limited to von Neumann architecture, whereas this paper
does the performance evaluation for von Neumann vs. spatial
architecture.
III. NFA VS. DFA STATE COUNT ANALYSIS
By definition, if an NFA requires nstates to express a
regular language Lor pattern P, the equivalent DFA will
require at most 2nstates [8]. As a result, DFAs are disregarded
for designing FA-based pattern matching engines on FPGA
[2, 21]. The state count study of DFA vs. NFA across diverse
real-world benchmarks is extremely important for automata
processing research. Because it would resolve if, for real-
world benchmarks, minimized DFAs’ state count exhibits an
2
摘要:

Deterministicvs.NonDeterministicFiniteAutomatainAutomataProcessingFarzanaAhmedSiddique,TommyJamesTracyII,NathanBrunelle,KevinSkadronUniversityofVirginiaffarzana,tjt7a,njb2b,skadrong@virginia.eduAbstract—Linear-timepatternmatchingengineshaveseenpromisingresultsusingFiniteAutomata(FA)astheircom-putati...

展开>> 收起<<
Deterministic vs. Non Deterministic Finite Automata in Automata Processing Farzana Ahmed Siddique Tommy James Tracy II Nathan Brunelle Kevin Skadron.pdf

共6页,预览2页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:6 页 大小:816.44KB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 6
客服
关注