simulator that favors usability and rapid iteration.
This paper will discuss in detail the purpose, design, and
effectiveness of the ChampSim simulator, beginning with a
discussion of the guiding principles of the design in Section II,
details on the key features of ChampSim’s modular architecture
in Section III, a history of ChampSim’s application in the
field and a vision of its continuing place in Section IV, and a
list of future development plans in Section VI.
II. THE CHAMPSIM ARCHITECTURAL SIMULATOR
The ChampSim simulator has its roots in the simulation
environment used for the Second Data Prefetching Competition
[1]. In a competition environment, it is valuable to have an
easy-to-use environment to encourage wide participation and
a variety of novel submissions. These accessibility principles
have persisted throughout the development of ChampSim and
have matured into three guiding design principles: low startup
time, broad applicability, and high configurability.
A. Low startup time
While failure is an important part of learning, wrangling
with a highly complex simulator before even beginning the
implementation of a new idea can frustrate the learning process
of any student or beginning researcher. With this insight in
mind, we intend that a new user be able to download
and compile ChampSim in a few minutes, create their first
design in a few hours, and perform new and meaningful
computer architecture research within a few weeks. In each
of the applications for which ChampSim is intended, it is
valuable that a user is able to begin using the simulator quickly.
Furthermore, the runtime of the simulation should be short
enough to provide quick feedback for a novice user.
Many general-purpose processors have a lot in common.
Designs are pipelined, usually with a decoupled, in-order front
end and an out-of-order back end. Many researchers are not
seeking to modify these basic design aspects, but have a
particular element of the design in mind that they intend to
study or improve. ChampSim presents a selection of areas
that commonly see research activity as configurable modules:
branch predictors, cache replacement policies, branch target
buffers, and both instruction and data prefetchers. These
modules provide an intuitive interface into a larger system,
allowing designers to test new designs quickly and effectively
without having to worry about the parts of the system they
are not studying.
Reference implementations of legacy modules, such as the
GShare branch predictor [2] or the next-line prefetcher, are
included with the simulator. These reference implementations
can be used as starting points for new designs or as placeholders
if the user is not interested in modifying those particular
modules.
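To make the idea of a reference module concrete, the following is a minimal sketch of a gshare-style predictor: the branch address is XORed with a global history register to index a table of 2-bit saturating counters. The class name, method names, and default table size are hypothetical choices for illustration and are not ChampSim's module interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative gshare-style predictor. All names and sizes here are
// hypothetical; this is not ChampSim's actual module interface.
class GshareSketch {
public:
  explicit GshareSketch(unsigned history_bits = 14)
      : mask_((std::uint64_t{1} << checked(history_bits)) - 1),
        counters_(std::size_t{1} << history_bits, std::uint8_t{1}) {}
  // Counters start at 1, i.e. "weakly not taken".

  // Predict taken when the 2-bit counter is in its upper half.
  bool predict(std::uint64_t ip) const { return counters_[index(ip)] >= 2; }

  // Train the counter with the resolved outcome, then shift the outcome
  // into the global history register.
  void update(std::uint64_t ip, bool taken) {
    std::uint8_t& c = counters_[index(ip)];
    if (taken && c < kCounterMax) {
      ++c;
    } else if (!taken && c > 0) {
      --c;
    }
    history_ = ((history_ << 1) | (taken ? 1u : 0u)) & mask_;
  }

private:
  static constexpr std::uint8_t kCounterMax = 3;

  // Validate the table size before it is used in a shift.
  static unsigned checked(unsigned bits) {
    assert(bits > 0 && bits < 32);
    return bits;
  }

  // gshare indexing: XOR the branch address with the global history.
  std::size_t index(std::uint64_t ip) const {
    return static_cast<std::size_t>((ip ^ history_) & mask_);
  }

  std::uint64_t mask_;
  std::vector<std::uint8_t> counters_;
  std::uint64_t history_ = 0;
};
```

A new design could start from a sketch like this and change only the indexing function or the counter update rule, leaving the rest of the simulated core untouched.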
B. Broad applicability
A researcher seeking to perform hardware research should
not be expected to be broadly familiar with a variety of
programming languages or to track their changes over time.
Therefore, for the sake of inclusion, a user should need
only an entry-level understanding of C++, the language
in which ChampSim is written, to perform research using
ChampSim. The interface to a simulator should be simple
and present the user with a few meaningful choices that map
well onto their experience. Each module is fundamentally only
the implementation of a few functions.
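As an illustration of a module that amounts to a few functions, here is a minimal sketch of a next-line prefetcher: a constructor for setup and a single hook called on each cache access. The class name, the hook name `on_access`, and the 64-byte default block size are assumptions made for this example, not ChampSim's actual interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical module shape: one setup step plus one per-access hook.
// These names are illustrative and are not ChampSim's interface.
class NextLinePrefetcherSketch {
public:
  explicit NextLinePrefetcherSketch(std::uint64_t block_bytes = 64)
      : block_bytes_(block_bytes) {
    // The masking below requires a power-of-two block size.
    assert(block_bytes != 0 && (block_bytes & (block_bytes - 1)) == 0);
  }

  // Called on every cache access; returns the address of the block that
  // follows the accessed block, which is the prefetch candidate.
  std::uint64_t on_access(std::uint64_t addr) {
    ++accesses_;
    const std::uint64_t block_base = addr & ~(block_bytes_ - 1);
    return block_base + block_bytes_;
  }

  // Simple statistic a module might report at the end of simulation.
  std::uint64_t accesses() const { return accesses_; }

private:
  std::uint64_t block_bytes_;
  std::uint64_t accesses_ = 0;
};
```

For example, under these assumptions an access to address 0x103F falls in the block starting at 0x1000, so the sketch proposes 0x1040 as the prefetch address.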
ChampSim is trace-driven, meaning that simulation is performed
in two stages. First, the workload to be simulated
is instrumented and run offline. The tracing instrumentation
produces a digest of the program’s activity, called a trace.
Because tracing is decoupled from simulation, the trace can
be stored in a repository and made available to users.
Second, to test a simulated design, the user selects
a trace file as input. The trace is streamed into the simulator
as a stand-in for actual program execution. This strategy
sacrifices a modest amount of accuracy, particularly in how
the operating system interacts with the program, in favor of
ease of reproducing results and speed of the model. It is
simpler for ChampSim to read a decoded trace file than to
execute an external program, and, given established
repositories of compiled traces, using them spares the user
another step of environment setup.
Users can generate their own program traces with the
“tracer” tool included in the ChampSim
package. The tracer is built upon Intel PIN [3], a
well-documented tool for instrumenting programs at runtime,
though other tool sets, such as DynamoRIO [4], can be used.
Alternatively, instruction traces can be dumped from execution-driven
simulators such as gem5 or QFlex [5]–[7]. The tracer
inspects every instruction the program runs and encapsulates
each instruction into a decoded format that includes the
instruction pointer, branching behavior, and which registers
and memory locations form the input and output operands for
the instruction. The concatenation of many instructions forms
the entire trace. This trace format permits ChampSim to run
with low memory requirements, since the trace does not need
to be held in memory but can be streamed off the disk after
inline decompression.
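The streaming property described above can be sketched with a fixed-size record that is read one instruction at a time, so memory use stays constant regardless of trace length. The struct layout, field names, operand counts, and function names below are assumptions for illustration rather than the tracer's documented format, and decompression is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream can stand in for a trace file
#include <type_traits>

// Hypothetical fixed-size decoded instruction record. Field names and
// operand counts are assumptions, not the tracer's documented layout.
struct TraceRecordSketch {
  std::uint64_t ip;            // instruction pointer
  std::uint8_t is_branch;      // nonzero if the instruction is a branch
  std::uint8_t branch_taken;   // nonzero if the branch was taken
  std::uint8_t dest_regs[2];   // output register operands
  std::uint8_t src_regs[4];    // input register operands
  std::uint64_t dest_mem[2];   // output memory operands
  std::uint64_t src_mem[4];    // input memory operands
};
static_assert(std::is_trivially_copyable<TraceRecordSketch>::value,
              "records are copied to and from the stream as raw bytes");

// Append one record to a stream (what a tracer would do per instruction).
void write_record(std::ostream& os, const TraceRecordSketch& rec) {
  os.write(reinterpret_cast<const char*>(&rec), sizeof(rec));
}

// Read the next record; returns false at end of trace or on a short read.
bool read_record(std::istream& in, TraceRecordSketch& out) {
  assert(!in.bad());  // the stream must not be in an unrecoverable state
  in.read(reinterpret_cast<char*>(&out), sizeof(out));
  return in.gcount() == static_cast<std::streamsize>(sizeof(out));
}

// Stream the whole trace while holding only one record in memory.
std::uint64_t count_taken_branches(std::istream& in) {
  TraceRecordSketch rec{};
  std::uint64_t taken = 0;
  while (read_record(in, rec)) {
    if (rec.is_branch && rec.branch_taken) {
      ++taken;
    }
  }
  return taken;
}
```

Because every record has the same size, the reader needs no index or lookahead; in a setup like the one described, the input stream would be fed by a decompressor rather than read from a raw file.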
C. Design configurability
ChampSim is capable of modeling a large variety of
commodity processors. A configuration file specifies many
aspects of the modeled CPU core, including frequency, cache
configuration, re-order buffer size, load and store queue sizes,
widths for instruction fetch, decode, execution and retire, and
a variety of latencies for different components. In addition,
ChampSim includes a DRAM system that models bank and
bus contention. The trace format includes only virtual memory
addresses, so ChampSim simulates a page table and TLB
hierarchy with arbitrary mappings of virtual to physical pages.
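Since traces carry only virtual addresses, the simulator must invent a virtual-to-physical mapping. The sketch below shows one simple way this could be done, assigning the next free physical frame the first time a virtual page is touched. The class name, the first-touch sequential policy, and the 4 KiB default page size are assumptions for illustration, not a description of ChampSim's page table.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical first-touch page mapper. The allocation policy and names
// are illustrative assumptions, not ChampSim's implementation.
class PageMapperSketch {
public:
  explicit PageMapperSketch(unsigned page_bits = 12) : page_bits_(page_bits) {
    assert(page_bits > 0 && page_bits < 64);
  }

  // Translate a virtual address, allocating a frame on first touch.
  std::uint64_t translate(std::uint64_t vaddr) {
    const std::uint64_t vpage = vaddr >> page_bits_;
    const std::uint64_t offset =
        vaddr & ((std::uint64_t{1} << page_bits_) - 1);
    auto it = table_.find(vpage);
    if (it == table_.end()) {
      // Unseen page: hand out the next unused physical frame.
      it = table_.emplace(vpage, next_frame_++).first;
    }
    // The page offset is preserved; only the page number is remapped.
    return (it->second << page_bits_) | offset;
  }

private:
  unsigned page_bits_;
  std::uint64_t next_frame_ = 0;
  std::unordered_map<std::uint64_t, std::uint64_t> table_;
};
```

Any policy that maps each virtual page to a distinct frame would serve the same purpose; sequential allocation is used here only because it is the shortest to write.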
Each cache must be configured with a prefetcher (to
simulate no prefetcher, there is an included “do-nothing”
prefetcher) and a replacement policy. The cache interfaces with
a read queue, a write queue, and a prefetch queue. Prefetches
originating from the cache level are placed in its prefetch