
a test suite, rather than an instance of a broader testing methodology. This paper aims to address this by unifying such existing and future techniques under a common name and abstract framework, thus fueling the exchange and development of Intramorphic Testing techniques.
In summary, this paper contributes the following:
• Intramorphic Testing, a general conceptual white-box approach to tackling the test oracle problem;
• a conceptual comparison with regression testing, differential testing, and metamorphic testing;
• examples that illustrate the idea.
2 Background and Motivation
Test oracles. To the best of our knowledge, the term test oracle was coined by Howden in 1978 [11]. Since then, a number of approaches to tackle the problem have been proposed, which were summarized in surveys by, for example, Barr et al. [1] or Pezzè et al. [20]. Metamorphic testing was proposed by Chen et al. in a technical report in 1998 [4]. Various concrete metamorphic testing techniques were proposed, which were subsequently surveyed by, for example, Segura et al. [24] or Chen et al. [5].
Terminology. Originally, a test oracle was defined to validate a system’s output for a set of inputs [11]. This view is restrictive, given that an input to the program might include changes to the device or environment. Similarly, rather than a directly-observable program output, non-functional observations include the program’s performance or changes to the device’s state. Thus, Barr et al. [1] used stimuli for inputs and observations for outputs to account for various testing scenarios. We continue to use the original terminology of inputs (denoted as 𝐼) and outputs (denoted as 𝑂), but refer to them in the general sense of stimuli and observations. We will denote the program under test as 𝑃.
Motivating Example. To outline the existing techniques and our idea, let us assume a specific use case, namely that we want to test the implementation of one or multiple sorting algorithms.¹ Let us assume that we implemented multiple sorting algorithms such as bubble_sort(), insertion_sort(), and merge_sort(). Let us also assume that we made a mistake when implementing the in-place bubble_sort() algorithm, as illustrated in Listing 1; the last array index in the code listing should be j, rather than i. In the subsequent paragraphs, we discuss how both instantiations of existing techniques as well as an instantiation of the proposed Intramorphic Testing technique could find the bug. In practice, we expect that Intramorphic Testing will be realized in techniques that can find bugs that are overlooked, or difficult to find, by other testing approaches.

¹ A Jupyter Notebook with the code examples presented in this paper is available at https://doi.org/10.5281/zenodo.7229326.
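For concreteness, the following minimal Python sketch shows what the faulty in-place bubble_sort() could look like; it is our reconstruction from the bug description above rather than a copy of Listing 1, and the final return statement is merely a convenience for the later sketches.

def bubble_sort(arr):
    # In-place bubble sort with the seeded bug described in the text.
    n = len(arr)
    for i in range(n):
        for j in range(n - i - 1):
            if arr[j] > arr[j + 1]:
                # Buggy swap: the last array index should be j, not i.
                arr[j], arr[j + 1] = arr[j + 1], arr[i]
    return arr

With this fault, sorting [3, 1, 2] indeed yields [1, 2, 1], matching the behavior discussed below.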
Regression Testing. Regression testing aims to ensure that changes do not introduce bugs into the program; it relies on manually written tests in which the developer specifies the expected output. One common way of implementing regression tests is by writing unit tests, where a specific unit is tested in isolation.

To test bubble_sort() and the other sorting algorithms, we could introduce unit tests with both typical inputs as well as boundary values. Listing 2 shows a test case that triggers the bug; sorting an array [3, 1, 2] incorrectly results in [1, 2, 1], which does not match the expected array [1, 2, 3], thus revealing the bug. Note that the test does not assume access to the system’s internals; unit testing is a black-box approach that could also be applied without access to the source code. While unit testing is effective and widely used, tests are typically implemented manually, and the developer needs to specify the expected outcome of the test case.
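A unit test in the spirit of Listing 2 could look as follows; this is a sketch, and the test name and the bare assertion are illustrative choices rather than the paper’s exact code.

def test_bubble_sort():
    arr = [3, 1, 2]
    bubble_sort(arr)
    # The developer specifies the expected output explicitly.
    assert arr == [1, 2, 3], f"expected [1, 2, 3], but got {arr}"

test_bubble_sort()  # fails for the buggy implementation, which produces [1, 2, 1]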
Differential Testing. Differential testing validates a set of systems that implement the same semantics by comparing their outputs for a given input. As illustrated in Figure 1, given an input 𝐼 and equivalent systems 𝑃1, 𝑃2, ..., 𝑃𝑛, differential testing validates that ∀𝑖, 𝑗 : 𝑃𝑖(𝐼) = 𝑃𝑗(𝐼). Differential testing has been applied to a variety of domains, such as C/C++ compilers [31], Java Virtual Machines (JVMs) [6, 7], database engines [25], debuggers [16], code coverage tools [32], symbolic execution engines [13], SMT solvers [29], and Object-Relational Mapping Systems (ORMs) [26].
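Read as code, this check is a small predicate over all pairs of systems; the following sketch (not from the paper) models the equivalent systems as side-effect-free Python functions that return their output for a given input.

from itertools import combinations

def outputs_agree(systems, I):
    # ∀i, j: Pi(I) = Pj(I), i.e., every pair of systems produces the same output.
    return all(p(I) == q(I) for p, q in combinations(systems, 2))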
As illustrated by Listing 3, we can apply differential testing by comparing the sorted arrays produced by multiple sorting algorithms for the same input array. Given that the test oracle requires no human in the loop, it can be effectively paired with automated test generation; for example, in the listing, we generate random arrays as test input. Note that the loop does not terminate; in practice, it would be reasonable to set a timeout or run the tests for a fixed number of iterations. For an input array like [3, 1, 2], differential testing reveals a discrepancy between the outputs of the sorting algorithms, demonstrating the bug. Similar to regression testing, differential testing is a black-box approach; for example, we could also have compared sorting algorithms implemented in various languages based on checking their output alone.
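A sketch in the spirit of Listing 3 is shown below; the value ranges and array lengths are illustrative, and we assume that each sorting function returns the sorted list.

import random

while True:
    # Generate a random test input.
    arr = [random.randint(0, 100) for _ in range(random.randint(0, 10))]
    # Run every implementation on its own copy of the input.
    results = [f(list(arr)) for f in (bubble_sort, insertion_sort, merge_sort)]
    # Differential oracle: all outputs must agree.
    assert all(r == results[0] for r in results), \
        f"discrepancy for input {arr}: {results}"

For an input such as [3, 1, 2], the buggy bubble_sort() returns [1, 2, 1] while the other implementations return [1, 2, 3], so the assertion fails.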
Metamorphic Testing. Metamorphic testing uses an input to a system and its output to derive a new input for which a test oracle can be provided via so-called Metamorphic Relations (MRs). This is illustrated in Figure 1. Given an input 𝐼 and 𝑃(𝐼) = 𝑂, a follow-up input 𝐼′ is derived, so that a known relationship between 𝑂 and 𝑃(𝐼′) = 𝑂′ is validated. Metamorphic testing is a high-level concept and finding effective MRs is often challenging; MRs for testing various systems such as compilers [15], database engines [22, 23], SMT solvers [30], Android apps [27], as well as object detection systems [28] have been proposed in the literature.
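To make the concept concrete for the running example, the following sketch uses an illustrative MR for sorting that is not taken from the paper: permuting the input array must not change the sorted output, that is, 𝑂 = 𝑂′.

import random

def metamorphic_test(sort, iterations=1000):
    for _ in range(iterations):
        original = [random.randint(0, 100) for _ in range(random.randint(0, 10))]
        follow_up = list(original)
        random.shuffle(follow_up)  # derive the follow-up input I'
        # MR: sorting a permutation of the input must yield the same output.
        assert sort(list(original)) == sort(follow_up), \
            f"MR violated for {original} and its permutation {follow_up}"

metamorphic_test(bubble_sort)  # can expose the seeded bug, e.g., [1, 2, 1] vs. [1, 2, 3]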