
In our study, we select the following representative charge
prediction models and evaluate whether they are trustworthy
with respect to a specific legal theory, here the FET.
BiLSTM
Luo et al. (2017) use a Bi-LSTM (Yang et al., 2016) to encode
fact descriptions and apply an attention mechanism to aggregate
the encoded word representations into a fact embedding, which is
then used for classification.
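The aggregation step can be illustrated with a minimal sketch. The text only says "an attention mechanism", so the dot-product scoring against a query vector used here is an assumption, and the names `attention_pool`, `word_vectors`, and `query` are hypothetical.

```python
import math


def attention_pool(word_vectors, query):
    """Aggregate encoded word vectors into a single fact embedding.

    Assumption: each word is scored by its dot product with a query
    vector (learned in a real model); the source does not specify this.
    """
    # Score every word representation against the query vector.
    scores = [sum(w * q for w, q in zip(vec, query)) for vec in word_vectors]
    # Softmax over the scores, shifted by the maximum for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The fact embedding is the attention-weighted sum of the word vectors.
    dim = len(word_vectors[0])
    embedding = [
        sum(wt * vec[i] for wt, vec in zip(weights, word_vectors))
        for i in range(dim)
    ]
    return embedding, weights
```

The returned embedding would then be passed to a classifier over charges; the weights indicate how much each word contributed.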
TopJudge
TopJudge (Zhong et al., 2018) is representative of multitask
learning models. During encoding, TopJudge employs a CNN (Kim,
2014) to obtain fact embeddings. In decoding, it exploits a
directed acyclic graph to capture the relationships among three
sub-tasks, i.e., charge prediction, law article prediction, and
term prediction, which are jointly optimized in a multitask
framework.
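A directed acyclic graph over sub-tasks fixes an order in which predictions can be made, so that each sub-task can condition on the ones it depends on. The sketch below shows only this ordering idea. The specific edges are an illustrative assumption (the text does not list them), and `SUBTASK_DEPENDENCIES` and `prediction_order` are hypothetical names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed edges, for illustration only: each sub-task maps to the
# sub-tasks whose outputs it depends on.
SUBTASK_DEPENDENCIES = {
    "law_article": set(),
    "charge": {"law_article"},
    "term": {"law_article", "charge"},
}


def prediction_order(dependencies):
    """Return an order in which every sub-task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())
```

Under these assumed edges, `prediction_order(SUBTASK_DEPENDENCIES)` places law article prediction first, then charge prediction, then term prediction.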
FewShot
FewShot (Hu et al., 2018) introduces discriminative attributes
to distinguish confusing charges and to provide additional
knowledge for few-shot charges; it stands for models that
introduce legal knowledge into the charge prediction task. It
uses an LSTM (Hochreiter and Schmidhuber, 1997) as the fact
encoder and then performs charge prediction and attribute
prediction.
BERT
BERT (Devlin et al., 2019) is a strong baseline for many text
classification tasks. We use the representation of the [CLS]
token for classification.
Lawformer
Lawformer (Xiao et al., 2021) is a Longformer-based (Beltagy
et al., 2020) language model pretrained on large-scale Chinese
legal cases. We use it to encode the fact description and
classify based on the representation of the [CLS] token.
2.1 The Four Elements Theory
Legal theories are the basis on which judges correctly determine
charges; they define the method for analyzing cases. Judges are
required to follow legal theories when making judgments (Gao,
1993). If they do not, they might make decisions arbitrarily,
which is a breach of human rights and freedom (Wang, 2017).
In China, the Four Elements Theory (FET) is the dominant legal
theory for criminal trials. In practice, nearly all criminal
judges use FET to justify their decisions (Jiyao, 2011). As a
result, a trustworthy Chinese charge prediction model should
also conform to FET, since it is trained on judgment documents
that conform to the local legal theory, FET.

Model      Acc   F1    P     R
TopJudge   82.7  60.6  67.5  59.2
FewShot    82.9  71.7  75.9  71.6
BiLSTM     82.4  59.8  65.7  58.9
BERT       90.4  81.9  83.2  79.8
Lawformer  91.0  83.8  84.4  81.1

Table 1: Charge prediction results on CAIL-I, where Acc, F1, P,
and R denote accuracy, macro F1, macro precision, and macro
recall, respectively.
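The metrics in Table 1 are macro-averaged, meaning each charge contributes equally regardless of how many cases it has. A minimal sketch of how such scores can be computed follows. Two conventions here are assumptions not stated in the text: macro F1 is taken as the mean of per-class F1 scores, and a class with an undefined precision or recall scores 0. The function name `macro_metrics` is hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Compute accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Assumed convention: undefined ratios count as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "acc": sum(1 for t, p in pairs if t == p) / len(pairs),
        "p": sum(precisions) / n,
        "r": sum(recalls) / n,
        # Assumed convention: macro F1 is the mean of per-class F1 scores.
        "f1": sum(f1s) / n,
    }
```

Because rare charges weigh as much as frequent ones, macro F1 can sit well below accuracy when a model performs poorly on low-frequency charges, which is consistent with the gap between the Acc and F1 columns.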
According to FET, a case must satisfy four criminal elements
simultaneously to constitute a crime. The four criminal elements
are: (1) the subject (Sub), the person or organization that has
committed the criminal offense and shall bear criminal
responsibility; (2) the object (Obj), the person, thing,
interest, or social relation protected by criminal law and
jeopardized by the criminal offense; (3) the conduct (Con), the
harmful behavior; and (4) the mental state (Men), the mental
state of the criminal subject when committing the crime, either
intent or negligence.
For example, the four criminal elements of Theft are as follows:
(1) subject: the general subject, that is, a person who has
reached the age of criminal responsibility (16 years old in
China); (2) object: public or private property; (3) conduct: the
act of stealing a large amount of property or repeatedly
stealing property; (4) mental state: intent, with the purpose of
illegal possession.
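The requirement that all four elements hold simultaneously amounts to a logical conjunction. The sketch below encodes only that rule; the class `FourElements` and its field names are hypothetical, and deciding whether each element is satisfied in a real case is a legal judgment that this code does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourElements:
    """Whether each FET criminal element is satisfied in a case."""
    subject: bool       # Sub: a person or organization bearing responsibility
    obj: bool           # Obj: a protected interest was jeopardized
    conduct: bool       # Con: harmful behavior occurred
    mental_state: bool  # Men: intent or negligence was present

    def constitutes_crime(self) -> bool:
        # Under FET, a crime requires all four elements at once.
        return all((self.subject, self.obj, self.conduct, self.mental_state))


# Hypothetical theft case in which the suspect has not reached the
# age of criminal responsibility, so the subject element fails.
underage_case = FourElements(
    subject=False, obj=True, conduct=True, mental_state=True
)
```

Here `underage_case.constitutes_crime()` is false even though the other three elements hold, which is the kind of innocent outcome that a model conforming to FET would need to recognize.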
3 Dataset
Existing charge prediction datasets, such as CAIL (Xiao et al.,
2018), have played a crucial role in the development of legal
artificial intelligence research. However, they suffer from two
limitations: (1) They lack innocent cases, which violates the
presumption of innocence, one of the most fundamental legal
principles worldwide. (2) They contain only coarse-grained
annotations, such as charges and law articles, which cannot
reveal how judges analyze the cases.
To alleviate these two shortcomings, we propose a new charge
prediction dataset, CAIL-I, that adds innocent cases to the
original CAIL. We further annotate whether a sentence is related
to certain criminal elements in a subset of CAIL. We call this
Sentence-level Criminal Elements dataset SCE, which can be
utilized to analyze whether a