
LONG-TAILED FOOD CLASSIFICATION
Jiangpeng He∗, Luotao Lin†, Heather Eicher-Miller†, Fengqing Zhu∗
∗School of Electrical and Computer Engineering †Department of Nutrition Science
Purdue University, West Lafayette, Indiana, U.S.A
ABSTRACT
Food classification serves as the basic step of image-based
dietary assessment to predict the types of foods in each input
image. However, food images in real-world scenarios usually
follow a long-tailed distribution across food classes, which
causes severe class imbalance and restricts performance. In
addition, none of the existing long-tailed classification
methods focus on food data, which can
be more challenging due to the higher inter-class and lower
intra-class similarity among foods. In this work, we first
introduce two new benchmark datasets for long-tailed food
classification including Food101-LT and VFN-LT where the
number of samples in VFN-LT exhibits the real world long-
tailed food distribution. Then we propose a novel 2-Phase
framework to address the problem of class-imbalance by (1)
undersampling the head classes to remove redundant sam-
ples along with maintaining the learned information through
knowledge distillation, and (2) oversampling the tail classes
by performing visual-aware data augmentation. We demonstrate
the effectiveness of our method by comparing it with existing
state-of-the-art long-tailed classification methods, showing
improved performance on both the Food101-LT and VFN-LT
benchmarks. The results demonstrate the potential to apply
our method to related real-life applications.
Index Terms—Image Classification, Long-tailed Distri-
bution, Deep Learning, Knowledge Distillation
1. INTRODUCTION
Accurate identification of food is critical to image-based
dietary assessment [1, 2], as it allows each food to be matched
to its entry in a nutrient database with the corresponding
nutrient composition [3]. Such linkage makes it possible to
determine dietary links to health and disease, such as diabetes.
Dietary assessment is therefore important to health-care related
applications such as [4, 5], which build on recent advances in
computational approaches and new sensor devices. The performance
of image-based dietary assessment relies on the accurate
prediction of foods in the captured eating scene images.
However, real-world food images
usually have a long-tailed distribution where a small portion
of food classes (i.e., head classes) contain abundant samples for
training while most food classes (i.e., tail classes) have only a
few samples, as shown in Figure 1. Long-tailed classification,
defined as the extreme class-imbalance problem, leads to
classification bias towards head classes and poor generalization
when recognizing tail food classes.
Fig. 1. Overview of VFN-LT, which exhibits a real-world
long-tailed food distribution. The number of training samples
is assigned based on consumption frequency, which is
matched through NHANES from 2009–2016 among 17,796
healthy U.S. adults.
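The head/tail imbalance described above can be illustrated with a minimal sketch. The power-law decay over class rank, the function names, and every number here (101 classes, 750 samples for the largest class, a head threshold of 100) are illustrative assumptions; this is not the procedure used to build the benchmarks in this work.

```python
def long_tailed_counts(num_classes, max_count, min_count, alpha=1.0):
    """Hypothetical per-class sample counts that decay as a power law of class rank."""
    counts = []
    for rank in range(1, num_classes + 1):
        n = int(max_count / rank ** alpha)
        # Keep at least min_count samples so no class is empty.
        counts.append(max(n, min_count))
    return counts


def imbalance_ratio(counts):
    """Ratio between the largest and the smallest class size."""
    return max(counts) / min(counts)


def split_head_tail(counts, threshold):
    """Indices of classes with at least `threshold` samples (head) and the rest (tail)."""
    head = [i for i, n in enumerate(counts) if n >= threshold]
    tail = [i for i, n in enumerate(counts) if n < threshold]
    return head, tail


# Illustrative values only (assumptions, not taken from the paper).
counts = long_tailed_counts(num_classes=101, max_count=750, min_count=5)
head, tail = split_head_tail(counts, threshold=100)

print("largest class:", counts[0], "smallest class:", counts[-1])
print("imbalance ratio:", round(imbalance_ratio(counts), 1))
print("head classes:", len(head), "tail classes:", len(tail))
```

Under these assumed settings, a handful of head classes hold most of the training samples. A classifier trained on such counts without rebalancing would tend to favor those classes.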
As few existing long-tailed image classification methods
target food images, we first introduce two benchmark long-tailed
food datasets, Food101-LT and VFN-LT. Similar to [6],
Food101-LT is constructed as a long-tailed version of the
original balanced Food101 [7] dataset by following the Pareto
distribution. In addition, as shown in Figure 1, VFN-LT
provides a new and valuable long-tailed food dataset where the
number of samples for each food class follows the distribution
of consumption frequency [8], defined as how often a food is consumed
in one day as in the National Health and Nutrition Exami-
arXiv:2210.14748v1 [cs.CV] 26 Oct 2022