two crucial factors that prevent people from recycling [7].
Automatic waste segmentation bins were designed to overcome
these problems by helping people classify waste and direct it
into the corresponding containers, making waste disposal more
convenient. Table I outlines the approaches and the primary
hardware components used to construct the waste segmentation
systems in related studies.
In early research, a microcontroller was connected to a variety of
sensors to determine the composition of the waste. For example,
inductive and capacitive sensors can detect metal content
[8][9][10], and moisture sensors separate wet waste from dry
waste [9][10]. The microcontroller makes decisions based
on these readings [8][9]. However, although the sensor-based
classification method can detect the composition precisely, it
cannot sort waste into more specific groups: the sensors cannot
discriminate between plastic, paper, glass, and other unrecyclable
dry waste, which are important categories in recycling.
The development of machine learning and image classification
enables a bin to sort waste based on visual input, as a human
would. The convolutional neural network (CNN) is a family of
image classification algorithms that mainly performs convolution
operations on the pixels [15]. It is the most popular choice due
to its high accuracy and power efficiency compared to other
methods, and it is used in all four papers [11][12][13][14].
It gives the bin the ability to differentiate between, for example,
spoons and cups, which is tremendous progress compared to the
sensor-based approaches. Traditionally, CNN models run in the
cloud, which raises data transmission latency as well as user
privacy and security concerns. To solve these problems, recent
research moved the computation to edge embedded systems.
However, edge computing has the drawback of limited
computational resources, so model size is an important criterion
when selecting a CNN architecture.
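To make the size trade-off concrete, the following sketch (an illustration, not code from the studies cited above) compares the parameter counts of three backbones discussed in this paper, assuming TensorFlow 2.x with Keras applications:

```python
# Sketch: compare the sizes of candidate backbones before committing to
# an edge device. Assumes TensorFlow 2.x; the six output classes match
# TrashNet. Parameter count is a proxy for model size only.
import tensorflow as tf

candidates = {
    "MobileNet":      tf.keras.applications.MobileNet,
    "MobileNetV2":    tf.keras.applications.MobileNetV2,
    "EfficientNetB0": tf.keras.applications.EfficientNetB0,
}

for name, builder in candidates.items():
    model = builder(weights=None, classes=6)
    size_mb = model.count_params() * 4 / 2**20   # float32 bytes -> MiB
    print(f"{name}: {model.count_params():,} params (~{size_mb:.1f} MB)")
```

Parameter count is only a proxy; runtime memory also depends on activation sizes and the inference framework used on the device.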
To develop, evaluate, and select suitable CNN structures, most
research in this field has used the TrashNet dataset developed by
Mindy Yang and Gary Thung [3] in 2017, the first open-access
recycling waste dataset. This high-quality dataset contains
2527 waste photos in six groups: metal, glass, plastic, paper,
cardboard, and trash. It provides a foundation for later research,
and our study also uses this dataset.
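Since TrashNet is distributed as one folder per class, it can be loaded with standard utilities. A minimal sketch, assuming TensorFlow/Keras, a local copy of the resized images under dataset-resized/, and an illustrative 80/20 split:

```python
# Sketch: load TrashNet (one sub-folder per class) into train/validation
# splits. The dataset-resized/ path and the 80/20 split are assumptions.
import tensorflow as tf

common = dict(validation_split=0.2, seed=42,
              image_size=(224, 224), batch_size=32)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset-resized", subset="training", **common)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset-resized", subset="validation", **common)
print(train_ds.class_names)  # ['cardboard', 'glass', 'metal', ...]
```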
Efforts have been made to increase the classification accuracy
of CNN models trained on TrashNet. The state-of-the-art accuracy
of 97.86% on TrashNet was reached in [16]. However, the
CNN architecture used in that research, GoogLeNet, causes
out-of-memory (OOM) errors on Jetson Nano [17]. It demonstrated
the potential of CNN classification, but the model size and
computation cost must be cut down before the machine
learning algorithm can be implemented on embedded systems. A
lighter CNN model, WasteNet, was constructed by White et al. and
achieved an accuracy of 97% on the TrashNet dataset [12]. The
paper claimed that the model could be loaded onto a Jetson Nano,
but it did not provide details about the edge implementation,
such as the classification speed. Nevertheless, from the model
structure, we can estimate that the classification speed for
one image would be too slow for real-time classification; it
could only be used in a bin application that classifies each
waste object from a single photo.
Classification speed is quantified by inference time, the time
taken to complete one classification. Our application requires
a smaller model that can interact with the user in real time,
with an inference time below 0.1 s, the human visual reaction
time. The EfficientNet B0 model used in [17] is the starting
point of this research; it achieved an accuracy of 95.38% and
an inference time of 0.07 s on Jetson Nano. While the model
is fast enough for real-time applications, its 96% memory usage
on the device needs optimization for bin applications.
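Inference time is straightforward to measure on the target device. A minimal sketch of the usual procedure (warm-up runs followed by averaging), assuming a Keras model and a random input; on Jetson Nano the deployed model would typically run through an optimized runtime such as TensorRT, so absolute numbers will differ:

```python
# Sketch: estimate per-image inference time with warm-up runs, against
# the 0.1 s real-time budget discussed above. The model choice and input
# are assumptions; deployed models on Jetson Nano usually run via TensorRT.
import time
import numpy as np
import tensorflow as tf

model = tf.keras.applications.EfficientNetB0(weights=None, classes=6)
x = np.random.rand(1, 224, 224, 3).astype("float32")

for _ in range(10):                          # warm-up (graph build, caches)
    model(x, training=False)

n = 100
start = time.perf_counter()
for _ in range(n):
    model(x, training=False)
elapsed = time.perf_counter() - start
print(f"mean inference time: {elapsed / n:.4f} s")
```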
Power consumption is another factor that needs to be considered
in the final product. Unfortunately, little research has paid
attention to it; for instance, none of the trash bin studies
listed in Table I measured the power consumption of the proposed
system. Still, it is evident that the CNN-based approaches have
higher power consumption than the sensor-based ones, and that
the PYNQ-Z1 and Raspberry Pi will typically have lower power
consumption than the Jetson Nano used in [12] and [17]. The
Raspberry Pi 4 has a typical power consumption between
2.7 W and 6.4 W; it reduces power consumption but results
in undesirably low performance [14]. As a result, the
MobileNet V2 architecture achieves only a low average per-class
precision of 91.76% and a long inference time of 0.358 s on
that platform.
The limitation of current research is that energy-saving systems
have unacceptably low classification performance for
commercialization. This study therefore designed two high-accuracy
waste classification systems with lower power consumption than
previous studies. The first system reduces the power consumption
of the application on Jetson Nano by using a lighter model,
MobileNet, while maintaining accuracy.
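As an illustration of this first system's general approach (not the paper's exact configuration), a MobileNet backbone can be fitted with a small classification head via transfer learning; the hyper-parameters below are assumptions:

```python
# Sketch: a lighter classifier built from a MobileNet backbone with a
# small head via transfer learning. Hyper-parameters and the six-class
# TrashNet head are illustrative, not this study's exact configuration.
import tensorflow as tf

base = tf.keras.applications.MobileNet(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                       # train only the new head first

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(6, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```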
The second system is developed around the K210, a less
expensive and more energy-efficient embedded device. The K210 is
an unpopular choice, used in fewer than 100 studies.
Most of that research implemented YOLO object detection models on
it [18][19] and demonstrated its outstanding power efficiency.
The K210 had not been used for recycling waste classification at
the time this paper was written, so this paper proposes and
evaluates a novel approach.
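For context, inference on the K210 typically runs from MaixPy (MicroPython) against a model compiled to the .kmodel format and written to flash. A hedged sketch of that loop; the flash offset, input windowing, and label list are illustrative assumptions:

```python
# Sketch (MaixPy/MicroPython on K210): classify camera frames with the
# on-chip KPU. The flash offset 0x300000, the 224x224 input window, and
# the label list are illustrative assumptions, not this paper's setup.
import sensor
import KPU as kpu

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.set_windowing((224, 224))    # crop to the model's input size
sensor.run(1)

task = kpu.load(0x300000)           # .kmodel previously flashed here

labels = ["paper", "metal", "plastic", "glass", "trash"]  # assumed
while True:
    img = sensor.snapshot()
    fmap = kpu.forward(task, img)   # run the network on the KPU
    scores = fmap[:]                # copy the output tensor to a list
    print(labels[scores.index(max(scores))])
```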
III. System design
A. The AI recycling bin design
The AI bin consists of five recycling waste containers and a
detection box. The waste will be sorted after it is placed in
the detection box. The whole system can be controlled by a
Jetson Nano or a K210 board.
The bin design using Jetson Nano is summarized in Figure
1. The Jetson Nano interacts with users and collects
feedback through the touch screen on the front of the bin, which
displays the instructions for using the bin and the camera input.
The Raspberry Pi camera at the top takes photos of the waste in
the detection box. The images are fed to the classification
algorithm on the Jetson Nano, which classifies the photos into
seven groups. The waste classification models in the previous
study [17] have five output classes: "paper", "metal", "plastic",