
Green Learning: Introduction, Examples and Outlook A PREPRINT
Maaten, 2020] and semi-supervised learning [Sohn et al., 2020, Van Engelen and Hoos, 2020] have been explored to
reduce the supervision burden.
Two further concerns about DL technologies have received less attention. The first is the high carbon footprint of DL
[Lannelongue et al., 2021, Schwartz et al., 2020, Wu et al., 2022, Xu et al., 2021]. The training of DL networks is
computationally intensive, and training ever larger and more complex networks on huge datasets poses a threat to sustainability
[Sanh et al., 2019, Sharir et al., 2020, Strubell et al., 2019]. The second concern is trustworthiness. The
application of black-box DL models to high-stakes decisions has been questioned [Arrieta et al., 2020, Poursabzi-Sangdeh
et al., 2021, Rudin, 2019]. Conclusions drawn from a set of input-output relationships alone can be misleading and counter-
intuitive. It is essential to justify an ML prediction procedure with logical reasoning to gain people’s trust.
To tackle the first problem, one may optimize DL systems by taking performance and complexity into account jointly.
An alternative solution is to build, from scratch, a new learning paradigm with a low carbon footprint. Since the latter
targets green ML systems by design, it is called green learning (GL). The early development of GL was initiated by
an effort to understand the operation of computational neurons in CNNs [Kuo, 2017, 2016, Kuo and Chen, 2018,
an effort to understand the operation of computational neurons of CNNs in [Kuo, 2017, 2016, Kuo and Chen, 2018,
Kuo et al., 2019]. Through a sequence of investigations, building blocks of GL have been gradually developed, and
more applications have been demonstrated in recent years. As for the second problem, a clear and logical description of
the decision-making process is emphasized in the development of GL. GL adopts a modularized design. Each module
is statistically rooted with local optimization. GL avoids end-to-end global optimization for logical transparency and
computational efficiency. On the other hand, GL exploits ensembles heavily in its learning system to boost the overall
decision performance. GL yields probabilistic ML models that allow trust and risk assessment with certain performance
guarantees.
GL attempts to address the following problems to make the learning process efficient and effective:
1. How to remove redundancy among source image pixels for concise representations?
2. How to generate more expressive representations?
3. How to select discriminant/relevant features based on labels?
4. How to achieve feature and decision combinations in the design of powerful classifiers/regressors?
5. How to design an architecture that enables rich ensembles for performance boosting?
New and powerful tools have been developed to address each of them in the last several years, e.g., the Saak [Kuo and
Chen, 2018] and Saab transforms [Kuo et al., 2019] for Problem 1, the PixelHop [Chen and Kuo, 2020], PixelHop++
[Chen et al., 2020a] and IPHop [Yang et al., 2022a] learning systems for Problem 2, the discriminant and relevant
feature tests [Yang et al., 2022b] for Problem 3, and the subspace learning machine [Fu et al., 2022a] for Problem 4. These
ideas were originally scattered across different papers; they will be systematically introduced here.
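To convey the flavor of these tools, consider the discriminant feature test for Problem 3: each 1D feature is ranked by scanning candidate thresholds and recording the minimum weighted binary entropy of the resulting two-way partition, so that a lower loss indicates a more discriminant feature. The sketch below is a simplified illustration for binary labels; the function names and binning scheme are our own, not taken from Yang et al. [2022b].

```python
import numpy as np

def entropy(y):
    """Binary entropy of a label subset."""
    p = y.mean()
    if p == 0 or p == 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def dft_loss(feature, labels, n_bins=16):
    """Discriminant feature test (sketch): scan candidate thresholds over
    one 1-D feature and return the minimum weighted entropy of the induced
    two-way partition. Lower loss = more discriminant feature."""
    lo, hi = feature.min(), feature.max()
    thresholds = np.linspace(lo, hi, n_bins + 1)[1:-1]  # interior bin edges
    n, best = len(feature), np.inf
    for t in thresholds:
        left, right = labels[feature <= t], labels[feature > t]
        if len(left) == 0 or len(right) == 0:
            continue
        loss = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        best = min(best, loss)
    return best

# Toy ranking: feature 0 is correlated with the label, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)
X = np.column_stack([y + 0.3 * rng.standard_normal(1000),
                     rng.standard_normal(1000)])
losses = [dft_loss(X[:, i], y) for i in range(X.shape[1])]
```

In this toy example the informative feature obtains a much lower loss than the noise feature, so sorting features by this loss yields a supervised, label-driven ranking without any training of a classifier.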
In this overview paper, we intend to elaborate on GL’s development, building modules and demonstrated applications.
We will also provide an outlook for future R&D opportunities. The rest of this paper is organized as follows. The
genesis of GL is reviewed in Sec. 2. A high-level sketch of GL is presented in Sec. 3. GL’s methodology and its
building tools are detailed in Sec. 4. Illustrative application examples of GL are shown in Sec. 5. Future technological
outlook is discussed in Sec. 6. Finally, concluding remarks are given in Sec. 7.
2 Genesis of Green Learning
The proven success of DL in a wide range of applications clearly indicates its power, even though its inner workings
remain something of a mystery. Research on GL was initiated by efforts to provide a high-level understanding of the
superior performance of DL [Kuo, 2016, 2017, Xu et al., 2017]. The aim was not a rigorous treatment but rather to gain
insights into a set of basic questions such as:
• What is the role of nonlinear activation [Kuo, 2016]?
• What are the individual roles played by the convolutional layers and the fully-connected (FC) layers [Kuo et al., 2019]?
• Is there a guideline for network architecture design [Kuo et al., 2019]?
• Is it possible to avoid the expensive backpropagation optimization process in filter weight determination [Kuo and Chen, 2018, Kuo et al., 2019, Lin et al., 2022]?
As this understanding deepened, it became apparent that one can develop a new learning pipeline without nonlinear
activation and backpropagation.
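The key enabler of such a pipeline is deriving filter weights from data statistics instead of gradient descent. The sketch below is a simplified illustration in the spirit of the Saab transform: one constant DC filter plus AC filters obtained via PCA of DC-removed patches. It omits details of the published transform (notably the bias term chosen to keep responses nonnegative), and the function name is illustrative.

```python
import numpy as np

def saab_filters(patches, n_ac):
    """Derive convolution filter weights from data statistics (sketch,
    in the spirit of the Saab transform): one normalized constant DC
    filter, plus AC filters given by principal components of the
    DC-removed patches. No backpropagation or nonlinear activation."""
    n, d = patches.shape
    dc = np.ones(d) / np.sqrt(d)                      # DC filter
    ac_input = patches - (patches @ dc)[:, None] * dc  # remove DC component
    ac_input = ac_input - ac_input.mean(axis=0)        # center before PCA
    # right singular vectors of the residual serve as the AC filters
    _, _, vt = np.linalg.svd(ac_input, full_matrices=False)
    return np.vstack([dc, vt[:n_ac]])                  # (1 + n_ac, d) bank

# Toy usage: 3x3 patches flattened to length-9 vectors.
rng = np.random.default_rng(1)
patches = rng.standard_normal((500, 9))
filters = saab_filters(patches, n_ac=4)
responses = patches @ filters.T                        # filtering as a matmul
```

Because the filters come from a one-pass statistical computation (a mean and an SVD), training cost is a small, predictable fraction of what iterative backpropagation would require, which is the root of GL's efficiency claim.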