
Figure 1: The original AlexNet architecture used for the ImageNet Challenge in 2012 [2]. The network had
eight layers and sixty million parameters, and took six days to train on two GPUs.
1 Introduction
The ImageNet challenge for automatically recognizing and labeling objects in images was launched in 2010 [1].
However, it was in 2012 that AlexNet, an eight-layer (hence deep) convolutional neural network (CNN), emerged
as the winner by a large margin and ushered in the new era of AI [2]. CNNs were not new; they had been proposed
as far back as the 1990s but had been sidelined in favor of more theoretically rigorous ML approaches such as
support vector machines (SVMs) and boosting methods [3, 4, 5]. So why did CNNs outperform other models?
Two reasons are usually given. The first was the availability of substantial, high-quality training data. The ImageNet
database was a one-of-a-kind benchmark, consisting of over fourteen million hand-annotated images drawn from
more than twenty thousand diverse categories. The multilayer CNN had the capacity to effectively memorize
the training subset of ImageNet and, at the same time, generalize to unseen examples, a characteristic that is
still not fully understood today [6]. The second was that Graphics Processing Units (GPUs), originally designed
to parallelize graphics and image-processing workloads, proved ideally suited to the computations involved in
training CNNs, making it practicable to train deep CNNs on large data sets in a reasonable amount of time.
The combination of Big Data, Big Models, and relatively cheap parallel computation became the mantra that
swept through AI research and through disciplines spanning from astronomy to zoology, touching every
application with elements of data and prediction.
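As a rough illustration of the scale quoted in Figure 1, the short Python sketch below loads an AlexNet and counts its learnable parameters. It assumes PyTorch and torchvision are installed; the AlexNet bundled with torchvision is a close single-GPU relative of the original 2012 two-GPU network, not an exact replica.

    import torch
    from torchvision import models

    # Instantiate the AlexNet variant shipped with torchvision:
    # five convolutional layers followed by three fully connected layers.
    model = models.alexnet()

    # Count learnable parameters; this comes out to roughly 61 million.
    n_params = sum(p.numel() for p in model.parameters())
    print(f"parameters: {n_params / 1e6:.1f}M")

    # One forward pass on an ImageNet-sized input (224x224 RGB) yields
    # one score per ImageNet class.
    x = torch.randn(1, 3, 224, 224)
    print(model(x).shape)  # torch.Size([1, 1000])

Training such a network from scratch on the full ImageNet training set is what took days of GPU time in 2012.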
Our perspective has two parts.
We begin with a high-level, partly technical overview of the current state of AI, starting with a review of
supervised learning, the machine learning task that has been most impacted by deep learning (DL). We follow
with a discussion of deep content-generation models, the resurrection of reinforcement learning, the emergence
of specialized software libraries for deep learning, and the role of GPUs. We conclude the first part by
highlighting how adversarial samples can be designed to fool deep models and by asking whether it is possible
to make models robust.
In part two of the perspective, we consider the many socio-technical issues surrounding AI. Of particular
interest is the dominance of Big Tech in AI. Effectively, only big corporations have the resources (expertise,
computation, and data) to scale AI to a level where it can be meaningfully and accurately applied.
2 Digression: What is AI?
The term Artificial Intelligence was first introduced in the proposal for the 1956 Dartmouth workshop, submitted
by John McCarthy to the Rockefeller Foundation, which conjectured that “every aspect of learning or any other
feature of intelligence can in principle be so precisely described that a machine can be made to simulate it” [7].
Before that, in a 1947 lecture, Alan Turing speculated that “What we want is a machine that can learn from
experience” and suggested that the “possibility of letting the machine alter its own instructions provides the
mechanism for this”1; he developed these ideas further in his unpublished 1948 report “Intelligent Machinery”.
Much of the recent success in AI falls under the distinct subfield of AI known as Machine Learning, and, since
the role of data is central, the broader term Data Science is often used to subsume related disciplines, including
Statistics.
3 Is Supervised Learning Solved?
Supervised Learning (SL) is the poster child for the success of machine learning. Depending upon the context, SL is
known as classification, regression, or prediction. Since the modern advent of deep learning, both the accuracy
1https://www.britannica.com/technology/artificial-intelligence/Alan-Turing-and-the-beginning-of-AI