
2 Convolutional Neural Networks (CNNs)
Convolutional neural networks (CNNs) are feed-forward neural networks that extract features from
data in a hierarchical manner. The architecture of CNNs is inspired by visual perception [16]. CNN
architectures date back decades [21] and consist of convolutional layers that extract high-level
features from images. CNNs detect these features in images and learn how to recognize objects from
this information. Layers near the input detect simpler features such as edges, and as the layers get deeper,
they detect more complex features, such as eyes, noses, or an entire face in the case of face recognition.
The network then combines all of these learned features to make a final prediction. Deep CNNs have made
significant contributions to computer vision tasks such as image classification [9, 18]. They have
also been successfully applied to other computer vision fields, such as object detection [4, 25, 32]
and face recognition [29].
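To make this hierarchy concrete, the following is a minimal sketch of such a network, assuming PyTorch and hypothetical layer sizes; it is illustrative only and not a reference architecture from the cited works.

# Minimal sketch of a CNN classifier (hypothetical layer sizes, PyTorch assumed).
# Early convolutional layers respond to simple patterns such as edges; deeper
# layers combine them into more complex features before the final prediction.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level features (edges, blobs)
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level features (corners, textures)
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # higher-level features (object parts)
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)      # final prediction from learned features

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

logits = TinyCNN()(torch.randn(1, 3, 32, 32))  # e.g. a single 32x32 RGB image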
Limitations:
CNNs perform exceptionally well when applied to images that resemble the training
dataset [7]. Despite this success, a CNN performs poorly when it receives the same
image from a different viewpoint [1, 26]. Sliding a convolutional kernel across an image contributes to
translational invariance, but it does not guarantee equivariance. Max-pooling [30] was introduced to further
aid in creating positional invariance. On close scrutiny, the pooling operation stacked with a convolutional
layer will only detect features; it does not preserve any spatial relationships between the detected features.
The difference can be illustrated with a CNN detecting a human face. An average human face has
a pair of eyes, a nose, and a mouth; however, the mere presence of these parts in an image does not qualify
it as an image of a human face. Therefore, a system that only detects certain features in an image,
without any information about their spatial arrangement, can fail in many scenarios. In short, the
pooling operation, together with convolution, was supposed to introduce positional, orientational, and
proportional invariance, but in practice it became a source of the very failures described above [26].
Including more data or using methods like data augmentation is commonly used to tackle this problem,
ensuring that the model is trained on as many viewpoints and orientations as possible. However, this
is a very crude way of handling the issue.
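As a toy illustration of this loss of spatial arrangement, the NumPy snippet below (with made-up activation values) applies a single max-pooling step to two feature maps that contain the same "eye" and "mouth" activations in different relative positions; both collapse to the same pooled output.

# Toy illustration (NumPy) of how max-pooling discards spatial arrangement:
# two feature maps containing the same activations at different positions
# collapse to the same pooled value, so their relative layout is lost.
import numpy as np

def max_pool(x, size=2):
    h, w = x.shape
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

eyes_above_mouth = np.array([[0.9, 0.0],
                             [0.0, 0.8]])
mouth_above_eyes = np.array([[0.0, 0.8],
                             [0.9, 0.0]])

print(max_pool(eyes_above_mouth))  # [[0.9]]
print(max_pool(mouth_above_eyes))  # [[0.9]]  -- same output, arrangement lost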
3 Capsule Networks (CapsNet)
Hinton et al. [12] proposed the first capsule networks, which, unlike conventional CNNs, were designed
to encode and preserve the underlying spatial information between their learned features. The basic
idea behind the introduction of capsules by Hinton et al. [12] was to create a neural network capable of
performing inverse graphics [11]. Computer graphics deals with generating a visual image from some
internal hierarchical representation of geometric data. This internal representation consists of matrices
that represent the relative positions and orientations of the geometric objects. The representation
is then converted to an image that is finally rendered on the screen. Hinton and his colleagues argued in their
paper [12] that humans deconstruct the visual information received through the eyes into a hierarchical
representation and use this representation for recognition. From a pure machine learning
perspective, this means that the network should be able to deconstruct a scene into co-related parts,
which can be represented hierarchically [19]. To achieve this, Hinton et al. [12] proposed to augment
the conventional neural network architecture with the notion of entities: each entity
encapsulates a certain number of neurons and learns to recognize an implicitly defined visual entity
over a limited domain of viewing conditions and deformations.
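To illustrate the "graphics" direction that capsules aim to invert, the sketch below (a hypothetical 2-D NumPy example, not taken from the cited papers) composes a part's pose matrix with the pose of the whole object to place the part in the image; inverse graphics would recover such pose relationships from the image instead.

# Sketch of the forward "graphics" direction (assumed 2-D toy example): a part's
# pose relative to the whole is a matrix, and composing it with the whole-object
# pose places the part in image coordinates.
import numpy as np

def pose(tx, ty, theta):
    """Homogeneous 2-D pose matrix: rotation by theta, then translation by (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0,  0,  1]])

face_in_image = pose(40.0, 60.0, np.pi / 8)   # where the face sits in the image
eye_in_face   = pose(-5.0,  8.0, 0.0)         # where an eye sits relative to the face

# Composing the two poses gives the eye's position and orientation in the image.
eye_in_image = face_in_image @ eye_in_face
print(eye_in_image[:2, 2])                    # translation component of the composed pose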
3.1 Evolution of Capsules
In the first proposed capsule networks [12], the output of each capsule consisted of the probability
that a specific feature exists, along with a set of instantiating parameters¹. The goal of Hinton et al. [12]
was not to recognize or classify the objects in an image, but rather to force the outputs of a capsule
to recognize the pose in an image. A significant limitation of this first implementation of capsule
networks was that it required an additional, external set of inputs specifying how the image had
been transformed in order for each of the entities to work. Although these transformations could be learned in
principle, the more significant challenge was to devise a way to instruct each of the capsules to discover
the underlying hierarchical relationship between the transformations in a complete end-to-end training
setting, without explicitly using additional inputs.
¹ Instantiating parameters [12] may include and encode the underlying pose, lighting, and deformation of
a particular feature relative to the other features detected by the capsules in the image.
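As a rough sketch of this idea, the snippet below (PyTorch, with hypothetical sizes and names; not the authors' exact architecture) shows a single capsule that outputs a presence probability together with a small set of instantiating parameters, and accepts the externally supplied transformation discussed above.

# Minimal sketch of a single capsule in the spirit of the first capsule networks:
# from an input vector it emits the probability that its visual entity is present
# together with a small set of instantiating parameters (here, a 2-D pose).
# Hypothetical sizes; illustrative only.
import torch
import torch.nn as nn

class Capsule(nn.Module):
    def __init__(self, in_dim, hidden=32, pose_dim=2):
        super().__init__()
        self.recognise = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.pose = nn.Linear(hidden, pose_dim)       # instantiating parameters (e.g. x, y)
        self.presence = nn.Linear(hidden, 1)          # probability that the entity is present

    def forward(self, x, known_shift=None):
        h = self.recognise(x)
        pose = self.pose(h)
        if known_shift is not None:                   # the external transformation input that
            pose = pose + known_shift                 # the first capsule networks required
        return torch.sigmoid(self.presence(h)), pose

p, pose = Capsule(in_dim=784)(torch.randn(1, 784), known_shift=torch.tensor([[2.0, -1.0]]))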