Nish: A Novel Negative Stimulated Hybrid Activation Function
Yildiray Anagun* and Sahin Isik
Department of Computer Engineering, Eskisehir Osmangazi University, Eskisehir, Turkey
*Corresponding author: yanagun@ogu.edu.tr
Abstract
An activation function has a significant impact on the efficiency and robustness of neural networks. As an alternative to existing functions, we developed a novel non-monotonic activation function, the Negative Stimulated Hybrid Activation Function (Nish). It behaves like a Rectified Linear Unit (ReLU) in the positive region and like a sinus-sigmoidal function in the negative region. In other words, it combines a sigmoid and a sine function and thereby gains new dynamics over the classical ReLU. We analyzed the consistency of Nish across combinations of widely used network architectures and the most common activation functions on several popular benchmarks. The experimental results show that the classification accuracy achieved with Nish is slightly better than that achieved with Mish.
Keywords: Activation Function, Convolutional Neural Network, Deep Learning
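Before the formal definition, the abstract's description, ReLU-like behaviour for positive inputs and a sine-sigmoid response for negative inputs, can be sketched in code. The following is an illustrative assumption of one such hybrid written in PyTorch (our choice of framework); the function name and the particular negative-branch combination are ours and are not necessarily the exact Nish formula defined later in the paper.

```python
import torch

def nish_like(x: torch.Tensor) -> torch.Tensor:
    """Illustrative hybrid: identity for positive inputs, a sine-sigmoid
    combination for negative inputs. This is only a sketch of the idea
    described in the abstract, not necessarily the exact Nish definition."""
    # Assumed form of the negative region: input gated by sigmoid and modulated by sine.
    negative_branch = x * torch.sigmoid(x) * torch.sin(x)
    return torch.where(x > 0, x, negative_branch)
```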
1. Introduction
ReLU (Krizhevsky et al., 2017; Nair and Hinton, 2010) has been used extensively as a non-linear function in deep neural networks; moreover, it is one of the most preferred functions due to its better convergence and gradient propagation compared to tanh, logistic, and sigmoid units. However, ReLU suffers from three important issues: a non-zero mean output, the loss of negative values, and an unbounded output. Although ReLU speeds up the learning phase, the derivative in the zero-valued region is itself zero, so no learning occurs there during backpropagation (the dying ReLU problem). ReLU-n (Krizhevsky, 2012a) is another rectifier-based activation function proposed to enhance the efficiency of convolutional neural networks (CNNs). ReLU-n is a ReLU capped at n; when n is chosen empirically as 6, it is called ReLU6. This allows the model to learn sparse inputs earlier, and is equivalent to assuming that each ReLU unit consists of only 6 replicated bias-shifted Bernoulli units rather than an infinite number (Lu et al., 2019).
Leaky ReLU (LReLU) (Maas, 2013) tries to solve the vanishing gradient problem by multiplying negative inputs of the function by a constant $a$. Randomized Leaky ReLU (RReLU) (Xu et al., 2015) generates the activation of neurons by randomly sampling this slope for negative values. Parametric ReLU (PReLU) (Kaiming He et al., 2015a) is very similar to LReLU, except that the multiplier $a$ is a learnable parameter. The equations of ReLU and its variants are given as follows:
$$f_{ReLU}(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \tag{1}$$

$$f_{LReLU}(x) = \begin{cases} ax & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \tag{2}$$

$$f_{RReLU}(x_{ij}) = \begin{cases} a_{ij} x_{ij} & \text{if } x_{ij} \le 0 \\ x_{ij} & \text{if } x_{ij} > 0 \end{cases}, \qquad a_{ij} \sim U(k, l), \; l < k, \; k, l \in [0, 1) \tag{3}$$

$$f_{PReLU}(x) = \begin{cases} ax & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \tag{4}$$
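For reference, Eqs. (1)–(4), together with the ReLU6 cap mentioned above, translate directly into code. The sketch below uses PyTorch purely as an illustrative framework choice; the argument names (`a`, `n`, `lower`, `upper`) are ours.

```python
import torch

def relu(x: torch.Tensor) -> torch.Tensor:
    # Eq. (1): zero for non-positive inputs, identity otherwise.
    return torch.where(x > 0, x, torch.zeros_like(x))

def relu_n(x: torch.Tensor, n: float = 6.0) -> torch.Tensor:
    # ReLU-n: ReLU clipped at n (n = 6 gives ReLU6).
    return torch.clamp(x, min=0.0, max=n)

def leaky_relu(x: torch.Tensor, a: float = 0.01) -> torch.Tensor:
    # Eq. (2): fixed small slope a on the negative side.
    return torch.where(x > 0, x, a * x)

def rrelu(x: torch.Tensor, lower: float = 0.125, upper: float = 0.333) -> torch.Tensor:
    # Eq. (3): negative slope drawn uniformly per element at training time.
    a = torch.empty_like(x).uniform_(lower, upper)
    return torch.where(x > 0, x, a * x)

def prelu(x: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    # Eq. (4): same form as LReLU, but a is a learnable parameter.
    return torch.where(x > 0, x, a * x)
```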
The Exponential Linear Unit (ELU) activation function (Clevert et al., 2016) gradually brings the slope of the curve from a constant threshold toward the origin; in addition, it has a negative saturation region that helps manage the variance of the forward propagation. It takes a binary value and is therefore often used as an output layer. Inspired by the ELU, the Scaled ELU (SELU) (Klambauer et al., 2017) is its parametric variant and does not suffer from the vanishing gradient problem as the ReLU does. The SELU is defined for self-normalizing neural networks and handles internal normalization: each layer preserves the mean and variance coming from the previous layers. It ensures this normalization by adjusting both the mean and variance so that the network converges faster. However, in deeper neural networks the SELU may suffer from exploding gradients or lose its self-normalization property.
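A brief sketch of ELU and SELU as described above may help; PyTorch is our illustrative choice, α = 1 is a common default, and the SELU constants shown are the published self-normalization values.

```python
import torch

def elu(x: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    # Identity for positive inputs; saturates smoothly toward -alpha for negative inputs.
    return torch.where(x > 0, x, alpha * (torch.exp(x) - 1.0))

def selu(x: torch.Tensor) -> torch.Tensor:
    # Scaled ELU with the fixed constants derived for self-normalizing networks.
    alpha = 1.6732632423543772
    scale = 1.0507009873554805
    return scale * torch.where(x > 0, x, alpha * (torch.exp(x) - 1.0))
```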
The Gaussian Error Linear Unit (GELU) (Hendrycks and Gimpel, 2016) fires neurons element-wise on a given input tensor. The GELU nonlinearity weights inputs according to their magnitude rather than gating them by their sign as ReLUs do, and is based on the cumulative distribution function of normally distributed inputs.
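The CDF-based weighting described above can be written either exactly, via the Gaussian error function, or with the widely used tanh approximation. The following sketch (PyTorch, our choice) shows both forms.

```python
import math
import torch

def gelu_exact(x: torch.Tensor) -> torch.Tensor:
    # x weighted by the standard normal CDF, Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
    # Widely used tanh-based approximation of the same weighting.
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```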
The Sigmoid Linear Unit (SiLU) (Elfwing et al., 2018) is computed as the input multiplied by its sigmoid, while the Swish activation function (Ramachandran et al., 2018) is obtained by adding a trainable β parameter, a slight modification of SiLU. Although all of these cutting-edge activation functions perform significantly better than the classic activation functions when trained in deep models, they converge more slowly than ReLU, Leaky ReLU, and PReLU. Building on Swish's self-gating property, Mish (Misra, 2020), a newer self-regularized non-monotonic activation function, tends to increase performance on computer vision problems. Not only does Mish achieve better empirical results than Swish under most experimental conditions, it also overcomes some of Swish's drawbacks, particularly in large and complex architectures such as models with residual layers.
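To make the relationship between SiLU, Swish, and Mish explicit, a compact sketch follows (PyTorch is our illustrative choice; β is treated here as a plain scalar rather than a trained parameter).

```python
import torch
import torch.nn.functional as F

def silu(x: torch.Tensor) -> torch.Tensor:
    # SiLU: input multiplied by its own sigmoid.
    return x * torch.sigmoid(x)

def swish(x: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    # Swish: SiLU with a (trainable) beta inside the sigmoid; beta = 1 recovers SiLU.
    return x * torch.sigmoid(beta * x)

def mish(x: torch.Tensor) -> torch.Tensor:
    # Mish: x * tanh(softplus(x)), a self-regularized non-monotonic function.
    return x * torch.tanh(F.softplus(x))
```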
2. Related Works
Research on new heuristics and strategies that can outperform classical activation functions is nowadays one of the key points for finding solutions to image and signal processing problems with advanced deep learning mechanisms (Nwankpa et al., 2020). There is a close and complex relationship between the activation function and the CNN structure. Variants of the family of rectified units have been widely used in popular deep CNNs in recent years (Clevert et al., 2015; Kaiming He et al., 2014; K. He et al., 2015b; Sermanet et al., 2014;