Recently, the attention mechanism has become an effective approach for identifying the most important information and achieving strong results. Numerous studies have been carried out on attention mechanisms and architectures, and several novel attention-based methods have been proposed for text classification [32, 33, 34, 35]. An attention-based LSTM network was
proposed by Zhou et al. [32] to classify cross-lingual sentiments, where they used English and Chinese as the source
and target languages, respectively. A Convolutional-Recurrent Attention Network (CRAN) was proposed by Du et
al. [33]. Their architecture comprises an RNN-based text encoder and a CNN-based attention extractor. Their experimental results show that the model effectively extracts the salient parts of sentences while improving sentence classification performance. Liu et al. [34] proposed an architecture combining an attention-based convolutional layer with a BiLSTM, where the attention mechanism focuses the model on the hidden-layer outputs. The BiLSTM extracts both preceding and following context, while the convolutional layer extracts higher-level phrase representations from the word embedding vectors. Their model achieves comparable results on all the benchmark datasets.
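To make the attention mechanism discussed above concrete, the following is a minimal sketch (in PyTorch) of attention pooling over BiLSTM hidden states; the class name, layer sizes, and hyperparameters are illustrative assumptions rather than the exact configurations of the cited models.

```python
# Minimal sketch: attention pooling over BiLSTM hidden states for text classification.
# Sizes and names are illustrative, not taken from the cited papers.
import torch
import torch.nn as nn

class AttentiveBiLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)       # scores each time step
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):                      # token_ids: (batch, seq_len)
        h, _ = self.bilstm(self.embedding(token_ids))  # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)   # attention weights over time steps
        context = (weights * h).sum(dim=1)             # weighted sum of hidden states
        return self.classifier(context)                # class logits
```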
Graph-based neural network methods for text classification have also been gaining increasing attention recently. A text graph convolutional network (TextGCN) was proposed by Yao et al. [36], which is notable for performing well even with a small training corpus. To train TextGCN, a single text graph for the whole corpus is built from word co-occurrence statistics and word-document relations. A tensor graph convolutional network (TensorGCN) was later proposed by Liu et al. [37]. They construct a text graph tensor from semantic, syntactic, and sequential contextual information. Two types of propagation learning are then performed on the text graph tensor: intra-graph propagation, which aggregates information from neighboring nodes, and inter-graph propagation, which harmonizes heterogeneous information across graphs.
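As a brief reminder of how such graph convolutions propagate information (the standard GCN layer rule underlying TextGCN; the notation here is generic rather than copied from [36]):
\[
H^{(l+1)} = \rho\!\left(\tilde{A}\, H^{(l)} W^{(l)}\right), \qquad \tilde{A} = D^{-\frac{1}{2}} A\, D^{-\frac{1}{2}},
\]
where $A$ is the adjacency matrix of the text graph (with edge weights given, for example, by point-wise mutual information for word-word pairs and TF-IDF for word-document pairs), $D$ is its degree matrix, $H^{(l)}$ holds the node representations at layer $l$, $W^{(l)}$ is a trainable weight matrix, and $\rho$ is a nonlinearity such as ReLU.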
The capsule network is another state-of-the-art method for text classification that builds on CNNs. Several studies based on capsule networks have been conducted [38, 39, 40]. In capsule networks, capsules are locally invariant groups of neurons that learn to recognize the presence of visual entities and encode their characteristics into vectors. Capsule networks also require a nonlinear squashing function, whereas neurons in a CNN act independently. Equivariance and dynamic routing are the two essential characteristics that distinguish capsule networks from standard neural networks. Kim et al. [39] proposed capsule-network-based text classification methods with both dynamic and static routing, where static routing achieved higher accuracy than dynamic routing. Yang et al. [38] introduced a cross-domain capsule network and illustrated its transfer learning applications for single-label to multi-label text classification and for cross-domain sentiment classification. An attention-based capsule network system called Deep Refinement was suggested by Jain et al. [40]. Their method achieved 96% accuracy on the Quora insincere question dataset, outperforming BiLSTM, SVM, and C-BiLSTM.
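For completeness, the squashing nonlinearity mentioned above is commonly defined as
\[
\mathbf{v}_j = \frac{\lVert \mathbf{s}_j \rVert^{2}}{1 + \lVert \mathbf{s}_j \rVert^{2}} \, \frac{\mathbf{s}_j}{\lVert \mathbf{s}_j \rVert},
\]
where $\mathbf{s}_j$ is the total input to capsule $j$ and $\mathbf{v}_j$ is its vector output; short vectors are shrunk towards zero while long vectors saturate just below unit length, so the length of a capsule's output can be read as the probability that the corresponding entity is present.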
Traditional text classification techniques rely on manually labelled datasets, whose annotation is monotonous and time-consuming. Recently, a few dataless text classification techniques, for example the Laplacian seed word topic model (LapSWTM) [41] and the seed-guided multi-label topic model (SMTM) [42], have been proposed to address this challenge. Anantharaman et al. [43] proposed classifying both long and short texts using non-negative matrix factorization, LDA, and latent semantic analysis (LSA). Neogi et al. [44] proposed LSA with TFIDF for text classification and used entropy to increase the accuracy. Pavlinek et al. [45] proposed a self-training LDA-based semi-supervised text classification method.
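As an illustration of the LSA-with-TFIDF family of approaches discussed above, the following is a generic scikit-learn sketch, not the exact pipeline or entropy weighting of Neogi et al. [44]; the toy corpus, labels, and hyperparameters are placeholders.

```python
# Sketch: LSA (TruncatedSVD) over TF-IDF features, followed by a simple classifier.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

docs = ["example news article one", "example news article two"]   # placeholder corpus
labels = ["sports", "politics"]                                    # placeholder labels

model = make_pipeline(
    TfidfVectorizer(),                 # term-document matrix weighted by TF-IDF
    TruncatedSVD(n_components=2),      # LSA: low-rank projection of the TF-IDF matrix
    LogisticRegression(max_iter=1000), # classifier on the latent semantic features
)
model.fit(docs, labels)
print(model.predict(["another example article"]))
```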
2.5 Research Gap, Novelty, and Contributions
Text datasets, often known as corpora, are used to study linguistic phenomena including text classification, morphological structure, word sense disambiguation, language evolution over time, and spell checking. The quality and size of a corpus have a large impact on research outcomes; a well-structured, comprehensive corpus can yield far better results. Compared with English, Bangla has been studied inadequately owing to the paucity of Bangla corpora and the language's complicated grammatical structure. In this paper, our contributions are as follows:
• We are the first to use a comprehensive Bangla newspaper article dataset called Potrika [6, 46] to classify news articles into eight distinct classes, namely Education, Entertainment, Sports, Politics, National, International, Economy, and Science & Technology.
• We implement both machine learning (ML) algorithms, including logistic regression, SGD, SVM, RF, and KNN, and deep learning (DL) algorithms, including CNN, LSTM, BiLSTM, and GRU, for single-label news article classification. We use BOW, TFIDF, and Doc2Vec feature representations for the ML algorithms. For the DL algorithms, we apply word embedding models such as Word2Vec, GloVe, and FastText trained on the Potrika dataset (a minimal embedding-training sketch follows this list). These word embedding models are valuable not only for news article classification but also for other NLP tasks such as text summarization, named entity recognition, Bangla automatic word prediction, and question-answering systems. Further, we evaluate and scrutinise the results for both cases.
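A minimal sketch of how such embeddings can be trained from a tokenised corpus is given below (using gensim's Word2Vec as an example; the placeholder sentences, vector size, and other parameters are illustrative assumptions, not the settings used for Potrika).

```python
# Sketch: training Word2Vec embeddings on a tokenised corpus
# (placeholder sentences stand in for the tokenised Potrika articles).
from gensim.models import Word2Vec

tokenised_articles = [
    ["example", "bangla", "news", "sentence"],   # placeholder tokens
    ["another", "example", "news", "sentence"],
]

model = Word2Vec(
    sentences=tokenised_articles,
    vector_size=100,   # embedding dimension (gensim >= 4.0; older versions use `size`)
    window=5,          # context window size
    min_count=1,       # keep all tokens in this toy corpus
    workers=2,
)
model.wv.save("word2vec.kv")             # persist keyed vectors for downstream tasks
print(model.wv.most_similar("news"))     # nearest neighbours in the embedding space
```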