Forecasting Cryptocurrencies Log-Returns: a
LASSO-VAR and Sentiment Approach
Milos Ciganovic, Federico D’Amario
October 4, 2022
Abstract
Cryptocurrencies have become a trendy topic recently, primarily due to their disruptive
potential and reports of unprecedented returns. In addition, academics increasingly
acknowledge the predictive power of social media in many fields and, more specifically,
for financial markets and economics. In this paper, we leverage the predictive power of
Twitter and Reddit sentiment, together with Google Trends indexes and volume, to forecast
the log returns of ten cryptocurrencies. Specifically, we consider Bitcoin, Ethereum,
Tether, Binance Coin, Litecoin, Enjin Coin, Horizen, Namecoin, Peercoin, and Feathercoin.
We evaluate the performance of LASSO-VAR using daily data from January 2018 to January
2022. In a 30-day recursive forecast, we retrieve the correct direction of the actual
series more than 50% of the time. We compare this result with the main benchmarks and
see a 10% improvement in Mean Directional Accuracy (MDA). The use of sentiment and
attention variables as predictors significantly increases forecast accuracy in terms of
MDA but not in terms of Root Mean Squared Error. We perform a Granger causality test
using a post-double LASSO selection for high-dimensional VARs. Results show no “causality”
from social media sentiment to cryptocurrency returns.
Keywords: Cryptocurrencies, Time series analysis, Sentiment analysis,
Natural Language Processing
JEL Codes: C32, C53, C55, G17
Department of Economics and Law - Sapienza University of Rome. milos.ciganovic@uniroma1.it
Department of Economics and Law - Sapienza University of Rome. federico.damario@uniroma1.it
arXiv:2210.00883v1 [q-fin.ST] 22 Sep 2022
1 Introduction
Since the introduction of Bitcoin by Nakamoto (2008), cryptocurrencies have gradually
become very popular among investors. In the last decade, the world witnessed unforeseen
growth of cryptocurrencies, both in market capitalization and in the number of distinct
coins. Several reasons may explain this boom. First, social media and newspapers reported
unprecedented cryptocurrency returns, which led many investors, professional and
otherwise, to enter this market. This mechanism was naturally encouraged by minimal
global regulation, which had led people to see cryptocurrencies primarily as a means of
payment for illegal trades. Eventually, their gigantic returns stimulated an enthusiasm
reminiscent of the Gold Rush in the western U.S. Anyone can invest in cryptocurrencies
quickly by downloading an app on their smartphone. As a result, young people are, in
relative terms, the main investors in digital currencies [1]. Moreover, because the
cryptocurrency market is relatively young, traditional news outlets cannot follow events
in a timely manner.
For the reasons mentioned above, social media can be regarded as the primary source of
information for cryptocurrency investors. Specifically, micro-blogging websites such as
Twitter [2] and Reddit [3] are widely used sources of cryptocurrency information. Large
fluctuations in cryptocurrency prices and their high volatility create significant risks
for investment in crypto assets. This has led to heated discussions about their place and
role in the modern economy (see, for example, Corbet et al. (2019), Catalini and Gans
(2020), Halaburda et al. (2020), and Auer et al. (2021)). Therefore, developing
appropriate methods and models for predicting the prices of digital currencies is
relevant both for the scientific community and for financial analysts, investors, and
traders. A crucial contribution to modelling and forecasting cryptocurrencies’ financial
time series has been made by Catania et al. (2019) and Catania and Grassi (2021), who
develop a dynamic model suited to the complex dynamics of these series and compare
several alternative univariate and multivariate models for point and density forecasts.
Furthermore, many studies (see Hitam and Ismail (2018), Sun et al. (2020), Miller and Kim
(2021)) show that Machine Learning algorithms are highly effective in terms of
computational time and accuracy when forecasting cryptocurrencies’ time series. Lastly, a
growing literature highlights the importance of specific factors that shape the demand
for cryptocurrencies and help in forecasting their prices, returns, and volatility.
[1] See https://www.investopedia.com/younger-generations-bullish-on-cryptocurrencies-5223563.
[2] See https://twitter.com/.
[3] See https://www.reddit.com/.
According to the efficient market hypothesis (Fama, 1970), market prices reflect all
available information, so predicting stock returns should not be possible. On the other
hand, considerable empirical evidence (see, e.g., Daniel et al. (2002) for a
comprehensive review) shows that investors’ psychology drives the stock market. This has
led many researchers to adopt sentiment indexes to improve forecast accuracy. Glenski et
al. (2019) exploit the predictive power of social signals from multiple platforms (GitHub
and Reddit) to forecast prices for three cryptocurrencies. They show that social signals
reduce error when forecasting daily coin prices and that the language used in comments
within the official communities on Reddit is the best predictor overall. Kraaijeveld and
De Smedt (2020) show that Twitter sentiment has predictive power for the returns of
several cryptocurrencies. Aslanidis et al. (2022) and Nasir et al. (2019) highlight the
link between Google Trends and cryptocurrency returns and volatility.
In this study, we analyze the impact of sentiment variables on cryptocurrency returns
using a novel dataset that combines several social media sources, search engine data, and
volume. We apply a state-of-the-art sentiment classification technique to investigate
whether sentiment measures contain predictive power for returns. To the best of our
knowledge, similar sets of predictors have not previously been employed jointly. We
account for the high dimensionality of the predictor variables by using a regularization
technique known as the LASSO. This allows us to investigate (i) whether the variables
constructed from our novel dataset help to improve log-return forecasts from a VAR
approach relative to the benchmark models; and (ii) which data source and which type of
sentiment or attention measure is most relevant in terms of Granger causality in
high-dimensional VARs. Our results show that, on average, LASSO-VAR performs better in
terms of Mean Directional Accuracy (MDA) than the benchmark models. Moreover, the use of
sentiment and attention variables as predictors significantly increases forecast
accuracy. We do not find Granger causality from sentiment indexes to cryptocurrency
returns. We find, instead, Granger causality between all cryptocurrencies except Bitcoin,
Tether, and Feathercoin, and from returns to the Bitcoin sentiment extracted from Twitter.
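To make the LASSO-VAR idea concrete, the following is a minimal sketch rather than the estimator used in this paper. It fits each VAR equation by an L1-penalized least-squares regression on stacked lags, solved with plain cyclic coordinate descent. The function names, the penalty value `lam`, the lag order `p`, and the simulated data are all illustrative assumptions; the section above does not specify the solver or how the penalty is tuned.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, k = X.shape
    beta = np.zeros(k)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ beta because beta starts at zero
    for _ in range(n_iter):
        for j in range(k):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j excluded)
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def fit_lasso_var(Y, p, lam):
    """Fit a VAR(p) equation by equation. Y has shape (T, k).

    Returns the intercepts (k,) and the coefficient matrix B (k, k * p).
    """
    T, k = Y.shape
    # Row for time t stacks [y_{t-1}, ..., y_{t-p}]
    X = np.hstack([Y[p - lag: T - lag] for lag in range(1, p + 1)])
    target = Y[p:]
    x_mean, y_mean = X.mean(axis=0), target.mean(axis=0)
    Xc = X - x_mean  # centring keeps the intercept unpenalised
    B = np.vstack([lasso_cd(Xc, target[:, i] - y_mean[i], lam) for i in range(k)])
    intercept = y_mean - B @ x_mean
    return intercept, B

def forecast_one_step(Y, intercept, B, p):
    """One-step-ahead forecast from the last p observations."""
    x_last = np.concatenate([Y[-lag] for lag in range(1, p + 1)])
    return intercept + B @ x_last

# Simulated sparse VAR(1), used purely as illustrative data (an assumption)
rng = np.random.default_rng(0)
k, T = 4, 400
A = np.zeros((k, k))
A[0, 0], A[1, 0], A[2, 2] = 0.5, 0.3, -0.4
Y = np.zeros((T, k))
for t in range(1, T):
    Y[t] = A @ Y[t - 1] + rng.normal(scale=1.0, size=k)

intercept, B = fit_lasso_var(Y, p=1, lam=0.05)
y_hat = forecast_one_step(Y, intercept, B, p=1)
print(np.round(B, 2))
print(np.round(y_hat, 3))
```

A larger `lam` sets more coefficients exactly to zero, which is what makes the approach workable when the number of predictors (returns, sentiment, search, and volume series) is large relative to the sample. In practice the penalty would be chosen by a data-driven rule such as rolling cross-validation.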
The remainder of the paper is organized as follows. Section 2 deals with data collection,
describing the data set, its sources, and the strategies used to construct sentiment
indexes. Section 3 describes the modelling strategy, the estimation and forecasting
methods, and the metrics used for the comparative evaluation of the out-of-sample model
predictions. Section 4 summarizes some selected results. Section 5 concludes.
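As a rough illustration of the two evaluation metrics named in this paper, the sketch below computes Mean Directional Accuracy and Root Mean Squared Error for a pair of series. The MDA variant used here (comparing the sign of the realized change with the sign of the forecast change, both relative to the previous actual value) is a common textbook definition and is an assumption; the exact formula used in this paper may differ. The input numbers are made up.

```python
import math

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def mean_directional_accuracy(actual, forecast):
    """Share of periods in which the forecast moves in the same direction as the actual series.

    Direction is measured relative to the previous actual value (assumed definition).
    """
    hits = [
        sign(actual[t] - actual[t - 1]) == sign(forecast[t] - actual[t - 1])
        for t in range(1, len(actual))
    ]
    return sum(hits) / len(hits)

def rmse(actual, forecast):
    """Root mean squared forecast error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Hypothetical log returns and forecasts
actual = [0.01, -0.02, 0.03, 0.00, -0.01]
forecast = [0.00, -0.01, 0.01, 0.02, 0.01]

mda = mean_directional_accuracy(actual, forecast)
err = rmse(actual, forecast)
print(mda, err)
```

The two metrics can disagree: a forecast may get the direction right most of the time while still missing the magnitude, which is consistent with the abstract's finding that sentiment variables improve MDA but not RMSE.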
2 Data Collection
This study relies on multiple data sources. First, we collect daily data on ten
cryptocurrencies from January 2018 to January 2022. For the same period, we collect
Google Trends indexes and sentiment indexes from Twitter and Reddit. We complete our
dataset with the volume of each cryptocurrency considered.
2.1 Cryptocurrency Data
The cryptocurrencies used are reported in Table 1, ranked by market capitalization (MC)
as of January 2022. We obtain our data from finance.yahoo.com [4].
Table 1: Ten cryptocurrencies with their symbols, market capitalizations (MC), and MC rankings (as of
31 January 2022).
Cryptocurrency  Symbol  MC                Rank by MC
Bitcoin         BTC     $702,864,225,136  1
Ethereum        ETH     $295,905,148,931  2
Tether          USDT    $78,188,468,450   3
Binance Coin    BNB     $63,930,448,963   4
Litecoin        LTC     $7,479,561,631    21
Enjin Coin      ENJ     $1,434,490,287    60
Horizen         ZEN     $505,212,977      120
Namecoin        NMC     $24,472,607       733
Peercoin        PPC     $13,809,105       837
Feathercoin     FTC     $1,927,251        1515
Several reasons led us to choose this set of cryptocurrencies. First, cryptocurrencies
emerge and disappear continually, whereas our ten selected currencies have been publicly
traded without interruption. Moreover, all the chosen currencies were created with a
defined purpose and represent innovative projects that brought development, progress, or
value to the blockchain technology that Bitcoin had implemented. In addition, our sample
includes three tiers of currencies, as in Gandal and Halaburda (2016). Bitcoin, Ethereum,
Tether, and Binance Coin, whose market capitalizations stay in the world’s top five, are
“top-tier” cryptocurrencies. Litecoin, Enjin Coin, and Horizen represent “middle
cryptocurrencies” in market capitalization. Namecoin, Peercoin, and Feathercoin are
representative “minor cryptocurrencies” according to market capitalization.
[4] See: https://it.finance.yahoo.com/criptocurrencies/.
We include Tether (USDT) in our sample for a specific reason. It is a blockchain-based
cryptocurrency whose tokens in circulation are backed by an equivalent amount of U.S.
dollars, making it a stablecoin with a price pegged to USD 1.00 and therefore very low
volatility. Tether was designed to build the necessary bridge between fiat currencies and
cryptocurrencies and to offer users stability, transparency, and minimal transaction
charges. We include it as a counterfactual to understand whether sentiment indexes can
help predict stablecoins. We compute the log returns and include them in our
sample. Table 2 provides several summary statistics.
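The log-return transformation mentioned above can be sketched as follows. This is a minimal illustration that assumes returns are computed from daily closing prices as r_t = ln(P_t) - ln(P_{t-1}); the text does not state which price field is used, and the price values below are hypothetical.

```python
import math

def log_returns(prices):
    """Return r_t = ln(P_t) - ln(P_{t-1}) for a sequence of strictly positive prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [math.log(curr) - math.log(prev) for prev, curr in zip(prices, prices[1:])]

closes = [100.0, 110.0, 99.0, 99.0]  # hypothetical daily closing prices
returns = log_returns(closes)
print(returns)
```

Log returns are additive over time, so the sum of the daily values equals the log return over the whole window. A series of n prices yields n - 1 returns.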
Table 2: Log-return summary statistics for the ten cryptocurrencies during the period 1 January 2018 to
31 January 2022
Statistic BTC-USD   ETH-USD   USDT-USD  BNB-USD   LTC-USD   ENJ-USD   ZEN-USD   NMC-USD   PPC-USD   FTC-USD
Mean      0.001     0.001     0         0.003     0         0.002     0         -0.001    -0.001    -0.003
Median    0.001     0.001     0         0.001     0         -0.001    -0.001    0.001     -0.001    -0.005
Min       -0.465    -0.551    -0.053    -0.543    -0.449    -0.624    -0.546    -1.16     -0.665    -0.474
Max       0.172     0.231     0.053     0.529     0.291     0.768     0.38      0.75      0.567     0.409
Range     0.637     0.781     0.106     1.072     0.74      1.392     0.926     1.91      1.232     0.883
Skew      -1.147    -1.099    0.3       0.3       -0.608    1.132     -0.233    -0.869    -0.174    -0.174
Kurtosis  14.042    10.795    34.34     15.579    7.947     15.508    6.268     19.999    13.593    4.575
ADF       -11.0753  -11.1057  -14.186   -11.1936  -11.3664  -11.201   -11.0829  -12.9631  -12.9892  -11.5868
Notes: the ADF test rejects the unit-root null at the 1% level for all series, i.e., all log-return series are stationary.
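The descriptive rows of Table 2 can be computed along the following lines. This is a sketch under stated assumptions: it uses population (biased) moment estimators and reports raw kurtosis, for which a normal distribution gives 3, whereas the table does not say which estimator or kurtosis convention was used. The function name and the sample values are hypothetical, and the ADF statistic is not reproduced here.

```python
import numpy as np

def summary_stats(returns):
    """Descriptive statistics for a return series, using population moments."""
    r = np.asarray(returns, dtype=float)
    mu = r.mean()
    sd = r.std()  # ddof=0, i.e. the population standard deviation
    z = (r - mu) / sd
    return {
        "mean": mu,
        "median": np.median(r),
        "min": r.min(),
        "max": r.max(),
        "range": r.max() - r.min(),
        "skew": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4),  # raw kurtosis; subtract 3 for excess kurtosis
    }

sample = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical values, not data from the paper
stats = summary_stats(sample)
print(stats)
```

Under either convention the kurtosis values in Table 2 are far above those of a normal distribution, which points to heavy tails in daily cryptocurrency returns.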
2.2 Google Trends Data
We collected forty-three Google Trends search terms (Table 7 in the Appendix lists all of
them).
Figure 1: Google Trends words collected
Google Trends is a search trend feature that shows how frequently a given search term
is entered into Google’s search engine relative to the site’s total search volume over a given