Using Deep Learning to Find the Next
Unicorn: A Practical Synthesis
Lele Cao
Motherbrain, EQT
lele.cao@eqtpartners.com caolele@gmail.com
Vilhelm von Ehrenheim
Motherbrain, EQT
vilhelm.vonehrenheim@eqtpartners.com
Sebastian Krakowski
House of Innovation, Stockholm School of Economics
sebastian.krakowski@hhs.se
Xiaoxue Li
Department of Political Science, Stockholm University
xiaoxue.li@statsvet.su.se
Alexandra Lutz
Motherbrain, EQT
alexandra.lutz@eqtpartners.com
A condensed version [1] is peer reviewed and published by IJCAI 2023 (The 32nd International Joint
Conference on Artificial Intelligence) Workshop: https://aclanthology.org/2023.finnlp-1.6.
Chicago Citation Format:
Cao, Lele, Vilhelm von Ehrenheim, Sebastian Krakowski, Xiaoxue Li, and Alexandra Lutz. "Using Deep Learning
to Find the Next Unicorn: A Practical Synthesis on Optimization Target, Feature Selection, Data Split
and Evaluation Strategy." Proceedings of the IJCAI Joint Workshop on the 5th Financial Technology and
Natural Language Processing (FinNLP) and the 2nd Multimodal AI for Financial Forecasting (Muffin),
pp. 63-73, 2023.
Please send correspondence to the first author – Lele Cao, Motherbrain AI Research, EQT Group, Regeringsgatan 25,
11153 Stockholm, Sweden; e-mail: caolele@gmail.com.
arXiv:2210.14195v2 [q-fin.CP] 10 Jun 2024
Abstract
Startups often represent newly established business models associated with disruptive innovation
and high scalability. They are commonly regarded as powerful engines for economic and
social development. Meanwhile, startups are heavily constrained by many factors such as
limited financial funding and human resources. Therefore, the chance for a startup to eventually
succeed is as rare as “spotting a unicorn in the wild”. Venture Capital (VC) strives to identify
and invest in unicorn startups during their early stages, hoping to gain a high return. To avoid
entirely relying on human domain expertise and intuition, investors usually employ data-driven
approaches to forecast the success probability of startups. Over the past two decades, the
industry has gone through a paradigm shift, moving from conventional statistical approaches
towards machine-learning (ML) based ones. Notably, the rapid growth of data volume
and variety is quickly ushering in deep learning (DL), a subset of ML, as a potentially superior
approach in terms of capacity and expressivity. In this work, we carry out a literature review
and synthesis on DL-based approaches, covering the entire DL life cycle. The objective is a)
to obtain a thorough and in-depth understanding of the methodologies for startup evaluation
using DL, and b) to distil valuable and actionable learning for practitioners. To the best of our
knowledge, our work is the first of this kind.
Keywords— Startup, Success Prediction, Unicorn, Deep Learning, Machine Learning, Venture
Capital, Investment, Big Data, Practical Synthesis
Using Deep Learning to Find the Next Unicorn: A Practical Synthesis Cao et al.
1 Introduction
The term “startup” has many definitions; to date, there is no consensus on a standard
definition. Santisteban and Mauricio [2] synthesized many popular definitions, and discovered
some common labels such as “new”, “small”, “rapid growth”, “high risk”, where “small” is often
approximated by limited financial funds and human resources [3]. Much of the literature, e.g., [4],
associates startups with disruptive innovation and high scalability. As a result, a startup can be defined as follows:
“A startup is a dynamic, flexible, high risk, and recently established company that
typically represents a reproducible and scalable business model. It provides innovative
products and/or services, and has limited financial funds and human resources.” [2–4]
Since startups stimulate growth, generate jobs and tax revenues, and promote many other
socioeconomically beneficial factors [5], they are commonly regarded as powerful engines for
economic and social development, especially after economic, environmental, and epidemic crises such
as COVID-19¹ [6]. As startups continue to develop, they often increasingly rely on external
funds (as opposed to internal funds from founders and co-founders), from either domestic or foreign
capital markets, to unlock a high rate of growth that usually corresponds to a “hockey stick” growth
curve (i.e. a straight line on a log scale) [7].
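To make the log-scale remark concrete, here is a short worked relation. It assumes, purely for illustration, that some growth metric $R(t)$ (e.g. revenue) starts at $R_0$ and grows at a constant relative rate $g$ per period; these symbols are not from the source.

```latex
\[
  R(t) = R_0\,(1+g)^{t}
  \quad\Longrightarrow\quad
  \log R(t) = \log R_0 + t\,\log(1+g).
\]
```

Under this assumption, $\log R(t)$ is an affine function of $t$ with slope $\log(1+g)$, so the curve is a straight line on a log scale while showing the characteristic “hockey stick” bend on a linear scale.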
Startups may receive funds from multiple sources such as Venture Capital (VC) and debt financing;
to date, the dominant source has been VC. As an industry, VC seeks opportunities to
invest in startups with great potential (in the sense of financial returns) to grow and successfully exit.
The risk-return trade-off tells us that the potential return rises with a corresponding increase in risk².
¹ Coronavirus disease 2019 (COVID-19) is a contagious disease caused by the severe acute respiratory syndrome
coronavirus 2 (SARS-CoV-2). The first case was identified in 2019.
² Statistics revealing the high risk of funding startups: on average, only around 60% of startups survive for more
than 3 years after being founded [8]; the top 2% of VC funds receive 95% of the returns in the entire industry [9]; a VC typically
has only a 10% chance of achieving an ROI (return on investment) of 100% or more [10,11].
As a consequence, VC firms usually strive to mitigate this risk by improving their 1) deal sourcing³
and screening and 2) value-add process [12]. In this survey, we focus on the published work
around the former, i.e. finding the startup unicorn⁴ as accurately as possible during the
deal sourcing phase.
Finding the unicorn among candidate startups is a complex task with great uncertainty because
of many factors, such as vague and changeable business ideas, the absence of a proof-of-concept prototype
(when applicable), and the lack of organic revenue. This creates a low-information situation, where VC firms often
have to make investment decisions based on insufficient information (e.g. lack of financial data) [14].
Therefore, a VC’s deal sourcing process has traditionally been manual and empirical, leaving
estimations of the ROI (return on investment) heavily dependent on the human investors’ decisions.
As pointed out in [15], human investors are inherently biased, and intuition alone cannot consistently
drive good decisions. A better approach should leverage big data to
• debias the decisions, so that the individual investment decision made for a particular startup is
expected to carry lower risk and yield higher ROI;
• enable automation, so that more startups can be evaluated without requiring additional
time.
To that end, over the past two decades, data-driven approaches have dominated the research
around startup success prediction (i.e. identifying startups that eventually turn into unicorns).
However, the majority of this work is analytical and statistical rather than based on ML (machine learning).
Conventional statistical work (e.g. [2,16–33]) mostly starts with defining some hypotheses⁵, followed
by testing them using statistical tools; the outcome of this work is often a set of conclusions about
correlation and/or causality between some factors and the success likelihood of startups.
³ Deal sourcing is the process by which investors identify investment opportunities.
⁴ Unicorn and near-unicorn startups are private, venture-backed firms with a valuation of at least $500 million at
some point [13].
⁵ A hypothesis often assumes that certain factors have an impact on startup success. For example, “the founder’s past
entrepreneurial experience influences the likelihood of success” [32].
[Figure 1 diagram: Startup (input data) x → ML Model → Invest? 𝑦 ∈ {0 (bad), 1 (good)}]
Figure 1: High-level overview of ML (machine learning) based startup sourcing.
The ML model is trained to approximate a function 𝑓(·) so that the input data x describing a startup
can be mapped to an output variable 𝑦 indicating the recommended investment propensity, which can
be either discrete (good vs. bad) or continuous (success probability).
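The mapping sketched in Figure 1 can be illustrated with a minimal example. The code below is only a sketch under stated assumptions: it uses a plain logistic regression (not a method prescribed by the source) trained on synthetic data, and the feature names `growth` and `team`, the labeling rule, and all function names are hypothetical. It shows how one trained 𝑓(·) can yield both a continuous output (success probability) and a discrete one (good vs. bad).

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def f_proba(x, w, b):
    """Continuous output: estimated success probability in [0, 1]."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def f_label(x, w, b, threshold=0.5):
    """Discrete output: 1 (good investment) or 0 (bad investment)."""
    return 1 if f_proba(x, w, b) >= threshold else 0


# Synthetic toy data (an assumption for illustration, not real startup data).
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    growth = rng.uniform(-1, 1)  # hypothetical standardized growth signal
    team = rng.uniform(-1, 1)    # hypothetical standardized team signal
    X.append([growth, team])
    y.append(1 if 2.0 * growth + 1.0 * team > 0 else 0)  # made-up labeling rule

w, b = train_logistic(X, y)
strong, weak = [0.9, 0.8], [-0.9, -0.8]
print("strong:", f_proba(strong, w, b), f_label(strong, w, b))
print("weak:  ", f_proba(weak, w, b), f_label(weak, w, b))
```

The threshold of 0.5 is an arbitrary choice here; in practice it would depend on how the cost of a missed deal compares with the cost of a bad investment.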
In conventional statistical research, good research hypotheses need to be simple, concise, precise, and
testable; most importantly, they should be grounded in past knowledge, gained from the literature
review or from theory [34]. Therefore, it is not an easy task to come up with good hypotheses. Over
the last few years, researchers have started investigating the possibility of performing hypothesis mining
from data using ML algorithms, to avoid manually defining hypotheses upfront. Hypothesis mining
aims to summarize (instead of manually define) hypotheses by carrying out explainability analysis
(cf. Section 9) on the trained ML models [35]. For example, given a labeled dataset (i.e. one where it is known which
startups eventually became unicorns) containing many attributes for many companies, one
can directly start by training an ML model to predict unicorns (i.e. the prediction target) using
the entire dataset (all companies and attributes). By explaining and quantifying how a change in
certain attributes would change the prediction target, one may distil hypotheses that describe the
relation between the attributes in scope and the prediction target. In comparison to exploratory data
analysis, hypothesis mining is a much more structured procedure that trains an ML model using
the entire dataset at hand. As illustrated in Figure 1, the ML-based approaches [35–55] require
practitioners to define the input data x and the annotation 𝑦 (labeling an investment as good or bad according
to some criteria) before training a model 𝑓(·) that maps x to 𝑦, i.e. 𝑦 = 𝑓(x). There are already a
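The hypothesis-mining procedure described earlier in this paragraph (train on all attributes first, then quantify how each attribute affects the prediction) can be sketched as follows. This is a hedged illustration, not the authors' method: permutation importance stands in for the explainability analysis, which the text above does not specify, and the attribute names, synthetic data, and labeling rule are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(row, w, b):
    """Discrete prediction: 1 (unicorn) or 0 (not a unicorn)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) >= 0.5 else 0


def fit(X, y, lr=0.5, epochs=300):
    """Logistic regression via batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for row, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, row)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def accuracy(X, y, w, b):
    """Fraction of rows whose predicted label matches the annotation."""
    return sum(predict(row, w, b) == yi for row, yi in zip(X, y)) / len(y)


def permutation_importance(X, y, w, b, col, rng):
    """Accuracy drop after shuffling one attribute column across companies."""
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
    return accuracy(X, y, w, b) - accuracy(X_perm, y, w, b)


# Synthetic labeled dataset (hypothetical attributes, made-up labeling rule).
rng = random.Random(42)
names = ["founder_experience", "office_plants"]
X, y = [], []
for _ in range(400):
    experience = rng.uniform(-1, 1)  # drives the label in this toy setup
    plants = rng.uniform(-1, 1)      # irrelevant by construction
    X.append([experience, plants])
    y.append(1 if experience + rng.gauss(0, 0.2) > 0 else 0)

# Train on the entire dataset, then explain (scored on training data for brevity).
w, b = fit(X, y)
importance = {
    name: permutation_importance(X, y, w, b, j, rng) for j, name in enumerate(names)
}
for name, drop in importance.items():
    print(f"{name}: accuracy drop {drop:.3f}")
```

A large accuracy drop for an attribute suggests a candidate hypothesis (here, that founder experience relates to success), which would still need to be tested on held-out data before being treated as more than a correlation.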