1. Introduction
Why do startups in California attract higher valuations than those in New York? Why do those based in London
attract higher valuations than those in Paris, Berlin, or Milan, even when they operate in similarly sized economies,
share the same industries, and draw on many of the same investors? What drives these differences, and which factors matter most?
Classical economic theory bases valuation on revenues, growth rates, and risk-adjusted discount rates; startup
valuation, however, proves the exception to this rule.
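To make the classical benchmark concrete, the sketch below computes a textbook discounted-cash-flow value: explicit cash-flow forecasts discounted at a risk-adjusted rate, plus a Gordon-growth terminal value. The function name `dcf_value` and its parameters are hypothetical and chosen for illustration; this is not the method used in this study. For a startup whose early cash flows are negative or cannot be forecast, almost all of the value ends up in the terminal term, which illustrates why this approach struggles here.

```python
def dcf_value(cash_flows, discount_rate, terminal_growth):
    """Illustrative (hypothetical) helper: PV of forecast cash flows plus a
    Gordon-growth terminal value.

    cash_flows: forecast free cash flows for years 1..T
    discount_rate: risk-adjusted discount rate r (e.g. 0.10 for 10%)
    terminal_growth: perpetual growth rate g after year T; must be below r
    """
    if not cash_flows:
        raise ValueError("need at least one forecast year")
    if terminal_growth >= discount_rate:
        raise ValueError("terminal growth must be below the discount rate")
    # Discount each explicit forecast year t = 1..T back to today
    pv_explicit = sum(
        cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows, start=1)
    )
    # Terminal value at year T: CF_T * (1 + g) / (r - g), discounted back T years
    horizon = len(cash_flows)
    terminal = cash_flows[-1] * (1 + terminal_growth) / (discount_rate - terminal_growth)
    pv_terminal = terminal / (1 + discount_rate) ** horizon
    return pv_explicit + pv_terminal


# A flat perpetuity of 100 at a 10% rate is worth 100 / 0.10 = 1000
print(round(dcf_value([100.0], 0.10, 0.0), 6))  # prints 1000.0
```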
Given their opacity, short histories, and large share of intangible assets, startups are notoriously difficult to
value (Damodaran, 2009). This has given rise to a diversity of valuation approaches, each built on drivers
known to affect the value of early-stage startups at various phases. Approaches such as discounted cash flow
(DCF), multiples valuation, and scorecard valuation rely on inputs such as assets or performance measures,
while other strands of the literature describe the impact of market characteristics and the competitive
environment on firm value. Adding to the cacophony, widespread press coverage describes dramatic valuation
divergences along geographic and industry lines, divergences not wholly explained by growth, risk, revenue, or assets.
The scarcity of available data calls for empirical research that yields valuation approaches deployable despite
this scarcity. Econometric techniques show clear limitations here, as OLS models based on revenue and risk
suffer from substantial hidden-variable (omitted-variable) bias. Meanwhile, many of the categories, groups,
regions, and clusters known to have explanatory power offer only sparse concrete economic figures that might
explain these valuation differences, and fixed effects show decreasing marginal explanatory power as more of
these groups are included and accounted for. This limits estimation accuracy.
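The hidden-variable bias described above can be illustrated with a small simulation. In the sketch below, all data are synthetic and the parameter values are arbitrary assumptions: a hypothetical "hub" region raises both log-revenue and log-valuation. Regressing valuation on revenue alone lets revenue absorb the regional premium and inflates its coefficient, whereas demeaning within regions (the within transformation behind a fixed-effects estimator) recovers the true coefficient of 1.0. The helper names are illustrative, and only the standard library is used.

```python
import random


def slope(xs, ys):
    """OLS slope of ys on xs with an intercept: cov(x, y) / var(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def demean_within(values, groups):
    """Subtract each group's mean: the within transformation used by fixed effects."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n=2000, true_beta=1.0, region_premium=2.0, seed=0):
    """Synthetic, hypothetical data: a hub region lifts revenue and valuation."""
    rng = random.Random(seed)
    region = [rng.randint(0, 1) for _ in range(n)]  # 1 = hub region (assumed)
    log_rev = [1.5 * g + rng.gauss(0, 1) for g in region]
    log_val = [
        true_beta * r + region_premium * g + rng.gauss(0, 0.1)
        for r, g in zip(log_rev, region)
    ]
    return region, log_rev, log_val


region, log_rev, log_val = simulate()
naive = slope(log_rev, log_val)  # omits region, so revenue absorbs the premium
within = slope(demean_within(log_rev, region), demean_within(log_val, region))
print(f"naive OLS slope: {naive:.2f}, fixed-effects slope: {within:.2f}")
```

With many sparsely populated groups, each group mean is estimated from few observations, and a group containing a single startup contributes no within-group variation at all.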
To address these methodological limitations, one could combine known firm-performance indicators and
market conditions, such as growth rates, business cycles, and risk premiums, with predictive segmentation
of categorical variables and an examination of key fault lines in the startup landscape.
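One simple, standard way to segment a categorical variable predictively is the regression-tree split: order the categories by their mean (log-)valuation and choose the cut point that minimizes the within-segment sum of squared errors. For squared-error loss, the optimal two-way partition is known to lie among these ordered cuts. The sketch below is an illustrative assumption of what such a segmentation could look like, not the procedure used in this study; the function name, region labels, and valuation figures are hypothetical.

```python
from collections import defaultdict


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_binary_segmentation(categories, targets):
    """Hypothetical helper: split categories into two segments minimizing SSE.

    Categories are sorted by mean target; for squared-error loss the best
    two-way partition is one of the cuts in that order.
    Returns (cost, low_segment, high_segment).
    """
    by_cat = defaultdict(list)
    for c, y in zip(categories, targets):
        by_cat[c].append(y)
    if len(by_cat) < 2:
        raise ValueError("need at least two categories to segment")
    ordered = sorted(by_cat, key=lambda c: sum(by_cat[c]) / len(by_cat[c]))
    best = None
    for k in range(1, len(ordered)):
        left, right = ordered[:k], ordered[k:]
        cost = sse([y for c in left for y in by_cat[c]]) + sse(
            [y for c in right for y in by_cat[c]]
        )
        if best is None or cost < best[0]:
            best = (cost, set(left), set(right))
    return best


# Synthetic log-valuations by hypothetical region label
regions = ["region_a", "region_a", "region_b", "region_b",
           "region_c", "region_c", "region_d", "region_d"]
log_vals = [2.0, 2.2, 2.1, 1.9, 3.0, 3.2, 3.1, 2.9]
cost, low, high = best_binary_segmentation(regions, log_vals)
print(sorted(low), sorted(high))  # ['region_a', 'region_b'] ['region_c', 'region_d']
```

Applied recursively, the same split yields a tree of segments; the resulting segment labels can then enter a valuation model alongside numeric indicators such as growth rates.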
Such approaches are already used in practice: scorecard models, a frequently used startup-valuation approach,
are input-based models driven by the aggregation of discrete inputs and of the specific contextual
characteristics in which startups arise. While these approaches have traditionally offered limited
generalizability and have accordingly made little impact in the peer-reviewed literature, fields such
as marketing, psychology, and political science have made extensive use of similar approaches.
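A scorecard model of this kind can be sketched in a few lines. The sketch assumes one common variant: the median pre-money valuation of comparable startups (same region and stage) is scaled by a weighted sum of comparison ratios, where a ratio of 1.0 means "on par with a typical comparable". The factor names, weights, and valuation figure below are illustrative assumptions, not values estimated in this study.

```python
# Illustrative factor weights (assumed, not estimated); they must sum to 1.0
WEIGHTS = {
    "management_team": 0.30,
    "size_of_opportunity": 0.25,
    "product_technology": 0.15,
    "competitive_environment": 0.10,
    "marketing_sales_partnerships": 0.10,
    "need_for_additional_investment": 0.05,
    "other": 0.05,
}


def scorecard_valuation(median_premoney, ratios, weights=WEIGHTS):
    """Hypothetical helper: scale a comparable-startup median by a weighted score.

    ratios: factor -> comparison ratio vs. a typical comparable startup
            (1.0 = on par, 1.25 = 25% stronger, 0.8 = 20% weaker).
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(ratios)
    if missing:
        raise ValueError(f"missing ratios for: {sorted(missing)}")
    score = sum(w * ratios[f] for f, w in weights.items())
    return median_premoney * score


# On par with comparables except a stronger team (1.25)
# and a weaker competitive position (0.8): score = 1.055
ratios = {f: 1.0 for f in WEIGHTS}
ratios["management_team"] = 1.25
ratios["competitive_environment"] = 0.8
print(round(scorecard_valuation(2_000_000, ratios)))  # prints 2110000
```

Because each factor enters linearly with hand-set weights, the output is only as reliable as the chosen weights and comparable set, which is one source of the limited generalizability of such models.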
This paper has several aims. First, it brings recent methodological developments in machine learning to bear
on startup valuation. Second, it sheds light on divergences between classical valuation approaches and the
scorecard and segmented approaches used in industry. Most importantly, recent developments in machine
learning make possible a hierarchical ranking of valuation factors, thereby minimizing information asymmetries, enabling more