Urban Economic Fitness and Complexity from
Patent Data
Matteo Straccamore1,2,3,*, Matteo Bruno4,1, Bernardo Monechi3, and Vittorio Loreto3,4,1,2
1Centro Ricerche Enrico Fermi (CREF), Rome, Italy
2Sapienza Univ. of Rome, Physics Dept., Rome, Italy
3SONY Computer Science Laboratories, Paris, France
4SONY Computer Science Laboratories, Rome, Italy
*matteo.straccamore@cref.it
ABSTRACT
Over the years, the growing availability of extensive datasets about registered patents allowed researchers to better understand
technological innovation drivers. In this work, we investigate how the technological contents of patents characterise the
development of metropolitan areas and how innovation is related to GDP per capita. Exploiting worldwide data from 1980
to 2014, and through network-based techniques that only use information about patents, we identify coherent distinguished
groups of metropolitan areas, either clustered in the same geographical area or similar from an economic point of view. We
also extend the concept of coherent diversification to patent production by showing how it represents a decisive factor in the
economic growth of metropolitan areas. These results confirm a picture in which technological innovation can lead and steer
the economic development of cities, opening, in this way, the possibility of adopting the tools introduced here to investigate the
interplay between urban development and technological innovation.
1 Introduction
Modern cities are at the centre of a passionate debate about their future. With over 55% of the global population now living in
urban areas, cities represent the core of the modern world. They are key for the production and diffusion of innovation [1,2] in
many different sectors, ranging from economy [3] to science [4] and culture [5]. The ongoing pandemic has been imposing the hardest
possible stress test on urban infrastructures and poses a real challenge in rethinking the role of cities, urban planning and policy
decisions. While urbanisation keeps thriving [6], the challenge of understanding the development of cities to make them more
sustainable and resilient becomes more and more crucial [7,8]. Therefore, it is of paramount importance to tackle urban areas’
challenges by going beyond pure optimisation schemes and keeping a dynamic perspective. New tools are thus needed to
understand and map the present and forecast how a change in the current conditions will affect and modify future scenarios.
Despite belonging to different geographical areas and socio-economic contexts, cities possess general features for economic
development and urbanisation rates. For example, in [9], the authors show that many urban socio-economic indicators have a
power-law correlation with the population size. In [10], the authors observe how individual cities recapitulate a common pathway
where a transition to innovative economies takes place with a population of around 1.2 million. However, cities are ever-evolving
systems where several changes and different growth paths are possible [11]. Technological innovation has been highlighted as the
main driver for evolution and change in cities, and it has been shown that complex economic activities flourish in large urban
areas [12]. In parallel, many studies recently focused on how innovation proceeds [13–15]. In this paper, we focus on technological
innovation, and we investigate how the technological DNA of cities can affect their development and potential.
The adoption of patent data to monitor technological innovation is well established [16–18]. Over the past few decades, patent
data have become a workhorse for the literature on technical change, due mainly to the growing availability of data about
patent documents [19]. This ever-increasing data availability (e.g., PATSTAT, REGPAT and Google Patents [20]) has facilitated and
prompted researchers worldwide to investigate various questions regarding the patenting activity, for example the nature of
inventions, their network structure and their role in explaining technological change [19,21,22].
One of the characteristics of patent documents is the presence of codes associated with the claims contained in the patent
applications. These codes mark the boundaries of the commercial exclusion rights demanded by inventors. Claims are classified
based on the technological areas they impact according to existing classifications (e.g., the IPC classification [23]) to allow the
evaluation by patent offices. Mapping claims to classification codes allows localising patents and patent applications within the
technology space. Many studies recently relied on network-based techniques to unfold the complex interplay among patents,
technological codes and geographical reference areas. Network science techniques have made it possible to analyse the economic
activities of countries [24], regions [25–29], cities [2,30–32] or firms [33,34].
arXiv:2210.01001v2 [physics.soc-ph] 3 Feb 2023
In the present work, we focus on cities to quantify the complexity of their technologies, correlating it with socio-economic
indicators such as the GDP per capita. More precisely, we summarise our research questions as follows:
Which cities have the most advanced technological production? We use the framework of Fitness and Complexity (FC) [35] to
quantify the complexity of metropolitan areas and their technological endowment. Introduced initially and extensively adopted
for countries’ production/exports [35,36], the approach can easily be extended to any object pair, in this case, urban areas and
technological codes.
Are cities able to diversify their production of patents, or do they tend to specialise in particular sectors? In economics, FC
has also been applied to sub-national scales, such as regions [37,38] and firms, both at a country [39] or global [40] level. The study of
bipartite economic systems at different scales revealed that to apply the FC framework, the economic agents need to have the
capability to diversify to create global competition in the system. Otherwise, they will try to specialise and create a nested
subsystem of entities specialising in the same products. In such a case, the analysis has to be restricted to subsystems for the FC
method to capture the interplay among the economic agents. In this sense, the scale of the system is fundamental and regulates
the interplay between competition and specialisation. We aim to understand whether metropolitan areas can compete globally
or if they tend to specialise.
Are there clusters of cities with similar technological baskets? Starting from a bipartite system of metropolitan areas - technology
codes, we investigate the relations and similarities among metropolitan areas and uncover meaningful patterns in the evolution
of their technological production. In bipartite systems, it is often important to understand the similarities between pairs of nodes
of the same layer, to obtain a validated projection on a single layer [41]. We adopt this procedure to understand which metropolitan
areas are more similar in the type of patents they produce and which patents are more likely to be produced together.
The paper is organised as follows: in Section 2, we describe the data used in this work and we go through our data cleaning
procedure. In Section 3, we introduce the methodologies used in our work, describing the details of the networks and measures
we employed. In Section 4, we discuss the results showing how the network techniques can highlight non-trivial clusters of
technologies and metropolitan areas, and how both the Fitness and the coherent diversification can drive a higher increase in the
GDPpc of metropolitan areas. Finally, Section 5 sums up our contributions and hints at future work needed to address questions
arising from this study.
2 Data
Technology Codes
Here, we adopt the PATSTAT database (www.epo.org/searching-for-patents/business/patstat), which provides information
about patents and technology codes. The database contains approximately 100 million patents registered in about 100 Patent
Offices. Each patent is associated with a code that uniquely identifies the patent and a certain number of associated technology
codes. The WIPO (World Intellectual Property Organization) uses the IPC (International Patent Classification) standard [23] to assign
technology codes to each patent. IPC codes make a hierarchical classification based on six levels called digits, used to go into
more and more detail about the technology used. The first digit represents the macro category: for example, the code Cxxxxx
corresponds to the macro category "Chemistry; Metallurgy" and Hxxxxx to the macro category "Electricity"; considering the
subsequent digits, we have, for instance, with C01xxx, the class "Inorganic Chemistry" and with C07xxx the class "Organic
Chemistry".
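The hierarchy just described can be illustrated with a minimal sketch that truncates an IPC symbol to its three coarsest levels. The names `IPCCode` and `split_ipc` are hypothetical helpers introduced here for illustration, and the input is assumed to follow the usual layout of one section letter, a two-digit class, a subclass letter and then the finer group (for instance "C07D 213/02").

```python
from typing import NamedTuple


class IPCCode(NamedTuple):
    section: str    # first character, e.g. "C" (Chemistry; Metallurgy)
    ipc_class: str  # first three characters, e.g. "C07" (Organic Chemistry)
    subclass: str   # first four characters, the 4-digit level


def split_ipc(code: str) -> IPCCode:
    """Truncate an IPC symbol to its section, class and subclass."""
    compact = code.replace(" ", "").upper()
    well_formed = (
        len(compact) >= 4
        and compact[0].isalpha()
        and compact[1:3].isdigit()
        and compact[3].isalpha()
    )
    if not well_formed:
        raise ValueError(f"not a recognisable IPC symbol: {code!r}")
    return IPCCode(compact[0], compact[:3], compact[:4])


example = split_ipc("C07D 213/02")
print(example)
```

Keeping only the `subclass` field corresponds to working with 4-digit technology codes.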
After assigning a technology code to each patent, we use a database about cities (see next section) to match the unique patent
identifier and its technology code to the corresponding city. To geolocalise the patents, we adopt the De Rassenfosse et al.
database [42], which contains entries on 18 million patents from 1980 to 2014. Conveniently, in this database, the geographical
information of patents is assigned to precise geographical coordinates. Thus, each patent has a unique identifier, a series of
technology codes, and geographical coordinates identifying the corresponding city.
GDP of cities
To obtain information on the GDP of cities and their evolution, we used the work of Kummu et al. [43]. The authors constructed a
worldwide GDP grid with a resolution of about five arc minutes for the 25-year period 1990-2015. To compute the GDP per capita
of each city or metropolitan area (MA) for each year in the data, we first download the boundaries from the Global Human
Settlement Layer [44]. Considering the GDP grid in one year, we compute the GDP per capita of an MA as the average of all the
grid points within its boundaries. In Fig. 3 in the Supplementary Information, we show the example of the grid of the Rome
metropolitan area.
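The averaging step can be sketched as follows, under two assumptions that are not stated in the text: the MA boundary has already been rasterised into a boolean mask on the same grid, and cells without data are stored as NaN and skipped. The function name `mean_inside` and the toy numbers are purely illustrative.

```python
import numpy as np


def mean_inside(grid: np.ndarray, inside: np.ndarray) -> float:
    """Average the grid cells flagged by the boolean mask, ignoring NaN cells."""
    values = grid[inside]
    values = values[~np.isnan(values)]
    if values.size == 0:
        raise ValueError("no valid grid cell inside the boundary")
    return float(values.mean())


# Toy 3x4 GDP-per-capita grid (arbitrary units); NaN marks a cell without data.
gdp_pc_grid = np.array([
    [10.0, 12.0, 11.0, 9.0],
    [20.0, 22.0, np.nan, 8.0],
    [21.0, 23.0, 7.0, 6.0],
])
# Hypothetical rasterised boundary: True where the cell centre falls inside the MA.
ma_mask = np.zeros(gdp_pc_grid.shape, dtype=bool)
ma_mask[1:, :3] = True

gdp_pc_ma = mean_inside(gdp_pc_grid, ma_mask)
```

In practice the mask would come from intersecting the grid with the Global Human Settlement Layer polygons, which this sketch does not attempt.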
Data Cleaning Procedure
To clean the data, the first step is to associate the technology codes of a patent with a specific city. Once this preliminary
operation is completed, it is possible to build the bipartite networks that will link cities to technology codes. We represent the
bipartite networks through rectangular bi-adjacency matrices $V^y$ whose elements $V^y_{c,t}$ are integers indicating how many times a
technology code $t$ appeared in different patents in a given city $c$ in year $y$. In total, our network features 42912 cities connected
to 650 technology codes (4-digit). To reduce the difference between the two layers of the networks and reduce the noise in
the system, which is often due to the presence of very small cities, we aggregate the cities into their respective metropolitan areas
(MAs). We select all cities within an MA, and the technology codes associated with the MA are the union of all the technology
codes of the cities within it. The MAs present in the Global Human Settlement Layer [44] number 8641 and cover the entire world.
However, most of these do not contain cities that have patents. The metropolitan areas
producing patents are 2169 and are distributed as shown in Figures 1 and 2 in the Supplementary Information.
Figure 1. Bipartite metropolitan areas - technology codes network. (a): Pictorial representation of the bipartite
metropolitan areas-technology codes network. Each MA is connected to one or more technology sectors. (b): Bipartite network
adjacency matrix for the year 2000. A dark dot means that a given technology code is present in a patent made by a given MA.
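The aggregation of cities into MAs can be sketched as a row-wise sum of the city count matrix. Summing the counts is one natural reading of taking the "union" of the cities' technology codes and is an assumption here, as are the function name `aggregate_rows` and the convention that a label of -1 marks a city outside every MA.

```python
import numpy as np


def aggregate_rows(V_city: np.ndarray, city_to_ma: np.ndarray, n_ma: int) -> np.ndarray:
    """Sum the rows of a city x technology count matrix by MA label.

    Cities labelled -1 (outside every MA) are dropped.
    """
    V_ma = np.zeros((n_ma, V_city.shape[1]), dtype=V_city.dtype)
    keep = city_to_ma >= 0
    # np.add.at accumulates correctly even when an MA label appears several times.
    np.add.at(V_ma, city_to_ma[keep], V_city[keep])
    return V_ma


V_city = np.array([
    [2, 0, 1],   # city 0 -> MA 0
    [0, 3, 0],   # city 1 -> MA 0
    [1, 1, 0],   # city 2 -> MA 1
    [5, 0, 0],   # city 3 -> no MA
])
city_to_ma = np.array([0, 0, 1, -1])
V_ma = aggregate_rows(V_city, city_to_ma, n_ma=2)
```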
We obtain a matrix $V^y$ for each year $y$ from 1980 to 2014, connecting 2169 metropolitan areas $a$ and 650 technology codes $t$.
To avoid the fluctuations due to using only one year at a time as an interval, we decided to consider a window of 5 years each
time, summing the matrices in one window. In this paper, therefore, the matrix $V^y$ will refer to the time window from $y$ to $y+5$.
The final database consists of 30 5-year window matrices $V^y$ ranging from window 1980–1984 to 2010–2014. Finally, we
binarise the matrices $V$ applying a standard procedure in economic complexity to determine relevant producers/exporters of
products (see Section 3).
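A minimal sketch of the windowing step is given here. It assumes that a window labelled by its first year covers `width` consecutive years, as the label 1980–1984 suggests, and that incomplete windows at the end of the series are skipped. The function name `window_sums` and the toy data are illustrative only.

```python
import numpy as np


def window_sums(yearly: dict[int, np.ndarray], width: int = 5) -> dict[int, np.ndarray]:
    """Return {start_year: sum of the `width` consecutive yearly matrices starting there}."""
    out = {}
    for start in sorted(yearly):
        span = range(start, start + width)
        if all(y in yearly for y in span):  # skip incomplete windows
            out[start] = sum(yearly[y] for y in span)
    return out


# Toy data: 7 years of 2x2 count matrices, each filled with (year - 1980).
yearly = {y: np.full((2, 2), y - 1980) for y in range(1980, 1987)}
windows = window_sums(yearly, width=5)
```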
3 Methods
Revealed Comparative Advantage
To understand which metropolitan areas are relevant innovators of a specific technological sector, we apply the revealed
comparative advantage (RCA) [45] binarisation strategy. RCA is a frequently used tool in the economic complexity literature [24,36,46].
Considering a bipartite network of countries and products, RCA allows us to determine how competitive a country is in
exporting a given product while also considering how many countries export that product. In our case, RCA reveals when the
share of patents of some technology, $t$, introduced by a certain MA, $a$, is higher than the average share of the rest of the market,
meaning that the metropolitan area focuses on the technology $t$ more than the number of technologies produced would suggest.
Considering the matrix $V^y$ for the year $y$, we define the RCA for the MA $a$ and the technology $t$ as:
$$\mathrm{RCA}^y_{a,t} = \frac{V^y_{a,t} \,/\, \sum_{t'} V^y_{a,t'}}{\sum_{a'} V^y_{a',t} \,/\, \sum_{a',t'} V^y_{a',t'}},$$
where the sums run over all the technologies $t'$ and all the MAs $a'$. A value $\mathrm{RCA}_{a,t} \geq 1$ means that MA $a$ is significantly
competitive in the technology field $t$. We use this threshold on the RCA values to obtain 30 $M^y$ matrices, one for each 5-year
window:
$$M^y_{a,t} = \begin{cases} 1 & \text{if } \mathrm{RCA}^y_{a,t} \geq 1 \\ 0 & \text{if } \mathrm{RCA}^y_{a,t} < 1. \end{cases}$$
Notice that, in the following, we consider only the MAs having an average of at least one RCA $> 1$ per year, reducing their number to
1211. These $M^y$ matrices represent our final temporal bipartite network that links 1211 MAs to 650 technology codes.
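The RCA definition and the threshold at 1 translate into a few array operations. The sketch below is illustrative: the function names `rca` and `binarise` are introduced here, and setting the RCA to 0 for MAs or technologies without any patent (where the ratio is undefined) is an assumption not discussed in the text.

```python
import numpy as np


def rca(V: np.ndarray) -> np.ndarray:
    """Revealed comparative advantage of each (MA, technology) pair of a count matrix V."""
    V = V.astype(float)
    row_tot = V.sum(axis=1, keepdims=True)   # sum over t' of V[a, t']
    col_tot = V.sum(axis=0, keepdims=True)   # sum over a' of V[a', t]
    total = V.sum()                          # sum over a', t' of V[a', t']
    with np.errstate(divide="ignore", invalid="ignore"):
        out = (V / row_tot) / (col_tot / total)
    # Empty rows or columns give 0/0; mapping them to RCA = 0 is an assumption.
    return np.nan_to_num(out, nan=0.0, posinf=0.0)


def binarise(V: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """M[a, t] = 1 if RCA[a, t] >= threshold, else 0."""
    return (rca(V) >= threshold).astype(int)


V = np.array([
    [4, 0, 1],
    [1, 3, 1],
    [0, 1, 4],
])
M = binarise(V)
```

In this toy matrix each MA files one third of all patents, so an entry passes the threshold exactly when the MA's share of that technology exceeds the technology's share of the whole market.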
Bipartite Networks
A bipartite network is a network whose nodes represent two different kinds of entities, and only connections between nodes
from different entities are allowed. Many systems in ecological and socio-economical environments, such as those studied in the
present work, are easily described as bipartite since they involve interactions between two kinds of entities [39,47]. For instance,
the Internet can be modeled as a users-websites bipartite network, whose analysis can reveal sets and ranks of pages which will
be more likely to be of interest for the user [48]. We can use the $M^y$ matrices as biadjacency matrices of MA - technology bipartite
networks, connecting each MA with the technologies in which it is competitive. In Figure 1, we show a pictorial representation
of this bipartite network and its biadjacency matrix $M^y$ for the year $y=2000$.
Projecting the bipartite network on one of its layers, we can find non-trivial similarity patterns between MAs or technologies.
However, the problem of finding the proper projection of a bipartite network into a monopartite one representing the similarities
of nodes on one of its layers is well-known in the literature [41,48–50]. In general, the goal is to find the representation of a
monopartite network that best represents the bipartite one without taking too much information away from the latter. We
decided to use the Bipartite Configuration Model (BiCM) [51,52] to select the most significant nodes and links.
Bipartite Configuration Model (BiCM)
One of the simplest ways to obtain a monopartite projection from bipartite data is to count the number of links in common
between two different entities belonging to the same layer. For example, using $M$ as the biadjacency matrix of a bipartite
network between metropolitan areas $a$ and technologies $t$, this means computing:
$$A_{a_i a_j} = \sum_t M_{a_i t} M_{a_j t},$$
where $A_{a_i a_j}$ is the element of the monopartite projection adjacency matrix $A$ between nodes $a_i$ and $a_j$. However, we note that a
projection made in this way leads to a densely connected structure with a trivial topology.
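The naive co-occurrence projection is a single matrix product. The toy matrix below is illustrative, and zeroing the diagonal (which would otherwise hold each MA's own degree) is a convention adopted here for readability.

```python
import numpy as np

M = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
])

# A[i, j] = sum over t of M[i, t] * M[j, t]: technologies shared by MAs i and j.
A = M @ M.T
# The diagonal holds each MA's own number of technologies, not a similarity; drop it.
np.fill_diagonal(A, 0)
```

Even in this tiny case two of the three possible pairs end up linked, which hints at why the unfiltered projection of a large network becomes dense.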
To select the relevant nodes and links in our projected networks and avoid obtaining too dense a projection, we use as a null model
the Bipartite Configuration Model (BiCM) [49,51,52], which we compute by using the NEMtropy Python package
(github.com/nicoloval/NEMtropy). The BiCM
belongs to the family of the Exponential Random Graphs, adapted to the case of bipartite networks. These models arise from
the maximisation of the Shannon entropy of an ensemble of networks, in our case undirected binary bipartite networks $M$:
$$S = -\sum_M P(M) \ln P(M),$$
considering a set of constraints $\vec{C}(M)$. $P(M)$ is the probability of a specific bipartite network $M$.
The probability distribution maximising the entropy is the exponential distribution:
$$P(M|\vec{\lambda}) = \frac{e^{-H(M,\vec{\lambda})}}{Z(\vec{\lambda})}, \qquad (1)$$
where $H(M,\vec{\lambda}) = \vec{\lambda} \cdot \vec{C}(M)$ is the Hamiltonian imposing the Lagrangian multipliers.
Two sets of constraints are imposed in the BiCM, one for each layer. Specifically, the node degrees are fixed, namely ubiquity
$\vec{u}(M)$ for each technology code and diversification $\vec{d}(M)$ for MAs, in our case. The mean values of the node degrees must be
tuned to match these quantities. Then we obtain the Hamiltonian $H$:
$$H(M,\vec{\lambda}) = \vec{\alpha} \cdot \vec{d}(M) + \vec{\beta} \cdot \vec{u}(M).$$
Imposing the previous constraints together with the normalisation condition $\sum_M P(M) = 1$, we can write Eq. 1 as:
$$P(M|\vec{\lambda}) = \frac{e^{-\vec{\alpha} \cdot \vec{d}(M) - \vec{\beta} \cdot \vec{u}(M)}}{\sum_{M'} e^{-\vec{\alpha} \cdot \vec{d}(M') - \vec{\beta} \cdot \vec{u}(M')}}.$$
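Since constraints have been imposed on the mean values of the node degrees, the previous equation can be decomposed into the
product of the probability distributions of a single link:
$$P(M|\vec{\lambda}) = \prod_a \prod_t p_{at}^{M_{at}} (1 - p_{at})^{1 - M_{at}},$$
where $p_{at} = \frac{x_a y_t}{1 + x_a y_t}$ is the probability of the link between the MA $a$ and the technological code $t$, $x_a = e^{-\alpha_a}$ and
$y_t = e^{-\beta_t}$. To estimate the unknown parameters we have to maximise the log-likelihood
$\mathcal{L}(\vec{x},\vec{y}) = \ln P(M|\vec{x},\vec{y})$, i.e. solving the system:
$$\vec{\nabla} \mathcal{L}(\vec{x},\vec{y}) = 0 \;\Rightarrow\; \begin{cases} \langle d_a(M) \rangle = \sum_t \frac{x_a y_t}{1 + x_a y_t} & \forall a \\ \langle u_t(M) \rangle = \sum_a \frac{x_a y_t}{1 + x_a y_t} & \forall t \end{cases}$$
with $\langle d_a(M) \rangle = d^*_a$ and $\langle u_t(M) \rangle = u^*_t$, where $d^*_a$ and $u^*_t$ represent the observed quantities.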
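To make the degree equations concrete, the sketch below solves them on a toy matrix with a plain fixed-point iteration. This is not the NEMtropy solver used in this work but a simple illustrative scheme; the function name `solve_bicm`, the starting guess and the tolerances are assumptions. It also assumes that every node has a degree strictly between zero and the size of the opposite layer, since otherwise some link probabilities are exactly 0 or 1 and the parameters diverge.

```python
import numpy as np


def solve_bicm(M: np.ndarray, tol: float = 1e-10, max_iter: int = 10_000) -> np.ndarray:
    """Return the link probabilities p[a, t] = x_a y_t / (1 + x_a y_t).

    The parameters are found by iterating the degree equations until the
    largest change in x or y falls below `tol`.
    """
    d = M.sum(axis=1).astype(float)   # observed diversification of each MA
    u = M.sum(axis=0).astype(float)   # observed ubiquity of each technology
    x = d / np.sqrt(d.sum())          # simple starting guess
    y = u / np.sqrt(u.sum())
    for _ in range(max_iter):
        # x_a <- d_a / sum_t [ y_t / (1 + x_a y_t) ]
        x_new = d / (y[None, :] / (1.0 + np.outer(x, y))).sum(axis=1)
        # y_t <- u_t / sum_a [ x_a / (1 + x_a y_t) ], using the updated x
        y_new = u / (x_new[:, None] / (1.0 + np.outer(x_new, y))).sum(axis=0)
        shift = max(np.abs(x_new - x).max(), np.abs(y_new - y).max())
        x, y = x_new, y_new
        if shift < tol:
            break
    xy = np.outer(x, y)
    return xy / (1.0 + xy)


M = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
])
P = solve_bicm(M)
```

At convergence the row sums of `P` reproduce the observed diversifications and the column sums reproduce the observed ubiquities, which is exactly the content of the system above.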
After we obtain the link probabilities of the model, we use them to compute how unexpected the number of common
neighbours of two nodes of the same layer is. Given that, by construction, the links of the model are independent random variables,
the probability of sharing a technology $t$ for two MAs $a$ and $a'$ is $P(V^t_{aa'} = 1) = p_{at} p_{a't}$, and the total number of
technologies they share will be $V_{aa'} = \sum_t m_{at} m_{a't}$. Thus, we can compute a p-value for the number of common neighbours
observed for two nodes of the same layer, which reads:
$$\text{p-value}_{aa'} = P(V_{aa'} > V^*_{aa'}) \qquad (2)$$
where $V^*_{aa'}$ is the number of common neighbours between nodes $a$ and $a'$ in the observed network. Note that the random variable
$V_{aa'}$ is a Poisson-Binomial, i.e. a sum of independent Bernoulli random variables with different parameters. Since this
distribution is hard to evaluate when the number of Bernoulli variables is large, we approximate it with a Poisson variable with
the same mean, as has been done in previous works.
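The Poisson approximation of Eq. 2 can be sketched as follows. The mean of the Poisson variable is the expected number of shared technologies, $\sum_t p_{at} p_{a't}$, and the tail uses the strict inequality as printed in Eq. 2. The helper names `poisson_sf` and `projection_pvalues` are hypothetical, and the link probabilities in the toy example are set by hand rather than fitted. Computing the tail as one minus the cumulative sum loses precision for very small p-values, so a library survival function would be preferable in real use.

```python
import math

import numpy as np


def poisson_sf(k: int, mu: float) -> float:
    """P(X > k) for X ~ Poisson(mu), computed as 1 - CDF(k)."""
    term = math.exp(-mu)      # P(X = 0)
    cdf = term
    for i in range(1, k + 1):
        term *= mu / i        # P(X = i) obtained from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def projection_pvalues(M: np.ndarray, P: np.ndarray) -> np.ndarray:
    """p-value of the observed co-occurrences for every pair of rows of M."""
    observed = M @ M.T        # shared technologies in the data
    expected = P @ P.T        # sum_t p_at p_a't, mean under the null model
    n = M.shape[0]
    pvals = np.ones((n, n))   # diagonal entries are unused and left at 1
    for a in range(n):
        for b in range(a + 1, n):
            p = poisson_sf(int(observed[a, b]), float(expected[a, b]))
            pvals[a, b] = pvals[b, a] = p
    return pvals


M = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
])
P = np.full(M.shape, 0.5)     # hypothetical link probabilities, not fitted to M
pvals = projection_pvalues(M, P)
```

With these numbers every pair expects one shared technology, so the pair sharing four technologies receives a much smaller p-value than the pairs sharing one.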
After applying this procedure to each pair of nodes, we obtain as output a p-value matrix of the same size as the adjacency
matrix $M$ of the starting bipartite network. As a final step, we have to decide which of these p-values are significant and which
are not. To assess the link significance, we use the False Discovery Rate (FDR) test [53]: let us assume that we have $N$ hypotheses, each
characterised by its p-value. The FDR first sorts these $N$ p-values as $\text{p-value}_1 \leq \dots \leq \text{p-value}_N$, and then identifies the
largest integer $\hat{I}$ such that:
$$\text{p-value}_{\hat{I}} \leq \frac{\hat{I} \alpha}{N} \qquad (3)$$
where $\alpha$ is the arbitrarily defined single-test significance level. We use $\alpha = 0.01$ for the projection onto the technology layer,
and $\alpha = 0.1$ for the MA one. Note that in this case, $\alpha$ will be the statistical significance of the whole validated network, while
for the single links their significance will be much lower. Finally, all hypotheses with p-value lower than or equal to $\text{p-value}_{\hat{I}}$ will
be rejected, i.e. the link will be validated in the projected network. In our case, for instance in the case of the projection on the
technologies’ layer, the number of hypotheses is the number of possible links in the projection, $\binom{N_t}{2}$, and Eq. 3 becomes:
$$\text{p-value}_{\hat{I}} \leq \frac{\hat{I} \alpha}{\binom{N_t}{2}}.$$
Ordering the coefficients $\binom{N_t}{2}\, \text{p-value}(V_{tt'})$ and retaining only the links between pairs of nodes $t, t'$ such that
$\text{p-value}(V^*_{tt'}) \leq \text{p-value}_{\hat{I}}$ yields our projection.
Let us remark that the projection obtained via the procedure just described only keeps links that are highly significant with
respect to the degree of the nodes, unveiling hidden strong similarities.
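The FDR selection of Eq. 3 can be written in a few lines. The sketch below is illustrative (the function name `fdr_validated` and the p-values are made up): it looks for the largest rank whose sorted p-value lies below the rank-dependent threshold and validates every hypothesis up to that p-value.

```python
def fdr_validated(pvalues: list[float], alpha: float) -> list[bool]:
    """Flag the hypotheses rejected by the FDR procedure (i.e. the validated links)."""
    ordered = sorted(pvalues)
    n = len(ordered)
    k = 0                                  # largest rank i with p_(i) <= i * alpha / n
    for i, p in enumerate(ordered, start=1):
        if p <= i * alpha / n:
            k = i
    if k == 0:
        return [False] * n                 # nothing passes the test
    cutoff = ordered[k - 1]
    return [p <= cutoff for p in pvalues]


pvals = [0.001, 0.30, 0.025, 0.045, 0.028]
keep = fdr_validated(pvals, alpha=0.05)
```

In this example the second smallest p-value (0.025) misses its own threshold of 0.02 but is still validated, because the third smallest (0.028) passes its threshold of 0.03 and sets the cutoff. This is why the largest qualifying rank is used rather than the first failure.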
Modularity and Community detection
We are interested in finding relevant communities of MAs or technologies to visualise better which nodes in the two layers are
highly interconnected. To this end, we adopted the Louvain method introduced by Blondel et al. [54], which relies on finding a
partition that maximises the modularity. We also vary the resolution [55] to find communities at different scales.
Fitness and Complexity algorithms
The Fitness and Complexity (FC) framework [35], introduced in 2012, provides a way to quantify the competitiveness (Fitness) of
the economy of a country. Here, we adopt it to quantify the Fitness of metropolitan areas considering only patent data. The idea