

Urban Economic Fitness and Complexity from Patent Data

Matteo Straccamore1,2,3,*, Matteo Bruno4,1, Bernardo Monechi3, and Vittorio Loreto3,4,1,2

1Centro Ricerche Enrico Fermi (CREF), Rome, Italy

2Sapienza Univ. of Rome, Physics Dept., Rome, Italy

3SONY Computer Science Laboratories, Paris, France

4SONY Computer Science Laboratories, Rome, Italy

*matteo.straccamore@cref.it

ABSTRACT

Over the years, the growing availability of extensive datasets about registered patents has allowed researchers to better understand the drivers of technological innovation. In this work, we investigate how the technological contents of patents characterise the development of metropolitan areas and how innovation is related to GDP per capita. Exploiting worldwide data from 1980 to 2014, and through network-based techniques that only use information about patents, we identify coherent, distinct groups of metropolitan areas, either clustered in the same geographical area or similar from an economic point of view. We also extend the concept of coherent diversification to patent production by showing how it represents a decisive factor in the economic growth of metropolitan areas. These results confirm a picture in which technological innovation can lead and steer the economic development of cities, opening the possibility of adopting the tools introduced here to investigate the interplay between urban development and technological innovation.

1 Introduction

Modern cities are at the centre of a passionate debate about their future. With over 55% of the global population now living in urban areas, cities represent the core of the modern world. They are key for the production and diffusion of innovation [1,2] in many different sectors, ranging from economy to science and culture. The ongoing pandemic has been imposing the hardest possible stress test on urban infrastructures and poses a real challenge in rethinking the role of cities, urban planning and policy decisions. While urbanisation keeps thriving, the challenge of understanding the development of cities to make them more sustainable and resilient becomes more and more crucial [7,8]. Therefore, it is of paramount importance to tackle urban areas' challenges by going beyond pure optimisation schemes and keeping a dynamic perspective. New tools are thus needed to understand and map the present and forecast how a change in the current conditions will affect and modify future scenarios.

Despite belonging to different geographical areas and socio-economic contexts, cities share general features in their economic development and urbanisation rates. For example, many urban socio-economic indicators have been shown to have a power-law correlation with population size. It has also been observed that individual cities recapitulate a common pathway in which a transition to innovative economies takes place at a population of around 1.2 million. However, cities are ever-evolving systems where several changes and different growth paths are possible. Technological innovation has been highlighted as the main driver of evolution and change in cities, and it has been shown that complex economic activities flourish in large urban areas. In parallel, many recent studies have focused on how innovation proceeds [13–15]. In this paper, we focus on technological innovation, and we investigate how the technological DNA of cities can affect their development and potential.

The adoption of patent data to monitor technological innovation is well established [16–18]. Over the past few decades, patent data have become a workhorse for the literature on technical change, mainly owing to the growing availability of data about patent documents. This ever-increasing data availability (e.g., PATSTAT, REGPAT and Google Patents) has prompted researchers worldwide to investigate various questions regarding patenting activity, for example the nature of inventions, their network structure and their role in explaining technological change [19,21,22].

One of the characteristics of patent documents is the presence of codes associated with the claims contained in the patent applications. These codes mark the boundaries of the commercial exclusion rights demanded by inventors. Claims are classified based on the technological areas they impact according to existing classifications (e.g., the IPC classification) to allow evaluation by patent offices. Mapping claims to classification codes allows localising patents and patent applications within the technology space. Many recent studies have relied on network-based techniques to unfold the complex interplay among patents, technological codes and geographical reference areas. Network science techniques have made it possible to analyse the economic activities of countries [24], regions [25–29], cities [2,30–32] or firms [33,34].

arXiv:2210.01001v2 [physics.soc-ph] 3 Feb 2023

In the present work, we focus on cities to quantify the complexity of their technologies, correlating it with socio-economic indicators such as the GDP per capita. More precisely, we summarise our research questions as follows:

Which cities have the most advanced technological production? We use the framework of Fitness and Complexity (FC) to quantify the complexity of metropolitan areas and their technological endowment. Initially introduced and extensively adopted for countries' production/exports [35,36], the approach can easily be extended to any pair of object types, in this case urban areas and technological codes.

Are cities able to diversify their production of patents, or do they tend to specialise in particular sectors? In economics, FC has also been applied to sub-national scales, such as regions [37,38] and firms, at both the country and the global level. The study of bipartite economic systems at different scales revealed that, for the FC framework to apply, the economic agents need to have the capability to diversify so as to create global competition in the system. Otherwise, they will try to specialise and create a nested subsystem of entities specialising in the same products. In such a case, the analysis has to be restricted to subsystems for the FC method to capture the interplay among the economic agents. In this sense, the scale of the system is fundamental and regulates the interplay between competition and specialisation. We aim to understand whether metropolitan areas can compete globally or if they tend to specialise.

Are there clusters of cities with similar technological baskets? Starting from a bipartite system of metropolitan areas - technology codes, we investigate the relations and similarities among metropolitan areas and uncover meaningful patterns in the evolution of their technological production. In bipartite systems, it is often important to understand the similarities between pairs of nodes of the same layer, to obtain a validated projection on a single layer. We adopt this procedure to understand which metropolitan areas are more similar in the type of patents they produce and which patents are more likely to be produced together.

The paper is organised as follows: in Section 2, we describe the data used in this work and we go through our data cleaning

procedure. In Section 3, we introduce the methodologies used in our work, describing the details of the networks and measures

we employed. In Section 4, we discuss the results showing how the network techniques can highlight non-trivial clusters of

technologies and metropolitan areas, and how both the Fitness and the coherent diversiﬁcation can drive a higher increase in the

GDPpc of metropolitan areas. Finally, Section 5 sums up our contributions and hints at future work needed to address questions

arising from this study.

2 Data

Technology Codes

Here, we shall adopt the PATSTAT database (www.epo.org/searching-for-patents/business/patstat) that provides information

about patents and technology codes. The database contains approximately 100 million patents registered in about 100 Patent

Ofﬁces. Each patent is associated with a code that uniquely identiﬁes the patent and a certain number of associated technology

codes. The WIPO (World Intellectual Property Organization) uses the IPC (International Patent Classification) standard to assign technology codes to each patent. IPC codes form a hierarchical classification based on six levels called digits, used to go into

more and more detail about the technology used. The ﬁrst digit represents the macro category: for example, the code Cxxxxx

corresponds to the macro category "Chemistry; Metallurgy" and Hxxxxx to the macro category "Electricity"; considering the

subsequent digits, we have, for instance, with C01xxx, the class "Inorganic Chemistry" and with C07xxx the class "Organic

Chemistry".
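The hierarchy described above can be illustrated by truncating an IPC symbol to a fixed number of leading characters. The following is a minimal sketch, not the procedure used in this work: the helper names `ipc_prefix` and `patent_codes` are hypothetical, the sample symbols are only illustrative, and the sketch assumes that a "4-digit" code corresponds to the first four characters of the symbol.

```python
# Hypothetical helpers, introduced only for illustration.
# The two section names below are the ones quoted in the text; the table is not exhaustive.
SECTION_NAMES = {"C": "Chemistry; Metallurgy", "H": "Electricity"}


def ipc_prefix(symbol: str, n_chars: int = 4) -> str:
    """Return the first n_chars characters of an IPC symbol, ignoring spaces."""
    compact = symbol.replace(" ", "").upper()
    if len(compact) < n_chars:
        raise ValueError(f"IPC symbol too short: {symbol!r}")
    return compact[:n_chars]


def patent_codes(symbols, n_chars: int = 4) -> set:
    """Set of distinct truncated codes carried by one patent."""
    return {ipc_prefix(s, n_chars) for s in symbols}


# Illustrative symbols, assumed for the example.
codes = patent_codes(["C07D 213/00", "C07D 401/12", "H04L 9/32"])
print(sorted(codes))                        # ['C07D', 'H04L']
print(SECTION_NAMES[ipc_prefix("C07D 213/00", 1)])  # Chemistry; Metallurgy
```

Working with sets means that a code repeated several times within one patent is counted once for that patent.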

After assigning a technology code to each patent, we use a database about cities (see next section) to match the unique patent

identifier and its technology code to the corresponding city. To geolocalise the patents, we adopt the database of De Rassenfosse et al., which contains entries on 18 million patents from 1980 to 2014. Conveniently, in this database, the geographical

information of patents is assigned to precise geographical coordinates. Thus, each patent has a unique identiﬁer, a series of

technology codes, and geographical coordinates identifying the corresponding city.

GDP of cities

To obtain information on the GDP of cities and their evolution, we used the work of Kummu et al. The authors constructed a worldwide GDP grid with a resolution of about five arc minutes for the 25 years 1990-2015. To compute the GDP per capita of each city or metropolitan area (MA) for each year in the data, we first download the boundaries from the Global Human Settlement Layer. Considering the GDP grid in one year, we compute the GDP per capita of an MA as the average of all the grid points within its boundaries. Fig. 3 in the Supplementary Information shows the example of the grid of the Rome metropolitan area.
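The averaging step can be sketched as follows. This is a simplified illustration under stated assumptions: each grid cell is assumed to hold a GDP per capita value (or `None` where no data exist), and the boundary test is assumed to have been done already and stored in a boolean mask of the same shape. The function name `ma_gdp_per_capita` and the toy numbers are hypothetical.

```python
from statistics import fmean


def ma_gdp_per_capita(gdp_grid, inside_mask):
    """Average the grid values of the cells flagged as inside the MA boundary.

    gdp_grid:    list of rows with one value per grid cell (None = missing)
    inside_mask: list of rows of booleans with the same shape as gdp_grid
    """
    values = [
        value
        for grid_row, mask_row in zip(gdp_grid, inside_mask)
        for value, inside in zip(grid_row, mask_row)
        if inside and value is not None
    ]
    if not values:
        raise ValueError("no valid grid cell inside the boundary")
    return fmean(values)


# Toy 2 x 3 grid for one year (made-up numbers).
grid = [[10000.0, 20000.0, None],
        [30000.0, 40000.0, 50000.0]]
mask = [[True, True, False],
        [False, True, True]]
gdp_pc = ma_gdp_per_capita(grid, mask)
print(gdp_pc)  # 30000.0
```

Repeating the call for every year of the grid would give the GDP per capita time series of one MA.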

Data Cleaning Procedure

To clean the data, the first step is to associate the technology codes of a patent with a specific city. Once this preliminary operation is completed, it is possible to build the bipartite networks that will link cities to technology codes. We represent the bipartite networks through rectangular bi-adjacency matrices $V^y$ whose elements $V^y_{c,t}$ are integers indicating how many times a technology code $t$ appeared in different patents in a given city $c$ in year $y$. In total, our network features 42912 cities connected to 650 technology codes (4-digit). To reduce the difference in size between the two layers of the networks and to reduce the noise in the system, which is often due to the presence of very small cities, we aggregate the cities into their respective metropolitan areas (MAs). We select all cities within an MA, and the technology codes associated with the MA will be the union of all the technology codes of the cities within it. The MAs present in the Global Human Settlement Layer are 8641 and cover the entire world. However, most of these do not contain cities that have patents. The metropolitan areas producing patents are 2169 and are distributed as shown in Figures 1 and 2 in the Supplementary Information.

Figure 1. Bipartite metropolitan areas - technology codes network. (a): Pictorial representation of the bipartite metropolitan areas - technology codes network. Each MA is connected to one or more technology sectors. (b): Bipartite network adjacency matrix for the year 2000. A dark dot means that a given technology code is present in a patent made by a given MA.

We obtain a matrix $V^y$ for each year $y$ from 1980 to 2014, connecting 2169 metropolitan areas and 650 technology codes. To avoid the fluctuations due to using only one year at a time as an interval, we decided to consider a window of 5 years each time, summing the matrices in one window. In this paper, therefore, the matrix $V^y$ will refer to the time window from $y$ to $y+5$. The final database consists of 5-year window matrices $V^y$ ranging from window 1980-1984 to 2010-2014. Finally, we binarise the matrices $V^y$ applying a standard procedure in economic complexity to determine relevant producers/exporters of products (see Section 3).
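The window aggregation can be sketched as an element-wise sum of consecutive yearly count matrices. The sketch below is only illustrative: the function name `window_sums` is hypothetical, and it assumes that a window labelled by its first year $y$ covers the five calendar years $y, \ldots, y+4$, so that the first window is 1980-1984.

```python
def window_sums(yearly, first_year, last_year, width=5):
    """Sum yearly count matrices over sliding windows of `width` years.

    yearly: dict mapping year -> matrix (list of rows), all with the same shape.
    Returns a dict mapping the first year of each window to the summed matrix.
    Assumption: the window starting in year y covers y, ..., y + width - 1.
    """
    summed = {}
    for start in range(first_year, last_year - width + 2):
        years = range(start, start + width)
        n_rows = len(yearly[start])
        n_cols = len(yearly[start][0])
        summed[start] = [
            [sum(yearly[y][r][c] for y in years) for c in range(n_cols)]
            for r in range(n_rows)
        ]
    return summed


# Toy data: 2 MAs x 2 technology codes over seven years (made-up counts).
yearly = {year: [[1, 0], [year - 1980, 2]] for year in range(1980, 1987)}
windows = window_sums(yearly, 1980, 1986)
print(sorted(windows))   # [1980, 1981, 1982]
print(windows[1980])     # [[5, 0], [10, 10]]
```

Summing before binarising smooths out years in which an MA happens to file few patents.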

3 Methods

Revealed Comparative Advantage

To understand which metropolitan areas are relevant innovators in a specific technological sector, we apply the revealed comparative advantage (RCA) binarisation strategy. RCA is a frequently used tool in the economic complexity literature [24,36,46]. Considering a bipartite network of countries and products, RCA allows us to determine how competitive a country is in exporting a given product while also considering how many countries export that product. In our case, RCA reveals when the share of patents of some technology, $t$, introduced by a certain MA, $a$, is higher than the average share of the rest of the market, meaning that the metropolitan area focuses on the technology $t$ more than the number of technologies produced would suggest.

Considering the matrix $V^y$ for the year $y$, we define the RCA for the MA $a$ and the technology $t$ as:

$$\mathrm{RCA}^y_{a,t} = \frac{V^y_{a,t} \big/ \sum_{t'} V^y_{a,t'}}{\sum_{a'} V^y_{a',t} \big/ \sum_{a',t'} V^y_{a',t'}}$$

where the sums on the rhs run over all the technologies $t'$ and all the MAs $a'$. A value $\mathrm{RCA}_{a,t} \geq 1$ means that MA $a$ is significantly competitive in the technology field $t$. We use this threshold on the RCA values to obtain 30 $M^y$ matrices, one for each 5-year window:

$$M^y_{a,t} = \begin{cases} 1 & \text{if } \mathrm{RCA}^y_{a,t} \geq 1 \\ 0 & \text{if } \mathrm{RCA}^y_{a,t} < 1. \end{cases}$$

Notice that, in the following, we consider only the MAs having an average of at least one $\mathrm{RCA} \geq 1$ per year, reducing their number to 1211. These $M^y$ matrices represent our final temporal bipartite network that links 1211 MAs to 650 technology codes.
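The RCA definition and the thresholding step can be written compactly. The following is a minimal sketch on a made-up count matrix. The function names `rca_matrix` and `binarise` are hypothetical, and setting the RCA to zero for empty rows or columns is an assumption of the sketch, not something stated in the text.

```python
def rca_matrix(V):
    """Revealed comparative advantage for a count matrix V (rows: MAs, columns: codes)."""
    row_tot = [sum(row) for row in V]        # patents-by-code counts of each MA
    col_tot = [sum(col) for col in zip(*V)]  # counts of each code over all MAs
    total = sum(row_tot)
    rca = []
    for a, row in enumerate(V):
        rca_row = []
        for t, v in enumerate(row):
            if v == 0 or row_tot[a] == 0 or col_tot[t] == 0:
                rca_row.append(0.0)  # assumption: empty rows/columns give RCA = 0
            else:
                rca_row.append((v / row_tot[a]) / (col_tot[t] / total))
        rca.append(rca_row)
    return rca


def binarise(rca, threshold=1.0):
    """M[a][t] = 1 if RCA >= threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in rca]


# Toy matrix: 3 MAs x 3 technology codes (made-up counts).
V = [[4, 0, 1],
     [1, 3, 1],
     [0, 2, 8]]
rca = rca_matrix(V)
M = binarise(rca)
print(M)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

In this toy case the third code is the most common overall, so the first two MAs fall below the threshold in it even though they both hold patents with that code.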

Bipartite Networks

A bipartite network is a network whose nodes represent two different kinds of entities, and only connections between nodes of different kinds are allowed. Many systems in ecological and socio-economic environments, such as those studied in the present work, are easily described as bipartite since they involve interactions between two kinds of entities [39,47]. For instance, the Internet can be modelled as a users-websites bipartite network, whose analysis can reveal sets and ranks of pages that are more likely to be of interest to the user. We can use the $M^y$ matrices as biadjacency matrices of MA - technology bipartite networks, connecting each MA with the technologies in which it is competitive. In Figure 1 we show a pictorial representation of this bipartite network and its biadjacency matrix $M^y$ for the year $y = 2000$.

Projecting the bipartite network on one of its layers, we can ﬁnd non-trivial similarity patterns between MAs or technologies.

However, the problem of ﬁnding the proper projection of a bipartite network into a monopartite one representing the similarities

of nodes on one of its layers is well-known in the literature

41,48–50

. In general, the goal is to ﬁnd the representation of a

monopartite network that best represents the bipartite one without taking too much information away from the latter. We

decided to use the Bipartite Configuration Model (BiCM) [51,52] to select the most significant nodes and links.

Bipartite Conﬁguration Model (BiCM)

One of the simplest ways to obtain a monopartite projection from bipartite data is to count the number of links in common between two different entities belonging to the same layer. For example, using $M$ as the biadjacency matrix of a bipartite network between metropolitan areas $a$ and technologies $t$, counting the number of links in common between two different entities belonging to the same layer means computing:

$$A_{a_i a_j} = \sum_t M_{a_i t} M_{a_j t},$$

where $A_{a_i a_j}$ is the element of the adjacency matrix $A$ of the monopartite projection between elements $a_i$ and $a_j$. However, we note that a projection made in this way leads to a densely connected structure with a trivial topology.
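The naive co-occurrence count above can be sketched in a few lines. The function name `cooccurrence` and the toy matrix are hypothetical; the diagonal is left at zero because the formula is stated for pairs of different entities.

```python
def cooccurrence(M):
    """Naive projection: A[i][j] = number of columns in which rows i and j both have a 1."""
    n = len(M)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(x * y for x, y in zip(M[i], M[j]))
            A[i][j] = A[j][i] = shared  # the projection is symmetric
    return A


# Toy binary matrix: 3 MAs x 4 technology codes.
M = [[1, 1, 0, 1],
     [1, 0, 0, 1],
     [0, 1, 1, 0]]
A = cooccurrence(M)
print(A)  # [[0, 2, 1], [2, 0, 0], [1, 0, 0]]
```

Even in this tiny example two of the three possible pairs end up linked, which hints at why the unfiltered projection of a large matrix becomes very dense.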

To select the relevant nodes and links in our projected networks and avoid obtaining too dense a projection, we use as a null model the Bipartite Configuration Model (BiCM) [49,51,52], which we compute by using the NEMtropy Python package (see footnote 1). The BiCM belongs to the family of Exponential Random Graphs, adapted to the case of bipartite networks. These models arise from the maximisation of the Shannon entropy of an ensemble $\Omega$ of networks, in our case undirected binary bipartite networks $M$:

$$S = -\sum_{M \in \Omega} P(M) \ln P(M),$$

considering a set of constraints $\vec{C}(M)$. $P(M)$ is the probability of a specific bipartite network $M$.

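The probability distribution maximising the entropy is the exponential distribution:

$$P(M|\vec{\lambda}) = \frac{e^{-H(M,\vec{\lambda})}}{Z(\vec{\lambda})}, \tag{1}$$

where $H(M,\vec{\lambda}) = \vec{\lambda} \cdot \vec{C}(M)$ is the Hamiltonian, $\vec{\lambda}$ is the vector of Lagrange multipliers and $Z(\vec{\lambda}) = \sum_{M \in \Omega} e^{-H(M,\vec{\lambda})}$ is the normalisation factor.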

Two sets of constraints are imposed in the BiCM, one for each layer. Specifically, the node degrees are fixed, namely the ubiquity $\vec{u}(M)$ for each technology code and the diversification $\vec{d}(M)$ for MAs, in our case. The mean values of the node degrees over the ensemble must be tuned to match these quantities. Then we obtain the Hamiltonian $H$:

$$H(M,\vec{\lambda}) = \vec{\alpha} \cdot \vec{d}(M) + \vec{\beta} \cdot \vec{u}(M).$$

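Imposing the previous constraints together with the normalisation condition $\sum_{M \in \Omega} P(M) = 1$, we can write Eq. 1 as:

$$P(M|\vec{\lambda}) = \frac{e^{-\vec{\alpha} \cdot \vec{d}(M) - \vec{\beta} \cdot \vec{u}(M)}}{\sum_{M' \in \Omega} e^{-\vec{\alpha} \cdot \vec{d}(M') - \vec{\beta} \cdot \vec{u}(M')}}.$$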

Footnote 1: github.com/nicoloval/NEMtropy


Since constraints have been imposed on the mean values of the node degrees, the previous equation can be decomposed into the product of the probability distributions of the single links:

$$P(M|\vec{\lambda}) = \prod_a \prod_t p_{at}^{M_{at}} (1 - p_{at})^{1 - M_{at}},$$

where $p_{at} = \frac{x_a y_t}{1 + x_a y_t}$ is the probability of the link between the MA $a$ and the technological code $t$, with $x_a = e^{-\alpha_a}$ and $y_t = e^{-\beta_t}$. To estimate the unknown parameters we have to maximise the log-likelihood $\mathcal{L}(\vec{x},\vec{y}) = \ln P(M|\vec{x},\vec{y})$, i.e. solve the system:

$$\nabla \mathcal{L}(\vec{x},\vec{y}) = 0 \;\longrightarrow\; \begin{cases} d_a(M) = \sum_t \dfrac{x_a y_t}{1 + x_a y_t} & \forall a \\[2mm] u_t(M) = \sum_a \dfrac{x_a y_t}{1 + x_a y_t} & \forall t \end{cases}$$

with $d_a(M) = d^*_a$ and $u_t(M) = u^*_t$ representing the observed quantities.
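To make the system above concrete, the sketch below solves it with a plain fixed-point iteration on a toy matrix. This is an illustration written for this purpose and is not the NEMtropy implementation: the function name `solve_bicm`, the starting guess and the stopping rule are all assumptions, and the sketch only handles matrices without empty or completely full rows and columns.

```python
def solve_bicm(M, tol=1e-9, max_iter=100_000):
    """Fixed-point sketch for the BiCM equations (not the NEMtropy solver).

    Returns the matrix p[a][t] = x_a * y_t / (1 + x_a * y_t) whose row and column
    sums match the observed degrees of the binary matrix M.
    Assumption: no row or column of M is empty or completely full.
    """
    d = [sum(row) for row in M]        # observed diversification of each row node
    u = [sum(col) for col in zip(*M)]  # observed ubiquity of each column node
    n_links = sum(d)
    x = [da / n_links ** 0.5 for da in d]  # arbitrary positive starting guess
    y = [ut / n_links ** 0.5 for ut in u]
    for _ in range(max_iter):
        # Rearranged degree equations: x_a = d_a / sum_t y_t / (1 + x_a * y_t), same for y_t.
        x = [da / sum(yt / (1 + xa * yt) for yt in y) for da, xa in zip(d, x)]
        y = [ut / sum(xa / (1 + xa * yt) for xa in x) for ut, yt in zip(u, y)]
        p = [[xa * yt / (1 + xa * yt) for yt in y] for xa in x]
        err_d = max(abs(sum(row) - da) for row, da in zip(p, d))
        err_u = max(abs(sum(col) - ut) for col, ut in zip(zip(*p), u))
        if max(err_d, err_u) < tol:
            return p
    raise RuntimeError("fixed-point iteration did not converge")


# Toy binary matrix: 4 MAs x 4 technology codes.
M = [[1, 1, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 1],
     [0, 1, 0, 0]]
probabilities = solve_bicm(M)
print([round(sum(row), 6) for row in probabilities])
```

The row sums of the returned matrix should reproduce the observed degrees of the toy matrix, which is exactly the condition expressed by the system of equations.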

After we obtain the link probabilities of the model, we use them to compute how unexpected the number of common neighbours of two nodes of the same layer is. Given that, by construction, the links of the model are independent random variables, the probability that two MAs $a$ and $a'$ share a technology $t$ is $P(V^t_{aa'} = 1) = p_{at} p_{a't}$, and the total number of technologies they share will be $V_{aa'} = \sum_t m_{at} m_{a't}$. Thus, we can compute a p-value for the number of common neighbours observed for two nodes of the same layer, which reads:

$$\text{p-value}_{aa'} = P(V_{aa'} > V^*_{aa'}) \tag{2}$$

where $V^*_{aa'}$ is the number of common neighbours between nodes $a$ and $a'$ in the observed network. Note that the random variable $V_{aa'}$ is a Poisson-Binomial, i.e. a sum of independent Bernoulli random variables with different parameters, whose distribution is hard to evaluate when the number of Bernoulli variables is large. We therefore approximate it with a Poisson variable with the same mean, as has been done in previous works.
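The Poisson approximation of Eq. 2 can be sketched as follows. The function names `poisson_sf` and `projection_pvalues` are hypothetical, the link probabilities are taken as a given input, and the tail is computed with a strict inequality to mirror Eq. 2 as written.

```python
import math


def poisson_sf(k, mu):
    """P(X > k) for X ~ Poisson(mu), from the cumulative sum of the pmf."""
    term = math.exp(-mu)  # pmf at 0
    cdf = term
    for i in range(1, k + 1):
        term *= mu / i    # pmf(i) = pmf(i - 1) * mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def projection_pvalues(M, p):
    """p-value of the observed number of common neighbours for each pair of rows.

    M: observed binary matrix; p: link probabilities of the null model (same shape).
    The Poisson mean for rows a, b is sum_t p[a][t] * p[b][t].
    """
    pvalues = {}
    for a in range(len(M)):
        for b in range(a + 1, len(M)):
            observed = sum(m1 * m2 for m1, m2 in zip(M[a], M[b]))
            mean = sum(p1 * p2 for p1, p2 in zip(p[a], p[b]))
            pvalues[(a, b)] = poisson_sf(observed, mean)
    return pvalues


# Toy input: two rows sharing both columns, with made-up link probabilities.
pvals = projection_pvalues([[1, 1], [1, 1]], [[0.5, 0.5], [0.5, 0.5]])
print(pvals[(0, 1)])
```

A small p-value means that the two nodes share more neighbours than their degrees alone would lead one to expect under the null model.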

After applying this procedure to each pair of nodes, we obtain as output a p-value matrix of the same size as the adjacency matrix of the starting bipartite network. As a final step, we have to decide which of these p-values are significant and which are not. To assess the link significance, we use the False Discovery Rate (FDR) test: let us assume that we have $N$ hypotheses, each characterised by its p-value. The FDR first sorts these p-values in increasing order as $\text{p-value}_1, \ldots, \text{p-value}_N$, and then identifies the largest integer $I$ such that:

$$\text{p-value}_I \leq \frac{I \alpha}{N} \tag{3}$$

where $\alpha$ is the arbitrarily defined single-test significance level. We use $\alpha = 0.01$ for the projection onto the technology layer, and $\alpha = 0.1$ for the MA one. Note that in this case, $\alpha$ will be the statistical significance of the whole validated network, while for the single links their significance will be much lower. Finally, all hypotheses with p-value lower than or equal to $\text{p-value}_I$ will be rejected, i.e. the link will be validated in the projected network. In our case, for instance for the projection on the technologies' layer, the number of hypotheses is the number of possible links in the projection, $\binom{N_t}{2}$, and Eq. 3 becomes:

$$\text{p-value}_I \leq \frac{I \alpha}{\binom{N_t}{2}}.$$

Ordering the coefficients $\binom{N_t}{2}\, \text{p-value}(V_{tt'})$ and retaining only the links between pairs of nodes $t, t'$ such that $\text{p-value}(V^*_{tt'}) \leq \text{p-value}_I$ yields our projection.
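The validation rule of Eq. 3 can be sketched as below. The function name `fdr_validate`, the link labels and the p-values are made up for the example, and taking the largest rank that satisfies the inequality is the usual reading of the FDR procedure assumed here.

```python
def fdr_validate(pvalues, alpha):
    """Validate links with the FDR rule: keep every p-value up to the largest rank I
    such that p_(I) <= I * alpha / N.

    pvalues: dict mapping a link (pair of nodes) to its p-value.
    Returns (set of validated links, threshold p-value or None if nothing passes).
    """
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    n = len(ranked)
    cutoff = 0
    for rank, (_, pv) in enumerate(ranked, start=1):
        if pv <= rank * alpha / n:
            cutoff = rank  # keep the largest rank that satisfies the inequality
    validated = {link for link, _ in ranked[:cutoff]}
    threshold = ranked[cutoff - 1][1] if cutoff else None
    return validated, threshold


# Toy p-values for the 6 possible links among 4 technology codes (made-up numbers).
toy = {("t1", "t2"): 0.001, ("t1", "t3"): 0.012, ("t1", "t4"): 0.03,
       ("t2", "t3"): 0.033, ("t2", "t4"): 0.5, ("t3", "t4"): 0.9}
links, threshold = fdr_validate(toy, alpha=0.05)
print(sorted(links), threshold)
```

In this toy case the third-ranked p-value (0.03) exceeds its own bound of 0.025, yet it is still validated because the fourth-ranked one satisfies the inequality, which is why the largest rank matters.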

Let us remark that the projection obtained via the procedure just described only keeps links that are highly signiﬁcant with

respect to the degree of the nodes, unveiling hidden strong similarities.

Modularity and Community detection

We are interested in finding relevant communities of MAs or technologies to better visualise which nodes in the two layers are highly interconnected. To this end, we adopted the Louvain method introduced by Blondel et al., which relies on finding a partition that maximises the modularity. We also vary the resolution [55] to find communities at different scales.

Fitness and Complexity algorithms

The Fitness and Complexity (FC) framework, introduced in 2012, provides a way to quantify the competitiveness (Fitness) of the economy of a country. Here, we adopt it to quantify the Fitness of metropolitan areas considering only patent data. The idea
