Big Data Analysis to Support the OBOR (One Belt, One Road) Initiative

Date: June 2017   Author: 高剑波 (Jianbo Gao)

Table of Contents

Insights into infrastructure investment
➢ Railway
➢ Communication network
Massive media data analysis
➢ How do media reports reflect the economy?
➢ Identification of political hot spots
➢ Quantification of political risks
➢ Modes of international interactions
➢ Indices for the rise of populism

Key question: identify truly promising countries in which to invest
Thailand: before 1993, people flocked to Bangkok to chase job opportunities, driving Bangkok's urbanization; after 1993, they became permanent residents, forcing the administrative region to expand; after 2005, even though total railway mileage increased, passenger traffic kept decreasing, so new railway investment is not critical

Investment in communication infrastructure
Key question: identify truly promising countries in which to invest
Solution: normalize each indicator so that China's value is 1, then check which countries have worse infrastructure
➢ A ratio < 1 marks a country worth further consideration for investment
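Normalizing to China amounts to dividing each country's indicator by China's value. The sketch below assumes a "higher is better" indicator (e.g., coverage per 100 people). The function names and all country values are hypothetical placeholders, not data from this analysis.

```python
def ratio_to_reference(values, reference="China"):
    """Divide every country's indicator by the reference country's value.

    values: dict mapping country name -> indicator (assumed "higher is better").
    """
    ref = values[reference]
    if ref == 0:
        raise ValueError("reference value must be non-zero")
    return {country: v / ref for country, v in values.items()}


def investment_candidates(values, reference="China"):
    """Countries with ratio < 1, i.e. worse infrastructure than the reference."""
    ratios = ratio_to_reference(values, reference)
    return sorted(c for c, r in ratios.items() if r < 1)


if __name__ == "__main__":
    # Hypothetical broadband subscriptions per 100 people (illustrative only).
    broadband = {"China": 20.0, "Country A": 5.0, "Country B": 30.0, "Country C": 12.0}
    print(ratio_to_reference(broadband))
    print(investment_candidates(broadband))  # ['Country A', 'Country C']
```

A ratio makes indicators with different units comparable against one benchmark. A ratio below 1 flags a candidate for further study; it is not an investment decision by itself.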

Internet connection in 2015

Constructing Information Development Index (IDI)
Coverage rates of Internet connections, landlines, broadband, and mobile phones are highly correlated, yielding similar maps
Using SVD or PCA, a single index can be constructed; it is a good candidate for measuring a country's IDI
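Here is a minimal sketch of such an index. It assumes the indicators form a countries × indicators matrix and are standardized before the SVD, which the slides do not specify. The first right singular vector supplies the weights. The function name and sample numbers are illustrative only.

```python
import numpy as np


def composite_index(X):
    """Build a single index from correlated indicators via PCA computed by SVD.

    X: shape (n_countries, n_indicators); columns could be Internet, landline,
    broadband and mobile coverage (hypothetical layout).
    Returns (scores, explained): one score per country, and the share of total
    variance captured by the first principal component.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each indicator (assumes no column is constant).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are principal directions; S holds the singular values.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    w = Vt[0]
    # A singular vector's sign is arbitrary: orient it so that higher
    # coverage maps to a higher index.
    if w.sum() < 0:
        w = -w
    scores = Z @ w
    explained = S[0] ** 2 / np.sum(S ** 2)
    return scores, explained


if __name__ == "__main__":
    # Hypothetical coverage figures: 4 countries (rows) x 4 indicators (columns).
    X = [[10, 5, 2, 50],
         [20, 10, 4, 80],
         [40, 15, 10, 100],
         [80, 30, 20, 120]]
    scores, explained = composite_index(X)
    print(scores, explained)
```

When the indicators are strongly correlated, `explained` is close to 1. In that case one number summarizes the four maps with little loss.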

Quantitative social science: data sources
• Integrated Crisis Early Warning System (ICEWS)
-- Purpose: to forecast and respond to crises
-- Support: DARPA
➢ Event data are collected only if relevant to crises
• A much larger open data source: GDELT
(Global Database of Events, Language, and Tone)
– Essentially all events reported in the media are collected
– Opportunities: supports far more analyses than ICEWS was designed for
– Challenges: a huge number of events, some erroneous, all mixed together

Political event data
The event data set analyzed here is called Global Database of Events, Language, and Tone (GDELT)
GDELT events are drawn from a wide array of news media, both English and non-English, from across the world, ranging from international to local sources in nearly every country
GDELT includes more than 400 million unique events across all countries, from 1979 to the present
These data were produced by the TABARI automated coding software, using the CAMEO event and actor coding system
Basic structure of event data:
  Actor 1 interacts with Actor 2
  Who and where Actors 1 & 2 are
  Event score (Goldstein scale, from -10 to +10: positive for cooperative ("good") events, negative for conflictual ("bad") ones; ~300 event types)
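The event structure above maps naturally onto a small record type. This sketch is loosely modeled on GDELT's actor/event layout. The field names, the helper `summarize_by_country`, and the sample events are hypothetical and do not reproduce GDELT's actual schema. Aggregating per country yields event counts of the kind that can be compared with economic indicators.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    """Simplified event record: Actor 1 interacts with Actor 2.

    Loosely modeled on the GDELT/CAMEO structure; field names are illustrative,
    not GDELT's actual column names.
    """
    actor1_country: str  # where Actor 1 is from (e.g. a 3-letter country code)
    actor2_country: str  # where Actor 2 is from
    event_code: str      # CAMEO event type code (placeholder values below)
    goldstein: float     # Goldstein score: > 0 cooperative, < 0 conflictual


def summarize_by_country(events):
    """Return {country: (number of events, mean Goldstein score)}, keyed by Actor 1."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for e in events:
        # Reject malformed records rather than silently averaging them in.
        if not -10.0 <= e.goldstein <= 10.0:
            raise ValueError(f"Goldstein score out of range: {e.goldstein}")
        counts[e.actor1_country] += 1
        totals[e.actor1_country] += e.goldstein
    return {c: (counts[c], totals[c] / counts[c]) for c in counts}


if __name__ == "__main__":
    # Hypothetical events; "E1".."E3" are placeholder event codes.
    events = [Event("USA", "RUS", "E1", -5.0),
              Event("USA", "CHN", "E2", 3.0),
              Event("RUS", "USA", "E3", -2.0)]
    print(summarize_by_country(events))  # {'USA': (2, -1.0), 'RUS': (1, -2.0)}
```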

Potential of GDELT: Number of Events vs. GDP

A positive correlation also exists between the number of events and GDP for almost all countries in the world.
All of the above indicates that media reports about a country are related to its economy
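One simple way to quantify the event-count/GDP relationship for a single country is a correlation over yearly values. The sketch assumes both series are positive and log-transforms them because they span orders of magnitude. The log transform, the function name, and the test series are assumptions, not the method used in the original analysis.

```python
import numpy as np


def log_log_correlation(event_counts, gdp):
    """Pearson correlation between log(event count) and log(GDP).

    Both inputs are equal-length yearly series for one country; all values
    must be positive. The log transform is an assumption of this sketch.
    """
    x = np.log(np.asarray(event_counts, dtype=float))
    y = np.log(np.asarray(gdp, dtype=float))
    if x.shape != y.shape or x.size < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    return float(np.corrcoef(x, y)[0, 1])
```

Correlation alone does not establish that media coverage drives the economy or vice versa. It only shows that the two move together.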

Global political “hotness” map for November 2015

Introduction to fractal and multifractal
A part is (exactly or statistically) similar to another part, or to the whole: scale-free
Examples: clouds, mountains, trees, etc. (Images: not computer-generated, but photos of Jiu Zhai Gou)
Power-law relation: a straight line in a log-log plot (scaling law)
Power-law relation is the origin of self-similarity
Many (possibly infinitely many) power-law relations: multifractal
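A power law $y = c\,x^{\alpha}$ satisfies $y(\lambda x) = \lambda^{\alpha} y(x)$. Rescaling $x$ therefore only multiplies $y$ by a constant, which is why power laws produce scale-free, self-similar behavior. Taking logs gives $\log y = \log c + \alpha \log x$, so the exponent is the slope of the log-log line. The following is a minimal least-squares sketch; the function name is hypothetical.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y = c * x**alpha by least squares on log10(y) versus log10(x).

    In log-log coordinates a power law is a straight line: alpha is the slope
    and log10(c) the intercept. All values must be positive.
    """
    lx = np.log10(np.asarray(x, dtype=float))
    ly = np.log10(np.asarray(y, dtype=float))
    alpha, log_c = np.polyfit(lx, ly, 1)  # returns [slope, intercept]
    return float(alpha), float(10.0 ** log_c)


if __name__ == "__main__":
    x = np.array([1, 2, 4, 8, 16, 32], dtype=float)
    y = 3.0 * x ** -1.5        # synthetic, exactly scale-free data
    print(fit_power_law(x, y))  # approximately (-1.5, 3.0)
```

Least squares on log-log axes is the simplest approach. For fitting heavy-tailed probability distributions, maximum-likelihood estimators are generally more reliable.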

Scope of fractal theory
Fractal geometry: both deterministic and stochastic
Chaotic attractors often have fractal structure (deterministic)
Dynamical random fractal theory
$1/f^{\alpha}$ processes
A subclass, $1/f^{2H+1}$ processes, where $H$ is called the Hurst parameter, has long-range correlations, or long memory
• Depending on whether $0 < H < 1/2$, $H = 1/2$, or $1/2 < H < 1$, the process is said to have anti-persistent correlations, short memory (or be memoryless), or persistent long-range correlations, respectively
Multiplicative cascade multifractals
Lévy processes
Chaos and random fractals have different foundations: it is incorrect to use “chaos” to “mean” both chaos and fractals!
Applications: cyber security, financial crises, river flow dynamics, political instability
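The slides do not say how $H$ is estimated. One widely used estimator is detrended fluctuation analysis (DFA), sketched below as an assumption and not necessarily the method used in this work. DFA integrates the series into a profile, removes a linear trend within windows of size $s$, and measures the RMS fluctuation $F(s)$. For a noise-like series with Hurst parameter $H$, $F(s) \propto s^{H}$, so $H$ is the log-log slope. The scales and names are illustrative choices.

```python
import numpy as np


def dfa_hurst(x, scales=(16, 32, 64, 128, 256)):
    """Estimate H with first-order detrended fluctuation analysis (DFA-1).

    x is treated as a noise-like (increment) series; its cumulative sum is the
    profile. Returns the slope of log2 F(s) versus log2 s.
    """
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())
    fluct = []
    for s in scales:
        n_seg = len(profile) // s
        if n_seg < 2:
            raise ValueError(f"series too short for scale {s}")
        t = np.arange(s)
        sq = []
        for k in range(n_seg):
            seg = profile[k * s:(k + 1) * s]
            coef = np.polyfit(t, seg, 1)           # local linear trend
            resid = seg - np.polyval(coef, t)      # detrended segment
            sq.append(np.mean(resid ** 2))
        fluct.append(np.sqrt(np.mean(sq)))         # RMS fluctuation F(s)
    slope, _ = np.polyfit(np.log2(scales), np.log2(fluct), 1)
    return float(slope)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(4096)
    print(dfa_hurst(noise))  # close to 0.5 for memoryless white noise
```

An estimate near 1/2 indicates a memoryless series. Values above 1/2 indicate persistence and values below 1/2 anti-persistence.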

Self-similar stochastic processes

Brownian Motion
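A process is self-similar with Hurst parameter $H$ if $\{X(\lambda t)\} \overset{d}{=} \{\lambda^{H} X(t)\}$ for all $\lambda > 0$. Brownian motion is the case $H = 1/2$. The sketch below simulates Brownian paths and checks one consequence: the standard deviation at time $\lambda t$ is $\lambda^{1/2}$ times that at $t$. It checks only this one-time marginal scaling, not full self-similarity. Function names and parameters are illustrative.

```python
import numpy as np


def brownian_paths(n_paths, n_steps, dt=1.0, seed=0):
    """Simulate Brownian motion as cumulative sums of N(0, dt) increments.

    Returns an array of shape (n_paths, n_steps); column k holds B((k + 1) * dt),
    so Var[B(t)] = t.
    """
    rng = np.random.default_rng(seed)
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    return np.cumsum(increments, axis=1)


def scaling_ratio(paths, t, lam):
    """Ratio std[B(lam * t)] / std[B(t)], with t and lam * t in integer steps.

    Self-similarity with H = 1/2 predicts lam ** 0.5.
    """
    return float(paths[:, lam * t - 1].std() / paths[:, t - 1].std())


if __name__ == "__main__":
    paths = brownian_paths(2000, 400)
    print(scaling_ratio(paths, 100, 4))  # close to 4 ** 0.5 = 2
```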

Relevance of the Hurst parameter to life
What is randomness? Write down a random number, which could be your cell phone number. While the number might initially seem "alien" or random to you, you will soon find meaning in its digits in order to remember it: the number is no longer random
Are Bill Gates and Microsoft lucky? Some have argued that other people could have been as successful in that situation
The key is to find the right "spot" from which future evolution will have a large H > 1/2
In research, choosing the right topics is the most important!
In terms of morality, a large H > 1/2 means either a benign behavior that attracts many “unknown forces” to help enforce it, or a malicious behavior that becomes more and more repelling: seldom do we have something neutral
Ancient and common wisdom can be analyzed using advanced theory

Modes of international interactions:
Strained US–Russia relations, starting when:
1) Russia annexed Crimea;
2) the US applied economic sanctions, and Russia's economy tanked;
3) Russia joined the air bombing in Syria, and the US role was marginalized …

Consequence: a terrible lose-lose situation
Question: how to minimize such possibilities?

Indicators for the rise of populism
• General understanding (Inglehart & Norris, 2016)
➢ The rise of populism around the world is due to economic insecurity and cultural backlash
• Challenge: can robust indices be constructed to forewarn of the rise of populism?