China’s industrialization process has long been a product of government direction, be it coercive central planning or ambitious industrial policy. For the first time in the literature, we develop a quantitative indicator of China’s policy priorities over a long period, which we call the Policy Change Index (PCI) for China. The PCI is a leading indicator that covers the period from 1951 to the most recent quarter (the fourth quarter of 2018) and can be updated in the future. In other words, the PCI not only helps us understand the history of China’s industrialization but also allows us to make short-term predictions about its future direction.

The design of the PCI has two building blocks: (1) it takes as input data the full text of the People’s Daily — the official newspaper of the Communist Party of China — since it was founded in 1946; (2) it employs a set of machine learning techniques to “read” the articles and detect changes in the way the newspaper prioritizes policy issues.

The PCI’s predictive power rests on the fact that the People’s Daily is at the nerve center of China’s propaganda system and that changes in propaganda often precede changes in policy. Before the great transformation from central planning under Mao to the economic reform program after Mao, for example, the Chinese government made considerable efforts to promote the idea of reform, move public opinion, and mobilize resources toward the new agenda. Therefore, by detecting changes in propaganda in real time, the PCI is, effectively, predicting future changes in policy.

Methodology

To detect changes in how the People’s Daily prioritizes policy issues, the machine learning algorithm we design is geared toward carrying out a straightforward task: classifying whether a People’s Daily article is published on the front page — a proxy for high-priority content.
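
As a rough illustration of this classification task, the sketch below labels each article by whether it ran on page 1 and fits a small naive Bayes classifier on word counts. Naive Bayes is a stand-in chosen for brevity, not the neural network models the project uses. The toy articles and the names `label`, `train_naive_bayes`, and `predict_front_page` are hypothetical.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): (article text, page number) pairs.
articles = [
    ("party congress announces reform plan", 1),
    ("central committee reform directive", 1),
    ("local opera troupe tours province", 4),
    ("weather report and harvest notes", 3),
]

def label(page):
    """Front page (page 1) is the proxy for high-priority content."""
    return 1 if page == 1 else 0

def train_naive_bayes(examples):
    """Count words per class; assumes both classes appear in the examples."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, page in examples:
        y = label(page)
        class_counts[y] += 1
        word_counts[y].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab

def predict_front_page(model, text):
    """Return 1 if the article looks like front-page content, else 0."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for y in (0, 1):
        # Log prior plus log likelihood with add-one (Laplace) smoothing.
        score = math.log(class_counts[y] / total)
        denom = sum(word_counts[y].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[y][word] + 1) / denom)
        scores[y] = score
    return 1 if scores[1] > scores[0] else 0

model = train_naive_bayes(articles)
print(predict_front_page(model, "committee announces reform"))  # 1
print(predict_front_page(model, "opera and weather notes"))     # 0
```

Note that the only supervision here is the page number; the classifier is never told what the articles are about.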

The intuition of the design can be explained with the following example. Imagine an avid reader of the People’s Daily who has read, remembered, and thought through all the articles published in recent times (say, the past five years). They would, consequently, have acquired a fairly good sense of what kind of articles “should” or “should not” appear on the front page. Now suppose the reader picks up the papers from the next quarter and finds them surprising; that is, their educated guesses about which articles belong on the front page turn out to work either particularly well or exceptionally poorly. This might constitute a signal of change from the reader’s perspective. While a small surprise may well be taken as noise, a strong signal would convince the reader that their existing understanding of the page arrangement is no longer valid and that the priorities of the People’s Daily have fundamentally changed.

The design of the machine learning algorithm mimics the reasoning in the avid reader’s mind and builds the PCI on the “surprises” to the algorithm’s existing understanding of the newspaper. To train the algorithm on the text data and a variety of metadata variables associated with them, we apply a set of machine learning techniques, such as word embedding, multilayer perceptrons, and recurrent neural networks. Technical details can be found in Section 4 of our research paper, “Reading China: Predicting Policy Change with Machine Learning.”
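
The “surprise” logic can be sketched as a rolling-window computation. The sketch rests on an assumption: that the index for a quarter is the absolute gap between the classifier’s F1 score on the five-year training window and its F1 score on the following quarter. The research paper gives the actual definition. The names `f1_score`, `policy_change_index`, `train_fn`, and `keyword_trainer` are hypothetical, and the one-feature toy classifier stands in for the neural networks.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive (front-page) class; 0.0 if there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def policy_change_index(quarters, train_fn, window=20):
    """quarters: list of quarters, each a list of (features, label) pairs.
    train_fn: hypothetical callable that takes training examples and
    returns a predict(features) -> 0/1 function.
    Returns one index value per quarter after the first `window` quarters."""
    index = []
    for q in range(window, len(quarters)):
        # Flatten the previous `window` quarters (20 quarters = five years).
        train = [ex for quarter in quarters[q - window:q] for ex in quarter]
        predict = train_fn(train)
        fit_past = f1_score([y for _, y in train], [predict(x) for x, _ in train])
        new = quarters[q]
        fit_new = f1_score([y for _, y in new], [predict(x) for x, _ in new])
        # Assumed definition: a large gap in either direction is a surprise.
        index.append(abs(fit_past - fit_new))
    return index

# Toy classifier: remembers which topics appeared on the front page.
def keyword_trainer(train):
    front = {x for x, y in train if y == 1}
    return lambda x: 1 if x in front else 0

# Toy data: the front-page topic flips in the last quarter.
stable = [("planning", 1), ("planning", 1), ("sports", 0), ("reform", 0)]
shifted = [("reform", 1), ("reform", 1), ("sports", 0), ("planning", 0)]
quarters = [stable] * 3 + [shifted]
print(policy_change_index(quarters, keyword_trainer, window=2))  # [0.0, 1.0]
```

In the toy run the index stays at zero while the new quarter matches the training window and jumps once the front page starts carrying a topic the classifier had learned to place elsewhere.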

We have also released the source code of the project on GitHub.

Main Results

The figure below plots the quarterly PCI for China from 1951 to the most recent quarter. When the index hovers near zero, the new articles in the current quarter largely confirm the “paradigm” the algorithm has acquired from the previous five years, suggesting policy stability. But if the index increases drastically, the new articles come as a big “surprise” to the algorithm’s existing understanding, which, in turn, indicates a major policy change in the near future.
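
One simple way to read such a series programmatically is to flag the quarters in which the index exceeds a cutoff. The sketch below is only an illustration: the function `flag_spikes`, the cutoff of 0.25, and the index values are all made up and are not actual PCI readings or a rule taken from the project.

```python
def flag_spikes(index_by_quarter, threshold):
    """Return the quarters whose index value exceeds the threshold."""
    return [q for q, value in index_by_quarter.items() if value > threshold]

# Made-up values for illustration only; not actual PCI readings.
toy_index = {"Q1": 0.02, "Q2": 0.03, "Q3": 0.41, "Q4": 0.05}
print(flag_spikes(toy_index, threshold=0.25))  # ['Q3']
```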

Figure: Policy Change Index and major events in China, 1951 Q1 to 2018 Q4

Note: The PCI series is a predictor of policy changes. A spike in the PCI signals an upcoming policy change, while each vertical bar marks the occurrence of a policy change and is labeled with the corresponding event.

The validity of the PCI, which we address in Section 5 of our research paper, rests on whether it could have predicted the important policy changes in China that have been identified in the literature, which serve as the ground truth. These historical events are plotted in the same figure against the PCI time series for comparison. As shown in the figure, the PCI picks up the beginning of the Great Leap Forward in 1958, that of the Cultural Revolution in 1966, that of the economic reform program in 1978, and, more recently, a reform speed-up in 1993 and a reform slow-down in 2005, among others. The PCI often leads these events by months, providing short-term predictions of when they will occur.

Furthermore, we show that, by reading the articles misclassified by the algorithm, one can infer the substance of the policy changes that the PCI predicts. In other words, the PCI allows us not only to predict the timing of policy changes but also to understand their substance before they are realized.
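
Collecting the misclassified articles for a human reader can be sketched as follows. The function `misclassified`, the toy articles, and the stand-in classifier `toy_predict` are hypothetical; the sketch only shows how a quarter’s errors split into articles that were unexpectedly placed on the front page and articles that were unexpectedly left off it.

```python
def misclassified(articles, predict):
    """Split a quarter's errors into two lists of titles.
    articles: list of (title, features, label), with label 1 = front page.
    predict: hypothetical classifier trained on the previous five years."""
    promoted, demoted = [], []
    for title, features, label in articles:
        guess = predict(features)
        if label == 1 and guess == 0:
            promoted.append(title)  # on the front page, but not expected there
        elif label == 0 and guess == 1:
            demoted.append(title)   # expected on the front page, but placed elsewhere
    return promoted, demoted

# Stand-in classifier that learned "planning" belongs on the front page.
toy_predict = lambda topic: 1 if topic == "planning" else 0

toy_quarter = [
    ("Reform pilot expands", "reform", 1),
    ("Plan targets met", "planning", 0),
    ("Sports roundup", "sports", 0),
]
promoted, demoted = misclassified(toy_quarter, toy_predict)
print(promoted)  # ['Reform pilot expands']
print(demoted)   # ['Plan targets met']
```

Reading the two lists side by side suggests which topics are gaining priority and which are losing it.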

Potential Applications

The design of the PCI has an added feature: It is “language-free.” That is, predicting the occurrence of policy change does not, in itself, require the researcher to read and process the Chinese text. The key ingredient of the algorithm is merely the page on which each article appears. This feature suggests that the method could be applied to a wide range of other settings, such as predicting policy change in other (ex-)Communist regimes (e.g., Cuba and North Korea), measuring decentralization in central-local government relations (such as in China), measuring media bias in democracies (such as the US), and predicting changes in lawmakers’ voting behavior and in judges’ judicial or ideological leanings. Section 6 of our research paper discusses these potential applications in more detail.