PCI-NKO

The PCI-NKO series is a predictor of policy changes in North Korea (NKO). A spike in the indicator signals a policy change, while a vertical bar marks the occurrence of a policy change labeled by the event.

Figure: PCI-NKO and Major Policy Events in North Korea, Apr 2022 to Feb 2024

For technical details about this algorithm, check out the source code and stay tune for our upcoming research paper on the subject.

Introduction

The Democratic People’s Republic of Korea (or North Korea), under the Kim family’s rule for three generations spanning 76 years, remains one of the most repressive and secretive autocracies on earth. However, as North Korea’s nuclear and missile programs continue to advance and threaten other countries, the stake could not be higher for policymakers in the United States and its allies to better understand, or even anticipate, the Hermit Kingdom’s moves.

In this project, we introduce a new, open-source method of glimpsing into the North Korean regime—by reading between the lines in its state propaganda. We call it the Policy Change Index project for North Korea (PCI-NKO), which predicts Pyongyang’s moves by analyzing the text of Rodong Sinmun, the Workers’ Party of Korea (WPK)’s most prominent official newspaper covering major issues from domestic economic policy to foreign and defense policy.

The design of the PCI-NKO has two building blocks: (1) it takes as input data the full text of the Rodong Sinmun; (2) it employs machine learning and large language models (LLMs) to “read” the articles and detect changes in the way the newspaper prioritizes policy issues.

The source of the PCI-NKO’s predictive power can be traced back to Soviet Russia and Communist China, after which Kim Il Sung, Kim Jong Un’s grandfather, modeled his WPK. Among other things, the WPK assigns media a foundational role of ideological “education,” runs Rodong Sinmun out of its central committee, and appoints a propaganda tsar to fully control the government’s messaging at home and abroad. According to this ideology, the masses are backward, so the Communists need to convince the people that the government’s policies are sound and that they should follow the party’s lead. Therefore, from an observer’s perspective, when the indicator detects changes in propaganda, it effectively predicts changes in policy.

Methodology

We adopt the predictive framework of the PCI-China for the PCI-NKO, and it consists of the following steps:

Collect the full text of Rodong Sinmun from January 2018 to February 2024 and label a set of essential metadata for each article, such as publication date, title, content and page number. In particular, we focus on whether an article was published on the front page—a simple but effective proxy for the importance of the content.
For every four years of data (such as from January 2018 to December 2021), train a deep learning model tailored to the Korean language to predict whether an article was published on the front page. This step is akin to how a propagandist operates: prioritizing different content based on the party’s policy direction.
Deploy the model to the three months following the four-year window (such as January to March 2022) and assess whether the algorithm’s performance (in telling front-page articles apart) is significantly different from that in training. This step is akin to an avid Rodong Sinmun reader watching out for anomalies relative to what they consider the newspaper’s baseline.
Define the difference in the algorithm’s performance between the training period and the deployment period as the value of the PCI-NKO at the point of analysis. When the index is high, it suggests that the editorial priorities have shifted from the preceding four years to the subsequent three months.
Use LLMs to interpret the anomalous articles detected in the three-month window. False positives—articles predicted to be on but actually off the front page—represent policies with declining priorities. Similarly, false negatives—articles predicted to be off but actually on the front page—signal policies that are becoming more prominent.
Repeat the analysis every month, resulting in a monthly PCI-NKO from April 2022, the earliest data point given the scope of the raw data, to February 2024.

We have released the source code of the project on GitHub, which can be found here.

Main Results

The above figure plots the monthly PCI-NKO from April 2022 to February 2024. When the index hovers near zero, the (new) articles in the current quarter are largely confirming the “paradigm” the algorithm has acquired from the previous five years, suggesting policy stability. But if the index increases drastically, it would mean a big “surprise” to the algorithm’s existing understanding, which, in turn, would indicate a major policy change in the near future.

To test the hypothesis that change in PCI-NKO is predictive of change in policies, we curate a list of major policy events in North Korea during the period of the analysis and examine whether their occurrences are preceded by significant movements in the PCI-NKO. As shown in the figure, the indicator does tend to spike before policy events take place. As outlined in step 5 above, we also leverage LLMs to interpret the leading spikes of the PCI-NKO and verify that the changes in Rodong Sinmun were indeed indicative of the regime’s nuclear expansion.

On April 1, 2023, for example, the index recorded a value of 0.1, one of the highest in the period of our analysis. The spike indicates a major editorial change in Rodong Sinmun between the four years from January 2019 to December 2022 and the three months from January to March 2023. Using LLMs, we categorize sampled articles from the training and deployment windows into 10 policy areas, and we examine the areas where the algorithm makes more false negative mistakes in deployment than in training, indicating emerging priorities compared to the baseline.

The content analysis suggests that defense and national security has the biggest training-to-deployment contrast in false negatives. The misclassified articles often highlight advancements in intercontinental ballistic missile (ICBM) technology and military readiness, Kim Jong Un’s focus on military exercises and counterattack training, suggesting an anticipation of military aggressions.

These anomalies detected by the algorithm are consistent with North Korea’s nuclear arsenal expansion in early 2023. On April 13, North Korea conducted the inaugural launch of the Hwasong-18, its first long-range, solid-fueled ICBM, potentially putting the entire continental United States within Pyongyang’s reach. That marks yet another progress on its weapons program following the successful November 2022 launch of the liquid-fueled ICBM Hwasong-17, a February 2023 military parade that showcased more North Korean nuclear weapons than ever before, and other military events.

Limitations and Future Developments

One important limitation of the PCI approach is that, while LLMs can assist with interpreting anomalies in propaganda, the policy implications of those anomalies are not always as straightforward as in the April 1, 2023 episode about defense and national security.

We also acknowledge that the algorithm’s training performance still has room for improvement. The deep learning model we used—and pre-trained language models more generally—may not have taken into account the difference between North and South Korean languages, which have diverged significantly after decades of separation. Additionally, the uncertain policy directions of the North Korean government may also have contributed to the algorithm’s performance issues.

Finally, it’s in our roadmap to continue to expand the PCI-NKO time series to cover the time before 2018 and to continue to update as new Rodong Sinmun data become available.