My research applies machine learning and deep neural networks to the analysis of text. My approach combines attention to design-based causal inference, the generation of new sources of data, and new techniques for analyzing those data to answer important political questions, such as uncovering undisclosed political spending by nonprofits and measuring parliamentary polarization and ideology.

Peer-Reviewed Articles

Exports or Investment?

Exports or Investment: How Public Discourse Shapes Support for External Imbalances

(with Federico Ferrara, Joerg Haas, and Thomas Sattler). 2021. Socio-Economic Review.

The economic imbalances that characterize the world economy have unequally distributed costs and benefits. That raises the question of how countries can run long-term external surpluses and deficits without significant opposition to the policies that generate them. We show that political discourse helps to secure public support for these policies and the resulting economic outcomes. First, a content analysis of 32,000 newspaper articles finds that the dominant interpretations of current account balances in Australia and Germany reflect very distinct perspectives: external surpluses are seen as evidence of competitiveness in Germany, while external deficits are interpreted as evidence of attractiveness for investment in Australia. Second, survey experiments in both countries suggest that exposure to these diverging interpretations has a causal effect on citizens’ support for their country’s economic strategy. Political discourse is thus crucial in providing the societal foundation of national growth strategies.

Measuring Polarization from Speeches

Classification Accuracy as a Substantive Quantity of Interest: Measuring Polarization in Westminster Systems

(with Arthur Spirling). January 2018. Political Analysis.

Measuring the polarization of legislators and parties is a key step in understanding how politics develops over time. But in parliamentary systems, where ideological positions estimated from roll calls may not be informative, producing valid estimates is extremely challenging. We suggest a new measurement strategy that makes innovative use of the “accuracy” of machine classifiers, i.e., the number of correct predictions made as a proportion of all predictions. In our case, the “labels” are the party identifications of the members of parliament, predicted from their speeches along with some information on debate subjects. Intuitively, when the learner is able to discriminate well between members of the two main Westminster parties, we claim we are in a period of “high” polarization. By contrast, when the classifier has low accuracy, making a relatively large number of mistakes in allocating members to parties based on the data, we argue parliament is in an era of “low” polarization. This approach is fast and substantively valid; we demonstrate its merits with simulations and by comparing the estimates from 78 years of House of Commons speeches with qualitative and quantitative historical accounts of the same period. As a headline finding, we note that contemporary British politics is approximately as polarized as it was in the mid-1960s, that is, in the middle of the “postwar consensus”. More broadly, we show that the technical performance of supervised learning algorithms can be directly informative about substantive matters in social science.
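To make the measurement idea concrete, here is a minimal Python sketch under stated assumptions. It swaps in a multinomial naive Bayes classifier evaluated by leave-one-out, uses only the words of each speech (no debate-subject information), and runs on hypothetical toy speeches. The helper names (`train_nb`, `loo_accuracy`, `polarization_by_period`) are illustrative and are not the paper's code; the learner and features used in the article may differ.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; a real pipeline would do much more.
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with add-alpha smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha


def predict_nb(model, doc):
    """Return the party label with the highest log posterior score."""
    class_counts, word_counts, vocab, alpha = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n_label / n_docs)  # log prior
        for tok in tokenize(doc):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log(
                    (word_counts[label][tok] + alpha) / (total + alpha * len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loo_accuracy(docs, labels):
    """Leave-one-out accuracy: share of speeches assigned to the correct party."""
    correct = 0
    for i in range(len(docs)):
        model = train_nb(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict_nb(model, docs[i]) == labels[i]
    return correct / len(docs)


def polarization_by_period(speeches):
    """Map each period to classifier accuracy, read as a polarization score."""
    return {
        period: loo_accuracy([text for text, _ in rows], [party for _, party in rows])
        for period, rows in speeches.items()
    }


# Hypothetical toy speeches: distinct vocabularies vs. identical rhetoric.
speeches = {
    "period_A": [
        ("nationalise industry workers union", "Lab"),
        ("workers union wages nationalise", "Lab"),
        ("union workers public ownership", "Lab"),
        ("markets enterprise tax cuts", "Con"),
        ("enterprise markets private tax", "Con"),
        ("tax cuts private enterprise", "Con"),
    ],
    "period_B": [("economy growth jobs", "Lab")] * 3 + [("economy growth jobs", "Con")] * 3,
}
scores = polarization_by_period(speeches)
print(scores)
```

In `period_A`, where the parties use distinct words, every held-out speech is assigned to the right party (accuracy 1.0). In `period_B`, where the speeches are identical, leave-one-out accuracy drops to 0.0: the smoothed likelihoods tie, and the prior then favors whichever party has more speeches left in the training set. That artifact suggests comparing accuracy against a chance baseline rather than reading it in isolation.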

Political Spending by Nonprofits

Shining the Light on Dark Money: Political Spending by Nonprofits

(with Drew Dimmery). 2016. Russell Sage Foundation—Issue on Big Data in Political Economy.

The past decade has seen an increase in public attention to the role of campaign donations and outside spending. This has led some donors to seek ways of skirting disclosure requirements, such as contributing through nonprofits that allow for greater privacy. These nonprofits nonetheless clearly aim to influence policy discussions and, in some cases, have a direct impact on electoral outcomes. We develop a technique for identifying nonprofits engaged in political activity that relies not on their formal disclosure, which is often understated or omitted, but on text analysis of their websites. We generate political activity scores for 339,818 organizations and validate our measure through crowdsourcing. Using our measure, we characterize the number and distribution of political nonprofits and estimate how much these groups spend for political purposes.
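The summary does not specify the scoring model, so the following is only a hedged illustration of how website text could be turned into a political activity score: a bag-of-words logistic regression, fit by batch gradient ascent on a few hypothetical labeled examples. The training texts, learning rate, epoch count, and function names are all assumptions. Crowdsourced ratings, as described above, would then be a natural independent check on scores like these.

```python
import math
from collections import Counter


def features(text):
    # Bag-of-words counts from lowercased website text.
    return Counter(text.lower().split())


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(examples, lr=0.1, epochs=200):
    """Batch gradient ascent on the logistic log-likelihood (no regularization)."""
    weights, bias = Counter(), 0.0
    data = [(features(text), label) for text, label in examples]
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, y in data:
            p = sigmoid(bias + sum(weights[w] * c for w, c in x.items()))
            err = y - p  # gradient of the log-likelihood is (y - p) * x
            grad_b += err
            for w, c in x.items():
                grad_w[w] += err * c
        bias += lr * grad_b
        for w, g in grad_w.items():
            weights[w] += lr * g
    return weights, bias


def political_activity_score(text, model):
    """Score in (0, 1); words never seen in training contribute nothing."""
    weights, bias = model
    return sigmoid(bias + sum(weights[w] * c for w, c in features(text).items()))


# Hypothetical labeled website snippets: 1 = political activity, 0 = none.
training = [
    ("advocate legislation lobby congress candidates", 1),
    ("lobby voters election advocacy policy", 1),
    ("food pantry volunteers meals community", 0),
    ("youth soccer league volunteers coaching", 0),
]
model = train_logistic(training)
for site in ("we lobby on election policy", "meals for soccer volunteers"):
    print(site, round(political_activity_score(site, model), 3))
```

Because the toy classes share no words, every word that appears only in political examples ends up with a positive weight and every word that appears only in non-political examples ends up with a negative one, so the first snippet always outscores the second.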

Working Papers

AI and the Problem of Knowledge Collapse

While artificial intelligence has the potential to process vast amounts of data, generate new insights, and unlock greater productivity, its widespread adoption may entail unforeseen consequences. We identify conditions under which AI, by reducing the cost of access to certain modes of knowledge, can paradoxically harm public understanding. While large language models are trained on vast amounts of diverse data, they naturally generate output towards the ‘center’ of the distribution. This is generally useful, but widespread reliance on recursive AI systems could lead to a process we define as "knowledge collapse", which we argue could harm innovation and the richness of human understanding and culture. However, unlike AI models that cannot choose what data they are trained on, humans may strategically seek out diverse forms of knowledge if they perceive them to be worthwhile. To investigate this, we provide a simple model in which a community of learners or innovators chooses to use traditional methods or to rely on a discounted AI-assisted process, and we identify conditions under which knowledge collapse occurs. In our default model, a 20% discount on AI-generated content generates public beliefs 2.3 times further from the truth than when there is no discount. We also outline, in theoretical terms, an empirical approach to measuring the distribution of LLM outputs and illustrate it with a specific example comparing the diversity of outputs across different models and prompting styles. Finally, we consider further research directions to counteract harmful outcomes. (Code here)
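The core intuition can be illustrated with a deliberately stylized sketch, which is not the paper's model. It assumes that the truth is a standard normal distribution on a grid, that AI-assisted learners see only the central region (here within `width = 1.0` of the center), and that public beliefs are a simple mixture of the two sources. The share of learners relying on AI (`ai_share`) is taken as given rather than derived from a discount and a strategic choice, so the sketch does not reproduce the 2.3× figure. It only shows that heavier reliance on a center-truncated source pushes public beliefs further from the truth, measured here by Hellinger distance.

```python
import math


def normal_grid(lo=-4.0, hi=4.0, n=161):
    """Discretize a standard normal 'true' distribution of knowledge on a grid."""
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    weights = [math.exp(-x * x / 2) for x in xs]
    total = sum(weights)
    return xs, [w / total for w in weights]


def truncate_center(xs, probs, width=1.0):
    """Stylized AI output: keep only mass within `width` of the center, renormalized."""
    kept = [p if abs(x) <= width else 0.0 for x, p in zip(xs, probs)]
    total = sum(kept)
    return [k / total for k in kept]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same grid."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp tiny negative rounding errors


def public_belief_distance(ai_share, width=1.0):
    """Distance from the truth when a share of learners samples only from the AI."""
    xs, truth = normal_grid()
    ai = truncate_center(xs, truth, width)
    public = [(1 - ai_share) * t + ai_share * a for t, a in zip(truth, ai)]
    return hellinger(public, truth)


for share in (0.0, 0.2, 0.5, 0.8):
    print(share, round(public_belief_distance(share), 3))
```

Since the public distribution moves along a straight line from the truth toward the truncated AI distribution, the distance rises steadily with `ai_share`; narrowing `width` is one simple way to represent AI output that concentrates more tightly on the center.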

Deep Learning for Political Texts

The Power and Limits of Deep Learning of Political Texts (2018). Provides an introduction to the fundamentals underlying deep learning models and shows that, even for the relatively small datasets (say, 100,000 observations) that many social scientists work with, these models can have advantages over existing approaches, for example in identifying negation in sentences. (Code here)
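The negation point can be illustrated with a small toy, which is an assumption-laden sketch and not the paper's models. A bag-of-words representation gives "not good but bad" and "not bad but good" identical features, while even an untrained recurrent network with random weights (a hypothetical 4-unit Elman RNN, `make_rnn`/`rnn_encode`) produces different final states because it reads the words in order. Training, which is what would let such a model learn that "not" flips the meaning of what follows, is omitted.

```python
import math
import random
from collections import Counter


def bag_of_words(sentence):
    # Unordered word counts: word order, and hence the scope of "not", is lost.
    return Counter(sentence.split())


def make_rnn(vocab, hidden=4, seed=0):
    """Random (untrained) embeddings and recurrent weights for a tiny Elman RNN."""
    rng = random.Random(seed)
    emb = {w: [rng.uniform(-1, 1) for _ in range(hidden)] for w in vocab}
    rec = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(hidden)]
    return emb, rec


def rnn_encode(sentence, emb, rec):
    """Run h_t = tanh(x_t + W h_{t-1}) over the tokens and return the final state."""
    h = [0.0] * len(rec)
    for tok in sentence.split():
        x = emb[tok]
        h = [math.tanh(x[i] + sum(rec[i][j] * h[j] for j in range(len(h))))
             for i in range(len(h))]
    return h


s1, s2 = "not good but bad", "not bad but good"
emb, rec = make_rnn(sorted(set(s1.split()) | set(s2.split())))
print(bag_of_words(s1) == bag_of_words(s2))  # True: identical bags of words
print(rnn_encode(s1, emb, rec) != rnn_encode(s2, emb, rec))
```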

Measuring Contractionary Announcements in News Articles

Political Communication and Macroeconomic Stabilization in the Eurozone

(with Thomas Sattler, Contingent Signals Project)

Harsh contractionary economic measures often fail to restore investor confidence during economic crises. We show that this failure occurs because the political credibility of these policies is low in countries with high political polarization. Polarization means that the main political parties propose highly distinct economic policies, which increases the risk of policy reversals in the future. Our empirical analysis tests this argument with a new collection of all policy announcements by finance ministers and heads of government of Ireland, Spain, Italy, Portugal and Greece from Reuters Newsstream between 2000 and 2016. We find that contractionary policy announcements increased confidence in Ireland, the country with the lowest polarization, but decreased confidence in Greece, the country with the highest polarization. This suggests that less rather than more austerity improves investor confidence in politically adverse circumstances.

Debating Austerity

Debating Austerity in Europe: Fairness, Efficiency, and Appeals to Constituency (2019)

The adoption of austerity as a response to the financial crisis in the Eurozone has been highly contentious. With significant consequences for the economy on the line, the stakes are high for politicians who support or oppose policies meant to address the fiscal position of the country by cutting spending and increasing taxes. Austerity policies, commonly conceived of as bitter but necessary medicine, are a prime test of democracy, in which the government seeks to defend potentially distasteful actions whose popularity depends critically on controversial economic theories. On the basis of a new database of parliamentary speeches for five European countries, we provide a systematic overview of the types of arguments employed by politicians to defend or oppose austerity measures and trace their evolution over time. We find support for a change in the role of government discourse in response to austerity measures, especially in Ireland, while the evidence for the response of opposition speakers is more mixed.

Machine Learning and Political Texts

Understanding Politics Through the Machine Learning of Texts: New Approaches to Congressional Legislation and Special Interest Groups (dissertation, 2016)

I analyze the Congressional modification of legislation and the political activity of nonprofits using new machine-learning-based approaches to analyzing text. First, in chapter 1, I develop a novel method for document summarization based on the tensor product of vector word embeddings, which captures information about local co-occurrence. This provides the basis for a new measure of the legislative activity of Congressional actors that is employed in chapters 2 and 3 to investigate agenda setting and principal-agent dynamics in legislative drafting. This analysis makes use of a new dataset of multiple versions of each bill as it is modified by Congress from 1993 to 2014. In chapter 4, I identify the substantive dimensions that underlie ideological differences in policy proposals over time by projecting legislative text onto concepts of interest such as federal versus state control. In addition to their direct substantive implications, these studies demonstrate new approaches to analyzing text that can be applied broadly by political scientists and other social scientists.
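The summary does not spell out the construction, so the following is only a hypothetical sketch of the general idea of summarizing a document with tensor products of word embeddings. For each ordered pair of words within a small window, it adds the outer (tensor) product of their embedding vectors to a d × d matrix. Unlike an average of word vectors, the result changes when the same words co-occur in a different order. The window size, the toy two-dimensional embeddings, and the absence of any normalization are assumptions, not the dissertation's specification.

```python
def outer(u, v):
    """Outer (tensor) product of two vectors as a nested list."""
    return [[ui * vj for vj in v] for ui in u]


def cooccurrence_tensor(tokens, emb, window=2):
    """Sum outer products of embeddings for each ordered word pair within `window`."""
    dim = len(next(iter(emb.values())))
    summary = [[0.0] * dim for _ in range(dim)]
    for i, word in enumerate(tokens):
        if word not in emb:
            continue  # skip out-of-vocabulary words
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            context = tokens[j]
            if context not in emb:
                continue
            pair = outer(emb[word], emb[context])
            for a in range(dim):
                for b in range(dim):
                    summary[a][b] += pair[a][b]
    return summary


# Hypothetical 2-d embeddings chosen so the result is easy to read.
emb = {"federal": [1.0, 0.0], "control": [0.0, 1.0]}
print(cooccurrence_tensor(["federal", "control"], emb, window=1))  # [[0.0, 1.0], [0.0, 0.0]]
print(cooccurrence_tensor(["control", "federal"], emb, window=1))  # [[0.0, 0.0], [1.0, 0.0]]
```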

Mass Killing as a Means of Control

Busy Hands and the Devil’s Playground: Mass Killing as a Means of Control (2012)

(Game theory) If mass killings target threatening groups, why do regimes that engage in mass killing often choose targets who are quite weak? Furthermore, why do these regimes often end up falling to a different, more threatening group, as when the Nazi regime fell to the Allied powers? Rather than taking regimes’ claims about their enemies at face value, this paper argues that mass killing is a way for leaders to address problems of divided loyalty and conflicts within their selectorate. Mass killing can be useful to a leader engaged in state formation by serving as a costly signal that allows the leader to identify supporters and enemies, thus making the state more ‘legible.’ It is especially effective as a signal because the payoffs are tied to whether there is regime turnover, thus securing citizens’ allegiance to the existing regime. A global games model of revolution is embedded in a simple institutional design problem to generate predictions about the use of mass killings by a leader threatened by revolution or invasion. A case study of the Rwandan genocide is used to explore the plausibility of the theory. The model resolves paradoxes generated by existing theories and suggests novel implications for addressing mass killing, as threats to punish perpetrators may reduce killings or may make them more effective as costly signals.

Election Violence in Burundi

Understanding Election Violence in Burundi (2011)

An analysis of the temporal and geographic distribution of violence linked to the 2010 communal elections in Burundi, including its forms, actors and targets, to test alternative explanations for this violence. I also conduct a Benford test on digit frequencies to evaluate claims of fraud.
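The summary does not say which digit the Benford test examined. As a hedged sketch, the code below computes a chi-square statistic against Benford's expected proportions for either the first or the second significant digit; the second-digit variant is the one commonly applied to vote counts in election forensics. The vote counts in the example are hypothetical, and the 5% critical values are the standard chi-square values for 8 and 9 degrees of freedom.

```python
import math
from collections import Counter

# 95th-percentile chi-square critical values by degrees of freedom.
CHI2_CRIT_05 = {8: 15.507, 9: 16.919}


def benford_probs(position):
    """Benford proportions for the first (1-9) or second (0-9) significant digit."""
    if position == 1:
        return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    return {d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10)) for d in range(10)}


def digit_at(n, position):
    """Return the digit at `position` of |n|, or None if n has too few digits."""
    s = str(abs(int(n)))
    if s == "0" or len(s) < position:
        return None
    return int(s[position - 1])


def benford_test(counts, position=2):
    """Chi-square statistic, degrees of freedom, and a 5% rejection flag."""
    probs = benford_probs(position)
    digits = [d for d in (digit_at(c, position) for c in counts) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no counts with enough digits for this test")
    observed = Counter(digits)
    chi2 = sum((observed[d] - n * p) ** 2 / (n * p) for d, p in probs.items())
    df = len(probs) - 1
    return chi2, df, chi2 > CHI2_CRIT_05[df]


# Hypothetical vote counts that all end in zero: a suspicious second-digit pattern.
print(benford_test([10, 20, 30, 40, 50, 60, 70, 80, 90]))
```

With only nine counts, the expected cell sizes are far below the usual rule of thumb of five, so the chi-square approximation is unreliable here; the example only shows the mechanics.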