Listening closely to the news

Hanjo Odendaal
PhD Candidate

Research question

Busy gauging the feasibility of constructing online sentiment indexes using large amounts of text data and sentiment analysis

Advantages:

Surveys can be expensive to conduct
Large coverage of topics and population
High frequency

Animal Spirits

Although the mechanism through which consumer sentiment affects the general economy is still debated, two primary mechanisms have been proposed. The first and foremost is an innate inability to capture how economic agents react in times of uncertainty:

“Most, probably, of our decisions to do something positive, the full consequences of which will be drawn out over many days to come, can only be taken as the result of animal spirits – a spontaneous urge to action rather than inaction, and not as the outcome of a weighted average of quantitative benefits multiplied by quantitative probabilities.” - Keynes (1937).

Thought experiment: an unexpected change in the business cycle could occur purely because of the 'gut' feeling, or sentiment outlook, of economic agents acting on subjective foresight (heard of irrational exuberance?)

Informational contagion

The second mechanism holds that news about the future state of the economy can already be internalised by economic agents before it is captured in hard statistics.

Beaudry and Portier (2014) and Barsky and Sims (2012) argue that only a limited share of unexpected business cycle fluctuations can be attributed to ‘animal spirits’, stating that uncaptured fundamental news is the primary channel through which the relationship between sentiment and subsequent economic activity exists.

BER Confidence indexes

The concept of consumer confidence originated in the mid-1940s with George Katona at the University of Michigan

Gain insight into the prevailing economic climate
Quantitative way of incorporating consumer expectations into spending and savings models

In South Africa, a consumer confidence survey is conducted on a quarterly basis by the Bureau for Economic Research (BER)

The index dates back to 1975, when the survey covered only the white population group; black and other racial groups were included in 1982 and 1994 respectively (Kershoff, 2000).
The survey result is based on an area-stratified probability sample of 2500 households across South Africa

Consumer Confidence index construction

The consumer confidence survey consists of the following questions:

How do you expect the general economic position in South Africa to develop during the next 12 months? Will it improve considerably, improve slightly, deteriorate slightly, deteriorate considerably or don’t know?
How do you expect the financial position in your household to develop in the next 12 months? Will it improve considerably, improve slightly, deteriorate slightly, deteriorate considerably or don’t know?
What is your opinion of the suitability of the present time for the purchase of domestic appliances such as furniture, washing machines, refrigerators, etc.? Do you think that for people in general it is the right time, neither a good nor a bad time or the wrong time?

The index is constructed as a normalised sum of relative scores: the percentage of respondents expecting an improvement / good time less the percentage expecting a deterioration / bad time.
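A minimal sketch of the net-balance calculation described above. The response shares are made-up numbers, and the final step (an unweighted mean across the three questions) is an assumption, since the slide does not spell out the exact normalisation the BER uses.

```r
# Hypothetical response shares (in percent) for the three survey questions.
responses <- data.frame(
  question     = c("economic_outlook", "household_finances", "time_to_buy"),
  pct_positive = c(30, 35, 20),  # improve / right time
  pct_negative = c(45, 25, 40)   # deteriorate / wrong time
)

# Net balance per question: share of optimists less share of pessimists.
responses$net_balance <- responses$pct_positive - responses$pct_negative

# Assumed aggregation: unweighted mean of the three net balances.
cci <- mean(responses$net_balance)
```

A negative value of `cci` means that pessimists outnumber optimists on average across the three questions.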

So what does R have to do with all of this?

Sentiment through textual analysis

Sentiment analysis forms part of a larger field called computational linguistics. A body of text can typically be characterised by examining two facets within the text:

The degree to which the text exhibits emotion compared to a neutral stance
The degree to which a certain emotion is deemed to be dominant in the writing

These dimensions are known as the valence and arousal of a body of text. To quantify the sentiment of a body of text, generally one of two approaches (or a combination of them) is followed:

Bag-of-words
NLP
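To make the bag-of-words idea concrete, here is a minimal base R sketch. The lexicon, its scores, and the `score_text()` helper are hypothetical and only for illustration; they are not part of NewsR or tidytext.

```r
# Toy lexicon (hypothetical words and scores, for illustration only).
lexicon <- c(growth = 1, recovery = 1, strong = 1,
             recession = -1, unemployment = -1, weak = -1)

score_text <- function(text, lexicon) {
  # Lower-case, split on anything that is not a letter, drop empty tokens.
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  # Keep only tokens that appear in the lexicon and sum their scores.
  matched <- lexicon[tokens[tokens %in% names(lexicon)]]
  list(score = sum(matched),
       n_matched = length(matched),
       n_tokens = length(tokens))
}

headline <- "Weak growth and rising unemployment point to recession"
result <- score_text(headline, lexicon)
```

Word order and negation are ignored here, which is the main limitation of a pure bag-of-words score compared with NLP-based approaches.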

Sentiment through textual analysis (Cont.)

A dictionary-based "bag-of-words" approach to sentiment mining is widely used. Our bag-of-words framework was adapted from tidytext::get_sentiments(), which we found restrictive. We followed a more functional approach:

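get_sentiments <- function(lexicon = c("afinn", "bing", 
                                       "nrc", "loughran", 
                                       "loughran_mcdonald", "henry", 
                                       "harvard_four", "qdap_pol")) {
  # Validate the choice; defaults to the first lexicon ("afinn")
  lexicon <- match.arg(lexicon)

  # Load the bundled dictionary list into the function's environment
  data(list = "news_dict",
       package = "NewsR",
       envir = environment())

  # Return the requested lexicon
  news_dict[[lexicon]]
}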

Some data examples (Cont.)

[Figure: data examples]

Sentiment through textual analysis (Cont.)

[Figure: output from NewsR::get_sentiments]

Sentiment through textual analysis (Cont.)

To conduct all the cleaning steps, we created two custom functions:

# Extract the text from the PDF, then analyse it with all lexicons and plot the result
x <- article_to_text(pdf_file) %>% 
  analyse_article(lexicon = "all", plot = TRUE)

[Figure: plot produced by the call above]

Sentiment through textual analysis (Cont.)

[Figure: output from NewsR::analyse_article]

Constructing Online Sentiment Indexes

We use all of these techniques, along with time-series clustering methods, to construct sentiment indexes.

[Figure: constructed sentiment indexes]
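The talk does not say which clustering method was used, so the following is only a sketch of one plausible route: hierarchical clustering on a correlation-based distance, followed by averaging the standardised series within each cluster. The four sources and their sentiment series are simulated, and every name here is hypothetical.

```r
set.seed(1)

# Hypothetical monthly sentiment scores for four news sources (24 months).
months <- 24
common <- sin(seq(0, 2 * pi, length.out = months))  # shared sentiment cycle
series <- cbind(
  source_a =  common + rnorm(months, sd = 0.1),
  source_b =  common + rnorm(months, sd = 0.1),
  source_c = -common + rnorm(months, sd = 0.1),
  source_d = -common + rnorm(months, sd = 0.1)
)

# Correlation-based distance between series, then hierarchical clustering.
d <- as.dist(1 - cor(series))
clusters <- cutree(hclust(d, method = "average"), k = 2)

# One candidate index per cluster: mean of the standardised member series.
z <- scale(series)
indexes <- sapply(sort(unique(clusters)), function(k) {
  rowMeans(z[, clusters == k, drop = FALSE])
})
```

Each column of `indexes` is a candidate sentiment index built from sources that move together. In practice the series would come from the scored articles rather than a simulation.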

Conclude

We introduced a framework that future research can build on, and we motivate the use of large text data as an alternative source for constructing economic indicators. To keep an eye on these developments and more:

daeconomist.com
NewsR: an internal package developed at the BER to handle much of the heavy lifting.

article_to_text()
analyse_article()
kalman_smooth()

par_unnest()
tidy_prepDocuments()
ocr_date()

Questions?