
Hanjo Odendaal
PhD Candidate
Busy gauging the feasibility of constructing online sentiment indexes using large amounts of text data and sentiment analysis
Advantages:
Although the mechanism through which consumer sentiment affects the general economy is still debated, two primary mechanisms have been proposed. The first and foremost is an innate inability to capture the reactivity of economic agents in times of uncertainty:
“Most, probably, of our decisions to do something positive, the full consequences of which will be drawn out over many days to come, can only be taken as the result of animal spirits – a spontaneous urge to action rather than inaction, and not as the outcome of a weighted average of quantitative benefits multiplied by quantitative probabilities.” - Keynes (1937).
Thought experiment: an unexpected change in the business cycle could occur purely because of the 'gut' feeling, or sentiment outlook, of economic agents reacting out of subjective foresight (heard of irrational exuberance?)
The second mechanism states that the contagion effect of news about the future state of the economy can already be internalised by economic agents while not yet being captured in hard statistics.
The concept of consumer confidence originated in the mid-1940s with George Katona at the University of Michigan
In South Africa, a consumer confidence survey is conducted on a quarterly basis by the Bureau for Economic Research (BER)
The consumer confidence questions consist of the following:
The index is constructed as a normalised sum of relative scores: the percentage of respondents expecting an improvement / good time less the percentage expecting a deterioration / bad time
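The net-balance calculation described above can be sketched as follows. The symbols are assumptions introduced here for illustration: $P_q^{+}$ and $P_q^{-}$ are the percentages of respondents giving the optimistic and pessimistic answer to question $q$, and $Q$ is the number of questions. Averaging the balances with equal weights is one plausible reading of "normalised sum", not necessarily the BER's exact definition.

```latex
\[
  B_q = P_q^{+} - P_q^{-},
  \qquad
  \mathrm{CCI} = \frac{1}{Q} \sum_{q=1}^{Q} B_q
\]
```

For example, if 30% of respondents expect an improvement and 45% expect a deterioration on a given question, that question's balance is $B_q = 30 - 45 = -15$.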
Sentiment analysis forms part of a larger field called computational linguistics. A body of text can typically be characterised by examining two facets within the text:
These dimensions are known as the valence and arousal of a body of text. To quantify the sentiment of a body of text, generally one of two approaches (or a combination of both) is followed:
A dictionary "bag-of-words" approach to sentiment mining is widely used. Our bag-of-words framework was adapted from tidytext::get_sentiments(), which we felt was restrictive. We followed a more functional approach:
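get_sentiments <- function(lexicon = c("afinn", "bing",
                                       "nrc", "loughran",
                                       "loughran_mcdonald", "henry",
                                       "harvard_four", "qdap_pol")) {
  # Resolve the argument to a single valid lexicon name
  lexicon <- match.arg(lexicon)
  # Load the bundled dictionary list into this function's environment
  data(list = "news_dict",
       package = "NewsR",
       envir = environment())
  # Return the requested dictionary
  news_dict[[lexicon]]
}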
Output from NewsR::get_sentiments
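To make the dictionary approach concrete, here is a minimal base R sketch of bag-of-words scoring. It is not the NewsR implementation: `toy_lexicon` and `score_text()` are hypothetical names introduced for illustration, and the six-word lexicon stands in for a real dictionary such as those returned by get_sentiments.

```r
# Hypothetical toy lexicon (word -> polarity); not one of the NewsR dictionaries
toy_lexicon <- c(growth = 1, improve = 1, strong = 1,
                 recession = -1, weak = -1, decline = -1)

# Score a text as the sum of lexicon values over its tokens
score_text <- function(text, lexicon) {
  # Lower-case the text, then split on anything that is not a letter
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  # Drop empty tokens produced by leading punctuation or whitespace
  tokens <- tokens[nzchar(tokens)]
  # Look every token up by name; words missing from the lexicon give NA
  matched <- lexicon[tokens]
  # Ignore unmatched words and add up the polarities
  sum(matched, na.rm = TRUE)
}

article <- "Strong growth is expected despite weak exports."
score <- score_text(article, toy_lexicon)  # strong (+1) + growth (+1) + weak (-1)
print(score)
```

Word order and context are ignored, which is the defining simplification of a bag-of-words approach; swapping in a different lexicon changes only the lookup table, not the scoring logic.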
To conduct all the cleaning steps, we created two custom functions:
library(magrittr)  # provides the %>% pipe
x <- article_to_text(pdf_file) %>%
  analyse_article(lexicon = "all", plot = TRUE)
Output from NewsR::analyse_article
We use all of these techniques, along with time series clustering methods, to construct sentiment indexes
We introduced a framework that future research can build on. It motivates the use of large text data as an alternative source for constructing economic indicators. To keep an eye on these developments and more
NewsR
- an internal package developed at the BER to handle a lot of the heavy lifting.
article_to_text()
analyse_article()
kalman_smooth()
par_unnest()
tidy_prepDocuments()
ocr_date()
Questions?