Using Natural Language Processing for Corporate Sentiment
Earnings announcements matter for company and stock performance, and for good reason. Stocks move significantly around the release of earnings: companies that beat earnings estimates tend to outperform going forward, while those that miss estimates tend to struggle. This post-earnings-announcement drift has been studied in academia for decades and remains in effect today.
That said, sometimes a company that reports earnings above expectations will subsequently underperform, due to lowered forward guidance, company-specific problems or other reasons. Likewise, companies reporting downbeat earnings can see their share prices pop. In other words, earnings are important, but sometimes stocks move for other reasons.
In this research note, we use a novel dataset that focuses on the sentiment of management to inform the performance of companies during earnings season. This dataset uses advanced machine learning techniques, specifically natural language processing, to generate a measure of sentiment from corporate earnings call transcripts.
At a high level, natural language processing evaluates plain-text documents to extract content and sentiment. We partner with a third-party organization, ProntoNLP, which has deep and varied expertise in natural language processing, particularly as it applies to financial documents.
ProntoNLP has created a series of customized frameworks tailored to the financial sector that can be used to evaluate the degree of exposure companies have to various market and macro drivers, like changes in credit ratings or rising inflation, and the tone with which management mentions these events in regular earnings calls. Within this research note, we evaluate a framework that ProntoNLP has designed to identify likely outperformers within the equities space.
We find that the ProntoNLP-based signal produces excess return at a similar risk level to the overall market. Further, this excess return cannot be explained by traditional risk factors, indicating that the signal is capturing a source of return that is not embedded in traditional factors.
Evaluation of Sentiment-Based Signal
We start by providing general statistics around the structure of the ProntoNLP-based alpha signal. Whenever a transcript is released around a company’s earnings call, the ProntoNLP engine automatically evaluates the overall sentiment of the document, scoring it on a scale from -1 (most negative) to +1 (most positive). Scaling the sentiment score in this way allows for easy comparison across companies that may differ widely in size or industry.
We cross-reference these signals against stocks within the S&P 500; Fig. 1 below shows the monthly count of companies for which we have a sentiment score, starting in 2011. From Fig. 1, we see that the count of observations exhibits regular jumps in line with the reporting season. These jumps are expected, as most earnings calls take place at regular (quarterly) intervals.
To derive our signal, we compute an “adjusted” sentiment level which we define as the current sentiment level less the average of the prior 4 quarters’ sentiment level for each stock. This computation scales the sentiment for each company based on its own history and can be interpreted as the level of “unexpected” or surprise sentiment (similar to the standard earnings surprise discussed in the introduction above).
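As a minimal sketch of the adjustment described above, the snippet below subtracts the average of the prior 4 quarterly scores from the current score. The function name `adjusted_sentiment` and the sample scores are hypothetical, and returning `None` until a full 4-quarter history exists is an assumption, since the note does not say how short histories are handled.

```python
from statistics import mean


def adjusted_sentiment(scores, lookback=4):
    """Current score minus the mean of the prior `lookback` scores.

    `scores` is one company's quarterly sentiment scores in chronological
    order, each in [-1, 1]. Quarters without a full prior history get None
    (an assumption, not something the note specifies).
    """
    adjusted = []
    for i, current in enumerate(scores):
        if i < lookback:
            adjusted.append(None)
        else:
            baseline = mean(scores[i - lookback:i])  # prior 4 quarters only
            adjusted.append(current - baseline)
    return adjusted


# Hypothetical quarterly scores for a single company.
quarterly = [0.10, 0.20, 0.30, 0.20, 0.50, 0.10]
result = adjusted_sentiment(quarterly)
```

In this example the fifth quarter's score of 0.50 sits against a prior-four-quarter average of 0.20, so it registers as a positive sentiment surprise, while the sixth quarter registers as a negative one.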
Each month, we compute the “adjusted” sentiment level for any company in the S&P 500 for which an earnings call transcript was released over the prior 3 months. From Fig. 1, we know that updates in the sentiment signal cluster during earnings season. We therefore “carry forward” the adjusted sentiment score for a company over a period of 3 months to “smooth” the signal, until the company releases earnings again in the next quarter.
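The carry-forward step can be sketched as follows. This is an illustration under assumptions: months are represented as positions in a list, `None` marks a month with no new transcript, and the helper name `carry_forward` is hypothetical. A score stays active in the month it is released and the two months after, then expires if nothing newer arrives.

```python
def carry_forward(monthly_scores, max_age=3):
    """Carry each adjusted score forward for up to `max_age` months.

    `monthly_scores[m]` is the adjusted score released in month m, or None.
    The output for month m is the most recent score released in months
    m, m-1, ..., m-(max_age-1), or None if there is none.
    """
    carried = []
    for m in range(len(monthly_scores)):
        value = None
        for lag in range(max_age):
            idx = m - lag
            if idx >= 0 and monthly_scores[idx] is not None:
                value = monthly_scores[idx]  # newest available score wins
                break
        carried.append(value)
    return carried


# Hypothetical release pattern: a score in month 0, then a late one in month 5.
released = [0.3, None, None, None, None, -0.2, None]
carried = carry_forward(released)
```

Here the month-0 score covers months 0 through 2, months 3 and 4 have no active score because the next transcript arrives late, and the month-5 score takes over from there.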
We then form a long basket of stocks that rank in the highest decile of the distribution of adjusted sentiment values. This basket includes companies that have seen the largest improvement in sentiment. Similarly, we form a short basket consisting of stocks that rank in the lowest decile of adjusted sentiment; these are stocks where management has become much more pessimistic. Fig. 2 shows the size of the long and short baskets over time. For the short basket (denoted by the orange bars with values below zero), the magnitude of the bar indicates the basket size.
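The basket formation can be illustrated with a short sketch. The function name `form_baskets`, the ticker labels, and the scores are hypothetical. The tie handling (plain sort order) and rounding the decile size down are assumptions, since the note does not describe them.

```python
def form_baskets(adjusted):
    """Split tickers into top-decile (long) and bottom-decile (short) baskets.

    `adjusted` maps ticker -> adjusted sentiment. The decile size is
    len // 10, with a floor of one name (an assumption for small universes).
    """
    ranked = sorted(adjusted, key=adjusted.get)  # ascending by sentiment
    decile_size = max(1, len(ranked) // 10)
    short_basket = ranked[:decile_size]          # largest deterioration
    long_basket = ranked[-decile_size:]          # largest improvement
    return long_basket, short_basket


# Hypothetical universe of 20 tickers with evenly spaced adjusted scores.
scores = {f"S{i:02d}": (i - 10) / 10 for i in range(20)}
long_basket, short_basket = form_baskets(scores)
```

With 20 names, each basket holds two stocks: the two highest adjusted scores go long and the two lowest go short.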
To evaluate the historical performance of the ProntoNLP-based sentiment signal, we take the long and short baskets as described above, and compute their historical performances, assuming monthly rebalancing. The cumulative return to both the long and short baskets, as well as the S&P 500, is shown in Fig. 3.
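A cumulative return series of the kind plotted for each basket can be built by compounding monthly returns, as in the sketch below. The function name and the sample returns are hypothetical. How constituents are weighted within a basket is not stated in this note, so the input here is simply assumed to be the basket's monthly return series.

```python
def cumulative_returns(monthly_returns):
    """Compound a series of monthly returns into a cumulative return path."""
    growth = 1.0
    path = []
    for r in monthly_returns:
        growth *= 1.0 + r          # compound this month's return
        path.append(growth - 1.0)  # cumulative return to date
    return path


# Hypothetical monthly basket returns: +10%, -10%, +5%.
path = cumulative_returns([0.10, -0.10, 0.05])
```

Note that a 10% gain followed by a 10% loss leaves the basket slightly below its starting value, which is why compounding rather than summing is used here.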
Fig. 3 – Cumulative Return to ProntoNLP Long and Short Baskets
From Fig. 3, we see that the basket of long stocks consistently generates outperformance relative to the market. Additionally, the basket of short stocks, while it performed in line with the market from 2011 through 2017, has generated underperformance relative to the market over the past 5+ years.
To show the consistency in the performances of the long and short baskets, we conduct a slightly modified analysis. We construct the baskets as described above, but in Fig. 4 below, aggregate the monthly relative returns by calendar year. Each data point in Fig. 4 represents an average of 12 monthly observations.
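The calendar-year aggregation can be sketched as a simple group-and-average. The function name `average_by_year` and the sample data are hypothetical, and the input is assumed to already be the basket's monthly return relative to the benchmark.

```python
from collections import defaultdict
from statistics import mean


def average_by_year(monthly_relative):
    """Average monthly relative returns within each calendar year.

    `monthly_relative` maps (year, month) -> basket return minus benchmark.
    """
    by_year = defaultdict(list)
    for (year, _month), value in monthly_relative.items():
        by_year[year].append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Hypothetical data: steady outperformance in 2020, mixed months in 2021.
monthly = {(2020, m): 0.01 for m in range(1, 13)}
monthly.update({(2021, m): (0.02 if m % 2 else -0.01) for m in range(1, 13)})
yearly = average_by_year(monthly)
```

Each yearly value is the mean of that year's 12 monthly observations, matching the construction of the data points described above.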
Fig. 4 shows that in 8 of the 12 years shown, the long basket outperforms the S&P 500. Further, the short basket underperforms the market in 8 of the 12 years shown, and the long basket has outperformed the short basket in 10 of the 12 years.
In other words, the ProntoNLP long basket (i.e. the stocks with the largest improvement in sentiment) consistently outperforms the S&P 500, while the ProntoNLP short basket (stocks where sentiment has soured) consistently underperforms. Also, the spread between the long and short baskets has shown positive return even more consistently.
While the long basket does produce consistent outperformance relative to the short basket (and to the S&P 500), we take the analysis a step further, to determine if the outperformance is the result of exposure to standard risk factors (size, value, etc.) or if the long basket is producing excess return above and beyond what would be expected from its factor exposure (i.e. alpha).
We form the long and short baskets (as described above) and compute the difference in their returns on a monthly basis. We then take this series of monthly spread returns and estimate exposures to the 5 Fama-French factors by using a rolling regression framework. Mathematically, for each month in the history, we estimate the coefficients in the equation below using a trailing 36-month window:
$$r_{spread} = \alpha + a_1 (Mkt - R_f) + a_2 \, SMB + a_3 \, HML + a_4 \, RMW + a_5 \, CMA + \varepsilon$$
where
r_spread is the difference in return between the ProntoNLP-based long and short baskets (i.e. the long-short return spread),
Mkt – Rf, SMB, HML, RMW and CMA are the returns of the 5 Fama-French factors, and
α is the portion of the return not explained by exposure to the 5 Fama-French factors.
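A rolling regression of this kind can be sketched with ordinary least squares over a trailing window. This is an illustration, not the exact estimation code behind the figures. The function name is hypothetical, the factor data is synthetic, and the spread series is built without noise so that the known intercept and slopes are recovered exactly.

```python
import numpy as np


def rolling_factor_regression(spread, factors, window=36):
    """Estimate intercept and factor loadings over trailing windows.

    spread:  shape (T,) monthly long-short spread returns.
    factors: shape (T, K) monthly factor returns, e.g. the columns
             Mkt-Rf, SMB, HML, RMW, CMA.
    Returns an array of shape (T - window + 1, K + 1); each row holds the
    intercept (alpha) followed by the K slope coefficients.
    """
    spread = np.asarray(spread, dtype=float)
    factors = np.asarray(factors, dtype=float)
    rows = []
    for end in range(window, len(spread) + 1):
        # Design matrix: a column of ones for alpha, then the factor returns.
        X = np.column_stack([np.ones(window), factors[end - window:end]])
        y = spread[end - window:end]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rows.append(coef)
    return np.array(rows)


# Synthetic example: 60 months, 5 factors, known alpha and loadings.
rng = np.random.default_rng(0)
factors = rng.normal(0.0, 0.03, size=(60, 5))
true_alpha = 0.005
true_betas = np.array([0.0, 0.1, -0.2, 0.15, 0.05])
spread = true_alpha + factors @ true_betas  # noise-free by construction
estimates = rolling_factor_regression(spread, factors)
```

With 60 months of data and a 36-month window, the sketch produces 25 sets of estimates. The first column is the monthly intercept and the second is the market sensitivity.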
The a1 term in the equation above (the coefficient on the market excess return) can be interpreted as the sensitivity of the long-short return spread to changes in the market. Fig. 5 below shows how this sensitivity evolves over time.
From Fig. 5, we see that the sensitivity of the long-short return spread to the market is generally around zero. In other words, the long and short baskets exhibit similar levels of exposure to the overall market.
The α term in the equation above indicates the portion of the long-short return spread that cannot be explained by its exposure to the five Fama-French factors (i.e. its alpha). Fig. 6 below shows the cumulative alpha for the long-short return spread strategy.
Fig. 6 shows that the long-short return spread has generated approximately 60% cumulative alpha over the period shown, which equates to nearly 7% annualized average alpha.
While Fig. 4 above showed that the long basket consistently outperforms both the S&P 500 and the short basket, this additional analysis shows that after adjusting for market risk and other systematic factors, the spread between the long and short baskets still generates significant outperformance. We conclude, therefore, that the NLP-based sentiment signal contains unique information that helps produce excess stock returns, and that this excess return is not captured by traditional risk factors.
In this research note, we introduce a signal based on the application of natural language processing to earnings call transcripts. We partner with ProntoNLP, which specializes in the application of natural language processing to financial text documents, to generate a signal that accounts for the sentiment of company management during earnings calls.
We find that a signal designed to identify likely outperformers among equities produces significant and consistent excess return. This excess return is driven by unique information not captured in standard risk factors.
See, for example, Bernard, V. L., & Thomas, J. K. (1989). Post-Earnings-Announcement Drift: Delayed Price Response or Risk Premium? Journal of Accounting Research, 27, 1-36.
The ProntoNLP system is highly customizable, and can identify over 80 individual events. The system also allows a user to define custom events. ProntoNLP processes earnings call transcripts for over 4,000 companies across global markets.
At the end of month t, we compute the “adjusted” sentiment for any company that released an earnings call transcript during months t, t-1 or t-2. The long and short baskets consist of stocks that rank in the top and bottom deciles, respectively, of the distribution of adjusted sentiment values.
The Fama-French factors can be downloaded from Ken French’s website at https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/f-f_factors.html