This is the second of three articles about Diversification and Risk. In this article we examine annual, monthly and daily data to determine whether the 2008-9 equity crisis has displayed unprecedented risk characteristics.

Discussion:

- annual data show that returns have been extreme but are not unprecedented.
- monthly risk measures show that risks have been high, but by no means at extreme levels.
- however, daily risk measures show extremely high levels of risk. While these levels are not quite unprecedented, their duration is not matched by any market experience since the beginning of our daily data in 1950.

In Part 1 of “Diversification and Risk Reduction” we argued that the equity market declines of 2008 do not show that equity diversification has failed. However, there was still an important open question: have the risk characteristics of global equities changed? Have the huge equity declines been larger, and hence realized equity risks more severe, than investors could have reasonably expected?

We will tackle this second issue from two directions. This paper, Part 2, looks directly at measures of risk. By examining the magnitude and distribution of the negative returns relative to history, we may be able to judge whether the recent and continuing market crisis is in fact an unparalleled and unprecedented event. It turns out that from the perspective of annual and monthly data, the realized risks are not outside the bounds of experience or reasonable expectations. However, with daily returns we will see much stronger evidence of unexpected risk levels.

The next paper, Part 3, picks up on an issue central to the concept of diversification: expected risk reduction depends importantly on imperfect correlations between underlying securities. There is a widely held view that correlations between equity markets have been very high (and certainly higher than normal) during 2008 and early 2009. If this is true, it would suggest that overall levels of risk are higher globally than in previous periods of lower correlations.

We will attempt to explore these questions in a straightforward manner, using simple ideas and common sense. This is not a statistical exercise, although a few statistical terms and concepts will be used.

We begin with the Shiller dataset of the U.S. market, monthly data spanning 138 years from January 1870 to February 2009,(1) and focus first on annual returns history. When we construct annual returns based on monthly data, we must ensure that the data are “non-overlapping”. This means that we actually have twelve different samples of either 137 or 138 data points, corresponding to years (twelve-month periods) ending with each month end.(2)
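The construction of the twelve samples can be sketched in a few lines. This is only a minimal illustration, assuming the input is a plain list of simple monthly returns expressed as decimals; the function name `annual_samples` and the toy series are hypothetical and are not drawn from the dataset itself.

```python
from math import prod


def annual_samples(monthly_returns):
    """Build twelve samples of non-overlapping twelve-month returns.

    monthly_returns: simple monthly returns as decimals (assumed input).
    Sample k starts at month offset k, so consecutive "years" within
    one sample never share a month.
    """
    samples = []
    for offset in range(12):
        years = []
        # Step through the series twelve months at a time from this offset.
        for start in range(offset, len(monthly_returns) - 11, 12):
            window = monthly_returns[start:start + 12]
            # Compound the twelve monthly returns into one annual return.
            years.append(prod(1.0 + r for r in window) - 1.0)
        samples.append(years)
    return samples


# Toy data: 36 months of a constant 1% monthly return (illustrative only).
samples = annual_samples([0.01] * 36)
print([len(s) for s in samples])
```

With 36 months of toy data, one sample holds three full years and the other eleven hold two, which mirrors the way the twelve samples in the text contain either 137 or 138 data points.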

When we look at the last “year” in each of those samples, it is only in the five sample periods ending October 2008 through February 2009 that the last twelve-month period is significantly negative. Here “significantly negative” is arbitrarily defined to mean a return of more than two standard deviations(3) below the mean. In two of the five cases the last return is the worst on record, and in three cases it is the second worst on record. For these five samples, Table 1 shows all returns more than two standard deviations below the mean.
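As a rough sketch of this screen, the following picks out the returns in a sample that lie more than two standard deviations below the mean. It assumes the sample standard deviation (dividing by n - 1); the text does not say which convention was used. The function name and the toy returns are made up for illustration.

```python
from statistics import mean, stdev


def significantly_negative(returns, k=2.0):
    """Return the values lying more than k sample standard deviations below the mean."""
    cutoff = mean(returns) - k * stdev(returns)
    return [r for r in returns if r < cutoff]


# Made-up annual returns with one very bad year (not actual market data).
toy = [0.10, 0.08, 0.12, 0.05, 0.15, 0.07, 0.09, 0.11, 0.06, -0.40]
print(significantly_negative(toy))
```

Only the -40% year falls below the cutoff in this toy sample.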

Figure 1 is a frequency plot of the sample with years ending in January, and for comparison purposes we have a similar plot for the December sample in footnote (4). The plot shows three returns to the left of (i.e., less than) the line marking two standard deviations below the mean, just as presented in Table 1.

If we were to assume that these returns are normally distributed, we would expect about 2.3% of the observations to fall more than two standard deviations below the mean. With 138 observations, this implies just over 3 observations --- exactly what we have. The February sample also has three such data points, but October has five and November and December both have four --- slightly more than might be expected. However, our main point is not to “do statistics” here, but rather to point out that these returns for twelve-month periods ending late 2008 and early 2009 are very much in line with the magnitude of other very bad market returns. Table 1 shows that many similar events occurred in the 1930’s, several in 1907, and in one case as recently as 1974.(5)
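The arithmetic behind the "just over 3 observations" figure can be checked directly. The sketch below assumes nothing more than a standard normal distribution and the 138 observations cited above; the variable names are ours.

```python
from statistics import NormalDist

n_obs = 138  # number of annual observations cited in the text
tail_prob = NormalDist().cdf(-2.0)  # P(Z < -2) for a standard normal
expected = n_obs * tail_prob  # expected number of observations in the tail

print(f"tail probability: {tail_prob:.4f}")
print(f"expected count:   {expected:.2f}")
```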

We can get a second perspective on risk by looking at monthly data. Again we use the Shiller data back to 1870, and calculate rolling standard deviations over 120 months, 60 months, 24 months and 12 months, as shown in Figure 2.
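A rolling standard deviation of this kind might be computed as follows. This is a sketch under stated assumptions: the sample standard deviation is used, and the `annualize` helper (scaling by the square root of the number of periods per year) is our assumption about how percentage figures of this kind are usually quoted, not something the text specifies. The toy returns are made up.

```python
from math import sqrt
from statistics import stdev


def rolling_stdev(returns, window):
    """Sample standard deviation over each trailing window of the given length."""
    return [
        stdev(returns[end - window:end])
        for end in range(window, len(returns) + 1)
    ]


def annualize(sd, periods_per_year=12):
    """Scale a per-period standard deviation to an annual figure (assumed convention)."""
    return sd * sqrt(periods_per_year)


# Made-up monthly returns: a calm stretch followed by a turbulent one.
toy = [0.01, 0.02, 0.01, 0.02, 0.01, 0.02, 0.10, -0.12, 0.09, -0.11, 0.08, -0.10]
for window in (3, 6):
    series = rolling_stdev(toy, window)
    print(window, [round(annualize(s), 3) for s in series])
```

The shorter window reacts faster to the turbulent stretch, which is the same effect the text describes as the time span shortens.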

As you might expect, the 120-month measure is relatively stable, apart from the huge peak in the centre of the chart that occurred during the 1930’s. As the time span shortens, the measure becomes more volatile. The most recent 12-month number, from February 2009, is quite high at 22.5%, roughly matching peaks in May 2003 and July 1999, but substantially below previous post-1930’s peaks of about 30% in December 1987 and June 1975.

Again the point is that on these measures, while recent monthly trailing risk measures have been high, they are by no means outside of quite recent experience.

This is confirmed when we look at the experience of three representative ETF’s in Figure 3: XIU (Canadian S&P/TSX 60), XSP (S&P500 with currency hedged into $CN), and XIN (EAFE, with currencies hedged into $CN).(6) The volatility of all three spiked dramatically upward in 2008 and early 2009, but not to levels significantly different from previous peaks in May 2003 (S&P500 and EAFE) and February 2001 (S&P/TSX 60).

When we turn to daily returns we can observe volatility on a very short-term basis. Figure 4 shows 60-day and 20-day rolling standard deviations of daily data for the S&P500 going back to 1950. The 20-day measure peaked at 85.6% on November 5, 2008, slightly less than the peak of 96% on November 12, 1987.

The magnitude of the current volatility is not unprecedented, because there is one other larger spike caused by the October 1987 crash. However, its duration seems to be unprecedented. With the October 1987 volatility spike, removing just four days (the Friday before Black Monday, Black Monday itself, and the two following days) reduces volatility to a high but normal level, not noticeable on the long-term graphic. But the current high volatility, while not centered around one very large (negative) return like October 19, 1987, was very high for two months. From September 29 to December 1, 2008 the S&P price index experienced 17 daily returns of greater than plus or minus 5%, out of 47 trading days --- a mind-numbing 36%. It is the duration of this high daily volatility that we believe is unprecedented. Since early December the volatility has remained higher than normal, although it has been falling, with just three such events from then until March 13, 2009, the time of writing this paper.
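The count of extreme days is simple to reproduce for any daily return series. The sketch below is illustrative only: the function name `extreme_day_share` and the toy returns are hypothetical, and the first printed line simply restates the 17-out-of-47 figure from the text as a percentage.

```python
def extreme_day_share(daily_returns, threshold=0.05):
    """Count days whose absolute return exceeds the threshold, and their share."""
    hits = sum(1 for r in daily_returns if abs(r) > threshold)
    return hits, hits / len(daily_returns)


# Arithmetic from the text: 17 such days out of 47 trading days.
print(f"{17 / 47:.1%}")

# Made-up daily returns, for illustration only.
toy = [0.06, -0.01, -0.07, 0.02, 0.00, 0.055, -0.03, 0.01]
print(extreme_day_share(toy))
```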

Once again we confirm these findings by examining the three ETF’s used in Figure 3, and chart the rolling 20-day standard deviations as a measure of risk or volatility.(7) All three spike to very extreme levels during the crisis, following the pattern of the S&P500 in Figure 4 by peaking at the end of October. Not only are all three indexes experiencing similar volatility, their individual volatility is rising and falling very much in lock step.

To summarize, annual data place the current crisis as an extreme but not unprecedented event. Monthly data place risk levels during the current crisis as higher than normal, but by no means extreme. However, we still know viscerally that this crisis has been very aberrant and severe and it is with daily data that we see significant evidence of this. We do not have access to daily data from 1929 to 1938, and there might well have been times during that period with similarly high and prolonged daily volatility. So we have to be careful about assuming that the duration of this volatility is completely unprecedented. But the other indication of the devastating nature of this volatility is the way that Canada and EAFE as a whole have matched the volatility of the S&P500 step by step.

Go to Part 3: Correlation and Risk

(1) Robert J. Shiller. Irrational Exuberance, 2nd Edition. New York: Doubleday, 2005. Data from his website http://www.econ.yale.edu/~shiller/data.htm has been updated using current data sources.

(2) For example, for the twelve months ending November 2008 the S&P500 had a return of -38.0%, while for the twelve months ending December 2008 it had a return of -36.9%. We can’t count this as two returns of more than -35%, since they share in common eleven of twelve months --- they are not “independent” or “completely different” data points.

(3) We assume that readers will know, or look up, the term “standard deviation”.

(4) This plot shows that there are four returns more than two standard deviations below the mean.

(5) For those who are more interested in statistical matters, only one (February) of the five cases fails the Jarque-Bera normality test at the 1% level, and overall four of the twelve samples fail at 1%. The other 8 cases do not fail at any normal testing level. However, we are not assuming that the sample is normal, and nothing turns on this point. Again, we are just making the empirical observation that the 2008-9 experience is not unprecedented, and is in line with other extremely bad market events.

It is worth noting that when we do statistical work, we often use log returns rather than percentage returns, on the hypothesis that the distribution of log returns is more likely to be normal. While this may be true for monthly returns, it is interesting that when we examine these annual samples in log return space, they are quite clearly __not__ normal. Eleven of the twelve samples fail the Jarque-Bera normality test at the 1% level, and the twelfth fails at the 10% level. Finally, on average across the five samples there are just under six data points smaller than two standard deviations below the mean, and so the data are strongly “fat tailed”. Since most investors do not look at log returns, we pass over these issues in the body of this paper.
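For readers who want to try this kind of check themselves, here is a minimal sketch of the Jarque-Bera statistic applied to simple and log returns. It uses the standard textbook form of the statistic with divide-by-n moments and the asymptotic chi-square p-value with two degrees of freedom; we do not know which implementation or small-sample adjustments were used for the results quoted above, and the toy returns are made up.

```python
from math import exp, log


def jarque_bera(xs):
    """Jarque-Bera statistic and its asymptotic p-value (chi-square, 2 d.o.f.)."""
    n = len(xs)
    m = sum(xs) / n
    # Central moments using the population (divide-by-n) convention.
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # The chi-square survival function with 2 d.o.f. is exp(-x / 2).
    return jb, exp(-jb / 2.0)


def to_log_returns(simple_returns):
    """Convert simple returns r to log returns ln(1 + r)."""
    return [log(1.0 + r) for r in simple_returns]


# Made-up annual returns with a recurring crash year (not actual market data).
toy = [0.10, 0.08, 0.12, 0.05, 0.15, 0.07, 0.09, 0.11, 0.06, -0.40] * 3
for label, sample in (("simple", toy), ("log", to_log_returns(toy))):
    jb, p = jarque_bera(sample)
    print(label, round(jb, 1), "fails at 1%" if p < 0.01 else "does not fail at 1%")
```

A sample "fails" the test at the 1% level when the p-value is below 0.01. The asymptotic p-value can be unreliable for small samples, so results on samples of this size should be read with some caution.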

(6) We chose the XSP and XIN ETF’s with currencies hedged into $CN as their returns more closely approximate the returns of investments in the local markets in local currencies, and remove the additional effects of currency movements. Please note that while the XIU began trading in January 2000, the XSP and the XIN began trading in December of 2005. Monthly data for the XSP and XIN prior to commencement of trading is derived from their respective benchmark indexes. This is a reasonable approximation because both ETF’s have tracked their benchmarks very closely: the XSP has tracked at -0.02% per month, while the XIN has tracked at -0.01% per month.

(7) The chart covers the very short daily history of actual trading that the three ETF's shared in common, and commences on December 1, 2005.
