Structure of Capital Market Returns



















László Nagy

Structure of Capital Market Returns

PhD Thesis

Supervisor: Mihály Ormos, PhD



2020


I would like to thank my supervisor, Mihály Ormos PhD, for all the help, guidance and stimulating conversations.

Thank you to my wife, my mother, my father and my brothers for still loving me.


Table of Contents

I. Introduction

II. Heterogeneity of global equity market

II.1. Data

II.2. Risk versus expected return in the CAPM framework

II.3. The evaluation of CAPM betas

III. Friendship of stock indices

III.1. Defining similarity

III.2. Normalized modularity cut

III.3. Normalized cut

III.4. Results

III.5. Conclusion

IV. Heterogeneity of Standard and Poor's 500 companies

IV.1. Data

IV.2. The linear structure of S&P 500 companies

V. Best practices and the stock market implied industry classification

V.1. Industry Classification Standards

V.2. FMIC and GICS

V.3. Risk Reward

V.4. Conclusion

VI. Remarks

VII. Volatility Surface Calibration to Illiquid Options

VII.1. Stochastic Volatility Inspired

VII.2. Sensitivity Analysis of SVI

VII.3. Empirical Tests

VIII. Conclusion

References

Referred by the theses

Related Conference Papers

Other Conference Papers

Posters

Appendix I.

Appendix II.

Appendix III.


I. Introduction

Don't put all your eggs in one basket. This adage of generations of investors and asset managers highlights the spirit of diversification.

Markowitz's pioneering work in 1952 formalized the proverb. He applied limit-theorem-based techniques to prove that maximizing only expected returns is an erroneous concept in equity portfolio theory. He advised that investors should take into account the expected standard deviation of portfolio returns; thus, they should maximize the discounted value of expected future returns at a given expected standard deviation of returns.

To shed more light on the importance of diversification, in 1956, J. L. Kelly, Jr. proposed a game in which a gambler should bet on a binary event such that the payoff is double or nothing. If the outcome is uncertain, then putting all the money on the more likely scenario will undoubtedly maximize the expected return. However, in the long run the gambler is almost surely ruined.
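Kelly's point can be reproduced with a short simulation (a minimal sketch; the function name and parameters are illustrative, not from the source): betting the whole bankroll each round maximizes expected wealth, since it grows by a factor of 2p = 1.2 per round, yet survival requires winning every single round.

```python
import random

def simulate_all_in(p=0.6, rounds=200, trials=10_000, seed=42):
    """Bet the whole bankroll each round on a double-or-nothing event
    with win probability p; return the fraction of ruined gamblers."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        wealth = 1.0
        for _ in range(rounds):
            if rng.random() < p:
                wealth *= 2.0      # double
            else:
                wealth = 0.0       # nothing: one loss ends the game
                break
        if wealth == 0.0:
            ruined += 1
    return ruined / trials

# Expected wealth per round grows by 2p = 1.2, yet survival needs
# winning every round: probability p**rounds, which vanishes.
```

With p = 0.6 and only 50 rounds, the survival probability 0.6**50 is already of order 1e-11, so essentially every simulated gambler is ruined.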

Markowitz used the correlation structure of the market to diversify risk and mitigate the expected standard deviation of returns. The theory implies an efficient frontier, which contains the maximum expected return portfolios for different given expected standard deviations of returns. Besides, the efficient frontier contains the so-called market portfolio, which provides the highest expected return per unit of expected standard deviation of returns. Expanding the set of assets with a risk-free bond, we can see that portfolios of the risk-free bond and the market portfolio surpass portfolios of risky assets with the same risk.
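The market (tangency) portfolio above has a textbook closed form, not derived in the thesis: given a mean vector and covariance matrix, the weights are proportional to the covariance-inverse applied to excess returns. A minimal sketch with toy numbers (not market data):

```python
import numpy as np

def tangency_portfolio(mu, cov, rf=0.0):
    """Closed-form tangency (market) portfolio: w proportional to
    inv(Sigma) @ (mu - rf), normalized so the weights sum to one."""
    excess = np.asarray(mu, dtype=float) - rf
    w = np.linalg.solve(np.asarray(cov, dtype=float), excess)
    return w / w.sum()

# toy example with two assets (illustrative numbers only)
mu = [0.08, 0.12]
cov = [[0.04, 0.01],
       [0.01, 0.09]]
w = tangency_portfolio(mu, cov, rf=0.02)
```

In this toy case the less volatile asset receives the larger weight, illustrating how the covariance structure, not expected return alone, drives the allocation.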

Lintner (1965) showed that if the market is perfect, i.e. purely competitive, with no transaction costs, no taxes, infinite liquidity, a finite number of different assets, infinitely divisible assets, asset return distributions with finite second moments, and risk-averse investors who share the same perceptions and optimize mean-variance on the same time horizon, then all market participants should hold the same combination of risky assets. Moreover, adding indifference curves to the mean-variance framework unveils that risk tolerance controls leverage.

Mossin (1966) emphasized the importance of conventions e.g. unit time, notional, numéraire. He formalized the utility maximization problem and derived the mean-variance market portfolio.

In 1964, W. F. Sharpe analyzed the anatomy of the standard deviation of returns. He applied orthogonal projection, i.e. simple regression, to quantify systematic risk, and hence introduced the mean-beta framework.

Black, Jensen and Scholes tested the model on 35 years of New York Stock Exchange (NYSE) data (Black et al., 1972). Their findings underpinned the linear relationship between beta and returns. However, the regression results put some pressure on the capital asset pricing model (CAPM), because high-risk portfolios seemed to generate less return than was predicted by the model. The paper also stressed that the autocorrelation of residuals is low, but non-normality could bias the t-statistics.

The empirical results suggested that the initial conditions should be relaxed to make the CAPM applicable. In 1972, Black, Jensen and Scholes showed that the liquidity constraint on riskless borrowing and lending can be modelled with an additional factor. Fama and MacBeth added a second order term and asset-related local non-𝛽 risk (Fama and MacBeth, 1973).

Merton (1973) came up with a fundamentally different idea. He applied stochastic differential equations (SDE) to describe the optimal investment problem and derived the intertemporal CAPM (ICAPM). Breeden used the same optimal control technique and simplified Merton's multi-beta framework. He noticed that the consumption rate is the key driver of risk, hence of return expectations. Thus, he derived the consumption CAPM (CCAPM) amid theoretical and empirical criticism of the CAPM (Breeden, 1979). Mehra and Prescott tested the CCAPM with a constant relative risk aversion (CRRA) utility function on 90 years of Standard and Poor's 500 data. Their findings proved that the CCAPM cannot reproduce the historical 6.18% average annualized equity premium with an acceptable value of risk aversion (Mehra and Prescott, 1985).

The empirical deficiencies of the CAPM highlighted the complexity of asset pricing and unveiled certain qualitative return characteristics. These anomalies led to the arbitrage pricing theory (APT). Ross (1976), similarly to Merton, used multiple factors to explain equity returns in a linear regression framework. The model conveyed strong empirical results (Roll and Ross, 1980; Chen, 1983).

The main question became how to identify the factors. In practice, several indicators were in use, e.g. the earnings-price ratio (E/P), return on equity (ROE), and daily volume (Fama and French, 1993). Banz (1981) demonstrated the importance of the size of market capitalization. Stulz (1981) showed how the CAPM can be extended to global capital markets.

Samuel and Bhandari (1988) highlighted that leverage has a significant impact on returns. In 1993, Fama and French encapsulated the essence of previous works in the well-known Fama-French 3-factor model (FF3F). They underlined that the usual variables are only scaled versions of the size (small minus big, SMB) and book-to-market equity (high minus low, HML) factors; hence, they proposed to use the CAPM with SMB and HML to assess expected returns. They also emphasized that SMB and HML are just proxies of unknown state variables (Fama and French, 1996). The asset pricing model seemed to be complete. However, selling the worst and buying the best performing stocks was said to be a lucrative strategy (Jegadeesh and Titman, 1993). These empirical results could not be explained within the FF3F model. Carhart concluded that momentum (UMD, or previous one-year return, PR1YR) should be a standalone factor added to the FF3F equation. In 1997, the Asian financial crisis and the default of Long-Term Capital Management revived the importance of general market liquidity. Pástor and Stambaugh unveiled that stocks with high sensitivity to liquidity generate excess mark-to-market returns in the CAPM framework; hence, they promoted liquidity as a missing piece of the CAPM (Pástor and Stambaugh, 2003). Titman, Wei and Xie (2004) documented a negative relation between abnormal capital investments and returns and suggested expanding the FF3F model with an investment-per-return factor. In 2012, Hou, Xue and Zhang showed that return-on-equity (ROE) is an appropriate indicator of profitability which is not explained by FF3F (Hou et al., 2015). Three years later, Fama and French (2015) revised the 3-factor model and introduced the state-of-the-art Fama-French 5-factor model (FF5F). The new model captures the differences in returns between stocks with robust and weak profitability (RMW), along with low and high investment stocks (CMA). Nevertheless, empirical tests of the FF5F model dampen its theoretical foundation (Racicot and Rentz, 2016).

Adcock and Shutes emphasized that the normality assumption of the CAPM does not hold and proposed to use the skew-Student-t distribution (Adcock and Shutes, 2000). Kraus and Litzenberger added skewness to the CAPM, but their findings were neglected (Kraus and Litzenberger, 1976). It also turned out that investors are more concerned about downside risk (Campbell et al., 2001; Rockafellar and Uryasev, 2000; Ormos and Timotity, 2016). In 2015, Ormos and Zibriczky revisited J. L. Kelly, Jr.'s idea. They connected the fat-tail phenomenon with the maximal growth rate and proposed to use entropy as an alternative market factor. Note that around zero expected returns entropy implies the usual mean-variance framework. Besides, through its non-linearity it can control non-Gaussian behavior. However, higher moments can be treated as proxies of size and momentum. Thus, calibrating the appropriate non-linear function can be viewed as mixing systematic risk with other market factors, e.g. momentum and size.


In this dissertation we demonstrate the heterogeneity of international capital markets (Ormos, Timotity and Nagy, 2017). We show that countries put into different qualitatively identified groups do not converge to each other. We compare historical statistics of different market factor measures and conclude on the applicability of the CAPM. Altogether, the main contributions and novel results of the dissertation can be summarized in the following theses.

Thesis 1 (Nagy and Ormos, 2018): We show that heterogeneity of return characteristics can be controlled by spectral clustering.

Instead of theoretically relaxing the assumptions of the CAPM and the Markowitz model, we look at the factor problem from a purely mathematical perspective. We highlight that adding nominal variables to a linear regression splits the data into subsets. We also emphasize that all the multi-linear CAPM and Markowitz model variants have superior regression statistics to the single-factor ones. Thus, we conclude that the main problem is the heterogeneity of return characteristics.

In order to overcome this phenomenon, we revise the original factor problem and propose a novel, data driven, spectral clustering based technique. We connect portfolio management, linear algebra and graph theory, and demonstrate the applicability of the Newman-Girvan cut.

Thesis 2 (Nagy and Ormos, 2018): Spectral clustering based global stock market index classification outperforms qualitative standards.

We expand the Markowitz model with clusters to stabilize risk-return regressions. We examine stock index market implied clusters and compare them with geographical and MSCI qualitative classifications. The results suggest that qualitative categories are in line with market implied clusters. Moreover, some of the spectral cluster-wise historical return-standard deviation regression coefficients are stationary.

Thesis 3 (Nagy and Ormos, 2018): US capital market implied industry classification outperforms industry standard.

We outline that the largest companies of the United States have different return characteristics, which are handled with different qualitative market classification standards. We show that in the CAPM framework spectral clustering leads to statistically more adequate and reliable results.

We have to emphasize that clustering provides sufficient results if and only if market is at least weakly efficient (Fama, 1970).


Thesis 4 (Nagy and Ormos, 2019): We demonstrate the applicability of novel absolute price difference based stochastic volatility inspired surface (SVI) fitting methodology.

We stress that equilibrium models do not deal with derivatives, but the idea of categorization is general. Note that asset classes based on market characteristics e.g. traded instruments, tenors, liquidity, execution venues are fundamentally different. Thus, option products of different asset classes require different pricing models. Researchers and practitioners narrowed down the set of derivatives based on asset class and complexity. We show that wide bid-ask spreads (market friction) can destabilize arbitrage free prices.


II. Heterogeneity of global equity market

Cross-border labor and capital flow has a long history. However, the persistent economic convergence of countries is still questionable. On the one hand, Errunza and Losq (1985) found mildly segmented capital markets. On the other hand, since the milestone paper of Bekaert and Harvey (1995), the literature on international integration has documented strong convergence in the characteristics of the risk-return relationship in liberalized stock markets (Narayan et al., 2011; Eun and Lee, 2010; Heimonen, 2002). However, researchers and practitioners still distinguish developed, emerging and frontier markets.

Table II.1: History of Emerging Markets (EM) and Developed Markets (DM) classification

Levy, H. and Sarnat, M. (1970), International Diversification of Investment Portfolios: EM and DM indices are poorly correlated; thus, using EM indices implies better diversification.

Samuelson, P. A. (1970), The Fundamental Approximation Theorem of Portfolio Analysis in Terms of Means, Variance and Higher Moments: Quadratic optimization may not be optimal; hence, higher moments are important as well.

Baumol, W. J. (1986), Productivity Growth, Convergence, and Welfare: What the Long-Run Data Show: Historical data show EM and DM have different GDP growth rates.

Barro, R. J. (1991), Economic Growth in a Cross Section of Countries: Human capital and political stability are proxies of GDP growth, which can explain the gap between EM and DM.

Richards, A. J. (1996), Volatility and Predictability in National Markets: How do Emerging and Mature Markets Differ?: The volatility of EM decreased after market liberalization, but foreign investors' overreaction causes large downside risk.

Hwang, S. and Pedersen, C. S. (2002), Best Practice Risk Measurement in Emerging Markets: Empirical Test of Asymmetric Alternatives to CAPM: The fat-tail property of EM returns can be captured by the Asymmetric Response Model and Lower Partial Moment CAPM.

Eun, C. S. and Lee, J. (2002), Mean-Variance Convergence Around the World: Declining country and increasing industry effects imply mean-variance convergence within DM. However, EM do not converge to DM.

Heimonen, K. (2002), Stock Market Integration: Evidence on Price Integration and Return Convergence: Empirical evidence of partial, time-varying financial integration of EM.

Salomons, R. and Grootveld, H. (2003), The Equity Risk Premium: Emerging vs. Developed Markets: The average EM risk premium is higher than the DM's, but its non-symmetric, fat-tailed distribution brings difficulties in risk assessment.

Narayan, P. K., Mishra, S. and Narayan, S. (2011), Do Market Capitalization and Stocks Traded Converge? New Global Evidence: Statistical evidence of market convergence.

Notes: Table II.1. presents some of the milestone papers in international financial market integration


In this dissertation we highlight the heterogeneity of the global equity market (Ormos, Timotity and Nagy, 2017).

II.1. Data

We present a detailed analysis of 58 emerging and developed stock market indices. We use US dollar denominated daily closing prices, adjusted for stock splits and dividends, between 26/9/1990 and 21/9/2015; the data is provided by Thomson Reuters. We have to stress that some of the indices became listed after 26/9/1990; hence, the data has to be truncated accordingly (A.II/1).

The risk-free rate is benchmarked by the 3-Month US Treasury Bill rate from the Federal Reserve Economic Database. We have to note that after the Great Recession the discount curve calculation became more sophisticated: cross-currency and tenor basis spreads have to be handled to eliminate arbitrage (Fujii et al., 2010). However, in this thesis, similarly to most of the studies, we use the 3-Month US Treasury rate as a proxy for the risk-free rate (Mukherji, 2011).

Our selection criteria for the covered stock indices are based on their classification in the IMF Economic Outlook 2015 and the MSCI WORLD Index composition in 2015. These countries are presented in Table II.2. In our analysis we allocate approximately the same weight to each region. Regions and indices are selected based on the country breakdown list of the MSCI World index. Although the number of countries is not equal in each region, we rebalance the sample by choosing approximately ten indices from each group.

Table II.2: List of selected indices

Africa: Kenya, Namibia, Nigeria, South Africa (indices: BRVMCI, MALSMV, LASILZ)

Arab world: Bahrain, Egypt, Kuwait, Morocco, Qatar, Saudi Arabia, United Arab Emirates (indices: BAX, EGX30, KW15, MASI, QSI, TASI, ADI)

Asia: India, China, Indonesia, Malaysia, Vietnam, Thailand, Bangladesh, South Korea, Taiwan, Hong Kong

Eastern Europe: Russia, Ukraine, Bulgaria, Hungary, Poland, Romania, Turkey, Czech Republic (indices: IRTS, UAX, SOFIX, BUX, WIG, BETI, XU100, XU030, PX)

South and Middle America: Argentina, Brazil, Chile, Colombia, Costa Rica, Mexico, Venezuela (indices: MERV, BVSP, IPSA, COLCAP, IACR, MXX, IBC)

Western Europe: United States, Japan, France, UK, Canada (indices: SPX, DJI, TOPX, FCHI, FTSE, GSPTSE)

In order to underline the very different characteristics of individual stock indices, we present the daily descriptive statistics in Table II.3. One may find a controversial relationship between risk and return simply by looking at the Ukrainian stock index (UAX) compared to the world index: while the former has much greater volatility, its mean return is still negative, instead of providing a higher expected risk premium.

Table II.3: descriptive statistics of the daily returns

Index Mean Variance Skewness Kurtosis

CSI300 0.0003 0.0006 -0.9710 21.6375

XU100 0.0000 0.0015 -0.0988 49.1132

DJI 0.0004 0.0004 -0.0588 16.6425

UAX -0.0004 0.0002 -1.7000 113.4285

WORLD 0.0001 0.0001 -0.4032 15.6315

Note: Table II.3. highlights the differences between global index return distributions; CSI300, XU100, DJI, UAX and WORLD represent the Shanghai Composite 300, Borsa Istanbul 100, Dow Jones Industrial Average, Ukraine UX Index and MSCI World Index, respectively.

II.2. Risk versus expected return in the CAPM framework

According to Expected Utility Theory (von Neumann and Morgenstern, 2007), decision-making under uncertainty yields an expected utility that is a linear function of expected return and variance. The most commonly used framework in financial modelling, the CAPM, suggests that such a perception of utility leads to a linear relationship between relevant risk, as measured by beta, and expected return. In this section, we present our findings on this latter relationship by applying estimations based on daily returns.

In Figure II.1, the daily expected return is plotted against the beta values of the separate stock market indices. Although some outliers are present in the sample (such as UAX and DS30, the stock indices of Ukraine and Bangladesh, respectively), a clear positive relationship between the daily estimates of beta values and expected returns is present. The regression line is estimated with a constant, that is, we allow for inefficient markets as measured by the Jensen alpha. The red dot in the middle stands for the reference market portfolio (the MSCI World index).


Figure II.1: Daily relationship between CAPM betas and expected return between 1990 and 2015

In Table II.4 we summarize our results for daily return estimations and statistically confirm the positive relationship shown above. Two settings are analyzed here: in the first one (the first two columns) we estimate a Jensen alpha by including a constant in our regression (Jensen, 1968), while in the second one (last two columns) only the risk premium is estimated. In both cases we confirm that the risk premium is significantly different from zero and is positive.

Table II.4: Daily relationship between CAPM betas and expected return

         With intercept             Without intercept
         Coefficient   P-value      Coefficient   P-value
Alpha    0.0024        0.0180       -             -
Beta     0.0021        0.0020       0.0030        0.0000

Note: Table II.4. displays the OLS regression statistics of average daily returns and MSCI World betas with and without intercept
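The two regression settings, with and without the Jensen alpha, can be sketched with ordinary least squares. A minimal illustration on synthetic data (the function name is ours, not the thesis code):

```python
import numpy as np

def capm_regression(asset_excess, market_excess, intercept=True):
    """OLS of asset excess returns on market excess returns.
    Returns (alpha, beta); alpha is fixed at 0.0 when no intercept is fitted."""
    y = np.asarray(asset_excess, dtype=float)
    x = np.asarray(market_excess, dtype=float)
    if intercept:
        X = np.column_stack([np.ones_like(x), x])
        alpha, beta = np.linalg.lstsq(X, y, rcond=None)[0]
        return alpha, beta
    beta = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]
    return 0.0, beta

# toy check on synthetic data: alpha = 0.001 and beta = 1.5 by construction
x = np.linspace(-0.02, 0.02, 100)
alpha, beta = capm_regression(0.001 + 1.5 * x, x)
```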

We emphasize that statistical indications could be misleading if returns are not stationary; we therefore present stationarity test statistics in Table II.5. The coefficients show that daily returns of developed markets are weakly stationary, whereas some emerging market data show different properties.

Table II.5: Stationarity analysis of daily returns

Index ADF p value of ADF KPSS p value of KPSS

CSI300 -17.621 0.01 0.104 0.100

XU100 -20.354 0.01 0.061 0.100

DJI -21.396 0.01 0.229 0.100

UAX -17.906 0.01 0.578 0.025

WORLD -20.354 0.01 0.121 0.100

Notes: In this table, we present the stationarity test statistics of different stock indices

After we have shown the risk-reward relationship for our total sample through a cross-sectional analysis, we turn to discussing the time-series and panel analysis of the CAPM.

II.3. The evaluation of CAPM betas

Since 1990 capital markets have changed in many ways. The vast increase in world trade volume and the liberalization of markets have led to the emergence of new capital markets. As measured by the World Trade Index of the CPB Netherlands Bureau for Economic Policy Analysis, international trade has increased, in absolute terms, by an immense 235% over the last 25 years (1990-2015). This change in international trade, along with greater access to capital markets, has further facilitated investors' ability to diversify their portfolios internationally. On the one hand, this global access to capital markets supports the idea that well-diversified portfolios are priced in the CAPM setting. On the other hand, the development and increasing interconnectivity of geographically separated markets implies that these markets converge to each other; therefore, their betas should converge to unity.

In Figure II.2, we present this convergence by plotting the CAPM betas estimated on a one-year window. In line with most studies in the empirical literature on the topic, we present our results using estimations based on daily returns. The red dashed line represents the market capitalization weighted average, which would represent the MSCI World Index itself.


Figure II.2: The evolution of CAPM betas

It is worth noting two patterns here: first, towards the end of the sample period betas seem to be more scattered around the mean than at the beginning; second, in general the emergence of new capital market indices comes with an extreme starting beta, although these quickly converge to the mean as well. In the following, we test the statistical significance of the convergence of stock indices that have existed since 1990.

We investigate the persistence of equity index return heterogeneity. We test whether emerging capital markets trend towards developed markets by measuring the average squared distance between individual betas and the world index, and search for a decreasing temporal trend. Analytically we define the error term as

ε_t = (1/(n − 1)) Σ_{i=1}^{n} (β_{i,t} − 1)²,

where 𝑛 stands for the number of stock market indices at time 𝑡. Then, our fitted regression includes a linear trend of time (𝑡) and dummy series for the 1997-98 Asian/Russian Crisis (𝐷1997), 2001 Dot-com Bubble (𝐷2001), and 2008 Financial Crisis (𝐷2008). Hence, we apply an OLS estimation for the equation

𝜀𝑡 = 𝛾0+ 𝛾1𝑡 + 𝛾2𝐷1997+ 𝛾3𝐷2001+ 𝛾4𝐷2008+ 𝑒𝑡,

where 𝑒 is a zero-mean error term. The decision criterion for rejecting no convergence is a significantly negative trend coefficient 𝛾1.
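The trend-plus-dummies regression can be sketched with a plain OLS fit (the function name and the synthetic check are ours, not the thesis code):

```python
import numpy as np

def convergence_trend(eps, crisis_dummies):
    """OLS fit of eps_t = g0 + g1*t + sum_j g_{2+j} * D_j(t) + e_t.
    eps: (T,) beta-dispersion series; crisis_dummies: (T, k) 0/1 matrix.
    Returns the coefficient vector (g0, g1, g2, ...)."""
    eps = np.asarray(eps, dtype=float)
    T = eps.shape[0]
    t = np.arange(T, dtype=float)
    X = np.column_stack([np.ones(T), t, np.asarray(crisis_dummies, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, eps, rcond=None)
    return coef

# synthetic check: recover a known trend and crisis effect exactly
t = np.arange(100, dtype=float)
D = np.zeros((100, 1))
D[30:35, 0] = 1.0                       # a hypothetical crisis window
coef = convergence_trend(0.4 - 0.001 * t + 0.05 * D[:, 0], D)
```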


Results of the analysis based on daily returns are presented in Table II.6. Although the R-squared value indicates a reasonable fit to the linear model, apart from the constant term none of the variables is significant at the five percent level. Hence, we can reject the hypothesis that separate capital markets converge towards the global index.

Table II.6: Convergence of CAPM betas estimated on daily returns

Coefficient P-value

Constant 0.4262 0.0000

Time 0.0070 0.0962

Year 1997 -0.0385 0.7466

Year 2001 -0.1790 0.1434

Year 2008 0.0697 0.5709

R-squared 0.2880

Note: we display the OLS coefficient estimates and p-values of time and financial market crises dummy variables

Overall, we argue that the convergence property of CAPM betas cannot be confirmed by daily estimations. In addition, global stock indices are still found to have significant individual characteristics, as measured by the differences of their beta parameters.

In the following we aim to capture these differences by introducing cluster variables.


III. Friendship of stock indices

The global stock market structure has to be well understood to diversify risk and manage cross-border equity portfolios. Appropriate portfolio construction is rather complicated. The linear dependence structure of the network is not stable (Erdős et al., 2011; Song et al., 2011; Maldonado and Saunders, 1981; Figure II.2). Moreover, exogenous shocks have a major impact on the correlation structure; hence, uncorrelated assets could start moving together (Heiberger, 2014). Therefore, correlation-based techniques could cause unwanted variance peaks.

Institutional economics studies (like MSCI, 2018) provide qualitatively identified network structures, e.g. emerging markets and developed markets, to stabilize their classification.

In this dissertation we propose a more suitable quantitative technique, generalize the widely used correlation-based portfolio construction framework and discover the equity index network (Nagy and Ormos, 2018).

The baseline concept follows the CAPM, in which similarity measures can be treated as correlations between logarithmic returns (Yalamova, 2009). However, the anomalies of the CAPM indicate that the two-dimensional mean-beta framework gives only a simplified picture of the real market structure. In order to explain the flaws of the CAPM, numerous qualitatively identified cluster variables have appeared implicitly in the famous regression. The original concept of classifying the market and defining sub-groups raises the fundamental question: who can judge the market? In order to overcome the ad hoc manner of regression statistics optimization, we suggest a data driven, graph theory based approach.

We compare various spectral clustering techniques to unveil embedded network level information (Shi and Malik, 2000; Bolla, 2011). We analyze jump-based similarity to investigate the effect of shocks. In addition, we test whether relative entropy of the distribution functions, that captures non-Gaussian behavior, conveys network level information. We also investigate the widely used Gaussian smoothing and correlation (Luxburg, 2007).

III.1. Defining similarity

In the 20th century, the appearance of large, complex data sets brought new challenges for developing methods which could be used to understand complicated structures. The key concept is the optimal, lower dimensional representation of multidimensional data sets. The idea is twofold: on the one hand, similarly to principal component analysis, we can calculate a lower dimensional representation of the data points from the eigenvalues and eigenvectors of the similarity matrix. On the other hand, we can represent the data structure as a weighted graph and cut the graph along the different clusters. This approach leads to penalized cut optimization problems. Linear algebra and cluster analysis provide powerful methods to find the optimal representations and minimal cuts.

If we do not want to make any a priori assumptions, then we have to look at the data and dismiss other subjective classification guidelines. Mathematically, it is possible to represent a dataset as a graph; hence, we can construct an abstract network of instruments. The most straightforward method is to represent instruments with nodes and connection strengths with weights. Thus, we can define the network with a graph G(𝑉, 𝑊), where 𝑉 represents the set of instruments and 𝑊 contains the connection information.

If we would like to cluster different items, first the measurement of similarity has to be decided. We will denote the similarity of two time series (𝑖, 𝑗) by 𝑊𝑖,𝑗. The goal is to penalize differences and reward similarities.

First, the Markowitz-based squared correlation is considered as a similarity measure:

𝑊𝑖,𝑗= Corr2(𝑟𝑖, 𝑟𝑗),

where 𝑟𝑖 denotes the daily logarithmic return series of instrument 𝑖.
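This similarity is a one-liner on a matrix of return series (a sketch; the function name is ours):

```python
import numpy as np

def squared_correlation_similarity(returns):
    """Similarity matrix W with W[i, j] = Corr(r_i, r_j)**2,
    computed from an (n_assets, n_days) array of log returns."""
    return np.corrcoef(returns) ** 2

# toy usage on three synthetic return series
rng = np.random.default_rng(0)
W = squared_correlation_similarity(rng.normal(size=(3, 250)))
```

By construction W is symmetric with unit diagonal and entries in [0, 1], which is exactly what a similarity matrix for spectral methods requires.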

We argue against this approach because logarithmic returns are not normally distributed; hence, non-linear effects may also be important. However, as correlation is linear, squared correlation similarities only take into account linear dependences.

The problem of higher-order moments can be easily tackled by using symmetric, positive-definite kernel functions. In practice, the Gaussian-kernel is widely used (Leibon et al., 2008).

𝑊𝑖,𝑗 = exp(−‖𝑟𝑖 − 𝑟𝑗‖² / dim(𝑟𝑖))

We notice that if the sets of relevant information and sensitivities are similar, then the relative entropy of the distributions of the return processes is small. Otherwise, we can say that stock indices are sensitive to different sets of information in a different manner (Ormos and Zibriczky, 2014). This means that the similarity function has to be monotonically decreasing in the symmetric Kullback-Leibler distance, so we can construct a similarity measure such that (Kullback and Leibler, 1951):

𝑊𝑖,𝑗= 2/(2 + [KL(𝑝(𝑟𝑖) ∥ 𝑝(𝑟𝑗)) + KL(𝑝(𝑟𝑗) ∥ 𝑝(𝑟𝑖))]),

where 𝑝(𝑟𝑖) denotes the historical probability distribution function of the logarithmic returns of index 𝑖 and KL(𝑝(𝑟𝑖) ∥ 𝑝(𝑟𝑗)) ≝ ∑𝑥 𝑝(𝑟𝑖 = 𝑥) ln(𝑝(𝑟𝑖 = 𝑥)/𝑝(𝑟𝑗 = 𝑥)) the relative entropy of indices 𝑖 and 𝑗.
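A histogram-based sketch of this similarity (the function name, bin count, and the small epsilon that keeps the logarithms finite are our implementation choices, not from the text):

```python
import numpy as np

def symmetric_kl_similarity(r_i, r_j, bins=50):
    """W = 2 / (2 + KL(p_i || p_j) + KL(p_j || p_i)), with p_i, p_j
    estimated as histograms on a common grid."""
    lo = min(r_i.min(), r_j.min())
    hi = max(r_i.max(), r_j.max())
    p, _ = np.histogram(r_i, bins=bins, range=(lo, hi))
    q, _ = np.histogram(r_j, bins=bins, range=(lo, hi))
    eps = 1e-12                       # smoothing to avoid log(0)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    kl_pq = np.sum(p * np.log(p / q))
    kl_qp = np.sum(q * np.log(q / p))
    return 2.0 / (2.0 + kl_pq + kl_qp)

# identical series yield similarity 1; different distributions yield less
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)
y = rng.normal(0.0, 2.0, 2000)
s_same = symmetric_kl_similarity(x, x)
s_diff = symmetric_kl_similarity(x, y)
```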

Another perspective argues that large deviations are riskier; hence, similarities should be defined with tail distributions. We calculate the difference of two normalized return series and count the number of peaks of at least two standard deviations (Tsay, 2005). This logic implies that indices are similar if their price processes jump together. The similarity function has to be decreasing in the number of large deviations; hence, we propose the following metric:

𝑊𝑖,𝑗 = 1 / (1 + ∑𝑡 𝟏(|𝑧𝑖(𝑡) − 𝑧𝑗(𝑡)| > 2)),

where 𝑧𝑖 represents the normalized return of index 𝑖 and 𝟏 the indicator function.
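The jump-based similarity is a direct translation of the formula (the function name is ours):

```python
import numpy as np

def jump_similarity(r_i, r_j):
    """W = 1 / (1 + #{t : |z_i(t) - z_j(t)| > 2}), where z is the
    standardized (zero-mean, unit-variance) return series."""
    z_i = (r_i - r_i.mean()) / r_i.std()
    z_j = (r_j - r_j.mean()) / r_j.std()
    return 1.0 / (1.0 + np.sum(np.abs(z_i - z_j) > 2))

# identical series never jump apart; opposite series jump apart often
rng = np.random.default_rng(0)
r = rng.normal(size=500)
s_same = jump_similarity(r, r)
s_opp = jump_similarity(r, -r)
```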

In this dissertation we compare each approach.

III.2. Normalized modularity cut

The equity index structure is strongly connected. We cannot say that events in Africa do not have any effect on European markets, hence we have to find methods which can be used to cluster dense graphs.

If we assume that nodes are independently connected, then the guess of weight 𝑊𝑖,𝑗 will be the product of the average connection strengths of 𝑖 and 𝑗. The average connection strengths 𝑑𝑖 and 𝑑𝑗 are given by the row sums of 𝑊 (with the total weight normalized to one):

d_i = Σ_j W_{i,j}.

Thus, 𝑊𝑖,𝑗 − 𝑑𝑖𝑑𝑗 captures the information of the network structure (Bolla, 2011).

A k-partition of graph G(𝑉, 𝑊) can be defined as a partition of the vertices such that ⋃_{a=1}^{k} 𝑉𝑎 = 𝑉 and 𝑉𝑖 ∩ 𝑉𝑗 = 𝛿𝑖,𝑗 𝑉𝑖, ∀𝑖, 𝑗 ∈ {1, …, 𝑘}.

If we want to maximize the sum of information in each cluster, we get:

max_{P_k ∈ 𝒫_k} Σ_{a=1}^{k} Σ_{i,j ∈ V_a} (W_{i,j} − d_i d_j), (III.1)

where 𝑃𝑘 stands for specific k-partition in 𝒫𝑘, which represents the set of all possible k- partitions.

Let 𝑀 ≔ 𝑊 − 𝑑𝑑ᵀ denote the modularity matrix of G(𝑉, 𝑊). If we would like to get clusters with similar volumes, then we have to add a penalty to Equation (III.1); hence, we get the normalized Newman-Girvan cut:



max_{P_k ∈ 𝒫_k} Σ_{a=1}^{k} (1/Vol(V_a)) Σ_{i,j ∈ V_a} (W_{i,j} − d_i d_j), (III.2)

where Vol(𝑉𝑎) = ∑𝑢∈𝑉𝑎𝑑𝑢.

Let us define the so called normalized modularity matrix:

𝑀𝐷 ≔ 𝐷−1/2𝑀𝐷−1/2, where 𝐷 = diag(𝑑).

If we would like to cluster a weighted graph G(𝑉, 𝑊), then the eigenvectors of its modularity (𝑀) and normalized modularity (𝑀𝐷) matrices can be used. Modularity and normalized modularity matrices are symmetric, and 0 is always in the spectrum of 𝑀𝐷:

M_D = Σ_{i=1}^N λ_i u_i u_i^T,
where 1 > 𝜆1 ≥ 𝜆2 ≥ ⋯ ≥ 𝜆𝑁≥ −1 denote the eigenvalues of 𝑀𝐷.

If we would like to maximize Equation (III.2), then we can use the k-means clustering algorithm on the optimal k-dimensional representation of vertices,




(D^{−1/2} u_1, …, D^{−1/2} u_k),


where u_1, …, u_k denote the eigenvectors corresponding to the k largest absolute eigenvalues, |λ_1(M_D)| ≥ ⋯ ≥ |λ_k(M_D)|.

Moreover, if the normalized modularity matrix has large positive eigenvalues, then the graph has well-separated clusters (Bolla, 2014).
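The construction above can be sketched as follows (the function name is ours; the toy similarity matrix with two dense, weakly coupled blocks is an illustrative assumption). As stated above, 0 is always in the spectrum of M_D, and a well-separated two-block structure produces one large positive eigenvalue:

```python
import numpy as np

def normalized_modularity_embedding(W, k):
    """Build M_D = D^{-1/2} (W - d d^T) D^{-1/2} and the k-dim representation."""
    W = W / W.sum()                       # normalize total edge weight to one
    d = W.sum(axis=1)                     # average connection strengths
    M = W - np.outer(d, d)                # modularity matrix
    M_D = M / np.sqrt(np.outer(d, d))     # normalized modularity matrix
    eigval, eigvec = np.linalg.eigh(M_D)
    order = np.argsort(-np.abs(eigval))   # |lambda_1| >= ... >= |lambda_N|
    X = eigvec[:, order[:k]] / np.sqrt(d)[:, None]  # D^{-1/2} u_1, ..., D^{-1/2} u_k
    return eigval[order], X

# Two dense 4-node blocks coupled by a weak 0.05 background weight.
W = np.kron(np.eye(2), np.full((4, 4), 1.0)) + 0.05
evals, X = normalized_modularity_embedding(W, 1)
```

For this toy graph the leading eigenvalue is large and positive (close to one), signalling two well-separated clusters, and the zero eigenvalue with eigenvector √d is present as expected.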

III.3 Normalized cut

Another natural approach is to normalize the similarity matrix row-wise; we then get the transition matrix of the random walk on the graph (Luxburg, 2007):

P = D^{−1} W,

where 𝐷 represents the diagonal matrix of the sums of rows.

Studying the stopping times of random walks on graphs sheds some light on the structure of the graph: if it takes a long time to reach a subgraph from a given node, then the node and the subgraph are well separated. Moreover, the largest eigenvalue of the submatrix of P belonging to the subgraph controls the distribution of the stopping time (Behrends, 2000):

Prob(τ ≥ n) = 1 − π_0 (Σ_{i=1}^n Q^{i−1}) R · 1,

where τ represents the stopping time of reaching the subgraph from a given node, π_0 the initial distribution, Q the submatrix of transient points, 1 the (1, …, 1)^T vector, and R the transition probabilities from transient to recurrent points.
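The formula can be checked numerically on a toy absorbing chain (the 3-transient/2-recurrent example below is ours, not taken from the data). Since the rows of [Q | R] sum to one, a telescoping argument shows the right-hand side equals π_0 Q^n · 1, the probability that the walk is still outside the target subgraph after n steps:

```python
import numpy as np

# Toy absorbing Markov chain: rows of [Q | R] sum to one.
Q = np.array([[0.5, 0.2, 0.1],     # transient -> transient
              [0.1, 0.6, 0.1],
              [0.2, 0.1, 0.4]])
R = np.array([[0.1, 0.1],          # transient -> recurrent (the subgraph)
              [0.1, 0.1],
              [0.2, 0.1]])
pi0 = np.array([1.0, 0.0, 0.0])    # start in the first transient state

n = 10
absorbed = sum(pi0 @ np.linalg.matrix_power(Q, i - 1) @ R @ np.ones(2)
               for i in range(1, n + 1))
tail = 1.0 - absorbed                                        # the formula above
still_out = pi0 @ np.linalg.matrix_power(Q, n) @ np.ones(3)  # direct computation
```

Both computations agree to machine precision, confirming that the tail of the stopping-time distribution is governed by powers of Q, and hence by its spectrum.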

Using the spectral theorem (Q = VΛV^T) and the symmetry of Q,

Prob(τ ≥ n) = 1 − π_0 V (Σ_{i=1}^n Λ^{i−1}) V^T R · 1. (III.3)

Equation (III.3) shows that the spectrum of the adjacency matrix, stopping times, and clustering are closely related.

Taking an arbitrary real-valued function on the vertices and defining the incidence matrix below sheds some light on the connection between the Laplace operator and graph theory (Chung, 1997):

B_{e,v} = +1, if v is the terminal vertex of e,
          −1, if v is the initial vertex of e,
           0, otherwise,

then B^T B would be exactly the negative discrete Laplace operator, because:

B^T B f(v_i) = Σ_{v_i ∼ v_j} (f(v_i) − f(v_j)).

Notice that if we subtract the adjacency matrix from the diagonal matrix of row sums, then we get the same operator. Thus, the Laplace matrix can be defined as follows:

𝐿 = 𝐷 − 𝑊
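The identity B^T B = D − W can be verified on a small unweighted graph (a three-vertex path; the edge orientation is arbitrary and does not affect B^T B):

```python
import numpy as np

# Path graph 0 - 1 - 2; edge e0 = (0 -> 1), edge e1 = (1 -> 2).
# B[e, v] = +1 if v is the terminal vertex of e, -1 if it is the initial one.
B = np.array([[-1.0,  1.0,  0.0],
              [ 0.0, -1.0,  1.0]])
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = np.diag(W.sum(axis=1)) - W   # L = D - W
```

Here B^T B reproduces L exactly, regardless of how the two edges are oriented.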

If we would like to get clusters with similar volumes, then we have to penalize extreme volumes; hence we get the normalized cut (Luxburg, 2007):

min_{P_k ∈ 𝒫_k} Σ_{a=1}^{k−1} Σ_{b=a+1}^{k} (1 / Vol(V_a) + 1 / Vol(V_b)) Σ_{i ∈ V_a, j ∈ V_b} W_{i,j}, (III.4)

Let us define the so-called normalized Laplace matrix:

L_D := D^{−1/2} (D − W) D^{−1/2}.

The optimization problem is similar to Equation (III.2); however, instead of the normalized modularity matrix, the normalized Laplace matrix provides the solution (Shi and Malik, 2000).

As the random walk idea implies, the normalized Laplace technique works when clusters are well separated; otherwise, normalized modularity gives better results.

In this dissertation, we compare both methods to unveil the network structure embedded in the market.


III.4. Results

This dissertation presents a broad analysis of the equity index network structure. Logarithmic returns of 58 stock indices are clustered in different ways. Our investigations reveal that stock indices are homogeneously connected, and that large price changes have a limited effect on the network structure.

In the empirical analysis, the following steps form the backbone of the calculation (Filippone et al., 2007).

1. Constructing the similarity matrix (𝑊).

2. Calculating the normalized modularity matrix (𝑀𝐷).

3. Based on the spectral gap, determining the number of clusters and the optimal k-dimensional representation.

4. Applying k-means clustering.
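The four steps can be sketched end to end in Python. All function names, the kernel bandwidth choice (σ² = T), and the restart-based k-means are our illustrative assumptions, not the thesis implementation; the synthetic data consist of two factor-driven groups of indices:

```python
import numpy as np

def gaussian_similarity(returns):
    """Step 1: Gaussian kernel on normalized return series (one row per index)."""
    Z = returns - returns.mean(axis=1, keepdims=True)
    Z /= Z.std(axis=1, keepdims=True)
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * Z.shape[1]))      # bandwidth sigma^2 = T (assumed)

def kmeans(X, k, restarts=10, n_iter=50, seed=0):
    """Step 4: plain Lloyd iterations with random restarts."""
    rng = np.random.default_rng(seed)
    best_labels, best_obj = None, np.inf
    for _ in range(restarts):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
            for c in range(k):
                if np.any(labels == c):
                    centers[c] = X[labels == c].mean(axis=0)
        obj = ((X - centers[labels]) ** 2).sum()
        if obj < best_obj:
            best_labels, best_obj = labels, obj
    return best_labels

def spectral_modularity_clusters(W, k):
    """Steps 2-3: normalized modularity matrix and k-dimensional representation."""
    W = W / W.sum()
    d = W.sum(axis=1)
    M_D = (W - np.outer(d, d)) / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(M_D)
    order = np.argsort(-np.abs(evals))           # |lambda_1| >= |lambda_2| >= ...
    X = evecs[:, order[:k]] / np.sqrt(d)[:, None]
    return kmeans(X, k)

# Two synthetic "markets": five indices driven by factor f1, five by factor f2.
rng = np.random.default_rng(1)
f1, f2 = rng.standard_normal(250), rng.standard_normal(250)
returns = np.vstack([f1 + 0.1 * rng.standard_normal(250) for _ in range(5)]
                    + [f2 + 0.1 * rng.standard_normal(250) for _ in range(5)])
labels = spectral_modularity_clusters(gaussian_similarity(returns), 2)
```

On this synthetic data, the pipeline recovers the two factor groups as the two clusters.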

III.4.1. Comparing similarity matrices and normalized cuts

Defining similarity is a key aspect of clustering. In general, it is not possible to find a universally optimal kernel, but different approaches can be tested and compared on a specific data set.

We analyze correlation, jump, entropy, and Gaussian-based similarity kernels. When calculating the similarity matrices, we expect strongly connected indices to have coefficients close to one, and loosely connected ones close to zero. Level plots (Figure III.1) give a feeling for the network structure, which seems to be homogeneous; thus, clusters may not be well separated.


Figure III.1. Level plots of daily similarity matrices. The lighter the color the more similar the two indices are.

Figure III.1 displays the correlation, Gaussian-kernel, relative entropy and jump-based similarity structure of the equity index graph, in which the lighter the color the stronger the connection between the indices. Indices are sorted alphabetically and (𝑖, 59 − 𝑗) represents the similarity between index 𝑖 and 𝑗.

Different similarity measures imply similar patterns, which is in line with our a priori intuition. However, the spectra of the normalized Laplace and normalized modularity matrices help us find the most adequate kernel function: the wider the spectral gap, the better the clustering property. This means we have to find a similarity metric that implies large gaps in the spectra of the normalized Laplacian and modularity matrices (Chung, 1997).

A correlation-based similarity approach implies a roughly uniform eigenvalue density on [0, 1]. This means many gaps appear in the spectrum; hence we cannot identify the optimal number of clusters. Moreover, lower-dimensional representations will not contain all the information, as some of the large eigenvalues are not considered. These hurdles highlight the problems of squared-correlation similarity matrices.

Counting jumps of at least two standard deviations results in a small number of eigenvalues with large multiplicity. Therefore, the lower-dimensional representation cannot be used to cluster the data points. Accordingly, jumps are random and do not reflect the network structure; thus, we could say all the clusters are exposed to the same systematic risk. The results thereby provide evidence of a spillover effect (Booth et al., 1997). Moreover, we show that shocks and market collapses have a minor effect on the equity index graph, i.e., the network structure of equity indices.

Figure III.2. Eigenvalues of normalized modularity matrix in decreasing order.

Figure III.3. Eigenvalues of normalized Laplacian matrix in decreasing order.

Gaussian and relative entropy-based similarity matrices yield promising figures, especially in the case of normalized modularity. Here, we get large, well-separated eigenvalues, which are necessary to transform the data into a lower-dimensional space.

Notice that these results are in line with Figure III.1, because the normalized Laplacian minimizes the normalized cut (Equation (III.4)), which in turn is small if, and only if, the clusters are loosely connected. The modularity approach, in contrast, maximizes the information of the clustering; hence it can be used in a homogeneous network structure as well.

Investigating the spectra, especially the positions of spectral gaps, gives some guidance on the optimal number of clusters. Considering the previous results, the spectra of Gaussian and relative entropy-based normalized modularity matrices are suitable. Figure III.2 shows indices could be put into 2, 3, or 5 clusters.


Figure III.4: Largest eigenvalues of Gaussian- and relative entropy-based normalized modularity matrices.

The efficiency of different k-partitions can be tested in multiple ways. In this framework, we suppose that the variance has two components: the within-cluster and the between-cluster components.

Therefore, the explanatory power of a given clustering can be described as:

Σ_{i=1}^k N_i (X̄_i − X̄)² / Σ_{i=1}^k Σ_{j=1}^{N_i} (X_{i,j} − X̄)²,

where k represents the number of clusters, N_i the size of cluster i, and X̄, X̄_i stand for the total and cluster-wise averages (Zhao, 2012). The formula penalizes dispersion within clusters, hence dense clusters give a number close to 1. Moreover, calculating the ratio for different numbers of clusters highlights the optimal number of clusters.
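The ratio above can be sketched as follows (the function name is ours; the four-point example is purely illustrative):

```python
import numpy as np

def explained_variance_ratio(points, labels):
    """Between-cluster sum of squares as a share of the total sum of squares."""
    grand_mean = points.mean(axis=0)
    total_ss = ((points - grand_mean) ** 2).sum()
    between_ss = sum(
        np.sum(labels == c) * ((points[labels == c].mean(axis=0) - grand_mean) ** 2).sum()
        for c in np.unique(labels))
    return between_ss / total_ss

# Two tight, well-separated groups: almost all variance is between clusters.
pts = np.array([[0.0], [0.1], [5.0], [5.1]])
ratio = explained_variance_ratio(pts, np.array([0, 0, 1, 1]))
```

For tight, well-separated groups the ratio approaches one; in the degenerate case where every point is its own cluster, it equals one exactly.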

In order to identify the optimal number of clusters, we calculate the percentage of variance explained as a function of the number of clusters. We have to stress that this method is computationally intense, because the whole process has to be repeated many times. However, in our case, as we have 58 stock indices, it can be used. Figures III.2, III.3 and III.4 provide evidence for using 2, 3, 4, or 5 clusters.

Figure III.5: Explained percentage of variance of Gaussian-kernel based clusters for different representations (similarity matrix as column-wise representation, normalized modularity and Laplace matrix implied representations).


Figures III.2–III.4 show that the Gaussian kernel infers the clearest spectrum property. The relative entropy-based kernel also gives usable results, whereas the jump and correlation-based approaches are ineffective. Moreover, it can also be seen that the Gaussian kernel outperforms the entropy-based approach, because in each case its variance-explanation function is steeper.

III.4.2. Equity Index Network Structure

Spectral gap (Figure III.4) and variance analyses (Figure III.5) imply that equity indices can be studied using 2, 3, or 5 clusters. The explanatory power of two clusters is 38%. This means that roughly a third of the total variance comes from sample heterogeneity. If we increase the number of clusters and investigate the three-cluster case, we get a similar explanatory power.

However, a spectral gap appears between the third and fourth eigenvalues (Figure III.4), so, theoretically, we propose three clusters. The next gap is between the fifth and sixth eigenvalues.

The explanatory power of five clusters is 52%. This means half of the total variance of the data can be explained by five clusters.

The results (Figure III.5) also suggest that additional clusters have little explanatory power, which is in line with the spectrum properties.

In practice, mean–variance plots can be used to represent risks and rewards. Intuitively, indices with similar risk and return can be regarded as similar. This approach applies a k-means algorithm to cluster the two-dimensional (mean, standard deviation) representation of logarithmic returns.

We have seen that the naïve two-dimensional approaches (return–standard deviation, return–beta) do not give optimal cuts. However, if we calculate Gaussian similarities and the normalized modularity matrix based representation, then we get clusters with higher variance explanatory power. We have seen that stock indices can be put into 2, 3, or 5 clusters.

Calculating two Gaussian-based normalized modularity clusters, we can see that the clusters optimizing the modularity cut are concave in return–beta coordinates. As in the previous chapter, we use the 3-Month US Treasury and the MSCI World Index as proxies for the risk-free rate and the market portfolio, respectively.


Figure III.6: Two Gaussian-kernel based normalized modularity clusters in return–beta coordinate system, 1990–2015

Having a closer look at the indices (Table III.1), we notice that the geographical and MSCI categorizations are in line with the Newman–Girvan cut. The first cluster is dominated by developed market indices, while Cluster 2 collects emerging and frontier market names. It can also be seen that Western European indices dominate Cluster 1.


Table III.1: Frequency table of geographical regions, 2 Gaussian based modularity clusters and MSCI classification

Region Cluster MSCI Frequency

Africa 1 EM 1

Africa 2 Frontier 3

Africa 2 Not rated 2

Africa 2 Standalone 1

Arab 2 EM 4

Arab 2 Frontier 3

Asia 2 Developed 1

Asia 2 EM 7

Asia 2 Frontier 2

Western EU, AU, US, JP 1 Developed 16

Western EU, AU, US, JP 2 Developed 2

Eastern EU 1 EM 3

Eastern EU 2 EM 2

Eastern EU 2 Frontier 1

Eastern EU 2 Standalone 2

Global 1 Not rated 1

SMA 1 EM 5

SMA 2 Not rated 2

Notes: The frequency table displays the connection between geographical, MSCI and spectral clustering based classifications.

Investigating the geographical outliers we could find that South Africa, Czech Republic, Hungary, Poland, Costa Rica and Venezuela are in Cluster 1, while Australia and Japan belong to Cluster 2 (Appendix II. Table II).

Putting the indices into three different clusters (Figure III.7) gives a more complicated structure, but we could still state that the first cluster is led by European countries (green), the second by American (blue), and the third is a mixture of indices from the rest of the world (red).


Figure III.7. Three Gaussian-kernel based normalized modularity clusters, edges with weights less than 0.2 are filtered out, 1990-2015

Calculating five different clusters helps us to gain deeper understanding of the global equity index structure.

Figure III.8: Five Gaussian-kernel based normalized modularity clusters in return–beta coordinate system, 1990–2015


The first surprising result is that, despite the penalty on different cluster sizes, the Dhaka Stock Exchange (DS30) is separated into cluster three. It is worth mentioning that the DS30 index has only been listed since 2013.

Another interesting outcome is that cluster four contains only two African and two American indices. One could also say the first cluster includes the Arab indices except Morocco. Cluster two primarily comprises Western European names, while cluster five is dominated by Asian and Eastern European ones.

Appendix II. also highlights that Cluster 2 contains most of the developed market names. Cluster 5 is dominated by emerging market countries, while Cluster 1 encompasses frontier and emerging market indices.

Comparing cluster-wise descriptive statistics gives more support to the geographical segmentation. However, Table III.1, A.II/2 and A.II/3 suggest that portfolios constructed using only geographical or MSCI categorization can include indices which behave differently from their cluster-wise peers.

Table III.2: Cluster-wise descriptive statistics of mean, standard deviation and CAPM beta

Avg. of avg. Std. of avg. Avg. of std. Std. of std. Avg. of beta Std. of beta

Cluster 1 0.00079 0.00033 0.00076 0.01208 0.21874 0.17250

Cluster 2 0.00034 0.00022 0.00061 0.01116 1.37550 0.62450

Cluster 3 0.00423 0.00000 0.00274 0.00000 -0.03681 0.00000

Cluster 4 0.00047 0.00047 0.00053 0.00365 0.05242 0.06857

Cluster 5 0.00014 0.00046 0.00086 0.01229 0.75020 0.38721

Africa 0.00017 0.00023 0.01833 0.00465 0.33630 0.42651

Arab 0.00090 0.00022 0.02732 0.01178 0.30594 0.17263

Asia 0.00064 0.00110 0.02841 0.01301 0.66540 0.57139

W-EU etc. 0.00026 0.00017 0.01809 0.00613 1.29430 0.48186

Eastern EU 0.00010 0.00061 0.02796 0.01197 1.01380 0.60693

Global 0.00020 0.00000 0.00935 0.00000 1.00000 0.00000

CASA 0.00064 0.00037 0.03221 0.01314 1.36131 1.09171

Note: This table presents the differences between cluster-wise statistics of individual return statistics. We calculated the cluster-wise average and standard deviation of individual stock index average returns, standard deviations, and MSCI World index betas.


Table III.2 suggests that the cluster-wise average returns and standard deviations of indices are homogeneous (Cluster 3 and Cluster 4 incorporate the outliers).

Comparing cluster-wise betas suggests that using the standard CAPM would give an appropriate risk–return estimation for Clusters 2 and 5 and the Arab, Western and Eastern European indices. However, emerging market (Cluster 1, Africa, Asia, Central and South America) betas would be too noisy to yield reliable regression coefficients.

We could conclude that geographical and MSCI categorizations are in line with spectral clustering-based classification. Thus, the network generated by simple index returns incorporates geographical and relative economic advancement information.

Displaying historical daily average returns as a function of historical standard deviation sheds more light on how the clusters scatter.

Figure III.9 Five Gaussian-kernel based normalized modularity clusters in return-standard deviation coordinate system, 1990-2015

In order to compare our quantitative approach with geographical and MSCI classifications, we run the following regressions:


r = β_0 + β_1 σ + β_2 cluster + ε, (III.5)

r = β_0 + β_1 β_M + β_2 cluster + η, (III.6)

r = β_0 + β_1 (σ · cluster) + ξ, (III.7)

r = β_0 + β_1 (β_M · cluster) + θ. (III.8)
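Regressions of this form can be sketched with a minimal OLS helper (the `ols` function, the synthetic volatilities, and the cluster labels below are our illustrative assumptions, not the thesis data; the first cluster serves as the anchor category, encoded via dummy variables):

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept; returns the coefficient vector and R-squared."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return beta, r2

rng = np.random.default_rng(2)
n = 58                                       # number of stock indices
sigma = rng.uniform(0.01, 0.03, n)           # stand-in historical volatilities
cluster = rng.integers(0, 3, n)              # stand-in spectral cluster labels
dummies = np.eye(3)[cluster][:, 1:]          # drop the anchor cluster's column
r = 0.001 + 0.02 * sigma + 0.0005 * cluster + 1e-4 * rng.standard_normal(n)
beta, r2 = ols(r, np.column_stack([sigma, dummies]))   # Equation (III.5) sketch
```

The fitted coefficient vector contains the intercept, the volatility slope, and one coefficient per non-anchor cluster, exactly as in Equation (III.5).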

The results (Table III.3) show that spectral clustering provides statistically reliable figures, while geographical and MSCI clusters are not statistically significant.

Table III.3 Daily OLS regression p-values of CAPM and Markowitz model with geographical, MSCI, and spectral clusters


Variable Eq. III.5 p-value Eq. III.6 p-value Eq. III.7 p-value Eq. III.8 p-value

Geographical (Africa anchor)

Constant 0.9270 0.7484 0.2585 0.1155

Std. 0.0279

Beta 0.0025

Africa 0.8944 0.8928

Arab 0.0799 0.0476 0.0200 0.0193

Asia 0.6059 0.8090 0.1801 0.1538

Western EU, US, AU, JP 0.1178 0.5737 0.0232 0.0122

Eastern EU 0.9342 0.8167 0.0846 0.0281

Central and South America 0.1542 0.2893 0.5479 0.5678

Global 0.5211 0.7941 0.0006 0.0004

MSCI (Developed markets anchor)

Constant 0.0038 0.0683 0.0447 0.0375

Std. 0.0504

Beta 0.0142

Developed 0.0216 0.0070

EM 0.6566 0.8713 0.0089 0.0021

Frontier 0.1787 0.6115 0.8345 0.6967

Not rated -0.9040 0.7843 0.6420 1.2440

Standalone 0.5011 0.9580 0.4483 0.0585

Spectral clustering Cluster 1

Constant 0.0021 0.0042 0.2919 0.3206

Std. 0.0017

Beta 0.0001

Cluster 1 0.0087 0.0084

Cluster 2 0.0020 0.0986 0.0000 0.0000

Cluster 3 0.0000 0.9880 0.4579 0.4352

Cluster 4 0.1348 0.4068 0.1451 0.0731

Cluster 5 0.0000 0.0054 0.0685 0.0299


Moreover, 𝑅2 statistics give further evidence for the applicability of spectral clustering based classification (Table III.4).

Table III.4 Daily OLS regression R-squared, adjusted R-squared and F-statistics of CAPM and Markowitz model variants with geographical, MSCI, and spectral clusters

R-squared: Adj. R-squared: F-statistic:

Eq. III.5

Geographical 0.1989 0.8672 1.773

MSCI 0.1323 0.488 1.585

Spectral 0.2188 0.1437 2.913

Eq. III.6

Geographical 0.2656 0.1627 2.583

MSCI 0.1684 0.884 2.106

Spectral 0.2931 0.2251 4.312

Eq. III.7

Geographical 0.2965 0.1979 3.009

MSCI 0.1879 0.1098 2.407

Spectral 0.3017 0.2346 4.490

Eq. III.8

Geographical 0.2867 0.1868 2.870

MSCI 0.2405 0.1675 3.293

Spectral 0.3104 0.2441 4.682

Notes: Table III.4 summarizes the regression statistics of the CAPM and Markowitz model variants.

We also have to stress that our regression statistics underpin the baseline concept of a linear relationship between risk and return.

The outcomes highlight the difficulty of diversification, because the correlation structure of the network is quite heterogeneous. Moreover, geographical and MSCI classifications do not give statistically significant results. However, indices can be clustered by spectral methods. This means indices in the same cluster are affected by the same risk factor; hence, only cluster-wise diversification can be used to eliminate non-systematic global risk.

III.4.3. Equity Index Graph

Clustering helps us to discover the global equity index graph. However, the local structure can be better understood through node-specific attributes. Our aim is to identify the most influential markets from daily closing prices. Hubs can be identified as the vertices with the largest vertex weights.

Analyzing the Gaussian similarity kernel shows that if we randomly generate data, then we get similarities smaller than 0.2 with probability greater than 0.99.

Note that for independent standard normal variables, half the squared difference, (ξ_1 − ξ_2)²/2, follows a χ²₁ distribution. Thus, assuming normality, we can derive the theoretical distribution of Gaussian similarities.
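The χ² claim can be illustrated by Monte Carlo simulation (the sample sizes below are purely illustrative, and the exact kernel bandwidth used in the thesis is not restated here): for two independent standard normal series of length T, the statistic Σ_t (z_i(t) − z_j(t))²/2 follows a χ²_T distribution, whose mean T and variance 2T are recovered by the simulated samples.

```python
import numpy as np

rng = np.random.default_rng(3)
T, trials = 250, 2000
half_sq = np.empty(trials)
for t in range(trials):
    z_i, z_j = rng.standard_normal(T), rng.standard_normal(T)
    half_sq[t] = ((z_i - z_j) ** 2).sum() / 2.0   # chi-squared with T dof

# chi^2_T has mean T and variance 2T; compare with the simulated moments.
mean_err = abs(half_sq.mean() - T) / T
var_err = abs(half_sq.var() - 2 * T) / (2 * T)
```

Because the χ²_T distribution concentrates tightly around T for large T, the Gaussian similarity of random series is sharply concentrated as well, which is consistent with a deterministic-looking threshold for "random" similarities.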



