Noise sensitivity of portfolio selection under various risk measures

Imre Kondor^a,b, Szilárd Pafka^b,c, Gábor Nagy^c

^a Collegium Budapest – Institute for Advanced Study, Szentháromság u. 2., H-1014 Budapest, Hungary
^b Department of Physics of Complex Systems, Eötvös University, Pázmány P. sétány 1/a, H-1117 Budapest, Hungary
^c Risk Management Department, CIB Bank, Medve u. 4–14., H-1027 Budapest, Hungary

Abstract

We study and compare the sensitivity to estimation error of portfolios optimized under various risk measures, including variance, absolute deviation, expected shortfall and maximal loss. We introduce a global measure of portfolio sensitivity and test the various risk measures by considering simulated portfolios of varying sizes N and for different lengths T of the time series. We find that the effect of noise is significant in all the investigated cases; it strongly depends on the ratio N/T and actually diverges at a critical value of N/T that depends on the risk measure in question. We also study the noise sensitivity of portfolio weights. The other risk measures display an enhanced noise sensitivity compared to variance. Piece-wise linear risk measures show an instability: the optimal portfolios jump as we go from sample to sample. In addition, expected shortfall displays an unexpected probabilistic feasibility problem: although it is a convex risk measure, in some samples it is not bounded, and then the optimization over the weights does not have a solution.

We conclude that when one considers the theoretical and practical criteria to be imposed on a risk measure, the requirement of robustness (noise-tolerance) should also be taken into account.

Key words: risk measures, expected shortfall, estimation noise, random matrix theory, portfolio optimization.

JEL Classification: C13, C15, C61, G11.

Corresponding author. E-mail: kondor@colbud.hu. Phone: +36-1-224-8313. Fax: +36-1-375-9539.


1 Introduction

Risk is one of the central concepts in finance that plays a prominent role in investment decisions, asset allocation, risk management, and regulation alike.

Despite its fundamental importance, no universally accepted measure exists for its quantitative characterization. Risk measures used by practitioners, implemented in risk management software packages, embodied in regulation, or applied in theoretical considerations range from rules of thumb and ad hoc recipes to standard statistical measures and sophisticated axiomatic constructs. A comprehensive treatment of risk measures should embed them in the wider context of economic theory. This would entail, among other things, the clarification of their relationship to utility functions, and a discussion of how the choice of one or another risk measure reflects a set of implicit assumptions about the nature of the underlying processes, investors and markets. Our goals are much more modest here. Taking a pragmatic approach, we will content ourselves with considering a few concrete risk measures that are widespread in practice, textbooks or regulation, and will focus on a sole issue related to them, namely their sensitivity to noise. We will see that all risk measures exhibit significant sensitivity, and various risk measures display very different sensitivity to the same noise. We also find that, as a rule, more sophisticated risk measures display an enhanced sensitivity to noise compared to more conventional ones.

Most practitioners tend to regard risk as a given figure, or perhaps a set of a few figures. This is an oversimplified view, however. Ultimately, risk figures result from evaluating some estimators whose input data come from empirical observations on the market. As the observed samples (or time series) are always finite, sample to sample fluctuations will be inevitable. The risk characteristics of any financial asset or portfolio are, therefore, never single, well defined numbers, but probability distributions. Ideally, these distributions would be so sharply peaked that even the observation of a single sample (a segment of length T from a time series) could give a fair representation of the whole distribution. The fact is, however, that the size of typical banking portfolios and the lengths of available time series are such that this is almost never the case, and the problem of estimation error can almost never be disregarded in a finance context. The fundamental problem we face here is one of information deficit. The amount of data needed for a faithful reconstruction of the underlying stochastic process far exceeds the amount of data available in typical situations. Under these conditions, our stochastic models will inevitably contain a smaller or larger amount of measurement error and the risk estimates will be noisy.

The effect of noise is much more serious when we want to choose a portfolio that is optimal under certain criteria than when we merely assess the risk in an existing portfolio. One and the same risk measure may perform fairly satisfactorily as a diagnostic tool and fail miserably as a decision-making tool. The focus in this paper will be on the decision-making side: we will study the noise sensitivity of risk measures in portfolio selection, as opposed to risk assessment.

This problem is, of course, not new. It dates back to the very beginning of Markowitz' rational portfolio selection theory (Markowitz, 1952, 1959), and over the decades a huge number of papers have been devoted to its various aspects. Most of the literature is concerned with portfolios composed of assets that obey normal or Gaussian statistics, and discusses the noise sensitivity of the natural risk measure associated with Gaussian portfolios, the variance (e.g. Frankfurter et al., 1971; Dickinson, 1974; Jobson and Korkie, 1980; Elton and Gruber, 1973; Eun and Resnick, 1984; Chan et al., 1999). In addition to studying the effect of noise, a number of filtering schemes have also been devised to remove at least a large part of the estimation noise from portfolio selection. These filtering procedures include single- and multi-factor models (for a review, see e.g. Elton and Gruber, 1995), Bayesian estimators (e.g. Jorion, 1986; Frost and Savarino, 1986; Ledoit and Wolf, 2003), and more recently, tools taken over from random matrix theory (Laloux et al., 2000).

Despite the fact that portfolio optimization based on several alternative risk measures has been introduced in the literature (Konno and Yamazaki, 1991; Young, 1998; Rockafellar and Uryasev, 2000; Acerbi, 2004) and has become widely utilized in practice (Dembo and Rosen, 2000; Algorithmics, 2002), the literature on the effect of noise on risk measures other than the variance is much less extended. Our goal here is to start to fill this gap. We will study and compare the sensitivity of the following risk measures: standard deviation, absolute deviation, expected shortfall, and its extremal version, maximal loss. Although we will make occasional remarks on Value-at-Risk (VaR) and some risk measures implied by existing international capital regulation, we will not include them in a systematic comparison with the others, because VaR and some of the regulatory measures fail to satisfy the crucial criterion of convexity (Artzner et al., 1997; Kondor et al., 2004), and also because the regulatory measures cannot be interpreted as functionals on the underlying probability distribution function (pdf). Finally, we will also discuss a surprising feasibility problem that we have encountered in the course of this study: the existence of an optimum under some of the risk measures that are strongly promoted in the academic literature (e.g. expected shortfall) turns out to be a probabilistic issue, depending on the sample. This phenomenon is related to similar feasibility problems in stochastic linear programming and in random geometry (Todd, 1991; Schmidt and Mattheiss, 1977).

The plan of the paper is as follows. In Section 2 we look into the noise sensitivity of a classic risk measure, the variance. The use of variance for a risk measure assumes that the assets in the portfolio obey a (multivariate) normal probability distribution. As we have just mentioned, the effect of noise on, and the related filtering of, Gaussian portfolios have been the subject of countless studies. The novel feature here is a model-simulation approach by which we are able to test the noise sensitivity of variance. We also introduce the ratio of the noisy standard deviation and the true one as a measure of the noise-induced sub-optimality of a portfolio and show that this quantity scales in N/T (Pafka and Kondor, 2002). In addition, we derive an exact analytic expression for this quantity that shows that the effect of noise diverges as we approach the limit N/T = 1 from below. It is interesting that this formula, despite its simplicity, was only noticed recently (Pafka and Kondor, 2003; Burda et al., 2003; Papp et al., 2005). Next we construct, by a step by step approach, a covariance matrix model that is able to grasp some of the essential features of empirical covariance matrices observed on real markets. This model can also be related to current factor models in the econometric literature (see e.g. Ferson, 2003).

We analyze the spectrum of this covariance matrix, and investigate the effect of estimation error on it. In the limit N, T going to infinity such that their ratio is fixed and finite, the band of small eigenvalues becomes smeared into the Marchenko–Pastur spectrum (Marchenko and Pastur, 1967) of Wishart matrices (Wishart, 1928). This makes contact with a recent research line that approaches the problem of noise filtering from a random matrix theory perspective (Laloux et al., 1999; Plerou et al., 1999; Laloux et al., 2000).

Section 3 is devoted to alternative risk measures. The examples we consider are absolute deviation (AD), expected shortfall (ES), and a pessimistic version of it, which we call maximal loss (ML). All these risk measures share a common property with VaR and the regulatory measures, in that they are all piecewise linear, or, put in another way, their level surfaces (iso-risk surfaces) are polyhedrons in the space of portfolio weights. We argue that this feature is the explanation why these risk measures display an enhanced sensitivity to noise compared to the smooth variance. Another decisive factor is the high loss-threshold below which one throws away all the data in the case of VaR, ES and ML. In order to provide a fair perspective on the performance of ES, we have also investigated the dependence of its noise sensitivity on this threshold, and found a very counterintuitive, non-monotonic dependence. Finally, we study the feasibility problem related to the optimization under ES and ML.

For the latter we provide a closed analytic expression for the probability of the existence of a solution. This formula is, in fact, known from the solution of an isomorphic problem in stochastic linear programming (Todd, 1991), but also from random geometry (Schmidt and Mattheiss, 1977). As for ES, we cannot derive a similar closed formula, but provide numerical results for its dependence on N, T, and the threshold.

The paper ends with a short Summary.


The exposition will be informal throughout. No lengthy mathematical derivations will be presented, results will be illustrated or supported by simulations, and we shall frequently resort to geometric arguments that rarely appear in the literature in this context, but that we find a powerful guide to intuition.

2 Noise sensitivity of variance

2.1 The Markowitz problem

The fluctuations of prices and returns form a multivariate stochastic process. The simplest model for their pdf is provided by a multivariate normal distribution. This picture goes back to Bachelier (1900), and it has remained the standard textbook model of financial markets even until today. Real-life portfolios conform to this model to various degrees, depending on the assets, liquidity, the time horizon, etc. In this Section we assume that our portfolio is multivariate normal.

The problem of rational portfolio selection was formulated by Markowitz (1952) as a tradeoff between reward and risk. Reward is usually measured in terms of return or logreturn. For a Gaussian portfolio variance is basically a unique measure of risk; any other reasonable measure is necessarily proportional to it. Rational investors want to minimize their risk given a certain fixed expected return. In mathematical terms the task consists in finding the minimum of the quadratic form

$$\sigma_P^2 = \sum_{i,j=1}^{N} w_i \sigma_{ij} w_j , \qquad (1)$$

over the weights w_i, given the constraints

$$\sum_{i=1}^{N} w_i = 1 \qquad (2)$$

and

$$\sum_{i=1}^{N} w_i \mu_i = \mu , \qquad (3)$$

where σ_P is the standard deviation of the portfolio, σ_ij the covariance matrix, w_i the weight of asset i in the portfolio (i = 1...N), μ the (given) expected return on the portfolio, and μ_i the expected return on asset i. We assume that short selling is allowed, so the weights w_i can be of either sign. This classical optimization problem can be solved analytically (Merton, 1972). (However, if short selling is excluded or other linear constraints are introduced, it becomes a quadratic programming problem.)

Expected returns are notoriously hard to determine on short time horizons with any degree of reliability. As our objective in this paper is to study the noise sensitivity of risk measures, we would like to simplify the task and omit the constraint on return. That is, we wish to focus on the minimal risk portfolio. At first sight this may seem rather pointless. We note, however, that there are special tasks (benchmarking, index tracking) where this is precisely what one wishes to do. The solution is then:

$$w_i^{*} = \frac{\sum_{j=1}^{N} \sigma_{ij}^{-1}}{\sum_{j,k=1}^{N} \sigma_{jk}^{-1}} . \qquad (4)$$

It is important to note that the optimal weights are given here in terms of the inverse covariance matrix. Since the covariance matrix has, as a rule, a number of small eigenvalues, any measurement error will get amplified and the resulting portfolio will be sensitive to noise. This is the fundamental reason for the difference between portfolio selection and risk assessment of a given portfolio; in the latter case, the covariance matrix does not need to be inverted.
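To make Eq. (4) concrete, here is a minimal sketch in Python (our illustration, with illustrative names; not the authors' code): the minimum-variance weights are the normalized row sums of the inverse covariance matrix.

```python
import numpy as np

def min_variance_weights(cov: np.ndarray) -> np.ndarray:
    """Weights minimizing w' cov w subject to sum(w) = 1 (short selling allowed)."""
    inv = np.linalg.inv(cov)                  # small eigenvalues of cov are amplified here
    ones = np.ones(cov.shape[0])
    return inv @ ones / (ones @ inv @ ones)   # row sums of the inverse, normalized to 1

# For the identity covariance the optimum is the equal-weight portfolio:
print(min_variance_weights(np.eye(4)))        # -> [0.25 0.25 0.25 0.25]
```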

2.2 Geometric interpretation of the Markowitz problem

The Markowitz problem has a straightforward geometric interpretation that will prove to be useful as a guide to intuition in the following. The covariance matrix is positive definite. Then, for a given value of σ_P, Eq. (1) is the equation of a (hyper-)ellipsoid. Put in another way, the iso-risk surfaces of variance are ellipsoids. The convexity of these iso-risk surfaces reflects the fact that variance is a convex risk measure.

The principal axes of the risk ellipsoid are proportional to the inverse square root of the corresponding eigenvalues of the covariance matrix. Small eigenvalues thus correspond to long axes. If we included the riskless asset in our portfolio, the corresponding axis would be infinitely long, and the risk ellipsoid would go over into an elliptical cylinder. For the purposes of this paper we do not need to consider the riskless asset in the following.

The constraints Eqs. (2)–(3) are linear: they correspond to hyper-planes. The solution of the problem has to lie on the intersection of these planes, itself a hyper-plane. The solution can be found by blowing up the risk ellipsoid (or cylinder) until it touches the intersection. As the solution is thus the point of tangency of a convex surface and a plane, it is unique. It is also stable in the sense that if we make a small error ε in the specification of the ellipsoid/cylinder, so that it changes its position a little, the solution will shift in a continuous manner. Of course, depending on the curvature of the ellipsoid around the point of tangency, this shift may be smaller or larger, but it will certainly be continuous in ε. (It is proportional to the square root of ε.) This geometric construction is illustrated in Fig. 1.

Fig. 1. Geometrical interpretation of the Markowitz problem for N = 3: the solution is the point where the inflating iso-risk ellipsoid first touches the intersection of the budget and expected return constraints.

2.3 Empirical covariance matrices

The covariance matrix has to be determined from measurements on the market. From the returns x_it observed at time t we get the maximum likelihood estimator:

$$\sigma_{ij} = \frac{1}{T} \sum_{t=1}^{T} x_{it} x_{jt} \qquad (5)$$

(assuming that the expected values are known to be zero).

For a portfolio of N assets the covariance matrix has O(N²) elements. The time series of length T for N assets contain NT data points. In order for the measurement to be precise, we need N ≪ T. Bank portfolios may contain hundreds or thousands of assets. The length of available time series depends on the context, but it is always bounded. If we talk about a stock portfolio, for example, it is hardly reasonable to go beyond four years, that is T ∼ 1000 (some of the stocks may not have been present earlier, the economic or regulatory environment may change, etc.). Therefore, N/T ≪ 1 rarely holds in practice. As a result, there will be a lot of noise in the estimate. For large enough N and T we expect that the error depends only on the ratio N/T, which we describe by saying that it scales in N/T.

The problem we have just described is one of the several manifestations of the "curse of dimensions". Economists have been struggling with this problem for ages (Elton and Gruber, 1995). Since the root of the problem is lack of sufficient information, the remedy is to inject external information into the estimate. This means imposing some structure on the matrix σ, which introduces bias, but the beneficial effect of noise reduction may compensate for this. These filtering procedures, Bayesian estimators, etc. have a large body of literature of their own. As our primary concern here is the noise sensitivity of risk measures, we would not like to enter into a detailed discussion of the procedures by which this sensitivity is reduced, and will only touch upon the problem of filtering en passant, as we proceed with the main subject.

2.4 An intriguing observation

In 1999 two groups announced an intriguing observation simultaneously (Laloux et al., 1999; Plerou et al., 1999). They noted that the effect of noise on the spectra of empirical covariance matrices is typically so strong that the overwhelming majority (in the example considered: 94%) of the eigenvalues fits into the spectrum of a completely random matrix.

The appearance of random matrices in a portfolio theoretical context triggered a lot of activity, mostly among physicists. Subsequently, Laloux et al. (2000) proposed a filtering method based on random matrix theory (RMT). This has been further developed and refined by many workers (Plerou et al., 2002; Burda et al., 2004; Papp et al., 2005).

2.5 The eigenvalue spectrum of Wishart matrices

In order to understand the essence of the proposed filtering procedure, we have to recall a few elementary facts about random matrices.

Let X be an N × T matrix whose elements x_it are independent, identically distributed (i.i.d.) random variables with zero mean and finite second moment.

Then, in the limit N, T going to infinity with N/T fixed and N/T < 1, the eigenvalue density of the matrix σ = (1/T) X X′, where X′ is the transpose, will converge to the Wishart or Marchenko–Pastur spectrum (eigenvalue distribution) (Marchenko and Pastur, 1967):

$$\rho(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda} = \frac{T}{2\pi N} \, \frac{\sqrt{(\lambda - \lambda_{\min})(\lambda_{\max} - \lambda)}}{\lambda} , \qquad (6)$$

where

$$\lambda_{\max,\min} = \left( 1 \pm \sqrt{\frac{N}{T}} \right)^{2} . \qquad (7)$$

Several remarks are in order:

– If X is the matrix obtained from arranging the data in time series of length T for N assets, then the matrix σ = (1/T) X X′ in the theorem is the sample covariance matrix for these assets.

– If T < N the eigenvalue distribution remains the same, with an extra point of weight 1 − T/N at the origin.

– If T = N the Marchenko–Pastur law is related to the Wigner semi-circle law (Wigner, 1958), the first limit theorem for square random matrices.

– The proof extends to slightly dependent and inhomogeneous entries.

– The convergence is fast, believed to be of order 1/N, but proved only at a lower rate (Bai, 1993).

The significance of the last remark is that the eigenvalues of finite Wishart matrices start to obey the Marchenko–Pastur law very early: most of them fall inside the support of the limiting distribution already for N as small as 20 or 50. For N ∼ 400, which was the size of the portfolio in Laloux et al. (1999) and Laloux et al. (2000), the histogram of eigenvalues reproduces the limiting distribution to a fair degree of precision.
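A quick sketch illustrates this (with illustrative parameters, not taken from the paper): the eigenvalues of a pure-noise sample covariance matrix already fill the Marchenko–Pastur support of Eq. (7) for moderate N.

```python
import numpy as np

N, T = 200, 600                       # N/T = 1/3 < 1
X = np.random.randn(N, T)             # pure noise: i.i.d. standard normals
sigma = X @ X.T / T                   # sample covariance matrix, Eq. (5)
eigs = np.linalg.eigvalsh(sigma)

r = N / T                             # lambda_{max,min} = (1 +- sqrt(r))^2, Eq. (7)
lam_min, lam_max = (1 - np.sqrt(r))**2, (1 + np.sqrt(r))**2

# Essentially all sample eigenvalues already lie inside the limiting support:
print(np.mean((eigs >= lam_min) & (eigs <= lam_max)))
```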

2.6 A filtering method based on random matrix theory

It is here that we can return to the filtering method proposed by Laloux et al. (1999). They noticed that the spectrum of a large empirical covariance matrix consisted of a quasi-continuous band of small eigenvalues, plus a number (∼ 20) of discrete, medium size eigenvalues, plus an isolated, large eigenvalue, far above the rest. This large eigenvalue is interpreted as the whole market, the discrete, medium eigenvalues are associated with the main industrial sectors, while the structure in the quasi-continuous band cannot be resolved at the given level of noise and becomes indistinguishable from the spectrum of random matrices. This means that this random band does not contain information concerning the structure of the market. Accordingly, the proposed filtering consists basically in discarding as pure noise that part of the spectrum that falls below the upper edge of the random spectrum. Information is carried only by the eigenvalues and their eigenvectors above this edge. Optimization should be carried out by projecting onto the subspace of large eigenvalues, and replacing the small ones by a constant chosen so as to preserve the trace. This procedure drastically reduces the effective dimensionality of the problem, and has been reported to lead to much improved optimal portfolios (Laloux et al., 2000; Plerou et al., 2002).

The method can be regarded as a systematic version of principal component analysis, where the upper edge of the random band serves as a cutoff, thus providing an objective criterion for the number of principal components.

We have performed extensive comparative tests and found that this filtering works consistently well (Pafka and Kondor, 2004). Further efforts to dig out valuable information from below the random edge are underway (Burda et al., 2004; Papp et al., 2005).

Although the filtering method based on random matrix theory has thus been shown to be at least competitive, and in most cases even superior to other filtering procedures, it also raises a few questions that need further inquiry.

– If empirical covariance matrices contain mainly noise, how is it possible that they are still widely used in the industry?

– To what extent is the large number of junk eigenvalues a good characterization of the impact of noise on the quality of the portfolio? Given that the solution of the optimization problem tends to lie in the subspace spanned by the eigenvectors corresponding to the largest eigenvalues, and these stay relatively stable from sample to sample, is it not conceivable that even substantial fluctuations in the subspace of the random eigenvectors have a relatively minor effect on the optimal portfolio?

– In addition, Laloux et al. (2000), like most of the other workers who considered the filtering problem, used real empirical data where other parasitic effects (like non-stationarity) may become additional sources of noise and may overlap with the fundamental difficulty caused by the finite length of time series.

Motivated by these questions we decided to study the problem in a "laboratory" setting. Rather than using real-life data, we decided to generate artificial time series where we have total control over the underlying stochastic process. The rationale behind this is that in order to be able to compare the sensitivity of risk measures to noise, we had better get rid of other sources of uncertainty, like non-stationarity.


Our strategy is to choose various model covariance matrices and generate long simulated time series by them. Then we cut out segments of length T from these time series, as if observing them on the market, and try to reconstruct the covariance matrices from them. We optimize a portfolio both with the "true" and with the "observed" covariance matrix under the given risk measure and determine the error due to the finite observation time. The models are chosen to mimic at least some of the characteristic features of real markets. Four simple models of slightly increasing complexity will be considered.

2.7 A measure of the effect of noise on the optimal portfolio

Before turning to our market models, however, we would like to introduce a measure that characterizes the effect of noise or estimation error on portfolio selection. One could use various metrics defined over the space of covariance matrices for this purpose: one may define a distance between matrices, or between the spectra of these matrices, etc. We believe, however, that the most relevant measure is the relative sub-optimality, or relative risk increment, of the portfolio, q0 − 1, where

$$q_0^2 = \frac{\sum_{ij} w_i^{*} \sigma_{ij}^{(0)} w_j^{*}}{\sum_{ij} w_i^{(0)*} \sigma_{ij}^{(0)} w_j^{(0)*}} . \qquad (8)$$

Here σ^(0) is the "true" covariance matrix (the empirical one will be called σ), and w^(0)* and w* are the weights of the portfolios optimized under σ^(0) and σ, respectively. The square root of the denominator in Eq. (8) is the true risk (standard deviation) of the portfolio, while the square root of the numerator is the risk we run when using the weights derived from the empirical covariance matrix, whereas the true underlying process is governed by σ^(0).

This measure was introduced in Pafka and Kondor (2002). It assumes, of course, that we know the true process. By a slight extension of the definition, one can make it applicable also in the context of empirical data (Pafka and Kondor, 2003), but this will not concern us in this paper.

Eq. (8) implicitly refers to a given sample, a segment of length T from the time series. Therefore, q0 itself is a random variable. Its distribution will be studied later on.
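A minimal sketch of how q0 can be computed for a single simulated sample (the setup — independent standard normals, so the true covariance is the identity — anticipates Model 1 below; all names are illustrative):

```python
import numpy as np

def min_var_weights(cov):
    """Eq. (4): normalized row sums of the inverse covariance matrix."""
    inv = np.linalg.inv(cov)
    ones = np.ones(cov.shape[0])
    return inv @ ones / (ones @ inv @ ones)

def q0(sigma0, sigma_emp):
    """Eq. (8): true risk of the noisy portfolio over the truly minimal risk."""
    w_emp = min_var_weights(sigma_emp)
    w_true = min_var_weights(sigma0)
    return np.sqrt((w_emp @ sigma0 @ w_emp) / (w_true @ sigma0 @ w_true))

N, T = 100, 300
X = np.random.randn(N, T)             # one "observed" segment of length T
print(q0(np.eye(N), X @ X.T / T))     # fluctuates around 1/sqrt(1 - N/T) ~ 1.22
```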


2.8 Model building

2.8.1 Model 1: The simplest covariance matrix

Let us imagine we have a portfolio of standard, independent, normal variables.

The corresponding covariance matrix is the simplest conceivable: just the unit matrix. This has an N-fold degenerate eigenvalue λ = 1. If we now generate N series of length T of these variables and try to reconstruct the unit matrix from these data through the formula Eq. (5), we find a much more complicated structure that will fluctuate from sample to sample and will converge to the unit matrix only for N fixed and T → ∞. If we consider large N and T values, however, such that their ratio is fixed, we will find that noise, or the measurement error due to the finite T, lifts the degeneracy of the eigenvalue λ = 1 and the resulting spectrum will converge to the Marchenko–Pastur spectrum of Eq. (6).

Let us now evaluate the measure q0 for N, T → ∞ with N/T < 1 fixed. Since σ_ij^(0) = δ_ij, where δ_ij is 1 if i = j and 0 otherwise, the true optimal weights are w_i^(0)* = 1/N, and the denominator of Eq. (8) is:

$$\sum_{ij} w_i^{(0)*} \sigma_{ij}^{(0)} w_j^{(0)*} = \sum_{i,j} \frac{1}{N} \, \delta_{ij} \, \frac{1}{N} = \frac{1}{N} . \qquad (9)$$

For the evaluation of the numerator we rotate to the coordinate system spanned by the eigenvectors of the empirical covariance matrix to obtain σ_ij = λ_i δ_ij. Then we get:

$$w_i^{*} = \frac{\sum_j \sigma_{ij}^{-1}}{\sum_{j,k} \sigma_{jk}^{-1}} = \frac{\sum_j \frac{1}{\lambda_i}\,\delta_{ij}}{\sum_{j,k} \frac{1}{\lambda_j}\,\delta_{jk}} = \frac{\frac{1}{\lambda_i}}{\sum_j \frac{1}{\lambda_j}} , \qquad (10)$$

$$\sum_{i,j} w_i^{*} \sigma_{ij}^{(0)} w_j^{*} = \sum_{i,j} \frac{\frac{1}{\lambda_i}}{\sum_k \frac{1}{\lambda_k}} \, \delta_{ij} \, \frac{\frac{1}{\lambda_j}}{\sum_k \frac{1}{\lambda_k}} = \frac{\sum_i \frac{1}{\lambda_i^2}}{\left( \sum_i \frac{1}{\lambda_i} \right)^{2}} , \qquad (11)$$

$$q_0^2 = N \, \frac{\sum_i \frac{1}{\lambda_i^2}}{\left( \sum_i \frac{1}{\lambda_i} \right)^{2}} = \frac{\frac{1}{N}\sum_i \frac{1}{\lambda_i^2}}{\left( \frac{1}{N}\sum_i \frac{1}{\lambda_i} \right)^{2}} = \frac{\int \frac{1}{\lambda^2}\,\rho(\lambda)\,d\lambda}{\left( \int \frac{1}{\lambda}\,\rho(\lambda)\,d\lambda \right)^{2}} . \qquad (12)$$

The integrals are easy to work out, and we end up with the strikingly simple result (when N, T → ∞ with N/T < 1 fixed):

$$q_0 = \frac{1}{\sqrt{1 - \frac{N}{T}}} . \qquad (13)$$
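For reference, the two inverse moments of the Marchenko–Pastur density that enter Eq. (12) are standard results, which we quote here without derivation:

$$\int \frac{\rho(\lambda)}{\lambda}\,d\lambda = \frac{1}{1 - N/T} , \qquad \int \frac{\rho(\lambda)}{\lambda^2}\,d\lambda = \frac{1}{\left(1 - N/T\right)^{3}} ,$$

so the ratio in Eq. (12) indeed reduces to q_0^2 = 1/(1 − N/T).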


Eq. (13) dates back to a discussion between two of the present authors (I. K. and S. P.) and G. Papp and M. Nowak, and was published in Pafka and Kondor (2003), Burda et al. (2003) and Papp et al. (2005).

It is well known that the rank of the empirical covariance matrix is the smaller of N and T. When T becomes smaller than N, the covariance matrix develops zero eigenvalues, and the portfolio optimization problem becomes meaningless.

Eq. (13) shows that as we approach the limit N/T = 1, the relative risk in the portfolio diverges.
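For instance, at N/T = 1/2 Eq. (13) gives q0 = 1/√(1 − 1/2) = √2 ≈ 1.414, which matches the simulation averages of about 1.41 reported in Table 1 below for N = 100, T = 200.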

Fig. 2. Distribution of q0 for increasing N with N/T fixed (simulation results).

The derivation of Eq. (13) is valid only in the limit N, T → ∞. If we want to study the distribution of q0 as a random variable, we have to resort to simulations. Fig. 2 displays the histogram of q0 for three different values of N, with N/T fixed. It can be seen that the distribution becomes sharper as N increases. On the other hand, in Fig. 3 we show the results of simulations for the average of q0 for various N and T values. The data points nicely fit the theoretical result (corresponding to the limit N, T → ∞) already for moderate values of these parameters, and demonstrate the scaling in the variable N/T.

Fig. 3. Mean q0 as a function of N/T for different values of N (simulation results). The data points collapse on the curve given by Eq. (13) (also shown in the figure).

2.8.2 Model 2: The single-factor or market model

Model 2 is a small step towards reality: we assume that our random variables are not independent, but are correlated through a single factor that can be regarded as the whole market. The corresponding covariance matrix is shown in Fig. 4 (a). (For the sake of simplicity, we assume such a high degree of symmetry that all the diagonal elements are the same (1) and all the off-diagonal elements are also the same (ρ0); in such a situation the distinction between covariances and correlations can be scaled out.)

Such a matrix has an (N − 1)-fold degenerate small eigenvalue 1 − ρ0, and a large (Frobenius–Perron) eigenvalue 1 + (N − 1)ρ0. Noise in the corresponding empirical covariance matrix will resolve the degeneracy of the small eigenvalues and smear them out into the Marchenko–Pastur spectrum again, while the large eigenvalue will be affected only to O(1/N).

Since the weight of the large eigenvalue in the spectral density is 1, which is negligible compared to the total weight N − 1 of the small band, one can evaluate q0 again, and, within O(1/N) corrections, arrive at the same formula as for Model 1, Eq. (13).

Simulations for this Model and also the subsequent ones can be performed by a standard procedure. Given a covariance matrix σ^(0), we can construct the corresponding empirical correlation matrix from it as follows: we generate finite time series from the true correlation matrix σ^(0), x_it = Σ_j L_ij z_jt, where L is the Cholesky decomposition of the true correlation matrix, σ^(0) = L L′, and the z_jt are independent standard normal random variables with zero mean and unit variance. Then the empirical correlation matrix is given by the usual estimator of Eq. (5) as σ_ij = (1/T) Σ_{t=1}^T x_it x_jt.
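A minimal sketch of this procedure (parameter values are illustrative):

```python
import numpy as np

def simulate_empirical_cov(sigma0: np.ndarray, T: int) -> np.ndarray:
    """Generate a length-T sample from sigma0 and re-estimate it via Eq. (5)."""
    N = sigma0.shape[0]
    L = np.linalg.cholesky(sigma0)    # sigma0 = L L'
    Z = np.random.randn(N, T)         # i.i.d. standard normal z_jt
    X = L @ Z                         # correlated returns x_it = sum_j L_ij z_jt
    return X @ X.T / T                # empirical estimator, Eq. (5)

# Model 2: unit diagonal, constant off-diagonal rho0 = 0.2, N = 100, T = 200.
N, rho0 = 100, 0.2
sigma0 = np.full((N, N), rho0) + (1 - rho0) * np.eye(N)
sigma_emp = simulate_empirical_cov(sigma0, T=200)
```

Any factorization σ^(0) = L L′ would do here; the Cholesky decomposition is simply the cheapest standard choice.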

Table 1 shows some simulation results for Model 2. It can be seen that for large N and T Eq. (13) is a good approximation to q0 also for Model 2.

Fig. 4. Structure of correlation matrices used in Models 2, 3 and 4, respectively.

Table 1
Sample simulation results for our toy models for several parameter values. "q0 avg." and "q0 dev." denote the average and standard deviation of q0 over 1000 Monte Carlo iterations. "q0 diff." denotes the difference between "q0 avg." and the result of Eq. (13), while "q0 err." denotes the estimation error of "q0 avg."

Mod.  N    T    ρ0   Nk      ρk       q0 avg.  q0 dev.  q0 diff.  q0 err.
1     100  200                        1.4103   0.0690   -0.0039   0.0022
2     100  200  0.2                   1.4114   0.0700   -0.0028   0.0022
2     100  200  0.5                   1.4101   0.0701   -0.0041   0.0022
3     100  200  0.2  20      0.4      1.4084   0.0719   -0.0059   0.0023
4     100  200  0.3  25      0.4–0.8  1.4117   0.0724   -0.0025   0.0023
4     100  200  0.2  5–50    0.4–0.9  1.4092   0.0703   -0.0050   0.0022
2     20   40   0.2                   1.3894   0.1513   -0.0247   0.0047
3     20   40   0.1  5       0.3      1.3844   0.1511   -0.0298   0.0047
2     400  800  0.2                   1.4125   0.0352   -0.0016   0.0011
4     400  800  0.3  100     0.4–0.8  1.4126   0.0360   -0.0016   0.0011
4     400  800  0.2  50–200  0.3–0.9  1.4141   0.0354   -0.0001   0.0011

2.8.3 Model 3: Symmetrical market-plus-sectors model

Here we assume that the assets can be grouped into "sectors" of size N1 within which correlations (ρ1) are stronger than between assets belonging to different sectors (Fig. 4 (b)). The true spectrum consists of three different eigenvalues: an (N − N/N1)-fold degenerate small eigenvalue 1 − ρ1, an (N/N1 − 1)-fold degenerate medium eigenvalue 1 + (N1 − 1)ρ1 − N1 ρ0, and the singlet market eigenvalue 1 + (N1 − 1)ρ1 + (N − N1)ρ0. If N ≫ N1 ≫ 1, these scales will clearly separate, see Fig. 5 (a). The effect of noise on the spectrum of the corresponding empirical matrix will be to resolve the degeneracies of the small and the medium eigenvalues, so we will have two separate bands plus the market eigenvalue.

Fig. 5. Scale separation of eigenvalues of correlation matrices of Models 3 and 4.

Depending on the parameters N1, ρ0, ρ1, the behaviour of this model can already become rather complicated, but for large N and N1, q0 will blow up as given by Eq. (13) again (see the simulation results reported in Table 1).

The strength of the random-matrix-theory-based filtering was demonstrated on these models in Pafka and Kondor (2004), where it was shown that the method, especially when tuned to the underlying structure, can work extremely efficiently indeed; it not only reduces the divergent error as N approaches T, but also allows one to penetrate into the region below T, while remaining essentially at the same reduced level of error.

2.8.4 Model 4: Asymmetrical market-plus-sectors model

As a further step towards the structure of real markets, we relax the assumption of symmetry between the sectors. The overall market correlation will still be assumed to be the same ρ0 > 0 for each asset, but the elements ρk (k = 1...K), describing intra-sector correlations in the diagonal blocks, will be assumed to be different constants within each block (and larger than those outside the blocks, ρk > ρ0). The block sizes will also be allowed to take on different values Nk, k = 1...K. The model just described is the same as the one introduced by Noh (2000). For the sake of simplicity again, we keep to the assumption that the correlation and covariance matrices are the same, i.e. we set the variance of individual instruments to unity. The structure of the correlation matrix is then given by the pattern shown in Fig. 4 (c).

Such a matrix, containing K sectors, possesses K small eigenvalues given by 1 − ρk < 1, k = 1...K. The corresponding eigenvectors have only two nonzero elements (of equal absolute value but opposite sign). Their multiplicity is Nk − 1, k = 1...K (where Nk is the size of sector k), i.e. the total multiplicity of the small eigenvalues is N − K. In addition, there are K larger eigenvalues (λ > 1), typically singlets, that depend on all the parameters of the model:


ρ0, ρk, and Nk. That is, a K-sector matrix has 2K different eigenvalues. By virtue of the Frobenius–Perron theorem, the largest eigenvalue will necessarily be a singlet of O(N), with an eigenvector having all positive components. This mode can then be identified with the whole market, while the other K − 1 eigenvalues will be associated with the sectors; they correspond to the medium eigenvalues in the previous model, this time with their degeneracy resolved by the lack of symmetry (Fig. 5 (b)).
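As an illustration of this eigenvalue structure, the following sketch (with illustrative sector sizes and correlations) builds a Model 4 correlation matrix and counts its distinct eigenvalues; for K = 3 sectors one finds 2K = 6 of them.

```python
import numpy as np

rho0, rhos = 0.2, [0.4, 0.6, 0.8]             # K = 3 sectors, intra-sector rho_k
sizes = [10, 20, 30]                           # sector sizes N_k, so N = 60
N = sum(sizes)
C = np.full((N, N), rho0)                      # market-wide correlation rho0
start = 0
for Nk, rk in zip(sizes, rhos):
    C[start:start + Nk, start:start + Nk] = rk # intra-sector blocks
    start += Nk
np.fill_diagonal(C, 1.0)                       # unit variances

eigs = np.linalg.eigvalsh(C)
print(np.unique(np.round(eigs, 8)))            # 2K = 6 distinct eigenvalues
```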

As for the eigenvalue structure of the corresponding empirical matrix, we will find the usual effect: sample to sample fluctuations will lift the degeneracy of the small eigenvalues, and the familiar Marchenko–Pastur spectrum results.

The market eigenvalue will stay basically unaffected by noise, while the effect of noise on the intermediate eigenvalues depends on the parameters. If the parameters are such that these eigenvalues are well separated and distinct, and N/T is small enough, then the intermediate eigenvalues will remain distinct even in the empirical spectrum.

On the whole, this simple model reproduces all the main features observed in the spectra of real-life empirical covariance matrices (Laloux et al., 1999; Plerou et al., 2002).

Simulations (see results in Table 1) show that for large N and T, q0 is well represented by Eq. (13) for Model 4 as well.

2.9 Fluctuating weights and the distribution of q0

We have seen that the effect of noise on the average relative standard deviation can be very strong, especially for large portfolios and not sufficiently long time series. The average q0 is, however, only a global measure of the effect of noise. A more detailed characterization can be obtained by considering the optimal weights belonging to the empirical covariance matrix. As this matrix fluctuates from sample to sample, so do the weights.

In Fig. 6 we exhibit the optimal empirical weights for a small portfolio (N = 10) of independent standard Gaussian items, while in Fig. 7 we show the same for a larger portfolio (N = 100). The simulation results displayed have been obtained from a randomly chosen sample of length T = 500, so the ratio N/T is quite far from the critical threshold N/T = 1 even for the larger portfolio.

As in this illustrative example all the assets are assumed to be completely equivalent, the "true" weights are all equal (1/N). They are also shown in the figures for comparison. The deviation of the empirical weights from their true value is striking.

Fig. 6. Optimal portfolio weights w_i (i = 1...N) obtained from an estimated correlation matrix compared to the "true" weights (dashed line) for N = 10, T = 500.

Fig. 7. Optimal portfolio weights w_i (i = 1...N) obtained from an estimated correlation matrix compared to the "true" weights (dashed line) for N = 100, T = 500.

Figs. 8 and 9 demonstrate the fluctuations of the empirical weights in time for N = 100 and T = 500. In Fig. 8 we show the path of a given weight calculated from non-overlapping time windows of length T, while in Fig. 9 we display the same for a window of the same length T, but stepping forward only one unit at a time. In the former case the weight undergoes wild fluctuations; in the latter its steps are strongly correlated, hence the path is much smoother, but it stays mostly far from its true position.

Fig. 8. w1 as a function of time for N = 100, T = 500, when the time windows are non-overlapping (the dashed line shows the "true" value).

Fig. 9. w1 as a function of time for N = 100, T = 500, when the time window steps forward one unit at a time (the dashed line shows the "true" value).

Evidently, neither of these is very promising from a risk management point of view. In the former case, the hypothetical portfolio would have to be totally reorganized every period T; in the latter it would seem deceptively stable from one day to the next, but it would be far from its true composition for most of the time.

Compared to the strong instability of the weights, the fluctuations of the whole portfolio are relatively mild. Indeed, let us recall Fig. 2 that shows the histogram of q0 as it fluctuates from sample to sample. For large N and sufficiently small values of N/T the distribution is sharp, and the empirical value measured in any given sample can be regarded as a fair representation of the average. For N/T close to unity, however, the width of the distribution of q0 diverges, along with the variance of the weights. A detailed study of the distribution of q0 and the fluctuation of the weights is in progress and will be published elsewhere.

All our results were obtained on the basis of toy models, which may raise doubts as to their relevance for real markets. Admittedly, these toy models represent highly idealized circumstances: perfect stationarity of the process and Gaussian distribution of returns. Neither of these holds true on real markets, but it is precisely for this reason that we believe that the effect of estimation error on real-life portfolios will be, if anything, even stronger than in our idealized world.

Given the fact that N/T is never small in practice, and, in fact, it may even go beyond the critical value N/T = 1, it is imperative that some sort of filtering or cleaning procedure be applied in order to reduce the effect of noise. A number of these techniques are available in the literature (Elton and Gruber, 1995) and widely used in practice. The discussion of filtering is not a subject of this paper. In the next section we wish to consider the noise sensitivity of some alternative risk measures for which hardly any filtering procedures exist. For a fair comparison between the noise sensitivities of various risk measures, we have, therefore, to use unfiltered results for the variance.

3 Alternative risk measures

Since the normal distribution has only two non-vanishing cumulants, the mean and the variance, the typical intensity of fluctuations can be completely characterized by a single number, the variance or the standard deviation. For other distributions the variance will not suffice. For long-tailed distributions that asymptotically fall off like a power-law, variance can be particularly misleading as a risk measure. Real-life portfolios often display this long-tailed behavior, and then they should not be optimized under variance. Alternative risk measures abound both in practice and the literature. We cannot cover all of them here, but focus on a few that have some practical and/or theoretical importance.

3.1 Value-at-Risk and regulatory measures


Value-at-Risk (VaR) and other risk measures implied by international capital regulation are, strictly speaking, not the subject of this study. We merely mention them because they are widely used in practice, so we feel compelled to clarify why they do not fit into the framework of this discussion.

VaR is a high (95% or 99%) downside quantile, a threshold beyond which a given fraction (5% or 1%) of the statistical weight resides. Introduced in 1994 (RiskMetrics, 1994), it quickly spread over the whole industry, and has also become firmly embedded into regulation. It has some unquestionable merits compared to "what if" type measures (e.g. the Greeks): it is universal, which means it can be applied to any portfolio; as a functional defined on the pdf, it has a clear probabilistic content; and it is expressed in money, so it speaks directly to the banker. Unfortunately, it has some serious shortcomings too. The most obvious is its threshold character: a 99% VaR tells us the smallest (absolute) value of the loss that we may incur with 1% probability, but does not say anything about the typical value of this loss. Another, much criticized, feature is the lack of convexity. As a quantile, VaR has no particular reason to be a convex measure, and, indeed, it is easy to construct examples where convexity is grossly violated. This is especially troublesome because VaR has, over the years, been promoted from a diagnostic tool to a decision making tool, so one ought to be able to optimize a portfolio under VaR, which may be problematic for the lack of convexity. (For a thorough discussion of the relative merits and drawbacks of VaR see, for example, Danielsson et al. (2001), or the papers by Acerbi and Tasche (2002), Frey and McNeil (2002), Rockafellar and Uryasev (2002) and others in the special issue on "Statistical and Computational Problems in Risk Management: VaR, and Beyond VaR" of the Journal of Banking and Finance, Volume 26, July 2002.)

The lack of convexity was one of the main motivations for a number of workers to look for alternative risk measures, and led to the introduction of the coherent risk measure axioms (Artzner et al., 1997, 1999). Although the practical consequences for large banking portfolios of the lack of convexity may be debatable, we do not include VaR in our present discussion, because it displays such a complicated nonmonotonic and nonconvex behaviour, even for fixed N and very large T, that a straightforward comparison between the noise sensitivity of variance and that of VaR is impossible. We plan to return to the problem of noise in the context of VaR in a subsequent publication.

VaR was embraced by international regulation, with the 1996 Amendment to the Basel I Capital Accord allowing banks to calculate the capital charge on their trading book positions from VaR-based internal models. For those institutions that did not want to develop internal models (or, in view of the penalizing factor 3 to 4 by which they have to multiply the result obtained from their internal model, did not want to report their risk on the basis of internal models), the regulator provided a simple algorithm for determining the minimal capital requirement. This standard model has not been accepted in the US, but is offered as an option, especially for smaller institutions, on other markets, including Europe. The capital charges assigned to various positions in the standard model, as described in the Directives of the European Commission (CAD, 1993, 1998), purport to cover the risk in those positions, therefore they must be regarded as some kind of risk measures. However, they are not associated with any probabilistic feature of the positions; in particular, they are not functionals defined on the pdf of fluctuations. In addition, some of these "regulatory measures" violate the requirement of convexity again (Kondor et al., 2004). For these reasons we cannot make a meaningful comparison between the noise sensitivity of these regulatory measures and that of variance or expected shortfall. Nevertheless, both VaR and the regulatory measures of the standard model share a special property with expected shortfall and the other risk measures to be discussed below: they are piece-wise linear, their level surfaces are polyhedrons. As will be seen, this property poses some special problems from the point of view of noise sensitivity, and this is primarily the reason why we mention VaR and the regulatory measures here.

3.2 Absolute deviation

Absolute deviation (AD, the expected value E(|x|) of the absolute value of fluctuations) is an obvious alternative to standard deviation. Some risk management methodologies (e.g. Algorithmics, 2002) do actually use AD to characterize the fluctuation of portfolios, which offers a huge computational advantage in that the resulting portfolio optimization task can be solved by linear programming. However, the effect of estimation noise on AD has been largely ignored in the literature (except Simaan, 1997). Preliminary results of our study of the noise sensitivity of AD will appear in Kondor et al. (2006).

Given a time series of finite length T, the objective function to minimize is (Konno and Yamazaki, 1991)

$$\frac{1}{T} \sum_t \left| \sum_i w_i x_{it} \right| , \qquad (14)$$

subject to the usual budget constraint Σ_i w_i = 1. (We disregard the constraint on expected return again.)

This is equivalent to the following linear programming problem:

$$\min \; \frac{1}{T} \sum_t u_t \qquad (15)$$

$$\text{s.t.} \quad u_t + \sum_i w_i x_{it} \ge 0 \qquad (16)$$

$$u_t - \sum_i w_i x_{it} \ge 0 \qquad (17)$$

$$\sum_i w_i = 1 \qquad (18)$$

(where the minimization is carried out over w_i and the additional variables u_t, t = 1...T, while Eqs. (16)–(17) represent 2T constraints, a pair for each t = 1...T).

Now we apply the same simulation-based strategy as with the variance. We generate artificial data of a known structure, which we choose here, for the sake of simplicity, to be independent standard normal again. Since for normal fluctuations E(|x|) = √(2/π) σ, the ratio q0 of the AD of the portfolio constructed by the above optimization procedure (w_i*) and the AD of the "true" optimal portfolio (w_i^(0)*) is equal to the ratio of the standard deviations of these portfolios:

$$q_{0,AD}^2 = \frac{\sum_i w_i^{*2}}{\sum_i w_i^{(0)*2}} . \qquad (19)$$

Since, by symmetry, the "true" optimal weights are all equal (w_i^(0)* = 1/N), the ratio q0,AD, which can be used to characterize the sub-optimality of the portfolio obtained from time series of finite length, is

$$q_{0,AD}^2 = N \sum_i w_i^{*2} . \qquad (20)$$

We have performed simulations for this quantity for various N/T values and show the results in Fig. 10. It is clear that the data points collapse on a single curve again, which shows that (the mean of) q0,AD scales in N/T also in this case. Also shown in the figure are results obtained for the variance. As Fig. 10 clearly demonstrates, q0,AD lies above the corresponding curve for the variance, which means that AD as a risk measure is more sensitive to noise than the variance. (This result is consistent with Simaan (1997).)

Fig. 10. q0,AD as a function of N/T (simulation results). The data points collapse on a curve situated above the curve obtained previously for the variance, also shown in the figure (with dashed line).

It is easy to see the reason for this enhanced sensitivity. The iso-risk surfaces of AD are polyhedrons, as opposed to the ellipsoidal level surfaces of the variance. The solution of the optimization problem is found at the point where this risk-polyhedron first touches the plane of the budget constraint. This happens typically at one of the corners of the risk-polyhedron. If we construct this polyhedron from finite length time series, it will inevitably contain some estimation error or noise. Accordingly, the shape and/or the position of the risk-polyhedron will change from sample to sample. As a result, the solution will jump to a new corner of the new polyhedron, see Fig. 11. This discontinuity is the basic reason for the enhanced sensitivity. For N fixed and T going to infinity, the polyhedron goes over into the ellipsoid of constant variance and the difference between AD- and variance-optimized portfolios disappears. For finite T, however, the piece-wise linear character persists, and it is precisely this, otherwise attractive, feature of AD that makes it prone to estimation noise.

Fig. 11. Iso-risk surfaces for N = 2, T = 4 for AD and variance. The solutions to the corresponding risk minimization problems are also shown in the figure (vectors pointing from the origin to a point on the budget line), along with the "true" solution (vector shown in bold).


3.3 Expected shortfall

Expected shortfall (ES) (see e.g. Acerbi, 2004) is the mean loss beyond a high threshold β defined in probability (not in money). For continuous pdf’s it is the same as the conditional expectation beyond the VaR quantile, sometimes called Conditional VaR or CVaR, but for discrete distributions (such as the histograms in empirical studies) CVaR and ES are different, see Acerbi and Tasche (2002) for a careful discussion of the subtle difference between the two.

As a kind of conditional average, ES certainly provides a more reasonable characterization of large losses than VaR. Besides, at variance with VaR, ES is a coherent (hence also convex) measure (Acerbi and Tasche, 2002), perhaps the simplest and most intuitive of all the coherent measures. (In fact, it also belongs to the special subset of coherent measures called spectral measures (Acerbi, 2004).) Due to these attractive properties, ES is strongly promoted by people who are concerned about the inconsistency of VaR. In addition, Rockafellar and Uryasev (2000) have shown that the optimization of ES can be reduced to linear programming for which extremely fast algorithms exist.

The ES objective function (to be minimized over v and the weights w_i) is (Rockafellar and Uryasev, 2000):

$$v + \frac{1}{(1-\beta)T} \sum_t \left[ -v - \sum_i w_i x_{it} \right]^{+} , \qquad (21)$$

subject to Σ_i w_i = 1. (The constraint on expected return will be omitted, as all through this paper.) We have used the notation [a]^+ = a if a > 0, and zero otherwise; β is the ES threshold.

The optimization task above is equivalent to the following linear programming problem:

$$\min \; \left( v + \frac{1}{(1-\beta)T} \sum_t u_t \right) \qquad (22)$$

$$\text{s.t.} \quad u_t \ge -v - \sum_i w_i x_{it} \qquad (23)$$

$$u_t \ge 0 \qquad (24)$$

$$\sum_i w_i = 1 \qquad (25)$$

(where now the minimization is carried out over w_i, v and u_t, t = 1...T).
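Again as an illustration, a sketch of this program with a generic LP solver (the variable layout (w_1...w_N, v, u_1...u_T) is our illustrative choice); note that, as discussed below, the solver may report the problem as unbounded for some samples:

```python
import numpy as np
from scipy.optimize import linprog

def optimize_es(x, beta=0.95):
    """ES-optimal weights from the N x T return matrix x, via Eqs. (22)-(25)."""
    N, T = x.shape
    c = np.concatenate([np.zeros(N), [1.0], np.ones(T) / ((1 - beta) * T)])
    # Eq. (23):  -sum_i w_i x_it - v - u_t <= 0, one row per t
    A_ub = np.hstack([-x.T, -np.ones((T, 1)), -np.eye(T)])
    A_eq = np.concatenate([np.ones(N), [0.0], np.zeros(T)])[None, :]
    bounds = [(None, None)] * (N + 1) + [(0, None)] * T   # Eq. (24): u_t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(T),
                  A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:N] if res.success else None             # None: no solution (unbounded)

w = optimize_es(np.random.randn(30, 100), beta=0.95)
```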

To test the noise sensitivity of ES we use a portfolio of independent standard normal random variables, and, in complete analogy with the previous cases, measure the sub-optimality due to finite-T samples in terms of the ratio q0,ES between the risk (as measured by ES) evaluated for the optimal weights obtained for a given sample and the same with uniform weights.

Simulations of this quantity confront us with a completely unexpected phenomenon: the optimization problem above does not always have a solution! More concretely, the existence of a solution depends on the sample and thus becomes a probabilistic issue. The probability of the existence of a solution depends on the parameters (N, T, and β) of the problem. Before setting out to measure the noise sensitivity of ES, we have to clarify this feasibility problem, and map out those regions of parameter space where we can hope to find solutions with a non-negligible probability. It turns out that the problem is fairly complicated, therefore, in order to gain a preliminary orientation, first we consider a special case, when the threshold β is very close to one. The point is that if β is so close to unity that (1 − β)T ≪ 1, then only the single worst loss will contribute to ES. This limiting case represents a coherent (spectral) measure in its own right; we will call it maximal loss (ML). The feasibility problem of ES is present in ML too, but it is analytically tractable, so ML provides a convenient laboratory for understanding the phenomenon. This is what we turn to now.

3.4 Maximal loss

As a risk measure, maximal loss may appear to be over-pessimistic: we consider the worst loss ever incurred on a portfolio of a given composition, then minimize this loss over the weights. The objective function to be minimized is then

$$\max_t \left( - \sum_i w_i x_{it} \right) , \qquad (26)$$

subject to the budget constraint Σ_i w_i = 1.

The minimax problem (Young, 1998) so defined is equivalent to the following linear programming task:

$$\min \; u \qquad (27)$$

$$\text{s.t.} \quad u \ge - \sum_i w_i x_{it} \qquad (28)$$

$$\sum_i w_i = 1 . \qquad (29)$$
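In the same illustrative spirit, a sketch of this minimax problem as a linear program; like ES, it may turn out to be unbounded for some samples:

```python
import numpy as np
from scipy.optimize import linprog

def optimize_ml(x):
    """ML-optimal weights from the N x T return matrix x, via Eqs. (27)-(29)."""
    N, T = x.shape
    c = np.concatenate([np.zeros(N), [1.0]])             # minimize u
    A_ub = np.hstack([-x.T, -np.ones((T, 1))])           # -sum_i w_i x_it - u <= 0
    A_eq = np.concatenate([np.ones(N), [0.0]])[None, :]  # budget constraint
    bounds = [(None, None)] * (N + 1)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(T),
                  A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return res.x[:N] if res.success else None            # None when unbounded

w = optimize_ml(np.random.randn(20, 100))                # T/N = 5: a solution is likely
```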
