Factor diagnostics: monotonicity with Romano and Wolf, Patton and Timmermann

Quantitative processes are necessarily complex. Explaining them requires a deep understanding of the beliefs behind them; confirming those beliefs requires a deep understanding of the maths. It is a snowy, blisteringly cold day in London, a perfect time to be introspective. So with your consent, I’d like to review a few methods for confirming factor premia – tests for monotonicity.

There are sound theoretical reasons for expecting cross-sectional differences in exposures to result in monotonic differences in returns. If one constructs n-tile portfolios, the lowest n-tile should under-perform the second n-tile, and so on, with the highest n-tile delivering the highest performance.

The tests to confirm this outcome are necessarily complex. The first of these tests is by Andrew Patton and Allan Timmermann; another is by Joseph P. Romano and Michael Wolf. Both examine the differences between adjacent n-tile portfolios. How often do the returns of n-tile_{k-1} outperform n-tile_{k}? That should not happen too often. Patton and Timmermann ask whether the relationship is monotonically increasing. Romano and Wolf respond with not so fast: there are cases where an n-tile's return is larger than the previous n-tile's but also larger than the next n-tile's – not strictly increasing. When this is the case, are the violations rare enough to still accept monotonicity?
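To fix notation (my sketch of the setup in the two papers), let $\Delta_i = E[r_{(i)}] - E[r_{(i-1)}]$, $i = 2,\dots,N$, be the expected return differences between adjacent n-tiles. Both tests take the strictly increasing pattern $\min_i \Delta_i > 0$ as the alternative and base inference on the statistic $\min_i \hat{\Delta}_i$ (or its studentised version), with a bootstrapped distribution. Patton and Timmermann take a weakly decreasing relation, $\Delta_i \leq 0$ for all $i$, as the null; Romano and Wolf enlarge the null to $\min_i \Delta_i \leq 0$, which also admits non-monotonic and weakly increasing patterns, and that is what makes their test stricter.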

The difference is that the Patton and Timmermann test allows a wider class of outcomes to pass the monotonicity test; Romano and Wolf are stricter. From the conclusion of Romano and Wolf:

In a recent paper, Patton and Timmermann (2010) propose a new test, taking all categories of the underlying characteristic into account. Compared to previous related proposals, they are, to our knowledge, the first ones to postulate a strictly increasing relation as the alternative hypothesis of the test, rather than as the null hypothesis. This is the correct formulation if the goal is to establish strict monotonicity with a quantifiable statistical significance. In addition, they postulate a weakly decreasing relation as the null hypothesis. Compared to allowing, more generally, also for a non-monotonic or weakly increasing relation under the null, this approach results in a smaller critical value and thereby in higher power of the test. But if a non-monotonic or weakly increasing relation is not ruled out, the test can break down and falsely ‘establish’ a strictly increasing relation with high probability. Therefore, the test of Patton and Timmermann (2010) should only be used as a test for the direction of (existing) monotonicity.

In this paper, we have proposed some alternative tests that also allow for a non-monotonic or weakly increasing relation under the null and that successfully control the probability of a type 1 error. These tests do not require modeling the dependence structure of the time series nor the covariance matrix of the observed return differentials. A suitable bootstrap method is used to implicitly account for these unknown features of the underlying data generating mechanism. As is unavoidable, such tests have lower power compared to the test of Patton and Timmermann (2010). This is the price one has to pay to safe-guard against the possibility of falsely ‘establishing’ strict monotonicity (beyond the nominal significance level of the test) in general settings.

Here is my code. I’ve replaced Patton and Timmermann’s block bootstrap code with a version inspired by Kevin Sheppard’s; it is much faster. Patton’s code is quite clear to read, but not fast. The change is:

% generating the time indices for the bootstrapped data:
bootdates = block_bootstrap(T,bootreps,block_length,rand_state);
%bootdates = stat_bootstrap_function_21(T,bootreps,block_length,rand_state); old code

----- the disclaimer bits and BSD 2-clause licence -----

If you injure yourself using any of this, you’ve only yourself to blame.
Copyright 2018 louis scott lscott@kiemaadvisors.com

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

----- the demo: demo_mono_rw.m  ----

% ---------------------------------------------------------
%  AUTHOR: lscott@kiemaadvisors.com

% ---------------------------------------------------------
% 
%  Copyright 2018 louis scott 		 
%  This file is released under the BSD 2-clause license.
%  The BSD 2-clause license is located in the license directory,
%  or type licence at the matlab/octave command prompt
%
% ---------------------------------------------------------

% for a Patton and Timmermann test, substitute
% out_pt = MR_test_22(data,pm.bootreps,0,pm.block_length,pm.rand_state)

  pm.kernel = 'quad';
  pm.bootreps =  2000;
  pm.block_length = 10;
  pm.rand_state = 123456;
  
  data = randn(2000,5);

  for ii = 1:3
    rescale = .0225 +(ii-1)*.004;
    datam = data + repmat(rescale*[1:5],2000,1);
    out_rw = MR_test_RW(datam,pm); 
    disp(out_rw(4));
    subplot(3,1,ii);
    semilogy(cumprod(1+.01*datam),'Linewidth',2);
    axis([1 2000 .8 100])
    legend({'1','2','3','4','5'},'Location','NorthWest')
    title(sprintf('Probability of non-monotonicity: %4.3f',out_rw(4)))
  end
  print('demo_mono_rw','-dpng')


----- Romano wolf in matlab/octave -----
function res = MR_test_RW(data,params)
%
% Function to compute the "monotonic relationship" tests in: 
% Joseph P Romano and Michael Wolf, Testing for Monotonicity in Expected Asset Returns 
% https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1858761
%

% ---------------------------------------------------------
%  AUTHOR: lscott@kiemaadvisors.com

% ---------------------------------------------------------
% 
%  Copyright 2018 louis scott 		 
%  This file is released under the BSD 2-clause license.
%  The BSD 2-clause license is located in the license directory,
%  or type licence at the matlab/octave command prompt
%
% ---------------------------------------------------------

[T,n] = size(data);
res = nan(4,1);

params = setdefaults(params);

if params.direction==0;
    params.direction = sign(mean(data(:,n)-data(:,1)));
end

if abs(params.direction)<2
    % then need to difference the data
    diffdata = data(:,2:end)-data(:,1:end-1);
    % differences across the columns of the data
    diffdata = params.direction*diffdata;
    % changing the sign of these if want to look for a *decreasing* pattern rather than an *increasing* pattern
else
    diffdata = data;
    % data are already in differences
    diffdata = sign(params.direction)*diffdata;
    % changing the sign of these if want to look for a *decreasing* pattern rather than an *increasing* pattern
    n = n + 1;
    % if data are already differenced, then this is like having one more column in the raw data
end
dmuhat = mean(diffdata)';


if abs(params.direction)<2
    % if data are already in differences, then I cannot do the t-test;
    % this must be done on the actual returns
    % now getting the t-statistic and p-value from the usual t-test
    temp = nwest(params.direction*(data(:,end)-data(:,1)), ones(T,1), floor(4*((T/100)^(2/9))));
    % truncation lag for Newey-West standard errors, see Newey and West (1987, 1994)
    res(1) = temp.tstat;
    res(2) = normcdf(-temp.tstat);
end

% generating the time indices for the bootstrapped data: r_{i}^{*}
bootdates = block_bootstrap(T,params);

temp   = nan(params.bootreps,n-1);
tempS  = temp;
tstar  = temp;
tempSE = temp;
sigma_hatstar = nan(n-1,1);

% Algorithm 4.1
% Generate bootstrap data {r_{1}^{*}, r_{2}^{*}, ..., r_{T}^{*}} and
% compute statistics \hat{\Delta}_{T,i} and \hat{\sigma}_{T,i}
% from the data, for i = {1,\dots,N}

% Patton and Timmermann results, for later use in creating the conservative null \delta_{0,i}
res_PT = MR_test_22(data,params.bootreps,params.direction,params.block_length,params.rand_state);
del_cons = res_PT(5,2);

% compute \hat{\delta} and t_{i}^{*} for i = {1,\dots,N}
%
% HAC standard errors are computed as in
% Jochen Heberle and Cristina Sattarhoff,
% A Fast Algorithm for the Computation of HAC Covariance Matrix Estimators
for ii=1:n-1
    temp2 = diffdata(:,ii);
    delta_hat = mean(temp2(bootdates))';
    sigma_hatstar(ii) = sqrt((1/T)*HAC_hs(delta_hat, params.kernel, params.block_length));
    for jj=1:params.bootreps
        tempSE(jj,ii) = sqrt((1/T)*HAC_hs(temp2(bootdates(:,jj)), params.kernel, params.block_length));
    end
    %sigma_hatstar = mean(tempSE(bootdates))';
    % Compute t_{T}^{min,*} \equiv min_{i} t_{T,i}^{*}
    % Repeat this process B times, resulting in statistics
    % t_{T,1}^{min,*}, t_{T,2}^{min,*}, \dots, t_{T,B}^{min,*}
    temp(:,ii) = delta_hat - dmuhat(ii) + del_cons;
    % the mean of each of the bootstrap shuffles of the original data,
    % minus the mean of the original data (the re-centering bit)
    tempS(:,ii) = (delta_hat - dmuhat(ii) + del_cons)./tempSE(ii);
    % t_{i}^{*} for i = {1,\dots,N}, eq 4.1
end

tempm = min(temp,[],2);
% looking at the minimum difference in portfolio means, for each of the bootstrapped data sets
Jstat = min(dmuhat);
% the test statistic
res(3,1) = mean(tempm>Jstat);
% the p-value associated with the test statistic

% t_{T,1}^{min,*}, t_{T,2}^{min,*} \dots t_{T,B}^{min,*}
tempSm = min(tempS,[],2);
% looking at the minimum STUDENTISED difference in portfolio means, for each of the bootstrapped data sets
JstatS = min(dmuhat./sigma_hatstar);
% the STUDENTISED test statistic
res(3) = JstatS;
res(4) = mean(tempSm>JstatS);  
waithere = true;

% Based on remark 4.2 alternative way to implement the Cons test.
end 

                
function params = setdefaults(params);

   if(~isfield(params,'bootreps'))
      params.bootreps = 1000;
   end
   if(~isfield(params,'direction'))
      params.direction = 0;
   end
   if(~isfield(params,'block_length'))
      params.block_length = 10;
   end
   if(~isfield(params,'kernel'))
      params.kernel = 'Parzen';
   end
   if(~isfield(params,'rand_state'))
      params.rand_state = [];
   end
   
   if(isempty(params.bootreps))
       params.bootreps = 1000;
   end
   if(isempty(params.direction))
      params.direction = 0;
   end
   if(isempty(params.block_length))
      params.block_length = 10;
   end
   if(isempty(params.kernel))
      params.kernel = 'Parzen';
   end
   if(isempty(params.rand_state))
      rand('state',sum(100*clock));
      params.rand_state = rand("state");
   end     
   if(~isnumeric(params.bootreps))
      error('bootreps must be a number')
   end
   if(~isnumeric(params.block_length))
      error('block_length must be a number')
   end
   if(~isnumeric(params.direction))
      error('direction must be a number')
   end
   if(params.block_length <= 0)
      error('block_length must be > 0')
   end
   if(params.bootreps <= 0)
      error('bootreps must be > 0')
   end
end

----- Patton and Timmermann's MR test toolbox is available here: -----
http://public.econ.duke.edu/~ap172/Patton_Timmermann_MR_test_toolbox.zip

----- The Block bootstrap -----
% 
% ---------------------------------------------------------
%  AUTHOR: lscott@kiemaadvisors.com

% ---------------------------------------------------------
% 
%  Copyright 2018 louis scott 		 
%  This file is released under the BSD 2-clause license.
%  The BSD 2-clause license is located in the license directory,
%  or type licence at the matlab/octave command prompt
%
% ---------------------------------------------------------
% Notes: A variant on the code by Kevin Sheppard kevin.sheppard@economics.ox.ac.uk
% see https://www.kevinsheppard.com/UCSD_GARCH
% his is released under the BSD licence, so this is compliant.
% 

function indices = block_bootstrap(t,br,w,rand_state)

if(isstruct(br))
  w = br.block_length;
  rand_state = br.rand_state;
  tt= br.bootreps;
  br = tt;
end

  
rand('state',rand_state);  
% Compute the number of blocks needed
s=ceil(t/w);
% Generate the starting points
Bs=floor(rand(s,br)*t)+1;
indices=zeros(s*w,br);
index=1;
% Adder is a variable that needs to be added each loop
adder=repmat((0:w-1)',1,br);
for i=1:w:t
    indices(i:(i+w-1),:)=repmat(Bs(index,:),w,1)+adder;
    index=index+1;
end
ndx = (indices > t);
indices(ndx) = indices(ndx) - t;
% trim to exactly t rows when t is not an integer multiple of the block length
indices = indices(1:t,:);

end
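A quick sanity check of the indices it returns (a hypothetical usage sketch, not part of the library; it uses the older rand('state',...) seeding, as the function itself does):

% every column is one bootstrap draw of row indices into the original sample
T = 500; B = 100; w = 10;               % sample length, bootstrap reps, block length
idx = block_bootstrap(T, B, w, 42);     % seed for repeatability
size(idx)                                % T-by-B (here T is a multiple of the block length)
all(idx(:) >= 1 & idx(:) <= T)          % every index points back into 1..T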


----- fast covariance estimation  ------

see Jochen Heberle and Cristina Sattarhoff, A Fast Algorithm for the Computation of HAC Covariance Matrix Estimators,
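In my notation (a sketch of the standard kernel HAC form, matching the comments in the code below rather than a quote from the paper), for a series $a_t$ collected in the columns of $A$ the estimator being computed is

$\hat{S}_N = \frac{1}{N}\sum_{s=1}^{N}\sum_{t=1}^{N} w(|s-t|)\, a_s a_t' = \frac{1}{N} A' T(\omega) A$

where $T(\omega)$ is the symmetric Toeplitz matrix of kernel weights. The fast algorithm embeds $T(\omega)$ in a $2N \times 2N$ circulant matrix, whose eigenvalues are an FFT of its first column, so the product $T(\omega)A$ costs $O(N \log N)$ instead of $O(N^2)$.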


Here is my implementation:

function res = HAC_hs(A, kernel,bw)
%
% try to initialize fftw for speed gains
% see eqns 38-39 
% Jochen Heberle and Cristina Sattarhoff, A Fast Algorithm for the Computation of HAC Covariance Matrix Estimators,
% Econometrics 2017, 5, 9, pp 
% 
% ---------------------------------------------------------
%  AUTHOR: lscott@kiemaadvisors.com

% ---------------------------------------------------------
% 
%  Copyright 2018 louis scott 		 
%  This file is released under the BSD 2-clause license.
%  The BSD 2-clause license is located in the license directory,
%  or type licence at the matlab/octave command prompt
%
% ---------------------------------------------------------
% Notes: my translation of their R code into matlab/octave
%
  
  [N,q] = size(A);
  wstar = HACwts(kernel,N,bw);
  % Compute the eigenvalues $\lambda_i, i = 1,\dots,2N$, of $C(\omega^*)$ using Equation (34):
  wstar = [1; wstar(1:N-1); 0 ; wstar(N-1:-1:1)];
  wstar = real(fft(wstar));
  
  % Construct the matrix $A^∗$ with dimension 2N × q from 
  FF = [A; zeros(N,q)];
  % Determine $V^∗ A^{∗}_{j} $ given by the DFT of $A^{∗}_{j}$.
  FF = fft(FF);
  % Multiply for all $i ∈ { 1, . . . , 2N }$ the i-th entry of the vector $V^{*} A^{∗}_{j}$
  % with the eigenvalue $λ_{i}$ , in order to construct $ΛV^{*}A^{∗}_{j}$ .
  FF = FF.*repmat(wstar,1,q);
  % Determine $C(ω^∗) A^{∗}_{j} = VΛV ∗ A^{∗}_{j} given by the IDFT of ΛV^{*}A^{∗}_{j}$ .
  FF = real(ifft(FF)/(2*N));
  FF = FF(1:N,:);
  % $\hat{S}_N = \frac{1}{N}{A}'T(ω)A. where FF = T(ω)A.$
  res = (1/N)*(A'*FF);
end

------HAC_hs relies on the code HAC_wts -----
function ww = HACwts( kernel, dN, bw) 

%
% with kernel = {"Truncated", "Bartlett", "Parzen", "Tukey-Hanning", "Quadratic Spectral"} 
%      bw  the bandwidth of nonzero weights
%      dN  matrix dimension that calls HAC
%
% ---------------------------------------------------------
%  AUTHOR: lscott@kiemaadvisors.com

% ---------------------------------------------------------
% 
%  Copyright 2018 louis scott 		 
%  This file is released under the BSD 2-clause license.
%  The BSD 2-clause license is located in the license directory,
%  or type licence at the matlab/octave command prompt
%
% ---------------------------------------------------------
% Notes: my translation of their R code into matlab/octave
% Jochen Heberle and Cristina Sattarhoff, A Fast Algorithm for the Computation of HAC Covariance Matrix Estimators,
% Econometrics 2017, 5, 9, pp 
%
   
   ww = zeros(dN,1);
   switch lower(kernel(1:4)) 
      case "trun"
        ww(1:bw) = 1;
      case "bart"
        c1 = (bw - 1)/2;
        ww(1:bw) = 1 - ((1:bw)/(bw + 1));
      case "parz"
        c1 = (1:floor(bw/2))/bw;
        c2 = ((floor(bw/2)+1):bw)/bw;
        ww(1:length(c1)) = 1 - 6*c1.^2 + 6*c1.^3;
        ww(length(c1)+1:bw) = 2*(1-c2).^3;
      case "tuke" 
        ww(1:bw) = (1 + cos(pi*((1:bw)/bw)))/2;
      case "quad"
        aa = pi*((1:dN)'/bw)/5.0;
        ww = 1.0./(12*aa.^2).*(sin(6*aa )./(6*aa) - cos(6*aa));
      otherwise
        error('kernel case not understood');
   end
end

--- licence.m
function licence()

display('Copyright 2018 louis scott lscott@kiemaadvisors.com')
disp('This file is released under the BSD 2-clause license.')
disp('')
disp('Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:')
disp('')
disp('1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.')
disp('')
disp('2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.')
disp('')
disp('THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.')

end


 

Posted in Econometrics, matlab, Matrix Algebra

The NAG libraries in Julia.

NAG.jl  The NAG libraries in Julia

Friends, it is time to get on board the Julia train. With the release of v0.6 and the promise of v1.0 around the corner, it seemed an opportune time to dig in. Hating manuals, I decided to learn by contributing to a package in need of a developer, the NAG library. The integration is discussed in a post on the NAG blog by NAG’s Reid Atcheson and Julia’s Andy Greenwell. Reid gave a recent workshop on the use of NAG on the Intel Phi, and Andy gave a great talk at JuliaCon on C/C++ integration. Both are well worth your time. Julia’s promise of being the one language for high performance computing and scientific work is what attracted me, and the use of the NAG libraries is a useful confluence of those two areas.

So, learning Julia. A joy really. Have a look at the JuliaCon 2016 workshops on YouTube – in particular, David Saunders' Intermediate Julia workshop, Arch Robison’s high performance Julia workshop, Tom Sargent’s keynote on the use of Julia in macroeconomics, and the FRB’s Erika Moszkowski’s talk on DSGE models in Julia. I could go on… I watched hours of the conference; the level of quality is stellar.

With that background, I began by cloning the existing NAG.jl code on github and getting to work. I also took a look at GSL.jl, the wrapper for the GNU Scientific Library coded by Yichao Yu, to get a sense of good practice for something this large. What I found is a reliance on the metaprogramming style of writing Julia. Julia, like Lisp, is homoiconic: the code looks like its abstract syntax tree, which makes it easier to parse and extend. There are two packages that take this on: Clang.jl, giving access to a large subset of LLVM's clang, and BaseTestAuto.jl for automated testing. Now step two could not be learning Clang.jl – it's too big a fish to reel in while I learn. Instead I recognised that a number of special functions in chapter S take real or complex numbers and return real or complex numbers. These simple function signatures in the nags.h header file did not require clang to automate the wrapping; I could rely on a little bash with sed to grab what was needed. Here is what that looks like:

fn_type_str = "return complex, inputs (complex z, NagError *fail)"
 s_dict = Dict{Symbol, Symbol}(:s14agc => :nag_complex_log_gamma,:s15ddc => :nag_complex_erfc)
 for nagf in (:s14agc, :s15ddc,
 )
     @eval begin
         @doc "$(section_str) : Fns that $(fn_type_str) $(nag_doc_str) " $nagf
         export $(s_dict[nagf])
         function $(s_dict[nagf])(z::Number)
             reset_nag_error()
             ccall(($(string(nagf)), :libnagc_nag), NagComplex,
                   (NagComplex, Ptr{Void}),
                   z, NAG_ERROR)
         end
     end
 end

Some 50+ functions were wrapped in this fashion.
Very simple documentation is provided using the above @doc macro.

help?> NAG.nag_shifted_log
  Approximations of Special Functions : Fns that return double, inputs (double x, NagError *fail) see nag.co.uk/numeric/cl/nagdoc_cl26/html/s/sconts.html

To use more complicated functions, such as the nearest correlation matrices in the G02 – Correlation and Regression Analysis chapter, I had to rely on handwritten wrapping. The PetSc.jl package for sparse matrices was wrapped by first bootstrapping with Clang.jl; that will be something to do here as well. The arrival of the full IDE by JuliaComputing would help. Unfortunately, the initial release is for Windows and OSX, no Linux. Windows? Really?

Here is what my NAGNearestCorrelationMatrix.ipynb looks like for nearest correlation matrices.


 Accessing the NAG libraries
 Add the export commands in set_libpaths.sh to your .bashrc file.
 Alternatively, call set_libpaths prior to launching julia in the shell
 In [ ]:
 #Pkg.update()
 #Pkg.add("ToeplitzMatrices")
 using ToeplitzMatrices
 using Gadfly
 using NAG
 
 
 nag_licence_query()
 true
 
 
 In [ ]:
 # Tons of Gadfly warnings
 vx = 1:10;
 vy = 1:10;
 plot(x=vx[vx .> 4] , y = filter(x -> x > 4, vy))

In [ ]:
Nth=2^7
bx=collect(2*Nth:-2:1)/(2*Nth);
vcv = Toeplitz(bx.^4,bx.^4); vcv = (1/2)*(vcv + vcv');
lambdas,eigvects = eig(vcv);
In [ ]:
spy(eigvects,Scale.color_continuous(;minvalue = -0.15, maxvalue = 0.15))

NAG_NC1a


In [ ]:
xpwt = .994; vcvD = vcv;
for ith= 2:Nth
    for jth= ith:Nth
        vcvD[ith,jth] = xpwt*vcvD[ith-1,jth-1];
        vcvD[jth,ith] = vcvD[ith,jth];
    end
end
lambdasD,eigvectsD = eig(vcvD);
spy(vcvD)

NAG_NC2

Now construct an unsymmetric matrix to add to our covariance matrix and add some noise.


In [ ]:
spy(eigvectsD, Scale.color_continuous(;minvalue = -0.15, maxvalue = 0.15))

NAG_NC3

Add some asymmetry to lose positive definite structure


In [ ]:
urow = zeros(Nth); urow[2] = 1.0;
UTri = TriangularToeplitz(urow,:U);
perturbed = vcv + 1e-3*UTri + 1e-2*rand(Nth,Nth);
lambdas,eigvects = eig(perturbed);

This is no longer real positive definite


In [ ]:
ndx = real(lambdas) .< .2;
plot(x=real(lambdas[ndx]), y=imag(lambdas[ndx]))

NAG_NC4

The decay weighted version behaves similarly


In [ ]:
preturbed = vcvD + 1e-3*UTri + 1e-2*rand(Nth,Nth);
lambdas,eigvects = eig(preturbed);

ndx = real(lambdas) .< .2;
plot(x=real(lambdas[ndx]), y=imag(lambdas[ndx]))

NAG_NC5

construct the (pseudo) correlation matrix from the perturbed matrix


In [ ]:
d = sqrt.(diag(preturbed));
ccm = preturbed./(d*d');
spy(ccm)

NAG_NC6

The pseudo correlation matrix retains the structure of eigenvalues in the complex plane


In [ ]:
lambdas,eigvects = eig(ccm);
ndx = real(lambdas) .< .2;
plot(x=real(lambdas[ndx]), y=imag(lambdas[ndx]),Guide.xlabel("Re(λ)"), Guide.ylabel("Im(λ)"))

NAG_NC7

We have access to NAG’s Nearest correlation functions, here is some help inside Julia.
The pretty printing gets lost here. Unicode characters look great in the documentation.


In [ ]:
? NAG.nag_nearest_correlation
Correlation and Regression Analysis : Fns that provide default values to the nag calls and the associated nag wrapper fn ! for more info.

In [ ]:
? NAG.nag_nearest_correlation!
Correlation and Regression Analysis : See see nag.co.uk/numeric/cl/nagdoc_cl26/html/g01/g01conts.html for further details
nag_nearest_correlation (g02aac) Solves for the nearest PSD symmetric, unit diagonal correlation matrix. It applies an inexact Newton method to a dual formulation of the problem, as described by Qi and Sun (2006). It adds the improvements suggested by Borsdorf and Higham (2010).

where Parameters (with defaults) where applicable

order (Nag_ColMajor) or Nag_RowMajor for matrices
G     the input matrix. the dimension of G must be at least pdg by n. 
      On entry: G, the initial matrix. 
      On exit: a symmetric matrix 0.5*(G + G') with the diagonal set to I.
pdg   the stride separating row or column elements (depending on order) of G.
      Constraint: pdg ≥ n.
n     the size of the matrix G. Constraint: n > 0.

errtol (0.0) termination tolerance for the Newton iteration. 
       If errtol ≤ 0.0 then n*sqrt(machine precision) is used.

maxits (NagInt(0)) maximum number of iterations used for the iterative scheme at each Newton step. If maxits ≤ 0, 2*n is used.

maxit (NagInt(0)) maximum number of Newton iterations. If maxit ≤ 0, 200 is used.

x     the output nearest correlation matrix. The dimension of x must be at least pdx by n.

pdx   the stride separating row or column elements (depending on order) of x. 
      Constraint: pdx ≥ n.

examples: 
order = Nag_ColMajor; errtol = 0.0; maxits = NagInt(0); 
maxit = NagInt(0) 
n, pdg = size(g) 
pdx = pdg 

x = Array(Float64,n,pdg)
nag_nearest_correlation!(order, g, pdg, n, errtol, maxits, maxit, x, pdx) 

we now apply the NAG nearest correlation matrix to our perturbed matrix


In [ ]:
ccm2 = ccm
ccm_psd = NAG.nag_nearest_correlation(ccm2);
lambdas_near,eigvects_near = eig(ccm_psd);
ndx = lambdasD .< .2;
plot(x=lambdas_near[ndx], y=lambdasD[ndx], Guide.xlabel("λ_{near}"), Guide.ylabel("λ_{orig}"))

NAG_NC8

The above can be computationally expensive, the following variants are less so. Here is the NAG nearest correlation matrix with bounded eigenvalues:


In [ ]:
w = zeros(Nth)
for ith in Nth:-1:1
    w[ith] = sqrt(ith/Nth)
end
alpha = 0.001
ccm2 = ccm
ccm_bnd = NAG.nag_nearest_correlation_bounded(ccm2,NAG.Nag_LowerBound, alpha, w)
lambdas_nearbnd,eigvects_nearbnd = eig(ccm_bnd);

plot(x=lambdas_nearbnd[ndx], y=lambdasD[ndx], Guide.xlabel("λ_{nearbnd}"), Guide.ylabel("λ_{orig}"))

NAG_NC9

Finally the nearest correlation with target matrix.


In [ ]:
vc = [.75; .025; .0125; zeros(Nth-3)]
H = Toeplitz(vc,vc)
H = (1/2)*(H + H');
spy(H)

NAG_NC10


In [ ]:
theta = 0.1
ccm2 = ccm
ccm_tgt = NAG.nag_nearest_correlation_target(ccm2,H,theta)
lambdas_neartgt, eigvects_neartgt = eig(ccm_tgt);
ndx = lambdasD .< .2;
plot(x=lambdas_neartgt[ndx], y=lambdasD[ndx], Guide.xlabel("λ_{neartgt}"), Guide.ylabel("λ_{orig}"))

NAG_NC11

Go have a look at the work in progress, my fork of NAG.jl

Longer term, the IDE and clang.jl for automating the first pass function wrapping will come. The automated unit testing looks pretty cool too.

Posted in Econometrics, Julia, Matrix Algebra, Uncategorized

Revisiting Risk Parity

Introduction

Risk Parity presumes that the risk-adjusted returns to lower volatility assets merit the use of leverage. For bonds, this has been the case throughout the great moderation. The story as bond yields rose is quite different: from 1954 to 1982 the returns to risk parity were non-existent to negative, and realized volatility was as high as that of equities. If one examines the returns to risk parity over the great moderation period – 1982 to 2013 – as bond yields fell from a height of 16 per cent down to present levels, the results validate the value of risk parity.

Comments welcomed as this post is intended as an idea for a working paper.

The case for Risk Parity

Historically, lower volatility asset classes have provided higher Sharpe ratios. That meant that holding a diversified portfolio that included leveraged high Sharpe assets achieved the returns of an all equity portfolio, but with lower risk (Asness 1996).

Bond Regimes

The historical period is largely driven by two regimes: the great moderation (Stock and Watson 2002), in which interest rates fell for 30 years, and the 28-year period in which rates rose. Strictly speaking, the great moderation refers to the decline in macro volatility – measures like GDP have become much less volatile since 1982. Stock and Watson provide a review of the idea; they argue for the decline in inflation expectations and the Fed's policy of controlling inflation as the driving force behind the moderation. It is this view that ties our use of the 10 year yield as a measure of the great moderation.
We find that the performance under the two regimes is dramatically different. Deo and Nakisa (2013) come to a similar conclusion in “When Risk Parity Goes Wrong”; they look at the performance over the two regimes from 1973 to 2013. This work takes a longer history, 1926-2013, and a longer post-war history as the rising rate regime, 1954-1982. Secondly, I examine the leverage of the strategy described by Asness, Frazzini and Pedersen (2012), “Leverage Aversion and Risk Parity”, and show that the leverage required to match the risk of a 60/40 portfolio can at times be as high as 10 times.

Methods and data

We follow Asness, Frazzini and Pedersen and create an unlevered risk parity portfolio where the weights are the inverse of each asset's three-year volatility, divided by their sum so that the weights sum to one. Next we apply leverage so that the three-year volatility of the portfolio matches that of the 60/40 balanced portfolio. Finally we create a second levered risk parity portfolio where the leverage is as before but capped at two times. A sketch of the construction follows.
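Here is a minimal matlab/octave sketch of that construction (illustrative only, not the exact code behind the figures below; req and rbd are placeholder names for monthly equity and bond return series):

% illustrative risk parity construction on a trailing 36-month window
R    = [req rbd];                           % T-by-2 matrix of monthly returns
T    = size(R,1);  win = 36;
w_rp = nan(T,2);   lev = nan(T,1);
for t = win+1:T
    sig   = std(R(t-win:t-1,:));            % trailing 3-year vol of each asset
    w     = (1./sig) / sum(1./sig);         % unlevered risk parity weights (sum to one)
    sig60 = std(R(t-win:t-1,:)*[.6;.4]);    % trailing vol of the 60/40 benchmark
    sigrp = std(R(t-win:t-1,:)*w');         % trailing vol of the unlevered RP portfolio
    lev(t)    = sig60/sigrp;                % leverage that matches 60/40 risk
    w_rp(t,:) = lev(t)*w;                   % levered weights used in month t
end
lev_capped = min(lev,2);                    % the second variant caps leverage at 2x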
The equity data is monthly S&P 500 data from Robert Shiller’s website at Yale, as is the early history for the 10 year treasury, which we use for the monthly bond data. The latter is updated using the St Louis Federal Reserve Economic Data (FRED) website. Although Asness, Frazzini and Pedersen use CRSP data on a larger value-weighted index, and a bond index that includes corporates, the results are quite similar. The bond data's historical returns are slightly lower as we do not include a corporate spread.
Risk Parity portfolios, and the 10 year treasury yields on the right hand scale.

Leverage required to meet the risk of the 60/40 portfolio.

Results

Any diversification into bonds improved the Sharpe ratios for the entire 1926 to 2013 period. Under the rising yield regime – the 28 years from 1954 to 1982 – the risk parity portfolios under-perform: their realized volatility is higher than that of the 60/40 portfolio used to calibrate the risk, and their excess returns are negative.
Results table.

Concluding remarks

If one examines the returns to risk parity over the great moderation period – 1982 to 2013 – as bond yields fell from a height of 16 per cent down to present levels, the results validate the value of risk parity. But during the preceding 28 years of rising yields, risk parity portfolios under-perform. The result for the entire period suggests that risk parity was close to the tangency portfolio, and that remains an argument for the continued exploration of the Risk Parity idea. However, for long periods of time – on the order of thirty years – the realized results of standard risk parity designs suggest that forecasts of returns and volatilities are needed. Perhaps we already knew this – macro regimes drive allocation performance.

References

Asness, Clifford (1996). “Why not 100 percent equities?” Journal of Portfolio Management.
Asness, Clifford; Frazzini, Andrea; Pedersen, Lasse (2012). “Leverage Aversion and Risk Parity.” Financial Analysts Journal.
Baker, Malcolm; Bradley, Brendan; Wurgler, Jeffrey (2011). “Understanding the low volatility anomaly.” Financial Analysts Journal.
Deo, Stephan; Nakisa, Ramin (2013). “When Risk Parity Goes Wrong.” UBS Investment Research.
Stock, James; Watson, Mark (2002). “Has the business cycle changed and why?” NBER working paper.
Posted in Dismal Science, Econometrics

ECB ignites Cyprus

By the time you read this, European markets will have gapped down following the Nikkei’s 2.7% slide, with S&P futures indicating a significant fall as well. The morning alerts will be about Cyprus, so what just happened?

On Saturday, EU ministers, in a radical departure from earlier rescue packages, raided the savings accounts of the island's residents to the tune of a roughly 10% tax on their savings – what they refer to as a bail-in. For the first time since the start of the Global Financial Crisis, small depositors will take a hit. Yes, there is deposit insurance in Cyprus, and no, it does not apply, as the banks were, technically speaking, not declared insolvent.

So this morning, small savers across Portugal, Ireland, Spain and Greece are left to wonder if deposit insurance retains any meaning. According to an unconfirmed report, the ECB’s chief negotiator, Jörg Asmussen, stated that the EC had threatened to pull the credit lines of at least one of the Cypriot banks, which without the deal would have collapsed. Under such circumstances, what is the value of the 10% equity stake? The banks will remain closed today for a local holiday.

BCGMesopotamia

For a thoughtful, prescient read on the topic, see the Boston Consulting Group’s 2011 Back to Mesopotamia by David Rhodes and Daniel Stelter. The graph is from their paper. They argue that a Cyprus-like plan is one of few options available across countries. ECB actions lend credence to the possibility of wider adoption and therefore contagion risk.
Why 180% of GDP? They follow a recent working paper by Cecchetti, Mohanty and Zampolli (2011), which finds a significant impact on growth when debt levels reach any of the following:
  • Government Debt/GDP above 80 to 100%, confirming Reinhart and Rogoff.
  • Non-financial corporate debt greater than 90% of GDP
  • Private household debt above 85% of GDP
The 180% backs out 60% from each of the three, with assumptions of 5% interest and a growth rate of 3%.
From the Guardian:
Officials in Brussels have wrangled over a rescue plan for Cyprus since it became obvious the island government had few resources of its own to meet debt repayments.
With debts almost equal to its annual national income of €17bn and few taxpayers to tap for funds after years as a tax haven, a plan was hatched to tax bank deposits to generate some €5bn alongside €10bn in loans on offer from the troika and a possible €2bn loan from Russia.
What was agreed in the bailout deal?
Cash-strapped Cyprus will receive around €10bn (£8.7bn) to strengthen its economy and recapitalize its battered banks.
But in a radical departure from previous rescue packages, savers across the Cypriot banking sector are also being forced to contribute to the deal through a compulsory levy on their deposits that will raise €5.8bn.
Hang on, savers are being taxed to help prop up Cyprus’s economy?

Technically they are being “bailed in” – and will receive equity (bank shares) in return. But yes, in effect it’s a tax on all savers.

Those with under €100,000 will lose 6.75%, rising to 9.99% for those with over €100,000 in the bank.
Aren’t savers meant to be insured against losses when banks go bust?
Absolutely. Cyprus’s deposit protection scheme states deposits up to €100,000 are protected. But in this case the banks haven’t collapsed, so it doesn’t apply.

Stephen Cecchetti, Madhusudan Mohanty and Fabrizio Zampolli, The Real Effects of Debt, BIS Working Paper No. 352, September 2011, http://www.bis.org/publ/work352.htm

Posted in Dismal Science

Getmansky Lo and Makarov 2004

M. Getmansky et al., An econometric model of serial correlation and illiquidity in hedge fund returns, Journal of Financial Economics 74 (2004) 529–609. link

The paper has been applied in numerous studies of hedge fund returns for a good reason: Sharpe ratios inflated by illiquid assets. This post warns of cases where the estimates are inconsistent. The concern Getmansky et al. address is more pronounced for hedge funds that are able to include a wider range of assets in the portfolio, including illiquid ones. From the paper:

Funds containing illiquid securities will appear to be smoother than true economic returns (returns that fully reflect all available market information concerning those securities) and this, in turn, will impart a downward bias on the estimated return variance and yield positive serial return correlation.

The issue is that this yields an artificially high Sharpe ratio, and correcting for this is just good practice.  The paper refers to the published returns series as the observed return, distinct from the true economic (unobserved) return.

Let

$R_t^o = \theta_0R_t + \theta_1R_{t-1} + \dots + \theta_kR_{t-k}$
$\theta_j \in [0,1], j = 0,\dots,k $
$1 = \theta_0 + \theta_1 + \dots + \theta_k$

Proposition 1 Under these equations, the statistical properties of the returns are:

$E[R_t^o] = \mu $
$Var[R_t^o] = c^2_\sigma\sigma^2\leqslant \sigma^2, $
$SR^o \doteq \frac{E[R^o_t]}{\sqrt{Var[R^o_t]}} = c_sSR\geq SR \doteq \frac{E[R_t]}{\sqrt{Var[R_t]}} $

where
$c_{\mu} = \theta_0 + \theta_1 \dots + \theta_k $
$c_{\sigma}^2 = \theta_0^2 + \theta_1^2 \dots + \theta_k^2 $
$c_{s} \doteq 1/{\sqrt{\theta_0^2 + \theta_1^2 \dots + \theta_k^2}} $
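A quick simulation makes the proposition concrete (a minimal matlab/octave sketch, assuming i.i.d. true returns; not from the paper):

theta = [0.6 0.3 0.1];                     % smoothing weights that sum to one
R  = 0.01 + 0.04*randn(50000,1);           % "true" i.i.d. returns
Ro = filter(theta, 1, R);                  % observed returns: theta_0*R_t + theta_1*R_{t-1} + ...
[std(Ro)/std(R)  sqrt(sum(theta.^2))]      % both close to c_sigma (about 0.68 here)
[(mean(Ro)/std(Ro))/(mean(R)/std(R))  1/sqrt(sum(theta.^2))]   % Sharpe inflation, close to 1/c_sigma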

The estimates for $\theta$ are solved using maximum likelihood, with the likelihood derived as in Brockwell and Davis (1991) for moving average models (Chapter 8).

One observation – since you are looking for evidence of smoothing that inflates Sharpe ratios, you might consider shrinking inconsistent theta estimates to $\theta_0 = 1$, the no-smoothing case. The issue with the method is that not all constraints are imposed in standard MLE optimization; the paper notes, for instance, that the global macro and pure property sub-indices have odd looking parameters.

Getmansky et al present $Var[R_t^o] = c^2_\sigma\sigma^2\leqslant \sigma^2$, but it is never explicitly stated that theta coefficients like those of Pure Property, [1.23, -.31, .07] (Table A.3, page 600), which violate the assumption of $\theta_j \in [0,1]$, actually raise volatility: for those values $c^2_\sigma = 1.23^2 + 0.31^2 + 0.07^2 \approx 1.61 > 1$, so the implied observed variance exceeds the true variance.

A theta profile bounded below by zero and monotonically decreasing is consistent with the smoothing mechanisms they describe. While not implementable in their likelihood solver, these restrictions can be applied to a constrained OLS version of the Getmansky market-driven regression (their eq. 61), shown here:

Let $\Lambda_t$ be the market return at time t:
$ R_t^o = \mu + \beta(\theta_0 \Lambda_t + \theta_1 \Lambda_{t-1} + \dots + \theta_k \Lambda_{t-k}) + u_t \\
R_t^o = \mu + \gamma_0 \Lambda_t + \gamma_1 \Lambda_{t-1} + \dots + \gamma_k \Lambda_{t-k} + u_t $
with $\gamma_j = \beta\theta_j$, so that $\beta = \sum_j \gamma_j$ and $\theta_j = \gamma_j / \sum_j \gamma_j$.
As in Asness et al 2001.
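A minimal matlab/octave sketch of that constrained regression (illustrative only; ro and mkt are placeholder names for the observed fund and market return column vectors, and lsqlin requires the Optimization Toolbox):

% constrained OLS of observed returns on current and lagged market returns,
% with the lag weights restricted to be non-negative and monotonically decreasing
k = 2;                                   % number of market lags
T = numel(ro);
L = zeros(T-k, k+1);                     % columns: Lambda_t, Lambda_{t-1}, ..., Lambda_{t-k}
for j = 0:k
    L(:, j+1) = mkt(k+1-j : T-j);
end
y = ro(k+1:T);
X = [ones(T-k,1), L];                    % parameters: [mu, gamma_0, ..., gamma_k]

A = [];                                  % inequality constraints A*b <= 0
for j = 1:k
    row = zeros(1, k+2);                 % gamma_j - gamma_{j-1} <= 0 (monotone decay)
    row(1+j) = -1;  row(2+j) = 1;
    A = [A; row];
end
A = [A; zeros(k+1,1), -eye(k+1)];        % -gamma_j <= 0 (non-negative weights)
b = zeros(size(A,1), 1);

gam   = lsqlin(X, y, A, b);              % constrained least squares fit
beta  = sum(gam(2:end));
theta = gam(2:end) / beta;               % recovered smoothing weights, sum to one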

The worst case is that the market model is not appropriate, in which case you are left with the consistent no-smoothing estimate $\theta_0 = 1$. Aggregate fund category $\theta$s are then biased away from finding smoothing, but without the unwarranted rise in volatility from mis-estimated parameters.

Asness, C., Krail, R., Liew, J., 2001. Do hedge funds hedge? The Journal of Portfolio Management 28, 6–19 link
Brockwell, P., Davis, R., 1991. Time Series: Theory and Methods, 2nd Edition. Springer, New York. link

Posted in Econometrics, Uncategorized

Yesterday, risk was such an easy game to play.

There is no need for nuance in explaining yesterday's market – we have an appetite for risk. The move in equities was as broad as one is ever likely to see, with all sectors in line with their ex-ante beta; European equities were the laggard.


Bonds also followed suit, with the high yield corporate ETF HYG (+.9%) and junk ETF JNK (+.8%) having a strong day and treasuries down, the 20 year TLT ETF falling (-1.3%).

Commodities fared about as well as corporates, with the notable exception of Natural gas, UNG fell sharply on the day (-3.5%). Note that UNG has a high beta with the AGG, Carry and TIPs, and it was not a good day to be associated with TIPs.

By far the simplest story here is this:

The outliers are natural gas and the near- and mid-term VIX; above the outliers, the graph clusters by asset class.

It was not a good day to be sensitive to the AGG.

Precious and base metals, natural gas and oil are sensitive to inflation beta – there was quite the spread in performance among these. The natural gas coefficient is not significant despite being quite large.

Oil, emerging markets, and Asia ex Japan have large and significant loadings on the returns to the carry trade.  These are followed by commodities in general.

 

 

 

Posted in Dismal Science, Uncategorized

Beating Benchmarks with No Skill*

(* How full scale optimization can outperform an established index)
Another deployment of Full Scale Optimization, this time to a bond portfolio of 4 ETFs:
  • AGG iShares Barclays Agg with between 20-60% bounds.
  • TIP iShares TIP bonds 10-50%
  • TLT iShares 20 year treasuries 0-30%
  • LQD iShares Investment Grade Corporates 0-30%
The portfolio is constrained so that the AGG + TIP allocation is between 40% and 95%.
The portfolio is rebalanced quarterly. The FSO should be able to handle the large tails in the 20 year Treasury and the corporates. For this exercise, we do not constrain the duration – we are interested in seeing whether the use of the full distribution of returns and an asymmetric utility function suffices for limiting downside risk.
As in Lochoff (1998), the benchmark is the Barclays (Lehman) Aggregate bond index. Lochoff used historical means and returns to construct a duration-matched portfolio that shorts the risk free rate and goes long (with leverage) short-duration assets – say from three months to three years. I also use historical data, but in contrast employ an S-shaped utility function to state my relative preference for a return distribution. We use Utility = I(x>0)*x^m - L*I(x<0)*(-x)^m with (L,m) = (2,.95), (2,.8) and (3,.95) and feed it to our FSO. That is, we have an asymmetric preference: we dislike a negative outcome two to three times the magnitude of a positive outcome, so distributions with negative tails need considerable mass on the positive side to compensate. See Cremers, Kritzman and Page 2004, Adler and Kritzman 2005 and Hagströmer 2007 for details of FSO and the S-shaped utility function. We then equal weight each of the three allocations. A small sketch of the utility follows.
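For reference, a minimal matlab/octave sketch of that utility as a function file (illustrative, not the exact implementation behind the results; s_utility is a hypothetical name):

function u = s_utility(x, L, m)
% expected S-shaped utility of a vector of portfolio returns x:
% gains are raised to the power m, losses are penalised L times as heavily
  u = mean( max(x,0).^m - L .* max(-x,0).^m );
end

% usage with the three parameterisations in the text, for a T-by-N return
% matrix R and a candidate weight vector w:
%   px = R*w;
%   u1 = s_utility(px, 2, .95);   % each parameterisation is optimised separately,
%   u2 = s_utility(px, 2, .80);   % and the three resulting allocations are then
%   u3 = s_utility(px, 3, .95);   % equally weighted, as described above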
                           Downside  Return/    Return/
         Mean      Stdev    Stdev     Stdev     Downside Stdev
AGG     0.0600    0.0657    0.0580    0.9131    1.0337
Port    0.0829    0.0747    0.0513    1.1089    1.6167

Portfolio Beta: .871
  Results table: for the out-of-sample period March 2007 to May 13, 2012.
The results are encouraging: the return/risk is improved over the benchmark, and the slightly higher volatility is actually lower downside risk, with strong Sortino ratios. We did not remove the daily risk-free rate (this is a quick blog post after all); for most of this period it was quite small.

 

 

Returns to the assets

Tails of the asset returns

Rolling Allocations

Strategy and Benchmark returns

Referenced Papers:
Adler, Timothy and Kritzman, Mark, Mean-Variance versus Full-Scale Optimization: In and Out of Sample .  MIT Sloan Research Paper No. 4589-05.  Available at SSRN: http://ssrn.com/abstract=881813
Athayde, Gustavo M. de, and Renato G. Flôres, Jr. (2003). Incorporating skewness and kurtosis in portfolio optimization: a multideminsional efficient set. In Stephen Satchell and Alan Scowcroft, eds., Advances in Portfolio Construction and Implementation. Oxford: Butterworth-Heinemann.
Chopra Vijay. K., and William T. Ziemba. (1993). The effect of errors in means, variances, and covariances on optimal portfolio choice. Journal of Portfolio Management, Winter, 6-11.
Cremers, Jan-Hein, Kritzman, Mark and Page, Sebastien, Optimal Hedge Fund Allocations: Do Higher Moments Matter? (September 3, 2004). Revere Street Working Paper No. 272-13. Available at SSRN: http://ssrn.com/abstract=587384
DeMiguel, Victor and Nogales, Francisco J., (2007). Portfolio Selection With Robust Estimation. Available at SSRN: http://ssrn.com/abstract=911596
Hagströmer, Björn, Anderson, Richard G., Binner, Jane M., Elger, Thomas and Nilsson, Birger, Mean-Variance vs. Full-Scale Optimization: Broad Evidence for the UK (May 2008). FRB of St. Louis Working Paper No. 2007-016D. Available at SSRN: http://ssrn.com/abstract=979811
Lochoff, Roland, Beating the Bond Market with No Skill, Journal of Portfolio Management, Fall 1998. Available at: http://www.northinfo.com/documents/130.pdf
Maringer, Dietmar, Parpas, Panos, Global Optimization of Higher Order Moments in Portfolio Selection, Journal of Global Optimization, 2009, 23:2-3, p. 219-230.
Posted in Behavioral Finance, matlab, Uncategorized

Risk Appetite Indicator

The Risk Appetite Indicator (RAI) is constructed by aggregating several measures of risk across a universe of global large cap stocks traded on the NYSE and NASDAQ, as well as a set of hedging assets, including:
  • Equity markets Indexes
  • Bonds Indexes, the Barclay AGG, ten year and Short term treasury and a few others
  • FX: JPY, CHF, GBP, EUR against the USD.
  • Others like the VIX, OIL, Emerging markets, REITs
The indicator makes use of the construction methodology in Stock and Watson (2004), Combination forecasts of output growth in a seven-country data set, as well as the more recent paper by Rapach, Strauss, and Zhou, Out-of-sample equity premium prediction: Combination forecasts and links to the real economy, and a working paper by Ibisevic, Snijder, Lai, and Zhang, Combining forecasts: Improved approach and analysis of various frameworks.
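The combination idea, stripped to its simplest form (a minimal matlab/octave sketch, not the actual RAI construction; F and y are placeholder names for a T-by-K matrix of individual forecasts and the realised series):

K  = size(F,2);
fc_equal = mean(F,2);                      % equal-weight combination forecast
mse = mean((F - repmat(y,1,K)).^2);        % historical mean squared error of each forecast
wts = (1./mse) / sum(1./mse);              % weights inversely proportional to MSE
fc_mse = F*wts';                           % MSE-weighted combination forecast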

The distribution of QRAI exhibits tri-modality, with peaks corresponding to a normal regime, a quiet regime and a bull-run regime.

 

RAI distribution

Distribution of Risk Appetite Index is multi-modal

mixture of three gaussians

The Index plotted against S&P500 returns and estimates of the three states

Risk Appetite Index Results

The Index is used to enter and exit the S&P500 when the level of the index is low. The Anatolyev & Gerko test of Expected Profitability is employed as well as a related regression that shows the profitability of both the long and short legs of the strategy.

RAI index and long short returns to conditioning on RAI

 

Anatolyev Gerko EP Statistic

Ordinary Least-squares Estimates
Dependent Variable =           SP500  
R-squared      =    0.0106
Rbar-squared   =    0.0101
sigma^2        =    0.0002
Durbin-Watson  =    2.1638
Nobs, Nvars    =   2103,     2
***************************************************************
Variable      Coefficient      t-statistic    t-probability
const            0.002848         4.599309         0.000005
qri EP           0.002938         4.743630         0.000002
Posted in Econometrics

Full Scale Optimization with REITs

As an exercise to demonstrate the usefulness of Full Scale Optimization (FSO), I examine a three-asset problem combining REITs, the S&P 500 and the ten year treasury. The example demonstrates the differences between the use of FSO and mean variance (MV). As the name implies, the approach examines the full distribution of returns and assigns a utility to a set of returns. In contrast, MV parametrizes the returns and then constructs a utility on that assumed normal set of returns. Adler and Kritzman find that FSO can be quite useful when asset returns are non-normal, and in particular in the construction of hedge fund portfolios. Hagströmer finds that the usefulness extends to stock selection, where the distributional assumptions are less of a concern. The REITs are the NAREIT US Index, weekly, from January 13, 1995 to October 29, 2010. The returns are total returns. Our interest is to examine how the two approaches differ when REITs falter in 2007.

The returns to the NAREIT, S&P500 and 10 year treasury.

The three assets have quite different tails.

The distributions of the three assets show the large tails in the REITs and the relative safety of the ten year. We use the S-shaped utility function as our canonical utility function. It is asymmetric and defined as follows: for returns below a threshold – zero in our case – the (dis)utility is -1.5*|loss|^.95; for gains it is gain^.95. Here is an image of the S-shaped utility for each allocation of (REITs, 10 year, SP500) when the exponent is .5 – quite difficult for gradient descent.

The S-shaped Utility for our problem when exponent is .5 presents an issue to traditional gradient descent.
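Since the surface is awkward for gradient methods, a coarse grid search over the weight simplex is one simple way to run the full scale optimization (a minimal matlab/octave sketch, assuming a T-by-3 matrix R of weekly returns for (REITs, 10 year, SP500); not the optimizer used for the results below):

L = 1.5; m = 0.95;                        % loss aversion and curvature from the text
step = 0.05;                              % grid resolution on the weight simplex
bestU = -Inf; bestW = [1 0 0];
for w1 = 0:step:1
    for w2 = 0:step:(1-w1)
        w  = [w1, w2, 1-w1-w2];           % candidate weights, summing to one
        px = R*w';                        % full distribution of portfolio returns
        u  = mean( max(px,0).^m - L .* max(-px,0).^m );   % expected S-shaped utility
        if u > bestU
            bestU = u;  bestW = w;        % keep the best allocation found so far
        end
    end
end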

The MV optimization uses a risk aversion parameter (RAP) of .5 on annualized returns. We allow the covariance to be based on either an expanding window or an exponentially weighted moving average with lambda = 99.5. The portfolio construction limits the maximum weight to 85% of the portfolio, and the weights sum to one.

The time series of allocations for all four portfolios differs dramatically across MV and FSO

The optimization RAP and the S-shaped asymmetry were chosen so that the two approaches yielded similar realized portfolio returns. Here we show the portfolio performance for our four cases, FSO and MV with expanding (F for full) and ewma weighted (V for varying) windows. The annualized returns are comparable, the downside variance is slightly better for the FSO’s and consequently the Sortinos are better. We also show the 1/N portfolio.

Portfolio returns for each way of constructing portfolios gives the advantage to FSO

The time series of portfolio weights is quite different between the FSO and MV runs. The weights in the FSO portfolio can be quite different from 1/N, and the change in portfolio weights around July of 2007 was dramatic – away from REITs and into the 10 year in a classic flight to quality.
While the performance differences are small, the tails are different. Here I show the realized average shortfall below -1%. The FSO distributions have smaller left tails, and this is significant at the 2% level for the Kolmogorov-Smirnov distance test.

The fatter tails in the MV optimizations are significantly different from their FSO counterpart using a KS test. Never mind the 1/N which isn't in the same ballpark.

References:
Adler, Timothy and Kritzman, Mark, Mean-Variance versus Full-Scale Optimization: In and Out of Sample . MIT Sloan Research Paper No. 4589-05. Available at SSRN: http://ssrn.com/abstract=881813
Cremers, Jan-Hein, Kritzman, Mark and Page, Sebastien, Optimal Hedge Fund Allocations: Do Higher Moments Matter? (September 3, 2004). Revere Street Working Paper No. 272-13. Available at SSRN: http://ssrn.com/abstract=587384

Hagströmer, Björn, Anderson, Richard G., Binner, Jane M., Elger, Thomas and Nilsson, Birger, Mean-Variance vs. Full-Scale Optimization: Broad Evidence for the UK (May 2008). FRB of St. Louis Working Paper No. 2007-016D. Available at SSRN: http://ssrn.com/abstract=979811

Maringer, Dietmar, Parpas, Panos, Global Optimization of Higher Order Moments in Portfolio Selection, Journal of Global Optimization, 2009, 23:2-3, p. 219-230.

Posted in Behavioral Finance, Code, matlab

Toeplitz matrices as families of shrinkage covariance estimators

Structure of the eigenvalues for the Toeplitz matrix

My current project requires that I examine the risk of a large number of portfolios, so I need to create some sensible estimates of risk, which in part means looking for robust shrinkage estimators of covariance matrices. A requirement is that the resulting matrices are stable to random perturbations: specifically, if I add a little random noise, I should still get nearly the same answer. So I turn to Toeplitz matrices, with some cool results.

To see which matrices and transformations of matrices are stable, it helps to start with some very structured matrices that look like risk covariances, and therefore allow us to examine whether all that structure is preserved when we kick them with some random data. We use Toeplitz matrices – like risk model covariance matrices, they are symmetric, can be shown to have real eigenvalues, and in some special cases their eigenvectors have closed-form solutions.

It is also known that certain Toeplitz matrices can be proven to have eigenvectors that are periodic.

So I start with a very simple diagonal Toeplitz matrix

Nth=2^9;
bx=[2*Nth:-2:1]/(2*Nth);
vcv=toeplitz(bx.^4);
image(vcv)

The eigenvectors have a lot of structure – this is the first image in this post, above. The eigenvalues are unremarkable and not the story here; for the record, they are real and well behaved in our examples.

[ev,ed] = eig(vcv);
image(ev);

Even if we perturb the matrix with random data

vcvDr = vcv + 1e-8*rand(Nth,Nth); [evDr,edDr] = eig(vcvDr);
vcvDr2 = vcv + 1e-3*rand(Nth,Nth); [evDr2,edDr2] = eig(vcvDr2);

Symmetric Toeplitz matrix


The first five eigenvectors are well behaved under perturbation

The first of the four plots shows the first 5 eigenvectors of the original matrix; to its right are the eigenvectors of a disturbed matrix vcvDr, where we add a random perturbation on the order of 10^-8. The bottom left is another, larger random perturbation of size 10^-3. Finally we show the difference in absolute value for the entries in the first five eigenvectors; we use absolute value as the eigenvectors are unique only up to sign. No problem with stability here. What is notable is that this is similar to Ledoit-Wolf, where the diagonal is also constant – they use a market-weighted average variance – and the off-diagonals are the average covariance. Here the off-diagonals are not constant but allow the linkages between assets to fall off polynomially with distance from the diagonal – here we just show the 4th power. So the matrix allows some assets to be tightly coupled and others to be decoupled. In fact the design lends itself to being decomposed into blocks, where the blocks could be asset class covariances. A small sketch comparing the two targets follows.
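For concreteness, a minimal matlab/octave sketch of the two shrinkage targets just described (illustrative only – this is the style of target, not Ledoit and Wolf's estimator; n, v and rho are assumed inputs):

n = 50; v = 0.04; rho = 0.3;               % number of assets, average variance, average correlation
F_lw = v*( rho*ones(n) + (1-rho)*eye(n) ); % constant-correlation target: constant diagonal and off-diagonal
bx   = (2*n:-2:1)/(2*n);
F_tp = v*toeplitz(bx.^4);                  % Toeplitz target: linkages fall off polynomially with distance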

Those of you who have dealt with fixed income risk models know that this tightly coupled group is difficult to work with. One answer is to use only a few, say three, principal components. All good and fine – what I am addressing allows for that, but more to the point, the stability issue can hit the largest eigenvalues and eigenvectors. Another reason to consider the stability question is that the use of a Toeplitz matrix with noise lets you explore the structure that is preserved as you add random noise.

When we relax the Toeplitz design and allow the main diagonal to vary exponentially – that is, a parametrized variance where some assets are substantially more volatile than others – then we need to ask about stability. First with an exponent of .995:
xpwt = .995; vcvD = vcv;
for ith= 2:Nth
    for jth= ith:Nth
        vcvD(ith,jth) = xpwt*vcvD(ith-1,jth-1);
        vcvD(jth,ith) = vcvD(ith,jth);
    end
end
figure; imagesc(vcvD)

The eigenvectors are still structured, but now there is an exponential decay together with the periodicity. We also see that the addition of noise leaves them stable.

The symmetric weighted exponential matrix (.995)

Structure of the eigenvectors for the symmetric weighted exponential matrix (.995)

The eigenvalue structure for the first five ev's. (lambda=.995)

When we allow for an exponent of .985, the decay is large and the matrix looks quite different.

The eigenvalues, however, are still quite similar. So are the perturbed eigenvectors, but the noise is apparent and of the same scale as the vector coefficients. If you are thinking of using shrinkage estimates that vary along the diagonal, you should be mindful of this result.

Symmetric weighted exponential matrix (lambda=.985)

Structure of the eigenvalues (lambda=.985)

The random noise affects the largest eigenvectors

The matlab code for this:
Nth=2^9;
bx=[2*Nth:-2:1]/(2*Nth);
for xpwt = [1.0 .995 .985]
    vcv=toeplitz(bx.^4); [ev,ed] = eig(vcv);
    vcvD = vcv;
    for ith= 2:Nth
        for jth= ith:Nth
            vcvD(ith,jth) = xpwt*vcvD(ith-1,jth-1);
            vcvD(jth,ith) = vcvD(ith,jth);
        end
    end
    figure; imagesc(vcvD)

    %
    [evD,edD] = eig(vcvD);
    vcvDr = vcvD + 1e-8*rand(Nth,Nth); [evDr,edDr] = eig(vcvDr);
    vcvDr2= vcvD + 1e-3*rand(Nth,Nth); [evDr2,edDr2] = eig(vcvDr2);
    [ntoss,ndx]=sort(diag(edD),'descend');
    [ntoss,ndxr]=sort(diag(edDr),'descend');
    [ntoss,ndxr2]=sort(diag(edDr2),'descend');
    figure; imagesc(min(max(evD(:,ndx),-.07),.07))

    figure
    subplot(221); plot(evD(:,ndx(1:5))); axis([1 Nth -.07 .07])
    subplot(222); plot(evDr(:,ndxr(1:5))); axis([1 Nth -.07 .07])
    subplot(223); plot(evDr2(:,ndxr2(1:5))); axis([1 Nth -.07 .07])
    subplot(224); plot([abs(evD(:,ndx(1:5))) - abs(evDr2(:,ndxr2(1:5)))]); axis([1 Nth -.07 .07])
end

References
A. Bottcher and S.M. Grudsky, Toeplitz Matrices, Asymptotic Linear Algebra, and Functional Analysis, Birkhauser, 2000.
A. Bottcher and B. Silbermann, Introduction to Large Truncated Toeplitz Matrices, Springer, New York, 1999.
W. Cheney, Introduction to Approximation theory, McGraw-Hill, 1966.
T. A. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
P. J. Davis, Circulant Matrices, Wiley-Interscience, NY, 1979.
D. Fasino and P. Tilli, “Spectral clustering properties of block multilevel Hankel matrices, Linear Algebra and its Applications, Vol. 306, pp. 155–163, 2000.
F.R. Gantmacher, The Theory of Matrices, Chelsea Publishing Co., NY 1960.
R.M. Gray, Toeplitz and Circulant Matrices: A Review, Foundations and Trends in Communications and Information Theory, Now Publishers, 2006.
R. M. Gray, “On the asymptotic eigenvalue distribution of Toeplitz matrices,” IEEE Transactions on Information Theory, Vol. 18, November 1972, pp. 725–730.
Ulf Grenander and Gabor Szego, Toeplitz forms and their application, American Mathematical Soc., 1984
Lloyd N. Trefethen and David Bau, Numerical Linear Algebra, SIAM, 1997.
W. F. Trench, “Asymptotic distribution of the even and odd spectra of real symmetric Toeplitz matrices,” Linear Algebra Appl., Vol. 302-303, pp. 155–162, 1999.
W. F. Trench, “Absolute equal distribution of the spectra of Hermitian matrices,” Lin. Alg. Appl., 366 (2003), 417–431.
Posted in Code, matlab, Matrix Algebra