Friday, June 13, 2014

Optimal Overhedge

Consider a contract ( contingent claim ) with payoff that is dependent on the stock price at maturity \((P(S_T))\). The value of such a contract at time t with \(t\leq T\) will be: \[V_t= \mathbb{E}_t\left\{ P(S_T) \right\} \] ( For now we're assuming zero interest rates . )
If a firm sells such a contract and wishes to hedge the risk, one idea would be to buy \(\Delta\) units of the stock \(S_t\), thus the net portfolio value would be: \[Q_t= S_t \Delta - V_t \] To find the exposure ( sensitivity ) of the portfolio to the stock we could take the partial derivative wrt \(S_t\), holding \(\Delta\) fixed: \[\frac{\partial Q_t}{ \partial S_t} = \Delta - \frac{\partial V_t}{\partial S_t} \] So if \[ \Delta = \frac{\partial V_t}{\partial S_t} \] then \[\frac{\partial Q_t}{ \partial S_t} = 0\] In which case, at that instant we're hedged against small moves in the stock price.
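As a minimal sketch of that cancellation, assume a lognormal stock with zero rates (the Black-Scholes setting, which is an assumption made here for illustration) and a call payoff. The names `call_value`, `bump_derivative`, `sigma` and `tau` are illustrative and not part of the argument above.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def call_value(s, k, sigma, tau):
    """Zero-rate Black-Scholes call value: an assumed model standing in for V_t."""
    d1 = (math.log(s / k) + 0.5 * sigma**2 * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * norm_cdf(d2)

def bump_derivative(f, s, h=1e-4):
    """Central finite-difference estimate of df/ds."""
    return (f(s + h) - f(s - h)) / (2.0 * h)

s0, k, sigma, tau = 100.0, 105.0, 0.2, 0.5
value = lambda s: call_value(s, k, sigma, tau)

delta = bump_derivative(value, s0)            # Delta = dV/dS evaluated at s0
portfolio = lambda s: s * delta - value(s)    # Q = S * Delta - V, with Delta held fixed
exposure = bump_derivative(portfolio, s0)     # dQ/dS at s0, expected to be close to zero

print(f"delta = {delta:.4f}, hedged exposure = {exposure:.2e}")
```

The hedge only holds at `s0`: once the stock moves, `delta` has to be recomputed, which is where gamma enters.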

Leaving aside any formal mathematics for a moment...
Broadly speaking, when the payoff \(P(S_T)\) has bumps and kinks we find that the contract valuation \(V_t(S_t)\) is a considerably smoother function: the value \(V_t\) is an expectation, so the kinks get averaged out. That said: \[\lim_{t\to T}V_t(S_t)=P(S_T)\] and also, wherever the payoff is differentiable, \[\lim_{t\to T}\Delta_t=\lim_{t\to T}\frac{\partial V_t}{\partial S_t}=\frac{\partial P}{\partial S_T}\] Thus when we have constraints on the delta ( first derivative ) and indeed gamma ( second derivative ) of \(V_t\), we need to be sure that the constraints hold for the payoff \(P(S_T)\) itself.
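The limit above can be illustrated numerically. The sketch below assumes a zero-rate lognormal model and a payoff with a step of height `n` at a level `k`, paying `n` when \(S_T \geq k\); the mirror-image payoff has the same delta with the opposite sign. The function `step_delta` and all parameter values are illustrative assumptions.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def step_delta(s, k, n, sigma, tau):
    """dV/dS for V = n * Prob(S_T >= k) under an assumed zero-rate lognormal model."""
    d2 = (math.log(s / k) - 0.5 * sigma**2 * tau) / (sigma * math.sqrt(tau))
    return n * norm_pdf(d2) / (s * sigma * math.sqrt(tau))

k, n, sigma = 100.0, 10.0, 0.2
taus = [1.0, 0.25, 1.0 / 52, 1.0 / 252]   # roughly a year, a quarter, a week, a day

# Delta evaluated exactly at the step, for shrinking time to maturity.
deltas = [step_delta(k, k, n, sigma, tau) for tau in taus]
for tau, d in zip(taus, deltas):
    print(f"time to maturity {tau:.4f}y: delta at the step = {d:.3f}")
```

At the step the delta grows roughly like one over the square root of the time to maturity, so a bound that holds a year out says nothing about the last day.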

One practical problem with having high gamma is that the delta will change a lot for small changes in the underlying. So if a trader is attempting to keep a portfolio delta neutral, he'll need to rebalance frequently and the rebalances will be large. In such a scenario there is the potential to lose a significant amount of money on transaction costs. Also high gamma normally comes with high theta ( time sensitivity ) and vega ( sensitivity to volatility ).

Suppose the contract that we wish to hedge is a barrier option with payout at time T: \[P(S_T)= \begin{cases} N & \text{when } S_T < K \\ 0 & \text{when } S_T \geq K \end{cases} \] This is a step function. It is not continuous and the partial derivative \(\frac{\partial P}{\partial S_T}\) does not exist at \(S_T=K\).
Thus it seems we won't be able to hedge away the exposure to \(S_T\).
So what do we do?
Rather than hedge the actual contract, we could make a synthetic contract which does have nicely behaved partial derivatives. Consider a payoff which consists of the real payoff \(P(S_T)\) plus an "overhedge" \(H(S_T)\), such that \[H(S_T) \geq 0\ \ \forall S_T\] Note that the client does not receive \(H(S_T)\) but the contract we will hedge does include a contribution from it. When we work out a price that includes \(H(S_T)\) it will be higher than it would have been without the overhedge. But to be able to hedge the risks we need a "reasonable" contract, which is nicely behaved.
What we want is a minimal overhedge \(H(S_T)\) such that the delta and gamma are contained.
Let the synthetic contract with the overhedge have payoff: \[\Phi (S_T)= P(S_T) + H(S_T)\] (Why did we choose that Greek letter? Well adding P and H we get PHI )
We're going to insist that the first derivative ( \(\frac{\partial \Phi}{\partial S_T}\) ) is continuous, because we want the second derivative ( gamma ) to exist and not be infinite.

Let's suppose we have the following caps and floors as constraints: \[-\Delta_F \leq \frac{\partial \Phi}{\partial S_T} \leq \Delta_C\] and \[-\Gamma_F \leq \frac{\partial ^2 \Phi}{\partial S_T^2} \leq \Gamma_C\]
With \(\Delta_F \geq 0 \), \(\Delta_C \geq 0 \)
and \(\Gamma_F \geq 0 \), \(\Gamma_C \geq 0 \)

So now the question is: what is the minimal overhedge which obeys those constraints?

As an analogy, suppose \(S_T\) were time and \(H\) were position.
We are starting at rest at position N.
We wish to get home in our car as quickly as possible to position H=0
and we have a maximum acceleration, a maximum deceleration and a speed limit.
Then what should we do?
Clearly we should use maximum acceleration until we reach the speed limit.
We should then continue at the speed limit for as long as possible until near home
and then apply the brakes, using maximum deceleration until we neatly come to rest at home.

Similarly, in this case we break H up into 3 sections:
\(H_1(S_T)\) is the accelerating phase, ( quadratic in \(S_T\))
\(H_2(S_T)\) is the linear phase ( i.e. at the speed limit).
Then \(H_3(S_T)\) is the decelerating phase.
After (quite) a bit of algebra we find:
\(H_1(S_T)=N-(S_T-K)^2\ \Gamma_F\ \ \ \ \ \ \ \ \ \ \ \ \) in the domain: \(K \leq S_T < \Lambda_1\)
\(H_2(S_T)= H_1(\Lambda_1)-(S_T-\Lambda_1)\Delta_F\ \ \ \ \) in the domain: \(\Lambda_1 \leq S_T < \Lambda_2\)
\(H_3(S_T)=(S_T-\Lambda_3)^2\ \Gamma_C\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \) in the domain: \(\Lambda_2 \leq S_T \leq \Lambda_3\)

\[\Lambda_1=K+\frac{\Delta_F}{2 \Gamma_F}\] \[\Lambda_2=\Lambda_1 + \frac{H_1(\Lambda_1)}{\Delta_F}-\frac{\Delta_F}{4\Gamma_C}\] \[\Lambda_3=\Lambda_1 + \frac{H_1(\Lambda_1)}{\Delta_F}+\frac{\Delta_F}{4\Gamma_C}\] We can see that: \[\Lambda_3=\Lambda_2+\frac{\Delta_F}{2\Gamma_C}\]
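Note that \(H(S_T)\) is zero before K and after \(\Lambda_3\).
Note also that, as written, the second derivatives of \(H_1\) and \(H_3\) are \(-2\Gamma_F\) and \(2\Gamma_C\). So in these formulas \(\Gamma_F\) and \(\Gamma_C\) are the coefficients of the quadratics, i.e. half the curvature. To impose the gamma bounds exactly as stated in the constraints, replace \(\Gamma_F\) and \(\Gamma_C\) here by \(\Gamma_F/2\) and \(\Gamma_C/2\).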
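The three phases can be sketched in code. This follows the formulas exactly as written above, so `gamma_f` and `gamma_c` are the coefficients of the quadratic pieces. The function names and the numbers are illustrative assumptions, and the sketch only covers the case where the linear phase is non-empty.

```python
def breakpoints(n, k, delta_f, gamma_f, gamma_c):
    """Return (Lambda_1, Lambda_2, Lambda_3) from the formulas in the text."""
    lam1 = k + delta_f / (2.0 * gamma_f)
    h1_end = n - gamma_f * (lam1 - k) ** 2           # H_1(Lambda_1)
    centre = lam1 + h1_end / delta_f
    lam2 = centre - delta_f / (4.0 * gamma_c)
    lam3 = centre + delta_f / (4.0 * gamma_c)
    return lam1, lam2, lam3

def overhedge(s, n, k, delta_f, gamma_f, gamma_c):
    """Piecewise overhedge H(s): quadratic, then linear, then quadratic."""
    lam1, lam2, lam3 = breakpoints(n, k, delta_f, gamma_f, gamma_c)
    if lam1 > lam2:
        raise ValueError("linear phase is empty; the three-phase formulas do not apply")
    if s < k or s > lam3:
        return 0.0                                   # no overhedge outside [K, Lambda_3]
    if s < lam1:
        return n - gamma_f * (s - k) ** 2            # H_1: accelerating phase
    if s < lam2:
        h1_end = n - gamma_f * (lam1 - k) ** 2
        return h1_end - (s - lam1) * delta_f         # H_2: at the "speed limit"
    return gamma_c * (s - lam3) ** 2                 # H_3: decelerating phase

params = dict(n=10.0, k=100.0, delta_f=2.0, gamma_f=0.5, gamma_c=0.25)
lam1, lam2, lam3 = breakpoints(**params)
print("breakpoints:", lam1, lam2, lam3)
for s in (100.0, 101.0, lam1, 103.0, lam2, 106.0, lam3):
    print(f"H({s}) = {overhedge(s, **params):.4f}")
```

With these inputs the pieces meet with matching value and slope at both interior breakpoints, which is the continuity of the first derivative that was asked for.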

To derive the above equations we use the fact that, when we set: \[H_2(S_T)-H_3(S_T)= 0\] we have a quadratic in \(S_T\) and that quadratic should have a repeated root ( zero discriminant ) at \(S_T=\Lambda_2\).
I.e. the parabola \(H_3(S_T)\) is a tangent to the line \(H_2(S_T)\) at \(S_T=\Lambda_2\).

Returning to the analogy of using the car to get home quickly: Suppose you start off near home and the acceleration in the car is not very high. Then perhaps you'll need to start decelerating before you reach the speed limit.
In our case that happens when \[\Lambda_1 > \Lambda_2 \] So now \(H_2\) doesn't come into play and we need to find where the transition from \(H_1\) to \(H_3\) is. I'll leave it as an exercise for the reader to work it out.
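As an aside, the condition can be restated in terms of the inputs alone. Substituting \(H_1(\Lambda_1)=N-\frac{\Delta_F^2}{4\Gamma_F}\) into the expression for \(\Lambda_2\) given earlier: \[\Lambda_2-\Lambda_1=\frac{N}{\Delta_F}-\frac{\Delta_F}{4\Gamma_F}-\frac{\Delta_F}{4\Gamma_C}\] so, for \(\Delta_F>0\), \[\Lambda_1>\Lambda_2 \iff N<\frac{\Delta_F^2}{4}\left(\frac{1}{\Gamma_F}+\frac{1}{\Gamma_C}\right)\] For instance, with the illustrative values \(\Delta_F=2\), \(\Gamma_F=0.5\), \(\Gamma_C=0.25\) the threshold is \(N=6\): a step smaller than that never reaches the delta floor.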

Question for the reader:
If you make some "reasonable" assumptions on the distribution of \(S_T\)
and given that \[\frac{\partial^2 P}{\partial S_T^2} \leq \Gamma_C \ \ \ \ \ \forall S_T\] then show that \[\frac{\partial^2 V_t}{\partial S_t^2} \leq \Gamma_C \ \ \ \ \ \forall S_t \text{ and } t\] What assumptions did you need to make?
Suggestion: you might consider starting with the log-normal distribution case without drift or interest rates, then generalize.

So we have derived the optimal overhedge for a barrier option subject to constraints on delta and gamma. And the synthetic contract which contains the overhedge is continuous and its first derivative is continuous.

Tuesday, June 3, 2014

Changing off diagonal elements in a Covariance Matrix

If you try to change one off-diagonal element of a positive definite covariance matrix, you may find that the adjusted matrix is no longer positive definite. When one element is changed, others often need to be adjusted too, but how do we do this?

Here's a suggestion that was originally developed for looking at exchange rate ( FX ) covariance matrices:

Let \(x^b_a\) be the cost of 1 unit of currency 'b' quoted in currency 'a'.

So we have the standard FX relations:
\[x^a_b=\frac{1}{x^b_a}\] and
\[x^a_b={x^a_c} {x^c_b}\] We will start with a simple model without drift and zero interest rates. Suppose we have a stochastic model:
\[\frac{dx^a_b}{x^a_b}=\sigma^a_b dw^a_b\] with
\[\mathbb{E}(dw^a_b)=0\] and
\[\mathbb{E}\left((dw^a_b)^2\right)=dt \]
Suppose we have a base currency (numeraire) 'a', then we can write: \[\sigma^{bc}_a=\mathbb{E}\left(\frac{dx^b_a}{x^b_a} \frac{dx^c_a}{x^c_a}\right)\frac{1}{dt}\]
eqn (i)
The diagonal of the covariance is the vol squared: \[\sigma^{bb}_a=(\sigma^b_a)^2\]
If we change the numeraire, then what’s the effect on the covariance?
To work it out we start by considering: \[\frac{dx^c_d}{x^c_d}=\frac{x^d_a}{x^c_a}d\left(\frac{x^c_a}{x^d_a}\right)=\frac{dx^c_a}{x^c_a}-\frac{dx^d_a}{x^d_a}+\Lambda\] Using Ito’s lemma we find that \(\Lambda\) only contains terms that are either higher order or non-stochastic, so when we are looking at the covariance, all of \(\Lambda\) can be ignored. And so: \[\sigma^{bc}_d=\mathbb{E}\left(\frac{dx^b_d}{x^b_d} \frac{dx^c_d}{x^c_d}\right)\frac{1}{dt}=\mathbb{E}\left(\left(\frac{dx^b_a}{x^b_a}-\frac{dx^d_a}{x^d_a}\right)\left(\frac{dx^c_a}{x^c_a}-\frac{dx^d_a}{x^d_a}\right)\right)\frac{1}{dt}\]
eqn (ii)
Now using eqn (i) we find: \[\sigma^{bc}_d=\sigma^{bc}_a-\sigma^{bd}_a-\sigma^{cd}_a+\sigma^{dd}_a\]
eqn (iii)

This is our change of numeraire formula.

One little exercise that may be worth trying is to check that if we change the numeraire from ‘a’ to ‘d’ and then back again, we do indeed recover the original value. I.e. take the four terms on the RHS of (iii) and change the numeraire of each to ‘d’. You’ll then have sixteen terms. After doing lots of cancellation, and noting that some of the terms are identically zero, you’ll recover the original covariance, i.e. the LHS of (iii).
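A numerical version of that round trip can be sketched as follows. It is a check on one example, not a substitute for the algebra. The representation is an assumption made for this sketch: the matrix is indexed over every currency including the numeraire, whose row and column are zero because \(x^a_a=1\). The function name and the numbers are illustrative.

```python
def change_numeraire(cov, new):
    """Apply eqn (iii): cov[b][c] holds sigma^{bc} under the old numeraire,
    and `new` is the index of the new numeraire d."""
    size = len(cov)
    return [[cov[b][c] - cov[b][new] - cov[c][new] + cov[new][new]
             for c in range(size)]
            for b in range(size)]

# Index order: 0 = a, 1 = b, 2 = c, 3 = d. Numeraire is a, so row/column 0 is zero.
cov_a = [
    [0.0, 0.0000, 0.0000, 0.0000],
    [0.0, 0.0100, 0.0040, 0.0030],
    [0.0, 0.0040, 0.0225, 0.0060],
    [0.0, 0.0030, 0.0060, 0.0144],
]

cov_d = change_numeraire(cov_a, 3)      # a -> d
round_trip = change_numeraire(cov_d, 0) # d -> a

worst = max(abs(round_trip[i][j] - cov_a[i][j]) for i in range(4) for j in range(4))
print("variance of a quoted in d:", cov_d[0][0])
print("largest round-trip discrepancy:", worst)
```

Under numeraire d the row for d becomes zero and the row for a becomes live, with the variance of a quoted in d equal to the variance of d quoted in a.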

For a given numeraire ( base currency ) say ‘a’ we have a covariance matrix over the indices ‘b’ and ‘c’: \(\sigma^{bc}_a\)
When working with a covariance matrix it is often very helpful, and sometimes essential, for it to be positive definite.
Suppose we have a covariance matrix which we have estimated from historical returns and now we want to change one element. Perhaps we have some good implied vol data on that FX pair. It can be problematic changing one element in the covariance matrix since it is very easy to lose the positive-definite attribute. If you change one element then there should be a ripple effect on neighbouring elements.
But how do we evaluate that ripple effect?

For a start, we could break up the covariance matrix into a set of vols and correlations: \[\sigma^{bc}_a=\sigma^b_a\sigma^c_a\rho^{bc}_a\] If we adjust a vol ( to some non-zero value ) while leaving the correlations alone, then the PD attribute will be preserved. Note that the diagonal elements of the covariance are just the vols squared.
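A small sketch of the decomposition, contrasted with overwriting an off-diagonal element directly. The helper names are illustrative, and the Cholesky test is one standard way to check positive definiteness, used here as an assumption about how one might check it.

```python
import math

def split(cov):
    """Break a covariance matrix into vols and a correlation matrix."""
    size = len(cov)
    vols = [math.sqrt(cov[i][i]) for i in range(size)]
    corr = [[cov[i][j] / (vols[i] * vols[j]) for j in range(size)] for i in range(size)]
    return vols, corr

def combine(vols, corr):
    """Rebuild the covariance matrix from vols and correlations."""
    size = len(vols)
    return [[vols[i] * vols[j] * corr[i][j] for j in range(size)] for i in range(size)]

def is_positive_definite(matrix):
    """Attempt a Cholesky factorisation of a symmetric matrix; it fails unless the matrix is PD."""
    size = len(matrix)
    low = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if acc <= 0.0:
                    return False
                low[i][j] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return True

cov = [[0.0100, 0.0040, 0.0030],
       [0.0040, 0.0225, 0.0060],
       [0.0030, 0.0060, 0.0144]]

vols, corr = split(cov)
vols[0] = 0.30                          # change one vol, keep every correlation
rescaled = combine(vols, corr)

naive = [row[:] for row in cov]
naive[0][1] = naive[1][0] = 0.02        # overwrite one element: implied correlation exceeds 1

for name, m in (("original", cov), ("rescaled vol", rescaled), ("naive overwrite", naive)):
    print(name, "is PD:", is_positive_definite(m))
```

Rescaling a vol multiplies a whole row and column by the same factor, which is why the correlations, and with them the PD attribute, survive.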

So, if we want to change an off diagonal element what we do is change the numeraire, so that that element comes onto the diagonal.
We then split up the new covariance matrix into vols and correlations.
We update the vol, leaving the correlation fixed, reform the covariance matrix and then return to the original numeraire.
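The whole procedure can be sketched end to end. As an assumption of this sketch, the matrix is indexed over every currency with a zero row and column for the current numeraire, so that eqn (iii) can be applied uniformly. `set_pair_vol` and the numbers are hypothetical.

```python
import math

def change_numeraire(cov, new):
    """Eqn (iii), applied to a matrix indexed over all currencies."""
    size = len(cov)
    return [[cov[b][c] - cov[b][new] - cov[c][new] + cov[new][new]
             for c in range(size)]
            for b in range(size)]

def set_pair_vol(cov, base, b, c, new_vol):
    """Set the vol of the pair (b, c) to new_vol, holding fixed the correlations
    seen under numeraire c, then return to numeraire `base`."""
    size = len(cov)
    in_c = change_numeraire(cov, c)                 # pair (b, c) is now the diagonal entry [b][b]
    live = [i for i in range(size) if i != c]       # the numeraire itself has zero vol
    vols = {i: math.sqrt(in_c[i][i]) for i in live}
    corr = {(i, j): in_c[i][j] / (vols[i] * vols[j]) for i in live for j in live}
    vols[b] = new_vol                               # update the vol, leave correlations alone
    rebuilt = [[0.0] * size for _ in range(size)]
    for i in live:
        for j in live:
            rebuilt[i][j] = vols[i] * vols[j] * corr[(i, j)]
    return change_numeraire(rebuilt, base)

# Index order: 0 = a (numeraire), 1 = b, 2 = c, 3 = d.
cov_a = [
    [0.0, 0.0000, 0.0000, 0.0000],
    [0.0, 0.0100, 0.0040, 0.0030],
    [0.0, 0.0040, 0.0225, 0.0060],
    [0.0, 0.0030, 0.0060, 0.0144],
]

updated = set_pair_vol(cov_a, base=0, b=1, c=2, new_vol=0.20)
pair_variance = updated[1][1] - 2.0 * updated[1][2] + updated[2][2]

print("off-diagonal sigma^{bc}_a before and after:", cov_a[1][2], updated[1][2])
print("implied vol of the pair (b, c):", math.sqrt(pair_variance))
```

Other entries in the row and column for b move as well; that is the ripple effect produced by holding the correlations fixed under the intermediate numeraire.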

The question for the reader is when ( if ever ) will the above process not preserve the PD attribute?