Friday, December 5, 2014

Greek Probabilities

Each month IBM posts up a mathematical puzzle. The November 2014 puzzle was as follows:
There are 24 letters in the Greek alphabet, from alpha, beta, and gamma, through chi, psi, and omega. The names of the letters are of different lengths, from short two-letter names like mu or pi through the seven letters in omicron. We want to generate a random Greek letter, where the probability of a letter is determined by the length of its name, and the following:
Psi is a special case with a probability of zero (we are actually generating one of 23 letters).
Letters whose names are composed of an even number of letters have a probability of 2^length times the probability of the letter Tau.
All the other letters (except Psi and Tau) have a probability of 2^(length-2) times the probability of Tau.
So, for example, the letter mu (with a length of two letters) should appear 2^2 = 4 times more often than tau. Omicron (with a length of seven letters) should appear 2^(7-2), or 32 times, more often than tau.
The task is to generate the random Greek letter using six coin tosses, in which the coins are chosen from two types: one with a probability of p_1 of getting a tails and 1-p_1 of getting heads; and the other with a probability of p_2 for tails and 1-p_2 for heads.
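Before thinking about coins, we can tabulate the target distribution in R. (A small assumption here: the name lengths are counted from the usual English spellings.)

```r
# letter-name lengths for the 24 Greek letters
len <- c(alpha = 5, beta = 4, gamma = 5, delta = 5, epsilon = 7, zeta = 4,
         eta = 3, theta = 5, iota = 4, kappa = 5, lambda = 6, mu = 2,
         nu = 2, xi = 2, omicron = 7, pi = 2, rho = 3, sigma = 5,
         tau = 3, upsilon = 7, phi = 3, chi = 3, psi = 3, omega = 5)

# weights relative to tau: 2^length for even-length names,
# 2^(length - 2) for odd-length names, with tau = 1 and psi = 0
w <- ifelse(len %% 2 == 0, 2^len, 2^(len - 2))
w["tau"] <- 1
w["psi"] <- 0

sum(w)               # 289 = 17^2
prob <- w / sum(w)   # the target probability of each letter
```

The weights total 289 = 17^2, which fits rather nicely with a coin of bias 8/17 turning up in a solution.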


One solution has \(p_1 = 8 / 17\) and \(p_2 = 1 / 3\).
We can represent the solution as a circle. We divide up the circumference of the circle such that each letter's probability is proportional to the length of its section of the circumference. Then we can present a solution as an image:

Tuesday, August 26, 2014

Pretty pictures with R

One of the nice things about using the R programming language, apart from the fact that it is free, is that it can produce beautiful graphics. As well as do all sorts of standard and non-standard plots and graphs, R can also generate animated GIF files.
For example, suppose we have some height: \[h = e^{-\alpha r} \cos( \beta r + p)\] where r is the distance from the origin: \[r=\sqrt{ x^2 + y^2 }\] We also have constants:
\(\alpha\), which determines the decay,
\(\beta\), which determines the frequency,
and \(p\), which is the phase.

We generate a matrix which we've called h. Each element corresponds to an x and a y. We can convert that h matrix to an image and then cycle through values of phase to animate it. We will then have an image such as the one above.
Here is the R code that generates the h matrix:
 # generate an image of a ripple  
 rippleMat <- function( numPx, decay, phase, freq){  
   x <- seq( -1, 1, 2/ (numPx-1))  
   y <- x   
     
   # We have a matrix r[i,j] = sqrt( x[i] * x[i] + y[j] * y[j])  
   r <- sqrt( outer( x*x, y*y, "+"))    
     
   h <- exp( - decay * r ) * cos( phase + freq * r )  
   return (h)  
 }  
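To turn the h matrix into a picture we can hand it straight to `image(..)`. Here is a self-contained sketch of a single frame (the function is repeated, condensed, purely so the snippet runs on its own; the grey colour scale is just an assumption, the full code below uses custom colours):

```r
# condensed copy of rippleMat from above
rippleMat <- function(numPx, decay, phase, freq) {
  x <- seq(-1, 1, 2 / (numPx - 1))
  r <- sqrt(outer(x * x, x * x, "+"))   # r[i,j] = sqrt(x[i]^2 + x[j]^2)
  exp(-decay * r) * cos(phase + freq * r)
}

h <- rippleMat(numPx = 200, decay = 2.5, phase = 0, freq = 12 * pi)
image(h, col = grey.colors(256), axes = FALSE, asp = 1)
```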
   
(The full code used to generate that animated gif is shown below.)

Alternatively, we could also plot a spiral. The basic equation is: \[r = e^{\alpha \phi + p} \] Where r is the distance from the origin, \(\phi\) is the angle to the x-axis, p is the phase and \(\alpha\) determines how tightly coiled it is.
We could of course plot such a path using a line chart in R. But it might be a bit more interesting to color it in. Again we could iterate through values of the phase p to generate an animated GIF file:

We could also build an RGB color triangle. Each corner is set to one of { red, green, blue }. That color then fades across the triangle until it is completely absent on the opposite edge.

Just in case you're interested, here's the full source code for the triangle:
# Author: Philip Kinlen, Aug 2014
library(grid) # required for grid.raster(.)

# numPix should be an integer,
# and gapX < (numPix / 2) 
# where gapX is the minimum number of horizontal pixels from the triangle to the edge of the image
RGBTriangle <- function(numPix = 100, gapX = 10, doSmoothing=FALSE) {
  # the vertical gap between the triangle and edge of image
  gapY            <- round(numPix * ( 0.5 - sqrt(3)/4) + gapX * sqrt(3)/2)

  xArr            <- 1:numPix
  yArr            <- numPix:1   # the function call grid.raster(..) below will put the element with 
                                # highest row number at the bottom. 
                                # For our array, the last shall be at the bottom.
  # The x's are constant along the matrix columns, the y's are constant along the matrix rows.
  xMat            <- matrix(rep(xArr, numPix), numPix, numPix, F)  
  yMat            <- matrix(rep(yArr, numPix), numPix, numPix, T)  

  m1              <- sqrt(3)                        # slope
  c1              <- gapY - m1 * gapX               # intercept
  
  m2              <- -sqrt(3)                       # slope
  c2              <- gapY - m2 * (numPix - gapX)    # intercept
  
  height          <- numPix - 2 * gapY              # height of triangle in pixels

  red             <- matrix(mapply(getRed,   xMat, yMat, m1, c1, m2, c2, gapY, height), numPix, numPix, T)
  green           <- matrix(mapply(getGreen, xMat, yMat, m1, c1, m2, c2, gapY, height), numPix, numPix, T)
  blue            <- matrix(mapply(getBlue,  xMat, yMat, m1, c1, m2, c2, gapY, height), numPix, numPix, T)
  
  col             <- rgb(red, green, blue)
  dim(col)        <- dim(red)
  grid.raster(col, interpolate=doSmoothing) 
  # One possible source of minor confusion is that when we write a Cartesian co-ordinate (x,y)
  # the x comes first,
  # but in a matrix we specify the row first, which corresponds to y
} 

#################################################################
getRed <- function( x, y, m1, c1, m2, c2, gapY, height){
   if ( isInsideTriangle( x, y, m1, c1, m2, c2, gapY)){
     res <- (y - gapY) / height    # A rounding error may cause res to be outside the range [0,1]
     return (max(0, min(1, res)))  # If left untreated that could cause an error in grid.raster(.)
   } else
     return (0)     
}
#################################################################
getBlue <- function( x, y, m1, c1, m2, c2, gapY, height){
  if ( isInsideTriangle( x, y, m1, c1, m2, c2, gapY)){ 
    res <- sqrt(3) * ((y - c2) / m2 - x) / ( 2 * height) 
    return (max(0, min(1, res))) 
  } else
    return (0)     
}  
#################################################################
getGreen <- function( x, y, m1, c1, m2, c2, gapY, height){
  if ( isInsideTriangle( x, y, m1, c1, m2, c2, gapY)){
    res <- 0.5 * (m1 * x + c1 - y) / height
    return (max(0, min(1, res))) 
  } else
    return (0)     
}  
#################################################################
isInsideTriangle <- function(x,y, m1, c1, m2 ,c2, gapY){
  
  if (  ( y >  gapY) 
      & ( y < ( m1 * x + c1 ) ) 
      & ( y < ( m2 * x + c2 ) ) )
    return(T)
  else 
    return (F)
}

And here is the code used to generate the animated GIFs containing the ripples and the spiral:
# Author: Philip Kinlen, Aug 2014
###################################################################
library( caTools)  # library required for write.gif(..)

###################################################################
animateRipples   <- function(){
  
  filePath       <- paste(getwd(), "/animatedRipples.gif", sep="")
  numImages      <- 25
  numSecForCycle <- 1
  
  numPx          <- getNumPixels()
  phases         <- -2 * pi * ( 0:(numImages-1)) / numImages
  
  # The following line does the main work generating the data. 
  # The call to sapply(..) does the work that would have been done by a for-loop in
  # a language such as Java.
  zeds           <- sapply(phases, rippleMat, simplify="array")    

  delayCS        <- 100 * numSecForCycle / numImages  # in hundredths of a second
  
  write.gif( zeds, filePath, col= getCustomColors(), delay = delayCS )
  
}
###################################################################
animateSpiral     <- function(){
  # for MS Windows might need to set the separator argument to a back-slash
  filePath        <- paste(getwd(), "animatedSpiral.gif", sep="/")
  
  numImagesPerSec <- 20
  numSecForCycle  <- 5
  
  numImages       <- numImagesPerSec * numSecForCycle
  numPx           <- getNumPixels()
  
  phases          <- 2 * pi * ( 0:(numImages-1)) / (numImages * getSpiralBeta())

  # The following line does the main work generating the data. 
  # The call to sapply(..) does the work that would have been done by a for-loop in
  # a language such as Java.
  zeds            <- sapply( phases, spiralMat, simplify="array")
  
  delayCS         <- 100 * numSecForCycle / numImages  # in hundredths of a second
  
  write.gif( zeds, filePath, col= getCustomColors(), delay = delayCS )
}
###################################################################
getSpiralBeta <- function(){
  return ( 0.2)
}
###################################################################
getNumPixels <- function(){
  return ( 200)
}
###################################################################
getCustomColors   <- function(){
  maxCol     <- 255
  
  rising     <- 0:maxCol
  falling    <- maxCol - rising
  
  myRed      <- c(falling, rising) * 0.4 + 0.2
  myGreen    <- c(falling, rising) * 0.4 + 0.2
  myBlue     <- c(falling, rising) * 0.8 + 0.2
  
  return (rgb(  myRed, myGreen, myBlue,  maxCol, maxColorValue=maxCol ))
}  
###################################################################
# generate a matrix representation of a ripple
rippleMat <- function( phase, numPx = getNumPixels(), decay = 2.5, freq = 12 * pi){
   x <- seq( -1,  1, 2/ (numPx-1))
   y <- x 
   
   # We have a matrix r[i,j] = sqrt( x[i] * x[i] + y[j] * y[j])
   r <- sqrt( outer( x*x, y*y, "+"))   
   
   h <- exp( - decay * r ) * cos( phase + freq * r )
   return (h)
}
###################################################################
spiralMat    <- function(phase, beta = getSpiralBeta() , numPx= getNumPixels(), generateImage= FALSE){

  alpha    <- 0.1   # the main spiral follows: r = exp( alpha * phi)
  
  x        <- seq(-1, 1, 2 / (numPx - 1))
  y        <- x
  matR     <- sqrt(outer(x*x, y*y, "+"))
  
  matX     <- matrix(rep(x, numPx), numPx, numPx, F)  
  matY     <- matrix(rep(y, numPx), numPx, numPx, T)  
  theta    <- ( atan2(matY, matX) + pi) + phase

  # Note that:
  #      r >= exp( (2 * pi * (rev-1) + theta) * alpha)  
  # and  r <= exp( (2 * pi *  rev    + theta) * alpha)
  # where rev is the number of revolutions 
  rev      <- ceiling((log(matR)/ alpha  - theta) /( 2 * pi))

  phi      <- rev * 2 * pi + theta
  z        <- sin(phi * beta)

  if ( generateImage)
    image( z,  col = getCustomColors())
  else 
    return (z)
}
###################################################################

Tuesday, July 15, 2014

Colliding Bullets

Each month, IBM publish a mathematical puzzle. For example in May 2014 they posted The Colliding Bullets Puzzle:

Every second, a gun shoots a bullet in the same direction at a random constant speed between 0 and 1.
The speeds of the bullets are independent uniform random variables. Each bullet keeps the exact same speed and when two bullets collide, they are both annihilated.
After shooting 20 bullets, what is the probability that eventually all the bullets will be annihilated?
Please supply the answer rounded to the 10th decimal digit.


It is reasonably straightforward to let a Monte-Carlo engine work it out.
For example see this link to Java code on GitHub.
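The linked code is in Java, but the same Monte-Carlo idea fits in a few lines of R. This is my own sketch, not the code at the link: at each step we find the earliest collision among adjacent surviving bullets and remove that pair.

```r
# TRUE if every bullet is eventually annihilated, given the vector of speeds v
allAnnihilated <- function(v) {
  t0 <- seq_along(v) - 1          # bullet i is fired at time i (0-based)
  while (length(v) > 1) {
    n  <- length(v)
    tc <- rep(Inf, n - 1)         # collision time of each adjacent pair
    for (i in 1:(n - 1))
      if (v[i + 1] > v[i])        # rear bullet faster: on a collision course
        tc[i] <- (v[i + 1] * t0[i + 1] - v[i] * t0[i]) / (v[i + 1] - v[i])
    if (all(is.infinite(tc)))
      return(FALSE)               # no further collisions are possible
    k  <- which.min(tc)           # first collision annihilates bullets k, k+1
    v  <- v[-c(k, k + 1)]
    t0 <- t0[-c(k, k + 1)]
  }
  length(v) == 0
}

set.seed(1)
mean(replicate(20000, allAnnihilated(runif(4))))   # estimate for 4 bullets
```

With 4 bullets the estimate comes out close to 0.375, in line with the exact value derived further down.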

However, to get an answer to an accuracy of 10 decimal digits would take a lot of computing power or a more clever algorithm.

Let \(v_i\) be the speed of bullet i
( I’m going to use a zero-based numbering system, not because I really like it, but rather to be consistent with the Java code; it makes it simpler to transition between the two. )

We're assuming that the bullets are infinitesimally small and that the probability of 3 colliding together at the same point in time and space is (almost surely) zero.
Also, we note that collisions can only take place between odd and even bullets. We can't have two even-numbered bullets colliding together.
To explain why, we consider what would happen if two even bullets were to collide. Then we would need all the intermediate bullets to have been annihilated. And the number of bullets between say bullet 2m and 2(m+n) is 2n-1 which is odd. But bullets only annihilate in pairs. So we can't have an odd number of intermediate bullets annihilating. Thus two even bullets cannot collide with each other. Similarly two odd bullets cannot collide with each other.
(Far) below we deal with the case when we have 4 bullets, but first we'll look at a more general case, with generalized initial conditions.
Suppose \(2N\) bullets are fired. We will label the starting condition to be \(\mathcal{J}_{2N} (\textbf{X},\textbf{t})\):
each bullet has position \(X_i\) with
\(X_i \ge X_{i+1} \ \ \ \ \forall i: 0 \le i \le 2N-2 \) and \(X_i \ge 0 \ \ \ \ \forall i: 0 \le i \le 2N-1 \)

start times: \(t_i \) with
\(t_i \le t_{i+1} \ \ \ \ \forall i: 0 \le i \le 2N-2 \) and \(t_i \ge 0 \ \ \ \ \forall i: 0 \le i \le 2N-1 \)

and speeds as before \(v_i \sim U(0,1) \)

We define \(\mathcal{A}_{2N} \) to be the probability that all bullets released in our generalized version will be annihilated.
In the case when there are 2 bullets ( \(N=1 \) )
they will collide when \(v_0 < v_1\).
Otherwise they won't.
So the probability that they annihilate each other is: \[\mathcal{A}_{2} = \frac{1}{2}\] Now consider the case when we have an even number of bullets and there are more than 2 of them, i.e. \(N>1\):
The only case when there are no collisions at all is when the velocities are strictly ordered: \[v_i > v_{i+1} \ \ \ \ \ \forall i: 0 \le i < 2N-1 \] In that case, the first bullet is the fastest and the last is the slowest.
On the other hand, if last is not the slowest, then we know there will be at least one collision.
The probability that the last is not the slowest is: \(p_{2N} = \frac{2N-1}{2N} \).
In this case, when the last is not the slowest, there will be a first collision. We don't know which pair will collide first, but we know that the first collision will definitely occur. After this first collision occurs, we then have 2 fewer remaining bullets.
Immediately after the first collision we will then have a valid starting condition \(\mathcal{J}_{2N-2}(\textbf{X}',\textbf{t}') \). If all the original 2N bullets are to be annihilated, then we will need the remaining (2N-2) bullets to be annihilated and so we find: \[\mathcal{A}_{2N} = \frac{2N-1}{2N} \mathcal{A}_{2N-2}\] Using this recursive definition we find: \[\mathcal{A}_{2N}=\prod_{n=1}^N\frac{2n-1}{2n}\]
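That closed form is easy to evaluate in R, including the 20-bullet case the puzzle asks about:

```r
# A_{2N} = prod_{n=1..N} (2n-1)/(2n)
annihilationProb <- function(N) prod((2 * (1:N) - 1) / (2 * (1:N)))

annihilationProb(1)               # 2 bullets: 1/2
annihilationProb(2)               # 4 bullets: 3/8
round(annihilationProb(10), 10)   # 20 bullets, to 10 decimal digits
```

For 20 bullets this gives 0.1761970520, which is the 10-decimal-digit answer the puzzle asks for.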

Alternative approaches

Integration

Another way to try to get the solution is to analytically do the integrals over the probability density functions and thus end up with an exact solution. Now let's move on to the case when we have 4 bullets.
We'll introduce some notation. The probability that all 4 bullets will be annihilated is:
\[A_4= \Phi (**,|/) + \Phi (\bigwedge,\bigwedge) + \Phi (|/,\bigwedge) \] Where:
\(\Phi (**,|/)\) is the contribution from the case when we don’t have any restrictions on the first pair and \(v_2 < v_3\).
I.e. the second pair is NOT on a collision course.

\(\Phi (\bigwedge,\bigwedge)\) is the contribution from the case when \(v_0 < v_1\) and \(v_2 < v_3\).
I.e. that's the case when the first are on a collision course and so are the second pair. Though it doesn't necessarily mean that the first pair will collide with each other.

\(\Phi (|/,\bigwedge)\) is the contribution from the case when \(v_1 < v_0\) and \(v_2 < v_3\).
i.e. the first pair are NOT on a collision course, but the second pair are.

First let's evaluate \(\Phi (**,|/)\). In this case the second pair are not on a collision course.
For all to be annihilated we must have a collision between bullets 1 and 2 and then a collision between 0 and 3.

We first work out the minimum velocity of \(v_0\) required to ensure that bullets 0 and 1 do not collide before the collision of bullets 1 and 2.
With a little bit of algebra (and/or geometry) we find that minimum velocity to be: \[\beta=\frac{v_1 v_2}{2 v_2 - v_1}\] So we have the constraints:
\[0 < v_2 < 1 \] \[0 < v_1 < v_2 \] \[\beta < v_0 < v_2 \] \[v_0 < v_3 < v_2 \]
Note that the probability density function of each of the velocities \(v_i\) is 1 inside the range 0 to 1. And it is zero outside.
The (subtle) difference between the condition \( v_2 < 1 \) and \( v_2 \leq 1 \) is on a set of zero measure which won't affect our results.
Integrating over the probability space we find: \[\Phi (**,|/)=\int_0^1 dv_2 \int_0^{v_2}dv_1 \int_{\beta}^{v_2}dv_0 \int_{v_0}^{v_2}dv_3 \] Doing the \(dv_3\) integral we find: \[\Phi (**,|/)=\int_0^1 dv_2 \int_0^{v_2}dv_1 \int_{\beta}^{v_2}dv_0 ( v_2 - v_0) \] Now doing the \(dv_0\) integral we find: \[\Phi (**,|/)=\int_0^1 dv_2 \int_0^{v_2}dv_1 \left( \frac{v_2^2}{2} - \frac{v_1 v_2^2}{2v_2 - v_1} + \frac{1}{2} \left( \frac{v_1 v_2}{2v_2 - v_1} \right)^2\right) \] To evaluate the \(dv_1\) integral takes a bit of effort.
After doing the work we find: \[\Phi (**,|/)=\int_0^1 dv_2 \ v_2^3 \left[ 3 - 4 \ln(2)\right] \] and so: \[\Phi (**,|/)= \frac{3}{4} - \ln(2) \approx 0.0568528 \]
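As a quick numerical sanity check on that value, we can sample the four speeds uniformly in R and apply the constraints listed above:

```r
# Monte-Carlo check that Phi(**, |/) is about 3/4 - ln(2)
set.seed(42)
n  <- 1e6
v0 <- runif(n); v1 <- runif(n); v2 <- runif(n); v3 <- runif(n)

beta <- v1 * v2 / (2 * v2 - v1)   # the minimum v0 derived above
hit  <- (v1 < v2) & (v0 > beta) & (v0 < v2) & (v0 < v3) & (v3 < v2)

mean(hit)       # close to 3/4 - log(2) = 0.0568...
```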
Now we’ll evaluate \(\Phi (\bigwedge,\bigwedge)\).
In this case \(v_0 < v_1\) and \(v_2 < v_3\).
Suppose it is also true that \(v_1 < v_2\).
In that case we can deduce that \(v_0 < v_3\)
Thus if bullets 1 and 2 collide then so too will bullets 0 and 3.
And so all 4 will have been annihilated.

On the other hand, if bullets 1 and 2 don't collide, then we know that the first pair will collide and so will the second pair.
Either way all 4 bullets will be annihilated.
\[Prob(v_0 < v_1)= \frac{1}{2} \] and \[Prob(v_2 < v_3)= \frac{1}{2} \] So \[\Phi (\bigwedge,\bigwedge)=\frac{1}{2} \frac{1}{2} = \frac{1}{4}\]
Now we move on to: \(\Phi (|/,\bigwedge)\)
We're dealing with the case when the first pair is not on a collision course, but the second pair are. For all 4 bullets to be annihilated we require bullets 1 and 2 to collide and then bullets 0 and 3 to collide. We have conditions: \[v_1 < v_0 < v_3 \] \[v_1 < v_2 < v_3 \] However, if we just integrate over those ranges, then we'll have over-counted, because we will have also included a contribution from the case when bullets 2 and 3 collide before bullets 1 and 2. And when that happens bullets 0 and 1 won't be annihilated. So we need to subtract a bit, which we'll label \(\phi\)
Thus: \[\Phi (|/,\bigwedge) =\left( \int_0^1 dv_3 \int_0^{v_3}dv_1 \int_{v_1}^{v_3}dv_0 \int_{v_1}^{v_3}dv_2 \right) - \phi \] doing that quadruple integral we find: \[\Phi (|/,\bigwedge) =\frac{1}{12} - \phi \]


Showing the over-counting when bullets 0 and 1 are not annihilated.

Now we turn to that extra bit ( \(\phi\) ).
We have the same constraints as above: \[v_1 < v_0 < v_3 \] \[v_1 < v_2 < v_3 \] but we have some extra constraints, which we'll now look into. We work out the time of collision between 1 and 2 to be: \[T_{12}=\frac{2v_2 - v_1}{v_2 - v_1}\] In this extra bit, that we've over-counted, bullet 3 will have already overtaken bullet 2 before the collision between 1 and 2. I.e. bullet 3 will have passed the (1,2) collision at time \(T_{12}\):
\[v_2(T_{12} -2) < v_3 ( T_{12} - 3 )\] That implies: \[ v_1 v_2 < v_3 ( 2v_1 - v_2) \] and we must have: \[2 v_1 > v_2 \] and \[v_3 > \frac{v_1 v_2}{2v_1 - v_2}\]
The lower limit of the integral over \(v_3\) will be \(\frac{v_1 v_2}{2v_1 - v_2} \), so we need to ensure that: \[ \frac{v_1 v_2}{2v_1 - v_2} < 1 \] and so we require: \[v_1 > \frac{v_2}{2-v_2} \]
Thus \[\phi = \int_0^1 dv_2 \int_{\frac{v_2}{2-v_2}}^{v_2}dv_1 \int_{\frac{v_1 v_2}{2v_1 - v_2}}^{1}dv_3 \int_{v_1}^{v_3}dv_0 \] So: \[\phi = \int_0^1 dv_2 \int_{\frac{v_2}{2-v_2}}^{v_2}dv_1 \int_{\frac{v_1 v_2}{2v_1 - v_2}}^{1}dv_3 (v_3 - v_1) \] and hence: \[\phi = \int_0^1 dv_2 \int_{\frac{v_2}{2-v_2}}^{v_2}dv_1 \left( \frac{1}{2} - v_1 + \frac{v^2_1 v_2}{2v_1 - v_2} - \frac{1}{2} \left( \frac{v_1 v_2}{2v_1 - v_2} \right)^2 \right) \] It does indeed take a bit of work to show that it evaluates to: \[\phi = \int_0^1 dv_2 \left( \frac{v_2}{2} - \frac{5v_2^2}{8} + \frac{v_2^3}{2} + \frac{v_2^3-4v_2}{8(2-v_2)} + \frac{2v_2^2-3v_2^3+v_2^4}{4 (2-v_2)^2} \right)\] (To work out the integral this link may be helpful. )
We find: \[\phi= \frac{6}{24} - \frac{5}{24} + \frac{3}{24} - \frac{4}{24} + \frac{17}{24} - \ln(2)\] So we end up with: \[\phi= \frac{17}{24} - \ln(2)\] and we now have: \[ \Phi (|/,\bigwedge) = \frac{1}{12} - \left( \frac{17}{24} - \ln(2) \right) = \ln(2) - \frac{5}{8} \] We return to the probability that all 4 bullets will be annihilated: \[A_4= \left[ \Phi (**,|/) \right] + \Phi (\bigwedge,\bigwedge) + \left\{ \Phi (|/,\bigwedge) \right\} \] We now have evaluated all the terms on the right hand side: \[A_4= \left[ \frac{3}{4} - \ln(2) \right] + \frac{1}{4} + \left\{ \ln(2) - \frac{5}{8} \right\} \] And so: \[A_4 = \frac{3}{8}\]
So we now know that, when 4 bullets are fired, the probability that all will be annihilated is \(\frac{3}{8}\)

Let's not integrate:

I am aware that for some people, when they are imagining a fun evening, thoughts of doing quadruple integrals by the fireside do not enter their mind. For such individuals, we offer an alternative approach to working out \(A_4\) without doing any integration at all.
We write the probability that all 4 will be annihilated: \[A_4=\Phi \left( \bigwedge , \bigwedge \right) + \Phi \left( |/ , |/ \right) + \Phi \left( |/ , \bigwedge \right) + \Phi \left( \bigwedge, |/ \right) \]
\( \Phi \left( \bigwedge , \bigwedge \right) \) is the contribution from the case when
\( (v_0 < v_1 ) \) and \( (v_2 < v_3 ) \)
We have already worked that out above, without using integration and found it to be: \[ \Phi \left( \bigwedge , \bigwedge \right) = \frac{1}{4} \]
Now moving on to \( \Phi \left( |/ , |/ \right)\) which is the contribution from the case when
\( (v_1 < v_0 ) \) and \( (v_3 < v_2 ) \)
When that is true, all 4 bullets will be annihilated when we have the additional conditions:
\( (v_0 < v_3 ) \) and \( (v_1 < v_3 ) \)
So we have the conditions: \[ v_1 < v_0 < v_3 < v_2 \] There are \(4!=24\) ways in which the 4 \(v_i \)s can be ordered. Each has the same probability \( \frac{1}{24} \). Only 1 of the 24 permutations obeys the conditions that we have here and so we can deduce that: \[\Phi \left( |/ , |/ \right) = \frac{1}{24} \]
Now moving on to: \( \Phi \left( |/ , \bigwedge \right) \). Remember, it is the contribution to \(A_4 \) from the case when the first pair is not on a collision course, but the second pair are. So we have conditions:
\( (v_1 < v_0 ) \) and \( (v_2 < v_3 ) \)
for all to be annihilated, we have additional constraints: \( (v_0 < v_3 ) \) and \( (v_1 < v_2 ) \)
and we also require that bullet 2 does not collide with 3 before 1.
After a bit of work, we find that the condition is: \[ 2 v_1 v_3 < v_2 ( v_1 + v_3) \] We'll denote probabilities by \(P\left\{\cdot\right\}\)
So we have: \[ \Phi \left( |/ , \bigwedge \right) = P \left\{ (v_1 < v_0 ) \& (v_2 < v_3 ) \& (v_0 < v_3 ) \& (v_1 < v_2 ) \& \left[ 2 v_1 v_3 < v_2 ( v_1 + v_3) \right] \right\} \] We'll return to that in a moment. But first we'll examine \( \Phi \left( \bigwedge, |/ \right) \). Using a similar argument we find that: \[ \Phi \left( \bigwedge, |/ \right) = P \left\{ (v_0 < v_1 ) \& (v_3 < v_2 ) \& (v_0 < v_3 ) \& (v_1 < v_2 ) \& \left[ 2 v_0 v_2 > v_1 ( v_0 + v_2) \right] \right\} \] We now note that the \(v_i\) are independent, identically distributed random variables and so we could permute the indices on the right hand side of the above equations without changing the value. On the right hand side of the equation immediately above change: \[i_{new} = (i_{old} + 1) \bmod \ 4\] And so we have: \[ \Phi \left( \bigwedge, |/ \right) = P \left\{ (v_1 < v_2 ) \& (v_0 < v_3 ) \& (v_1 < v_0 ) \& (v_2 < v_3 ) \& \left[ 2 v_1 v_3 > v_2 ( v_1 + v_3) \right] \right\} \] Now note that the right hand side of that equation contains the term: \[ 2 v_1 v_3 > v_2 ( v_1 + v_3)\] And the formula above for \( \Phi \left( |/, \bigwedge \right) \) contains the same term with the '>' switched to a '<'. When we're working out the probabilities we can ignore the '=' case, since that is on a set of zero measure.
Now note that generally speaking: \[P\left\{K \& L \right\} + P\left\{K \& \ not(L) \right\} = P\left\{K \right\}\] And so we find: \[\Phi \left( \bigwedge, |/ \right) + \Phi \left( |/, \bigwedge \right) = P \left\{ (v_1 < v_0 ) \& (v_2 < v_3 ) \& (v_0 < v_3 ) \& (v_1 < v_2 ) \right\}\] Of the 24 permutations of \(v_i\), we find that just 2 obey those conditions and so \[\Phi \left( \bigwedge, |/ \right) + \Phi \left( |/, \bigwedge \right) = \frac{2}{24} = \frac{1}{12}\] Note that we have not worked out what each of the terms on the left hand side is equal to, but we have worked out their sum, which is what we needed.
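Both of those permutation counts can be verified by brute force in base R, enumerating the 24 equally likely rank orderings:

```r
# all 24 rank orderings of (v0, v1, v2, v3)
g     <- as.matrix(expand.grid(v0 = 1:4, v1 = 1:4, v2 = 1:4, v3 = 1:4))
perms <- g[apply(g, 1, function(p) length(unique(p)) == 4), ]

# Phi(|/, |/):  v1 < v0 < v3 < v2
nA <- sum(perms[, "v1"] < perms[, "v0"] & perms[, "v0"] < perms[, "v3"] &
          perms[, "v3"] < perms[, "v2"])

# the combined case:  v1 < v0, v2 < v3, v0 < v3, v1 < v2
nB <- sum(perms[, "v1"] < perms[, "v0"] & perms[, "v2"] < perms[, "v3"] &
          perms[, "v0"] < perms[, "v3"] & perms[, "v1"] < perms[, "v2"])

c(nA, nB) / nrow(perms)    # 1/24 and 2/24
```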
Now returning to \(A_4\) we find: \[A_4=\frac{1}{4}+ \frac{1}{24}+ \frac{1}{12} = \frac{3}{8}\]

Friday, June 13, 2014

Optimal Overhedge

Consider a contract (contingent claim) whose payoff depends on the stock price at maturity, \(P(S_T)\). The value of such a contract at time t with \(t\leq T\) will be: \[V_t= \mathbb{E}_t\left\{ P(S_T) \right\} \] (For now we're assuming zero interest rates.)
If a firm sells such a contract and wishes to hedge the risk, one idea would be to buy \(\Delta\) units of the stock \(S_t\); thus the net portfolio value would be: \[Q_t= S_t \Delta - V_t \] To find the exposure (sensitivity) of the portfolio to the stock we could take the partial derivative with respect to \(S_t\): \[\frac{\partial Q_t}{ \partial S_t} = \Delta - \frac{\partial V_t}{\partial S_t} \] So if \[ \Delta = \frac{\partial V_t}{\partial S_t} \] then \[\frac{\partial Q_t}{ \partial S_t} = 0\] In which case, at that instant we're hedged against small moves in the stock price.

Leaving aside any formal mathematics for a moment...
Broadly speaking, when the payoff \(P(S_T)\) has bumps and kinks we find that the contract valuation \(V_t(S_t)\) is a significantly smoother function, since the value \(V_t\) is an expectation, so the kinks get averaged out. That said: \[\lim_{t\to T}V_t(S_t)=P(S_T)\] and also \[\lim_{t\to T}\Delta_t=\lim_{t\to T}\frac{\partial V_t}{\partial S_t}=\frac{\partial P}{\partial S_T}\] Thus when we have constraints on the delta (first derivative) and indeed gamma (second derivative) of \(V_t\), we need to be sure that the constraints hold for the payoff \(P(S_T)\).

One practical problem with having high gamma is that it means the delta will change a lot for small changes in the underlying. So if a trader is attempting to keep a portfolio delta neutral, he'll need to rebalance frequently and the rebalances will be large. In such a scenario there is the potential to lose a significant amount of money on transaction costs. Also high gamma normally comes with high theta ( time sensitivity ) and vega ( sensitivity to volatility).

Suppose the contract that we wish to hedge is a barrier option with payout at time T: \[P(S_T)= \begin{cases} 0 & \text{when } S_T < K \\ N & \text{when } S_T \geq K \end{cases} \] This is a step function. It is not continuous and the partial derivative \(\frac{\partial P}{\partial S_T}\) does not exist at \(S_T=K\).
Thus it seems we won't be able to hedge away the exposure to \(S_T\).
So what do we do?
Rather than hedge the actual contract, we could make a synthetic contract which does have nicely behaved partial derivatives. Consider a payoff which consists of the real payoff \(P(S_T)\) plus an "overhedge" \(H(S_T)\), such that \[H(S_T) \geq 0\ \ \forall S_T\] Note that the client does not receive \(H(S_T)\) but the contract we will hedge does include a contribution from it. When we work out a price that includes \(H(S_T)\) it will be higher than it would have been without the overhedge. But to be able to hedge the risks we need a "reasonable" contract, which is nicely behaved.
What we want is a minimal overhedge \(H(S_T)\) such that the delta and gamma are contained.
Let the synthetic contract with the overhedge have payoff: \[\Phi (S_T)= P(S_T) + H(S_T)\] (Why did we choose that Greek letter? Well adding P and H we get PHI )
We're going to insist that the first derivative ( \(\frac{\partial \Phi}{\partial S_T}\) )is continuous, because we want the second derivative ( gamma ) to exist and not be infinite.

Let's suppose we have the following caps and floors as constraints: \[-\Delta_F \leq \frac{\partial \Phi}{\partial S_T} \leq \Delta_C\] and \[-\Gamma_F \leq \frac{\partial ^2 \Phi}{\partial S_T^2} \leq \Gamma_C\]
With \(\Delta_F \geq 0 \), \(\Delta_C \geq 0 \)
and \(\Gamma_F \geq 0 \), \(\Gamma_C \geq 0 \)

So now the question is: what is the minimal overhedge which obeys those constraints?

Well, for an analogy, suppose S were time and H were position.
We are starting at rest at position N.
We wish to get home in our car as quickly as possible to position H=0
and we have a maximum acceleration, a maximum deceleration and a speed limit.
Then what should we do?
Clearly we should use maximum acceleration until we reach the speed limit.
We should then continue at the speed limit for as long as possible until near home
and then apply the brakes, using maximum deceleration until we neatly come to rest at home.

Similarly in this case we break H up into 3 sections:
\(H_1(S_T)\) is the accelerating phase, ( quadratic in \(S_T\))
\(H_2(S_T)\) is the linear phase ( i.e. at the speed limit).
Then \(H_3(S_T)\) is the decelerating phase.
After (quite) a bit of algebra we find:
\(H_1(S_T)=N-(S_T-K)^2\ \Gamma_F\ \ \ \ \ \ \ \ \ \ \ \ \) in the domain: \(K \leq S_T < \Lambda_1\)
\(H_2(S_T)= H_1(\Lambda_1)-(S_T-\Lambda_1)\Delta_F\ \ \ \ \) in the domain: \(\Lambda_1 \leq S_T < \Lambda_2\)
\(H_3(S_T)=(S_T-\Lambda_3)^2\ \Gamma_C\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \) in the domain: \(\Lambda_2 \leq S_T \leq \Lambda_3\)

With:
\[\Lambda_1=K+\frac{\Delta_F}{2 \Gamma_F}\] \[\Lambda_2=\Lambda_1 + \frac{H_1(\Lambda_1)}{\Delta_F}-\frac{\Delta_F}{4\Gamma_C}\] \[\Lambda_3=\Lambda_1 + \frac{H_1(\Lambda_1)}{\Delta_F}+\frac{\Delta_F}{4\Gamma_C}\] We can see that: \[\Lambda_3=\Lambda_2+\frac{\Delta_F}{2\Gamma_C}\]
Note that \(H(S_T)\) is zero before K and after \(\Lambda_3\).
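The three pieces can be sketched directly in R from the formulas above; the numerical values of N, K, \(\Delta_F\), \(\Gamma_F\) and \(\Gamma_C\) below are purely illustrative assumptions, chosen so that \(H_1(\Lambda_1) > 0\). The sketch also lets us check that the pieces join up continuously:

```r
# illustrative parameter values (assumptions, not from any real contract)
N      <- 1
K      <- 1
DeltaF <- 1
GammaF <- 2
GammaC <- 2

Lambda1 <- K + DeltaF / (2 * GammaF)
H1atL1  <- N - (Lambda1 - K)^2 * GammaF           # H_1 evaluated at Lambda_1
Lambda2 <- Lambda1 + H1atL1 / DeltaF - DeltaF / (4 * GammaC)
Lambda3 <- Lambda1 + H1atL1 / DeltaF + DeltaF / (4 * GammaC)

# the piecewise overhedge H(S_T)
overhedge <- function(S) {
  ifelse(S < K,        0,
  ifelse(S < Lambda1,  N - (S - K)^2 * GammaF,    # accelerating phase
  ifelse(S < Lambda2,  H1atL1 - (S - Lambda1) * DeltaF,   # linear phase
  ifelse(S <= Lambda3, (S - Lambda3)^2 * GammaC, 0))))    # decelerating phase
}

overhedge(c(Lambda1, Lambda2, Lambda3))   # H joins up at the transition points
```

At \(\Lambda_2\) the parabola \(H_3\) is tangent to the line \(H_2\), which is why the two branches agree there.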

To derive the above equations we use the fact that, when we set: \[H_2(S_T)-H_3(S_T)= 0\] we have a quadratic in \(S_T\) and that quadratic should have a single root at \(S_T=\Lambda_2\).
I.e. the parabola \(H_3(S_T)\) is a tangent to the line \(H_2(S_T)\) at \(S_T=\Lambda_2\).

Returning to the analogy of using the car to get home quickly: Suppose you start off near home and the acceleration in the car is not very high. Then perhaps you'll need to start decelerating before you reach the speed limit.
In our case that happens when \[\Lambda_1 > \Lambda_2 \] So now \(H_2\) doesn't come into play and we need to find where the transition from \(H_1\) to \(H_3\) occurs. I'll leave it as an exercise for the reader to work it out.

Question for the reader:
If you make some "reasonable" assumptions on the distribution of \(S_T\)
and given that \[\frac{\partial^2 P}{\partial S_T^2} \leq \Gamma_C \ \ \ \ \ \forall S_T\] then show that \[\frac{\partial^2 V_t}{\partial S_t^2} \leq \Gamma_C \ \ \ \ \ \forall S_t \ \text{and} \ t\] What assumptions did you need to make?
Suggestion: you might consider starting with the log-normal distribution case without drift or interest rates, then generalize.


Summary:
So we have derived the optimal overhedge for a barrier option subject to constraints on delta and gamma. And the synthetic contract which contains the overhedge is continuous and its first derivative is continuous.

Tuesday, June 3, 2014

Changing off diagonal elements in a Covariance Matrix

If you try to change one off-diagonal element of a positive definite covariance matrix, you may find that the adjusted matrix is no longer positive definite. When one element is changed, others often need to be adjusted too, but how do we do this?

Here's a suggestion that was originally developed for looking at exchange rate ( FX ) covariance matrices:

Let \(x^b_a\) be the cost of 1 unit of currency 'b' quoted in currency 'a'.

So we have the standard FX relations:
\[x^a_b=\frac{1}{x^b_a}\] and
\[x^a_b={x^a_c} {x^c_b}\] We will start with a simple model without drift and zero interest rates. Suppose we have a stochastic model:
\[\frac{dx^a_b}{x^a_b}=\sigma^a_b dw^a_b\] with
\[\mathbb{E}(dw^a_b)=0\] and
\[\mathbb{E}\left((dw^a_b)^2\right)=dt \]
Suppose we have a base currency (numeraire) 'a', then we can write: \[\sigma^{bc}_a=\mathbb{E}\left(\frac{dx^b_a}{x^b_a} \frac{dx^c_a}{x^c_a}\right)\frac{1}{dt}\]
eqn (i)
The diagonal of the covariance is the vol squared: \[\sigma^{bb}_a=(\sigma^b_a)^2\]
If we change the numeraire, then what’s the effect on the covariance?
To work it out we start by considering: \[\frac{dx^c_d}{x^c_d}=\frac{x^d_a}{x^c_a}d\left(\frac{x^c_a}{x^d_a}\right)=\frac{dx^c_a}{x^c_a}-\frac{dx^d_a}{x^d_a}+\Lambda\] Using Ito’s lemma we find that \(\Lambda\) only contains terms that are either higher order or are non-stochastic, but when we are looking at the covariance, all of \(\Lambda\) can be ignored. And so: \[\sigma^{bc}_d=\mathbb{E}\left(\frac{dx^b_d}{x^b_d} \frac{dx^c_d}{x^c_d}\right)\frac{1}{dt}=\mathbb{E}\left(\left(\frac{dx^b_a}{x^b_a}-\frac{dx^d_a}{x^d_a}\right)\left(\frac{dx^c_a}{x^c_a}-\frac{dx^d_a}{x^d_a}\right)\right)\frac{1}{dt}\]
eqn (ii)
Now using eqn (i) we find: \[\sigma^{bc}_d=\sigma^{bc}_a-\sigma^{bd}_a-\sigma^{cd}_a+\sigma^{dd}_a\]
eqn (iii)

This is our change of numeraire formula.
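Eqn (iii) is easy to apply numerically. Below is a minimal sketch; the function name and the convention that the matrix is indexed over every currency, with the numeraire's own row and column identically zero, are my own assumptions for illustration, not from the derivation above.

```python
import numpy as np

def change_numeraire(cov, d):
    """eqn (iii): sigma^{bc}_d = sigma^{bc}_a - sigma^{bd}_a - sigma^{cd}_a + sigma^{dd}_a

    cov is indexed over every currency; the current numeraire's
    row and column are identically zero."""
    col = cov[:, [d]]   # sigma^{bd}_a as a column
    row = cov[[d], :]   # sigma^{cd}_a as a row
    return cov - col - row + cov[d, d]

# toy covariance with numeraire 'a' at index 0
rng = np.random.default_rng(42)
returns = 0.01 * rng.standard_normal((4, 250))
returns[0] = 0.0                 # the numeraire has zero return against itself
cov_a = returns @ returns.T / 250

cov_d = change_numeraire(cov_a, d=3)   # move to the currency at index 3
```

Note that the new numeraire's row and column come out identically zero, as they should.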

One little exercise that may be worth trying is to check that if we change the numeraire from 'a' to 'd' and then back again, we do indeed recover the original value. That is, take the four terms on the RHS of (iii) and change their numeraire to 'd'; you'll then have sixteen terms. After doing lots of cancellation, and noting that some of the terms are identically zero, you'll recover the original covariance, i.e. the LHS of (iii).

For a given numeraire ( base currency ) say ‘a’ we have a covariance matrix over the indices ‘b’ and ‘c’: \(\sigma^{bc}_a\)
Very often when working with a covariance matrix it is very helpful if it is positive definite.
Suppose we have a covariance matrix which we have estimated from historical returns and now we want to change one element, perhaps because we have some good implied vol data on that FX pair. Changing one element in isolation is problematic, since it is very easy to lose the positive-definite (PD) attribute. If you change one element then there should be a ripple effect on neighbouring elements.
But how do we evaluate that ripple effect?

For a start we could break up the covariance matrix into a set of vols and correlations: \[\sigma^{bc}_a=\sigma^b_a\sigma^c_a\rho^{bc}_a\] If we adjust a vol (to some non-zero value) while leaving the correlations alone, then the PD attribute will be preserved. Note that the diagonal elements of the covariance are just the vols squared.

So, if we want to change an off-diagonal element, what we do is change the numeraire so that that element comes onto the diagonal.
We then split up the new covariance matrix into vols and correlations.
We update the vol, leaving the correlations fixed, reform the covariance matrix, and then return to the original numeraire.
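The three steps above can be sketched in code. This is a minimal illustration under my own conventions (the matrix is indexed over every currency with the numeraire's row and column zero; the function names are made up for this sketch):

```python
import numpy as np

def change_numeraire(cov, d):
    # eqn (iii): sigma^{bc}_d = sigma^{bc}_a - sigma^{bd}_a - sigma^{cd}_a + sigma^{dd}_a
    return cov - cov[:, [d]] - cov[[d], :] + cov[d, d]

def set_cross_vol(cov, b, c, new_vol, a=0):
    """Change the vol of the pair b/c by moving to numeraire c,
    where that pair's variance sits on the diagonal."""
    cov_c = change_numeraire(cov, c)
    vols = np.sqrt(np.diag(cov_c))
    safe = np.where(vols > 0, vols, 1.0)     # avoid 0/0 on the numeraire's own row
    corr = cov_c / np.outer(safe, safe)      # split into vols and correlations
    vols[b] = new_vol                        # update the vol, correlations fixed
    cov_c_new = corr * np.outer(vols, vols)  # reform the covariance
    return change_numeraire(cov_c_new, a)    # return to the original numeraire

# toy example: numeraire 'a' at index 0, so row/column 0 are zero
rng = np.random.default_rng(7)
x = 0.01 * rng.standard_normal((4, 500))
x[0] = 0.0
cov_a = x @ x.T / 500
new_cov = set_cross_vol(cov_a, b=1, c=2, new_vol=0.25)
```

Since the change of numeraire is a congruence transform of the matrix, and rescaling one vol with correlations fixed is another, positive semi-definiteness survives each step in this sketch.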

The question for the reader is: when (if ever) will the above process not preserve the PD attribute?

Monday, April 28, 2014

Surnames on an island



On an isolated island each person has
1/5 probability of having 0 kids
1/5 probability of having 1 kid
1/5 probability of having 2 kids
1/5 probability of having 3 kids
1/5 probability of having 4 kids

Heterosexual partners are chosen at random from the population within the given generational cohort.

It is an old-fashioned place where the men pass on their surname to their children; the women don't.

Initially, the surname Klimpt is not very common. Just 1.00% of the total population.

After 100 generations, what is the probability that there will be at least one person called Klimpt in the youngest generation if:
      (i) the initial generational cohort had 400 members
or   (ii) the initial generational cohort had 4,000 members

Express your answer as a percentage with two digits after the decimal point.

Assume that the starting population is half men, half women and that there is a 50-50 chance that each child born is a boy or girl.

Hint: you are probably going to need a computer for this one.
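One possible line of attack (a simplification I'm assuming here, not the intended solution): ignore the pairing details and treat each male line as an independent Galton-Watson branching process, where a man's number of sons is Binomial(K, 1/2) with K uniform on {0,...,4}, and where 1% of a 400-person cohort gives roughly 2 initial male Klimpts. Iterating the probability generating function then gives the chance the name survives a given number of generations:

```python
def male_line_survival(n_generations, n_initial_males):
    """P(at least one male-line descendant after n_generations),
    starting from n_initial_males independent male lines."""
    def pgf(s):
        # sons per man: K ~ Uniform{0..4} children, each a boy w.p. 1/2,
        # so the pgf is (1/5) * sum_{k=0}^{4} ((1+s)/2)^k
        t = (1.0 + s) / 2.0
        return sum(t**k for k in range(5)) / 5.0

    q = 0.0  # P(one line is extinct after 0 generations)
    for _ in range(n_generations):
        q = pgf(q)
    return 1.0 - q**n_initial_males

p_small = male_line_survival(100, 2)    # the 400-person cohort
p_large = male_line_survival(100, 20)   # the 4,000-person cohort
```

Whether sidestepping the fact that children are produced by couples is an acceptable approximation is itself part of the puzzle.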

The Prince's Ancestors

Part A:
Once upon a time, there lived a fine young prince. He was next in line to the throne in a dynasty that stretched back for centuries. He had official records showing that his father's father's father's... 20 generations back had effectively created a country by unifying warring tribes.

Though it should be noted that his male line ancestors were not always very good husbands, and indeed their wives sometimes had lovers when the king was away fighting wars.

At each generation it was estimated that there was a 10% chance that the father named on the birth certificate was not in fact the biological father.

What is the probability that the man the prince thinks is his father's father's father ... 20 generations back on the male line is indeed his true biological male-line ancestor?

Background maths level required: high-school.
Difficulty level: fairly straight-forward.
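Part A reduces to a single product: the prince's belief is correct only if every one of the 20 links in the chain is genuine, each independently with probability 0.9. A one-line check:

```python
# probability that all 20 father-son links are genuine,
# each link independently correct with probability 0.9
p_chain = 0.9 ** 20
print(f"{100 * p_chain:.2f}%")   # → 12.16%
```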



Part B: 
Over the course of every generation, immigration caused a 10% population increase.

The true figures for the number of biological children each person had are as follows:
20% of people had no children that survived
20% of people had 1 child
20% of people had 2 children
20% of people had 3 children
20% of people had 4 children

Assume that when people paired off to reproduce, their partner was effectively just randomly chosen from their generation of population.

At the time of the founding king, his generation accounted for 1 million people, though the full population of the country was higher if we include the older people who were alive at the time.

What is the probability that the founding king was indeed an ancestor of the young prince?
Please give your answer as a percentage with two digits after the decimal point.

Level of maths required: some basic university maths is probably required.
Level of difficulty: for someone who can write a bit of code, fairly straight-forward


A witch with a compass

Wilma the Witch was given a new electronic compass that could tell her which way was north. It pointed to true north rather than magnetic north.

So she looked at her compass, found out which direction was north-east, put the compass away in her pocket and set out on her broom in a north-easterly direction. She continued as straight as she could over the surface of the earth for 1,000 km, without adjusting left or right.

She then took out her compass and found that even though she had started out by heading north-east and had continued going straight she was no longer exactly on a north-east bearing.

So she now found out which way was east and headed east. This time, for a change, she kept looking at her compass and diligently maintained a constant bearing (i.e. she kept going exactly east) and continued for 6,000 km.

Then she turned to the south-east and headed on a constant south-east bearing. After a while she found herself back at the starting point. She noted that she had crossed every line of longitude exactly once.

She decided to continue on a constant south-east bearing; she crossed the equator and eventually found herself at the south pole.

She then stopped.

How far did she travel in total?

You can assume that the earth is a sphere and that the distance from the equator to either pole is 10,000 km.
Please also ignore any distance covered due to altitude gain traversing mountains.

Level of background mathematics required: some undergraduate university maths
Question difficulty level: a tad tricky.
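A hint for the constant-bearing legs (a standard fact about rhumb lines, not a full solution): on a sphere of radius R, a path holding a fixed bearing \(\alpha\) from true north satisfies \[ds \cos \alpha = R \, d\phi\] so the length of such a path between latitudes \(\phi_1\) and \(\phi_2\) is \(R(\phi_2-\phi_1)/\cos\alpha\), no matter how many times it winds around the pole. This is valid for \(\alpha \neq 90^\circ\); a due-east leg instead follows a circle of latitude.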

Sunday, April 27, 2014

Average number of matches won

In Wimbledon's women's singles tournament there are 128 players. It is a knock-out tournament. I.e. after each round the losers are excluded from the next round.
The first round losers will have a win percentage of 0%.
The overall winner will have a 100% win percentage.

Part A: What's the average win percentage (assuming each player is weighted equally)?

Part B: What's the average win percentage if each player is weighted proportional to how many matches she played?

Level of background mathematics required: high-school.
Difficulty level: straight-forward.
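Both parts can be checked by a short enumeration: in a 128-player knockout there are 7 rounds, \(2^{7-k}\) players exit after losing in round \(k\) having won \(k-1\) of their \(k\) matches, plus the champion with 7 wins from 7. A sketch (the function name is my own):

```python
def win_percentages(n_players=128):
    """List of (win fraction, matches played) for each player in a knockout."""
    rounds = n_players.bit_length() - 1        # 7 rounds for 128 players
    players = []
    for k in range(1, rounds + 1):
        losers = n_players >> k                # players eliminated in round k
        players += [((k - 1) / k, k)] * losers
    players.append((1.0, rounds))              # the champion won every match
    return players

players = win_percentages()
part_a = sum(w for w, m in players) / len(players)
part_b = sum(w * m for w, m in players) / sum(m for w, m in players)
```

Part B has a neat closed form worth looking for before running anything: every match played contributes exactly one win and two player-matches.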

Thursday, April 24, 2014

Maths puzzle: a bit of trig

Here's a little mathematics puzzle.



Suppose points A, B, C are all on a circle, and we have a point P outside the circle such that:
the line that passes through A and P is a tangent to the circle (at A)
the line that passes through B and P is a tangent to the circle (at B)

Also the line segments PA and BC are parallel.

Suppose |AP| = 3 and |BC| = 2.

What is |AC|?

Background level of mathematics required: high-school
Difficulty level: by high-school standards it is a bit tricky.


In a while I'll post the answer ...