Saturday, April 1, 2017

Efficient machine learning

Suppose we have a set of data from which we wish to train a machine learning algorithm and the data keeps growing and growing. We'd like an efficient algorithm that will put more weight on the recent results and we would like it to be straight forward to keep up dated, even if we have hundreds of millions of rows of data. What can we do? Well, in this post I will suggest an algorithm that is very efficient at dealing with large data sets. It can learn from old data, but it doesn't need the old data to be stored.
Let's start with the basics. Suppose we have a set of T input vectors \(\underline{x}^t \in \mathbb{R}^n\) with \[1 \le t \le T\] and for each \(\underline{x}^t\) there is a resultant \(y^t \in \mathbb{R} \).
We want to find the optimal vector \(\underline{\theta}\) so that \( \underline{\theta } . \underline{x}^t \) is a good predictor of \( y^t \).
We choose the \(\underline{\theta}\) which minimises the cost function: \[ J( \underline{\theta} ) = \frac{1}{2T} \sum^{T}_{t=1} ( \underline{\theta} . \underline{x}^t -y^t)^2 \] We could expand the inner product \( \underline{\theta} . \underline{x}^t \), which would give us: \[ J( \underline{\theta} ) = \frac{1}{2T} \sum^{T}_{t=1} \big[ \sum_{j=1}^n ( \theta_j x^t_j ) -y^t \big]^2 \] When we have an optimal \( \underline{\theta} \), the partial derivatives of J will be zero: \[ 0 = \frac{\partial J}{ \partial \theta_i} = \frac{1}{T} \sum^{T}_{t=1} x_i^t \big[ \sum_{j=1}^n ( \theta_j x^t_j ) -y^t \big] \] We can change the order of the summation and do some rearranging to obtain: \[ 0 = \sum_{j=1}^n \big[ \frac{1}{T} \sum_{t=1}^T ( x^t_i x^t_j ) \theta_j \big] - \frac{1}{T} \sum_{t=1}^T ( y^t x_i^t) \hspace{24 mm} (eqn 1) \] Now, if we define the matrix \(\underline{A}\): \[ A_{ij} = \frac{1}{T} \sum_{t=1}^T ( x^t_i x^t_j ) \hspace{24 mm} (eqn 2) \] and we define vector \(\underline{B}\) \[ B_i = \frac{1}{T} \sum_{t=1}^T ( y^t x_i^t) \hspace{24 mm} (eqn 3) \] Plugging those into equation 1, we find: \[ 0 = \sum_{j=1}^n \big[ A_{ij} \theta_j \big] - B_i \] If we can invert the matrix \(\underline{A}\) then we can obtain \( \theta \) \[ \underline{\theta} = \underline{A}^{-1} \underline{B} \] If we look back at equation 2, we see that each vector \( \underline{x}^t\) effectively has the same weighting.
We could rewrite equations 2 and 3 as: \[ A_{ij} = \sum_{t=1}^T ( x^t_i x^t_j \omega_t) \hspace{24 mm} (eqn 4) \] and \[ B_i = \sum_{t=1}^T ( y^t x^t_i \omega_t) \hspace{24 mm} (eqn 5) \] where \[ \omega_t = \frac{1}{T} \hspace{24 mm} \forall t: 1 \le t \le T \] However, suppose we wanted more recent values of \( \underline{x}^t\) to have higher weighting. We could fix the ratio of consecutive weights: \[ \frac{\omega_{t+1}}{\omega_t} = 1 + \lambda \] with \( \lambda > 0\)
So we would have: \[ \omega_t = \frac{\omega_T} { (1+\lambda)^{T-t}} \] Suppose want to find the speed at which the weight drops down to half the weight of the most recently added entry, then we would seek \(\tau\) such that \[ \frac{\omega_T}{\omega_{T - \tau}} = 2 \] which implies: \[ (1 + \lambda)^{\tau} = 2 \] so: \[ \lambda = 2^{1/ \tau } -1 \hspace{24 mm} (eqn 6) \] So, if we want the most recent data to have double the weighting of a point 1,000 rows back, then we would set \(\tau = 1000 \) and use equation 6 to determine \( \lambda \).
The parameter \( \lambda \) determines how much bigger the weights of the recent data will have, when compared with older data. When choosing it, it may be helpful to first choose \( \tau\) ( which is something like a half-life ) and then use equation 6 to evaluate \( \lambda \)

If we wish to preserve the condition that the sum of the weights is 1, then we can do the geometric sum and after a little algebra we find that: \[ \omega_T = \frac{\lambda}{(1+\lambda)^T-1} \hspace{24 mm} (eqn 7) \]
For a given set of \(\underline{x}^t \) and \(y^t\) with \(1 \le t \le T \)
we can evaluate \(\underline{A} \) and \( \underline{B}\).
Since they were generated with T rows of data, we could label them \(\underline{A}^T \) and \( \underline{B}^T\).
Now suppose we have already evaluated \(\underline{A}^{(T-1)} \) and \( \underline{B}^{(T-1)}\)
with T-1 rows of data and we want to introduce one more,
then we find: \[ A^T_{ij} = \omega_T x^T_i x^T_j + ( 1 - \omega_T ) A^{(T-1)}_{ij} \hspace{24 mm} (eqn 8) \] and \[ B^T_i = \omega_T y^T x^T_i + ( 1 - \omega_T ) B^{(T-1)}_i \hspace{24 mm} (eqn 9) \] where \( \omega_T \) has been defined in equation 7.
So when we have evaluated \( \underline{A} \) and \( \underline{B} \) for a given set of data and then later want to include the contribution from a new \( \underline{x}^T \) and \( y^T \), we can amend the existing \( \underline{A} \) and \( \underline{B} \) using equations 8 and 9, without the need to retrieve all the past \( \underline{x}^t \) and \( y^t \).
As a result, when data ( \( \underline{x}^T \) and \( y^T \) ) comes in, we can use it to update \( \underline{A} \) and \( \underline{B} \) and then we can discard the \( \underline{x}^T \) and \( y^T \). They have made their contribution to \( \underline{A} \) and \( \underline{B} \) and we can continually update \( \underline{A} \) and \( \underline{B} \) without the need to look back at old values of \( \underline{x}^t \) and \( y^t \). When T is very large, say in the hundreds of millions, using this algorithm to continually update the machine learning results is rather efficient. Since we don't need to keep retrieving the old data. We just keep the \( \underline{A} \) and \( \underline{B} \) up to date.

Monday, February 27, 2017

Backward Propogation Algorithm in a Neural Network

This post was inspired by the Coursera course on Machine Learning. In particular by the lesson in week 5. In that lesson the backward propagation algorithm is presented but the students were asked to take the formulae on faith. A derivation was not given. To rectify that, here I present a derivation of the main equations.

Suppose we have an input vector \( \underline{x} \) and we wish to make a prediction for the resultant vector \(\underline{y}\). We start by defining our activation level zero to be: \[ \underline{a}^0 = \underline{x} \] We define \[ z^{k+1}_i = \sum_{j}\theta^{k+1}_{ij} a^k_j \hspace{24 mm} (eqn 1) \] and we use the sigmoid function S(z) to evaluate: \[ a^{k+1}_i= S(z^{k+1}_i) \hspace{24 mm} (eqn 2) \] When we have \(\underline{a}^0\), we can use equations 1 and 2 to evaluate \(\underline{a}^1\).
We then use equations 1 and 2 again to evaluate \(\underline{a}^2\) and so on up to \(\underline{a}^L\)

Our forecast for \(\underline{y}\) is activation level L: \(\underline{a}^L\)

Suppose we are given a training set of T input vectors \(\underline{x}^t\)
and a corresponding set of resultant vectors \(\underline{y}^t\) where \(1 \leq t \leq T \).
For each input vector \(\underline{x}^t\) we will have a corresponding activation level L: \(\underline{a}^{Lt}\)
We can measure the discrepancy between \(\underline{y}^t\) and \(\underline{a}^{Lt}\). We'll call this D. We could have: \[ D(\underline{y}^t , \underline{a}^{Lt} )= \parallel \underline{y}^t - \underline{a}^{Lt} \parallel^2 \] For the rest of this posting we won't refer again to the right hand side of that equation. So if the discrepancy function D were defined differently, then it wouldn't make any difference to the formulae below.

When we sum up all the discrepancies and we'll call the result the cost function: \[ J = \frac{1}{2T} \sum_t D(\underline{y}^t , \underline{a}^{Lt} ) \] We are interested in finding the \(\theta\)s which minimise this cost function and so gives us accurate predictions.
One approach would be to find the partial derivatives \[ \frac{\partial J}{\partial \theta^k_{ij}} \] for all i,j,k and then use a gradient descent method to minimise J. \[ \frac{\partial J}{\partial \theta^k_{ij}} = \frac{1}{2T} \sum_t \frac{\partial D(\underline{y}^t , \underline{a}^{Lt})}{\partial \theta^k_{ij}} \hspace{24 mm} (eqn 3) \] We are now going to focus on the term: \[ \frac{\partial D(\underline{y}^t , \underline{a}^{Lt})}{\partial \theta^k_{ij}} \] For simplicity we are going to drop the t superscript. So we will write: \[ \frac{\partial D(\underline{y} , \underline{a}^L)}{\partial \theta^k_{ij}} \] where the t is implied. Though in the end, when we want to evaluate the partial derivative of J, we will need to do the sum over t.

We define \[ \delta^k_i = \frac{\partial D}{\partial z^k_i} \hspace{24 mm} (eqn 4) \] and when we apply the chain rule to the right hand side we get \[ \delta^k_i = \sum_j \frac{\partial D}{\partial z^{k+1}_j} \frac {\partial z^{k+1}_j}{\partial z^k_i} \] we substitute in \( \delta^{k+1}_j\) and we find: \[ \delta^k_i = \sum_j \delta^{k+1}_j \frac {\partial z^{k+1}_j}{\partial z^k_i} \hspace{24 mm} (eqn 5) \] Now we want to evaluate the last term \[ \frac {\partial z^{k+1}_j}{\partial z^k_i} \] to do that we first combine equations 1 and 2 to obtain: \[ z^{k+1}_j = \sum_l \theta^k_{jl} S(z^k_l) \] and so \[ \frac {\partial z^{k+1}_j}{\partial z^k_i} = \sum_l \theta^k_{jl} S'(z^k_l) I_{li} \hspace{24 mm} (eqn 6) \] where \[ S'(z) = \frac {d S(z) }{dz} \] and \[ I_{lj} = \{ 1 \hspace{4 mm} when \hspace{4 mm} l = j, \hspace{4 mm}otherwise \hspace{4 mm}0 \} \] So, returning to equation 6, we find only one term in the sum survives, i.e. when l = j.
Hence eqn 6 becomes: \[ \frac {\partial z^{k+1}_j}{\partial z^k_i} = \theta^k_{ji} S'(z^k_i) \hspace{24 mm} (eqn 7) \] When we substitute that into equation 5 we find \[ \delta^k_i = \sum_j \delta^{k+1}_j \theta^k_{ji} S'(z^k_i) \hspace{24 mm} (eqn 8) \] It follows from equation 1 that: \[ \frac{\partial z^{k+1}_i}{\partial \theta^k_{jl}} = a^k_l I_{ij} \hspace{24 mm} (eqn 9) \] Using the chain rule we find: \[ \frac{\partial D}{\partial \theta^k_{ij}} = \sum_l \frac{\partial D}{\partial z^{k+1}_l} \frac{\partial z^{k+1}_l}{\partial \theta^k_{ij}} \] Substitute in equation 9 and we find only one element of the sum survives and we obtain: \[ \frac{\partial D}{\partial \theta^k_{ij}} = \delta^k_j a^{k+1}_i \hspace{24 mm} (eqn 10) \] So we can go forward (increasing k's) using equations 1 and 2 to evaluate \( a^k_i\)
and then using equation 8, go backwards ( decreasing k's) to evaluate \( \delta^k_j \).
We will then be able to work out all the partial derivatives of D with respect to \( \theta\).
Remember there is an implied superscript t in equation 10. And to work out the partial derivatives of J we will need to do the sum over all t's as shown in equation 3.
After that, finding the optimal \( \theta \) 's using gradient descent will be as easy as sliding down a hill.

Wednesday, January 25, 2017

Solitaire of sorts

Suppose you had a well shuffled standard deck of 52 cards. From the top of the pack you turned over each card one at a time. As you turned over the first card, you called out 'Ace', then 2 for the next, followed by 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King. After the 14th card you called out 'Ace' again followed by 2, 3, 4 and so on till you reached the end.
What would be the probability that none of the cards that were turned over had the value (rank) that you called out?

Before working it, lets consider a related, but simpler question. Suppose you had 3 cards containing values 1,2,3. After you shuffled them, what would be the probability that none of the 3 were in the same position as they were at the start? In mathematical speak you'd ask what is the probability of a derangement.
In this case we can look at all 3! ( i.e 6 ) permutations. And we see that 2 out of 6 are derangements. So the probability of a derangement is 1/3.
You might be tempted to say that for each card, the probability that it is not in its original position is 2/3 and so for the 3 cards the total probability of a drangement is that to the power of 3, i.e. 8/27.
However the flaw with that method is that the probabilities are not independent.

Returning now to that deck of 52 with 4 suits and 13 ranks. There is of course more than one way to work it out. One method would be to write a Monte Carlo simulation. I've done that and the code is below.
Here is a sample output after 100 million simulations:

Found 1623551 survivors out of 1.0E8
So the survival probability is estimated to be: 0.01623551
with a standard deviation of: 1.2638005465673727E-5
Time elapsed: 77.742 seconds

So the probability of surviving to the end without getting any cards right is approximately 1.62%

OK, OK, MC is useful, but not very satisfying. You might think that you could just write a program to work out all 52! permutations and then work out the answer. But alas 52! is a very big number. So if you want an answer before you die, then it might be a good idea to try a different approach.

Well there is another way... In fact I'm sure there any many other ways. I present here a method that I used. Suppose we have slots numbered 1 to 13 and for simplicity we'll number the cards 1 to 13. At the start we have 13 different ranks and each have 4 cards and there are 4 slots available for each of the ranks.
We could represent that as a table of Ranks:
Cards0  1  2  3  4  

Now suppose we pick one card, then we'll have 12 card ranks with 4 cards remaining
and we'll have 1 card rank that has 3 cards remaining, but still 4 slots.
So in our table of ranks we'll represent that as:

Cards0  1  2  3  4  

But we'll have to put the card in one of the available slots. Of the 52 slots 48 are allowed.
When we do that we'll have
 1 rank with 4 slots and 3 cards remaining
 1 rank with 3 slots and 4 cards remaining
12 ranks with 4 slots and 4 cards remaining

In our table we represent that as:

Cards0  1  2  3  4  

Using such a table with the probabilities of the transitions, we can write some code that recursively solves the problem.

This is what the code looks like:


// check if the back slash char is causing problems

public class Calculator {
    public static void main ( String[] args){
        long startTime = System.currentTimeMillis();

        Calculator calc = new Calculator();

        calc.calc(); // calc() is the main calculator, calcDebug() is for debugging
        long   endTime     = System.currentTimeMillis();
        double elapsedSec  = ((double) endTime - (double) startTime) * 0.001d;
        System.out.println("\nElapsed time: " + Double.toString(elapsedSec) + " sec.");       
    public void calc(){
        System.out.println("Doing calc.");

        int suits = 4;
        int ranks = 9; // Total number of cards will be: suits * ranks
        State s = new State(suits, ranks);
        double survivalProb = s.getSurvivalProb();
        String key = s.getKey();
        String resStr = "\nFor key: " + key + " found prob to be: "  + String.format("%1.14f", survivalProb);
        String pathToFile = "C:/temp/res_" + key + ".txt";
        writeToFile(pathToFile, resStr );

    public void calcDebug(){
        System.out.println("Doing calc2.");
        //           "0000000000111111111122222"
        //           "0123456789012345678901234"
        String key = "1001010000000000100000000";

        State s = new State(key, 0);
        double survivalProb = s.getSurvivalProb();
        String resStr = "For key: " + key + " found prob to be: "  + String.format("%1.14f", survivalProb);
        String pathToFile = "C:/temp/res_" + key + ".txt";
        writeToFile(pathToFile, resStr );
    public void writeToFile( String pathToFile, String str){
            PrintWriter writer = new PrintWriter(pathToFile, "UTF-8");
        } catch(Exception e){
            System.out.println("Caught exception: " + e.getMessage());
    /*  Prob(1,4)  = 0.375
     *  Prob(2,2)  = 0.166667
     *  Prob(4,4)  = 0.011869
     *  Prob(4,8)  = 0.014967   ( takes 11.6 secs )
     *  Prob(4,13) = 0.016232

///////////////////////////////////////////////////////////////////////////////        ///////////////////////////////////////////////////////////////////////////////        ///////////////////////////////////////////////////////////////////////////////        

import java.util.Random;
import java.util.TreeMap;

// prob survival = sum ( numCardsLikeThis * numSafeLocations / TotalslotLocations
//                         * ProbSurvival( next))

// The key is a string 25 chars long,

public class State {
   private static final boolean            m_debug               = false; 
   private static final long               m_seed                = 1;
   private static final int                m_stepsBetweenCaching = 0; //

   private static int                      m_numSuits;
   private static TreeMap  m_map;
   private static Random                   m_rnd;

   private        String                   m_key;  
   private        double                   m_survivalProb;  
   private        int                      m_totalSlots;
   private        int                      m_depth;

   State ( int suits, int ranks){
       m_depth    = 0;
       m_numSuits = suits;
       String      zeros = new String(new char[(suits+1) * (suits+1)]).replace("\0", "0");
       String      key   = adjustKey(suits, suits, zeros, ranks);
       initialize( key );    
       if ( m_debug) printKey  ( key );
   State( String key, int depth){
       m_depth = depth;
   private void initialize(String key){
       if ( m_rnd == null)
            m_rnd =  new Random(m_seed); // formerly had: ThreadLocalRandom.current();

       if ( m_map == null)
            m_map =  new TreeMap();
       if( m_numSuits == 0){
           double sqrt = Math.sqrt((double) key.length());
           m_numSuits  = (int) Math.round(sqrt - 0.5f) -1;
           if ( m_debug)  {
              System.out.println(   "Have key of length: " + Integer.toString(key.length())
                                     + " and num suits: "     + Integer.toString(m_numSuits)   );
       m_key          = key;
       m_survivalProb = -1;   // This indicates that it has not been calculated.      
       m_totalSlots   = calcTotalSlots();   
       if ( m_debug)  printKey(key);
   private int calcTotalSlots(){
       int sum = 0;
       for ( int slot = 0 ; slot <= m_numSuits; slot++){
           sum += getNumSlots(slot);
       if ( m_debug)  System.out.println("Key: " + m_key + ", total number of slots found is: " + Integer.toString(sum));
       return sum;
   String getKey() { return m_key; }
   double calcSurvivalProb(){
       if ( m_totalSlots == 0)
           return 1.0; // if we reach the end and there are no slots left, then we have survived.
       double prob = 0;
       for     ( int slot = 0; slot <= m_numSuits; slot++){
           for ( int card = 1; card <= m_numSuits; card++){
                  double probForCard = calcProbForCard(slot, card);
                  prob += probForCard;
                  if ( m_depth < 3) {
                      System.out.println(  "Depth: "       + Integer.toString(m_depth)
                                         + ", slot: "      + Integer.toString(slot)
                                         + ", card: "      + Integer.toString(card)
                                         + ", num slots: " + Integer.toString(m_totalSlots)
                                         + ", prob: "      + Double.toString (probForCard)   );
       if ( m_debug) {
          System.out.println("Found prob to be: " + String.format("%1.14f",prob) + "\n");
       return prob;
   public double calcProbForCard(int slot, int card){
       if ( card == 0)
           return 0;

       int count = getCount(slot, card);
       if ( count == 0)
           return 0;
       int numCards = count * card;
       double probOfCardChoice = (double) numCards / (double) m_totalSlots;
       double probOfSlot = 0.0;
       for     ( int destSlot = 1; destSlot <= m_numSuits; destSlot++){
           for ( int destCard = 0; destCard <= m_numSuits; destCard++){
                  int destCount = getCount(destSlot, destCard);
                  if ( (slot == destSlot) && (card == destCard))
                      destCount--;  // a card cannot go into its own slot.
                  int availableSlots = destSlot * destCount;

                  if ( availableSlots > 0){
                      double probOfContinuedSurvival = getProbOfContinuedSurvival(slot, card, destSlot, destCard);
                      if ( probOfContinuedSurvival > 0) {          
                          probOfSlot += probOfContinuedSurvival * probOfCardChoice *(double) availableSlots /(double) m_totalSlots;
                          if ( probOfSlot > 1.0000001) { // We don't have >= 1.0 to allow a small rounding error
                              System.out.println("ERROR: Prob is too big: " + Double.toString(probOfSlot));
       if ( m_debug) System.out.println("Found prob of slot to be: " + Double.toString(probOfSlot));
       return probOfSlot;
   double getProbOfContinuedSurvival(int sourceSlot, int sourceCard, int destSlot, int destCard){
          String keyForChosenCard   = adjustKeyForTakenCard (sourceSlot, sourceCard, m_key);
          String keyForContinuation = adjustKeyForCardInSlot(destSlot,   destCard,   keyForChosenCard);   
          // check if state obj is in map
          Double contProbDouble = m_map.get(keyForContinuation);
          double contProb;
          if ( contProbDouble == null ){
              State contState = new State(keyForContinuation, m_depth + 1);
              contProb        = contState.getSurvivalProb();
              if (    ( m_stepsBetweenCaching == 0 )
                   || (m_rnd.nextInt(m_stepsBetweenCaching) == 0 )) { // we'll only insert one in n key,value pairs to the map.
                 contProbDouble  = new Double( contProb);
                 m_map.put(keyForContinuation, contProbDouble); // we store it for later use
                   if ( m_map.size() % 10000 == 0)
                    System.out.println(   "Map elm: "+ Integer.toString(m_map.size())
                                       + ", key: "  + keyForContinuation
                                      + ", prob: " + Double.toString(contProbDouble));
                if ( m_debug)
                    System.out.println(  "Added new key to map: " + keyForContinuation
                                          + " that is item: " + Integer.toString(m_map.size()) );
          } else {
              contProb = contProbDouble.doubleValue();

          if ( m_debug) {
             System.out.println("Key: " + keyForContinuation + ", found cont prob: " + Double.toString(contProb));
          return contProb;
   double getSurvivalProb() {
       if ( m_survivalProb == -1) // i.e. not yet set
            m_survivalProb = calcSurvivalProb();
       return m_survivalProb;
   int getCount(int slot, int card){
       return ( getCount(slot, card, m_key));            
   static int getCount(int slot, int card, String key){
       int    index = getIndex(slot, card);
       // System.out.println("Key: " + key + ", index: " + Integer.toString(index));
       String c     = key.substring(index, index + 1);
       return charToInt(c);
       // return ( Integer.parseInt(c, 16));            
   public int getNumSlots(int slot){
       if ( slot == 0)
           return 0;
       int sum = 0;

       for ( int card = 0; card <= m_numSuits; card++)
           sum += slot * getCount(slot, card);
       return sum;      
   int getTotalSlots(){
       return m_totalSlots;
   public static void printKey(String key){
       if ( key == null){
           System.out.println("Key is null.");

       System.out.println("key: " + key);

       String header = "   Card ";
       for( int j = 0; j <= m_numSuits; j++){
           header += Integer.toString(j) + " ";
       for ( int slot = 0; slot <= m_numSuits; slot++){
           String str = "Slot " + Integer.toString(slot) + ": ";
           for ( int card = 0; card <= m_numSuits; card++){
               int index = getIndex(slot, card);
               str += key.substring(index, index+1) + " ";
   public static int getIndex(int slot, int card){
       int index = slot * (m_numSuits + 1) + card;
       System.out.println(    "For slot: "   + Integer.toString(slot)
                           + " and card: "   + Integer.toString(card)
                           + " have index: " + Integer.toString(index));
       return (index);
   public static String adjustKeyForTakenCard(int slot, int card, String key){
       if( card == 0)
           return null; // no cards to give

       String decrementedCurrent = adjustKey(slot, card   , key,                -1);
       return                      adjustKey(slot, card -1, decrementedCurrent, +1);
   public static String adjustKeyForCardInSlot(int slot, int card, String key){
       if ( slot == 0)
           return null; // have no slot
       String adjKey  = adjustKey(slot    , card, key   , -1);
       adjKey         = adjustKey(slot - 1, card, adjKey, +1);          
       for( int i = m_numSuits; i > 1; i--){
           int countWithZeroCards = getCount(i,0, adjKey );
           if( countWithZeroCards > 0){
               adjKey = adjustKey(i,0, adjKey, -countWithZeroCards);
               adjKey = adjustKey(1,0, adjKey, countWithZeroCards * i);

           int countWithZeroSuits = getCount(0,i, adjKey );
           if( countWithZeroSuits > 0){
               adjKey = adjustKey(0, i, adjKey, -countWithZeroSuits);
               adjKey = adjustKey(0, 1, adjKey,  countWithZeroSuits * i);
       int zeroZeroCount         = getCount(0,0, adjKey);
       return adjustKey( 0,0, adjKey, - zeroZeroCount);
   public static String adjustKey(int slot, int card, String key, int adj){
       if ( key == null)
           return null;
       int count = getCount(slot, card, key);
       if ( 0 > count + adj){
           System.out.println("Cannot adjust below zero for slot" );
           return null;
       } else if ( count + adj > 15){
           System.out.println("Cannot adjust above 1 char hex limit.");
           return null;       
       } else {
           int    index   = getIndex(slot, card);
           String start   = key.substring(0, index );
           String end     = key.substring(index + 1);
           String current = intToChar(count+ adj);
           return ( start + current + end);          
   public static String intToChar(int i ){
       char c = (char)(i+48);
       return "" + c;
   public static int charToInt(String s){
       char c = s.charAt(0);
       int  i = (int)c - 48;
       if ( m_debug) System.out.println("Char: " + s + " is deemed to be: " + Integer.toString(i));
       return (int) i;


And here's the Java code for the Monte Carlo:

import java.util.Random;

public class RanksAndSuitsSurvival {

    private final long   m_simsPerBatch = 1_000_000L;
    private final int    m_numBatches   = 100;

    private final int    m_numSuits     = 4;   // should be  4
    private final int    m_numRanks     = 13;  // should be 13
    private       int    m_numCards;
    private       int[]  m_hand;
    private       Random m_rnd;
    private final int    m_rndSeed      = 1;
    public static void main ( String[] args){
        RanksAndSuitsSurvival  rass = new RanksAndSuitsSurvival();
    public RanksAndSuitsSurvival(){
        m_rnd      = new Random(m_rndSeed);
        m_numCards = m_numSuits * m_numRanks;
        m_hand     = new int[m_numCards];
        for ( int i = 0; i < m_numCards; i++)
            m_hand[i] = i % m_numRanks;
    public void calc(){
        long startTime = System.currentTimeMillis();
        long survivors = 0;
        // The only reason for the nested 'for' loops rather than a single 'for' loop
        // is because we wanted to print out the progress intermittently and we didn't
        // want to introduce another 'if' inside the main 'for' loop for reasons of speed.
        for ( int batch = 1; batch <= m_numBatches; batch++ ) {
            for ( long counter = 0; counter < m_simsPerBatch; counter ++){
                if (survived())  
            System.out.println("Have now completed batch: " + Integer.toString(batch) + ", after "
                               + Double.toString((double)( System.currentTimeMillis() - startTime) * 0.001) + " seconds");
        double numSims = (double) m_numBatches * (double) m_simsPerBatch;
        double prob    = (double) survivors              / numSims;
        double stdDev  = Math.sqrt( (1.0 - prob ) * prob / numSims);

        long   endTime = System.currentTimeMillis();
        printoutResults(survivors, numSims, prob, stdDev, endTime - startTime);
    public void printoutResults(long survivors, double numSims, double prob, double stdDev, long elapsedTimeMS){
        System.out.println("\nFound " + Long.toString(survivors) + " survivors out of " + Double.toString(numSims) );
        System.out.println("So the survival probability is estimated to be: "           + Double.toString(prob     ) );
        System.out.println("with a standard deviation of:                   "           + Double.toString(stdDev   ) );
        System.out.println("Time elapsed: " + Double.toString((double)( elapsedTimeMS) * 0.001d) + " seconds");
    public boolean survived(){
         shuffleHand();     // will shuffle m_hand
         return checkHand();
    // Implementing Fisher–Yates shuffle
    public void shuffleHand(){         
        for (int i = m_hand.length - 1; i > 0; i--)  {
          int index     = m_rnd.nextInt(i + 1);
          // Simple swap
          int a         = m_hand[index];
          m_hand[index] = m_hand[i];
          m_hand[i]     = a;
    // Returns true when the hand 'survived'.
    public boolean checkHand(){      
           for (int i = 0; i < m_hand.length; i++){
               if ( ((m_hand[i] - i) % m_numRanks) == 0 )
                      return false; // did not survive
           return true;  // survived              

Sunday, May 29, 2016

Space Elevator Calculations

In this post we use the equations derived in the previous posts and allow the user to plug in some numbers.
In particular, when a cable thickness has been chosen to have the optimal thickness, we determine the mass of satellite required to achieve the given tension at the earth's surface.
We also work out the total mass of the cable.

Cable Specific Strength M Yuri
Safety factor -
Tension at earth's surface Newtons
Satellite height above earth km

Total Cable mass - kg
Satellite Mass - kg
Tension at Satellite - Newtons

Saturday, May 7, 2016

Lifting a cable above the equator

Suppose you were at the equator and decided to hold a cable at some height above the earth's surface. The cable dangles vertically down below, touching the earth, but it is not anchored there. As you go higher and higher, you would stay above the same point on the earth. I.e. the top of the cable would be geostationary as the earth spins.
If the cable were not made of a very strong material then as you tried to go higher, it would eventually break.
It turns out that if you have a cable of specific strength 48.5 M Yuri, then you'd be able to reach all the way up to a geo-stationary satellite (35,797km up).
If you reach that height, then you'll be able to go on out to about 143,937 km without requiring a stronger cable.
Something a tad surprising ( to me at least) happens 143,937 km up. You will no longer need to hold the top end. The tension at both the top and the bottom will be zero. The maximum tension will be at the height of a geostationary satellite orbit.
A cable that long, will be able to support itself. The gravitational attraction is exactly the required centrifical force to maintain a circular orbit.
If you then attempted to use a longer cable and go higher, the cable would lift off the ground. In other words the net force would be upwards. So if you want to go out further, then you'd need to anchor one end to the surface of the earth.

I've made lots of statement there, now lets start to justify some of that mathematically.
This post is a follow on from the previous one: Polar Escape Strength Threshold .
We'll use the same variables as defined there.
The cable is vertical and as the earth spins, the top of the cable remains over the same point on the equator. So the cable is geostationary and hence the net force acting on a small section of cable is the centrifical force: \[ \delta m\ \omega_s^2 r = \frac{G M_e \delta m}{r^2} - \delta T \] but we know: \[\delta m = \lambda \ \delta r\] and so: \[ \delta T = \delta r \left( \frac{G M_e \lambda}{r^2} - \lambda \omega_s^2 r\right) \] We can integrate that and use the fact that the tension in the cable at the earth's surface is zero. \(T(r_e) = 0 \).
We find:
\[ T(R) = G M_e \lambda \left( \frac{1}{r_e} - \frac{1}{R} \right) + \frac{\lambda \omega_s^2}{2} \left( r_e^2 - R^2 \right) \ \ \ \ \ \ \ \ \ (eqn 1) \] We define the specific strength Y to be the tension at breaking point divided by the mass per unit length: \[ Y = \frac{T(R_b)}{\lambda} = G M_e \left( \frac{1}{r_e} - \frac{1}{R_b} \right) + \frac{ \omega_s^2}{2} \left( r_e^2 - R_b^2 \right) \]
We now return to eqn 1 above. We can differentiate that with respect to R to find where is the maximum tension: \[ 0=\frac{\partial T}{\partial R}= \frac{G M_e \ \lambda}{R^2}- \lambda \ \omega_s^2 R \] We find that the maximum tension occurs at the geostationary orbit: \[ R_g = \left( \frac{G M_e}{\omega_s^2} \right)^{1/3} \] The threshold specific strength that would enable the cable to reach this max tension is: \[ Y_E = GM_e \left( \frac{1}{r_e} - \frac{1}{R_g}\right) + \frac{\omega_s^2}{2} \left( r_e^2 - R_g^2\right) \approx 48.5 M Yuri \] Returning to eqn 1, we see that \(T(R)=0\) when \(R=r_e\), in other words the tension is zero at the earth's surface. The tension reaches a maximum at the orbit of a geostationary satellite. It then decreases again. But we can find R such that \(T(R)=0\) \[ 0 = G M_e \left( \frac{1}{r_e} - \frac{1}{R_s} \right) + \frac{ \omega_s^2}{2} \left( r_e^2 - R_s^2 \right) \] That implies: \[ 0 = R_s^3 \ \frac{ \omega_s^2} {2} - R_s \left( \frac{ \omega_s^2 \ r_e^2 }{2} + \frac{G M_e }{r_e} \right) + G M_e \] That is cubic in \(R_s\). We solve that numerically and we find that the distance from the centre of the earth to the point on the cable where the tension is zero is: \[R_s \approx 1.5030 \times 10^8 m\] which 143,937 km from the surface of the earth.
A cable which is that long and is placed vertically above a point on the equator has zero tension at both ends. It neither drifts off into space nor falls to the earth. It just remains where it is.

Returning once again to my (current) favorite equation i.e. eqn 1, we see that it implies that for \(R>R_s\) the tension becomes negative. But we can't allow negative tension since we can't push with our cable. Without being held down the cable would rise up. So we would need an anchor at the surface of the earth. With that anchor we no long would have the conditon \(T(r_e) =0 \) . That was used in the derivation of eqn 1. And so eqn 1 would not be valid.
What we find is that as the cable gets longer beyond 143,937 km, the tension would keep increasing. An infintely long cable would require infinite tension at the anchor.

Polar Escape Strength Threshold

Suppose we had a cable with mass per unit length \(\lambda\).
So the mass of a small section of length \(l\) is: \[m \ = \lambda \ l \] If the cable is hanging vertically in a constant gravitational field \(g\), then the tension at the top will be the \[T = m \ g = \lambda \ l \ g \] We define the specific strength to be the tension at breaking point divided by the mass per unit length: \[Y=\frac{T_b }{\lambda} \] This is measured in units of: \[[Yuri] = \frac{[Newton] [ meter]} {[kilogram]^2} \] With constant \(g\), a cable with specific strength \(Y\) will be able to support up to its breaking length: \[l_b =\frac{T_b }{\lambda \ g} = \frac{Y}{g}\]
Now how about the case when the gravitational follows a one over r squared law. Where r is the distance from the centre of the earth. Lets start by considering a vertical cable that isn't moving: A small section from \(r\) to \((r + \delta r)\) has forces acting up and down. If it is not accelerating, then the forces are balanced: \[F_{up}=F_{down}\] \[T(r + \ \delta r )= T(r)+ \frac{G M_e \delta m }{r^2} \] But we know the mass: \[\delta m = \lambda \ \delta r\] So \[\delta T = \frac{G M_e \ \lambda \ \delta r }{r^2} \] We can integrate both sides of that for a cable that goes from the earth's surface \(r=r_e\) to \(r=R=r_e + h \) a height h above the earth.
We note that the end that touches the surface is not anchored. So the tension: \[T(r_e)=0\] So \[T(R)=G M_e \ \lambda \left( \frac{1}{r_e} - \frac{1}{R} \right)\] Suppose the cable breaks at \(R_b\), returning to the definition of the specific strength: \[Y=\frac{T(R_b)}{\lambda} \] Thus: \[Y=G M_e \ \left( \frac{1}{r_e} - \frac{1}{R_b} \right)\] rearranging we find: \[R_b = \left( \frac{1}{r_e} - \frac{Y}{G M_e}\right)^{-1}\]
So the breaking height above the earth's surface: \[h_b = R_b - r_e = \left( \frac{1}{r_e} - \frac{Y}{G M_e}\right)^{-1} - r_e\] Now when: \[\frac{1}{r_e} - \frac{Y}{G M_e} = 0 \] we can reach an infinite height! In other words the cable can completely escape the earth's gravitational field.
The polar escape strength threshold is: \[Y_p = \frac{G M_e}{r_e} \approx 62.6 M Yuri \] We use the word polar here since is most relevant for a cable above either the north or south pole. If we were to have a geostationary cable above the pole, then we could ignore the spin of the earth. On the other hand for a geostationary cable above the equator, the centrifical acceleration should not be ignored. I'll show the details of how to deal with that in my next post.

Monday, May 2, 2016

Sky Hooks

If we want to make it easy(ish) to get into space, then it would be very helpful to have a sky hook. Once we have that in place we can build an elevator.
Lets look into the physics / maths of how it could be done.
Suppose a small object of mass m is travelling in a cirle round a sphere. The object is a distance R from the centre of the sphere.
In vector notation, write its position as: \[\textbf{p}=R\left( \begin{array}{c} \sin(\omega t) \\ \cos(\omega t) \end{array} \right)\] We can then differentiate that twice with respect to time and we find the acceleration: \[\textbf{a}=\frac{\partial^2\textbf{p}}{\partial t^2}=-\omega^2 R \left( \begin{array}{c} \sin(\omega t) \\ \cos(\omega t) \end{array} \right) = - \omega^2 \textbf{R} \]
So, when an object is moving in a circle, there is an acceleration towards the centre of that circle.
But we know that the force: \[\textbf{F}=m \textbf{a}\] So if we want to maintain the object rotating about the centre, then we need apply a force toward the centre of magnitude: \[F=m \omega^2 R\] That is called the centrifical force.
For a satellite travelling round a planet, there is a gravitational attraction: \[F_g=\frac{G M_e m}{R^2} \] Now, if the gravitational attraction is equal to the centrifical force, then the satellite can travel in a circular orbit: \[ m \omega^2 R = \frac{G M_e m}{R^2} \] We can solve for \(R\): \[R = \left( \frac{G M_e}{\omega^2} \right)^\frac{1}{3}\]
For a geostationary orbit, as the earth spins on its axis, the satellite will remain above the same point on the equator, we set: \[ \omega_s \ n_d s_d =2 \pi n_r \] where the number of seconds in a day: \(s_d = 24 * 3600\)
the number of days in a year \(n_d\approx365.2422\)
the number of revolutions of the earth about its own axis in a year \(n_r=n_d+1\)
(The +1 there comes from the fact that the earth is also revolving round the sun, for details see the Sidereal time article in Wikipedia).
And we find: \[\omega_s \approx 7.29211 \times 10^{-5} \ rad . s^{-1}\] and plugging that into the equation for \(R\) we find: \[R_g \approx 4.2164 \times 10^7 m = 42,164 km \] That's the distance from the centre of the earth to a geostationary satellite.
It is 35,786 km above sea level.

Suppose now we were to consider a different geostationary satellite, this time with a cable attached. The other end of the cable is anchored to the ground.
Now suppose the cable is perfectly vertical and there is a tension \(T(r)\) in the cable at r. The net force on the satellite should be the centrifical force: \[T (R) + \frac{G M_e M_s}{R^2}= M_s \omega_s^2 R \ \ \ \ \ \ \ \ \ \ \ \ (eqn 1)\] Now we consider a small section from \(r\) to \(r+\delta r\) above the centre of the earth.
Suppose the cable had constant mass per unit length \(\lambda\). The mass of the small section is: \[\delta m = \lambda \delta r\] The gravitational force on that section is: \[\frac{G M_e \lambda \delta r}{r^2}\] We write the net force on the small section from \(r\) to \(r+\delta r\) due to the change in tension as: \(\delta T(r)\).
For the cable section to remain in circular orbit we need the net force to be the centrifical force. Hence: \[\frac{G M_e \lambda \delta r}{r^2} - \delta T(r) = \lambda \delta r \omega_s^2 r\] \[\delta T(r) = \frac{G M_e \lambda \delta r }{r^2} - \lambda \delta r\omega_s^2 r\] \[ T(R) - T(r) = \int^R_r dT(r') = G M_e \lambda \left( \frac{1}{r} - \frac{1}{R} \right) + \frac{1}{2} \lambda \omega_s^2 \left( r^2 - R^2 \right) \] But we already have an expression for \(T(R)\) above (eqn 1). So, after a bit of re-arranging we find the tension in the cable to be: \[ T(r) = G M_e \left( \frac{\lambda}{R} - \frac{\lambda}{r} -\frac{M_s}{R^2} \right) + \frac{1}{2} \lambda \omega_s^2 \left( R^2 - r^2 \right) +M_s \omega_s^2 R \ \ \ \ \ ( eqn 2)\] Let \(r_m\) be the position of the maximum tension in the cable. \[0=\frac{\partial T}{\partial r}(r_m)\] We find that occurs at the orbit of an untethered geostationary satellite: \[r_m = R_g = \left( \frac{G M_e}{\omega_s^2} \right)^\frac{1}{3}\] You can't push with our rope!
The cable has tension but does not offer vertical support.
Mathematically we write that as: \[T(r) \geq 0\ \ \ \ \ \ \ \ \forall\ r : r_e \leq r \leq R \] To support the mass of the cable, the satellite must be further out than \(R_g\)
We can use eqn 2 to determine the tension in the cable at the earth's surface \( T(r_e)\).
That's the maximum force the climber is allowed exert so as not to pull down the satellite.

Thin in the right places

Suppose the breaking tension of a cable is \(T_b\) then we define the specific strength \(Y\) to be: \[Y = \frac{T_b}{\lambda} \] We define an optimal non-uniform cable to be one where at every point it is just thick enough to safely withstand the tension at that point.
Mathematically we write that as: \[ \lambda(T ) = \frac{T\ \alpha}{Y} \ \ \ \ \ \ \ \ \ (eqn 3) \] where our \(\alpha\) is the safety factor with: \[1 \leq \alpha \] With a little algebra we find: \[\alpha = \frac{T_b}{T}\] and so we know that \[T \leq T_b\] in other words the tension in the cable is at or below the tension at which the cable breaks. A higher \(\alpha\) is safer.
But now the mass of a small section from \(r \rightarrow r + \delta r\) is: \[\delta m = \delta r \lambda \left( T (r) \right) = \delta r \ \frac{\alpha T(r)}{Y}\] Just as before, to keep the cable section in a circular orbit the net force should equal the centrifical force. The net force is made up of a combination of the change in tension and the gravitational force. \[T (r) - T (r + \delta r) + \frac{G M_e \ \delta r \ \alpha T (r)}{Y \ r^2} = \frac{\omega_s^2 r \ \delta r \ \alpha T (r)}{Y}\] We can rearrange that: \[ \frac{\delta T}{T} =\delta r \left[ \frac{G M_e \alpha}{Y r^2} - \frac{\omega_s^2 r \alpha}{Y} \right] \] We now integrate both sides between \(r_1\) and \(r_2\) where \[ r_e \leq r_1 < r_2 \leq R \] And we find: \[ ln \left( T(r_2) \right) -ln \left( T(r_1) \right) =G M_e \frac{\alpha}{Y} \left[ \frac{1}{r_1} - \frac{1}{r_2}\right] + \frac{\omega_s^2 \alpha}{2 Y} \left( r_1^2 - r_2^2 \right) \] and we can write that as: \[ \frac{T(r_2)}{T(r_1)} =exp \left( G M_e \frac{\alpha}{Y} \left[ \frac{1}{r_1} - \frac{1}{r_2}\right] + \frac{\omega_s^2 \alpha}{2 Y} \left( r_1^2 - r_2^2 \right) \right) \ \ \ \ \ (eqn 4) \] Now bear in mind that when deriving that equation we have used an optimally thin cable, (see equation 3). At all points the cable mass per unit length has been chosen to be just thick enough to be safe.
Now if we know the tension at any one spot from ground all the way up to the satellite, then we can use equation 4 to work at the tension at all other locations along the cable. We could specify the tension at the ground anchor and then work out the tension all the way up to the satellite.
We consider 3 variables:
\(M_s\): the mass of the satellite.
\(R\): the distance of the satellite from the centre of the earth.
\(T(r_e)\): the tension at the anchor on the surface of the earth.

Now if we have any two of those we can work out the third.
For example we could specify the distance of the satellite and the tension at the ground anchor and we can work out the mass of the satellite required.
To do that, we would use equation 4 to work out the tension in the cable when it meets the satellite: \(T(R)\).
But we also know that at the satellite, the centrifical force is equal to the gravity plus the tension, so: \[ M_s \omega_s^2 R = T(R) + \frac{G M_e M_s}{R^2} \ \ \ \ (eqn 5)\] and so we have a formula for the satellite mass: \[M_s = \frac{T(R)R^2}{ \omega_s^2 R^2 - G M_e}\]
Alternatively if we know the mass of the satellite (\(M_s\)) and the distance of the satellite from the centre of the earth \(R\) then we can use equation 5 to evaluate \(T(R)\) and then equation 4 to find the resultant tension at the ground anchor: \(T(r_e)\)

And if we have \(T(r_e)\) and \(M_s\) and we want to evaluate R, then we can use equations 4 and 5 and with the aid of a numerical solver we can evaluate R.

Which ever way we do it, when the tension has been determined, we can return to equation 3 to find the mass per unit length and then integrate to find the total mass of cable required: \[M_c = \int^R_{r_e}dm(r) = \frac{\alpha}{Y} \int^R_{r_e}T(r)dr\] That integral can be done numerically.

I've written a calculator using Google sheets and I've published it as a template. To have a look, log-in using a google (gmail) account and click on this link.

Things to consider:
stability to perturbations including wind
could attached kites be helpful in reducing the maximum tension?
collisions: birds, planes, other satellites, space debris.
safety if it snaps. One option would be for the anchor at the earth to release as soon as a break has been detected.

In the calculations above we've used the following:
Gravitational Constant: \(G \approx 6.67 \times 10^{-11} \frac{m^3}{s^2 kg}\)
Mass of earth: \(M_e=5.97 \times 10^{24} kg\)
Radius of earth: \(r_e = 6.37 \times 10^6 m\)