Fund Managers Example


This example is from the Tech Tips section of the Java Developer Connection pages. The idea is to show why it is required to synchronize the use of a shared resource in Java (the original article can be found here).

The Java program consists of a number of threads that represent fund managers that co-manage the same fund by moving money around between the stocks in the fund, but their access to the fund is not synchronized hence if the program is run under just the right circumstances then it may be possible to see the amount of money in the fund either decrease (i.e. very unhappy investors) or increase (i.e. very happy investors).
Unfortunately, many things influence the outcome of the program, hence you need many threads (150 is the default), that get started with different priorities,  "sleep" for some time, and even then you need to run the program without JIT compilation etc.

We decided to use JPF to find the error that causes the total amount of money in the fund to change. The code that we used to find the error has been changed slightly from the original (it is shown for comparison), since the example is also used to show "throughput" results etc., which we were not interested in. More importantly, we changed the number of threads from 150 down to 2, and a few other default values to smaller values that will not influence the execution in any significant way. Changing the number of threads was done since, firstly, model checking unlike testing does not require more than 2 threads to show the error, and secondly, model checking suffers from the state-explosion problem hence the less threads that are used the quicker one will see the error. Lastly note that the "Verify" class is used as an interface to the model checker: Verify.assert(expr) checks whether the expression evaluates to true; Verify.beginAtomic() and Verify.endAtomic() wraps code to be executed in one step by the model checker; Verify.random(n) allows a nondeterministic value between 0 and n (inclusive) to be returned.

Initially we checked the program by just calling "checkSystem()" after all the initialization code in TestFundManagers, but this produced a spurious error. The problem was that the total amount of money in the fund would not be equal to the total amount of initial money if the test is done during a transfer of money (since the money is in transit between 2 stocks).  Therefore, we added the somewhat complicated looking code to make sure the "checkSystem()" call was only done when none of the Fund Managers were trading. Note that during the testing of the program no such precautions were done, hence even if the total amount of money changes when running the program this might be due to the fact that checkSystem is done during a transfer.

Next, it is worth pointing out that this program has infinitely many states, since there are no checks to ensure the funds always contain a positive amount of money. This is a major problem for model checking. However, since we are essentially bug-hunting here, we can limit the "depth" to which JPF searches for the error. Also, the error is caused by interleaving at the bytecode level, hence JPF must be run in a mode where it does model checking on the bytecode level (by default we use the statement level, although we still execute the statements only at bytecode level). Lastly, we switch on partial-order reductions to find the error faster. JPF finds the error in 23 seconds after visiting 13458 states (on a Sparc ULTRA60 with 512 mb) (model checker output). When we check the code on the statement level then the error is not found which, is the correct result.

In the Tech Tips article it is suggested that wrapping the transfer of money and the "checkSystem()" call in synchronized blocks will remove the error.  Below the changes are indicated in green.
 

class Stocks implements FundConstants {
  static int[] balances = new int[noOfStocks];
  static {
    for (int n=0; n<noOfStocks; n++) {
            balances[n] = initialBalance;
    }
  }
  static void transfer(Transfer t) {
    synchronized(balances) {
      balances[t.fundFrom] -= t.amount;
      balances[t.fundTo] += t.amount;
    }
  }
  static void checkSystem() {
    synchronized(balances) {
      int actual = 0;
      for (int n=0; n<noOfStocks; n++) {
         actual += balances[n];
      }
      Verify.assert(actual == totalMoneyInUniverse);
    }
  }
}
When checking the code with JPF we however find that the assertion is still violated! The problem now is much more subtle than before, since it has to do with the initialization of static variables when a class is loaded that is interleaved with code from another thread that actually starts using the variables. More precisely, the initial thread executes the "Stocks.checkSystem()" call, and since this is the first reference to the "Stocks" class it is loaded and its static variables will be initialized, however before the balances are assigned their initialized values (see code above) one of the Fund Manager threads starts to do a transfer from
a uninitialized balance, after doing the transfer the balance is assigned its initial value. The model checker output again illustrates the problem. There is another problem caused by the interleaving of class initialization, that JPF also finds. This is left as an exercise to the reader, but those that can't wait we have the answer here. Note that in the first version of the code, we used a "Stocks.init()" method to get around these problems, and show the "intended" error. In the code shown below we also use the init method.

The last part of the Tech Tips article is devoted to showing how the speed of the code can be improved by doing fine-grained locking of the balances array. The following code is suggested.
 

class Stocks implements FundConstants {
  static int[] balances = new int[noOfStocks];
  static Object[] locks = new Object[noOfStocks];

  static void init() {
    for (int n=0; n<noOfStocks; n++) {
      balances[n] = initialBalance;
      locks[n] = new Object();
    }
  }

  static void transfer(Transfer t) {
    int lo, hi;
    if (t.fundFrom < t.fundTo) {
      lo = t.fundFrom;
      hi = t.fundTo;
    } else {
      lo = t.fundTo;
      hi = t.fundFrom;
    }
    synchronized (locks[lo]) {
      synchronized (locks[hi]) {
         balances[t.fundFrom] -= t.amount;
         balances[t.fundTo] += t.amount;
      }
    }
  }

  static int sumHelper (int next) {
    synchronized (locks[next]) {
      if (next == (noOfStocks-1)) {
        return balances[next];
      } else {
       return balances[next] + sumHelper(next+1);
      }
    }
  }

  static void checkSystem() {
    int actual = 0;
    actual = sumHelper(0);
    Verify.assert(actual == totalMoneyInUniverse);
  }
}


It is mentioned in the article that a deadlock can be caused if we switch the "<" symbol in the transfer method to a ">". The problem is that the transfer method and the checkSystem method will then take locks in different orders. When running JPF on this example it finds this deadlock straight away.