This example
is from the Tech
Tips section of the Java
Developer Connection pages. The idea is to show why it is required
to synchronize the use of a shared resource in Java (the original article
can be found here).
The Java program consists of a number of threads that represent fund
managers that co-manage the same fund by moving money around between the
stocks in the fund, but their access to the fund is not synchronized hence
if the program is run under just the right circumstances then it may be
possible to see the amount of money in the fund either decrease (i.e. very
unhappy investors) or increase (i.e. very happy investors).
Unfortunately, many things influence the outcome of the program, hence
you need many threads (150 is the default), that get started with different
priorities, "sleep" for some time, and even then you need to run
the program without JIT compilation etc.
We decided to use JPF to find the error that causes the total amount of money in the fund to change. The code that we used to find the error has been changed slightly from the original (it is shown for comparison), since the example is also used to show "throughput" results etc., which we were not interested in. More importantly, we changed the number of threads from 150 down to 2, and a few other default values to smaller values that will not influence the execution in any significant way. Changing the number of threads was done since, firstly, model checking unlike testing does not require more than 2 threads to show the error, and secondly, model checking suffers from the state-explosion problem hence the less threads that are used the quicker one will see the error. Lastly note that the "Verify" class is used as an interface to the model checker: Verify.assert(expr) checks whether the expression evaluates to true; Verify.beginAtomic() and Verify.endAtomic() wraps code to be executed in one step by the model checker; Verify.random(n) allows a nondeterministic value between 0 and n (inclusive) to be returned.
Initially we checked the program by just calling "checkSystem()" after all the initialization code in TestFundManagers, but this produced a spurious error. The problem was that the total amount of money in the fund would not be equal to the total amount of initial money if the test is done during a transfer of money (since the money is in transit between 2 stocks). Therefore, we added the somewhat complicated looking code to make sure the "checkSystem()" call was only done when none of the Fund Managers were trading. Note that during the testing of the program no such precautions were done, hence even if the total amount of money changes when running the program this might be due to the fact that checkSystem is done during a transfer.
Next, it is worth pointing out that this program has infinitely many states, since there are no checks to ensure the funds always contain a positive amount of money. This is a major problem for model checking. However, since we are essentially bug-hunting here, we can limit the "depth" to which JPF searches for the error. Also, the error is caused by interleaving at the bytecode level, hence JPF must be run in a mode where it does model checking on the bytecode level (by default we use the statement level, although we still execute the statements only at bytecode level). Lastly, we switch on partial-order reductions to find the error faster. JPF finds the error in 23 seconds after visiting 13458 states (on a Sparc ULTRA60 with 512 mb) (model checker output). When we check the code on the statement level then the error is not found which, is the correct result.
In the Tech
Tips article it is suggested that wrapping the transfer of money and
the "checkSystem()" call in synchronized blocks will remove the error.
Below the changes are indicated in green.
When checking the code with JPF we however find that the assertion is still violated! The problem now is much more subtle than before, since it has to do with the initialization of static variables when a class is loaded that is interleaved with code from another thread that actually starts using the variables. More precisely, the initial thread executes the "Stocks.checkSystem()" call, and since this is the first reference to the "Stocks" class it is loaded and its static variables will be initialized, however before the balances are assigned their initialized values (see code above) one of the Fund Manager threads starts to do a transfer fromclass Stocks implements FundConstants {
static int[] balances = new int[noOfStocks];
static {
for (int n=0; n<noOfStocks; n++) {
balances[n] = initialBalance;
}
}
static void transfer(Transfer t) {
synchronized(balances) {
balances[t.fundFrom] -= t.amount;
balances[t.fundTo] += t.amount;
}
}
static void checkSystem() {
synchronized(balances) {
int actual = 0;
for (int n=0; n<noOfStocks; n++) {
actual += balances[n];
}
Verify.assert(actual == totalMoneyInUniverse);
}
}
}
The last part of the Tech
Tips article is devoted to showing how the speed of the code can be
improved by doing fine-grained locking of the balances array. The following
code is suggested.
class Stocks implements FundConstants {
static int[] balances = new int[noOfStocks];
static Object[] locks = new Object[noOfStocks];static void init() {
for (int n=0; n<noOfStocks; n++) {
balances[n] = initialBalance;
locks[n] = new Object();
}
}static void transfer(Transfer t) {
int lo, hi;
if (t.fundFrom < t.fundTo) {
lo = t.fundFrom;
hi = t.fundTo;
} else {
lo = t.fundTo;
hi = t.fundFrom;
}
synchronized (locks[lo]) {
synchronized (locks[hi]) {
balances[t.fundFrom] -= t.amount;
balances[t.fundTo] += t.amount;
}
}
}static int sumHelper (int next) {
synchronized (locks[next]) {
if (next == (noOfStocks-1)) {
return balances[next];
} else {
return balances[next] + sumHelper(next+1);
}
}
}static void checkSystem() {
int actual = 0;
actual = sumHelper(0);
Verify.assert(actual == totalMoneyInUniverse);
}
}
It is mentioned in the article that a deadlock can be caused if
we switch the "<" symbol in the transfer method to a ">". The problem
is that the transfer method and the checkSystem method will then take locks
in different orders. When running JPF on this example it finds this deadlock
straight away.