Threading Fun

So we had an interesting threading bug at work today which I thought I'd write up here as I hadn't seen this specific problem before (note I didn't write this code, I just helped debug it).  The set up was a basic single producer single consumer arrangement something like this:

#include <Windows.h>
#include <cassert>

volatile LONG gAvailable = 0;

// thread 1
DWORD WINAPI Producer(LPVOID)
{
    while (1)
    {
        InterlockedIncrement(&gAvailable);
    }
}

// thread 2
DWORD WINAPI Consumer(LPVOID)
{
    while (1)
    {
        // pull available work with a limit of 5 items per iteration
        LONG work = min(gAvailable, 5);

        // this should never fire.. right?
        assert(work <= 5);

        // update available work
        InterlockedExchangeAdd(&gAvailable, -work);
    }
}

int main(int argc, char* argv[])
{
    HANDLE h[2];

    h[0] = CreateThread(0, 0, Consumer, NULL, 0, 0);
    h[1] = CreateThread(0, 0, Producer, NULL, 0, 0);

    WaitForMultipleObjects(2, h, TRUE, INFINITE);

    return 0;
}

So where's the problem? What would make the assert fire?

We triple-checked the logic and couldn't see anything wrong (it was more complicated than the example above so there were a number of possible culprits) and unlike the example above there were no asserts, just a hung thread at some later stage of execution.

Unfortunately the bug reproduced only once every other week so we knew we had to fix it while I had it in a debugger. We checked all the relevant in-memory data and couldn't see any that had obviously been overwritten ("memory stomp" is usually the first thing called out when these kinds of bugs show up).

It took us a while but eventually we checked the disassembly for the call to min(). Much to our surprise it was performing two loads of gAvailable instead of the one we had expected!

This happened to be on X360 but the same problem occurs on Win32, here's the disassembly for the code above (VS2010 Debug):

// calculate available work with a limit of 5 items per iteration
LONG work = min(gAvailable, 5);

// (1) read gAvailable, compare against 5
002D1457  cmp         dword ptr [gAvailable (2D7140h)],5
002D145E  jge         Consumer+3Dh (2D146Dh)

// (2) read gAvailable again, store on stack
002D1460  mov         eax,dword ptr [gAvailable (2D7140h)]
002D1465  mov         dword ptr [ebp-0D0h],eax
002D146B  jmp         Consumer+47h (2D1477h)
002D146D  mov         dword ptr [ebp-0D0h],5

// (3) store gAvailable from (2) in 'work'
002D1477  mov         ecx,dword ptr [ebp-0D0h]
002D147D  mov         dword ptr [work],ecx

The question is what happens between (1) and (2)? Well the answer is that any other thread can add to gAvailable, meaning that the stored value at (3) is now > 5.

In this case the simple solution was to read gAvailable outside of the call to min():

// pull available work with a limit of 5 items per iteration
LONG available = gAvailable;
LONG work = min(available, 5);

Maybe this is obvious to some people but it sure caused me and some smart people a headache for a few hours :)

Note that you may not see the problem in some build configurations depending on whether or not the compiler generates code to perform the second read of the variable after the comparison. As far as I know there are no guarantees about what it may or may not do in this case, FWIW we had the problem in a release build with optimisations enabled.

Big props to Tom and Ruslan at Lucas for helping track this one down.