In-Scattering Demo

Posted: May 29th, 2010 | Tags: , , , | 2 Comments »

This demo shows an analytic solution to the differential in-scattering equation for light in participating media. It's a similar but simplified version of equations found in [1], [2] and as I recently discovered [3]. However I thought showing the derivation might be interesting for some out there, plus it was a good excuse for me to brush up on my \(\LaTeX\).

You might notice I also updated the site's theme, unfortunately you need a white background to make wordpress.com LaTeX rendering play nice with RSS feeds (other than that it's very convenient).

Download the demo here

The demo uses GLSL and shows point and spot lights in a basic scene with some tweakable parameters:

Background

Given a view ray defined as:

\[\mathbf{x}(t) = \mathbf{p} + t\mathbf{d}\]

We would like to know the total amount of light scattered towards the viewer (in-scattered) due to a point light source. For the purposes of this post I will only consider single scattering within isotropic media.

The differential equation that describes the change in radiance due to light scattered into the view direction inside a differential volume is given in PBRT (p578), if we assume equal scattering in all directions we can write it as:

\[dL_{s}(t) = \sigma_{s}L_{i}(t)\,dt\]

Where \(\sigma_{s}\)  is the scattering probability which I will assume includes the normalization term for an isotropic phase funtion of \(\frac{1}{4pi}\). For a point light source at distance d with intensity I we can calculate the radiant intensity at a receiving point as:

\[L_{i} = \dfrac{I}{d^2}\]

Plugging in the equation for a point along the view ray we have:

\[L_{i}(t) = \dfrac{I}{|\mathbf{x}(t)-\mathbf{s}|^2}\]

Where s is the light source position. The solution to (1) is then given by:

\[L_{s} = \int_{0}^{d} \sigma_{s}L_{i}(t) \, dt\]

\[L_{s} = \int_{0}^{d} \frac{\sigma_{s}I}{|\mathbf{x}(t)-\mathbf{s}|^2}\,dt\]

To find this integral in closed form we need to expand the distance calculation in the denominator into something we can deal with more easily:

\[L_{s} = \sigma_{s}I\int_0^d{\dfrac{dt}{(\mathbf{p} + t\mathbf{d} - \mathbf{s})\cdot(\mathbf{p} + t\mathbf{d} - \mathbf{s})}}\]

Expanding the dot product and gathering terms, we have:

\[L_{s} = \sigma_{s}I\int_{0}^{d}\frac{dt}{(\mathbf{d}\cdot\mathbf{d})t^2 + 2(\mathbf{m}\cdot\mathbf{d})t + \mathbf{m}\cdot\mathbf{m} }\]

Where \(\mathbf{m} = (\mathbf{p}-\mathbf{s})\).

Now we have something a bit more familiar, because the direction vector is unit length we can remove the coefficient from the quadratic term and we have:

\[L_{s} = \sigma_{s}I\int_{0}^{d}\frac{dt}{t^2 + 2bt + c}\]

At this point you could look up the integral in standard tables but I'll continue to simplify it for completeness.  Completing the square we obtain:

\[L_{s} = \sigma_{s}I\int_{0}^{d}\frac{dt}{ (t^2 + 2bt + b^2) + (c-b^2)}\]

Making the substitution \(u = (t + b)\), \(v = (c-b^2)^{1/2}\) and updating our limits of integration, we have:

\[L_{s} = \sigma_{s}I\int_{b}^{b+d}\frac{du}{ u^2 + v^2}\]

\[L_{s} = \sigma_{s}I \left[ \frac{1}{v}\tan^{-1}\frac{u}{v} \right]_b^{b+d}\]

Finally giving:

\[L_{s} = \frac{\sigma_{s}I}{v}( \tan^{-1}\frac{d+b}{v} - \tan^{-1}\frac{b}{v} )\]

This is what we will evaluate in the pixel shader, here's the GLSL snippet for the integral evaluation (direct translation of the equation above):

float InScatter(vec3 start, vec3 dir, vec3 lightPos, float d)
{
// light to ray origin
vec3 q = start - lightPos;

// coefficients
float b = dot(dir, q);
float c = dot(q, q);

// evaluate integral
float s = 1.0f / sqrt(c - b*b);
float l = s * (atan( (d + b) * s) - atan( b*s ));

return l;
}

Where d is the distance traveled, computed by finding the entry / exit points of the ray with the volume.

To make the effect more interesting it is possible to incorporate a particle system, I apply the same scattering shader to each particle and treat it as a thin slab to obtain an approximate depth, then simply multiply by a noise texture at the end.

Optimisations

  • As it is above the code only supports lights with infinite extent, this implies drawing the entire frame for each light.  It would be possible to limit it to a volume but you'd want to add a falloff to the effect to avoid a sharp transition at the boundary.
  • Performing the full evaluation per-pixel for the particles is probably unnecessary, doing it at a lower frequency, per-vertex or even per-particle would probably look acceptable.

Notes

  • Generally objects appear to have wider specular highlights and more ambient lighting in the presence of particpating media.  [1] Discusses this in detail but you can fudge it by lowering the specular power in your materials as the scattering coefficient increases.
  • According to Rayliegh scattering blue light at the lower end of the spectrum is scattered considerably more than red light.  It's simple to account for this wavelength dependence by making the scattering coefficient a constant vector weighted towards the blue component.  I found this helps add to the realism of the effect.
  • I'm curious to know how the torch light was done in Alan Wake as it seems to be high quality (not just billboards) with multiple light shafts.. maybe someone out there knows?

References

[1] Sun, B., Ramamoorthi, R., Narasimhan, S. G., and Nayar, S. K. 2005. A practical analytic single scattering model for real time rendering.

[2] Wenzel, C. 2006. Real-time atmospheric effects in games.

[3] Zhou, K., Hou, Q., Gong, M., Snyder, J., Guo, B., and Shum, H. 2007. Fogshop: Real-Time Design and Rendering of Inhomogeneous, Single-Scattering Media.

Related

[4] Engelhardt, T. and Dachsbacher, C. 2010. Epipolar sampling for shadows and crepuscular rays in participating media with single scattering.

[5] Volumetric Shadows using Polygonal Light Volumes (upcoming HPG2010)

{ 2 Comments » }


Threading Fun

Posted: May 24th, 2010 | 7 Comments »

So we had an interesting threading bug at work today which I thought I'd write up here as I hadn't seen this specific problem before (note I didn't write this code, I just helped debug it).  The set up was a basic single producer single consumer arrangement something like this:

#include <Windows.h>
#include <cassert>

volatile LONG gAvailable = 0;

// thread 1
DWORD WINAPI Producer(LPVOID)
{
	while (1)
	{
		InterlockedIncrement(&gAvailable);
	}
}

// thread 2
DWORD WINAPI Consumer(LPVOID)
{
	while (1)
	{
		// pull available work with a limit of 5 items per iteration
		LONG work = min(gAvailable, 5);

		// this should never fire.. right?
		assert(work <= 5);

		// update available work
		InterlockedExchangeAdd(&gAvailable, -work);
	}
}

int main(int argc, char* argv[])
{
	HANDLE h[2];

	h[0] = CreateThread(0, 0, Consumer, NULL, 0, 0);
	h[1] = CreateThread(0, 0, Producer, NULL, 0, 0);

	WaitForMultipleObjects(2, h, TRUE, INFINITE);

	return 0;
}

So where's the problem? What would make the assert fire?

We triple-checked the logic and couldn't see anything wrong (it was more complicated than the example above so there were a number of possible culprits) and unlike the example above there were no asserts, just a hung thread at some later stage of execution.

Unfortunately the bug reproduced only once every other week so we knew we had to fix it while I had it in a debugger. We checked all the relevant in-memory data and couldn't see any that had obviously been overwritten ("memory stomp" is usually the first thing called out when these kinds of bugs show up).

It took us a while but eventually we checked the disassembly for the call to min(). Much to our surprise it was performing two loads of gAvailable instead of the one we had expected!

This happened to be on X360 but the same problem occurs on Win32, here's the disassembly for the code above (VS2010 Debug):

// calculate available work with a limit of 5 items per iteration
LONG work = min(gAvailable, 5);

// (1) read gAvailable, compare against 5
002D1457  cmp         dword ptr [gAvailable (2D7140h)],5
002D145E  jge         Consumer+3Dh (2D146Dh)

// (2) read gAvailable again, store on stack
002D1460  mov         eax,dword ptr [gAvailable (2D7140h)]
002D1465  mov         dword ptr [ebp-0D0h],eax
002D146B  jmp         Consumer+47h (2D1477h)
002D146D  mov         dword ptr [ebp-0D0h],5

// (3) store gAvailable from (2) in 'work'
002D1477  mov         ecx,dword ptr [ebp-0D0h]
002D147D  mov         dword ptr [work],ecx

The question is what happens between (1) and (2)? Well the answer is that any other thread can add to gAvailable, meaning that the stored value at (3) is now > 5.

In this case the simple solution was to read gAvailable outside of the call to min():

// pull available work with a limit of 5 items per iteration
LONG available = gAvailable;
LONG work = min(available, 5);

Maybe this is obvious to some people but it sure caused me and some smart people a headache for a few hours :)

Note that you may not see the problem in some build configurations depending on whether or not the compiler generates code to perform the second read of the variable after the comparison. As far as I know there are no guarantees about what it may or may not do in this case, FWIW we had the problem in a release build with optimisations enabled.

Big props to Tom and Ruslan at Lucas for helping track this one down.

{ 7 Comments » }