Faster Fog

Posted: June 10th, 2010 | Tags: , , | 2 Comments »

Cedrick at Lucas suggested some nice optimisations for the in-scattering equation I posted last time.

I had left off at:

\[L_{s} = \frac{\sigma_{s}I}{v}( \tan^{-1}\left(\frac{d+b}{v}\right) - \tan^{-1}\left(\frac{b}{v}\right) )\]

But we can remove one of the two inverse trigonometric functions by using the following identity:

\[\tan^{-1}x - \tan^{-1}y = \tan^{-1}\frac{x-y}{1+xy}\]

Which simplifies the expression for \(L_{s}\) to:

\[L_{s} = \frac{\sigma_{s}I}{v}( \tan^{-1}\frac{x-y}{1+xy} )\]

With \(x\) and \(y\) being replaced by:

\[\begin{array}{lcl} x = \frac{d+b}{v} \\ y = \frac{b}{v}\end{array}\]

So the updated GLSL snippet looks like:

float InScatter(vec3 start, vec3 dir, vec3 lightPos, float d)
   vec3 q = start - lightPos;

   // calculate coefficients
   float b = dot(dir, q);
   float c = dot(q, q);
   float s = 1.0f / sqrt(c - b*b);

   // after a little algebraic re-arrangement
   float x = d*s;
   float y = b*s;
   float l = s * atan( (x) / (1.0+(x+y)*y));

   return l;

Of course it's always good to verify your 'optimisations', ideally I would take GPU timings but next best is to run it through NVShaderPerf and check the cycle counts:

Original (2x atan()):

Fragment Performance Setup: Driver 174.74, GPU G80, Flags 0x1000
Results 76 cycles, 10 r regs, 2,488,320,064 pixels/s

Updated (1x atan())

Fragment Performance Setup: Driver 174.74, GPU G80, Flags 0x1000
Results 55 cycles, 8 r regs, 3,251,200,103 pixels/s

A tasty 25% reduction in cycle count!

Another idea is to use an approximation of atan(), Robin Green has some great articles about faster math functions where he discusses how you can range reduce to 0-1 and approximate using minimax polynomials.

My first attempt was much simpler, looking at it's graph we can see that atan() is almost linear near 0 and asymptotically approaches pi/2.

Perhaps the simplest approximation we could try would be something like:

\[\tan^{-1}(x) \approx min(x, \frac{\pi}{2})\]

Which looks like:

float atanLinear(float x)
   return clamp(x, -0.5*kPi, 0.5*kPi);

// Fragment Performance Setup: Driver 174.74, GPU G80, Flags 0x1000
// Results 34 cycles, 8 r regs, 4,991,999,816 pixels/s

Pretty ugly, but even though the maximum error here is huge (~0.43 relative), visually the difference is surprisingly small.

Still I thought I'd try for something more accurate, I used a 3rd degree minimax polynomial to approximate the range 0-1 which gave something practically identical to atan() for my purposes (~0.0052 max relative error):

float MiniMax3(float x)
   return ((-0.130234*x - 0.0954105)*x + 1.00712)*x - 0.00001203333;

float atanMiniMax3(float x)
   // range reduction
   if (x < 1)
      return MiniMax3(x);
      return kPi*0.5 - MiniMax3(1.0/x);

// Fragment Performance Setup: Driver 174.74, GPU G80, Flags 0x1000
// Results 40 cycles, 8 r regs, 4,239,359,951 pixels/s

Disclaimer: This isn't designed as a general replacement for atan(), for a start it doesn't handle values of x < 0 and it hasn't had anywhere near the love put into other approximations you can find online (optimising for floating point representations for example).

As a bonus I found that putting the polynomial evaluation into Horner form shaved 4 cycles from the shader.

Cedrick also had an idea to use something a little different:

\[\tan^{-1}(x) \approx \frac{\pi}{2}\left(\frac{kx}{1+kx}\right)\]

This might look familiar to some as the basic Reinhard tone mapping curve!  We eyeballed values for k until we had one that looked close (you can tell I'm being very rigorous here), in the end k=1 was close enough and is one cycle faster :)

float atanRational(float x)
   return kPi*0.5*x / (1.0+x);

// Fragment Performance Setup: Driver 174.74, GPU G80, Flags 0x1000
// Results 34 cycles, 8 r regs, 4,869,120,025 pixels/s

To get it down to 34 cycles we had to expand out the expression for x and perform some more grouping of terms which shaved another cycle and a register off it.  I was surprised to see the rational approximation be so close in terms of performance to the linear one, I guess the scheduler is doing a good job at hiding some work there.

In the end all three approximations gave pretty good visual results:

Original (cycle count 76):

MiniMax3, Error 8x (cycle count 40):

Rational, Error 8x (cycle count 34):

Linear, Error 8x (cycle count 34):



In-Scattering Demo

Posted: May 29th, 2010 | Tags: , , , | 2 Comments »

This demo shows an analytic solution to the differential in-scattering equation for light in participating media. It's a similar but simplified version of equations found in [1], [2] and as I recently discovered [3]. However I thought showing the derivation might be interesting for some out there, plus it was a good excuse for me to brush up on my \(\LaTeX\).

You might notice I also updated the site's theme, unfortunately you need a white background to make LaTeX rendering play nice with RSS feeds (other than that it's very convenient).

Download the demo here

The demo uses GLSL and shows point and spot lights in a basic scene with some tweakable parameters:


Given a view ray defined as:

\[\mathbf{x}(t) = \mathbf{p} + t\mathbf{d}\]

We would like to know the total amount of light scattered towards the viewer (in-scattered) due to a point light source. For the purposes of this post I will only consider single scattering within isotropic media.

The differential equation that describes the change in radiance due to light scattered into the view direction inside a differential volume is given in PBRT (p578), if we assume equal scattering in all directions we can write it as:

\[dL_{s}(t) = \sigma_{s}L_{i}(t)\,dt\]

Where \(\sigma_{s}\)  is the scattering probability which I will assume includes the normalization term for an isotropic phase funtion of \(\frac{1}{4pi}\). For a point light source at distance d with intensity I we can calculate the radiant intensity at a receiving point as:

\[L_{i} = \dfrac{I}{d^2}\]

Plugging in the equation for a point along the view ray we have:

\[L_{i}(t) = \dfrac{I}{|\mathbf{x}(t)-\mathbf{s}|^2}\]

Where s is the light source position. The solution to (1) is then given by:

\[L_{s} = \int_{0}^{d} \sigma_{s}L_{i}(t) \, dt\]

\[L_{s} = \int_{0}^{d} \frac{\sigma_{s}I}{|\mathbf{x}(t)-\mathbf{s}|^2}\,dt\]

To find this integral in closed form we need to expand the distance calculation in the denominator into something we can deal with more easily:

\[L_{s} = \sigma_{s}I\int_0^d{\dfrac{dt}{(\mathbf{p} + t\mathbf{d} - \mathbf{s})\cdot(\mathbf{p} + t\mathbf{d} - \mathbf{s})}}\]

Expanding the dot product and gathering terms, we have:

\[L_{s} = \sigma_{s}I\int_{0}^{d}\frac{dt}{(\mathbf{d}\cdot\mathbf{d})t^2 + 2(\mathbf{m}\cdot\mathbf{d})t + \mathbf{m}\cdot\mathbf{m} }\]

Where \(\mathbf{m} = (\mathbf{p}-\mathbf{s})\).

Now we have something a bit more familiar, because the direction vector is unit length we can remove the coefficient from the quadratic term and we have:

\[L_{s} = \sigma_{s}I\int_{0}^{d}\frac{dt}{t^2 + 2bt + c}\]

At this point you could look up the integral in standard tables but I'll continue to simplify it for completeness.  Completing the square we obtain:

\[L_{s} = \sigma_{s}I\int_{0}^{d}\frac{dt}{ (t^2 + 2bt + b^2) + (c-b^2)}\]

Making the substitution \(u = (t + b)\), \(v = (c-b^2)^{1/2}\) and updating our limits of integration, we have:

\[L_{s} = \sigma_{s}I\int_{b}^{b+d}\frac{du}{ u^2 + v^2}\]

\[L_{s} = \sigma_{s}I \left[ \frac{1}{v}\tan^{-1}\frac{u}{v} \right]_b^{b+d}\]

Finally giving:

\[L_{s} = \frac{\sigma_{s}I}{v}( \tan^{-1}\frac{d+b}{v} - \tan^{-1}\frac{b}{v} )\]

This is what we will evaluate in the pixel shader, here's the GLSL snippet for the integral evaluation (direct translation of the equation above):

float InScatter(vec3 start, vec3 dir, vec3 lightPos, float d)
// light to ray origin
vec3 q = start - lightPos;

// coefficients
float b = dot(dir, q);
float c = dot(q, q);

// evaluate integral
float s = 1.0f / sqrt(c - b*b);
float l = s * (atan( (d + b) * s) - atan( b*s ));

return l;

Where d is the distance traveled, computed by finding the entry / exit points of the ray with the volume.

To make the effect more interesting it is possible to incorporate a particle system, I apply the same scattering shader to each particle and treat it as a thin slab to obtain an approximate depth, then simply multiply by a noise texture at the end.


  • As it is above the code only supports lights with infinite extent, this implies drawing the entire frame for each light.  It would be possible to limit it to a volume but you'd want to add a falloff to the effect to avoid a sharp transition at the boundary.
  • Performing the full evaluation per-pixel for the particles is probably unnecessary, doing it at a lower frequency, per-vertex or even per-particle would probably look acceptable.


  • Generally objects appear to have wider specular highlights and more ambient lighting in the presence of particpating media.  [1] Discusses this in detail but you can fudge it by lowering the specular power in your materials as the scattering coefficient increases.
  • According to Rayliegh scattering blue light at the lower end of the spectrum is scattered considerably more than red light.  It's simple to account for this wavelength dependence by making the scattering coefficient a constant vector weighted towards the blue component.  I found this helps add to the realism of the effect.
  • I'm curious to know how the torch light was done in Alan Wake as it seems to be high quality (not just billboards) with multiple light shafts.. maybe someone out there knows?


[1] Sun, B., Ramamoorthi, R., Narasimhan, S. G., and Nayar, S. K. 2005. A practical analytic single scattering model for real time rendering.

[2] Wenzel, C. 2006. Real-time atmospheric effects in games.

[3] Zhou, K., Hou, Q., Gong, M., Snyder, J., Guo, B., and Shum, H. 2007. Fogshop: Real-Time Design and Rendering of Inhomogeneous, Single-Scattering Media.


[4] Engelhardt, T. and Dachsbacher, C. 2010. Epipolar sampling for shadows and crepuscular rays in participating media with single scattering.

[5] Volumetric Shadows using Polygonal Light Volumes (upcoming HPG2010)